Techniques for Implementing Fault Tolerance in Modern Software Systems to Enhance Availability, Durability, and Reliability

Authors

  • Poonam Sharma, Pt. Ravishankar Shukla University, Amanaka, G.E. Road, Bilaspur, Chhattisgarh, India.
  • Rajendra Prasad, Tamil Nadu Agricultural University, Coimbatore, Tamil Nadu, India.

Keywords:

Availability, Durability, Fault Tolerance, Monitoring & Recovery, System & Service Resilience

Abstract

The rising demand for highly available and reliable software systems has made fault tolerance mechanisms increasingly important. Fault tolerance is a software system's ability to keep operating effectively when parts of the system fail. This study surveys commonly used techniques for implementing fault tolerance in modern software systems and groups them into four areas: Data Redundancy & Protection, System & Service Resilience, Monitoring & Recovery, and Operational & Design Practices. In Data Redundancy & Protection, the central methods are data replication, backup and restore, RAID, erasure coding, and data sharding; these techniques guard against data loss and provide a basis for system recovery. System & Service Resilience techniques, such as hardware and software redundancy, load balancing, failover, rolling upgrades, canary releases, and checkpoints, focus on maintaining service availability and performance. Monitoring & Recovery strategies involve continuous observation of system health and performance metrics, using circuit breakers to detect failures and rate limiting to prevent resource exhaustion. Transaction management helps ensure that operations either complete successfully or are rolled back, which preserves system integrity. Finally, Operational & Design Practices include idempotency, which allows an operation to be repeated without unintended side effects, and function replication, which runs multiple instances of a service. The study gives a structured overview of these techniques and is intended as a guide for software architects and developers choosing fault tolerance mechanisms for different system requirements.
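
Two of the techniques named in the abstract, circuit breakers and idempotency, can be illustrated with a short Python sketch. It is not taken from the article: the class names, the `max_failures` and `reset_after` parameters, and the deposit example are all hypothetical, and the code assumes a single-threaded caller and in-memory state.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical minimal breaker: opens after N consecutive failures."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock              # injectable so tests can control time
        self.failures = 0
        self.opened_at = None           # None means the breaker is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; call rejected")
            # Cool-down elapsed: let one trial call through (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the breaker
            raise
        # A success closes the breaker and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


class IdempotentProcessor:
    """Hypothetical handler that applies each request key at most once."""

    def __init__(self):
        self.results = {}  # idempotency key -> result of the first attempt
        self.balance = 0

    def deposit(self, key, amount):
        if key in self.results:
            return self.results[key]  # retry: return the stored result
        self.balance += amount
        self.results[key] = self.balance
        return self.balance
```

In this sketch, a caller would wrap each remote call in `breaker.call(...)` so that repeated failures stop further attempts until the cool-down passes, and a client retrying `deposit` with the same key would not change the balance a second time. A production version would likely need locking and durable storage for the idempotency keys.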

Published

2023-09-18

How to Cite

Sharma, P., & Prasad, R. (2023). Techniques for Implementing Fault Tolerance in Modern Software Systems to Enhance Availability, Durability, and Reliability. Eigenpub Review of Science and Technology, 7(1), 239–251. Retrieved from https://studies.eigenpub.com/index.php/erst/article/view/33

Section

Articles