SRE School

Master SRE. Build Resilient Systems. Lead the Future of Reliability

Uncategorized

MySQL CPU Consumption Monitoring

Posted on September 13, 2025 | by Rajesh Kumar

This is the key question 👍. With MariaDB/MySQL the trick is: “high CPU” doesn’t always show up as long queries […]

Uncategorized

Kafka: Consumer Group vs Worker vs Thread vs Consumer Instance vs Topic vs Partitions

Posted on September 3, 2025 | by Rajesh Kumar

1) Quick definitions (the mental model). 2) How messages land in partitions. Rule of thumb: choose a key that spreads […]

Uncategorized

What is Fault tolerance?

Posted on September 2, 2025 | by Rajesh Kumar

Fault tolerance is a system’s ability to keep meeting its SLOs despite expected failures—machines dying, networks flaking, processes crashing, disks […]

Uncategorized

What is Redundancy?

Posted on September 2, 2025 | by Rajesh Kumar

Redundancy is the deliberate duplication of critical components or paths so that a failure doesn’t violate your SLOs. Put simply: […]

Uncategorized

Comprehensive Tutorial on Production Readiness Review (PRR) in Site Reliability Engineering

Posted on August 29, 2025 | by priteshgeek

Introduction & Overview: In the fast-evolving landscape of Site Reliability Engineering (SRE), ensuring that software systems are reliable, scalable, and […]

Uncategorized

Comprehensive Tutorial on Platform Engineering in the Context of Site Reliability Engineering

Posted on August 29, 2025 (updated August 30, 2025) | by priteshgeek

Introduction & Overview: Platform Engineering is an evolving discipline that focuses on designing, building, and maintaining internal platforms to streamline […]

Uncategorized

Comprehensive Tutorial on SLIs as Code in Site Reliability Engineering

Posted on August 29, 2025 (updated August 30, 2025) | by priteshgeek

Introduction & Overview: What is SLIs as Code? SLIs as Code refers to the practice of defining, managing, and monitoring […]

Uncategorized

DevOps vs. Site Reliability Engineering (SRE): A Comprehensive Tutorial

Posted on August 29, 2025 (updated August 30, 2025) | by priteshgeek

Introduction & Overview: In the fast-evolving landscape of software development and IT operations, DevOps and Site Reliability Engineering (SRE) have […]

Uncategorized

Comprehensive Tutorial on Reliability Culture in Site Reliability Engineering

Posted on August 29, 2025 (updated August 30, 2025) | by priteshgeek

Introduction & Overview: Site Reliability Engineering (SRE) is a discipline that blends software engineering with IT operations to build and […]

Uncategorized

Comprehensive Tutorial on Engineering Productivity in Site Reliability Engineering

Posted on August 29, 2025 (updated August 30, 2025) | by priteshgeek

Introduction & Overview: What is Engineering Productivity in Site Reliability Engineering? Engineering Productivity in the context of Site Reliability Engineering […]

Popular Blogs

  • What is SLO?
  • What is an SLA?
  • Mastering SLIs: The Complete Guide to Service Level Indicators for SRE and DevOps
  • Top Free Tools for Synthetic Testing
  • Complete Guide to Upptime: Uptime Monitoring with GitHub Actions
  • What is Observability?
  • Chaos Engineering: A Complete Beginner-to-Advanced Guide
  • Blameless Postmortem: A Complete Beginner-to-Advanced Tutorial
  • Capacity Planning – Scaling Resources for Future Demand
  • Auto Remediation – Building Self-Healing Systems via Automation
  • What is Toil?
  • Argo CD vs Flux CD: A Comprehensive GitOps Comparison
  • How a CDN Works?
  • Healing Beyond Borders: The Future of Global Medical Tourism and the Platforms Leading It
  • Service Level Indicators (SLI) – A Complete Guide
  • Error Budgets – A Complete Guide
  • Toil – A Complete Guide
  • Incident Management – Complete Handbook & Tutorials
  • Complete Handbook & Tutorials on Observability
  • Digital Asset Management 101: The Ultimate Beginner’s Guide

Recent Blogs

  • MySQL CPU Consumption Monitoring
  • Kafka: Consumer Group vs Worker vs Thread vs Consumer Instance vs Topic vs Partitions
  • What is Fault tolerance?
  • What is Redundancy?
  • Comprehensive Tutorial on Production Readiness Review (PRR) in Site Reliability Engineering
  • Comprehensive Tutorial on Platform Engineering in the Context of Site Reliability Engineering
  • Comprehensive Tutorial on SLIs as Code in Site Reliability Engineering
  • DevOps vs. Site Reliability Engineering (SRE): A Comprehensive Tutorial
  • Comprehensive Tutorial on Reliability Culture in Site Reliability Engineering
  • Comprehensive Tutorial on Engineering Productivity in Site Reliability Engineering
  • Comprehensive Tutorial on Service Ownership in Site Reliability Engineering
  • Comprehensive Tutorial on Elimination of Toil in Site Reliability Engineering
  • Comprehensive Tutorial on Toil in Site Reliability Engineering
  • Comprehensive Tutorial on Error Budget Policy in Site Reliability Engineering
  • Managing Zombie Processes and Services in Site Reliability Engineering: A Comprehensive Tutorial
  • Comprehensive Tutorial on Health Checks in Site Reliability Engineering
  • Comprehensive Tutorial on Load Shedding in Site Reliability Engineering
  • Comprehensive Tutorial on Graceful Degradation in Site Reliability Engineering
  • Comprehensive Tutorial on Retry Logic in Site Reliability Engineering
  • Chaos Monkey: A Comprehensive Tutorial for Site Reliability Engineering
