This is not a tutorial.
This is how real incidents are handled in production.
When a server is slow, you are not experimenting; you are diagnosing under pressure.
This guide will take you from zero understanding to real-world debugging capability.
The Real Situation
You log in.
The complaint is simple:
"Server is slow."
But that tells you nothing.
Your job is to convert symptoms into facts.
The Only Correct Debugging Flow
Never jump randomly between commands.
Follow this exact order:
- Load → confirms the problem exists
- CPU → checks if the system is busy
- Memory → checks pressure
- Disk / I/O → uncovers hidden bottlenecks
- Processes → identifies the culprit
- Process states → explains the behavior
This order is not optional. This is how production debugging works.
Step 1: Check Load (Symptom Detection)
uptime
Example:
load average: 12.20, 10.50, 8.90
Interpretation
- If you have 4 cores → load should be ~4 or less
- If load = 12 → the system is overloaded
Load is NOT CPU usage. It includes waiting processes.
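This check can be scripted. A minimal sketch, using the article's example numbers (load 12.20, 4 cores) as hypothetical stand-ins for live readings from /proc/loadavg and nproc:

```shell
# Hypothetical sample values; on a live box:
#   load=$(cut -d ' ' -f 1 /proc/loadavg); cores=$(nproc)
load=12.20
cores=4

# awk handles the floating-point comparison that plain [ ] cannot
if awk -v l="$load" -v c="$cores" 'BEGIN { exit !(l > c) }'; then
    echo "OVERLOADED: load $load exceeds $cores cores"
else
    echo "OK: load $load within $cores cores"
fi
```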
Step 2: Check CPU (Is It Actually Busy?)
top
Look at:
- %us (user CPU)
- %sy (system CPU)
- %id (idle CPU)
Key Scenarios
Case A: High CPU usage
- CPU ~90%+ → CPU is the bottleneck
Case B: Low CPU but high load ⚠️
- CPU idle is high
- Load is high
This means processes are waiting (NOT a CPU issue)
This is where beginners get it wrong.
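To read those fields without the interactive screen, you can parse top's batch-mode summary line. The sample line below is made up to match the "low CPU, high load" case; on a live box it would come from `top -bn1 | grep '%Cpu'`:

```shell
# Hypothetical sample; live: line=$(top -bn1 | grep '%Cpu')
line='%Cpu(s):  3.2 us,  1.1 sy,  0.0 ni, 70.4 id, 24.9 wa,  0.0 hi,  0.4 si,  0.0 st'

# Walk the tokens and take the number that precedes each label
idle=$(echo "$line"   | awk '{ for (i = 2; i <= NF; i++) if ($i == "id,") print $(i-1) }')
iowait=$(echo "$line" | awk '{ for (i = 2; i <= NF; i++) if ($i == "wa,") print $(i-1) }')

echo "idle=${idle}% iowait=${iowait}%"
# High idle + high load + high iowait => processes are waiting on I/O, not CPU
```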
Step 3: Check Memory (Pressure vs Usage)
free -m
Correct Interpretation
Ignore "used" memory; focus on:
- available
- swap usage
Case A: Low available memory
→ system under pressure
Case B: High swap usage ⚠️
→ severe slowdown likely
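A sketch of that interpretation over a hypothetical `free -m` snapshot (the numbers are invented for illustration; live input is just `free -m`):

```shell
# Hypothetical snapshot showing both pressure signals at once
free_out='              total        used        free      shared  buff/cache   available
Mem:           7821        5210         412         310        2198         201
Swap:          2047        1536         511'

# Mem: column 7 is "available"; Swap: column 3 is "used"
echo "$free_out" | awk '
    /^Mem:/  { print "available: " $7 " MiB" }
    /^Swap:/ { print "swap used: " $3 " MiB" }'
# Low available + heavy swap use => memory pressure, expect slowdown
```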
Step 4: Disk & I/O (Where Most Real Issues Exist)
iostat -x 1
What to Look For
- %util near 100%
- await high (e.g., >50ms)
Example Interpretation
%util = 99%
await = 120ms
Disk is saturated. This will slow EVERYTHING.
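A hedged sketch of pulling those two numbers out of iostat. The excerpt below is invented and heavily simplified; real `iostat -x` output has many more columns, and newer sysstat versions split await into r_await/w_await, so the script resolves columns by header name rather than position:

```shell
# Hypothetical, simplified iostat -x excerpt
iostat_out='Device   r/s   w/s   await  %util
sda      210   340   120.5  99.0'

# Build a name -> column-number map from the header, then report by name
echo "$iostat_out" | awk '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    { printf "%s: await=%sms util=%s%%\n", $1, $(col["await"]), $(col["%util"]) }'
# util near 100% with high await => the device is saturated
```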
Step 5: Identify the Culprit Process
htop
What Experts Do Here
- Sort by CPU → find heavy CPU users
- Sort by MEM → find memory leaks
- Switch to tree view (F5 / Fn+F5 on Mac)
Understand the parent-child structure
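When htop is unavailable (headless box, quick ssh one-liner), plain procps `ps` gives roughly the same three views:

```shell
# Heaviest CPU users (like htop sorted by CPU)
ps -eo pid,ppid,pcpu,pmem,comm --sort=-pcpu | head -n 6

# Heaviest memory users (like htop sorted by MEM)
ps -eo pid,ppid,pcpu,pmem,comm --sort=-pmem | head -n 6

# Parent-child structure (like htop tree view)
ps -ef --forest | head -n 20
```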
Step 6: Check Process States (The Truth Layer)
ps -eo pid,stat,cmd | awk '$2 ~ /^D/'
Interpretation
If many processes are in:
D state (uninterruptible sleep)
Then:
- They are waiting on I/O
- You CANNOT kill them
The root cause is almost always disk or storage
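A sketch of counting D-state processes against a hypothetical `ps` snapshot (the PIDs and commands below are invented); live input would be `ps -eo pid,stat,comm`:

```shell
# Hypothetical snapshot; live: ps_out=$(ps -eo pid,stat,comm)
ps_out='  PID STAT COMMAND
  812 D    jbd2/sda1-8
  901 D+   tar
 1204 S    sshd
 1337 R    stress'

# STAT beginning with D = uninterruptible sleep (usually waiting on I/O);
# match the start of the STAT field only, so "sshd" is not a false hit
echo "$ps_out" | awk 'NR > 1 && $2 ~ /^D/ { n++ } END { print n+0, "processes in D state" }'
```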
Step 7: Deep Investigation (Expert Layer)
Now you move from observation → root cause
Check what process is doing
strace -p PID
If you see repeated reads/writes → disk issue
Check open files
lsof -p PID
Useful for:
- File locks
- Stuck file handles
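If strace or lsof is not installed, /proc exposes much of the same information. A sketch, using this shell's own PID ($$) as a stand-in for the hypothetical culprit process:

```shell
pid=$$   # stand-in; on a real incident, use the culprit PID from htop

# Current process state (R, S, D, ...) -- same letter ps shows in STAT
awk '/^State:/' /proc/$pid/status

# Open file descriptors, roughly what lsof -p lists
ls /proc/$pid/fd | head -n 5

# Kernel symbol the process is sleeping in (hints at what it waits on)
cat /proc/$pid/wchan; echo
```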
Full Real-World Example (This Is What Experts Actually Do)
Situation
- Website slow
- Users complaining
Step 1: Load
uptime → load = 18
→ confirmed: there is a problem
Step 2: CPU
top → CPU idle = 70%
→ NOT a CPU problem
Step 3: Memory
free -m → available OK
→ NOT a memory problem
Step 4: Disk
iostat → %util = 100%
await = high
→ DISK bottleneck
Step 5: Processes
htop → many processes waiting
Step 6: States
ps → many D-state processes
Final Diagnosis
Disk I/O bottleneck causing a system-wide slowdown
NOT CPU. NOT memory.
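The walkthrough above can be condensed into one read-only triage pass, assuming Linux with procps (`free`, `ps`); nothing here modifies the system:

```shell
#!/bin/sh
# One-shot triage: load, memory pressure, D-state processes.

echo "== load =="
cut -d ' ' -f 1-3 /proc/loadavg

echo "== memory =="
free -m | awk '/^Mem:/  { print "available: " $7 " MiB" }
               /^Swap:/ { print "swap used: " $3 " MiB" }'

echo "== D-state processes =="
ps -eo pid,stat,comm | awk 'NR > 1 && $2 ~ /^D/'
ps -eo stat | awk '$1 ~ /^D/ { n++ } END { print n+0 " total" }'
```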
Time-Based Thinking (Expert Mindset)
Ask:
- Did this happen suddenly?
- Or gradually?
Sudden issue
→ traffic spike, disk failure, bad deploy
Gradual issue
→ memory leak, log growth, database bloat
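For the gradual case, a sketch of looking for what has been growing; the paths are examples, adjust them to your layout:

```shell
# Overall filesystem pressure on the root volume
df -h /

# Largest directories under /var/log, in KiB (log growth suspect)
du -xk /var/log 2>/dev/null | sort -n | tail -n 5

# Big files touched within the last hour (actively growing candidates)
find /var/log -mmin -60 -size +100M 2>/dev/null
```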
Common Mistakes (Reality Check)
❌ "High load = CPU issue"
Wrong in many real cases.
❌ Killing processes blindly
You may kill the symptoms, not the cause.
❌ Ignoring disk
Most real-world slowdowns are I/O-related.
When This Matters in Production
This workflow applies to:
- VPS servers
- Dedicated servers
- Cloud servers
If you are running real workloads, this is not optional knowledge.
Related Linux Guides
- How to Check CPU Usage in Linux
- How to Check Memory Usage in Linux
- How to Check Server Load in Linux
- How to Check Running Processes in Linux
- Linux Process States Explained
Final Takeaway
A beginner runs commands.
An intermediate user reads metrics.
An expert:
Connects symptoms → metrics → root cause
That is what keeps systems stable.
