This guide is not just about commands.
It is about understanding how Linux processes actually work, so you can debug real production issues with confidence.

By the end of this article, you should be able to:
- Identify problematic processes
- Understand why they behave that way
- Control or terminate them safely
- Debug stuck or broken processes in real servers
Understanding Linux Processes (From Zero to Expert)
Every running program in Linux is a process.
But internally, a process is created using:
- fork(): duplicates the parent process
- exec(): loads a new program into memory
This is why every process has:
- PID (Process ID)
- PPID (Parent Process ID)
You can see this relationship:
ps -ef --forest
This hierarchy matters in debugging because:
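- Killing a parent can take its children down too (for example, when the signal reaches the whole process group)
- Orphaned processes are re-parented to init (PID 1) or a subreaper and can keep running unnoticed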
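The fork/exec and PID/PPID relationship described above can be observed directly from a shell. This is a minimal sketch, assuming a Linux system with /proc mounted; the variable names are illustrative only.

```shell
# Start a background child: the shell fork()s, then the child exec()s sleep.
sleep 30 &
child_pid=$!

# /proc/PID/status carries the parent PID on its "PPid:" line.
child_ppid=$(awk '/^PPid:/ {print $2}' "/proc/$child_pid/status")

echo "this shell: PID $$"
echo "child:      PID $child_pid, PPID $child_ppid"

# Clean up the background child.
kill "$child_pid"
```

When run as a plain script, the child's PPID should match the PID of the shell that started it.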
Process States (Critical for Real Debugging)
Linux processes exist in different states:
| State | Meaning |
|---|---|
| R | Running or runnable (waiting for a CPU) |
| S | Interruptible sleep (waiting for an event) |
| D | Uninterruptible sleep (I/O wait) |
| Z | Zombie |
| T | Stopped |
D State (Most Important)
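If a process is in D state, it is blocked inside the kernel, usually waiting on disk or other I/O.
In most cases you CANNOT kill it, even with kill -9: the signal is only delivered once the process leaves D state.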
This usually indicates:
- Disk issues
- NFS/network storage problems
- Kernel-level blocking
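One way to spot D state processes is to filter ps output on the state column. The sketch below assumes the procps version of ps; filter_d_state is a hypothetical helper name, and the wchan column (the kernel function the process is waiting in) may show only a dash on kernels that hide it.

```shell
# Keep the header plus any row whose STAT column (3rd field) starts with D.
filter_d_state() {
    awk 'NR == 1 || $3 ~ /^D/'
}

# wchan:32 widens the wait-channel column to 32 characters.
ps -eo pid,ppid,stat,wchan:32,cmd | filter_d_state
```

An empty result (header only) is the healthy case. Processes that stay in this list across several runs are the ones worth investigating.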
Using ps (Deep Understanding)
Basic:
ps aux
Expert usage:
ps -eo pid,ppid,cmd,%mem,%cpu,state --sort=-%cpu
This gives:
- Sorted CPU usage
- Process state
- Parent-child relationships
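Alongside the sorted listing, a per-state summary gives a quick feel for what the system is doing. This is a small sketch assuming procps ps; count_states is a hypothetical helper name.

```shell
# Count processes per state letter (R, S, D, Z, T, ...), most common first.
count_states() {
    sort | uniq -c | sort -rn
}

# "state=" prints the one-letter state with no header line.
ps -eo state= | count_states
```

A handful of D or Z entries that persist across runs is usually a better lead than the CPU column alone.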
Using top (Real-Time Monitoring)
top
Key advanced usage:
- Press P: sort by CPU
- Press M: sort by memory
- Press 1: show per-core CPU
What Experts Look For
- High CPU but low load: normal
- High load but low CPU: blocked processes (investigate D state)
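The load-versus-CPU comparison can be sketched without top by reading the kernel's own counters. This assumes a Linux /proc layout; the "load above core count" threshold is a rough rule of thumb, not a hard limit.

```shell
# 1-minute load average: first field of /proc/loadavg.
load1=$(cut -d ' ' -f 1 /proc/loadavg)
# Number of CPUs available to this process.
cores=$(nproc)
# Tasks currently blocked on I/O, as reported in /proc/stat.
blocked=$(awk '/^procs_blocked/ {print $2}' /proc/stat)

echo "load (1 min): $load1 on $cores core(s), blocked on I/O: ${blocked:-unknown}"

# Load above the core count while CPUs sit idle points at blocked tasks.
awk -v l="$load1" -v c="$cores" 'BEGIN {
    if (l > c) print "load exceeds core count: check for D state processes"
    else       print "load is within core count"
}'
```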
/proc Filesystem (Expert-Level Insight)
Every process has a directory:
/proc/PID/
Example:
cat /proc/1234/status
This reveals:
- Memory usage
- State
- Threads
/proc is the kernel's own view of each process, and tools like ps and top read their data from it.
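The full status file is long, so it often helps to pull out only the fields you need. A minimal sketch follows; proc_summary is a hypothetical helper name, and the VmRSS line is absent for kernel threads.

```shell
# Print a few key fields from /proc/PID/status; defaults to the current shell.
proc_summary() {
    pid=${1:-$$}
    grep -E '^(Name|State|PPid|Threads|VmRSS):' "/proc/$pid/status"
}

proc_summary
```

Pass a PID as the first argument to inspect another process, for example proc_summary 1234.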
Killing Processes (Correct Way)
Step 1: Graceful kill
kill PID
Step 2: Force kill (last resort)
kill -9 PID
Step 3: Kill by name
pkill nginx
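Steps 1 and 2 can be combined into a small helper that only escalates when the graceful signal is ignored. This is a sketch, not a standard tool: stop_gracefully is a hypothetical name, the 10-second default is an arbitrary assumption, and it presumes you own the process (or are root), since kill -0 also fails on processes you may not signal.

```shell
# Send SIGTERM, wait up to $2 seconds (default 10), then fall back to SIGKILL.
# Usage: stop_gracefully 1234 5
stop_gracefully() {
    pid=$1
    timeout=${2:-10}

    # If the signal cannot be sent, the process is already gone.
    kill -TERM "$pid" 2>/dev/null || return 0

    while [ "$timeout" -gt 0 ]; do
        # kill -0 sends no signal; it only checks that the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        timeout=$((timeout - 1))
    done

    echo "PID $pid ignored SIGTERM, sending SIGKILL" >&2
    kill -KILL "$pid"
}
```

Giving the process time to exit lets it flush buffers and release locks, which is the point of trying SIGTERM first.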
Signals (Beyond Basics)
| Signal | Purpose |
|---|---|
| SIGTERM (15) | Graceful shutdown |
| SIGKILL (9) | Force kill |
| SIGSTOP | Pause process |
| SIGCONT | Resume process |
| SIGHUP (1) | Hangup; many daemons treat it as "reload config" |
Example (pause process):
kill -STOP PID
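To see pause and resume in action, the sketch below stops a throwaway sleep process and reads its state back from /proc. It assumes Linux with /proc mounted; state_of is a hypothetical helper name.

```shell
# Read the one-letter state from /proc/PID/status.
state_of() {
    awk '/^State:/ {print $2}' "/proc/$1/status"
}

sleep 30 &
pid=$!

kill -STOP "$pid"
sleep 1                               # give the kernel a moment to stop it
paused_state=$(state_of "$pid")
echo "after SIGSTOP: $paused_state"   # T = stopped

kill -CONT "$pid"
sleep 1
resumed_state=$(state_of "$pid")
echo "after SIGCONT: $resumed_state"

# Clean up the demo process.
kill "$pid"
```

Pausing is handy when a process is hammering a resource and you want to inspect it before deciding whether to terminate it.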
Controlling CPU Usage (nice & renice)
Set priority:
nice -n 10 command
Change running process:
renice 10 -p PID
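Lower value = higher priority. Values range from -20 (highest priority) to 19 (lowest); setting a negative value usually requires root.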
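The effect of nice and renice can be checked by reading the niceness back from /proc. This is a sketch under a few assumptions: Linux /proc, a shell running at default niceness, and a renice that treats the bare number as an absolute value (as in the form used above). nice_of is a hypothetical helper name.

```shell
# Field 19 of /proc/PID/stat is the nice value (valid while the command
# name contains no spaces, as with sleep).
nice_of() {
    awk '{print $19}' "/proc/$1/stat"
}

nice -n 10 sleep 30 &
pid=$!
sleep 1                          # let nice exec sleep before we look
start_nice=$(nice_of "$pid")
echo "started at niceness $start_nice"

# Raising niceness (lowering priority) needs no special privileges.
renice 15 -p "$pid" >/dev/null
new_nice=$(nice_of "$pid")
echo "now at niceness $new_nice"

# Clean up the demo process.
kill "$pid"
```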
Real Debugging Tools (What Experts Actually Use)
strace (system call tracing)
strace -p PID
Shows the system calls the process makes, live, which reveals what it is doing or waiting on.
lsof (open files)
lsof -p PID
Useful for:
- Checking file locks
- Debugging stuck services
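On servers where lsof is not installed, /proc/PID/fd gives a similar view of open files. This is an alternative technique rather than what lsof itself prints; open_files is a hypothetical helper name, and reading another user's process this way generally requires root.

```shell
# List a process's open file descriptors and what each one points to.
open_files() {
    ls -l "/proc/$1/fd"
}

# Demo: a background process holding a temporary file open on stdin.
tmpfile=$(mktemp)
sleep 30 < "$tmpfile" &
pid=$!
sleep 1                          # let the child set up its redirection

fd_listing=$(open_files "$pid")
echo "$fd_listing"

# Clean up the demo process and file.
kill "$pid"
rm -f "$tmpfile"
```

Each entry is a symlink from the descriptor number to the file, socket, or pipe behind it, so a stuck service's log files and connections show up here.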
Real Server Scenarios (Production-Level)
Scenario 1: High Load but CPU is Low
- Check top
- Look for D state processes
Likely a disk or I/O issue
Scenario 2: Zombie Processes
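ps -eo pid,ppid,stat,cmd | awk '$3 ~ /^Z/'
Solution:
- A zombie is already dead, so it cannot be killed; its parent has to reap it
- Kill the parent process (the PPID column) so init adopts and reaps the zombie
- Or restart the service that owns the parent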
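To make the zombie mechanism concrete, the sketch below deliberately creates one: an inner shell starts a child, then replaces itself with sleep, which never calls wait() on that child. It is an illustration only, assuming Linux with /proc; the variable names and the temporary PID file are hypothetical scaffolding.

```shell
pidfile=$(mktemp)

# The inner shell records its child's PID, then execs sleep (same PID),
# so the child has a parent that will never reap it.
sh -c 'sleep 1 & echo $! > "$0"; exec sleep 5' "$pidfile" &
parent_pid=$!

sleep 2     # the child has exited by now but has not been reaped
zombie_pid=$(cat "$pidfile")
zombie_state=$(awk '/^State:/ {print $2}' "/proc/$zombie_pid/status")
zombie_ppid=$(awk '/^PPid:/ {print $2}' "/proc/$zombie_pid/status")
echo "PID $zombie_pid is in state $zombie_state, parent is $zombie_ppid"

# Terminating the parent hands the zombie to init, which normally reaps it.
kill "$parent_pid"
rm -f "$pidfile"
```

The parent PID printed here is the process to investigate on a real server: the zombie is only a symptom of a parent that is not collecting its children.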
Scenario 3: PHP-FPM Exhaustion
Symptoms:
- High process count
- Slow website
Check:
ps aux | grep '[p]hp-fpm'
Scenario 4: MySQL Hanging Queries
Symptoms:
- High load
- Low CPU
Investigate:
lsof -p MYSQL_PID
Common Mistakes
Overusing kill -9
This can:
- Corrupt data
- Break services
Killing SSH process
You will disconnect yourself.
Ignoring process states
The state tells you the real problem, not just CPU usage.
When This Matters in Production
Process management is critical when:
- Server becomes unresponsive
- Load spikes unexpectedly
- Applications hang
- Resource exhaustion occurs
On production infrastructure like:
- VPS servers
- Dedicated servers
- Cloud environments
Improper handling can lead to downtime.
Related Linux Guides
- How to Check CPU Usage in Linux (top, htop, uptime Explained)
- How to Check Memory Usage in Linux (free, vmstat, htop Explained)
- How to Check Server Load in Linux (Load Average Explained)
- How to Check Server Disk Usage in Linux (df, du, and ncdu Explained)
Final Takeaway
A beginner uses ps and top.
An expert understands:
- Why a process is stuck
- What state it is in
- What system resource is blocking it
That difference is what keeps production systems stable.



