Check SSD and HDD Health in Linux with smartctl
Hard drive failures are one of the most common causes of unexpected data loss, downtime, and costly interruptions. According to real-world operational statistics, annual failure rates can exceed 5%* for certain drive capacities and models — especially under heavy or continuous load.
*The AFR value depends on the model, manufacturing year, and workload. The figure is based on Backblaze’s public statistics and is not universal for all HDD/SSD devices.

⚠️ Note: For SSDs, the AFR can vary significantly.
To avoid being caught off guard by disk failures, you can use S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology), the diagnostics built into modern drives. On Linux, the smartmontools package gives you access to it. In this guide, you’ll learn how to use smartctl and smartd to analyze HDD and SSD reliability, catch early signs of failure, and automate routine checks.
What Is S.M.A.R.T. and How smartmontools Helps
S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) is a monitoring system embedded in modern HDDs and SSDs. It tracks internal attributes such as temperature, wear level, or error rates to predict drive failure.
smartmontools provides two core tools:
- smartctl: A command-line tool for querying S.M.A.R.T. data and running self-tests.
- smartd: A daemon that schedules automatic checks and alerts.
These tools support SATA, SAS/SCSI, NVMe, and many USB-connected drives. With some USB-to-SATA adapters, S.M.A.R.T. data is not passed through correctly unless you specify the device type manually with the -d sat option (for example, sudo smartctl -d sat -a /dev/sdb).
Installing smartmontools: Quick Start on Any Linux Distro
⚠️ Note: On virtual machines, whether smartctl can actually read S.M.A.R.T. data depends on the hypervisor and the configuration of the host machine.
Debian/Ubuntu:
sudo apt update
sudo apt install smartmontools
RHEL/CentOS:
sudo yum install smartmontools
Arch Linux:
sudo pacman -S smartmontools
Verify installation:
smartctl --version
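If you provision machines running different distributions, a small helper can pick the matching install command from the list above. This is a minimal sketch that assumes /etc/os-release exists and that its ID/ID_LIKE values map onto the three package families shown; install_cmd_for and detect_install_cmd are illustrative names, not part of smartmontools.

```shell
#!/bin/sh
# Sketch: print the smartmontools install command for a distribution.
# Assumption: /etc/os-release exists and its ID / ID_LIKE fields identify
# the package family (true on most modern distributions).
install_cmd_for() {
  # $1 = ID, $2 = ID_LIKE (may hold several words, e.g. "rhel centos fedora")
  for id in $1 $2; do
    case "$id" in
      debian|ubuntu) echo "sudo apt update && sudo apt install smartmontools"; return 0 ;;
      rhel|centos|fedora) echo "sudo yum install smartmontools"; return 0 ;;
      arch) echo "sudo pacman -S smartmontools"; return 0 ;;
    esac
  done
  echo "unknown distribution: install smartmontools with your package manager" >&2
  return 1
}

detect_install_cmd() {
  # Source os-release in a subshell so its variables do not leak out.
  ( . /etc/os-release && install_cmd_for "${ID:-}" "${ID_LIKE:-}" )
}

# Usage: detect_install_cmd
```

Derivatives such as Linux Mint, Rocky Linux, or Manjaro are matched through ID_LIKE, which is why both fields are checked.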
⚠️ Note: On virtual machines (VPS), S.M.A.R.T. may be inaccessible because the hardware is abstracted. In KVM, SMART passthrough is possible if the hosting provider allows it, but in OpenVZ/LXC containers SMART data is always unavailable.
Essential smartctl Commands for Drive Health Checks
| Command | Description |
| sudo smartctl -i /dev/sda | Basic drive info (model, serial number, SMART support) |
| sudo smartctl -H /dev/sda | Quick health check summary (“PASSED” or “FAILED”). *Note: This check is often insufficient; use -A for a full analysis.* |
| sudo smartctl -A /dev/sda | Full list of SMART attributes |
| sudo smartctl -t short /dev/sda | Initiate short self-test (2–5 minutes) |
| sudo smartctl -t long /dev/sda | Start extended self-test (from minutes to several hours on large HDDs; on ATA drives, `smartctl -c` shows the drive’s own estimate) |
| sudo smartctl -l selftest /dev/sda | View past test results and history |
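Because the PASSED/FAILED line alone can hide problems, it also helps to look at smartctl’s exit status: smartctl documents it as a bitmask, with one bit per class of problem (see RETURN VALUES in `man smartctl`). The sketch below decodes that value. The short messages paraphrase the man page, and decode_smartctl_status is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: interpret smartctl's exit status, which is a bitmask
# (bit meanings paraphrased from RETURN VALUES in `man smartctl`).
decode_smartctl_status() {
  status="$1"
  if [ "$status" -eq 0 ]; then
    echo "OK: no problems reported"
    return 0
  fi
  bit=0
  for msg in \
    "command line did not parse" \
    "device open failed or device is in a low-power mode" \
    "a SMART or ATA command failed, or SMART data checksum error" \
    "SMART status check returned DISK FAILING" \
    "prefail attributes at or below threshold" \
    "some attributes were at or below threshold in the past" \
    "device error log contains errors" \
    "self-test log contains errors"
  do
    # Print the message for every bit that is set in the status value.
    if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
      echo "bit $bit: $msg"
    fi
    bit=$((bit + 1))
  done
}

# Usage (needs root and a real device; /dev/sda is an example):
#   sudo smartctl -H /dev/sda >/dev/null; decode_smartctl_status $?
```

A drive can report PASSED while bits 6 or 7 are set, which is exactly the case where the quick -H summary is not enough.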
S.M.A.R.T. Attributes Explained: Key Metrics for HDD and SSD
Attribute names and IDs vary by vendor and interface, but a core set of metrics appears on most drives. Here’s a breakdown of the most important ones for HDDs and SSDs:
Common S.M.A.R.T. Attributes (HDD Focused)
| Attribute | Meaning | Normal Range | Action if abnormal |
| Reallocated_Sector_Ct | Bad sectors remapped to the spare area | 0 | Replace disk if growing |
| Current_Pending_Sector | Unstable sectors waiting to be re-read or remapped | 0 | Back up immediately |
| Offline_Uncorrectable | Sectors that could not be read or corrected | 0 | Data may already be lost; back up and plan replacement |
| Temperature_Celsius | Drive temperature | 30–50°C | Improve cooling if >55°C |
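The rules in this table are easy to script. This minimal sketch assumes the usual `smartctl -A` layout for ATA/SATA drives (attribute name in the second column, raw value in the tenth). Some vendors pack several values into one raw field, so treat a hit as a prompt to look closer rather than as a verdict. check_hdd_attrs is a hypothetical name.

```shell
#!/bin/sh
# Sketch: flag worrying values in `smartctl -A` output for an ATA/SATA drive.
# Assumes the standard attribute table, where column 2 is ATTRIBUTE_NAME and
# column 10 is RAW_VALUE. The 55 C limit mirrors the table above and is a
# rule of thumb, not a vendor specification.
check_hdd_attrs() {
  awk '
    BEGIN { bad = 0 }
    $2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" ||
    $2 == "Offline_Uncorrectable" {
      if ($10 + 0 > 0) { print "WARN " $2 "=" $10; bad = 1 }
    }
    $2 == "Temperature_Celsius" && $10 + 0 > 55 {
      print "WARN " $2 "=" $10; bad = 1
    }
    END { exit bad }'
}

# Usage: sudo smartctl -A /dev/sda | check_hdd_attrs || echo "check this drive"
```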
Critical S.M.A.R.T. Metrics for SSD Longevity
| Attribute | Description | Normal Behavior |
| Percentage Used | Estimated share of rated endurance consumed (0% = new) | Approaching 100% signals end of rated life |
| Available Spare | Remaining spare blocks, as a percentage | Should stay well above the Available Spare Threshold |
| Available Spare Threshold | Vendor-set warning limit for spare blocks | If Available Spare falls below it, the drive raises a critical warning and is near end of life |
| Wear Leveling Count | Cycles of NAND erase/program operations | Monitor the trend/rate of increase rather than the raw value; interpretation varies by vendor. |
| Power-On Hours | Total hours powered on | Useful for lifecycle planning |
| Power Cycle Count | On/off events | Frequent cycling adds hardware stress |
| Unsafe Shutdowns | Power lost without a clean shutdown | Frequent events risk file system corruption |
| Media/Data Integrity Errors | Errors that ECC could not correct | Back up data immediately |
⚠️ Note: “Percentage Used” is a vendor-specific estimate, and manufacturers such as Intel and Samsung calculate it differently. Compare trends on the same model rather than absolute values across vendors. The NVMe specification also allows the value to exceed 100%.
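For NVMe drives, the Available Spare rows can be checked automatically. This sketch assumes the labels smartctl prints in the NVMe health section of `smartctl -A` (“Available Spare:” and “Available Spare Threshold:”). SATA SSDs expose wear through vendor-specific attributes instead, so it does not apply to them. check_nvme_spare is a hypothetical name, and it warns slightly early (at or below the threshold) on purpose.

```shell
#!/bin/sh
# Sketch: compare "Available Spare" with "Available Spare Threshold" in the
# NVMe health section printed by `smartctl -A /dev/nvme0`.
# Exit codes: 0 = OK, 1 = spare at or below threshold, 2 = fields not found.
check_nvme_spare() {
  awk -F: '
    $1 == "Available Spare"           { spare = $2 + 0 }   # "   100%" -> 100
    $1 == "Available Spare Threshold" { thresh = $2 + 0 }
    END {
      if (spare == "" || thresh == "") { print "UNKNOWN: fields not found"; exit 2 }
      if (spare <= thresh) {
        print "WARN: spare " spare "% at or below threshold " thresh "%"; exit 1
      }
      print "OK: spare " spare "% (threshold " thresh "%)"
    }'
}

# Usage: sudo smartctl -A /dev/nvme0 | check_nvme_spare
```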
Automate S.M.A.R.T. Monitoring with smartd Daemon
Edit the smartd configuration:
sudo nano /etc/smartd.conf
Example:
DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../7/03) -m root -M exec /usr/share/smartmontools/smartd-runner
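⚠️ Note: The path of the script used with ‘-M exec’ differs between distributions; /usr/share/smartmontools/smartd-runner is the Debian/Ubuntu location, so check that the file exists on your system.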
Explanation of flags:
- DEVICESCAN — auto-detects all available drives
- -a — enables all SMART checks and attribute logging
- -o on — enables offline data collection
- -S on — enables attribute autosave
- -s — schedules tests (S = short daily @ 2 AM, L = long weekly @ 3 AM Sunday)
- -m — email recipient for alerts
- -M exec — triggers a script for notifications
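The -s value is a regular expression that smartd matches against a 12-character string of the form T/MM/DD/d/HH (test type, month, day of month, weekday with 1 = Monday and 7 = Sunday, hour). To see what a schedule will do before deploying it, you can test candidate strings with grep. This is an offline helper sketch, not part of smartd; schedule_matches and slot_now are hypothetical names.

```shell
#!/bin/sh
# Sketch: check which smartd -s test a given time slot would trigger.
# smartd matches the -s regex against the whole string T/MM/DD/d/HH.
SCHEDULE='(S/../.././02|L/../../7/03)'   # the schedule from the example above

schedule_matches() {
  # $1 = regex, $2 = candidate string such as S/06/15/3/02
  # grep -x requires the whole line to match, as smartd does.
  printf '%s\n' "$2" | grep -Eqx "$1"
}

# Build the string for the current hour for a test type (S, L, C or O).
# date's %u prints the weekday as 1..7 with Monday = 1, like smartd.
slot_now() {
  printf '%s/%s\n' "$1" "$(date +%m/%d/%u/%H)"
}

# Usage: schedule_matches "$SCHEDULE" "$(slot_now L)" && echo "long test due this hour"
```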
Enable the service:
sudo systemctl enable smartd
sudo systemctl start smartd
Check status and logs:
sudo systemctl status smartd
sudo journalctl -u smartd
Configure alerting:
- Email via sendmail or Postfix
Example smartd.conf line for email:
DEVICESCAN -a -m admin@vsys.host -M daily
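⚠️ Note: `-M exec` replaces the default mail command with your script, so plain email alerts (e.g., `-m admin@vsys.host -M daily`) and script-based alerts are **alternative** delivery methods. The `-m` directive is still required with `-M exec` (use `-m <nomailer>` if no address is needed), and frequency options such as `-M daily` can be combined with it.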
The test schedule format depends on the smartmontools version and may differ across distributions. An installed MTA is required to send notifications.
- Webhook Alert via Custom Script:
For sending notifications to platforms like Slack or Microsoft Teams, use the -M exec directive in smartd.conf to run an external script when smartd detects a problem. The following curl command is an example of what to place inside that script to post a JSON-formatted alert to a webhook URL:
curl -X POST -H "Content-Type: application/json" -d '{"text":"SMART alert triggered"}' https://hooks.slack.com/services/XXX/YYY/ZZZ
- Logs to syslog (/var/log/syslog on Debian/Ubuntu, /var/log/messages on RHEL-based systems) or the systemd journal
- Custom scripts using -M exec
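Expanding the curl example into a complete -M exec script could look like the sketch below. It assumes smartd’s documented environment variables (SMARTD_DEVICE, SMARTD_FAILTYPE, SMARTD_MESSAGE). WEBHOOK_URL and the script path in the usage note are hypothetical, and the JSON escaping is deliberately minimal (quotes, backslashes, newlines).

```shell
#!/bin/sh
# Sketch of a `-M exec` script that forwards smartd warnings to a webhook.
# smartd passes details in environment variables such as SMARTD_DEVICE,
# SMARTD_FAILTYPE and SMARTD_MESSAGE. WEBHOOK_URL is a hypothetical variable:
# set it here or in the service environment; when empty, nothing is sent.
WEBHOOK_URL="${WEBHOOK_URL:-}"

json_escape() {
  # Minimal escaping: newlines become spaces, then backslashes and double
  # quotes are escaped, and trailing spaces are trimmed.
  printf '%s' "$1" | tr '\n' ' ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/ *$//'
}

build_payload() {
  text="SMART alert on ${SMARTD_DEVICE:-unknown device} (${SMARTD_FAILTYPE:-unknown}): ${SMARTD_MESSAGE:-no message}"
  printf '{"text":"%s"}\n' "$(json_escape "$text")"
}

if [ -n "$WEBHOOK_URL" ]; then
  # --data-binary @- makes curl read the JSON body from stdin.
  build_payload | curl -fsS -X POST -H "Content-Type: application/json" \
    --data-binary @- "$WEBHOOK_URL"
fi
```

To wire it up, make the script executable and reference it from smartd.conf, for example DEVICESCAN -a -m <nomailer> -M exec /usr/local/bin/smart-webhook.sh. Temporarily adding -M test makes smartd send one test alert at startup, which is a convenient way to confirm delivery.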
S.M.A.R.T. Monitoring in Cloud & VPS: What You Can (and Can’t) Do
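S.M.A.R.T. access often does not work inside a VPS: OpenVZ and LXC containers never see the physical disks, and KVM guests using virtio disks see only virtual devices unless the provider passes the physical disk or controller through to the VM.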
Common errors:
SMART support is: Unavailable
Alternatives:
- Request SMART reports, or disk/controller passthrough, from your hosting provider
- Monitor I/O latency and kernel logs (dmesg) for disk errors
- Use tools like iostat, nvme-cli, or cloud-native monitoring solutions
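Before ruling SMART in or out on a virtual machine, you can check what smartctl actually reports. This sketch only inspects the “SMART support is:” line that smartctl prints for ATA and SCSI devices. NVMe output lacks that line, and some virtual disks fail to open at all, so both cases come back as unknown. classify_smart_support is a hypothetical name.

```shell
#!/bin/sh
# Sketch: classify `smartctl -i` output (read from stdin) as
# available / unavailable / unknown, based on the "SMART support is:" line.
classify_smart_support() {
  out="$(cat)"
  if printf '%s\n' "$out" | grep -q 'SMART support is: *Unavailable'; then
    echo unavailable
  elif printf '%s\n' "$out" | grep -q 'SMART support is: *Available'; then
    echo available
  else
    echo unknown
  fi
}

# Usage: sudo smartctl -i /dev/sda | classify_smart_support
```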
Using nvme-cli to Read NVMe SSD Health
For NVMe SSDs, nvme-cli provides additional low-level diagnostics:
sudo nvme smart-log /dev/nvme0
Sample output:
Critical Warning: 0x0
Temperature: 37 Celsius
Percentage Used: 4%
Data Units Read: 7,983,283
Data Units Written: 3,812,913
Power Cycles: 123
Power On Hours: 1,023
Unsafe Shutdowns: 3
Media Errors: 0
Key fields:
- Percentage Used — wear level (lower is better)
- Temperature — operating temp in °C
- Critical Warning — 0 means normal; non-zero = alert
- Media Errors — should remain 0 in healthy SSDs
“Percentage Used ≤ 10%” for new NVMe drives is normal and not a cause for concern.
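The key fields above can be checked in a script, for example from cron. This is a sketch under two assumptions: the output uses “Label: value” lines, with labels that differ between nvme-cli versions (Percentage Used vs percentage_used) and are therefore normalized; and the 80% wear limit is an arbitrary example threshold, not a vendor recommendation. check_nvme_log is a hypothetical name.

```shell
#!/bin/sh
# Sketch: summarize an `nvme smart-log` report read from stdin.
# Keys are normalized (lowercase, spaces -> _), so "Percentage Used" and
# "percentage_used" are treated the same. Exit 1 if anything is flagged.
check_nvme_log() {
  wear_limit="${1:-80}"   # hypothetical wear threshold in percent
  awk -v limit="$wear_limit" '
    BEGIN { status = 0 }
    index($0, ":") > 0 {
      key = tolower(substr($0, 1, index($0, ":") - 1))
      gsub(/^[ \t]+|[ \t]+$/, "", key)   # trim
      gsub(/[ \t]+/, "_", key)           # "percentage used" -> percentage_used
      val = substr($0, index($0, ":") + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (key == "critical_warning" && val !~ /^(0x)?0+$/) {
        print "ALERT critical_warning=" val; status = 1
      } else if (key == "media_errors") {
        n = val; gsub(/[^0-9]/, "", n)
        if (n + 0 > 0) { print "ALERT media_errors=" n; status = 1 }
      } else if (key == "percentage_used") {
        n = val; gsub(/[^0-9]/, "", n)
        if (n + 0 >= limit + 0) { print "WARN percentage_used=" n "%"; status = 1 }
      }
    }
    END { exit status }'
}

# Usage: sudo nvme smart-log /dev/nvme0 | check_nvme_log 80
```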
Best Practices for Reliable Disk Monitoring in Linux
Recommendations:
- Check SMART attributes monthly or integrate into CI/monitoring stack
- Combine with cron or Prometheus exporters
- Act immediately if reallocated sectors or pending sectors rise
- Replace drives before “Percentage Used” reaches 100% (SSD)
- Use RAID and backups as safety nets: SMART is not foolproof and does not predict sudden failures such as a dead controller, so backups remain essential.
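A monthly cron job is one way to follow these recommendations without a full monitoring stack. The sketch below assumes it runs as root on a host where smartctl supports --scan-open and logger is available; the script path and schedule are examples. It treats bits 3 and 4 of smartctl’s exit status (DISK FAILING, and prefail attributes at or below threshold) as critical and any other non-zero status as a warning.

```shell
#!/bin/sh
# Sketch of a cron-friendly health sweep: list drives, run `smartctl -H` on
# each, and send problems to syslog via `logger`. Example crontab entry
# (hypothetical path), run at 04:00 on the 1st of every month:
#   0 4 1 * * /usr/local/sbin/smart-sweep.sh run

# Turn scan lines such as "/dev/sda -d sat # /dev/sda [SAT], ATA device"
# into "DEVICE TYPE" pairs; commented lines (devices that failed to open)
# are skipped.
parse_scan() {
  awk '$1 ~ /^\/dev\// && $2 == "-d" { print $1, $3 }'
}

sweep() {
  smartctl --scan-open | parse_scan | while read -r dev dtype; do
    smartctl -H -d "$dtype" "$dev" >/dev/null
    rc=$?
    # 24 = bit 3 (DISK FAILING) + bit 4 (prefail attribute at/below threshold)
    if [ $(( rc & 24 )) -ne 0 ]; then
      logger -t smart-sweep -p daemon.crit "$dev ($dtype): smartctl -H exit status $rc"
    elif [ "$rc" -ne 0 ]; then
      logger -t smart-sweep -p daemon.warning "$dev ($dtype): smartctl -H exit status $rc"
    fi
  done
}

# Only sweep when called with "run", so the functions can be sourced safely.
case "${1:-}" in
  run) sweep ;;
esac
```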
Mini Summary: Disk Monitoring Checklist
- Monitor SMART data monthly (or automate)
- Use smartd with logging and alerts
- Replace drives as soon as failure signs appear (e.g., rising Pending Sectors, SSD Percentage Used near 100%)
- Keep backup systems active (RAID, snapshots, offsite)
Rotating smartd Logs Automatically
To prevent log accumulation:
# Delete smartd log files not modified in the last 30 days
find /var/log -name 'smartd.log*' -type f -mtime +30 -delete
By default, smartd writes to syslog (or the systemd journal) rather than to a dedicated file, so this command only applies if smartd messages have been routed to their own smartd.log files.
Final Thoughts: Proactive Monitoring Prevents Data Loss
smartmontools empowers Linux administrators to monitor the physical health of disks and anticipate failures before they disrupt services. For both SSDs and HDDs, regular health checks with smartctl, combined with automation via smartd, help maintain system integrity and data reliability.
FAQs
It’s a command-line utility for querying and analyzing S.M.A.R.T. data from storage devices.
Yes, smartctl supports both SSDs and HDDs across SATA and NVMe interfaces.
At least once per month, or integrate into scheduled monitoring.
It shows wear level — approaching 100% means the drive is near its end of life.
Some virtual environments don’t expose hardware monitoring; contact your hosting provider.
No, smartctl can be used standalone, but smartd adds automation and alerting features. Both ship with smartmontools; smartd queries the drives itself rather than calling smartctl.
No, it only reports health. Use it for diagnostics and planning replacements.
Usually in /var/log/syslog or available via journalctl -u smartd.