Check SSD and HDD Health in Linux with smartctl

Hard drive failures are one of the most common causes of unexpected data loss, downtime, and costly interruptions. According to real-world operational statistics, annual failure rates can exceed 5%* for certain drive capacities and models — especially under heavy or continuous load.

*The AFR value depends on the model, manufacturing year, and workload. The figure is based on Backblaze’s public statistics and is not universal for all HDD/SSD devices.

Backblaze: Annualized Failure Rates by Drive Size (2021–2023)

⚠️ Note: For SSDs, the AFR can vary significantly.

To catch failing disks early, Linux offers access to the drive's built-in S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) diagnostics through the smartmontools package. In this guide, you’ll learn how to use smartctl and smartd to analyze both HDD and SSD reliability, catch early signs of failure, and automate routine checks.


What Is S.M.A.R.T. and How smartmontools Helps

S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology) is a monitoring system embedded in modern HDDs and SSDs. It tracks internal attributes such as temperature, wear level, or error rates to predict drive failure.

smartmontools provides two core tools:

  • smartctl: A command-line tool for querying S.M.A.R.T. data and running self-tests.
  • smartd: A daemon that schedules automatic checks and alerts.

 These tools support SATA, NVMe, and USB-connected drives. Note that for some USB-SATA adapters, you may need to manually specify the device type using the -d sat option, as S.M.A.R.T. data is not always passed through correctly.
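For the USB adapter case, a fallback can be sketched as follows: try smartctl's default device detection first, then retry with -d sat. The function name smart_info and the SMARTCTL override variable are illustrative assumptions, not part of smartmontools.

```shell
#!/bin/sh
# Print drive identity; if default detection fails, retry with "-d sat".
# SMARTCTL may be overridden, for example SMARTCTL="sudo smartctl".
smart_info() {
    dev=$1
    # First attempt: let smartctl guess the device type.
    if out=$(${SMARTCTL:-smartctl} -i "$dev" 2>&1); then
        printf '%s\n' "$out"
        return 0
    fi
    echo "Default detection failed for $dev; retrying with -d sat" >&2
    # Second attempt: force SCSI-to-ATA translation, as used by many USB bridges.
    ${SMARTCTL:-smartctl} -d sat -i "$dev"
}

# Usage (run as root): smart_info /dev/sdb
```

If both attempts fail, the adapter may not pass S.M.A.R.T. commands through at all.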

Installing smartmontools: Quick Start on Any Linux Distro

⚠️ Note: Inside a virtual machine, S.M.A.R.T. availability depends on the hypervisor and the configuration of the host machine.

Debian/Ubuntu:

sudo apt update
sudo apt install smartmontools

RHEL/CentOS:

sudo yum install smartmontools

Arch Linux:

sudo pacman -S smartmontools

Verify installation:

smartctl --version

⚠️ Note: On virtual machines (VPS), S.M.A.R.T. may be inaccessible due to abstracted hardware. In KVM, SMART passthrough is possible if the hosting provider allows it, but in OpenVZ/LXC containers SMART is generally unavailable.

Essential smartctl Commands for Drive Health Checks

| Command | Description |
| --- | --- |
| sudo smartctl -i /dev/sda | Basic drive info (model, serial number, SMART support) |
| sudo smartctl -H /dev/sda | Quick health check summary (“PASSED” or “FAILED”). *Note: This check is often insufficient; use -A for a full analysis.* |
| sudo smartctl -A /dev/sda | Full list of SMART attributes |
| sudo smartctl -t short /dev/sda | Initiate short self-test (2–5 minutes) |
| sudo smartctl -t long /dev/sda | Start extended self-test (10–60+ minutes) |
| sudo smartctl -l selftest /dev/sda | View past test results and history |
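When these commands run from a script, the exit status is as informative as the printed text: smartctl returns a bit mask in which each bit flags one kind of problem. The helper below is a minimal sketch that decodes that mask; the function name decode_smartctl_status and the message wording are illustrative.

```shell
#!/bin/sh
# Decode the bit-mask exit status returned by smartctl into readable lines.
decode_smartctl_status() {
    status=$1
    [ "$status" -eq 0 ] && { echo "ok"; return 0; }
    [ $((status & 1)) -ne 0 ]   && echo "command line did not parse"
    [ $((status & 2)) -ne 0 ]   && echo "device open failed"
    [ $((status & 4)) -ne 0 ]   && echo "SMART command failed or checksum error"
    [ $((status & 8)) -ne 0 ]   && echo "SMART status reports DISK FAILING"
    [ $((status & 16)) -ne 0 ]  && echo "prefail attribute at or below threshold"
    [ $((status & 32)) -ne 0 ]  && echo "attribute was at or below threshold in the past"
    [ $((status & 64)) -ne 0 ]  && echo "device error log contains errors"
    [ $((status & 128)) -ne 0 ] && echo "self-test log contains errors"
    return 0
}

# Usage (run as root):
#   smartctl -H /dev/sda >/dev/null; decode_smartctl_status $?
```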

S.M.A.R.T. Attributes Explained: Key Metrics for HDD and SSD

S.M.A.R.T. attributes vary by device, but key metrics are consistent. Here’s a breakdown of important ones for SSDs and HDDs:

Common S.M.A.R.T. Attributes (HDD Focused)

| Attribute | Meaning | Normal Range | Action if Abnormal |
| --- | --- | --- | --- |
| Reallocated_Sector_Ct | Remapped bad sectors | 0 | Replace disk if growing |
| Current_Pending_Sector | Sectors awaiting re-check | 0 | Backup immediately |
| Offline_Uncorrectable | Unfixable errors | 0 | Data may be lost |
| Temperature_Celsius | Drive temperature | 30–50°C | Ensure cooling if >55°C |
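The three sector-related attributes can be checked automatically. The sketch below assumes the usual ATA layout of smartctl -A output, where the attribute name is the second column and the raw value is the tenth; the function name flag_bad_sectors is hypothetical.

```shell
#!/bin/sh
# Read "smartctl -A" output for an ATA drive on stdin and print each
# sector-health attribute whose raw value (column 10) is non-zero.
# Exit status is 1 if anything was flagged, 0 otherwise.
flag_bad_sectors() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" {
            if ($10 + 0 > 0) { printf "%s=%s\n", $2, $10; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Usage (run as root):
#   smartctl -A /dev/sda | flag_bad_sectors || echo "Back up this drive now"
```

Because a single non-zero reading matters less than growth over time, it may be worth saving the output of each run and comparing it with the previous one.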

Critical S.M.A.R.T. Metrics for SSD Longevity

| Attribute | Description | Normal Behavior |
| --- | --- | --- |
| Percentage Used | Indicates wear level (0 = fresh) | Approaching 100% signals EOL |
| Available Spare | Remaining spare blocks | Should be >5% |
| Available Spare Threshold | Warning limit for spares | If met, drive is near failure |
| Wear Leveling Count | Cycles of NAND erase/program operations | Monitor the trend/rate of increase rather than the raw value; interpretation varies by vendor. |
| Power-On Hours | Total hours powered | Useful for lifecycle planning |
| Power Cycle Count | On/off events | Monitors hardware stress |
| Unsafe Shutdowns | Sudden power-offs | Frequent = file system risk |
| Media/Data Integrity Errors | ECC correction failures | Backup data immediately |

⚠️ Note: Different manufacturers calculate “Percentage Used” differently. For Intel, it may represent a wear-level indicator, while Samsung uses a different algorithm, so compare values only between drives of the same model.

Automate S.M.A.R.T. Monitoring with smartd Daemon

Edit the smartd configuration:

sudo nano /etc/smartd.conf

Example:

DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../7/03) -m root -M exec /usr/share/smartmontools/smartd-runner

⚠️ Note: The path for the script used with ‘-M exec’ varies by distribution; /usr/share/smartmontools/smartd-runner is the Debian/Ubuntu location.

Explanation of flags:

  • DEVICESCAN — auto-detects all available drives
  • -a — enables all SMART checks and attribute logging
  • -o on — enables offline data collection
  • -S on — enables attribute autosave
  • -s — schedules tests (S = short daily @ 2 AM, L = long weekly @ 3 AM Sunday)
  • -m — email recipient for alerts
  • -M exec — triggers a script for notifications
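The -s schedule is easier to read once you know how smartd evaluates it: for each hour it builds a string of the form T/MM/DD/d/HH (test type, month, day of month, weekday 1 to 7 with 7 as Sunday, and hour) and matches it against the regular expression. The sketch below approximates that match with grep -E -x so a schedule can be tried out before restarting the daemon; the function name schedule_matches is an assumption.

```shell
#!/bin/sh
# Approximate smartd's "-s" evaluation: test whether a T/MM/DD/d/HH string
# matches the extended regular expression taken from smartd.conf.
schedule_matches() {
    regex=$1    # for example '(S/../.././02|L/../../7/03)'
    stamp=$2    # for example 'S/03/15/4/02'
    # -x requires the whole string to match, -q suppresses output.
    printf '%s\n' "$stamp" | grep -Eqx -- "$regex"
}

# Usage:
#   schedule_matches '(S/../.././02|L/../../7/03)' 'S/03/15/4/02' && echo "short test would run"
```

With the example schedule, any S string ending in /02 matches, while an L string matches only when the weekday field is 7 and the hour is 03.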

Enable the service:

sudo systemctl enable smartd
sudo systemctl start smartd

Check status and logs:

sudo systemctl status smartd
sudo journalctl -u smartd

Configure alerting:

  • Email via sendmail or Postfix
    Example smartd.conf line for email:
    -m admin@vsys.host -M daily

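⚠️ Note: -M exec does not replace -m. It only changes what smartd runs in place of the default mail command, so a line that uses -M exec still needs -m (use -m <nomailer> if the script should run without an email address). Plain email alerting, such as -m admin@vsys.host -M daily, needs no script.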

The test schedule format depends on the smartmontools version and may differ across distributions. An installed MTA is required to send notifications.

  • Webhook Alert via Custom Script:
    For sending notifications to platforms like Slack or Microsoft Teams, use the -M exec flag in your smartd.conf to run an external script whenever smartd raises an alert. The following curl command is an example of what to place inside that custom script to send a JSON-formatted alert to a webhook URL:
curl -X POST -H "Content-Type: application/json" -d '{"text":"SMART alert triggered"}' https://hooks.slack.com/services/XXX/YYY/ZZZ
  • Logs to /var/log/syslog
  • Custom scripts using -M exec
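A fuller version of such a script might include the device and failure type in the message. smartd exports variables such as SMARTD_DEVICE, SMARTD_FAILTYPE, and SMARTD_MESSAGE to the program named by -M exec; the sketch below uses them to build the JSON body. WEBHOOK_URL and the function names are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch of a notification script for "-M exec".
# WEBHOOK_URL is an assumed variable; set it to your own webhook address.

# Escape backslashes and double quotes so the text is safe inside JSON.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Build the JSON body from the SMARTD_* variables exported by smartd.
build_payload() {
    text="SMART alert on ${SMARTD_DEVICE:-unknown device} (${SMARTD_FAILTYPE:-unknown}): ${SMARTD_MESSAGE:-no message}"
    printf '{"text":"%s"}' "$(json_escape "$text")"
}

# POST the payload; -f makes curl fail on HTTP errors, -sS keeps it quiet.
send_alert() {
    curl -fsS -X POST -H "Content-Type: application/json" \
        -d "$(build_payload)" "$WEBHOOK_URL"
}

# Only send when a webhook address has been configured.
if [ -n "${WEBHOOK_URL:-}" ]; then
    send_alert
fi
```

The escaping covers only backslashes and double quotes, which is enough for the single-line SMARTD_MESSAGE; a multi-line message would need additional handling.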

S.M.A.R.T. Monitoring in Cloud & VPS: What You Can (and Can’t) Do

SMART access often does not work on a VPS (e.g., OpenVZ, or KVM with virtio disks). Under KVM, SMART can work only if the provider passes the physical drive through to the guest.

Common errors:

SMART support is: Unavailable

Alternatives:

  • Request reports from your hosting provider
  • Monitor I/O latency, kernel logs (dmesg), and SMART pass-through
  • Use tools like iostat, nvme-cli, or cloud-native monitoring solutions
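Before choosing an alternative, a script can check whether the guest sees S.M.A.R.T. at all by reading the “SMART support is:” lines from smartctl -i. The following is a minimal sketch; the function name smart_support_state and its three result words are illustrative, and “unknown” simply means no such line was found, which can happen for device types that do not print it.

```shell
#!/bin/sh
# Read "smartctl -i" output on stdin and print one word:
# available, unavailable, or unknown.
smart_support_state() {
    awk -F': *' '
        /^SMART support is:/ {
            if ($2 ~ /^Unavailable/) state = "unavailable"
            else if ($2 ~ /^(Available|Enabled)/ && state != "unavailable") state = "available"
        }
        END { print (state == "" ? "unknown" : state) }
    '
}

# Usage (run as root):
#   smartctl -i /dev/sda | smart_support_state
```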

Using nvme-cli to Read NVMe SSD Health

For NVMe SSDs, nvme-cli provides additional low-level diagnostics:

sudo nvme smart-log /dev/nvme0

Sample output:

Critical Warning:       0x0
Temperature:            37 Celsius
Percentage Used:        4%
Data Units Read:        7,983,283
Data Units Written:     3,812,913
Power Cycles:           123
Power On Hours:         1,023
Unsafe Shutdowns:       3
Media Errors:           0

Key fields:

  • Percentage Used — wear level (lower is better)
  • Temperature — operating temp in °C
  • Critical Warning — 0 means normal; non-zero = alert
  • Media Errors — should remain 0 in healthy SSDs

“Percentage Used ≤ 10%” for new NVMe drives is normal and not a cause for concern.
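To act on the wear figure automatically, the value can be pulled out of the health text and compared with a limit. This sketch accepts both the “Percentage Used:” spelling shown above and the percentage_used spelling that some nvme-cli versions print; the function names and the default limit of 90 are assumptions, not vendor guidance.

```shell
#!/bin/sh
# Read NVMe health text on stdin and print "Percentage Used" as a bare number.
nvme_percentage_used() {
    awk -F':' '
        { key = tolower($1); gsub(/_/, " ", key); gsub(/^[ \t]+|[ \t]+$/, "", key) }
        key == "percentage used" { v = $2; gsub(/[^0-9]/, "", v); print v; exit }
    '
}

# Compare the wear figure with a limit (default 90, an arbitrary choice).
# Returns 0 when below the limit, 1 at or above it, 2 if the field is missing.
nvme_wear_check() {
    limit=${1:-90}
    used=$(nvme_percentage_used)
    if [ -z "$used" ]; then
        echo "Percentage Used not found" >&2
        return 2
    fi
    if [ "$used" -ge "$limit" ]; then
        echo "WARN: ${used}% used (limit ${limit}%)"
        return 1
    fi
    echo "OK: ${used}% used"
}

# Usage (run as root):
#   nvme smart-log /dev/nvme0 | nvme_wear_check 90
```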

Best Practices for Reliable Disk Monitoring in Linux

Recommendations:

  • Check SMART attributes monthly or integrate into CI/monitoring stack
  • Combine with cron or Prometheus exporters
  • Act immediately if reallocated sectors or pending sectors rise
  • Replace drives before “Percentage Used” reaches 100% (SSD)
  • Use RAID or backup systems as safety nets — SMART is not foolproof. SMART does not detect sudden controller failures, so having backups is essential.
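One way to combine cron with a Prometheus setup is to have a scheduled job write a metric file for node_exporter's textfile collector. The sketch below turns smartctl -H output into a single metric line; the metric name smart_health_passed and the output path in the usage comment are assumptions.

```shell
#!/bin/sh
# Convert "smartctl -H" output (stdin) into one Prometheus text-format line.
# A value of 1 means the drive reported a passing health status.
smart_health_metric() {
    dev=$1
    # ATA and NVMe drives print "test result: PASSED";
    # SCSI drives print "SMART Health Status: OK".
    if grep -Eq 'test result: PASSED|SMART Health Status: OK'; then
        value=1
    else
        value=0
    fi
    printf 'smart_health_passed{device="%s"} %s\n' "$dev" "$value"
}

# Usage from a root cron job (the directory must match node_exporter's
# --collector.textfile.directory setting):
#   smartctl -H /dev/sda | smart_health_metric /dev/sda > /var/lib/node_exporter/smart_sda.prom
```

Since the -H result alone is often insufficient, a metric like this works best as a coarse alarm alongside attribute-level checks.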

Mini Summary: Disk Monitoring Checklist

  • Monitor SMART data monthly (or automate) 
  • Use smartd with logging and alerts 
  • Replace drives at the first failure signs (e.g., growing Pending Sectors, SSD “Percentage Used” approaching 100%)
  • Keep backup systems active (RAID, snapshots, offsite)

Rotating smartd Logs Automatically

To prevent log accumulation:

# Delete dedicated smartd log files older than 30 days
sudo find /var/log -name 'smartd.log*' -mtime +30 -delete

smartd log files are not always located in /var/log — they are often written to syslog. The command applies only in cases where smartd maintains its own dedicated logs.

Final Thoughts: Proactive Monitoring Prevents Data Loss

smartmontools empowers Linux administrators to monitor the physical health of disks and anticipate failures before they disrupt services. For both SSDs and HDDs, regular health checks with smartctl, combined with automation via smartd, help maintain system integrity and data reliability.


FAQs

smartctl is a command-line utility for querying and analyzing S.M.A.R.T. data from storage devices.

smartctl supports both SSDs and HDDs across SATA and NVMe interfaces.

Check S.M.A.R.T. data at least once per month, or integrate the checks into scheduled monitoring.

“Percentage Used” shows wear level; a value approaching 100% means the drive is near its end of life.

Some virtual environments don’t expose hardware monitoring, so S.M.A.R.T. may be unavailable on a VPS; contact your hosting provider.

smartctl can be used standalone, but smartd adds automation and alerting features. smartd runs independently as a daemon.

smartctl only reports health. Use it for diagnostics and planning replacements.

smartd messages are usually in /var/log/syslog or available via journalctl -u smartd.