I recently fixed my mother-in-law’s laptop. Long story short, the hard disk was toast. While repairing the computer, I decided to spare everyone the frustration and confusion of another failing hard drive. So, how does one know if his or her mother-in-law’s hard disk is reaching senility? SmartMonTools appears to be the best tool for the job. It monitors and reports on hard disk health using Self-Monitoring, Analysis, and Reporting Technology (SMART), which is built into most hard drives. SmartMonTools is also cross-platform and available in most package repositories.

Tutorial

This tutorial describes the steps required to set up automated hard disk health checks and email notifications using SmartMonTools 7.1 on Ubuntu 20.04. If, like me, you’re configuring a desktop or some other system that doesn’t already have an MTA (Mail Transfer Agent) or MUA (Mail User Agent) set up, and you wish to send emails externally, I recommend following my tutorial on setting up an OpenSMTPD Relay on Ubuntu. Emails sent straight from a willy-nilly desktop user account to an online email provider are unlikely to be accepted. Relaying over SMTP through your online email provider remedies this.

Install

First, install SmartMonTools on Ubuntu.

$ sudo apt -y install smartmontools
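
Before configuring anything, it can be worth confirming that the disk actually reports SMART as enabled. The sketch below is a small helper of my own (the name smart_enabled is hypothetical) that reads the output of smartctl -i and looks for the "SMART support is: Enabled" line. The real invocation is left commented out because it needs root and a physical disk.

```shell
#!/bin/sh
# Succeeds if the `smartctl -i` output on stdin reports SMART as enabled.
smart_enabled() {
    grep -q '^SMART support is: *Enabled'
}

# Typical use (needs root and a real disk, so it is left commented out):
# sudo smartctl -d sat -i /dev/sda | smart_enabled && echo "SMART is enabled"
```

If the check fails on a disk that supports SMART, sudo smartctl -s on /dev/sda should enable it.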

Configure

The tool that monitors your disks is, of course, smartd. Configuration is done in /etc/smartd.conf. Consult the smartd.conf manpage for more details. The smartd.conf below provides a complete configuration example. It checks the SATA disk /dev/sda for various types of failures, schedules regular self-tests, reports any errors via email, and avoids wasting energy by not waking the disk more often than necessary.

/etc/smartd.conf
/dev/sda \ (1)
    -d sat \ (2)
    -o on \ (3)
    -S on \ (4)
    -H \ (5)
    -l error \ (6)
    -l selftest \ (7)
    -f \ (8)
    -n standby,15,q \ (9)
    -s (L/../(01|16)/./03|S/../.././01|O/../.././(00|06|12|18)) \ (10)
    -m jdoe@gmail.com \ (11)
    -M exec /usr/share/smartmontools/smartd-runner (12)
1 Run for the device /dev/sda.
2 Specify that the device uses a SCSI to ATA Translation interface.
3 Enable SMART Automatic Offline Testing.
4 Automatically save SMART attributes.
5 Check the SMART health status and report if the disk says it is failing.
6 Report if the number of errors in the SMART error log has increased.
7 Report if the number of failed self-tests in the SMART self-test log has increased.
8 Check for failure of any Usage Attributes.
9 Skip the check if the device is in SLEEP or STANDBY mode. After 15 skipped checks in a row, run the check anyway, even if that wakes the disk. The q suppresses the log message for skipped checks, since writing it could itself wake up the disk.
10 Schedule long self-tests for the first and sixteenth days of the month at 3 AM, short self-tests daily at 1 AM, and Offline Immediate Tests four times each day at midnight, 6 AM, noon, and 6 PM.
11 Email jdoe@gmail.com if any errors are detected.
12 Execute /usr/share/smartmontools/smartd-runner instead of the default mail command when sending emails.
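
The -s pattern is easier to reason about once you see what it is matched against. As I understand the smartd.conf manpage, smartd builds a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 for Monday through 7 for Sunday, and hour) and starts a test when the regular expression matches it. The sketch below approximates that with grep. The helper name test_due is my own invention, and grep -x stands in for however smartd performs the match internally.

```shell
#!/bin/sh
# The -s pattern from the smartd.conf example.
schedule='(L/../(01|16)/./03|S/../.././01|O/../.././(00|06|12|18))'

# test_due TYPE MM DD DOW HH
# Succeeds if the schedule selects a test of TYPE for that month, day,
# day of week (1 = Monday ... 7 = Sunday), and hour.
test_due() {
    printf '%s/%s/%s/%s/%s\n' "$1" "$2" "$3" "$4" "$5" | grep -Eqx -- "$schedule"
}

test_due L 06 16 2 03 && echo "long test: June 16 at 03:00"
test_due S 06 16 2 03 || echo "no short test at 03:00"
test_due O 06 16 2 18 && echo "offline test: June 16 at 18:00"
```

Trying a few dates this way is a quick sanity check before handing the pattern to smartd.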

Enable smartd on system startup.

$ sudo systemctl enable smartd

Verify

You will probably want to double-check the scheduling and emailing behavior, at the very least.

Scheduling

Audit the self-test schedule with smartd -q showtests. This will show the next five tests scheduled for each type of self-test. It also shows the total number of tests for each type of self-test for the next ninety days.

$ smartd -q showtests
smartd 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-31-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

Opened configuration file /etc/smartd.conf
Configuration file /etc/smartd.conf parsed.
Device: /dev/sda, opened
Device: /dev/sda, CT2000MX500SSD1, S/N:000000000001, WWN:0-000000-000000000, FW:M3CR023, 2.00 TB
Device: /dev/sda, found in smartd database: Crucial/Micron MX500 SSDs
Device: /dev/sda, WARNING: This firmware returns bogus raw values in attribute 197
Device: /dev/sda, enabled SMART Attribute Autosave.
Device: /dev/sda, enabled SMART Automatic Offline Testing.
Device: /dev/sda, is SMART capable. Adding to "monitor" list.
Device: /dev/sda, state read from /var/lib/smartmontools/smartd.CT2000MX500SSD1-000000000001.ata.state
Monitoring 1 ATA/SATA, 0 SCSI/SAS and 0 NVMe devices

Next scheduled self tests (at most 5 of each type per device):
Device: /dev/sda, will do test 1 of type O at Mon May 25 12:25:20 2020 CDT
Device: /dev/sda, will do test 2 of type O at Mon May 25 18:25:20 2020 CDT
Device: /dev/sda, will do test 3 of type O at Tue May 26 00:25:20 2020 CDT
Device: /dev/sda, will do test 1 of type S at Tue May 26 01:25:20 2020 CDT
Device: /dev/sda, will do test 4 of type O at Tue May 26 06:25:20 2020 CDT
Device: /dev/sda, will do test 5 of type O at Tue May 26 12:25:20 2020 CDT
Device: /dev/sda, will do test 2 of type S at Wed May 27 01:25:20 2020 CDT
Device: /dev/sda, will do test 3 of type S at Thu May 28 01:25:20 2020 CDT
Device: /dev/sda, will do test 4 of type S at Fri May 29 01:25:20 2020 CDT
Device: /dev/sda, will do test 5 of type S at Sat May 30 01:25:20 2020 CDT
Device: /dev/sda, will do test 1 of type L at Mon Jun  1 03:25:20 2020 CDT
Device: /dev/sda, will do test 2 of type L at Tue Jun 16 03:25:20 2020 CDT
Device: /dev/sda, will do test 3 of type L at Wed Jul  1 03:25:20 2020 CDT
Device: /dev/sda, will do test 4 of type L at Thu Jul 16 03:25:20 2020 CDT
Device: /dev/sda, will do test 5 of type L at Sat Aug  1 03:25:20 2020 CDT

Totals [Mon May 25 10:25:20 2020 CDT - Sun Aug 23 10:25:20 2020 CDT]:
Device: /dev/sda, will do   6 tests of type L
Device: /dev/sda, will do  90 tests of type S
Device: /dev/sda, will do   0 tests of type C
Device: /dev/sda, will do 360 tests of type O
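
The totals are worth comparing against what the schedule should produce: over ninety days, one short test per day gives 90 and four offline tests per day give 360. The sketch below pulls the counts out of the showtests output so they can be compared at a glance. The helper name summarize_totals is hypothetical, and the awk pattern assumes the "will do N tests of type X" wording shown above.

```shell
#!/bin/sh
# Print "TYPE COUNT" for every totals line ("will do N tests of type X") on stdin.
summarize_totals() {
    awk '/will do +[0-9]+ tests of type/ { print $NF, $(NF - 4) }'
}

# Expected counts for a 90-day window with the schedule used here.
echo "S expected: $((90 * 1))"   # one short test per day
echo "O expected: $((90 * 4))"   # four offline tests per day

# Typical use (left commented out because it needs smartd and a real disk):
# smartd -q showtests | summarize_totals
```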

You can also verify that self-tests actually run at the scheduled times. Once enough time has passed for smartd to have run some self-tests, check the self-test log with smartctl. In the log below, one short offline test has completed without error.

$ sudo smartctl -d sat -l xselftest,25,selftest /dev/sda
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-31-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       460         -
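
Once the log grows, scanning it by eye gets tedious. The sketch below filters the log down to entries whose status is anything other than "Completed without error". The helper name unclean_selftests is my own, and it assumes entries start with "# N" as in the output above. Note that it also prints tests that were aborted or are still in progress, not only genuine failures.

```shell
#!/bin/sh
# Print self-test log entries from stdin that did not finish cleanly.
# Entries are the lines that start with "# N"; exit status is 1 when none are found.
unclean_selftests() {
    grep -E '^# *[0-9]+ ' | grep -v 'Completed without error'
}

# Typical use (needs root and a real disk, so it is left commented out):
# sudo smartctl -d sat -l xselftest,25,selftest /dev/sda | unclean_selftests
```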

Email Alerts

To test the email functionality, you can tell smartd to send a test email.

/etc/smartd.conf
/dev/sda \
    -d sat \
    -o on \
    -S on \
    -H \
    -l error \
    -l selftest \
    -f \
    -n standby,15,q \
    -s (L/../(01|16)/./03|S/../.././01|O/../.././(00|06|12|18)) \
    -m jdoe@gmail.com \
    -M test \ (1)
    -M exec /usr/share/smartmontools/smartd-runner
1 Send a test email when smartd starts.

Restart smartd so that it sends the test email.

$ sudo systemctl restart smartd

If everything works, you should receive an email at the designated address.
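
If the email never arrives, it helps to see what smartd hands to the program named by -M exec. According to the smartd.conf manpage, smartd exports details of the alert in environment variables such as SMARTD_FAILTYPE, SMARTD_DEVICE, SMARTD_ADDRESS, and SMARTD_MESSAGE. The sketch below formats those into one line. The function name describe_alert is hypothetical, and the values in the simulated call are made up for illustration.

```shell
#!/bin/sh
# Build a one-line summary from variables smartd exports to a -M exec program.
# Unset variables fall back to "unknown" so the function is safe to run by hand.
describe_alert() {
    printf '%s on %s for %s: %s\n' \
        "${SMARTD_FAILTYPE:-unknown}" \
        "${SMARTD_DEVICE:-unknown}" \
        "${SMARTD_ADDRESS:-unknown}" \
        "${SMARTD_MESSAGE:-unknown}"
}

# Simulated call in a subshell; these values are invented, not real smartd output.
(
    SMARTD_FAILTYPE=EmailTest
    SMARTD_DEVICE=/dev/sda
    SMARTD_ADDRESS=jdoe@gmail.com
    SMARTD_MESSAGE='example test message'
    describe_alert
)
```

A script that writes this line to a log could temporarily stand in for the usual -M exec target while debugging, at the cost of no email being sent during that time.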

Make sure to remove the -M test directive from the file afterward. Otherwise, smartd will send a test email every time it starts.