10 Point Base Server Maintenance Checklist
Servers (we'll use the term 'server' to cover both dedicated and virtual environments) are awesome. They do their thing 24/7, usually without issue, but like any machine, automobile or human body, they require some maintenance.
Simple maintenance and monitoring can often prevent a server failure from turning into a server disaster. For example, I’ve had people call in a panic that their server has crashed. We begin to investigate and discover that their RAID failed last year, their backups stopped three months ago and their disk reached 100% capacity, corrupting their database.
If you use Hostway's server management solutions, you don’t have to worry about these things. We monitor, review and maintain things 24/7, but if you are managing your own server, here are ten basic items that should be part of your server maintenance checklist.
10 Base Server Maintenance Tips
1. Verify your backups are working.
Before making any changes to your production system, be sure that your backups are working. You may even want to run some test recoveries if you are going to delete critical data. While focused on backups, you may want to make sure you have selected the right backup location.
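As a rough illustration of the first half of this tip, the sketch below checks whether the newest file in a backup directory is recent enough. The directory layout, the function names and the 26-hour default (a daily backup plus some slack) are assumptions made for the example, not part of any particular backup product. A fresh file only shows that the backup job ran; it does not replace a test recovery.

```python
import os
import time


def newest_backup_age_hours(backup_dir, now=None):
    """Return the age in hours of the most recently modified file in backup_dir.

    Returns None when the directory contains no regular files.
    """
    now = time.time() if now is None else now
    with os.scandir(backup_dir) as entries:
        mtimes = [entry.stat().st_mtime for entry in entries if entry.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600.0


def backup_is_fresh(backup_dir, max_age_hours=26):
    """True if at least one file in backup_dir is newer than max_age_hours.

    max_age_hours=26 is an assumed threshold for a daily backup schedule.
    """
    age = newest_backup_age_hours(backup_dir)
    return age is not None and age <= max_age_hours
```

You could call backup_is_fresh() from a daily scheduled job and send yourself an alert whenever it returns False.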
2. Check disk usage.
Don’t use your production system as an archival system. Delete old logs, emails and software versions you no longer use. Keeping your system free of old software limits security issues, and a smaller data footprint means faster recovery. If your usage exceeds 90% of disk capacity, either reduce usage or add more storage. If a partition reaches 100%, your server may stop responding, database tables can become corrupted and data may be lost.
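The 90% rule above can be checked with a few lines of Python. This is a minimal sketch using the standard library's shutil.disk_usage; the function names and the default path are illustrative choices. Note that the percentage may differ slightly from what df reports, because some filesystems reserve blocks for root.

```python
import shutil


def percent_used(total_bytes, used_bytes):
    """Return used space as a percentage of total space."""
    if total_bytes == 0:
        return 0.0
    return 100.0 * used_bytes / total_bytes


def check_disk(path="/", warn_at=90.0):
    """Return (percent_used, over_threshold) for the filesystem holding path.

    warn_at=90.0 mirrors the 90% guideline in the text.
    """
    usage = shutil.disk_usage(path)
    pct = percent_used(usage.total, usage.used)
    return pct, pct >= warn_at


if __name__ == "__main__":
    pct, alarm = check_disk("/")
    print(f"/ is {pct:.1f}% full" + (" (over threshold)" if alarm else ""))
```

Run it once per mounted partition you care about, since each partition fills up independently.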
3. Monitor RAID Alarms.
All production servers should use RAID. More importantly, you should be monitoring your RAID status. We have worked on countless systems where the RAID failed. As a result, a single disk failure caused a complete system failure. At Hostway, we use direct RAID monitoring in addition to other server monitoring tools. I'd guesstimate RAID failures in about 1% of servers per year. One percent may seem small, but a complete server failure can turn a simple drive replacement into a multi-hour disaster recovery scenario.
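How you monitor RAID depends on the controller. As one hedged example, assuming Linux software RAID (md), which the text above does not specifically mention, the kernel reports array health in /proc/mdstat, where a status such as [UU] means all members are up and an underscore, as in [U_], marks a missing member. The sketch below parses that text and lists degraded arrays. Hardware RAID controllers need their vendor's tools instead.

```python
import re

# Matches the start of an array block, e.g. "md0 : active raid1 sdb1[1] sda1[0]"
ARRAY_RE = re.compile(r"^(md\S+)\s*:")
# Matches the member status field, e.g. "[UU]" or "[U_]"
STATUS_RE = re.compile(r"\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        array = ARRAY_RE.match(line)
        if array:
            current = array.group(1)
            continue
        status = STATUS_RE.search(line)
        if current and status:
            if "_" in status.group(1):
                degraded.append(current)
            current = None  # status consumed; wait for the next array block
    return degraded


if __name__ == "__main__":
    with open("/proc/mdstat") as handle:
        print(degraded_arrays(handle.read()) or "no degraded md arrays found")
```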
4. Update Your OS, Control Panel and Applications.
Updates for Linux systems are released frequently. Staying on top of these updates can be challenging. This is why we use automated patch management tools and have monitoring in place to alert us when a system is out of date. If you are updating your server manually (or not at all), you may miss important security updates. Hackers often scan for vulnerable systems within hours of an issue being disclosed, so rapid response is key. If you cannot automate your updates, then create a schedule to update your system. I recommend weekly at a minimum for current versions and perhaps monthly for older OS versions. I would also monitor release notices from your distribution so you are aware of any major security threats and can respond quickly.
If you are using a hosting or server control panel such as Plesk, cPanel or Virtualmin, be sure to update it as well. Sometimes this means updating not only the control panel itself, but also software it controls. For example, with WHM/cPanel, you must manually update PHP versions to fix known issues. Simply updating the control panel does not also update the underlying Apache and PHP versions used by your OS.
Web applications account for more than 95% of all security breaches we investigate. Be sure to update your web applications, especially popular programs like WordPress, Joomla, Drupal, WooCommerce, etc.
5. Check remote management tools.
If your server is co-located or with a dedicated server provider, you will want to check that your remote management tools like Dell OMSA work. Remote console, remote reboot and rescue mode are what I call the 3 essential tools for remote server management. You want to know that these will work when you need them.
6. Check for hardware errors.
You may want to review the logs for any signs of hardware problems. Overheating notices, disk read errors and network failures can be early indicators of potential hardware failure. These are rare but worth a look, especially if the system has not been working within normal ranges.
7. Check server utilization.
Review your server’s disk, CPU, RAM and network utilization. If you are nearing limits, you may need to plan on adding resources to your server or migrating to a new one. If you are not using a performance monitoring tool, you can install sysstat on most Linux servers. This will give you some baseline performance data.
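For a quick point-in-time snapshot without installing anything, the sketch below reads memory figures from /proc/meminfo and the one-minute load average from the standard library. It is not a substitute for a monitoring tool that keeps history, and it assumes a Linux kernel recent enough to report MemAvailable (3.14 or later). The function names are illustrative.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def memory_used_percent(meminfo):
    """Percentage of RAM in use, based on MemTotal and MemAvailable."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


def load_per_cpu():
    """One-minute load average divided by the number of CPUs."""
    one_minute, _, _ = os.getloadavg()
    return one_minute / (os.cpu_count() or 1)


if __name__ == "__main__":
    with open("/proc/meminfo") as handle:
        info = parse_meminfo(handle.read())
    print(f"RAM used: {memory_used_percent(info):.1f}%")
    print(f"Load per CPU (1 min): {load_per_cpu():.2f}")
```

A load per CPU that stays near or above 1.0 suggests the CPUs are saturated, which is a hint to look closer rather than a hard rule.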
8. Review user accounts.
If you have had staff changes, client cancellations or other user changes, you will want to remove these users from your system. Storing old sites and users is both a security and legal risk. Depending on your service contracts, you may not have the right to retain a client’s data after they have terminated services.
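To get a list of accounts worth reviewing, you can filter the standard seven-field /etc/passwd format for regular users who still have a login shell. The sketch below is one way to do that. The UID cutoff of 1000 and the set of "no login" shells are assumptions that match many Linux distributions but not all, so adjust them for your system.

```python
# Shells that conventionally mean "this account cannot log in" (assumed list).
NON_LOGIN_SHELLS = {"/usr/sbin/nologin", "/sbin/nologin", "/bin/false"}


def login_accounts(passwd_text, min_uid=1000):
    """Return names of accounts with UID >= min_uid and a usable login shell.

    passwd_text is the content of a passwd-format file:
    name:password:UID:GID:GECOS:home:shell
    """
    accounts = []
    for line in passwd_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7 or not fields[2].isdigit():
            continue  # skip malformed lines rather than guessing
        name, uid, shell = fields[0], int(fields[2]), fields[6]
        if uid >= min_uid and shell not in NON_LOGIN_SHELLS:
            accounts.append(name)
    return accounts


if __name__ == "__main__":
    with open("/etc/passwd") as handle:
        for account in login_accounts(handle.read()):
            print(account)
```

Compare the output against your current staff and client list; anything you cannot account for is a candidate for removal.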
9. Change passwords.
I recommend changing passwords every 3 to 6 months, especially if you have given out passwords to others for maintenance. Use strong passwords!
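If you need a strong password and do not have a password manager handy, Python's secrets module can generate one from a cryptographically secure random source. This is a minimal sketch; the 20-character default and the requirement for one character from each class are illustrative choices, not a standard.

```python
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length=20):
    """Return a random password with lowercase, uppercase, digit and symbol."""
    if length < 4:
        raise ValueError("length must be at least 4 to fit all character classes")
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
        # Retry until every character class is represented at least once.
        if (
            any(c.islower() for c in candidate)
            and any(c.isupper() for c in candidate)
            and any(c.isdigit() for c in candidate)
            and any(c in string.punctuation for c in candidate)
        ):
            return candidate


if __name__ == "__main__":
    print(generate_password())
```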
10. Check system security.
Periodically review your server’s security using a remote auditing tool such as Nessus, OpenVAS or Metasploit. Regular security audits serve as a check on system configuration, OS updates and other potential security risks. I suggest doing this at least 4 times a year and preferably monthly.