Running your infrastructure in a secure configuration is a daunting task even for security professionals. This guide provides practical advice to help engineers build up infrastructure following security best practices so that they can confidently deploy their services to the public Internet and lower their chances of being compromised. This guide specifically targets Linux based systems; however, the best practices apply to all computer systems.
Part of confidently running infrastructure is understanding what and whom you are protecting your infrastructure against. This guide will eventually have three versions, Basic, Intermediate, and Advanced, with each version focused on defending your infrastructure against a different class of attacker.
You are reading the Basic version, which aims to protect against automated attacks and script kiddies who understand exploitation tools rather than exploitation techniques. This class of attacker is opportunistic rather than targeted and quickly moves on to easier targets. If you are running a side project or starting a company, this is the best place to start, and it will help you lay a solid foundation to build upon.
While reading this guide, consider the type of attacker and types of attacks you want to defend against. The best practices that you follow and do not follow depend on what you are trying to defend and whom you are trying to defend against.
This guide follows these guiding principles in its discussion of software security:
Aggressively applying security updates for software you didn’t write might seem like a poor way to protect your infrastructure and perhaps even pointless. However, it’s one of the best time investments you can make, from a security perspective. Following are two examples of recent security issues that unsophisticated attackers using automated tools can exploit if you have not updated your servers with the latest security patches:
These two issues alone would give an attacker complete control of your entire infrastructure. Luckily mitigating these bugs is not difficult.
Consistently apply security updates provided by your operating system vendor. Most vendors have an automated method. For example, for Debian-based systems, you can use Unattended Upgrades, and for Red Hat-based systems, you can use AutoUpdates.
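As a minimal sketch for Debian-based systems, and assuming the unattended-upgrades package is already installed, periodic upgrades are typically switched on with a small APT configuration fragment. The file name below is the conventional one, and the daily interval ("1") is an assumption to adapt to your own schedule.

```
// /etc/apt/apt.conf.d/20auto-upgrades
// Refresh the package lists every day.
APT::Periodic::Update-Package-Lists "1";
// Run the unattended upgrade job every day.
APT::Periodic::Unattended-Upgrade "1";
```

Running `sudo dpkg-reconfigure -plow unattended-upgrades` should produce an equivalent file interactively.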
Automated patching is great; however, it has a potential downside (to your business) if you don’t test your software before you apply patches to production servers: things can unexpectedly break. As much as package maintainers try to ensure that security updates don’t contain breaking changes, they cannot test every combination that may be running somewhere before release. That’s why it’s important either to run security updates through a staging Continuous Integration/Continuous Deployment (CI/CD) system or to test them manually before rolling them out to production servers.
Just applying these security updates is not enough, however. If the issue is in a shared library, you will be using the old version of the library and still be vulnerable to exploitation until you restart the process that is linked to it. To check whether you have any binaries that need to be restarted, you can use checkrestart for Debian based systems and needs-restarting for Red Hat based systems.
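The check described above can be wrapped in a small helper that uses whichever tool is present. This is a sketch, not a definitive implementation: the function name `check_pending_restarts` is hypothetical, and it assumes `checkrestart` comes from the debian-goodies package and `needs-restarting` from yum-utils.

```shell
# Sketch: report processes that still use outdated libraries after an update.
# check_pending_restarts is a hypothetical helper name.
check_pending_restarts() {
    if command -v checkrestart >/dev/null 2>&1; then
        # Debian-based systems (package: debian-goodies)
        checkrestart
    elif command -v needs-restarting >/dev/null 2>&1; then
        # Red Hat-based systems (package: yum-utils)
        needs-restarting
    else
        echo "no restart checker found; install debian-goodies or yum-utils" >&2
        return 1
    fi
}
```

Both tools generally need root privileges to inspect every process, so in practice the helper would live in a script that is run with sudo after each round of updates.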
Hardening your application by using OS-level facilities is an effective approach for limiting the scope of damage that attackers can do after they exploit a vulnerability in your application. This section focuses on using traditional Unix Access Control facilities that most users are familiar with to restrict your application to the minimal set of access it needs to operate. The facilities are permissions on files, user identifier (UID), and root access.
The goal of this section is not to harden your application to the extent that an attacker cannot compromise your application. That is an almost impossible goal. After an attacker has exploited your application, the attacker will be able to perform actions as your application, and possibly even elevate their privileges to root, which gives them full and complete access to your operating system. The goal is instead to restrict the actions that your application can perform to the limited set that it needs to operate, which in turn restricts what the attacker can do after a compromise.
You want to restrict your application such that even if an attacker has exploited your process and can execute code as that user account, the user has limited access rights on the file system. The same concept applies to the process that runs under that account: restrict the CPU time, memory, and file descriptor count to mitigate DoS-style attacks where the attacker exhausts your resources. The goal is to force the attacker to use a privilege escalation attack (exploit another part of your operating system to elevate their privileges higher than the running application) to do anything meaningful on your system.
To restrict the account on which your application runs, use the following guidelines:
- Avoid overly permissive file modes such as 0777; instead, use a value like 0660.
- Run the application under its own account. For example, if the application is called foo, create a user called fooapp and make its home directory /var/appdata/fooapp:

    sudo useradd -r -s /bin/false --home /var/appdata/fooapp fooapp
    sudo mkdir /var/appdata/fooapp
    sudo chown fooapp:fooapp /var/appdata/fooapp
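To make the file-permission guideline concrete, the following sketch creates a file with mode 0660 and then audits a directory for world-writable files. It runs against a scratch directory rather than a real application directory; `APP_DIR`, the file names, and the `find_world_writable` helper are all hypothetical, and `stat -c` assumes GNU coreutils as found on Linux.

```shell
# Sketch: create files without access for "other" users and audit a directory.
set -eu

# Scratch stand-in for a directory such as /var/appdata/fooapp.
APP_DIR="$(mktemp -d)"

# With umask 0007, newly created files get mode 0660 (owner and group only).
( umask 0007; touch "$APP_DIR/settings.conf" )

# Deliberately create a bad example so the audit has something to report.
touch "$APP_DIR/too-open.log"
chmod 0777 "$APP_DIR/too-open.log"

# List regular files that any user on the system can write to.
find_world_writable() {
    find "$1" -xdev -type f -perm -0002
}

stat -c '%a %n' "$APP_DIR"/*
find_world_writable "$APP_DIR"
```

Setting the umask in the service's startup script is one way to get safe modes by default, while the `find` audit catches files that were created some other way.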
To restrict the process that runs your application, use the following guidelines:
- Limit the resources available to the process by using the /etc/security/limits.conf file. For example, if you want to limit the number of open file descriptors to 10 and limit memory to 1 GB, add the following lines to the /etc/security/limits.conf file:

    fooapp hard nofile 10     # 10 open file descriptor limit
    fooapp hard as 1000000    # 1 GB limit

- Grant only the specific capabilities that the process needs. For example, if your binary is /opt/rproxy, you can set its capabilities as follows:

    setcap 'cap_net_bind_service=+ep' /opt/rproxy
- Consider using chroot, but be aware that some maintenance overhead is required. chroot allows you to limit the scope of what a process can see on the file system; specifically, it changes your root directory to a directory of your choosing. For example, if you define /var/chroot as your new root directory, processes see files under /var/chroot as /. Although this is more secure, it means that any shared libraries that your process might use must be copied over and reside within /var/chroot, which in turn means that whenever you apply security updates, you also need to re-copy any updated shared libraries. You can avoid this maintenance with hard links, but then you are offering a path outside that an attacker can potentially exploit. Other approaches (cgroup-based approaches) that you can take to gain similar benefits will be discussed in the intermediate version of this guide.

Strong firewall rules enable you to define what inbound and outbound communication is allowed from your servers. Starting with a default deny policy and allowing only specific traffic in and out forces you to think about the minimal set of services that you want to expose, which in turn can lower your risk of attack. An errant process cannot expose your entire infrastructure to the general public unless you specifically allow it to.
This section focuses on inbound firewall rules and TCP/IP stack settings. Although outbound firewall rules are very effective in limiting how far an attacker can go after they have gotten inside your infrastructure, the next version of this guide will focus on them.
First, firewall rules. When building a script for firewall rules, use the following guiding principles.
Following is a commented script that accomplishes all of these goals:
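As a minimal sketch under stated assumptions, a default-deny inbound rule set might look like the following. It prints rules in iptables-restore format instead of changing the running firewall, so it can be reviewed first. The function name `print_ipv4_rules` is hypothetical, and the open ports (22 for SSH, 80 for HTTP) are assumptions to replace with the services you actually expose.

```shell
# Sketch: emit a default-deny inbound IPv4 rule set in iptables-restore format.
print_ipv4_rules() {
    cat <<'EOF'
*filter
# Default policies: drop inbound and forwarded traffic, allow outbound.
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
# Allow loopback traffic.
-A INPUT -i lo -j ACCEPT
# Allow replies to connections this host initiated.
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# Drop packets that do not belong to any known connection state.
-A INPUT -m conntrack --ctstate INVALID -j DROP
# Assumed services: SSH and HTTP.
-A INPUT -p tcp --dport 22 -j ACCEPT
-A INPUT -p tcp --dport 80 -j ACCEPT
COMMIT
EOF
}

print_ipv4_rules
```

To load the rules, pipe the output into `sudo iptables-restore`; passing `--test` to iptables-restore parses the rules without committing them.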
Following is a small script for IPv6 traffic:
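A comparable sketch for IPv6 is shown here, again printing ip6tables-restore input rather than touching the live firewall. The function name `print_ipv6_rules` is hypothetical, and the choice to expose only SSH is an assumption. Unlike IPv4, ICMPv6 is accepted outright because IPv6 relies on it for neighbor discovery.

```shell
# Sketch: emit a default-deny inbound IPv6 rule set in ip6tables-restore format.
print_ipv6_rules() {
    cat <<'EOF'
*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
# Allow loopback traffic and replies to outbound connections.
-A INPUT -i lo -j ACCEPT
-A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
# IPv6 needs ICMPv6 (neighbor discovery, path MTU discovery) to function.
-A INPUT -p ipv6-icmp -j ACCEPT
# Assumed service: SSH only.
-A INPUT -p tcp --dport 22 -j ACCEPT
COMMIT
EOF
}

print_ipv6_rules
```

The output can be loaded with `sudo ip6tables-restore` in the same way as the IPv4 rules.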
These rules are now running in memory, and you need to ensure that they are loaded the next time your operating system restarts. For Debian-based systems, that means either adding a script that loads your firewall rules to /etc/network/if-pre-up.d/ or adding a pre-up command to /etc/network/interfaces. For Red Hat-based systems, this is typically done by using the /sbin/service iptables save command.
In addition, the following TCP/IP stack hardening/tuning is recommended:
You can try out all the above settings with the following script:
To persist these settings across a reboot, update /etc/sysctl.conf:
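As an illustration only, a fragment with commonly used hardening keys is sketched here. The keys are real Linux sysctl names, but which ones to set is an assumption on our part, so check each against your workload before applying it.

```
# /etc/sysctl.conf (fragment; the selection of keys is an assumption)
# Use SYN cookies to resist SYN flood attacks.
net.ipv4.tcp_syncookies = 1
# Drop packets whose source address fails reverse-path validation.
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.rp_filter = 1
# Ignore source-routed packets and ICMP redirects.
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.all.accept_redirects = 0
net.ipv4.conf.all.send_redirects = 0
# Ignore ICMP echo requests sent to broadcast addresses.
net.ipv4.icmp_echo_ignore_broadcasts = 1
```

A single setting can be tried without persisting it by using `sudo sysctl -w key=value`, and `sudo sysctl -p` reloads the file after you edit it.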
For remote login, you want to ensure not only that communication with your servers is encrypted but also that only authorized users have access to your servers. Following are typical goals when you are securing remote login:
sudo) to log the actions performed.

Failure to realize any one of these goals can be a security risk. Weak (or no) cryptography can allow an attacker to view your communication. Weak authentication can allow unauthorized users access to your systems.
Luckily, Secure Shell (SSH) mitigates most of these risks, and with a few minor tweaks to your systems, all of them can be mitigated.
To start with, generate your SSH key correctly by ensuring that you are using a key size that is large enough and that your key is passphrase protected. You can do that by using ssh-keygen:

    ssh-keygen -t rsa -b 4096 -C foo@example.com
Then when prompted, enter a passphrase! A passphrase ensures that even if someone steals your key they cannot use it without also knowing your passphrase.
OpenSSH has a reasonable default configuration that is quite secure. However, some distributions might weaken these defaults to make OpenSSH interoperable with legacy servers. The following configuration simply ensures that those reasonable defaults are applied by your version of OpenSSH. For more detailed information about OpenSSH configuration, see Mozilla’s Configuration guide for OpenSSH and the Securing SSH page for CentOS. Both are excellent resources, and we’ll build on those configurations in future versions of this guide.
On the server, ensure that you have the following lines in the /etc/ssh/sshd_config file:
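Going by the directives that this section goes on to explain, the relevant lines would look roughly like this. Treat it as a sketch of the fragment rather than a complete sshd_config.

```
# /etc/ssh/sshd_config (fragment)
PasswordAuthentication no
PubkeyAuthentication yes
PermitRootLogin no
LogLevel VERBOSE
```

Before restarting the SSH daemon, `sudo sshd -t` checks the file for syntax errors, which helps avoid locking yourself out.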
This configuration achieves the following goals:
- PasswordAuthentication no and PubkeyAuthentication yes force you to use public key cryptography, and not passwords, to authenticate to your servers. Although you can have a strong password (for example, a randomly generated 2048-bit password that’s encoded in ASCII), most passwords are bad, and password lengths that are commonly used have a much smaller search space than a large key.
- PermitRootLogin no disables the ability to remotely log in as the root user. Although this is not a directly exploitable issue, disabling this remote login helps you keep good audit logs to understand what is happening on your servers. The root account acts as a shared administrative account, which limits your ability to audit which user is performing which privileged action. If you force all users to go through their own accounts, you will have an auditable trail of which user performed what action. Details about how to set up audit logging are provided in a later section.
- LogLevel VERBOSE logs the user and key fingerprint that made an attempt to authenticate. Again, this setting does not directly mitigate an exploit, but is good for auditing.

On the client, ensure that you have the following lines in the /etc/ssh/ssh_config file:
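Based on the two options that this section describes for the client, the fragment would look roughly as follows. Applying the options to every host with `Host *` is an assumption; a narrower pattern works as well.

```
# /etc/ssh/ssh_config (fragment)
Host *
    HashKnownHosts yes
    StrictHostKeyChecking ask
```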
This configuration achieves the following goals:
- HashKnownHosts yes hashes host names and addresses in your ~/.ssh/known_hosts file. Even if an attacker steals your known hosts file, they can’t simply enumerate the hosts you connect to with your key.
- StrictHostKeyChecking ask checks the key presented to you against the one in your ~/.ssh/known_hosts file and, if it has changed (or it is the first time you are visiting that host), asks you if you will accept that key. This helps mitigate man-in-the-middle attacks.

Lastly, give users limited access to your infrastructure. For example, not all users need access to your backup servers; give access only to the users who actually know how to restore from backups. This ensures that even if the account of a user without backup access is compromised, the integrity of your backups is not in question.
To accomplish this, two of the common approaches are:
- Use the newusers and userdel commands to accomplish this, and automate or orchestrate the process by using configuration management tools like Chef or Ansible.

Trust boundaries are a common place where security vulnerabilities occur. The boundary between the outside world and your internal infrastructure is sacred, and you should do as much as possible to defend it and ensure that only authorized users can traverse it.
You should be concerned about two main trust boundaries. The first is the boundary between the public Internet and your API endpoint; this is the boundary that your customers will cross every day when using your service. The second is an access point for your developers and system administrators that will be used to deploy and service your application.
For the API trust boundary that your customers will traverse, you want to consider all user input as hostile and assume that every request that is made is actually an attempt to exploit your infrastructure. When you think about user requests in that manner, it becomes clear that you need to minimize the attack surface that you provide your users and isolate the damage that can occur when a user does eventually exploit your service.
For the trust boundary that you cross to service your applications, you want to isolate all your services so that they are not exposed to the public Internet and then force users who access them from the Internet to traverse a well-guarded access point that you can defend (let’s call this a bastion host). You can concentrate all your resources on that one access point and be less concerned about how your services communicate with each other when they are within that trust boundary.
To mitigate these trust boundary issues, you need to envision the distinct boxes into which you can place the public Internet and your infrastructure (see the following diagram). After you have that, you can start to think about how you can defend your infrastructure.
The first large box is the public Internet. You should never trust anyone or anything on the public Internet. In fact, you should consider all actors on the public Internet as hostile even when they are your own employees SSH’ing into your servers.
The second large box is your internal infrastructure. These are your trusted hosts. Services that run on these servers should listen only on private network interfaces if possible and not be directly exposed to the public Internet.
The two boxes that span both are your jump host and your API hosts. These hosts should have access to both the public Internet and your internal infrastructure. Because they are directly exposed to the public Internet, they should be hardened and run the minimal set of services that are required to execute their tasks.
Although we can’t expect any service to be bug-free, we can limit how much an attacker can exploit your infrastructure if they do exploit your service. That’s why we recommend isolating the services that accept inbound requests and running them on their own dedicated servers.
You can accomplish this by splitting up incoming requests into two parts: load balancing and Transport Layer Security (TLS) termination of incoming requests, and the handling of the request by your service itself. Both should be, at the very minimum, their own processes, if not run on different servers, with the load balancing and termination of TLS on the Untrusted/Trusted boundary. This section will focus on load balancing and termination (application hardening was covered in a previous section).
When you separate load balancing and TLS termination from your application, you are limiting the possibility of a bug in either your load balancer or TLS software from escalating into exploitation of your entire application, which will typically have sensitive information loaded in memory. It also gives you a single point of maintenance (and failure) to patch when a vulnerability is found and you need to upgrade your TLS library, which is becoming an increasingly common task.
For example, let’s say an attacker has a Remote Code Execution (RCE) bug like GHOST or an arbitrary memory read bug (like Heartbleed) in your HTTP server or TLS software. If your HTTP server, TLS termination, and application logic are all within the same process, a bug in any one of them gives the attacker access to sensitive information within the other parts. For example, a bug in OpenSSL can give an attacker access to sensitive keys that your application has loaded in memory. Conversely, a bug in your application can potentially give an attacker access to your SSL certificates. However, if you separate these parts out, if one is exploited, the other is not, and you lose only some sensitive data.
For load balancing, common choices are NGINX, HAProxy, and Apache. TLS termination is typically done with OpenSSL; however, alternatives like LibreSSL and Mozilla NSS exist. Another alternative is to use something like vulcand, which acts as a load balancer and uses the Go TLS library for TLS termination.
You can strengthen your service endpoint by restricting access to your servers from the public Internet and forcing all authentication to go through a jump host. This restriction is typically achieved by not directly exposing your infrastructure to the public internet, and instead building some kind of internal network that can be accessed only through the bastion host.
There are many ways to build an internal network, and your approach will largely depend on how your infrastructure is configured by your service provider and your preferences.
For example, say you host your servers on Amazon Web Services (AWS). Then you can start with a Virtual Private Cloud (VPC) with a single public subnet. Your servers will be isolated from other servers on AWS and will reside within their own 10.0.0.0/16 CIDR block. However, they will still have unfettered access to the Internet. To restrict access from the Internet, create Security Groups that limit both the ports that are open and the servers that can access those ports. For example, you would configure your worker hosts to accept connections on ports 22 and 80, but only from your jump host and load balancer respectively. Your jump server, however, would accept connections on port 22 from any server on the public Internet.
If your service provider does not provide these tools, you can accomplish the same things as long as they support some kind of private network, either shared or dedicated, that allows you to isolate public and private traffic. This feature is typically offered by most vendors: as mentioned before, Amazon calls it VPC, Rackspace calls it ServiceNet, and Digital Ocean calls it Private Networking. All offer essentially the same ability: when you build your virtual server, you can bind it to the public interface, the private interface, or both. If your service provider does not have this ability at build time, you can enable and disable these interfaces yourself in the /etc/network/interfaces file on a Debian-based system and in the /etc/sysconfig/network-scripts/ifcfg* files on a Red Hat-based system.
Once you have servers that have public and private interfaces, use iptables to restrict inbound traffic on publicly accessible interfaces to the servers that either act as jump hosts or run the publicly accessible API. On the servers that handle internal services, like your application and database servers, disallow any inbound traffic on public interfaces and restrict inbound traffic on the private interfaces to the trusted set of servers.
Once you have accomplished this, the only way for an attacker to exploit your infrastructure from the public Internet is to enter from your hardened bastion host or exploit your API in some manner.
Finally, to access these servers, don’t use ssh-agent forwarding; instead, use ProxyCommand. Although ssh-agent has its purposes, forwarding it through the bastion is not good for this particular use case. If you used it, anyone who has a local privilege escalation exploit for your bastion server could access any server on your infrastructure by impersonating anyone whose keys are currently loaded into memory by ssh-agent. By contrast, with ProxyCommand, your keys are not exposed on the bastion for someone to steal, and your private key lives exclusively on your local workstation; only your public key is copied over to each server you need access to.
To use ProxyCommand, copy your public key to ~/.ssh/authorized_keys on all the servers you need access to. Then, on your workstation, update your ~/.ssh/config file with the following information:
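A minimal sketch of such a configuration is shown here, using the host names from this section. It assumes an OpenSSH client that supports the `-W` option, and that the internal host names resolve from the jump host.

```
# ~/.ssh/config (fragment)
# Reach the internal servers by tunnelling through the jump host.
Host server1.example.com server2.example.com
    ProxyCommand ssh -W %h:%p jump.example.com
```

Here `%h` and `%p` expand to the target host and port, so the connection to the jump host only relays traffic and authentication to the internal server still happens end to end from the workstation.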
This configuration allows you to access server1.example.com and server2.example.com from workstation.example.com via SSH on their private interfaces by “jumping” through jump.example.com, which has access to both public and private interfaces. All you have to do to connect is type ssh server1.example.com or ssh server2.example.com.
Every security measure can and will be circumvented at some point in time. Because no practical security measures can provide ironclad security guarantees, it’s important to have strong monitoring and logging facilities to help you understand where and how your systems were compromised. The better you understand how and what is running on your systems, the better you will be at detecting anomalous behavior. The same way a bank installs security cameras even though it secures its vaults, having good monitoring tools is critical to catch those clever hackers that have defeated your security measures.
Monitoring and logging take two forms. The first is live monitoring, which enables you to see what is happening on your system at any given moment. This encompasses everything from the network sockets that are open to the processes that are currently running. The second is the log data of actions that have already been taken. This covers everything from application logic logging to system logs.
In this section, we will cover looking at system logs on individual servers themselves. In the intermediate version of this guide, we will talk about log aggregation and alerting.
The following tools come bundled with most UNIX based operating systems. They are useful to use when you suspect that a security incident is occurring, but are critical to use beforehand as well so that you can understand their normal output.
Following are some commands and their expected output under normal operating conditions. These examples illustrate what the output should look like on y
- who – Shows you who is logged in at the moment.
- last -a – Shows you a list of the last few logged-in users. Prints the username, the login time, and the IP address that each user logged in from.
- netstat -plntu – Shows process names and the ports they are listening for connections on.
- netstat -ap – Shows all connections, including established outbound connections, along with the processes that own them.
- find / -mtime -1 -ls | head -n 20 – Lists the first 20 files modified within the last 24 hours.
- faillog -a – Shows a summary of login failures. faillog can also be used to limit the maximum number of failed logins that a user is allowed.
- tcpdump -i eth1 -s 0 -A tcp port http – Dumps all HTTP traffic on interface eth1. This is useful if you have found something suspicious using netstat and want to dig in deeper. This guide cannot give you all the ins and outs of tcpdump, but there are a variety of resources on the Internet to help you understand it.
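One way to learn what normal output looks like is to save it periodically and compare later runs against it. The following is a sketch only: the `snapshot` helper and the `BASELINE_DIR` variable are hypothetical names, and it records just three of the commands listed above.

```shell
# Sketch: save the output of a few monitoring commands as a baseline.
# BASELINE_DIR defaults to a scratch directory; point it somewhere durable in practice.
BASELINE_DIR="${BASELINE_DIR:-$(mktemp -d)}"

snapshot() {
    # $1 = label for the output file; remaining arguments = command to run.
    label="$1"
    shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" > "$BASELINE_DIR/$label.txt" 2>&1 || true
    else
        echo "$1 is not installed" > "$BASELINE_DIR/$label.txt"
    fi
}

snapshot who who
snapshot last last -a
snapshot listening netstat -plntu

ls "$BASELINE_DIR"
```

Running the script again into a second directory and comparing the two with `diff -r` highlights new listening ports or unexpected logins. Note that netstat shows process names for other users' sockets only when run as root.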
Following are a few general rules for how to handle application logging and pointers to important system logs.
Aggregate your application logs to a central location, be that a single log file or directory. The common approach is to use syslog, which also makes shipping the logs to a central logging server easier in the future.
Keep your logs as long as disk space allows. Keeping logs for up to 90 days on disk is not unreasonable if you have the space for it.
As with live monitoring, it’s a good idea to watch the following system log files on a regular basis to develop a good baseline of expected output. Having a baseline makes spotting suspicious behavior that much easier in the future. Following is a partial list of interesting system logs to watch:
- /var/log/auth.log – System authentication logs.
- /var/log/syslog – If you are not sending logs to a particular syslog facility, they will be located here.
- /var/log/messages – General system log messages.
- ~/.bash_history – List of Bash commands that were executed by the user. This log can be easily manipulated or wiped by sophisticated attackers.
- /var/run/utmp and /var/log/wtmp – These logs contain the current logged-in users and the history of all logged-in users. Use last -f to view these files.

- Tools: who, last, lsof, netstat, faillog, and find.
- Logs: /var/log/auth.log, /var/log/syslog, and /var/log/messages.

Cryptography is a complex topic that should be covered in its own right. Even slight oversights or mistakes can lead to the complete compromise of the security of a product. This is why the “don’t roll your own cryptography” mantra is so often repeated. Two good sources to read before you start working with cryptography are Crypto101, written by Laurens Van Houtven (lvh), and the Matasano crypto challenges.
That being said, keeping your infrastructure safe requires some use of cryptography, and there are common patterns that can be safely used. This section covers one of those patterns: how to store sensitive data in source code (or on disk).
When you store credentials, either in source control or on disk, don’t store them unencrypted. You might think that your passwords are secure if you use a GitHub private repository, but you don’t want to rely on GitHub to keep your entire infrastructure safe from attackers. If you encrypt your credentials, you can maintain your security even if GitHub is compromised.
When looking for a tool or library to encrypt small amounts of data, consider the following recommendations:
- Use /dev/urandom to obtain random numbers used in keys, salts, and nonces.

You can build an encryption tool yourself; however, as mentioned, this can be tricky and is not recommended. If you insist on building it, use a library like NaCl or cryptography.io that will at least get the cryptography right for you. It’s even better to use a “recipe” that someone has already built, like lemma or Fernet, both of which expose a simple API that you can use to encrypt and decrypt data in a safe manner.

- Use /dev/urandom to generate random material.

Although backups may not seem to be in the same category as the other topics discussed in this guide, they matter for infrastructure security just as much. Backups serve two primary purposes: restoring in case of some non-malicious hardware failure and restoring in case an attacker compromises your infrastructure. Remember, in the case of a compromise, it’s better to wipe your server and create a fresh one than to try to remove malware, which can range from difficult to impossible for a novice. This is why backups are critical for bringing your infrastructure back up in a trusted state after a compromise.
The following approaches are a good general strategy to follow when working with backups: