Reverse proxies are a useful tool in any system administrator’s toolkit. They have a lot of uses, including load balancing, caching, and protection from DDoS attacks.
What Are Reverse Proxies?
A regular proxy, called a forward proxy, is a server that a user’s connection is routed through. In many ways, it’s like a simple VPN that sits in front of your internet connection. VPNs are a common example, but forward proxies also include things like school firewalls, which may block access to certain content.
A reverse proxy works a little differently. It’s a backend tool used by system administrators. Instead of users connecting directly to the server that hosts a website’s content, a reverse proxy like NGINX can sit in the middle. When it receives a request from a user, it forwards, or “proxies,” that request to the final server. This server is called the “origin server” since it’s the one that actually responds to requests.
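To make the request flow concrete, here is a minimal sketch of a reverse proxy built only on Python’s standard library. It assumes an origin server is reachable at a URL you pass in. The `make_proxy_handler` and `make_proxy_server` names are hypothetical and not part of any library. The sketch handles only GET requests. It also skips much of what a production proxy like NGINX handles: streaming large bodies, supporting other HTTP methods, and passing the origin’s redirects through instead of letting urllib follow them.

```python
import http.server
import urllib.error
import urllib.request

# Per-connection headers that should not be copied blindly between hops.
# Host is dropped so urllib sets it to the origin's address instead.
SKIP_HEADERS = {"connection", "keep-alive", "transfer-encoding",
                "content-length", "host", "server", "date"}


def make_proxy_handler(origin: str):
    """Build a request handler class that forwards GET requests to `origin`."""

    class ProxyHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            # Re-issue the client's request against the origin server.
            upstream = urllib.request.Request(origin + self.path, method="GET")
            for name, value in self.headers.items():
                if name.lower() not in SKIP_HEADERS:
                    upstream.add_header(name, value)
            try:
                with urllib.request.urlopen(upstream, timeout=5) as resp:
                    status, headers, body = resp.status, resp.getheaders(), resp.read()
            except urllib.error.HTTPError as err:
                # Relay origin errors (404, 500, ...) to the client unchanged.
                status, headers, body = err.code, list(err.headers.items()), err.read()
            except urllib.error.URLError:
                # The origin could not be reached at all.
                status, headers, body = 502, [], b"Bad Gateway"
            # Send the origin's response back as if it were our own.
            self.send_response(status)
            for name, value in headers:
                if name.lower() not in SKIP_HEADERS:
                    self.send_header(name, value)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return ProxyHandler


def make_proxy_server(origin: str, port: int = 8000):
    """Listen on 127.0.0.1:`port` and proxy every GET request to `origin`."""
    # ThreadingHTTPServer handles each client connection in its own thread.
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), make_proxy_handler(origin))
```

For example, `make_proxy_server("http://127.0.0.1:8080", port=8000).serve_forever()` would relay requests arriving on port 8000 to an origin on port 8080. Both ports are arbitrary placeholders.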
While a user will probably know if they’re being routed through a forward proxy like a VPN or firewall, a reverse proxy is invisible to them. As far as the user knows, they’re just connecting to a website. Everything behind the reverse proxy is hidden, which brings several benefits of its own.
This effect also works in reverse, though. The origin server doesn’t have a direct connection to the user and will only see requests coming from the reverse proxy’s IP. This can be a problem, but most reverse proxies, including NGINX, can be configured to add headers like X-Forwarded-For to the request. These headers tell the origin server the client’s actual IP address.
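The header logic itself is simple. The sketch below uses a hypothetical `add_forwarded_for` helper and works in the same spirit as NGINX’s `$proxy_add_x_forwarded_for` variable. If no earlier proxy set the header, it becomes the client’s address. Otherwise, the address is appended to the end of a comma-separated list.

```python
def add_forwarded_for(headers: dict[str, str], client_ip: str) -> dict[str, str]:
    """Return a copy of `headers` with `client_ip` appended to X-Forwarded-For."""
    updated = dict(headers)
    # Header names are case-insensitive, so find any existing spelling of it.
    existing = next((k for k in updated if k.lower() == "x-forwarded-for"), None)
    if existing is None:
        # First proxy in the chain: the header is just the client's address.
        updated["X-Forwarded-For"] = client_ip
    else:
        # Earlier proxies already recorded addresses: add ours to the end.
        updated[existing] = f"{updated[existing]}, {client_ip}"
    return updated
```

For example, `add_forwarded_for({"X-Forwarded-For": "198.51.100.2"}, "203.0.113.7")` returns `{"X-Forwarded-For": "198.51.100.2, 203.0.113.7"}`. Clients can send their own `X-Forwarded-For` header, so an origin server should only trust the entries added by proxies it controls.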
What Are Reverse Proxies Used For?
Reverse proxies are pretty simple in concept but prove to be a surprisingly useful tool with many unexpected use cases.
One of the main benefits of reverse proxies is how lightweight they can be. Since they just forward requests, they don’t have to do much processing of their own. Origin servers, by contrast, often have to do heavier work like querying a database.
This means the bottleneck is often the origin server, but with a reverse proxy in front of it, you can easily run multiple origin servers. For example, the proxy could send 50% of requests to one server and 50% to another, roughly doubling the capacity of the website if both servers are equally powerful. Services like HAProxy are designed to handle this well.
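With two servers, the 50/50 split described above is what a round-robin strategy produces: each request goes to the next server in the list. The following is a minimal sketch. The `RoundRobinBalancer` class and the server names are hypothetical. HAProxy and NGINX both offer round robin among several balancing algorithms, with extras like weights and health checks.

```python
import itertools


class RoundRobinBalancer:
    """Hand out origin servers in turn so each gets an equal share of requests."""

    def __init__(self, origins: list[str]):
        if not origins:
            raise ValueError("need at least one origin server")
        # cycle() repeats the list forever: a, b, a, b, ...
        self._cycle = itertools.cycle(origins)

    def pick(self) -> str:
        """Return the origin server that should receive the next request."""
        return next(self._cycle)
```

With two origins, such as the made-up `"http://10.0.1.10:8080"` and `"http://10.0.1.11:8080"`, successive calls to `pick()` alternate between them, so each receives half of the traffic.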
This use case, known as load balancing, is very common. Most cloud providers, like Amazon Web Services (AWS), offer it as a service, saving you the trouble of setting it up yourself. With cloud automation, you can even scale the number of origin servers up or down automatically in response to traffic, a feature called “auto-scaling.”
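The decision an auto-scaling system makes can be illustrated with a deliberately simplified rule. Divide current traffic by how much one origin server can handle, round up, and keep the result between a minimum and a maximum. The `desired_server_count` function and its capacity numbers are hypothetical. Cloud providers’ actual scaling policies use their own metrics, thresholds, and cooldown periods.

```python
import math


def desired_server_count(requests_per_second: float,
                         capacity_per_server: float,
                         min_servers: int = 1,
                         max_servers: int = 10) -> int:
    """Return how many origin servers are needed for the current traffic."""
    if capacity_per_server <= 0:
        raise ValueError("capacity_per_server must be positive")
    # Enough servers that none exceeds its assumed capacity.
    needed = math.ceil(requests_per_second / capacity_per_server)
    # Keep a floor for redundancy and a ceiling to cap cost.
    return max(min_servers, min(max_servers, needed))
```

Assuming each server handles 100 requests per second, 450 requests per second yields 5 servers. Idle traffic still keeps 1 server running.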
Load balancers like AWS’s Elastic Load Balancer can be set up to automatically reconfigure themselves when your origin servers go up and down, all made possible by a reverse proxy under the hood.
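Reacting to origin servers going up and down mostly comes down to tracking which ones are healthy and skipping the rest. This hypothetical `HealthAwarePool` sketch assumes that a separate health check, not shown, calls `mark_down` and `mark_up`. It illustrates the idea only and is not how Elastic Load Balancing is implemented.

```python
class HealthAwarePool:
    """Round-robin over origin servers, skipping any currently marked as down."""

    def __init__(self, origins: list[str]):
        self._origins = list(origins)
        self._healthy = set(self._origins)
        self._next = 0  # index of the next server to try

    def mark_down(self, origin: str) -> None:
        # A failed health check takes the server out of rotation.
        self._healthy.discard(origin)

    def mark_up(self, origin: str) -> None:
        # A recovered (known) server rejoins the rotation.
        if origin in self._origins:
            self._healthy.add(origin)

    def pick(self) -> str:
        # Scan at most one full lap of the list for a healthy server.
        for _ in range(len(self._origins)):
            origin = self._origins[self._next]
            self._next = (self._next + 1) % len(self._origins)
            if origin in self._healthy:
                return origin
        raise RuntimeError("no healthy origin servers")
```

When a server is marked down, traffic flows to the remaining servers. When it is marked up again, it rejoins the rotation without any manual reconfiguration.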
Since a reverse proxy is often much faster at responding than the origin server, a technique called caching is commonly used to speed up requests on common routes. With caching, the reverse proxy stores the page data and only requests a fresh copy from the origin server every few seconds or minutes. This dramatically reduces the strain on the origin server.
For example, this article you’re reading now was served by WordPress, which needs to talk to a SQL database to fetch the article content and metadata. Doing that for every page refresh is wasteful considering the page doesn’t really change. So, this route can be cached, and the reverse proxy will just send back the last response to the next user, rather than bothering WordPress again.
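Here is a minimal sketch of that idea. A hypothetical `TTLCache` keeps each route’s response for a fixed number of seconds and only calls the origin when the entry is missing or stale. It ignores much of what real proxy caches handle, such as `Cache-Control` headers, cookies, and pages that differ per user. The `clock` parameter exists only so that expiry can be tested without waiting.

```python
import time
from typing import Callable


class TTLCache:
    """Keep each route's response for `ttl` seconds before asking the origin again."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self._clock = clock
        self._entries: dict[str, tuple[float, bytes]] = {}  # path -> (stored_at, body)

    def get(self, path: str, fetch_from_origin: Callable[[str], bytes]) -> bytes:
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # cache hit: the origin is not contacted
        body = fetch_from_origin(path)  # cache miss or stale entry
        self._entries[path] = (now, body)
        return body
```

With `ttl=60`, every visitor within the same minute gets the stored copy, so the origin (WordPress, in the example above) is asked for a given route roughly once per minute at most.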
A dedicated network of reverse proxies that cache your content is called a Content Delivery Network, or CDN. CDNs like Cloudflare or Fastly are very commonly used by large websites to speed up global delivery. The servers around the world that cache the content are called “edge nodes,” and having a lot of them can make your website very snappy.
Network Protection & Privacy
Since the user doesn’t know what’s behind the reverse proxy, they won’t be able to easily attack your origin servers directly. In fact, reverse proxies are commonly used with origin servers in private subnets, meaning the origin servers accept no incoming connections from the outside internet at all.
This keeps your network configuration private, and while security through obscurity is never foolproof, it’s better than leaving it open to attack.
The trust this creates, where backend servers only need to accept traffic from known machines on the private network, is also useful when planning out your network. For example, an API server that talks to a database is similar to a reverse proxy. The database knows it can trust the API server in the private subnet, and the API server acts as the firewall for the database, only allowing the right connections through.
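In practice, this rule usually lives in firewall rules or cloud security groups rather than in application code, but the logic is easy to sketch with Python’s `ipaddress` module. The subnet `10.0.2.0/24` and the `database_accepts` function are hypothetical. The database side accepts only connections whose source address falls inside the API servers’ private subnet.

```python
import ipaddress

# Hypothetical private subnet that the API servers live in.
API_SUBNET = ipaddress.ip_network("10.0.2.0/24")


def database_accepts(client_ip: str, allowed=API_SUBNET) -> bool:
    """Accept a connection only if its source address is in the trusted subnet."""
    # ip_address() parses IPv4 or IPv6; an IPv6 address is never "in" an IPv4 network.
    return ipaddress.ip_address(client_ip) in allowed
```

Here, `database_accepts("10.0.2.15")` is `True`, while an address from another subnet or the public internet is rejected.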
One of the benefits of reverse proxies like NGINX is how highly configurable they are. Often, they’re useful to have in front of other services just to control how users access those services.
For example, NGINX can rate limit requests to certain routes, which prevents abusers from making thousands of requests to origin servers from a single IP. This won’t stop a DDoS attack, which comes from many IPs at once, but it’s good to have.
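Per-IP rate limiting can be sketched with a token bucket. Each client IP gets a bucket that refills at a fixed rate, and a request is let through only if a token is available. This hypothetical `PerIPRateLimiter` is not NGINX’s code. NGINX’s `limit_req` module describes its own method as a “leaky bucket,” which differs in details such as queuing excess requests instead of rejecting them immediately. The `clock` parameter exists only so the behavior can be tested with a fake clock.

```python
import time
from typing import Callable


class PerIPRateLimiter:
    """Token bucket per client IP: `rate` requests/second, bursts up to `burst`."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.burst = burst
        self._clock = clock
        self._buckets: dict[str, tuple[float, float]] = {}  # ip -> (tokens, last_seen)

    def allow(self, ip: str) -> bool:
        now = self._clock()
        # A new IP starts with a full bucket.
        tokens, last = self._buckets.get(ip, (float(self.burst), now))
        # Refill tokens for the time elapsed since this IP's last request.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        if tokens >= 1:
            self._buckets[ip] = (tokens - 1, now)
            return True
        self._buckets[ip] = (tokens, now)
        return False
```

With `rate=1` and `burst=2`, a single IP can make two requests back to back, is refused on the third, and regains one request per second after that. Other IPs are unaffected. Unlike this sketch, which keeps a bucket for every IP it has ever seen, NGINX stores its state in a fixed-size shared memory zone.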
NGINX is also able to forward traffic for multiple domain names with configurable “server” blocks. For example, it could send requests for example.com to your origin server, but send api.example.com to your special API server, or files.example.com to your file storage, and so on. Each server can have its own configuration and rules.
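Conceptually, choosing a server block comes down to looking up the request’s `Host` header. The sketch below uses a hypothetical `ROUTES` table with made-up internal addresses. Real NGINX `server_name` matching also supports wildcards and regular expressions, and it falls back to a default server when nothing matches. The hypothetical `pick_upstream` function simply returns `None` in that case.

```python
from typing import Optional

# Hypothetical upstreams, one per domain, like separate NGINX server blocks.
ROUTES = {
    "example.com": "http://10.0.1.10:8080",
    "api.example.com": "http://10.0.1.20:8080",
    "files.example.com": "http://10.0.1.30:9000",
}


def pick_upstream(host_header: str, routes: dict[str, str] = ROUTES) -> Optional[str]:
    """Return the upstream for the request's Host header, or None if unknown."""
    # Host is case-insensitive and may carry a port ("example.com:443").
    # Splitting on ":" ignores IPv6 literals, which is fine for a sketch.
    host = host_header.split(":", 1)[0].strip().lower()
    return routes.get(host)
```

A request with `Host: api.example.com` would go to the API upstream, and each entry could carry its own settings, mirroring how each server block has its own rules.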
NGINX is also able to add extra features on top of existing origin servers, like centralized HTTPS certificates and header configuration.
Sometimes, it’s useful just to have NGINX on the same machine as another local service, simply to serve content from that service. For example, ASP.NET Core web APIs use a built-in web server called Kestrel, which is good at responding to requests, but not much else. It’s very common to run Kestrel on a private port and use NGINX as a configurable reverse proxy.
Logging is another simple benefit: having most of your traffic go through one service makes it easy to check logs. NGINX’s access log contains lots of useful info about your traffic, and while it doesn’t match the features of a service like Google Analytics, it’s great info to have.
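As a sketch of what you can pull out of that log, the snippet below parses lines in NGINX’s predefined `combined` log format, which `access_log` uses when no format is specified. It then counts requests per status code and per client IP. The `summarize` function is hypothetical, and a custom `log_format` would need a different pattern.

```python
import re
from collections import Counter

# Matches NGINX's predefined "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status
# $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def summarize(lines):
    """Count requests per HTTP status code and per client IP."""
    statuses, clients = Counter(), Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # skip lines written in some other format
        statuses[match["status"]] += 1
        clients[match["ip"]] += 1
    return statuses, clients
```

Because it just iterates over lines, you can pass it an open file, as in `summarize(open("access.log"))`, where the path is a placeholder for wherever your access log lives.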