Load balancers
Large-scale systems scale horizontally because no single server is powerful enough to handle hundreds of millions of requests per second, nor could one grow affordably at that scale. At the same time, a distributed system should appear to clients as a single endpoint.
Key takeaways
- Load balancers are servers with the single purpose of routing requests downstream to backend servers.
- They maximize system availability by distributing requests as evenly as possible.
- Load balancers are extremely simple applications and are heavily optimized for directing traffic.
- Many load-balancing algorithms are available, each with its own trade-offs and suitable situations.
Dynamic load balancing algorithms
- Least connection: Checks which servers have the fewest connections open at the time and sends traffic to those servers. This assumes all connections require roughly equal processing power (see the sketch after this list).
- Weighted least connection: Gives administrators the ability to assign different weights to each server, assuming that some servers can handle more connections than others.
- Weighted response time: Averages the response time of each server, and combines that with the number of connections each server has open to determine where to send traffic. By sending traffic to the servers with the quickest response time, the algorithm ensures faster service for users.
- Resource-based: Distributes load based on what resources each server has available at the time. Specialized software (called an "agent") running on each server measures that server's available CPU and memory, and the load balancer queries the agent before distributing traffic to that server.
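To make the least-connection strategy concrete, here is a minimal Python sketch (the server names and connection counts are invented, and a real load balancer would update these counts as connections open and close):

```python
import random

# Hypothetical snapshot of open-connection counts per backend.
connections = {"app-1": 12, "app-2": 7, "app-3": 7}

def pick_least_connection(conns):
    """Return the backend with the fewest open connections,
    breaking ties randomly."""
    fewest = min(conns.values())
    candidates = [server for server, count in conns.items() if count == fewest]
    return random.choice(candidates)

server = pick_least_connection(connections)
connections[server] += 1  # the new connection now counts against that server
```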
Static load balancing algorithms
- Round robin: Round-robin load balancing distributes traffic to a list of servers in rotation using the Domain Name System (DNS). An authoritative nameserver keeps a list of different A records for a domain and returns a different one in response to each DNS query.
- Weighted round robin: Allows an administrator to assign different weights to each server. Servers deemed able to handle more traffic will receive slightly more. Weighting can be configured within DNS records.
- IP hash: Combines the source and destination IP addresses of incoming traffic and uses a mathematical function to convert them into a hash. Based on the hash, the connection is assigned to a specific server (see the sketch below).
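As a rough illustration of IP hash, here is a minimal Python sketch (the backend names are placeholders, and production load balancers use their own hash functions). One design note: with plain modulo hashing like this, adding or removing a server remaps most clients, which is why consistent hashing is often preferred in practice:

```python
import hashlib

backends = ["app-1", "app-2", "app-3"]  # hypothetical server pool

def pick_by_ip_hash(src_ip: str, dst_ip: str) -> str:
    """Hash the source/destination IP pair onto a backend, so the
    same client-server pair always lands on the same server."""
    digest = hashlib.sha256(f"{src_ip}->{dst_ip}".encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]

print(pick_by_ip_hash("203.0.113.7", "198.51.100.1"))  # deterministic choice
```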
L4 vs L7 load balancer
L7
The most common type of load balancer you’ll likely encounter in practice, and the kind covered above, is the application load balancer. These are sometimes referred to as L7 load balancers because they operate on the 7th layer (known as the “application” layer) of the OSI model.
What does it mean to operate on the application layer? These load balancers receive HTTP requests from clients, parse the contents of each request, and, based on those contents (such as the path or headers) and their load-balancing algorithm, forward the request to an appropriate application server.
(Those of you familiar with HTTPS may be wondering how this could possibly work if requests are encrypted. To make it possible, L7 load balancers decrypt requests in a process known as TLS termination, also called SSL termination.)
Some examples of using request content in practice include:
- Sending all requests from one user's session to the same app server, so that session-specific information (like which user is being served) stays on one machine. This is done using headers (specifically, cookies) and is known as sticky sessions; Amazon's Application Load Balancer supports this feature.
- Forwarding HTTP requests based on the hostname or path (e.g., sending traffic for subdomain.mysite.com to one set of app servers while sending subdomain2.mysite.com to another set). Amazon's Application Load Balancer also provides this functionality out of the box.
Of course, if desired, they can also be configured to ignore the contents of requests and simply round-robin between servers. The bigger point is that they have access to this information and can use it to make routing decisions.
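To tie the L7 examples above together, here is a minimal sketch of host-based routing in Python (the hostnames and backend names are invented, and real products such as Amazon's Application Load Balancer express this as listener rules rather than code):

```python
import itertools

# Hypothetical L7 routing table: hostname -> pool of backend servers.
# A real application load balancer can also match on paths and headers.
routes = {
    "subdomain.mysite.com": ["app-a-1", "app-a-2"],
    "subdomain2.mysite.com": ["app-b-1", "app-b-2"],
}

# One round-robin cursor per pool, so traffic rotates within each pool.
cursors = {host: itertools.cycle(pool) for host, pool in routes.items()}

def route(host: str) -> str:
    """Pick a backend from the pool matching the request's Host header.
    Only possible because the balancer has parsed the (decrypted) request."""
    if host in cursors:
        return next(cursors[host])
    return "fallback-1"  # hypothetical default backend

print(route("subdomain.mysite.com"))   # app-a-1
print(route("subdomain.mysite.com"))   # app-a-2
print(route("subdomain2.mysite.com"))  # app-b-1
```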
L4
An alternative to L7 load balancers is the network load balancer, or L4 load balancer. As you can probably deduce, L4 load balancers operate on the 4th layer of the OSI model. On the web, this fourth (transport) layer is typically TCP, so network load balancers balance traffic at the TCP level.
Because they operate on a lower layer of the OSI model, these load balancers have no access to HTTP information when making routing decisions! This is a key differentiator of network load balancers: unlike application load balancers, they have no information about HTTP paths or headers when selecting application servers.
What information do they have access to? Because they operate on TCP, they will be much more limited. In particular, they will have access to the client’s IP address and port (and, of course, the IP address and port the client is trying to reach). That’s it!
Because of the limited information available to network load balancers, the routing they can perform is generally simpler. The tradeoff is that this simplicity makes network load balancers faster and cheaper than application load balancers.
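To make the contrast concrete, here is a minimal sketch of the L4 idea in Python (the backend addresses are invented): the balancer accepts a TCP connection, chooses a backend using only the client's address and port, and shuttles raw bytes in both directions without ever parsing them:

```python
import socket
import threading

BACKENDS = [("10.0.0.1", 8080), ("10.0.0.2", 8080)]  # hypothetical pool

def pick_backend(client_addr):
    # An L4 balancer sees only IP addresses and ports, so route on the
    # client address alone. (Python's hash() is randomized per process;
    # a real balancer would use a stable hash.)
    return BACKENDS[hash(client_addr) % len(BACKENDS)]

def pipe(src, dst):
    # Copy raw bytes one way until the connection closes. The payload
    # is never inspected: no paths, no headers.
    try:
        while (data := src.recv(4096)):
            dst.sendall(data)
    except OSError:
        pass
    finally:
        src.close()
        dst.close()

def serve(listen_port=9000):
    listener = socket.create_server(("0.0.0.0", listen_port))
    while True:
        client, addr = listener.accept()
        backend = socket.create_connection(pick_backend(addr))
        # Shuttle bytes in both directions on separate threads.
        threading.Thread(target=pipe, args=(client, backend), daemon=True).start()
        threading.Thread(target=pipe, args=(backend, client), daemon=True).start()
```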
OSI model
The hierarchy of protocols is organized in something called the OSI (Open Systems Interconnection) model.
https://www.youtube.com/watch?v=vv4y_uOneC0
Shortcomings of the OSI model
One of the biggest shortcomings of the OSI model is that, in practice, many protocols don’t fit neatly into its boxes. For this reason, the OSI model is generally regarded as an academic, rather than practical, model of computer interconnectivity.
The most notable criticisms you’re likely to encounter are:
- Many protocols span multiple layers of the model or “collapse” it into fewer layers. Bluetooth is a particularly notable example.
- The model itself is verbose and has too many layers. The TCP/IP model is intended as a more concise alternative, focused specifically on the TCP/IP suite.
- Placing HTTP on the application layer fails to capture protocols that are built on top of HTTP (e.g., gRPC).
At the end of the day, the key takeaway of the OSI model is that protocols are built on top of other protocols, and this is true no matter where you look. It is a useful abstraction to be familiar with because of how often it shows up, but be aware of its many exceptions.