SCALABILITY & PERFORMANCE
What is scalability? Scalability is an overloaded term that technical marketing has perverted to confuse potential customers. Almost every system today is advertised as scalable, although this is often not true. Scalability is the ability of a system to deliver better performance when the size of the system is increased with more resources. But what does better performance mean? Performance is also an overloaded term whose meaning depends on the context, and it is likewise misused in technical marketing. In databases, the two most important performance metrics are throughput and response time. It is important to define them in order to understand how they can be improved.
THROUGHPUT
Throughput is the number of operations per time unit a system can perform (see Figure 1). In the case of a database, typical throughput measures are transactions/second, inserts/second, and queries/second. It is important to understand that a throughput metric is given for a particular workload, software system, and underlying hardware. Changing the workload might have a dramatic effect on the throughput. For instance, a workload might be limited by one resource, say CPU, and when the workload changes, it might become limited by another resource, say IO bandwidth. Such a workload change can slow down the database throughput by more than an order of magnitude.
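To make the metric concrete, here is a minimal sketch in Python of how a throughput measurement could look. It is an illustration, not a benchmark harness: run_transaction is a hypothetical stand-in for whatever operation the workload issues (a transaction, an insert, a query).

```python
import time

def measure_throughput(run_transaction, duration_secs=60.0):
    """Drive the workload for a fixed interval and report operations/second."""
    completed = 0
    start = time.monotonic()
    deadline = start + duration_secs
    while time.monotonic() < deadline:
        run_transaction()  # hypothetical: issue one workload operation
        completed += 1
    elapsed = time.monotonic() - start
    return completed / elapsed  # e.g., transactions/second for this workload
```

The number reported is only meaningful for this particular workload, software system, and hardware; rerunning with a different transaction mix can yield a very different figure.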
RESPONSE TIME
Response time is the time from submitting an operation until receiving the answer (see Figure 2). It is important to define under which conditions the response time is measured. For operational databases, response time only makes sense when measured while injecting a particular workload and the system is in steady state, delivering a stable throughput (see our blog post on measuring scalability and performance). For instance, one can measure the average response time of the transactions. If the workload contains different kinds of transactions, which is most common, averaging the response time per kind of transaction is more informative than a single global average. What is more, the average alone is not sufficient. What you actually need to know is the distribution of the response time. The average plus the percentiles (90%, 95% and 99%) provide good insight into how the database behaves. However, you also need both throughput and response time for any given workload to understand how response time evolves with increasing throughput.
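As a sketch of how such a summary could be computed, the following Python snippet derives the average and the 90th, 95th and 99th percentiles from a list of per-operation response times collected in steady state (the nearest-rank percentile used here is one simple convention among several):

```python
import math
import statistics

def response_time_stats(latencies_ms):
    """Summarize steady-state response times (in milliseconds)."""
    ordered = sorted(latencies_ms)

    def percentile(p):
        # Nearest-rank method: smallest sample covering p% of observations.
        rank = max(1, math.ceil(len(ordered) * p / 100))
        return ordered[rank - 1]

    return {
        "avg": statistics.mean(ordered),
        "p90": percentile(90),
        "p95": percentile(95),
        "p99": percentile(99),
    }
```

A workload whose average looks healthy can still hide a long tail of slow operations that only the p99 makes visible, which is why the percentiles matter.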
VERTICAL VS. HORIZONTAL SCALABILITY (SCALE UP VS. SCALE OUT)
Coming back to the definition of scalability [Özsu & Valduriez 2020], we said that it is the ability to deliver more throughput when we use more resources, but what does more resources mean? It depends on whether the system is centralized or distributed. Thus, scalability can be classified as vertical or horizontal, depending on what we mean by more resources. In a centralized system we can add more CPUs, more memory, or more storage devices to increase computational resources (see our blog post on database architectures). We say a database scales vertically when it is able to provide more throughput with a bigger computer in terms of CPUs, memory and IO devices (see Figure 1).
Now, consider a distributed database running on a set of computers that share nothing and are connected by a network (see our blog post on shared nothing), i.e., a computer cluster. In this case, we talk about horizontal scalability. A database scales horizontally when adding more nodes to the cluster yields more throughput (see Figure 2, where each column is a cluster and each box is a server within the cluster).
SCALABILITY GRAPH
Figure 3 depicts a sample horizontal scalability graph for a distributed database. The graph plots the cluster size in number of nodes on the x axis and the throughput on the y axis. One node delivers a throughput of 500 txn/sec. By increasing the cluster size, we can observe how the total throughput of the cluster increases: with 2 nodes it is almost 1,000 txn/sec, and with 9 nodes it is around 2,600 txn/sec.
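Using the approximate throughput readings from Figure 3, the scalability can be quantified as speedup and efficiency relative to the single-node throughput; a small sketch:

```python
# Approximate throughput readings from Figure 3 (txn/sec per cluster size).
readings = {1: 500, 2: 1_000, 9: 2_600}
single_node = readings[1]

for nodes, throughput in sorted(readings.items()):
    speedup = throughput / single_node  # ideal (linear) speedup equals `nodes`
    efficiency = speedup / nodes        # ideal efficiency is 100%
    print(f"{nodes} node(s): {speedup:.1f}x speedup, {efficiency:.0%} efficiency")
```

With 2 nodes the efficiency is close to 100%, i.e., almost linear scalability, while with 9 nodes the speedup is about 5.2x, an efficiency of roughly 58%: the scalability in this sample is sub-linear.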