Words matter, especially when discussing architecture with business analysts and stakeholders. As Martin Fowler astutely points out, when it comes to software/system performance, several terms are used inconsistently.
Response time: From the user’s perspective, this is the amount of time it takes for the system to process a request. The request may be a UI action or an API call.
Responsiveness: This is the amount of time the system takes to acknowledge a request. Let’s stop here and understand the difference between response time and responsiveness. Last year I was involved in a fairly large project with a complicated reporting process that used very large data sets and several rules. This example jumps out at me because the difference between the response time and the responsiveness of the system was significant. We adopted a ‘fire and forget’ calling scheme on the client end: the client submitted a request and moved on, and the server pushed status updates back at regular intervals. In this architecture the responsiveness of the system was rather good; the server was always able to respond and let the client know its status. The total response time, however, was significant. A UI example: providing a progress bar during a file copy improves the responsiveness of the system, but not the response time.
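The fire-and-forget scheme can be sketched in miniature. This is a hypothetical example, not the project’s actual code: a worker thread does the long-running job and pushes progress updates, while the caller returns immediately and stays responsive.

```python
import threading
import time

def long_report(status):
    """Simulate a long-running report job that posts periodic status updates."""
    for pct in (25, 50, 75, 100):
        time.sleep(0.1)      # stand-in for heavy computation
        status.append(pct)   # push a progress update back to the "client"

status_updates = []
worker = threading.Thread(target=long_report, args=(status_updates,))
worker.start()               # fire and forget: the caller returns immediately
# ... the client stays responsive here, rendering status_updates as they arrive ...
worker.join()                # only needed in this demo, to wait for completion
print(status_updates)        # [25, 50, 75, 100]
```

The acknowledgement (the `start()` returning) is near-instant, which is the responsiveness; the full run of `long_report` is the response time.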
Latency: The minimum time required to get any response, even if the work to be done is nonexistent. This becomes an issue when clients and servers are on physically separate machines; when everything is on the same machine and the code is running correctly, latency should be insignificant or nonexistent. A recommendation for systems running on separate machines: minimize remote calls.
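Some back-of-the-envelope arithmetic shows why minimizing remote calls matters. The 50 ms round-trip figure below is an assumption for illustration:

```python
ROUND_TRIP_S = 0.05   # assumed 50 ms network latency per remote call
n_items = 100

# Chatty design: one remote call per item.
chatty_latency = n_items * ROUND_TRIP_S    # 5.0 seconds of pure latency

# Batched design: one remote call fetching all items at once.
batched_latency = 1 * ROUND_TRIP_S         # 0.05 seconds of pure latency

print(chatty_latency, batched_latency)     # 5.0 0.05
```

No amount of faster server code removes the 5 seconds in the chatty design; only fewer round trips do.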
Throughput: How much stuff a system can do in a given amount of time. Throughput is measured in units such as bytes per second or transactions per second; the unit of measurement is contextual. The important thing is to talk in terms of concrete measurements when talking about throughput.
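A throughput measurement is just work divided by elapsed time. The figures below are assumed for illustration:

```python
bytes_copied = 50 * 1024 * 1024        # assume 50 MiB transferred
elapsed_s = 4.0                        # assume it took 4 seconds

throughput = bytes_copied / elapsed_s  # bytes per second
print(f"{throughput / (1024 * 1024):.1f} MiB/s")  # 12.5 MiB/s
```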
Load: A measure of the stress the system is under. Load is usually discussed in the context of other measurements, such as response time. In fact, for large systems a plot of load against response time can give an interesting visualization of this aspect of performance. Such graphs often expose trends that might not have been obvious otherwise.
Efficiency: Performance divided by resources. I have always had qualms about using the term efficiency when talking about the performance of a system, mainly because it seems very abstract. An example from Patterns of Enterprise Application Architecture: a system that gets 30 tps on two CPUs is more efficient than a system that gets 40 tps on four identical CPUs. The definition is apt; however, on more than one occasion I have seen customers use the terms efficiency, response time, and responsiveness interchangeably, and things can get confusing. When talking architecture with customers I feel safe sticking to the response time and responsiveness of a system.
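Fowler’s example works out numerically as throughput per unit of resource. A minimal sketch, using his tps figures:

```python
def efficiency(tps, cpus):
    """Throughput per unit resource: transactions per second per CPU."""
    return tps / cpus

print(efficiency(30, 2))  # 15.0 tps per CPU
print(efficiency(40, 4))  # 10.0 tps per CPU: lower efficiency despite higher throughput
```

The second system does more total work but extracts less from each CPU, which is exactly the distinction the term is meant to capture.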
Capacity: The maximum effective throughput or load. This might be an absolute maximum, or the point at which performance dips below an acceptable threshold.
Scalability: A system is scalable when adding more hardware increases performance (throughput). Vertical scalability, or scaling up, means adding more resources to a single server, such as memory. Horizontal scalability, or scaling out, means adding more servers.
“When building enterprise software systems, it often makes sense to build for hardware scalability rather than capacity or even efficiency. Scalability gives you the option of better performance if you need it. Scalability can also be easier to do. Often designers do complicated things that improve the capacity on a particular hardware platform when it might actually be cheaper to buy more hardware”
- Martin Fowler in his book Patterns of Enterprise Application Architecture