I want to identify three methods the industry uses for capacity planning in the cloud.
SRE principles include capacity planning to ensure production readiness for a release, possibly using current performance as a baseline. Many books go into depth about why you should deal with capacity and performance, and about the tools and techniques available for your cloud infrastructure.
The first method is to use monitoring tools to guide your capacity plan once you understand the system’s needs. Here is a resource on the current landscape of observability. I’m not saying monitoring is observability, but here are some observability-focused tools that help make monitoring available to everyone. We also get monitoring out of the box when we use resources from a cloud provider, such as Amazon Web Services with CloudWatch.
The second method, and a commonly known one, is benchmark testing with tools such as PerfKitBenchmarker.
Brendan Gregg’s Systems Performance books discuss different benchmarking methods in detail.
Here are some free resources on these topics:
USE Method http://www.brendangregg.com/USEmethod/use-linux.html
Benchmarking Methods https://link.springer.com/article/10.1186/2192-113X-2-6
Benchmarking tools https://github.com/GoogleCloudPlatform/PerfKitBenchmarker
One reason not to use either of these methods is cost, although I believe monitoring your application is the most important of the three and there is no excuse to skip it. The first two methods can be expensive. If your organization is new or still maturing its delivery cycle, it may feel overwhelming to commit to this work or even to pay for additional features from a cloud offering.
Another reason is that the vendor may have explicit rules about benchmark testing on its platform.
Benchmark testing may also not be practical for distributed systems in the cloud, where resources vary across instance sizes for services such as databases or a FIFO queuing service.
Another solution is to set quotas for your application. This method is well established among cloud providers. Google provides good high-level docs to reference. They identify two types of quota:
- Rate quota such as API requests per day. This quota resets after a specified time, such as a minute or a day.
This may be possible for your in-house application if you design it well. If it is too complex to add, or the application you are supporting doesn’t allow it, then consider an open source proxy such as Envoy, which has circuit breaking and rate limiting built in.
- Allocation quota such as the number of virtual machines or load balancers used by your project. This quota does not reset over time but must be explicitly released when you no longer want to use the resource, for example by deleting a GKE cluster.
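To make the difference between the two quota types concrete, here is a minimal Python sketch. It is not how Google or any other provider implements quotas; the class names `RateQuota` and `AllocationQuota`, the fixed-window reset, and the injectable `clock` parameter are all assumptions made for illustration.

```python
import time


class RateQuota:
    """Rate quota (hypothetical sketch): the counter resets after each window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the reset can be tested without sleeping
        self.window_start = clock()
        self.used = 0

    def try_acquire(self):
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # The window has elapsed, so the quota resets.
            self.window_start = now
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


class AllocationQuota:
    """Allocation quota (hypothetical sketch): usage drops only on explicit release."""

    def __init__(self, limit):
        self.limit = limit
        self.held = set()

    def allocate(self, resource_id):
        if resource_id in self.held:
            return True  # already counted against the quota
        if len(self.held) >= self.limit:
            return False
        self.held.add(resource_id)
        return True

    def release(self, resource_id):
        # Time passing never frees an allocation; only this call does.
        self.held.discard(resource_id)
```

A request handler would call `try_acquire()` before doing work and reject the request when it returns `False`, while a provisioning path would call `allocate()` before creating a resource and `release()` after deleting it.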
The ease and affordability of the allocation solution is attractive. Let’s say we have an application that stores data in memory, periodically flushes memory to disk for recovery, and lets users perform reads on the in-memory data.
One thought is to use the monitoring method to track the machine’s memory usage. Allocate enough for the real-time data, the snapshot process, and room for expensive queries. A clear objective would be that memory never goes over 70% during peak hours.
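The sizing arithmetic behind that objective can be sketched in a few lines of Python. The function names and the example figures are hypothetical; only the three memory consumers and the 70% ceiling come from the scenario above.

```python
def required_memory_gb(live_data_gb, snapshot_overhead_gb, query_headroom_gb,
                       target_utilization=0.70):
    """Total memory to allocate so that peak usage stays at or under the target."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    peak_gb = live_data_gb + snapshot_overhead_gb + query_headroom_gb
    return peak_gb / target_utilization


def meets_objective(peak_used_gb, total_gb, target_utilization=0.70):
    """True when observed peak usage is within the utilization objective."""
    return peak_used_gb / total_gb <= target_utilization


# Hypothetical figures: 20 GB of live data, 10 GB for the snapshot process,
# and 5 GB of room for expensive queries.
total_gb = required_memory_gb(20, 10, 5)
print(f"allocate roughly {total_gb:.0f} GB")
```

With monitoring in place, `meets_objective` could be fed the peak-hour reading from whatever tool you use, which turns the 70% figure into something you can check after every release.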
With or without monitoring in place, setting quotas lets you move away from unplanned routine work, such as increasing storage for a database at the last possible minute to avoid downtime. Say a particular application is using 40% of your resources and its usage is growing quickly. Now you have an opportunity to talk with your teams about expansion plans and whether there is a budget that works for both parties.
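As a rough illustration of how quota data can prompt that conversation early, the Python sketch below flags applications whose share of a shared allocation reaches a threshold and estimates how long the remaining capacity lasts. The function names, the application names, and the linear-growth assumption are all hypothetical; real usage rarely grows in a straight line.

```python
def apps_to_review(usage_by_app, total_capacity, share_threshold=0.40):
    """Names of applications using at least share_threshold of total capacity."""
    return sorted(
        app for app, used in usage_by_app.items()
        if used / total_capacity >= share_threshold
    )


def days_until_full(used, capacity, growth_per_day):
    """Days until capacity is exhausted, assuming linear growth (an assumption)."""
    if growth_per_day <= 0:
        return None  # usage is flat or shrinking, so there is no projected date
    return max(capacity - used, 0) / growth_per_day


# Hypothetical usage in GB against a 1000 GB shared allocation.
usage = {"billing": 400, "search": 150, "reports": 50}
print(apps_to_review(usage, 1000))
print(days_until_full(sum(usage.values()), 1000, growth_per_day=20))
```

The output is only a trigger for the discussion about expansion plans and budget, not a replacement for it.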
These may all be complex to design into a system internally, so you may instead think about controlling how many users you onboard to a service.