Use Dynamic Application Sizing
Enterprise Only
The functionality described here is available only in Nomad Enterprise with the Multi-Cluster & Efficiency module. To explore Nomad Enterprise features, you can sign up for a free 30-day trial.
Prometheus Required
Currently, Prometheus is the only APM supported for Dynamic Application Sizing.
Using a Vagrant virtual machine, you will deploy a simple environment containing:
- An APM, specifically Prometheus, to collect metric data.
- Nomad Autoscaler Enterprise.
- A sample job, configured to enable DAS recommendations, with:
  - one NGINX instance used as a TCP load balancer.
  - three Redis instances to service requests.
- A sample dispatch job to create load on the Redis nodes.
Prerequisites
Familiarity with the Dynamic Application Sizing Concepts tutorial.
This Vagrantfile to create a suitable environment to run the demonstration.
This Vagrantfile provisions:
one Ubuntu 20.04 VM preinstalled with:
- Nomad Enterprise v1.0.0 beta 2
- The current version of Consul installable via package
- The current version of Docker installable via package
Start and connect to the Vagrant environment
Download the Vagrantfile. Start the test-drive environment by running vagrant up.
Once the environment is provisioned and you are returned to your command prompt, connect to the Vagrant instance by running vagrant ssh.
Once you are at the vagrant@ubuntu-focal:~$ prompt, you are ready to continue.
Verify Nomad telemetry configuration
Nomad needs to be configured to enable telemetry publishing. You need to enable allocation and node metrics. Since this tutorial also uses Prometheus as its APM, you need to set prometheus_metrics to true.
The Nomad configuration inside the test-drive already has the appropriate telemetry settings. View the configuration by running cat /etc/nomad.d/nomad.hcl and note that it includes a telemetry stanza.
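As a rough sketch of what to look for, the snippet below writes a sample telemetry block to a scratch file and checks it for the three settings described above. The publish_allocation_metrics and publish_node_metrics names are the Nomad agent telemetry options for allocation and node metrics. The check_telemetry helper and the telemetry-sample.hcl file name are hypothetical, and the real file on the VM may contain additional settings.

```shell
# Sample telemetry block (an illustration, not a copy of the VM's file).
cat > telemetry-sample.hcl <<'EOF'
telemetry {
  publish_allocation_metrics = true
  publish_node_metrics       = true
  prometheus_metrics         = true
}
EOF

# check_telemetry FILE: succeed only if FILE sets all three options to true.
check_telemetry() {
  for key in publish_allocation_metrics publish_node_metrics prometheus_metrics; do
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*true" "$1"; then
      echo "missing or disabled: ${key}"
      return 1
    fi
  done
  echo "telemetry ok"
}

check_telemetry telemetry-sample.hcl
```

On the VM, running check_telemetry /etc/nomad.d/nomad.hcl would apply the same check to the real configuration.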
Given this configuration, Nomad generates node and allocation metrics and makes them available in a format that Prometheus can consume. If you are using this test-drive with your own Nomad cluster, add this telemetry block to the configuration of every Nomad node in your cluster and restart the nodes to load the new configuration.
Return to the vagrant user's home directory if you changed away from it.
Start Prometheus
The autoscaler configuration in this test-drive uses Prometheus to retrieve historical metrics when starting to track a new target. In this beta, Prometheus is also used for ongoing monitoring metrics, but this is currently being shifted to using Nomad's metrics API. The first step is to run an instance of Prometheus for the Nomad Autoscaler to use. The simplest way to do this is to run Prometheus as a Nomad job. The environment contains a complete Prometheus job file to get started with.
You can create a file called prometheus.nomad with the following content, or you can copy prometheus.nomad from the ~/nomad-autoscaler/jobs folder when logged in to the vagrant user's shell inside the VM.
Run the job in Nomad with the nomad job run prometheus.nomad command.
Start the autoscaler
The next step is to run the Nomad Autoscaler. For the beta, an enterprise version of the Nomad Autoscaler is provided that includes the DAS plugins. The simplest approach is to run the autoscaler as a Nomad job; however, you can download the Nomad Autoscaler and run it as a standalone process.
This test-drive Vagrant environment comes with Consul. The supplied Nomad job specifications use this Consul instance to discover the Nomad and Prometheus URLs. Should you want to use these specifications in a cluster without Consul, you can supply the URLs yourself and remove the checks.
You can create a file called das-autoscaler.nomad with the following content, or you can copy das-autoscaler.nomad from the ~/nomad-autoscaler/jobs folder when logged in to the vagrant user's shell inside the VM.
Run the job in Nomad with the nomad job run das-autoscaler.nomad command.
Upon starting, the autoscaler loads the DAS-specific plugin and launches workers to evaluate vertical policies. You can see the logs using the Nomad UI or the nomad alloc logs ... command.
If there are already jobs configured with vertical policies, the autoscaler begins dispatching policy evaluations from the broker to the workers; otherwise, this occurs when vertical policies are added to a job specification.
Note
The autoscaler does not immediately register recommendations. The evaluate_after field in the autoscaler configuration indicates the amount of historical metrics that must be available before a recommendation is made for a task. The purpose is to prevent recommendations with insufficient historical information; without representative data, appropriate recommendations cannot be made, which could result in under-provisioning a task. For the purpose of evaluating the feature, this can be reduced. For more production-like environments, this interval should be long enough to capture a representative sample of metrics. The default interval is 24 hours.
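For evaluation purposes, lowering this interval might look like the sketch below. It assumes the autoscaler agent's dynamic_application_sizing configuration block; the 5m value and the das-evaluate-after.hcl file name are hypothetical choices for illustration, not values taken from the test-drive's job file.

```shell
# Write a hypothetical autoscaler config fragment that lowers evaluate_after.
cat > das-evaluate-after.hcl <<'EOF'
dynamic_application_sizing {
  # Require only 5 minutes of history before recommending (evaluation only).
  evaluate_after = "5m"
}
EOF

# Show the fragment so it can be merged into the autoscaler configuration.
cat das-evaluate-after.hcl
```

A value this short trades recommendation quality for speed, so it suits a demonstration rather than a production cluster.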
Deploy the sample job
Create a job file named example.nomad.hcl with the following content.
Add DAS to the sample job
In order to enable a Nomad job task for sizing recommendations, the following job specification contains a task scaling stanza for CPU and one for memory. These stanzas, when placed within a job specification's task stanza, configure the task for both CPU and memory recommendations.
To enable application sizing for multiple tasks with DAS, you need to add this scaling block to every new or additional task in the job spec. Inside both the cache-lb and the cache tasks, add the following scaling policies. You can verify your changes against the completed example.nomad.hcl file in the ~/nomad-autoscaler/jobs directory.
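As an illustration of the shape these policies take, the sketch below writes one CPU and one memory scaling block to a scratch file. It assumes the app-sizing-percentile and app-sizing-max strategies from the DAS plugins; the check names, the 95th percentile, the 1m intervals, and the scaling-policies.hcl file name are illustrative assumptions, so treat the completed example.nomad.hcl on the VM as the reference.

```shell
# Write illustrative task-level scaling policies to a scratch file.
cat > scaling-policies.hcl <<'EOF'
scaling "cpu" {
  policy {
    cooldown            = "1m"
    evaluation_interval = "1m"

    check "95pct" {
      strategy "app-sizing-percentile" {
        percentile = "95"
      }
    }
  }
}

scaling "mem" {
  policy {
    cooldown            = "1m"
    evaluation_interval = "1m"

    check "max" {
      strategy "app-sizing-max" {}
    }
  }
}
EOF

# Count the scaling blocks written: one for CPU and one for memory.
grep -c '^scaling' scaling-policies.hcl
```

Both blocks belong inside a task stanza, so the same pair would be repeated in the cache-lb task and in the cache task.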
Note
These scaling policies are extremely aggressive and provide "flappy" recommendations, making them unsuitable for production. They are set with low cooldown and evaluation_interval values in order to quickly generate recommendations for this test drive. Consult the Dynamic Application Sizing Concepts tutorial for how to determine suggested production values.
Reregister the example.nomad.hcl file by running the nomad job run example.nomad.hcl command.
Once the job has been registered with its updated specification, the Nomad Autoscaler automatically detects the new scaling policies and starts the required internal processes.
Further details on the individual parameters and available strategies can be found in the Nomad documentation, including information on how you can further customize the application-sizing block to your needs (percentile, cooldown periods, sizing strategies).
Review DAS recommendations
Once the autoscaler has generated recommendations, you can review them in the Nomad UI or using the Nomad API and accept or dismiss the recommendations.
Select the Optimize option in the Workload section of the sidebar. When there are DAS recommendations, they appear here.
Clicking Accept applies the recommendation, updating the job with resized tasks. Dismissing the recommendation causes it to disappear. However, the autoscaler continues to monitor and eventually makes additional recommendations for the job until the vertical scaling policy is removed from the job specification.
Click the Accept button to accept the suggestion.
You also receive a suggestion for the cache-lb task.
Click the Accept button to accept the suggestion.
Use curl to access the List Recommendations API.
You should receive two recommendations: one for the cache task and one for the cache-lb task.
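A minimal sketch of that query follows, assuming the agent listens on the default local address and that the response is a JSON array whose objects carry an ID field. The list_recommendations and recommendation_ids helpers are hypothetical names introduced here; a JSON-aware tool would be more robust than grep and sed if one is installed.

```shell
# Base address of the Nomad API; falls back to the local agent's default port.
NOMAD_ADDR="${NOMAD_ADDR:-http://127.0.0.1:4646}"

# list_recommendations: fetch all pending recommendations as JSON.
list_recommendations() {
  curl --silent --fail "${NOMAD_ADDR}/v1/recommendations"
}

# recommendation_ids: print one recommendation ID per line from JSON on stdin.
# The leading quote in the pattern keeps fields such as "JobID" from matching.
recommendation_ids() {
  grep -o '"ID": *"[^"]*"' | sed 's/.*"\([^"]*\)"$/\1/'
}
```

Running list_recommendations | recommendation_ids on the VM would print the IDs to use in the next step.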
You can accept them by using the Apply and Dismiss Recommendations API endpoint. Replace the Recommendation IDs in the command with the recommendation IDs received when you queried the List Recommendations API.
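The request might be sketched as follows, assuming the endpoint accepts a JSON body with Apply and Dismiss lists of recommendation IDs and that the agent listens on the default local address. The apply_payload and apply_recommendations helpers are hypothetical names, and the IDs you pass must come from your own List Recommendations output.

```shell
# apply_payload ID...: build the JSON request body that applies the given IDs.
apply_payload() {
  ids=""
  for id in "$@"; do
    # Append each ID as a quoted JSON string, comma-separated.
    ids="${ids:+${ids}, }\"${id}\""
  done
  printf '{"Apply": [%s], "Dismiss": []}\n' "$ids"
}

# apply_recommendations ID...: POST the payload to the apply endpoint.
apply_recommendations() {
  curl --silent --fail --request POST \
    --data "$(apply_payload "$@")" \
    "${NOMAD_ADDR:-http://127.0.0.1:4646}/v1/recommendations/apply"
}
```

Calling apply_recommendations with the two IDs as arguments accepts both recommendations in one request; moving an ID into the Dismiss list instead would dismiss it.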
Verify recommendation is applied
Watch for the deployment to complete and then verify that the job is now using the recommended values instead of the ones initially supplied. You can do this in the Nomad UI or by using the nomad alloc status command for a cache and a cache-lb allocation listed by the nomad job status example command.
Navigate to the example job's detail screen in the Nomad UI.
Note that the Task Groups section shows the updated values for Reserved CPU and Reserved Memory given by the autoscaler.
List the allocations for the example job by running nomad job status example.
From the job status output, a cache allocation has allocation ID 5a35ffec. Run the nomad alloc status 5a35ffec command to get the Task Resources information about this allocation.
Note that the Task Resources section shows the updated values for memory and CPU given by the autoscaler.
From the earlier job status output, a cache-lb allocation has allocation ID 8ceec492. Run the nomad alloc status 8ceec492 command to get the Task Resources information about this allocation.
Here, also, the Task Resources section shows the updated values for memory and CPU given by the autoscaler.
Generate load to create new recommendations
Create a parameterized dispatch job to generate load in your cluster. Create a file named das-load-test.nomad with the following content. You can also copy this file from the ~/nomad-autoscaler/jobs folder in the Vagrant instance.
Register the dispatch job with the nomad job run das-load-test.nomad command.
Now, dispatch instances of the load-generation task with the nomad job dispatch command.
Each run of this job creates 100,000 requests against your Redis cluster using 50 Redis clients.
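To dispatch several runs in a row, a small loop such as the sketch below could be used. The dispatch_load helper is a hypothetical name, and the default job name das-load-test is an assumption based on the file name; check the job name in your job file before relying on it.

```shell
# dispatch_load COUNT [JOB]: dispatch COUNT instances of the parameterized job.
dispatch_load() {
  count="$1"
  job="${2:-das-load-test}"   # assumed job name, inferred from the file name
  i=0
  while [ "$i" -lt "$count" ]; do
    # Stop at the first failed dispatch so errors are not silently repeated.
    nomad job dispatch "$job" || return 1
    i=$((i + 1))
  done
}
```

For example, dispatch_load 3 would queue three load-generation runs one after another.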
Once you have run the job, watch the Optimize view for new suggestions based on the latest activity.
Exit and clean up
Exit the shell session on the Vagrant VM by typing exit. Run the vagrant destroy command to stop and remove the VirtualBox instance. Delete the Vagrantfile once you no longer want to use the test-drive environment.
Learn more
If you have not already, review the Dynamic Application Sizing Concepts tutorial for more information about the individual parameters and available strategies.
You can also find more information in the Nomad Autoscaler Scaling Policies documentation, including how you can further customize the application-sizing block to your needs (percentile, cooldown periods, and sizing strategies).