Minimum resource requirements

To run elastic jobs successfully, you must provision a minimum amount of resources for the Secure Agent machine and elastic cluster components.

Secure Agent machine

The following table lists the minimum resource requirements on the Secure Agent machine:

    Component        Minimum requirement
    Cores per CPU    At least four
    Memory           16 GB
    Disk space       100 GB

Worker nodes

The following table lists the default resource requirements on the worker nodes:

    Component               Default memory requirement               Default CPU requirement
    Kubernetes system       1 GB per worker node                     0.5 CPU per worker node, plus an additional 0.5 CPU across the cluster
    Spark shuffle service   2 GB per worker node                     1 CPU per worker node
    Spark driver            4 GB                                     0.75 CPU
    Spark executor          6 GB, or 3 GB per Spark executor core    1.5 CPUs, or 0.75 CPU per Spark executor core
Based on the default resource requirements, the minimum resources that are needed to run one elastic job are 13 GB of memory and 4.25 CPUs.
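These totals come from adding the defaults for a cluster with one worker node:

    Memory:  1 GB (Kubernetes) + 2 GB (shuffle service) + 4 GB (driver) + 6 GB (executor) = 13 GB
    CPU:     0.5 + 0.5 (Kubernetes) + 1 (shuffle service) + 0.75 (driver) + 1.5 (executor) = 4.25 CPUs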
The CPU options for instance types are 2, 4, 8, or 16, and memory is proportional to the number of CPUs. Because one elastic job needs 4.25 CPUs, a cluster that has one worker node must use a worker instance type that has at least 8 CPUs and 32 GB of memory.
When worker nodes are added to the cluster, each worker node reserves an additional 3 GB of memory and 1.5 CPUs for the Kubernetes system and the Spark shuffle service. Therefore, a cluster that has two worker nodes must use worker instance types that have at least 16 GB of memory and 4 CPUs.
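For example, a two-node cluster needs the following cluster-wide resources to run one elastic job:

    Memory:  2 x (1 GB + 2 GB) per-node overhead + 4 GB (driver) + 6 GB (executor) = 16 GB
    CPU:     2 x (0.5 + 1) per-node overhead + 0.5 (cluster-wide Kubernetes) + 0.75 (driver) + 1.5 (executor) = 5.75 CPUs

Two worker nodes with 16 GB of memory and 4 CPUs each provide 32 GB and 8 CPUs in total, and each node still has 13 GB of memory and 2.5 CPUs available for the Spark driver and Spark executor after the per-node overhead.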

Reconfiguring resource requirements

If you cannot provision enough resources to fulfill the default requirements, you can reconfigure some of the requirements.
You can reconfigure the requirements for the following components:
Spark shuffle service
If you disable the shuffle service, the Spark engine cannot use dynamic allocation. For more details, contact Informatica Global Customer Support.
Spark driver
To reconfigure the amount of memory for the Spark driver, use the Spark session property spark.driver.memory in the mapping task. To set the memory in terms of GB, use a value such as 2G. To set the memory in terms of MB, use a value such as 1500m.
For information about reconfiguring the CPU requirement for the Spark driver, contact Informatica Global Customer Support.
Spark executor
To reconfigure the amount of memory for the Spark executor, use the Spark session property spark.executor.memory in the mapping task. As with the Spark driver, you can specify the memory in GB or MB.
You can also change the number of Spark executor cores using the Spark session property spark.executor.cores. By default, the number of cores is 2.
If you edit the number of cores, you change the number of Spark tasks that can run concurrently. For example, when you set spark.executor.cores=2, two Spark tasks can run concurrently inside each Spark executor. For an example that sets these Spark session properties together, see the end of this section.
For information about reconfiguring the CPU requirement for Spark executors, contact Informatica Global Customer Support.
Note: If you set the memory for the Spark driver and Spark executor too low, these components might encounter an OutOfMemoryException.
You cannot edit the resource requirements for the Kubernetes system. The resources are required to maintain a functional Kubernetes system.
For more information about the Spark session properties, see Tasks in the Data Integration help.
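For example, you might enter Spark session properties such as the following in the mapping task. The values shown are illustrative only; choose values that match the amount of data that the job processes:

    spark.driver.memory=2G
    spark.executor.memory=4G
    spark.executor.cores=2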

Resource requirements example

You have an elastic cluster with one worker node. The worker node has 16 GB of memory and 4 CPUs.
If you run an elastic job with the default requirements, the job fails. The Kubernetes system and the Spark shuffle service reserve 3 GB of memory and 2 CPUs, which leaves 13 GB of memory and 2 CPUs to run jobs. The Spark driver and Spark executor need 10 GB of memory and 2.25 CPUs to start, so the job cannot run with only 2 CPUs available.
If you cannot provision a larger instance type, you can reduce the CPU requirement by setting the following advanced session property in the mapping task:

    spark.executor.cores=1

When the number of Spark executor cores is 1, the Spark executor requires only 0.75 CPUs instead of 1.5 CPUs.
If you process a small amount of data, the Spark driver and Spark executor require only a few hundred MB, so you might also reduce their memory requirements. You can set the following advanced session properties in the mapping task so that the driver and executor together use no more than 2 GB of memory, for example:

    spark.driver.memory=1G
    spark.executor.memory=1G

After you reconfigure the resource requirements, the cluster needs at least 5 GB of memory and 3.5 CPUs. One worker node with 16 GB of memory and 4 CPUs fulfills the requirements, and the job runs successfully.
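As a check, the reconfigured totals for one worker node work out as follows, assuming 1 GB of memory each for the Spark driver and Spark executor as in the example above:

    Memory:  1 GB (Kubernetes) + 2 GB (shuffle service) + 1 GB (driver) + 1 GB (executor) = 5 GB
    CPU:     0.5 + 0.5 (Kubernetes) + 1 (shuffle service) + 0.75 (driver) + 0.75 (executor, 1 core) = 3.5 CPUs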