
Microsoft Azure properties

To configure properties in an elastic configuration, click New Elastic Configuration or click the name of the configuration that you want to edit on the Elastic Clusters page.
The basic properties describe the elastic configuration and define the cloud platform to host the elastic cluster. To configure the cluster, configure the platform, advanced, and runtime properties.

Basic configuration

The following table describes the basic properties:
Property
Description
Name
Name of the elastic configuration.
Description
Description of the elastic configuration.
Runtime Environment
Runtime environment to associate with the elastic configuration. The runtime environment can contain only one Secure Agent. A runtime environment cannot be associated with more than one configuration.
Cloud Platform
Cloud platform that hosts the elastic cluster.
Select Microsoft Azure.

Platform configuration

The following table describes the platform properties:
Property
Description
Region
Region in which to create the cluster. Use the drop-down menu to view the regions that you can use.
Master Instance Type
Instance type to host the master node. Use the drop-down menu to view the instance types that you can use.
The list of available instance types is filtered based on the minimum number of resources that the cluster requires.
Worker Instance Type
Instance type to host the worker nodes. Use the drop-down menu to view the instance types that you can use.
The instance types that you can use depend on your Azure account.
To verify that the instance type that you select from the drop-down menu is supported on your account, refer to the Microsoft Azure documentation.
Number of Worker Nodes
Number of worker nodes in the cluster. Specify the minimum and maximum number of worker nodes.
Enable High Availability
Indicates whether the cluster is highly available. You can enable high availability only if the region has availability zones 1, 2, and 3. One master node is created in each availability zone.
Availability Zones
List of availability zones where cluster nodes are created. The list of availability zones is populated automatically based on the region.
If the region has availability zones 1, 2, and 3, worker nodes are created across the zones.
Azure Disk Size
Size of the Azure disk to attach to a worker node for temporary storage during data processing. The disk size scales between the minimum and maximum based on job requirements. The range must be between 50 GB and 16 TB.
By default, the minimum and maximum disk sizes are 100 GB.
Note: When the disk size scales down, the jobs that are currently running on the cluster might take longer to complete.
Cluster Shutdown
Cluster shutdown method. You can select one of the following cluster shutdown methods:
  • Smart shutdown. The Secure Agent stops the cluster when no job is expected during the defined idle timeout, based on historical data.
  • Idle timeout. The Secure Agent stops the cluster after the amount of idle time that you define.
Mapping Task Timeout
Amount of time to wait for a mapping task to complete before it is terminated. By default, a mapping task does not have a timeout.
If you specify a timeout, a value of at least 10 minutes is recommended. The timeout begins when the mapping task is submitted to the Secure Agent.
Resource Group (Storage)
Storage resource group that holds the staging and log storage accounts.
If you specify an initialization script path, the storage account that holds the init script must be part of the same resource group.
Staging Location
Location on Microsoft Azure Blob Storage for staging data. Use the format: wasb(s)://container@storageAccount.blob.endpointSuffix/folder.
If encryption is enabled on Blob Storage, specify the WASBS protocol. Otherwise, specify the WASB protocol.
Log Location
Location on Microsoft Azure Blob Storage to store logs that are generated when you run an elastic job. Use the format: wasb(s)://container@storageAccount.blob.endpointSuffix/folder.
If encryption is enabled on Blob Storage, specify the WASBS protocol. Otherwise, specify the WASB protocol.
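
The staging and log location format can be assembled programmatically before you paste the value into the configuration. The following Python sketch is illustrative only. The function name, parameter names, and the default endpoint suffix core.windows.net are assumptions, not part of the product.

```python
def build_blob_location(container, storage_account, folder,
                        endpoint_suffix="core.windows.net", encrypted=True):
    """Build a location string in the documented format:
    wasb(s)://container@storageAccount.blob.endpointSuffix/folder

    All names here are hypothetical helpers for illustration.
    """
    folder = folder.strip("/")
    for name, value in (("container", container),
                        ("storage_account", storage_account),
                        ("folder", folder),
                        ("endpoint_suffix", endpoint_suffix)):
        if not value:
            raise ValueError(f"{name} must not be empty")
    # WASBS when encryption is enabled on Blob Storage, WASB otherwise.
    scheme = "wasbs" if encrypted else "wasb"
    return f"{scheme}://{container}@{storage_account}.blob.{endpoint_suffix}/{folder}"
```

For example, calling build_blob_location("staging", "mystorage", "elastic/stage") produces a WASBS location for a storage account with encryption enabled. Pass encrypted=False to produce a WASB location.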

Advanced configuration

The following table describes the advanced properties:
Property
Description
Resource Group (Cluster)
Cluster resource group that holds cluster resources. If you do not specify a resource group, the agent creates a resource group to populate with cluster resources.
Service Principal Client ID
Service principal that the agent uses to manage Azure resources.
Key Vault
Key vault that stores the service principal credentials.
Secret Name
Name of the secret that stores the service principal credentials.
VNet
Azure VNet in which to create the cluster. Use the format: resourceGroup/VNet. The VNet must be in the specified region.
If you do not specify a VNet, the agent creates a VNet on your Azure account based on the region that you select.
Subnet
Required when a VNet is specified. Subnet in which to create cluster nodes.
IP Address Range
CIDR block that specifies the IP address range that the cluster can use. The IP address range cannot overlap with the IP addresses of the subnets.
For example: 10.0.0.0/24
Initialization Script Path
Microsoft Azure Blob Storage file path of the initialization script to run on each cluster node when the node is created. Use the format: https://storageAccount.blob.core.windows.net/container/folder/file.sh. The script can reference other init scripts in the same folder.
The script must be a bash script.
Azure Tags
Tags on Microsoft Azure to apply to cluster nodes. Each tag has a key and a value.
You can list a maximum of 30 tags.
The Secure Agent also assigns default tags to cloud resources. The default tags do not contribute to the limit of 30 tags.
Note: Issues can occur when you override default tags. For more information, see Default tags for cloud resources.
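
Because the IP address range cannot overlap with the IP addresses of the subnets, it can help to check a candidate CIDR block before you enter it. The following Python sketch uses the standard ipaddress module. The function name and the sample subnet values are assumptions for illustration.

```python
import ipaddress

def overlapping_subnets(cluster_cidr, subnet_cidrs):
    """Return the subnets whose address ranges overlap the cluster CIDR block.

    An empty list means the candidate IP address range is safe to use
    with respect to the subnets that were passed in.
    """
    cluster = ipaddress.ip_network(cluster_cidr)
    return [
        subnet for subnet in subnet_cidrs
        if cluster.overlaps(ipaddress.ip_network(subnet))
    ]

# Hypothetical subnets in the VNet; replace with your own values.
conflicts = overlapping_subnets("10.0.0.0/24", ["10.0.1.0/24", "10.0.0.128/25"])
print(conflicts)
```

In this example, the second subnet falls inside 10.0.0.0/24, so it is reported as a conflict.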

Runtime configuration

The following table describes the runtime properties:
Property
Description
Encrypt Data
Indicates whether temporary data on the cluster is encrypted.
Note: Encrypting temporary data might slow down job performance.
Runtime Properties
Custom properties to customize the cluster and the jobs that run on the cluster.

Validating the configuration

You can validate the information needed to create or update an elastic configuration before you save the configuration properties.
The validation process performs the following validations:
When you use a managed identity as the Secure Agent credential, you must add the key ccs.azure.k8s.prevalidation.agent.clientid to the runtime properties in the elastic configuration.

High availability

You can make an elastic cluster highly available so that the master node is not a single point of failure. If you enable high availability and one master node goes down, the other master nodes remain available and jobs on the cluster can continue running.
When a cluster is highly available, watch out for job failures in the following scenarios:

Propagating tags to cloud resources

The Secure Agent propagates tags to cloud resources based on the Azure tags that you specify in an elastic configuration.
The agent propagates tags to the following resources:
If your enterprise follows a tagging policy, make sure to manually assign tags to other cloud resources.
Note: The Secure Agent propagates tags only to cloud resources that the agent creates. For example, if you create a VNet and specify the VNet in an elastic configuration, the agent does not propagate Azure tags to the VNet.

Default tags for cloud resources

In addition to the cloud platform tags that you specify in an elastic configuration, the Secure Agent assigns several default tags to cluster resources. Do not override the default tags.
The following table describes tags that the agent assigns to cluster resources:
Cloud platform tag
Description
infa:ccs:hostname
The host name of the Secure Agent machine that started the cluster.
If the Secure Agent machine stops unexpectedly and the Secure Agent restarts on a different machine, the tag keeps the host name of the original Secure Agent machine.
infa:k8scluster:configname
Name of the elastic configuration that is used to create the cluster.
infa:k8scluster:workdir
Staging directory that the cluster uses.
InfaInternalInitDone
Used internally.
KubernetesCluster
Identifies an elastic cluster.
Some default tags, such as KubernetesCluster, do not have a namespace and can conflict with the user-defined tags that you specify in an elastic configuration. If you specify a user-defined tag with the same name, you might override the default tag and issues can occur on the elastic cluster.
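
To avoid overriding a default tag or exceeding the limit of 30 Azure tags, you can check a set of user-defined tags before you enter them. The following Python sketch is illustrative only. The function name is hypothetical, and comparing tag names without regard to case is a conservative assumption rather than documented agent behavior.

```python
# Default tag names listed in the table above.
DEFAULT_TAG_KEYS = (
    "infa:ccs:hostname",
    "infa:k8scluster:configname",
    "infa:k8scluster:workdir",
    "InfaInternalInitDone",
    "KubernetesCluster",
)
MAX_USER_TAGS = 30  # Default tags do not count toward this limit.

# Assumption: compare names case-insensitively to stay on the safe side.
_DEFAULT_BY_LOWER = {key.lower(): key for key in DEFAULT_TAG_KEYS}

def check_user_tags(tags):
    """Return a list of problems found in a dict of user-defined tags."""
    problems = []
    if len(tags) > MAX_USER_TAGS:
        problems.append(f"{len(tags)} tags specified; the maximum is {MAX_USER_TAGS}")
    for key in tags:
        default = _DEFAULT_BY_LOWER.get(key.lower())
        if default is not None:
            problems.append(f"tag '{key}' would override the default tag '{default}'")
    return problems
```

An empty result means that the tags stay within the limit and do not collide with any default tag name.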

Initialization scripts

Cluster nodes can run an initialization script based on an init script path that you specify in an elastic configuration. Each node runs the script when the node is created, and the script can reference other init scripts.
You might want to run an init script to install additional software on the cluster. For example, your enterprise policy might require each cluster node to contain monitoring and anti-virus software to protect your data.
Consider the following guidelines when you create the init script:
The init script path must be in cloud storage. You can place the scripts in a unique path on the cloud storage system, or you can place the scripts in the staging location.
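
The initialization script path uses the format https://storageAccount.blob.core.windows.net/container/folder/file.sh. The following Python sketch splits a path into its parts so that you can confirm the storage account and container before you save the configuration. The function name is hypothetical, and requiring the .sh extension and the blob.core.windows.net host are assumptions drawn from the documented format, not confirmed validation rules.

```python
from urllib.parse import urlparse

HOST_SUFFIX = ".blob.core.windows.net"  # Assumption: public Azure endpoint.

def parse_init_script_path(path):
    """Split an init script path into storage account, container, folder, file."""
    parsed = urlparse(path)
    if parsed.scheme != "https" or not parsed.netloc.endswith(HOST_SUFFIX):
        raise ValueError("expected https://storageAccount.blob.core.windows.net/...")
    storage_account = parsed.netloc[: -len(HOST_SUFFIX)]
    parts = [part for part in parsed.path.split("/") if part]
    if not storage_account or len(parts) < 2:
        raise ValueError("path must include a storage account, container, and file")
    if not parts[-1].endswith(".sh"):
        # Assumption: the bash script is expected to use the .sh extension.
        raise ValueError("init script file should be a bash script ending in .sh")
    return {
        "storage_account": storage_account,
        "container": parts[0],
        "folder": "/".join(parts[1:-1]),
        "file": parts[-1],
    }
```

You can compare the returned storage account with the storage resource group that holds the staging and log storage accounts, because the init script storage account must belong to the same resource group.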

Initialization script failures

When an initialization script fails on a cluster node, it can have a significant impact on the elastic cluster. An init script failure can prevent the cluster from scaling up or cause the Secure Agent to terminate the cluster.
Note the impact that an init script failure can have in the following situations:
Failure during cluster creation
If the init script fails on any node during cluster creation, the Secure Agent terminates the cluster.
Resolve the issues with the init script before running a job to start the cluster again.
Failure during a scale up event
If the init script fails on a node that is added to the cluster during a scale up event, the node fails to start and the cluster fails to scale up. If the cluster attempts to scale up again and the node still fails to start, each failure adds to the number of accumulated node failures until the Secure Agent terminates the cluster.
Failure while recovering a master node
If you enable high availability in an AWS environment and the init script fails on a recovered master node, the node fails to start and contributes to the number of accumulated node failures over the cluster lifecycle.
Accumulated failures over the cluster lifecycle
During the cluster lifecycle, the Secure Agent tracks the number of accumulated node failures that occur due to an init script within a certain time frame. If the number of failures is too high, the agent terminates the cluster.
Find the log files for the nodes where the init script failed and use the log files to resolve the failures before running a job to start the cluster again.
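
The idea of counting failures within a time frame can be illustrated with a sliding window. The following Python sketch is not the Secure Agent's implementation. The class name, the failure threshold, and the window length are all hypothetical, because this documentation does not state the actual values.

```python
from collections import deque

class InitFailureTracker:
    """Count init script failures that fall within a sliding time window."""

    def __init__(self, max_failures, window_seconds):
        self.max_failures = max_failures      # Hypothetical threshold.
        self.window_seconds = window_seconds  # Hypothetical time frame.
        self._failures = deque()

    def record_failure(self, timestamp):
        """Record a failure at the given time in seconds.

        Returns True when the failures inside the window exceed the
        threshold, which is the point where the cluster would be terminated.
        """
        self._failures.append(timestamp)
        # Drop failures that are older than the window.
        while timestamp - self._failures[0] > self.window_seconds:
            self._failures.popleft()
        return len(self._failures) > self.max_failures
```

With a threshold of 2 failures in 600 seconds, three failures in quick succession trip the tracker, while failures that are spread far apart do not accumulate.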

Updating the runtime environment or the staging location

To update the runtime environment or the staging location in an elastic configuration, perform one of the following tasks based on the status of the Secure Agent and the elastic cluster:
The Secure Agent and the elastic cluster are running.
If the agent and the cluster are running, complete the following tasks:
  1. Update the runtime environment or the staging location in the elastic configuration.
  2. Stop the cluster when you save the configuration.
The Secure Agent is unavailable or the elastic cluster cannot be reached.
If the agent is unavailable or the cluster cannot be reached, complete the following tasks:
  1. Run the command to delete the cluster, or log in to your account on the cloud platform and make sure that all cluster resources are deleted. For information about commands, see Command reference.
  2. Update the runtime environment or the staging location in the elastic configuration.
  3. Disable the cluster when you save the configuration.
Note: If you update the runtime environment, the new Secure Agent will create a new elastic cluster with a different cluster ID.

Accessing a new staging location

If you plan to use a new staging location, the Secure Agent must be able to access the location before you update the location in the elastic configuration.
To use the new staging location, complete the following tasks:
  1. Update the permissions of the managed identity that is assigned to the Secure Agent machine.
  2. Edit the staging location in the elastic configuration.