
Security

Different forms of security protect the data that is used for processing, including access to your account and to the data that you hold with your cloud provider.

AWS identities

Identities on AWS take on different responsibilities in cluster operations based on the level and type of security that you configure.
You can use the following identities in cluster operations:
kops role
The kops role is an IAM role that contains the elevated permissions required for cluster management.
The Secure Agent assumes the kops role through the Secure Agent role and uses the kops role to make requests to services on AWS, as shown in the sketch at the end of this section.
For a list of the required permissions, see Create IAM roles.
Secure Agent role
The Secure Agent role is an IAM role that passes role information to the Secure Agent machine.
The Secure Agent role must be able to access the log location. If you configure role-based security for direct access to Amazon data sources, the Secure Agent role must also have access to the sources and targets that are used in jobs.
Master and worker roles
The master and worker roles are IAM roles that are used to pass permissions to the master and worker nodes in an elastic cluster.
The nodes use the role permissions to run the Spark applications that process the data in an elastic job. If you use credential-based security for direct access to Amazon data sources, the nodes also use the connection-level AWS credentials to access resources in the job.
An AWS administrator can create user-defined master and worker roles, or the Secure Agent can automatically create default master and worker roles. If an administrator creates user-defined roles, the master and worker roles, as well as the kops role, must exist in the same AWS account.
If you use default master and worker roles in addition to role-based security for direct access to Amazon data sources, the Secure Agent identifies the policies that are attached to the Secure Agent role and passes the policies to the worker role.
To view the permissions that the master and worker roles require, run the generate-policies-for-userdefined-roles.sh command. For more information, see generate-policies-for-userdefined-roles.sh.
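The following sketch illustrates the role chain that the kops role description refers to, assuming a Python environment with boto3 on the Secure Agent machine. The role ARN, session name, and follow-up EC2 call are placeholders for illustration, not values or steps defined by this guide, and the Secure Agent performs this role assumption for you.

import boto3

# The agent machine authenticates as the Secure Agent role, so this STS client
# already carries the Secure Agent role identity.
sts = boto3.client("sts")

# Assume the kops role to obtain temporary credentials with the elevated
# permissions required for cluster management.
response = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/kops-role",  # placeholder kops role ARN
    RoleSessionName="cluster-management",                # placeholder session name
)
credentials = response["Credentials"]

# The temporary kops-role credentials are then used for requests to AWS
# services, for example to inspect the EC2 instances that back the cluster.
ec2 = boto3.client(
    "ec2",
    aws_access_key_id=credentials["AccessKeyId"],
    aws_secret_access_key=credentials["SecretAccessKey"],
    aws_session_token=credentials["SessionToken"],
)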

Direct and indirect access to resources

During data processing and cluster operations, resources are accessed differently based on whether the connectors have direct access to Amazon data sources.

With direct access

A connector with direct access to Amazon data sources is a connector that uses connection-level AWS credentials to access data.
For example, Amazon S3 V2 Connector and Amazon Redshift V2 Connector require you to provide access and secret keys for your Amazon account. The connectors use the keys to access data on S3 or Redshift.
If a connector has direct access to Amazon data sources, you can define either of the following security types:
Credential-based security
Credential-based security reuses the connection-level AWS credentials that access data sources and uses the same credentials to access staging and log locations.
When you design an elastic mapping or preview data, the Secure Agent reads the credentials to access the data sources.
When you run an elastic job, the nodes in the elastic cluster read the credentials to access resources, including Amazon data sources and the staging location. The worker role is used to access the log location.
Credential-based security overrides role-based security. If any source or target in the job provides AWS credentials, the credentials are reused to access the staging location. For example, if a job uses a JDBC V2 source and an Amazon S3 V2 target, the AWS credentials that are used to access the S3 target are reused to access the staging location for the job.
Role-based security
Role-based security uses AWS roles to access Amazon data sources, the staging location, and the log location.
When you design an elastic mapping or preview data, the Secure Agent role is used to access the data sources.
When you run an elastic job, the worker role is used to access the data sources, the staging location, and the log location.
For information about configuring credential-based security and role-based security, see Define security for direct access to Amazon data sources.
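The following sketch contrasts the two security types as plain boto3 calls against the staging location. It is an illustration only: the bucket name, prefix, and keys are placeholders, and in practice the Secure Agent and the cluster nodes perform this access for you.

import boto3

# Credential-based security: the connection-level access key and secret key
# that access the data sources are reused to reach the staging location.
s3_with_credentials = boto3.client(
    "s3",
    aws_access_key_id="<connection access key>",      # placeholder
    aws_secret_access_key="<connection secret key>",  # placeholder
)
s3_with_credentials.list_objects_v2(Bucket="staging-bucket", Prefix="staging/")

# Role-based security: no keys are supplied. The client picks up the IAM role
# attached to the machine, which corresponds to the Secure Agent role at design
# time and the worker role when an elastic job runs.
s3_with_role = boto3.client("s3")
s3_with_role.list_objects_v2(Bucket="staging-bucket", Prefix="staging/")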

Without direct access

Connectors without direct access to Amazon data sources either do not require AWS credentials or do not connect to AWS at all. Instead, these connectors use other connection properties to access data sources.
For example, JDBC V2 Connector uses a driver to query data on Amazon Aurora and does not directly access the underlying data. To access the data, the connector requires that you provide the user name and password for an Aurora database in the connection properties.
When you run an elastic job without connectors that have direct access to Amazon data sources, the worker role is used to access staging and log locations.
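The following sketch shows the pattern that the JDBC V2 example describes: the data is reached through a database driver and database credentials rather than AWS keys or roles. The psycopg2 driver and all connection values are assumptions made for this illustration; the JDBC V2 Connector uses its own JDBC driver internally.

import psycopg2

# Connect to an Aurora PostgreSQL endpoint with the database user name and
# password from the connection properties. No AWS credentials are involved.
connection = psycopg2.connect(
    host="aurora-cluster.cluster-example.us-east-1.rds.amazonaws.com",  # placeholder endpoint
    dbname="sales",          # placeholder database
    user="db_user",          # database user name from the connection properties
    password="db_password",  # database password from the connection properties
)
with connection.cursor() as cursor:
    cursor.execute("SELECT COUNT(*) FROM orders")  # placeholder query
    print(cursor.fetchone())
connection.close()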

Data encryption

Encryption protects the data that is used to process elastic jobs. You can use encryption to protect data at rest, temporary data, and data in transit.
Encryption is available for the following types of data:
Data at rest
You can use the server-side encryption options on Amazon S3 to encrypt data at rest, including source and target data, staging data, and log files.
For more information about encrypting staging data and log files, see Encrypt staging data and log files at rest.
For information about encrypting source and target data, see the help for the appropriate connector in the Data Integration help.
Note: If you configure an encryption-related custom property in an Amazon S3 V2 connection, the Spark engine uses the same custom property to read and write staging data.
Temporary data
Temporary data includes cache data and shuffle data that the Spark engine generates on cluster nodes.
To encrypt temporary data, enable encryption in the elastic configuration. If you enable encryption, temporary data is encrypted using the HMAC-SHA1 algorithm by default. To use a different algorithm, contact Informatica Global Customer Support.
Data in transit
By default, data in transit to and from Amazon S3, including staging data and log files, is encrypted using the Transport Layer Security (TLS) protocol.
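The following sketch illustrates the protections described above with a direct boto3 call that writes an object under S3 server-side encryption. The bucket, key, and KMS key alias are placeholders; the Secure Agent and the Spark engine apply the encryption that you configure, so you do not make this call yourself.

import boto3

# boto3 uses HTTPS endpoints by default, so the request itself is protected
# in transit with TLS.
s3 = boto3.client("s3")

# Write a staging object with SSE-KMS server-side encryption. Passing
# ServerSideEncryption="AES256" would select SSE-S3 instead.
s3.put_object(
    Bucket="staging-bucket",            # placeholder staging bucket
    Key="staging/part-00000.parquet",   # placeholder staging object
    Body=b"...",                        # data written by the job
    ServerSideEncryption="aws:kms",
    SSEKMSKeyId="alias/staging-key",    # placeholder KMS key alias
)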