
Hive connection properties

To use Hive Connector in a mapping task, you must create a connection in Data Integration.
When you set up a Hive connection, you must configure the connection properties.
The following table describes the Hive connection properties:
Connection property
Description
Authentication Type
You can select one of the following authentication types:
- Kerberos. Select Kerberos for a Kerberos cluster.
- LDAP. Select LDAP for an LDAP-enabled cluster.
  Note: LDAP is not applicable for elastic mappings.
- None. Select None for a cluster that is not secure or not LDAP-enabled.
JDBC URL *
The JDBC URL to connect to Hive.
Use the following format: jdbc:hive2://<server>:<port>/<database name>
To access Hive on a Hadoop cluster enabled for TLS, specify the details in the JDBC URL in the following format:
jdbc:hive2://<host>:<port>/<database name>;ssl=true;sslTrustStore=<TrustStore_path>;trustStorePassword=<TrustStore_password>
The truststore path is the directory path of the truststore file that contains the TLS certificate on the agent machine.
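For example, a TLS-enabled JDBC URL with illustrative host, database, and truststore values might look like the following:
jdbc:hive2://hiveserver.example.com:10000/sales;ssl=true;sslTrustStore=/opt/certs/truststore.jks;trustStorePassword=changeit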
JDBC Driver *
The JDBC driver class to connect to Hive.
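For example, for the Apache Hive JDBC driver, enter org.apache.hive.jdbc.HiveDriver.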
Username
The user name to connect to Hive in LDAP or None mode.
Password
The password to connect to Hive in LDAP or None mode.
Principal Name
The principal name to connect to Hive through Kerberos authentication.
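For example, a HiveServer2 service principal typically takes the form hive/_HOST@EXAMPLE.COM, where EXAMPLE.COM is an illustrative Kerberos realm.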
Impersonation Name
The user name of the user that the Secure Agent impersonates to run mappings on a Hadoop cluster. You can configure user impersonation to enable different users to run mappings or connect to Hive. The impersonation name is required for the Hadoop connection if the cluster uses Kerberos authentication.
Keytab Location
The path and file name of the keytab file for Kerberos login.
Configuration Files Path *
The directory that contains the Hadoop configuration files for the client.
Copy the site.xml files from the Hadoop cluster to a folder on the Linux machine that hosts the Secure Agent. Specify that path in this field before you use the connection in a mapping or elastic mapping to access Hive on a Hadoop cluster, as shown in the example after this list:
- For mappings, you need the core-site.xml, hdfs-site.xml, and hive-site.xml files.
- For elastic mappings, you need the core-site.xml, hdfs-site.xml, hive-site.xml, mapred-site.xml, and yarn-site.xml files.
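For example, if you copy the files to /opt/hadoop-conf on the agent machine (an illustrative path), enter /opt/hadoop-conf in this field.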
DFS URI *
The URI to access the Distributed File System (DFS), such as Amazon S3, Microsoft Azure Data Lake Storage, Google Cloud Storage, and HDFS.
Note: For elastic mappings, Azure Data Lake Storage Gen2 is supported on the Azure HDInsight cluster. Google Cloud Storage is not supported.
Based on the DFS you want to access, specify the required storage and bucket name.
For example, for HDFS, refer to the value of the fs.defaultFS property in the core-site.xml file of the cluster and enter the same value in the DFS URI field.
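For instance, the fs.defaultFS entry in core-site.xml typically looks like the following, where the nameservice value is illustrative:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://nameservice1</value>
</property>
In this case, you would enter hdfs://nameservice1 in the DFS URI field.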
DFS Staging Directory
The staging directory in the cluster where the Secure Agent stages the data. You must have full permissions for the DFS staging directory.
Hive Staging Database
The Hive database where external or temporary tables are created. You must have full permissions for the Hive staging database.
Additional Properties
The additional properties required to access the DFS.
Configure the property as follows:
<DFS property name>=<value>;<DFS property name>=<value>
For example:
To access the Amazon S3 file system, specify the access key, secret key, and the Amazon S3 property name, each separated by a semicolon:
fs.s3a.<bucket_name>.access.key=<access key value>;
fs.s3a.<bucket_name>.secret.key=<secret key value>;
fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
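For example, with an illustrative bucket named mybucket, the complete value of this field would be:
fs.s3a.mybucket.access.key=<access key value>;fs.s3a.mybucket.secret.key=<secret key value>;fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem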
To access the Azure Data Lake Storage Gen2 file system, specify the authentication type, authentication provider, client ID, client secret, and the client endpoint, each separated by a semicolon:
fs.azure.account.auth.type=<Authentication type>;
fs.azure.account.oauth.provider.type=<Authentication_provider>;
fs.azure.account.oauth2.client.id=<Client_ID>;
fs.azure.account.oauth2.client.secret=<Client-secret>;
fs.azure.account.oauth2.client.endpoint=<ADLS Gen2 endpoint>
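For example, for OAuth 2.0 client-credentials authentication with the standard Hadoop ABFS token provider, the values might look like the following, where the tenant ID, client ID, and client secret are placeholders:
fs.azure.account.auth.type=OAuth;
fs.azure.account.oauth.provider.type=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider;
fs.azure.account.oauth2.client.id=<Client_ID>;
fs.azure.account.oauth2.client.secret=<Client_secret>;
fs.azure.account.oauth2.client.endpoint=https://login.microsoftonline.com/<Tenant_ID>/oauth2/token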
* These fields are mandatory.
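To verify the JDBC URL, driver class, and credentials outside of Data Integration, you can test the connection with a small standalone Java program. The following is a minimal sketch, assuming the Apache Hive JDBC driver is on the classpath; the host, database, user name, and password are illustrative values for LDAP or None authentication:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveConnectionCheck {
    public static void main(String[] args) throws Exception {
        // Load the same driver class that you enter in the JDBC Driver property.
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Illustrative host, port, and database; replace with your cluster values.
        String url = "jdbc:hive2://hiveserver.example.com:10000/default";

        // The user name and password apply to LDAP or None authentication mode.
        try (Connection conn = DriverManager.getConnection(url, "hiveuser", "hivepassword");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            // List the tables in the database to confirm that the connection works.
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}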

Accessing multiple storage systems

Create a Hive connection to read data from or write data to Hive.
Use the DFS URI property in the connection parameters to connect to various storage systems. The following table lists each storage system and its DFS URI format:
Storage
DFS URI Format
HDFS
hdfs://<namenode>:<port>
where:
- <namenode> is the host name or IP address of the NameNode.
- <port> is the port on which the NameNode listens for remote procedure calls (RPC).
Use hdfs://<nameservice> when NameNode high availability is enabled.
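For example, hdfs://nn01.example.com:8020, where the host name and the commonly used NameNode RPC port 8020 are illustrative.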
WASB in HDInsight
wasb://<container_name>@<account_name>.blob.core.windows.net/<path>
where:
- <container_name> identifies a specific Azure Storage Blob container.
  Note: <container_name> is optional.
- <account_name> identifies the Azure Storage account.
Note: Not applicable for a Hive connection in elastic mappings.
Amazon S3
s3a://home
ADLS in HDInsight
adl://home
Note: Not applicable for a Hive connection in elastic mappings.
Azure Data Lake Gen2 in HDInsight
abfss://<container_name>@<storage_name>.dfs.core.windows.net
where:
- <container_name> identifies a specific Azure Data Lake Gen2 container.
- <storage_name> identifies the Azure Data Lake Gen2 storage account name.
Note: Applicable for a Hive connection only in elastic mappings.
When you create a cluster configuration from an Azure HDInsight cluster, the cluster configuration uses either ADLS or WASB as the primary storage. You can edit the DFS URI property in the Hive connection to connect to a local Hive location.

JDBC URL format

Hive Connector connects to the HiveServer2 component of the Hadoop cluster through JDBC.
Hive Connector uses the following JDBC URL format:
jdbc:hive2://<server>:<port>/<database name>
The JDBC URL format uses the following parameters:
- <server> is the host name or IP address of the machine on which HiveServer2 runs.
- <port> is the port on which HiveServer2 listens.
- <database name> is the name of the Hive database.
For example, jdbc:hive2://invrlx63iso7:10000/default connects to the Hive database named default through the HiveServer2 Thrift server that runs on the host invrlx63iso7 on port 10000.
Hive Connector uses the Hive Thrift server to communicate with Hive.