Databricks Delta connection properties

When you create a Databricks Delta connection, you must configure the connection properties, such as the connection name, type, and runtime environment.
The following table describes the Databricks Delta connection properties:
Connection Name
The name of the connection. The name is not case sensitive and must be unique within the domain.
You can change this property after you create the connection. The name cannot exceed 128 characters, contain spaces, or contain the following special characters: ~ ` ! $ % ^ & * ( ) - + = { [ } ] | \ : ; " ' < , > . ? /
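These constraints are straightforward to check programmatically. The following Python sketch is a hedged illustration that encodes the documented naming rules; the helper name is hypothetical:

    # Characters that the documentation disallows in a connection name,
    # plus the space character.
    INVALID_CHARS = set('~`!$%^&*()-+={[}]|\\:;"\'<,>.?/ ')

    def is_valid_connection_name(name):
        """Check the documented rules: 1-128 characters, no spaces or special characters."""
        return 0 < len(name) <= 128 and not any(c in INVALID_CHARS for c in name)

    print(is_valid_connection_name("DatabricksDelta_Prod"))  # True
    print(is_valid_connection_name("bad name!"))             # False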
Description
Optional. Description of the connection.
The description cannot exceed 4,000 characters.
Type
Select Databricks Delta.
Runtime Environment
Name of the runtime environment that contains the Secure Agent to connect to Databricks Delta.
Databricks Host
Required. The host name of the endpoint to which the Databricks account belongs.
For example, if the JDBC URL for the cluster is:
jdbc:spark://dbc-40d21191-cf1b.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/5275227018426481/1015-063325-loge501;AuthMech=3;UID=token;PWD=<personal-access-token>
the host name is dbc-40d21191-cf1b.cloud.databricks.com.
Note: You can get the JDBC URL from the Databricks Delta analytics cluster or all-purpose cluster under Advanced Options > JDBC/ODBC.
Org Id
Required. The unique organization ID for the workspace in Databricks.
For example, if the JDBC URL for the cluster is:
jdbc:spark://dbc-40d21191-cf1b.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/5275227018426481/1015-063325-loge501;AuthMech=3;UID=token;PWD=<personal-access-token>
the organization ID is 5275227018426481.
Cluster ID
Required. The ID of the Databricks cluster. You can obtain the cluster ID from the JDBC URL.
For example, if the JDBC URL for the cluster is:
jdbc:spark://dbc-40d21191-cf1b.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/5275227018426481/1015-063325-loge501;AuthMech=3;UID=token;PWD=<personal-access-token>
the cluster ID is 1015-063325-loge501.
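The host name, organization ID, and cluster ID all come from the same JDBC URL, so you can extract them programmatically. The following Python sketch is illustrative only; it assumes the httpPath follows the sql/protocolv1/o/<org-id>/<cluster-id> pattern shown in the examples above:

    import re

    def parse_databricks_jdbc_url(url):
        """Extract host, org ID, and cluster ID from a Databricks JDBC URL."""
        host = re.search(r"jdbc:spark://([^:/;]+)", url).group(1)
        org_id, cluster_id = re.search(
            r"httpPath=sql/protocolv1/o/(\d+)/([\w-]+)", url
        ).groups()
        return host, org_id, cluster_id

    url = ("jdbc:spark://dbc-40d21191-cf1b.cloud.databricks.com:443/default;"
           "transportMode=http;ssl=1;"
           "httpPath=sql/protocolv1/o/5275227018426481/1015-063325-loge501;"
           "AuthMech=3;UID=token;PWD=<personal-access-token>")
    print(parse_databricks_jdbc_url(url))
    # ('dbc-40d21191-cf1b.cloud.databricks.com', '5275227018426481', '1015-063325-loge501')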
Databricks Token
Required. Personal access token to access Databricks.
You must have permissions to attach to the cluster identified in the Cluster ID property.
For mappings, you must have additional permissions to create data engineering clusters.
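Before you create the connection, you can optionally confirm that the token can reach the workspace and read the cluster by calling the Databricks Clusters REST API. A minimal Python sketch, reusing the hypothetical host and cluster ID from the examples above:

    import requests

    host = "dbc-40d21191-cf1b.cloud.databricks.com"   # Databricks Host
    token = "<personal-access-token>"                 # Databricks Token
    cluster_id = "1015-063325-loge501"                # Cluster ID

    # Succeeds only if the token is valid and can read the cluster.
    resp = requests.get(
        f"https://{host}/api/2.0/clusters/get",
        headers={"Authorization": f"Bearer {token}"},
        params={"cluster_id": cluster_id},
    )
    resp.raise_for_status()
    print(resp.json()["state"])   # for example, RUNNING or TERMINATED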
Database
Optional. The database in Databricks Delta that you want to connect to.
For Data Integration, by default, all databases available in the workspace are listed.
JDBC Driver Class Name
Required. The name of the JDBC driver class.
You must specify the driver class name as com.simba.spark.jdbc.Driver for mappings and elastic mappings.
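For reference, this same driver class name is what a JDBC client supplies when connecting to the cluster outside of Data Integration. The following is a hedged Python sketch using the jaydebeapi package; the URL and the local path to the Simba Spark driver jar are assumptions:

    import jaydebeapi

    # Hypothetical values; substitute your own JDBC URL and driver jar path.
    url = ("jdbc:spark://dbc-40d21191-cf1b.cloud.databricks.com:443/default;"
           "transportMode=http;ssl=1;"
           "httpPath=sql/protocolv1/o/5275227018426481/1015-063325-loge501;"
           "AuthMech=3;UID=token;PWD=<personal-access-token>")

    conn = jaydebeapi.connect(
        "com.simba.spark.jdbc.Driver",   # the driver class name from this property
        url,
        jars="/path/to/SparkJDBC42.jar",
    )
    cursor = conn.cursor()
    cursor.execute("SELECT 1")
    print(cursor.fetchall())
    conn.close()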
Cluster Environment
The cloud provider where the Databricks cluster is deployed.
You can select from the following options:
  - AWS
  - Azure
Default is AWS.
The connection attributes differ depending on the cluster environment you select. For more information, see the AWS and Azure cluster environment sections.
Min Workers
The minimum number of worker nodes to be used for the Spark job.
Required for mappings. The minimum value is 1.
Not applicable to elastic mappings.
Max Workers
Optional. The maximum number of worker nodes to be used for the Spark job.
To disable autoscaling, set Max Workers equal to Min Workers or leave Max Workers unset.
Not applicable to elastic mappings.
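Together, Min Workers and Max Workers determine whether the job cluster launches with a fixed size or with autoscaling. As a rough illustration, the following Python dicts sketch how the values map to the worker-count fields of a Databricks Clusters API 2.0 create request; the node type is a hypothetical placeholder:

    # Min Workers = 2, Max Workers = 8 -> the cluster autoscales between 2 and 8 workers.
    autoscaling_spec = {
        "spark_version": "7.3.x-scala2.12",   # DB Runtime Version (7.3 LTS)
        "node_type_id": "i3.xlarge",          # Worker Node Type (hypothetical)
        "autoscale": {"min_workers": 2, "max_workers": 8},
    }

    # Max Workers unset or equal to Min Workers -> a fixed-size cluster.
    fixed_spec = {
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,                     # Min Workers
    }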
DB Runtime Version
The Databricks runtime version.
Select 7.3 LTS from the list.
Not applicable to elastic mappings.
Worker Node Type
Required. The instance type of the machine used for the Spark worker node.
Not applicable to elastic mappings.
Driver Node Type
Optional. The instance type of the machine used for the Spark driver node. If you do not specify a value, the Worker Node Type value is used.
Not applicable to elastic mappings.
Instance Pool ID
Optional. The instance pool used for the Spark cluster.
If you specify the Instance Pool ID to run mappings, the following connection properties are ignored:
  - Driver Node Type
  - EBS Volume Count
  - EBS Volume Type
  - EBS Volume Size
  - Enable Elastic Disk
  - Worker Node Type
  - Zone ID
Not applicable to elastic mappings.
Enable Elastic Disk
Optional. Enable this option for the cluster to dynamically acquire additional disk space when the Spark workers are running low on disk space.
Not applicable to elastic mappings.
Spark Configuration
Optional. The Spark configuration to be used in the Databricks cluster. The configuration must be in the following format:
"key1"="value1";"key2"="value2";....
For example:
"spark.executor.userClassPathFirst"="False"
Not applicable to elastic mappings.
Spark Environment Variables
Optional. The environment variables that you need to export before launching the Spark driver and workers. The variables must be in the following format:
"key1"="value1";"key2"="value2";....
For example:
"MY_ENVIRONMENT_VARIABLE"="true"
Not applicable to elastic mappings.
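Spark Configuration and Spark Environment Variables share the same "key1"="value1";"key2"="value2" format. To sanity-check a value before saving the connection, a small parser such as this hedged Python sketch can confirm that the string splits into the expected pairs:

    def parse_kv_string(value):
        """Parse a '"key1"="value1";"key2"="value2"' string into a dict."""
        pairs = {}
        for item in filter(None, value.split(";")):
            key, _, val = item.partition("=")
            pairs[key.strip().strip('"')] = val.strip().strip('"')
        return pairs

    print(parse_kv_string('"spark.executor.userClassPathFirst"="False"'))
    # {'spark.executor.userClassPathFirst': 'False'}
    print(parse_kv_string('"MY_ENVIRONMENT_VARIABLE"="true";"DEBUG"="1"'))
    # {'MY_ENVIRONMENT_VARIABLE': 'true', 'DEBUG': '1'}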
AWS cluster environment
The following table describes the Databricks Delta connection properties applicable when you select the AWS cluster environment:
S3 Authentication Mode
Required. The authentication mode to access Amazon S3.
Default is Permanent IAM credentials.
S3 Access Key
Required. The key to access the Amazon S3 bucket.
S3 Secret Key
Required. The secret key to access the Amazon S3 bucket.
S3 Data Bucket
Required. The existing bucket where the Databricks Delta data is stored.
S3 Staging Bucket
Required. The existing bucket to store staging files.
Not applicable to elastic mappings.
S3 Service Regional Endpoint
Optional. The region-specific S3 endpoint to use when the S3 data bucket and the S3 staging bucket must be accessed through one, for example, s3.us-west-2.amazonaws.com.
Default is s3.amazonaws.com.
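To verify the S3 credentials, buckets, and endpoint before you configure the connection, you can run a quick check with boto3. This sketch is illustrative only; the keys, bucket names, and regional endpoint are placeholders:

    import boto3

    s3 = boto3.client(
        "s3",
        aws_access_key_id="<s3-access-key>",                # S3 Access Key
        aws_secret_access_key="<s3-secret-key>",            # S3 Secret Key
        endpoint_url="https://s3.us-west-2.amazonaws.com",  # S3 Service Regional Endpoint
    )

    # head_bucket raises a ClientError if a bucket is missing or inaccessible.
    s3.head_bucket(Bucket="<s3-data-bucket>")      # S3 Data Bucket
    s3.head_bucket(Bucket="<s3-staging-bucket>")   # S3 Staging Bucket
    print("Both buckets are reachable with these credentials.")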
Zone ID
Required only if you want to create the Databricks job cluster in a particular zone at run time, for example, us-west-2a.
Note: The zone must be in the same region where your Databricks account resides.
Not applicable to elastic mappings.
EBS Volume Type
Optional. The type of EBS volumes launched with the cluster.
Not applicable to elastic mappings.
EBS Volume Count
Optional. The number of EBS volumes launched for each instance. You can choose up to 10 volumes.
Note: In a Databricks Delta connection, you must specify at least one EBS volume for node types with no instance store; otherwise, cluster creation fails.
Not applicable to elastic mappings.
EBS Volume Size
Optional. The size of a single EBS volume in GiB launched for an instance.
Not applicable to elastic mappings.
Azure cluster environment
The following table describes the Databricks Delta connection properties applicable when you select the Azure cluster environment:
ADLS Storage Account Name
Required. The name of the Microsoft Azure Data Lake Storage account.
ADLS Client ID
Required. The ID of your application to complete the OAuth Authentication in the Active Directory.
ADLS Client Secret
Required. The client secret key to complete the OAuth Authentication in the Active Directory.
ADLS Tenant ID
Required. The directory (tenant) ID of the Azure Active Directory that you use to read or write data.
ADLS Endpoint
Required. The OAuth 2.0 token endpoint where authentication based on the client ID and client secret is completed.
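As an illustration of how the four ADLS OAuth values fit together, the following hedged Python sketch requests a token through the client credentials flow. The endpoint shown follows the common Azure Active Directory token endpoint pattern; use the exact endpoint value from your Azure portal:

    import requests

    tenant_id = "<adls-tenant-id>"   # ADLS Tenant ID
    # Typical ADLS Endpoint value: the Azure AD OAuth 2.0 token endpoint.
    endpoint = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"

    resp = requests.post(endpoint, data={
        "grant_type": "client_credentials",
        "client_id": "<adls-client-id>",          # ADLS Client ID
        "client_secret": "<adls-client-secret>",  # ADLS Client Secret
        "resource": "https://storage.azure.com/",
    })
    resp.raise_for_status()
    print("Token acquired:", resp.json()["access_token"][:20], "...")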
ADLS Data Filesystem Name
Required. The name of an existing file system where the Databricks Delta data is stored.
ADLS Staging Filesystem Name
Required. The name of an existing file system where the staging data is stored.
Not applicable to elastic mappings.
The following properties are required to launch the job cluster at run time for a mapping task: