7 Replies Latest reply on Sep 21, 2021 8:08 PM by Thiru S

    Azure SQL Data Warehouse Resources & Connections

    Rose Massela New Member

      What is the best practice for setting up a Azure SQL Data Warehouse Resources and Connection(s) when we want to capture the relationship between the assets contained within a Data Warehouse or Database?

       

      Is the best practice to create 1 ASDW Connection in the Admin Console at the Database level, and then create1 Resource per schema in Catalog Admin, and then use the 1 ASDW Connection created in the Admin Console as the Source Connection Name for each of the Resources?

       

      In other words, should we create 1 Connection and use it across multiple Resources with each Resource containing 1 schema or possibly multiple schemas?

       

      I don't know if the Database structure plays a role in this, but just in case it does, our Azure SQL Data Warehouse is structured as follows:

      • Database
        • => Tables
          • => Schema.tablename
            • Columns
            • Constraints
            • Indexes
            • Statistics

       

      The Schema is listed multiple times followed by dot TableName for each table associated with that schema.  For example under Table Names would display as:

      • SchemaABC.AlphabetBASE
        • Columns
        • Constraints
        • Indexes
        • Statistics
      • SchemaABC.LettersBASE
        • Columns
        • Constraints
        • Indexes
        • Statistics
      • SchemaABC.EnglishBASE
        • Columns
        • Constraints
        • Indexes
        • Statistics

       

      Whereas I believe the more traditional structure of a SQL DB would be:

      • Database
        • => Schema
          • => Tables
            • Columns
            • Constraints
            • Indexes
            • Statistics

      Where the Schema it listed once, and when you expand the schema, there are multiple TableNames listed below it, so the structure would display as:

      • SchemaABC
        • AlphabetBASE
          • Columns
          • Constraints
          • Indexes
          • Statistics
        • LettersBASE
          • Columns
          • Constraints
          • Indexes
          • Statistics
        • EnglishBASE
          • Columns
          • Constraints
          • Indexes
          • Statistics

       

      Does the DB structure affect the way Resources and Connections should be setup?

       

      Any assistance or insight you can provide would be much appreciated.

        • 1. Re: Azure SQL Data Warehouse Resources & Connections
          Darren Wrigley Guru

          the structure/model for Azure SQL DW is the same as other databases.  the Model is Database>Schema>Table|View>Column

          • 2. Re: Azure SQL Data Warehouse Resources & Connections
            Rose Massela New Member

            Actually, the connection properties SQL DW and Azure SQL DW are not the same.

            In the SQL connection properties the schema is optional, but with the Azure SQL DW connection properties the schema value is required.  Connection properties for SQL in Admin Console Azure SQL DW Connection properties

            The question is, given that the schema is required for the Azure SQL DW connection, what value goes in this field?  Is it best practice to create 1 connection and 1 resource per schema?

            • 3. Re: Azure SQL Data Warehouse Resources & Connections
              Darren Wrigley Guru

              for profiling connections - you do need to specify the schema - so you need to create 1 connection per schema.  for the EDC scanner, you can scan all schemas in a database, but if you enable profiling, then you really need to create 1 resource per schema.

              hopefully that restriction for profiling will change at some point

              • 4. Re: Azure SQL Data Warehouse Resources & Connections
                Rose Massela New Member

                Thank you, Darren.

                 

                When you say that you can scan all schemas in a database, can that be done with 1 source connection and 1 resource?

                • 5. Re: Azure SQL Data Warehouse Resources & Connections
                  Thiru S Seasoned Veteran

                  Could you confirm on the EDC version being used?

                   

                  From the test results of 10.5.x version, it could be confirmed that using one Azure SQL DW connection in Domain, we would be able to scan & profile multiple schemas from Azure SQL DW.

                   

                  Even if we give one schema name in Azure SQL DW connection level, when the profiling jobs are submitted from EDC, it would be having proper schema name prefixed to the Table queries and hence, the execution would work without any errors.

                   

                   

                  E.g.

                   

                   

                   

                  Azure SQL DW Resource in 10.5.x EDC

                   

                   

                   

                   

                  Azure SQL DW connection with 'dbo' schema

                   

                   

                   

                  Profiling Queries snippet for tables from 'edc' & 'infa' schemas:

                   

                  -- edc schema

                   

                  2021-09-03 08:47:31.374 <TASK_140117904488192-READER_1_1_1> INFO: [APPSDK_Msg_1762] extn =SATableReadCapabilityAttributesExtension  (hashCode=1922189173) (ContainerName=gcslabcontainer) (StagingFileFormat=Delimited Text) (FileSystemName=) (FieldDelimiter=0x1e) (ConcurrentBlobConnections=4) (BlobPartSize=8) (PreSQL=) (PostSQL=) (SQLOverride=SELECT TOP 10000  [LINE_ITEM_NO], [QUANTITY], [DISCOUNT], [PRODUCT_CODE], [INVOICE_NO] FROM [edc].[LINE_ITEM]) (QuoteChar=0x1f) (OnPrePostSQLError=Continue) (SchemaName=) (TableName=)

                   

                  -- infa schema

                   

                   

                  2021-09-03 08:45:07.393 <TASK_140117904488192-READER_1_1_1> INFO: [APPSDK_Msg_1762] Successfully got the ASO Operation %%%%%SAD_ASOOperation, name=emp (hashCode=1924140905)

                  2021-09-03 08:45:07.569 <TASK_140117904488192-READER_1_1_1> INFO: [APPSDK_Msg_1762] extn =SATableReadCapabilityAttributesExtension  (hashCode=1827957849) (ContainerName=gcslabcontainer) (StagingFileFormat=Delimited Text) (FileSystemName=) (FieldDelimiter=0x1e) (ConcurrentBlobConnections=4) (BlobPartSize=8) (PreSQL=) (PostSQL=) (SQLOverride=SELECT TOP 10000  [deptno], [mgr], [comm], [empno], [hiredate], [job], [sal], [ename] FROM [infa].[emp]) (QuoteChar=0x1f) (OnPrePostSQLError=Continue) (SchemaName=) (TableName=)

                  • 6. Re: Azure SQL Data Warehouse Resources & Connections
                    Rose Massela New Member

                    Hello Thiru,

                     

                    This information is very helpful and much appreciated.  We are currently on version 10.4.1 and installing version 10.5.

                    Will this work in version 10.4.1?

                    What about profiling?   Is 1 connection and 1 resource per schema still required to run profiling against  Azure SQL DW? 

                    • 7. Re: Azure SQL Data Warehouse Resources & Connections
                      Thiru S Seasoned Veteran

                      Hi Rose,

                       

                         I have validated the requirement in the 10.4.1 version.

                       

                      Based on the results, it could be confirmed that profiling of multiple schema tables using one Azure DW connection, is not possible in 10.4.1 version.  For 10.4.1 version, it would be required to create 1 connection & resource for each Azure DW schema, when profiling has to be performed.

                       

                      To achieve the use case of profiling multiple schema tables in Azure DW using one connection, it is recommended to upgrade to Informatica 10.5.x version.