This blog post shows how to call a web service in BDM (Big Data Management) using Spark.

We will be using the Python transformation introduced in BDM 10.2.1 to call the web service.

 

The Java transformation is another option for calling the web service.

 

Prerequisites

 

Python and the jep package need to be installed on the BDM Data Integration Service (DIS) server; refer to the installation documentation to configure the Python transformation with BDM.
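If pip is available for the Python that BDM will use, a minimal way to pull in jep, plus the requests library used by the mapping code later, and to confirm the install (note that building jep requires a JDK on the machine):

pip install jep requests
python -c "import requests; print(requests.__version__)"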

 

After the Python installation, edit the Hadoop connection by going to Window --> Preferences --> Connections and clicking on your Hadoop connection.


Edit the Hadoop connection and go to the Spark tab.


Under the Spark tab, go to Advanced Properties and click the Edit button. The first three properties in the screenshot are the Python properties that come by default; fill in the values for those three properties according to your Python installation.
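For reference, two of the default Python properties and the kind of values they take are shown below. The property names are from the Informatica documentation; the paths are illustrative and must match the Python installation on your cluster nodes. Set the remaining default Python property the same way, per your installation.

infaspark.pythontx.exec=/usr/local/python3/bin/python3
infaspark.pythontx.executorEnv.PYTHONHOME=/usr/local/python3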


Web Service Details


We will be using the following web service to get the states for any given country:

 

http://services.groupkt.com/state/get

 

For example, if we pass the country "USA" to the above URL, the web service will return all the state information for the USA along with other details like area, capital, largest city, etc.

 

To test the web service for the USA, open the following URL in your browser; it will return JSON output.

 

http://services.groupkt.com/state/get/USA/all
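The response is a JSON document with the state records nested under a result list. An abbreviated, illustrative example (the field names match what we map to Hive later):

{
  "RestResponse": {
    "result": [
      {
        "id": 1,
        "country": "USA",
        "name": "Alabama",
        "abbr": "AL",
        "capital": "Montgomery",
        "largest_city": "Birmingham",
        "area": "135767SKM"
      }
    ]
  }
}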


Calling the Web Service in a BDM Mapping


We will create two mappings in BDM.

 

In the first mapping, we will pass the country names from an input file, use the Python transformation to call the web service, and finally write the output to an HDFS file. The output will be a JSON file.

 

In the second mapping, we will parse the JSON output from the above mapping and write it to Hive.


Mapping 1:

 

We have an input file on HDFS containing country names. We will pass the country names from this file to the web service to get the states.
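For illustration, the file contents might look like the following, one country name per line (the groupkt service accepts values such as USA, IND, and AUS in the URL):

USA
IND
AUS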

 

Create a flat file data object in the Developer client. In the advanced properties, go to the Read section and point it to the HDFS connection and directory.


Create a new mapping, drag the flat file object into the mapping, and choose the Read operation.


Add a Python transformation to the mapping and drag country_name to the input of the Python transformation.

Create an output port for the Python transformation and call it states_data_json. The Python transformation ports should look like below.
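Note that the Python transformation exposes its ports to the code as Python variables with the same names, so the port names must exactly match the identifiers used in the code: one input port, country_name, and one output port, states_data_json (both string ports in our mapping).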


Go to the Python tab of the Python transformation and add the following code:


import requests
import json

# country_name is the input port of the Python transformation
input_string = country_name
input_url = "http://services.groupkt.com/state/get/"

# Call the web service and assign the serialized JSON response
# to the states_data_json output port
states = requests.get(input_url + input_string + "/all")
states_data = states.json()
states_data_json = json.dumps(states_data)
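To sanity-check this logic outside BDM, you can run essentially the same code as a standalone script, supplying the input value yourself (the variable names mirror the transformation ports):

import requests
import json

country_name = "USA"  # stands in for the BDM input port
input_url = "http://services.groupkt.com/state/get/"

states = requests.get(input_url + country_name + "/all")
states_data_json = json.dumps(states.json())
print(states_data_json[:200])  # first part of the JSON payload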


Connect the output port of the Python transformation to a flat file data object writing to HDFS.


The target data object properties look like below.


Change the execution mode of the mapping to run on Spark.


Execute the mapping and verify its status in the Administrator console.


Verify the output of the mapping on HDFS; you will see the output in JSON format.
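For example, assuming the flat file target was written to /user/infa/ws_states_out (a placeholder path; use the directory you configured), you can inspect the result from the command line:

hdfs dfs -ls /user/infa/ws_states_out
hdfs dfs -cat /user/infa/ws_states_out/part-* | head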


Mapping 2:

In this mapping, we will parse the JSON output file from the previous mapping to give it a relational structure.
Create a complex file data object by right-clicking on Physical Data Objects -> New -> Physical Data Object.


Choose Complex File Data Object and click Next.


Name the complex file data object "cfr_states", click the Browse button under Connection and choose your HDFS connection, then under "Selected Resources" click the Add button.


In the Add Resource dialog, navigate to the HDFS file location (this is the output file location we gave in the previous mapping), click on the JSON file, and click OK. Note that Spark may write the output as one or more part files inside the target directory, so select the part file that contains the JSON.


Click Finish on the next step.


Now create a Data Processor transformation by right-clicking on Transformations -> New -> Transformation.


Choose the Data Processor transformation from the list of transformations.


Name the Data Processor transformation "dp_ws_state" and choose "Create a data processor using a wizard".


Since the input to the Data Processor transformation is coming in as JSON, choose JSON in the next step and click Next.


Make sure you have sample output from the first mapping on the developer machine, choose the "Sample JSON file" option, browse to the sample JSON file, and click Next.


Choose relational output and click Finish.


After you click the Finish button, the Data Processor transformation will look like below.

 

Create a Hive table using the following DDL in your target database and import the Hive table as a relational data object into the Developer client. The FKey_states column holds the key that the Data Processor generates to link the relational output rows back to their parent record.

 

CREATE TABLE infa_pushdown.ws_states (
    FKey_states BIGINT,
    id DOUBLE,
    country STRING,
    name STRING,
    abbr STRING,
    area STRING,
    largest_city STRING,
    capital STRING
);


Now drag the complex file reader, the Data Processor transformation, and the Hive target into the mapping. Then connect the data port from the complex file reader (CFR) to the input of the Data Processor, and the output of the Data Processor to the Hive target.

 

The final mapping should look like below.


This mapping was tested in BDM 10.2.1, and in this version the Data Processor transformation is not supported in Spark mode, so we will run the second mapping using Blaze. Once Data Processor support is added on Spark, the second mapping can be eliminated by adding the Data Processor transformation to the first mapping. The screenshot shows Blaze as the execution engine.


Execute the mapping and verify the output of the target table by running the Data Viewer on the target data object.
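A query like the following (run from your Hive client) provides the same check from the Hive side:

SELECT country, name, abbr, capital
FROM infa_pushdown.ws_states
LIMIT 10;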