This blog post shows how to call a web service from BDM using Spark.
We will use the Python transformation, introduced in BDM 10.2.1, to call the web service.
The Java transformation is another option for calling the web service.
Python and the jep package must be installed on the BDM DIS server. Refer to the installation documentation to configure the Python transformation with BDM.
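Before configuring the connection, it can help to confirm that the required packages are visible to the Python installation on the DIS server. The snippet below is a minimal sketch, assuming it is run with the same Python interpreter that BDM is configured to use. The helper name `missing_modules` is made up for this post, and `requests` is included because the transformation code later in this post relies on it.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # jep is required by the Python transformation; requests is used by the
    # web service call shown later in this post.
    missing = missing_modules(["jep", "requests"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("jep and requests are both importable")
```

If anything is reported as missing, install it into that same Python installation before continuing.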
After installing Python, edit the Hadoop connection: go to Window --> Preferences --> Connections and click your Hadoop connection.
Edit the Hadoop connection and go to the Spark tab.
On the Spark tab, under Advanced Properties, click the Edit button. The first 3 properties in the screenshot are the Python properties that are present by default. Enter values for those 3 properties that match your Python installation.
We will use the following web service to get the states for any given country.
For example, if we pass the country “USA” to the above URL, the web service returns information about every state in the USA, along with details such as area, capital, and largest city.
To test the web service for USA, open the following URL in your browser. It returns JSON output.
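The same check can be scripted instead of using a browser. This is a minimal sketch using only the standard library. The base URL and the `/all` suffix are taken from the transformation code later in this post, the function names `build_states_url` and `fetch_states` are made up for illustration, and it assumes the service is reachable from your machine.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://services.groupkt.com/state/get/"


def build_states_url(country):
    """Build the request URL for one country, e.g. "USA"."""
    return BASE_URL + quote(country.strip(), safe="") + "/all"


def fetch_states(country, timeout=10):
    """Call the web service and return the decoded JSON document."""
    with urlopen(build_states_url(country), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Only builds and prints the URL; call fetch_states("USA") to hit the service.
print(build_states_url("USA"))
```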
Calling the web-service in BDM Mapping
We will create 2 mappings in BDM
In the first mapping we will pass the country names from an input file, use the Python transformation to call the web service, and write the output to an HDFS file. The output will be a JSON file.
In the second mapping we will parse the JSON output from the first mapping and write it to Hive.
We have an input file on HDFS with the following contents. We will pass the country names from this input file and get the states.
Create a flat file data object in the Developer client. In the advanced properties, go to the Read section and point it to the HDFS connection and directory.
Create a new mapping, drag the flat file object into the mapping, and choose the read operation.
Add a Python transformation to the mapping and drag country_name to the input of the Python transformation.
Create an output port for the Python transformation and call it states_data_json. The Python transformation ports should look like below.
Go to the Python tab of the Python transformation and add the following code:
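# Imports needed by the code below (requests must be installed on the cluster nodes)
import json
import requests

# country_name is the input port; states_data_json is the output port
input_string = country_name
input_url = "http://services.groupkt.com/state/get/"
states = requests.get(input_url + input_string + "/all")
states_data = states.json()
states_data_json = json.dumps(states_data)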
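The code above assumes every call succeeds. As a hedged variation (not part of the original mapping), the call can be wrapped so that a timeout or HTTP error yields an error document instead of failing the row. The function name `states_json_for`, the 10-second timeout, and the shape of the error document are all assumptions made for this sketch. The HTTP function is passed in so the logic can be tried without the cluster.

```python
import json

BASE_URL = "http://services.groupkt.com/state/get/"


def states_json_for(country_name, http_get, timeout=10):
    """Return the service response as a JSON string, or an error JSON string."""
    country = (country_name or "").strip()
    if not country:
        return json.dumps({"error": "empty country_name"})
    try:
        # http_get is expected to behave like requests.get
        response = http_get(BASE_URL + country + "/all", timeout=timeout)
        response.raise_for_status()  # turn HTTP 4xx/5xx into an exception
        return json.dumps(response.json())
    except Exception as exc:
        # Keep the row and record why the call failed
        return json.dumps({"error": str(exc), "country": country})
```

Inside the Python transformation this would be used as `states_data_json = states_json_for(country_name, requests.get)`, with `import requests` alongside the function definition.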
Connect the output port of the Python transformation to a flat file data object that writes to HDFS.
The target data object properties look like below.
Change the execution mode of the mapping to run on Spark.
Execute the mapping and verify the status of the mapping in the admin console.
Verify the output of the mapping on HDFS. You will see the output in JSON format.
In this mapping we will parse the JSON output file from the previous mapping into a structured form.
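To make "structured" concrete, here is a small Python sketch of the kind of flattening the second mapping performs. It only illustrates the idea and is not how the Data Processor transformation works internally. The envelope keys `RestResponse` and `result`, the field names, and the sample record are assumptions about the response shape, so adjust them to match your actual output file.

```python
import json


def flatten_states(document, fields=("name", "capital", "largest_city")):
    """Turn one service response into flat rows, one dict per state."""
    # Assumed shape: {"RestResponse": {"result": [ {...}, {...} ]}}
    records = document.get("RestResponse", {}).get("result", [])
    return [{field: record.get(field) for field in fields} for record in records]


# Hypothetical sample in the assumed shape, for illustration only
sample = json.loads(
    '{"RestResponse": {"result": [{"name": "Alabama", '
    '"capital": "Montgomery", "largest_city": "Birmingham"}]}}'
)

for row in flatten_states(sample):
    print(row)
```

Each row corresponds to one record that the second mapping would write to the Hive table.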
Create a complex file data object by right-clicking Physical Data Objects -> New -> Physical Data Object.
Choose complex file data object and click Next.
Name the complex file data object “cfr_states”, click the Browse button under Connection, and choose your HDFS connection. Then, under “Selected Resources”, click the Add button.
In the Add Resource dialog, navigate to the HDFS file location (the output file location we gave in the previous mapping), select the JSON file, and click OK.
Click Finish on the next step.
Now create a Data Processor transformation by right-clicking Transformations -> New -> Transformation.
Choose Data Processor transformation from the list of transformations.
Name the Data Processor transformation “dp_ws_state” and choose the “create a data processor using a wizard” option.
Since the input to the Data Processor transformation is JSON, choose JSON in the next step and click Next.
Make sure you have sample output from the first mapping on the developer machine. Choose the “sample json file” option, browse to the sample JSON file, and click Next.
Choose relational output and click Finish.
After you click Finish, the Data Processor transformation will look like below.
Create a Hive table in your target database using the following DDL, and import the Hive table into the Developer client as a relational data object.
CREATE TABLE infa_pushdown.ws_states (
Now drag the complex file reader, the Data Processor transformation, and the Hive target into the mapping. Then connect the data port of the complex file reader to the input of the Data Processor transformation, and the output of the Data Processor transformation to the Hive target.
The final mapping should look like below.
The mapping was tested in BDM 10.2.1. In this version the Data Processor transformation is not supported in Spark mode, so we will run the second mapping using Blaze. Once Data Processor support is added for Spark, the second mapping can be eliminated by adding the Data Processor transformation to the first mapping. The screenshot shows Blaze as the execution engine.
Execute the mapping and verify the output in the target table by running the data viewer on the target data object.