Our second Informatica Cloud #TechTuesdays session, "Accelerating Big Data Initiatives through Cloud Integration", looked at the growing use of Big Data for data warehousing projects. We cleared up the confusion around the many flavors of Big Data out there, such as Hadoop, and focused the discussion on Big Data providers in the cloud, such as Amazon RedShift.

 

Big Data has several use cases, such as data warehousing, predictive analytics, machine data, and OLTP, and we decided to tackle the data warehousing use case. Looking at the industries with the fastest adoption of Big Data, we found, not surprisingly, that the banking, media, and government verticals led the way (Source: Forbes, Gartner).

 

http://blogs-images.forbes.com/louiscolumbus/files/2012/08/big-data-heat-map-by-industry.jpg

 

During the session, we discussed that the main driver behind moving to cloud-based Big Data for data warehousing projects is the speed with which you can provision multiple database nodes. Other benefits include saving on the cost of provisioning multiple on-premise databases and being able to start petabyte-scale data warehousing projects a lot sooner. The demo itself touched on the following aspects of using Amazon RedShift:

 

  • Configuring RedShift for first-time use by downloading SQL Workbench
  • Ensuring that security groups were set up correctly
  • Writing to Amazon RedShift using the Informatica Cloud connector
  • Reading from Amazon RedShift using ODBC
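
For readers who want to try the ODBC read path outside of the demo, here is a minimal sketch in Python of how an ODBC connection string for a RedShift cluster might be assembled. This is not what was shown in the session: the helper name, the cluster endpoint, the database name, and the credentials are all hypothetical, and the driver name is an assumption that depends on which ODBC driver is installed on your machine. Port 5439 is the RedShift default.

```python
def redshift_odbc_connection_string(host, database, user, password,
                                    port=5439,
                                    driver="Amazon Redshift (x64)"):
    """Build a semicolon-separated ODBC connection string.

    The driver name is an assumption; use whatever name your
    installed ODBC driver is registered under.
    """
    parts = {
        "Driver": "{" + driver + "}",
        "Server": host,
        "Port": str(port),
        "Database": database,
        "UID": user,
        "PWD": password,
    }
    return ";".join(f"{key}={value}" for key, value in parts.items())


# Hypothetical endpoint and credentials, for illustration only.
conn_str = redshift_odbc_connection_string(
    host="examplecluster.example.us-east-1.redshift.amazonaws.com",
    database="dev",
    user="demo_user",
    password="demo_password",
)

# With an ODBC driver and the third-party pyodbc package installed,
# the string could then be passed to pyodbc.connect(conn_str).
```

If the connection fails, the security group setup from the second bullet is a likely culprit, since the cluster only accepts traffic from addresses the security group allows.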

 

Here are the slides from the session, and the entire recording featuring Vijay Narayanan, who takes care of product management for a lot of our new connectors.

 

Session 3, which focused on SAP integration, wrapped up this morning, and we'll have the slides and recording up soon.