The option to connect to Google BigQuery was introduced in SAP Data Services 4.2 SP4. The Google BigQuery application datastore allows SAP Data Services to access your Google projects on your behalf and to load data from Data Services into your Google project tables for Google BigQuery analysis.
With the next release, SAP Data Services 4.2 SP6, you can use the application datastore both to extract data from and to load data into your Google tables.
From SAP Data Services 4.2 SP7, Data Services adds support for Google Cloud Storage, so you can upload files to and download files from Google Cloud Storage or local storage, and load objects in Google Cloud Storage into Google BigQuery.
Moving on to SAP Data Services 4.2 SP8, the load_from_gcs_to_gbq function was added to help you transfer data from Google Cloud Storage into Google BigQuery tables. Also in SP8, you can create a Google BigQuery template table as a target in a data flow. When you execute the data flow, Data Services automatically creates the table in your Google account in the specified project and dataset.
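Under the hood, a GCS-to-BigQuery transfer like this corresponds to a BigQuery "load" job pointed at URIs in Google Cloud Storage. As a rough illustration only (this is my sketch of the public BigQuery REST job body, not the Data Services implementation, and the project, dataset, and bucket names are made up):

```python
def build_gcs_load_job(project, dataset, table, gcs_uris, source_format="CSV"):
    """Return a BigQuery jobs.insert request body that loads GCS files into a table."""
    return {
        "configuration": {
            "load": {
                "sourceUris": gcs_uris,              # e.g. ["gs://bucket/file_*.csv"]
                "sourceFormat": source_format,       # CSV, NEWLINE_DELIMITED_JSON, ...
                "destinationTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                "writeDisposition": "WRITE_APPEND",  # append to any existing rows
            }
        }
    }

# Hypothetical names for illustration only.
job = build_gcs_load_job("my-project", "sales", "orders",
                         ["gs://my-bucket/orders_*.csv"])
```

Posting a body of this shape to the BigQuery jobs endpoint (or passing the equivalent job configuration through a client library) is all a bulk GCS-to-table load amounts to.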
For performance optimization, you can now use the new Google built-in function gbq2file when you extract large volumes of data from Google BigQuery results to your local machine. This function exports the results of a Google BigQuery query to files in your Google Cloud Storage (GCS) and then transfers the data from GCS to a user-specified file on your local machine.
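Conceptually, the first stage of that two-stage export corresponds to a BigQuery "extract" job (the second stage is an ordinary GCS object download). Again a hedged sketch of the public REST job body rather than the Data Services internals, with made-up names:

```python
def build_extract_job(project, dataset, table, gcs_uri, fmt="CSV"):
    """Return a BigQuery jobs.insert body exporting a table to GCS (stage 1 of 2)."""
    return {
        "configuration": {
            "extract": {
                "sourceTable": {
                    "projectId": project,
                    "datasetId": dataset,
                    "tableId": table,
                },
                # A wildcard URI lets BigQuery shard large results into parts.
                "destinationUris": [gcs_uri],  # e.g. "gs://bucket/export/part-*.csv"
                "destinationFormat": fmt,
            }
        }
    }

# Hypothetical names; stage 2 would then download each resulting
# GCS object and concatenate them into the local target file.
job = build_extract_job("my-project", "sales", "big_results",
                        "gs://my-bucket/export/part-*.csv")
```

Exporting to GCS first and downloading from there is much faster for large result sets than paging rows out through the query API, which is presumably why the function works this way.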
SAP Data Services 4.2 SP9 adds further performance optimizations for large Google BigQuery data extractions. In the latest version as of this writing, SAP Data Services 4.2 SP10, Data Services adds support for the Date, Time, and Datetime data types for Google BigQuery.
This evolution puts us at a point where using Data Services in conjunction with Google BigQuery is a really attractive proposition for large data workloads that require integration with on-premises corporate data.
Big Data – Hadoop ecosystem & MongoDB
Connectivity to Hadoop Big Data was introduced in SAP Data Services 4.2. In subsequent versions, more connectivity options were added for a variety of Big Data ecosystem tools such as Hive and Impala, as well as core HDFS file exploration.
Starting with SAP Data Services 4.2 SP2, Data Services supported only Apache HiveServer2 and Hive version 0.11 and higher. From SAP Data Services 4.2 SP3 onward, the option to preview Hive table data was introduced. In SAP Data Services 4.2 SP4, MongoDB support was added: the MongoDB adapter allows you to read data from MongoDB and write it to other Data Services targets. After you create an adapter instance and a datastore, you can browse and import MongoDB entities, which are similar to database tables. Also in this version, you can preview HDFS file data in Data Services for delimited and fixed-width file types.
A lot was introduced for MongoDB and Hive in SAP Data Services 4.2 SP5.
This release included the following MongoDB enhancements: LDAP authentication, Kerberos authentication, support for sharded cluster connections, and the ability to connect to MongoDB using SSL with or without a PEM file. The Hive adapter datastore now supports the SQL function and transform. We can now push the JOIN operation down to Hive (using the Data_Transfer transform), and we can use a Secure Sockets Layer (SSL) connection when connecting to a Hive server.

In SAP Data Services 4.2 SP6, you can use MongoDB as a target in your data flows. A new MongoDB authentication type was added to the MongoDB adapter, and you can re-import a MongoDB schema into the Local Object Library.

Starting with SAP Data Services 4.2 SP9, Data Services can connect to your Hadoop cluster in the cloud. In the latest version, SAP Data Services 4.2 SP10, you can now connect to Apache Impala using the Cloudera ODBC driver, and you can use supported ODBC drivers (MapR, Hortonworks, Cloudera) to connect remotely to the Hive server.
Support for Microsoft Azure DW (Data Warehouse) was introduced in SAP Data Services 4.2 SP9.
In the latest version, SAP Data Services 4.2 SP10, you can create an Azure Data Lake Store file location object to access data in your Azure Data Lake Store and use that data as a source or target.
Support for Amazon S3 (Simple Storage Service) started in SAP Data Services 4.2 SP7.
Data Services now supports importing and exporting data to and from Amazon S3. You create a file location object that tells SAP Data Services where the file is located, making data transfer to S3 very easy. SAP Data Services 4.2 SP8 added connectivity to an Amazon Redshift cluster database on Windows or Linux platforms using the Amazon Redshift ODBC driver. After creating a Redshift database datastore, you can use Redshift tables as sources or targets in your data flows, preview data, create and import template tables, and load S3 data files into a Redshift table using the load_from_s3_to_redshift function, making Redshift a viable cloud data warehouse for SAP Data Services customers. With the latest release, SAP Data Services 4.2 SP10, Data Services now includes four server-side encryption methods for Amazon S3 data connectivity:
- Encryption Algorithm
- AWS KMS Key ID
- AWS KMS Encryption context
- Customer Key
You can choose the encryption method whenever you create a new Amazon S3 file location object or edit an existing one. FYI, the default is no encryption!
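These options line up with Amazon S3's standard server-side encryption request headers. As a hedged sketch of that mapping (my own illustration of the S3 API headers, not Data Services code — the method names and helper below are made up for clarity):

```python
import base64
import hashlib

def sse_headers(method, kms_key_id=None, kms_context=None, customer_key=None):
    """Build the x-amz-server-side-encryption-* headers for an S3 PUT request."""
    if method == "SSE-S3":
        # Encryption Algorithm: S3-managed keys (AES256).
        return {"x-amz-server-side-encryption": "AES256"}
    if method == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:   # AWS KMS Key ID
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        if kms_context:  # AWS KMS encryption context, sent base64-encoded
            headers["x-amz-server-side-encryption-context"] = base64.b64encode(
                kms_context.encode()).decode()
        return headers
    if method == "SSE-C":
        # Customer-provided key: the key and its MD5 digest, both base64-encoded.
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key":
                base64.b64encode(customer_key).decode(),
            "x-amz-server-side-encryption-customer-key-MD5":
                base64.b64encode(hashlib.md5(customer_key).digest()).decode(),
        }
    return {}  # no encryption requested, matching the Data Services default
```

With SSE-C in particular, S3 never stores the key, so losing it means losing the data — worth keeping in mind before picking that option for a file location object.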
As you can see, the evolution is ongoing, providing more and more functionality with a variety of Cloud and Big Data environments. It’s good to see SAP embracing these platforms and providing customers with choice, rather than only innovating on SAP platforms. I’m looking forward to what’s coming next!