
How to Configure AWS Glue with Snowflake for Data Integration

The extract, transform, and load (ETL) process is part of every data warehousing project. Thanks to advances in cloud data warehouse architectures, customers can also benefit from the alternative extract, load, and transform (ELT) approach, in which data processing is pushed down to the database.

Overview 

With the variety of data integration tools available, choosing a single approach is difficult. Some teams go all in on either hand coding or a single tool, but many organizations opt for a mixture of hand-written code and off-the-shelf tooling.

Hand coding gives developers the freedom to build in their preferred languages while keeping a significant level of control over integration procedures and structures. The difficulty is that hand-coded solutions are frequently more complex and expensive to maintain.

Thanks to AWS Glue, developers can now quickly design and manage their data preparation and loading processes with generated code that is flexible, reusable, and portable, without having to acquire, set up, or maintain infrastructure.

Here is how you can use AWS Glue with Snowflake, along with how processing is optimized through query pushdown for ELT.

In this guide, you will learn how to configure AWS Glue with Snowflake for data integration: how to set up the necessary components and how to create a simple ETL job that extracts, transforms, and loads data from one database to another.

Some of the questions we will answer in this article are:

  • Why AWS Glue with Snowflake?
  • How to set up AWS Glue with Snowflake for Data Integration?
  • How to create a custom connector?
  • How to create a connection?
  • How to create a Job?

Why AWS Glue with Snowflake?

Snowflake customers can now enjoy easy and intuitive data integration processes without worrying about servers, Spark clusters, or the ongoing maintenance associated with these systems.

AWS Glue provides a fully managed environment that integrates seamlessly with Snowflake's data warehouse-as-a-service. Together, these technologies let you manage your data ingestion and transformation pipelines with greater ease and flexibility.

Users of AWS Glue and Snowflake also gain access to Snowflake's query pushdown feature, which automatically translates Spark workloads into SQL and pushes them down into Snowflake.

They don't need to worry about tuning Spark performance; they can concentrate on building their code and instrumenting their pipelines. With AWS Glue and Snowflake, they benefit from low-cost ELT processing that is simple to use and easy to maintain.
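To make this concrete, here is a minimal PySpark sketch of reading from Snowflake with the Snowflake Spark connector; the connection values and table name are placeholders, and the grouped count at the end is the kind of workload the connector can translate to SQL and run inside Snowflake:

# Minimal sketch of query pushdown via the Snowflake Spark connector.
# All connection values below are placeholders, not real credentials.
from pyspark.sql import SparkSession

SNOWFLAKE_SOURCE_NAME = "net.snowflake.spark.snowflake"

spark = SparkSession.builder.appName("glue-snowflake-pushdown").getOrCreate()

sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<username>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# Nothing is read until an action runs; the connector can then push
# the aggregation down so Snowflake returns only the grouped result.
df = (
    spark.read.format(SNOWFLAKE_SOURCE_NAME)
    .options(**sf_options)
    .option("dbtable", "ORDERS")
    .load()
)
df.groupBy("STATUS").count().show()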

Prerequisites:

  • The Latest Snowflake Spark Connector
  • The Latest Snowflake JDBC Driver
  • S3 bucket within the same region as AWS Glue


How to set up AWS Glue with Snowflake for Data Integration?

  1. Log in to AWS.
  2. Search for the S3 service and open it.
  • Create an S3 bucket and a folder inside it.
  • Upload the Snowflake JDBC driver and Spark Connector .jar files to the folder.
  • Create a second folder in the same bucket to serve as the Glue temporary directory in later steps.
  3. Switch to the AWS Glue service.
  4. In the ETL section on the left side, select Jobs.
  5. To add a job, choose the Spark script editor option, click Create, and then open the Job details tab:
  • Assign a name to the job.
  • Choose an IAM role. If a suitable IAM role doesn't already exist, create one and attach all of the Glue policies to it.
  • Choose Spark as the type.
  • Choose the Glue version.
  • Choose Python 3 as the language.
  • Click Advanced properties to expand that section.
  • Assign a name to the script.
  • Choose the temporary directory you created in step 2.
  • Under Libraries, add entries for both .jar files in the Dependent jars path.

Note: You must add the full path to the actual .jar files.

Example: s3://[bucket_name]/GlueJars/spark-snowflake_2.12-2.10.0-spark_3.1.jar,s3://[bucket_name]/GlueJars/snowflake-jdbc-3.13.14.jar

To set up Snowflake under Job parameters, enter the following information for your account. Make sure to include two dashes before each key.
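The exact keys are whatever your script reads (for example, via getResolvedOptions); a typical set of parameters might look like the following, where the key names and placeholder values are illustrative rather than prescribed:

--URL https://<account>.snowflakecomputing.com
--ACCOUNT <account>
--WAREHOUSE <warehouse>
--DB <database>
--SCHEMA <schema>
--USERNAME <username>
--PASSWORD <password>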

Note: Storing your login credentials and account information in this manner exposes them to anyone with access to this particular job. This can be convenient for testing, but it's advised that you store your credentials securely, for example in AWS Secrets Manager, as sketched below and as used when creating the connection later in this guide.
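Here is a small sketch of fetching credentials from Secrets Manager inside a Python script; the secret name and its JSON keys are assumptions for illustration, not values defined by this guide:

# Sketch: read Snowflake credentials from AWS Secrets Manager.
# The secret name "snowflake_credentials" and its JSON keys are
# illustrative assumptions.
import json
import boto3

def get_snowflake_secret(secret_name="snowflake_credentials",
                         region="us-east-1"):
    client = boto3.client("secretsmanager", region_name=region)
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response["SecretString"])

creds = get_snowflake_secret()
sf_user = creds["USERNAME"]
sf_password = creds["PASSWORD"]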

How to create a custom connector?

To implement data integration using AWS Glue with Snowflake, you first need to create a custom connector.

  1. Select Create custom connector from the Connectors drop-down menu on the AWS Glue Studio console.
  2. For the Connector S3 URL field, enter the S3 location where you uploaded the Snowflake JDBC connector JAR file.
  3. For Name, enter a name such as snowflake-jdbc-connector.
  4. Select JDBC as the connector type.
  5. Under Class Name, enter the Snowflake JDBC driver class name. Example:

 net.snowflake.client.jdbc.SnowflakeDriver

  6. For JDBC URL base, enter the following URL, substituting your account information:

jdbc:snowflake://<snowflake account info>/?user=${Username}&password=${Password}&warehouse=${warehouse}

  7. Enter & as the URL parameter delimiter and choose Create connector.

How to create a connection?

Follow these steps to create a JDBC connection to Snowflake:

  1. Select the connector on the Connectors page.
  2. Choose Create connection.
  3. For Name, enter a name such as snowflake-glue-jdbc-connection.
  4. For Description, enter text that describes the connection.
  5. Select default for the JDBC URL format.
    To supply your credentials, you can either enter a username and password directly or use Secrets Manager to keep them encrypted.
  6. For this guide, choose Use a secret under Data source credentials.
  7. For Secret, select the secret that holds your Snowflake credentials.
  8. To run a SQL statement in Snowflake, enter the following parameters under Additional URL parameters:
  • warehouse – The Snowflake virtual warehouse to use when running the query. Replace {warehouse} with a valid value.
  • db – The name of the Snowflake database.
  • schema – The schema of the Snowflake database.
  9. Make sure the JDBC URL is properly formatted.
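For reference, a fully formatted JDBC URL with the additional parameters filled in might look like the following; the account identifier, warehouse, database, and schema values here are placeholders:

jdbc:snowflake://myaccount.snowflakecomputing.com/?user=${Username}&password=${Password}&warehouse=GLUE_WH&db=SALES_DB&schema=PUBLIC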

How to Create a Job?

Mentioned below are the steps to be followed to create a job:

  1. Choose your connection on the Connectors page.
  2. Select Create job.
  3. For Name, enter a job name.
  4. For Description, enter a description of the job.
  5. For IAM Role, choose a role that can access both the S3 location where the Snowflake JDBC JAR file is stored and the target S3 location where the job will write its output.
  6. Use the default values for Type, Glue version, Language, Worker type, Worker count, Worker retries, and Job timeout.
  7. Disable the Job bookmark option.
  8. Save the job.
  9. On the Visual tab, go to the Data source properties - Connector tab and select the table or query to read from Snowflake.
  10. Select Save.
  11. To add a new S3 node for the destination, select the Add icon on the Visual tab.
  12. On the Node properties tab, make sure the new node is selected as the Target node.
  13. Set the S3 bucket location where AWS Glue will write the results on the Data target properties tab.
  14. Add an ApplyMapping transform to map the Snowflake column names to the destination columns.
  15. Save your preferences.
  16. Review the AWS Glue-generated script on the Script tab for confirmation (a rough sketch of what such a script can resemble follows these steps).
  17. Run the job and verify that the table data lands correctly in the designated S3 bucket location.
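As a point of reference, here is a rough PySpark sketch of a Glue job that reads from Snowflake through a custom JDBC connection, applies a mapping, and writes to S3; the connection name, query, mapping, and S3 path are placeholders, and the script Glue Studio generates for you is the authoritative version:

# Rough sketch of a Glue job using a custom JDBC connector for Snowflake.
# The connection name, query, mappings, and S3 path are placeholders.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read from Snowflake via the custom JDBC connection created earlier.
source = glueContext.create_dynamic_frame.from_options(
    connection_type="custom.jdbc",
    connection_options={
        "query": "SELECT * FROM ORDERS",
        "connectionName": "snowflake-glue-jdbc-connection",
    },
)

# Map Snowflake column names to the destination columns.
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("ORDER_ID", "string", "order_id", "string")],
)

# Write the result to the target S3 location as Parquet.
glueContext.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://<bucket_name>/output/"},
    format="parquet",
)
job.commit()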

Conclusion

You can easily start and manage your programmatic data integration activities using AWS Glue and Snowflake. AWS Glue can be used independently or with a data integration solution.

This approach, which uses native query pushdown through the Snowflake Spark connector, optimizes efficiency and cost for true ELT processing. Users of AWS Glue with Snowflake get a fully managed, optimized platform that can meet a variety of data integration needs.

We hope this article helped you understand the basics of how to set up AWS Glue with Snowflake for data integration.

Raju Meravath
Meravath Raju is a digital marketer and a passionate writer working with MindMajix, a top global online training provider. He also holds in-depth knowledge of IT and in-demand technologies such as Business Intelligence, Salesforce, Cybersecurity, Software Testing, QA, Data Analytics, Project Management, and ERP tools.
