Hot

6/recent/ticker-posts

What happens to Data on the fly when migrating from SQL to BigQuery using StitchData

If you happen to come across my previous article about creating a data warehouse in BigQuery, you came to know about the third party SaaS named Stitch. 


So I’ve ended the last post, hoping to write a brand new one about this particular topic.


Stitch Dashboard

So, as you can see, the above screenshot will summarise our job. Now, We want Stitch to know that the destination of the process will be Google BigQuery. Click on Destination tab on top left corner select Google BigQuery as the destination.

Selecting destination

Make sure you have the following:

  • Admin permissions for BigQuery and Google Cloud Storage.
  • Stitch requires that your user have BigQuery Admin and Storage Admin permissions.
  • Billing enabled, with a credit card attached.
  • Even if you’re using BigQuery’s free trial, billing must still be enabled for Stitch to load data.
  • Access to an existing Google Cloud Platform project.
  • Stitch won’t be able to create a project for you.
  • To continue, sign in with your Google account and select a project.

After you’ve logged in, select project name and location for your dataset.

Select Project Name and Location

Click here to learn more about Cloud Storage Location details.

Click here to read more about connecting Google BigQuery to Stitch.

Ok so enough talk on the destination and let’s move on to out Topic here.

Buckets

The Buckets resource represents a bucket in Google Cloud Storage. There is a single global namespace shared by all buckets. For more information, see Bucket Name Requirements.

Buckets contain objects which can be accessed by their own methods. In addition to the aclproperty, buckets contain bucketAccessControls, for use in fine-grained manipulation of an existing bucket's access controls.

So that’s what a bucket is about. Alright, So here comes the question of the day: how come a bucket fits in to our data migration? Do we have to create a bucket before the data gets loaded in to our BigQuery database? Or does it get created eventually?

OK, now comes the interesting part. (*Whispering*: Read carefully)

Before your data is loaded into BigQuery, Stitch’s replication engine will replicate, process, and prepare data from your various integrations and temporarily move it into a Google Cloud Storage (GCS) bucket. This Cloud Storage bucket is automatically created by Stitch but owned by you. Stitch files are deleted immediately after data is loaded into BigQuery, so the storage costs associated with a Cloud Storage bucket should be negligible.

This Cloud Storage bucket is automatically created by Stitch but owned by you. Stitch files are deleted immediately after data is loaded into BigQuery, so the storage costs associated with a Cloud Storage bucket should be negligible.

PIPEBOMB! Ok, so 1. A bucket is created. 2. Not manually. 3. And the data inside it is deleted just after the data is loaded. INTERESTING right!?

Now arises another question Is there a way to access the bucket before it gets deleted? To figure it out, Go to GCP Console, and go to Storage.

GCP Console: Storage

You’ll see all the buckets you’ve created (If you created one or many before) and the one Stitch created for you.

The ticked bucket name is the one Stitch created for our behalf

So, Let’s explore what’s inside the bucket, Simply click on it and it’ll take you inside and as you can see in below snap there are no live objects in this bucket that means we cannot see any data object within the bucket. Again, “Stitch files are deleted immediately after data is loaded into BigQuery”.

SQL data gets transferred into a BigQuery Bucket and will begin to deploy the table structure and data in it. Right after the job has been done, everything inside of it will be deleted immediately.

Before I came to realise this, I was trying to change and manipulate the data inside a bucket before it gets loaded in to BigQuery database, After hours of researching only I found that it’s not possible. So I just wanted to save you the trouble and explain What happens to data on the fly when transferring data from SQL to BigQuery using Stitch.

Click here to read furthermore about Buckets.

Thank you!


Keen on getting to know me and my work? Click here for more!


Post a Comment

1 Comments