TerramEarth’s 20 million vehicles are scattered around the world. Based on each vehicle’s location, its telemetry data is stored in a Google Cloud Storage (GCS) regional bucket (US, Europe, or Asia). The CTO has asked you to run a report on the raw telemetry data to determine why vehicles are breaking down after 100,000 miles. You want to run this job on all the data.

What is the most cost-effective way to run this job?
A . Move all the data into 1 zone, then launch a Cloud Dataproc cluster to run the job
B . Move all the data into 1 region, then launch a Google Cloud Dataproc cluster to run the job
C . Launch a cluster in each region to preprocess and compress the raw data, then move the data into a multi-region bucket and use a Dataproc cluster to finish the job
D . Launch a cluster in each region to preprocess and compress the raw data, then move the data into a regional bucket and use a Cloud Dataproc cluster to finish the job

Answer: D

Explanation:

Preprocessing and compressing the raw telemetry in the region where it already resides minimizes the volume of data that has to be moved between regions, which is where most of the network egress cost would otherwise accrue. Storing the consolidated, compressed output in a regional bucket and running the final Cloud Dataproc job in that same region keeps both storage and compute in the cheapest configuration.

Multi-regional storage, by contrast, guarantees geo-diverse replicas (at least 100 miles apart) for better availability and remote-read latency, and it heavily leverages edge caching to serve content to end users around the world. That redundancy and caching carry a price premium and synchronization overhead, so multi-regional storage is intended for write-once-read-many, frequently accessed (“hot”) content such as website assets, streaming video, gaming, or mobile applications. A one-off batch analytics job gains nothing from that, which is why the regional bucket in option D is more cost-effective than the multi-regional bucket in option C.
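For illustration, here is a minimal PySpark sketch of the per-region preprocessing step described in option D. The bucket names, column names, and the choice of Snappy-compressed Parquet are assumptions for the example, not details given in the question:

```python
# preprocess_telemetry.py -- submitted to the Dataproc cluster running in each region.
# Reads raw telemetry from that region's bucket, keeps only the fields the
# breakdown report needs, and writes compressed Parquet so far less data has
# to be copied across regions afterwards. All names below are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("telemetry-preprocess").getOrCreate()

RAW_BUCKET = "gs://terramearth-telemetry-eu-raw"      # regional source bucket (assumed name)
OUT_BUCKET = "gs://terramearth-telemetry-eu-prepped"  # regional output bucket (assumed name)

raw = spark.read.json(f"{RAW_BUCKET}/*.json")

# Keep only the columns relevant to the 100,000-mile breakdown analysis;
# this projection is where most of the size reduction comes from.
prepped = raw.select("vehicle_id", "odometer_miles", "fault_code", "timestamp")

# Columnar, splittable, and compressed: much smaller than the raw JSON.
prepped.write.mode("overwrite") \
    .option("compression", "snappy") \
    .parquet(f"{OUT_BUCKET}/prepped/")
```

The compressed output from each region can then be copied into a single regional bucket, and a final Cloud Dataproc job in that region runs the report over the consolidated data.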

References: https://medium.com/google-cloud/google-cloud-storage-what-bucket-class-for-the-best-performance-5c847ac8f9f2
