Which technologies should you use?

Posted by: Pdfprep Category: DP-200 Tags: DP-200 practice exam, Microsoft Data Certification, DP-200 exam questions Post Date: November 20, 2020

HOTSPOT

You are developing a solution using a Lambda architecture on Microsoft Azure.

The data at rest layer must meet the following requirements:

* Data storage:

– Serve as a repository for high volumes of large files in various formats.

– Implement optimized storage for big data analytics workloads.

– Ensure that data can be organized using a hierarchical structure.

* Batch processing:

– Use a managed solution for in-memory computation processing.

– Natively support Scala, Python, and R programming languages.

– Provide the ability to resize and terminate the cluster automatically.

* Analytical data store:

– Support parallel processing.

– Use columnar storage.

– Support SQL-based languages.

You need to identify the correct technologies to build the Lambda architecture.

Which technologies should you use? To answer, select the appropriate options in the answer area. NOTE: Each correct selection is worth one point.

Answer:

Explanation:

Data storage: Azure Data Lake Store

A key mechanism that allows Azure Data Lake Storage Gen2 to provide file system performance at object storage scale and prices is the addition of a hierarchical namespace. This allows the collection of objects/files within an account to be organized into a hierarchy of directories and nested subdirectories in the same way that the file system on your computer is organized. With the hierarchical namespace enabled, a storage account becomes capable of providing the scalability and cost-effectiveness of object storage, with file system semantics that are familiar to analytics engines and frameworks.
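To make the hierarchical namespace point concrete, here is a minimal toy sketch in plain Python. It is not the Azure SDK. The names `rename_prefix_flat` and `Directory` are hypothetical and exist only for illustration. The sketch contrasts a flat object store, where a "directory" is just a shared key prefix, with a hierarchical one, where a directory is a real node that can be renamed in a single step.

```python
from typing import Dict


def rename_prefix_flat(store: Dict[str, bytes], old: str, new: str) -> int:
    """Flat namespace (toy model): a 'directory' is only a shared key prefix,
    so renaming it means rewriting every key that starts with that prefix."""
    touched = 0
    # Snapshot the matching keys first so the dict is not mutated while iterating.
    for key in [k for k in store if k.startswith(old + "/")]:
        store[new + key[len(old):]] = store.pop(key)
        touched += 1
    return touched


class Directory:
    """Hierarchical namespace (toy model): a directory is a node with children."""

    def __init__(self) -> None:
        self.children: Dict[str, "Directory"] = {}
        self.files: Dict[str, bytes] = {}

    def mkdirs(self, path: str) -> "Directory":
        """Create nested subdirectories along `path` and return the last one."""
        node = self
        for part in path.strip("/").split("/"):
            node = node.children.setdefault(part, Directory())
        return node

    def rename_child(self, old: str, new: str) -> int:
        """Rename a subdirectory: one entry changes, however many files it holds."""
        self.children[new] = self.children.pop(old)
        return 1
```

In this toy model, renaming `raw/2020` to `raw/archive` touches every object in the flat store but only one entry in the hierarchical one, which is the kind of file system behavior analytics engines rely on.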

Batch processing: HDInsight Spark

Apache Spark is an open-source, parallel-processing framework that supports in-memory processing to boost the performance of big-data analysis applications.

HDInsight is a managed Hadoop service. Use it to deploy and manage Hadoop clusters in Azure. For batch processing, you can use Spark, Hive, Hive LLAP, or MapReduce.

Languages: R, Python, Java, Scala, SQL
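The benefit of in-memory processing can be sketched with a toy example in plain Python. This is not Spark code. `CachedDataset` and `slow_load` are hypothetical names, and `slow_load` stands in for an expensive read from storage. The idea it illustrates is that a dataset loaded once and kept in memory can feed several computations without being re-read each time.

```python
from typing import Callable, List, Optional


class CachedDataset:
    """Toy stand-in for a dataset that is kept in memory after the first load."""

    def __init__(self, loader: Callable[[], List[float]]) -> None:
        self._loader = loader
        self._cache: Optional[List[float]] = None
        self.loads = 0  # how many times the underlying storage was read

    def rows(self) -> List[float]:
        # Read from storage only on first access; later calls reuse memory.
        if self._cache is None:
            self._cache = self._loader()
            self.loads += 1
        return self._cache


def slow_load() -> List[float]:
    """Hypothetical expensive read, e.g. scanning files in a data lake."""
    return [1.0, 2.0, 3.0, 4.0]


data = CachedDataset(slow_load)
total = sum(data.rows())          # first access triggers the single load
mean = total / len(data.rows())   # served from memory
peak = max(data.rows())           # served from memory
```

Three computations run here, but `data.loads` stays at 1. In Spark, the comparable step is persisting a dataset in memory so that repeated passes over it avoid going back to storage.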

Analytical data store: Azure Synapse Analytics

Azure Synapse Analytics is a cloud-based Enterprise Data Warehouse (EDW) that uses Massively Parallel Processing (MPP).

Azure Synapse Analytics stores data in relational tables with columnar storage.
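The two properties named here, columnar storage and parallel processing, can be illustrated with a small toy sketch in plain Python. It is not how Azure Synapse Analytics is implemented. The helper names `to_columns`, `distribute`, and `parallel_sum`, the sample rows, and the round-robin split are all assumptions made for illustration. The sketch stores a table column by column, so an aggregate reads only the column it needs, then splits that column across workers that each compute a partial result.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List

# Hypothetical sample table in row-oriented form.
rows = [
    {"region": "east", "units": 3, "price": 2.0},
    {"region": "west", "units": 5, "price": 1.5},
    {"region": "east", "units": 2, "price": 4.0},
    {"region": "north", "units": 7, "price": 3.0},
]


def to_columns(table: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row-oriented records into one list per column."""
    columns: Dict[str, List[Any]] = {}
    for row in table:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def distribute(values: List[int], n: int) -> List[List[int]]:
    """Split a column into n slices, round-robin (an illustrative choice)."""
    return [values[i::n] for i in range(n)]


def parallel_sum(values: List[int], n: int = 4) -> int:
    """Each worker sums its own slice; the partial sums are then combined."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(sum, distribute(values, n)))
    return sum(partials)


columns = to_columns(rows)
total_units = parallel_sum(columns["units"], n=2)  # reads only the "units" column
```

The query over `units` never touches `region` or `price`, and the final answer is assembled from independent partial sums, which is the general shape of an MPP aggregate over columnar data.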

Note: Azure SQL Data Warehouse was renamed Azure Synapse Analytics in November 2019.

References:

https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-namespace

https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing

https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-overview-what-is
