What should you use?

Posted by: Pdfprep Category: DP-200

You need to develop a pipeline for processing data.

The pipeline must meet the following requirements.

• Scale up and down resources for cost reduction.

• Use an in-memory data processing engine to speed up ETL and machine learning operations.

• Use streaming capabilities.

• Provide the ability to code in SQL, Python, Scala, and R.

• Integrate workspace collaboration with Git.

What should you use?
A. HDInsight Spark Cluster
B. Azure Stream Analytics
C. HDInsight Hadoop Cluster
D. Azure SQL Data Warehouse

Answer: A

Explanation:

Apache Spark is an open-source, parallel-processing framework that supports in-memory processing to boost the performance of big-data analysis applications.

HDInsight is a managed Hadoop service. Use it to deploy and manage Hadoop clusters in Azure. For batch processing, you can use Spark, Hive, Hive LLAP, or MapReduce.

Languages: R, Python, Java, Scala, SQL

You can create an HDInsight Spark cluster using an Azure Resource Manager template, which is available on GitHub.
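
As a rough illustration of what such a template contains, the sketch below builds an abridged template as a Python dictionary and prints it as JSON. It is not the template from GitHub and is not deployable as written: the storage profile, credentials, VM sizes, and cluster version are left out. The function name `build_spark_cluster_template`, the `clusterName` parameter, and the chosen `apiVersion` are assumptions for illustration; check the field names against the current `Microsoft.HDInsight/clusters` template reference before relying on them.

```python
import json


def build_spark_cluster_template(worker_count: int = 2) -> dict:
    """Return an abridged, illustrative ARM template for an HDInsight Spark cluster."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "parameters": {
            # Hypothetical parameter supplied at deployment time.
            "clusterName": {"type": "string"},
        },
        "resources": [
            {
                "type": "Microsoft.HDInsight/clusters",
                # Assumed API version; newer versions may exist.
                "apiVersion": "2018-06-01-preview",
                "name": "[parameters('clusterName')]",
                "location": "[resourceGroup().location]",
                "properties": {
                    "osType": "Linux",
                    # "spark" selects a Spark cluster rather than plain Hadoop.
                    "clusterDefinition": {"kind": "spark"},
                    "computeProfile": {
                        "roles": [
                            {"name": "headnode", "targetInstanceCount": 2},
                            # The worker count is the main knob for scaling
                            # the cluster up or down.
                            {"name": "workernode", "targetInstanceCount": worker_count},
                        ]
                    },
                },
            }
        ],
    }


template = build_spark_cluster_template(worker_count=4)
print(json.dumps(template, indent=2))
```

Generating the template from code makes the worker count easy to vary per environment, which relates to the requirement to scale resources up and down for cost reduction.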

References: https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/batch-processing
