Your team is building a data engineering and data science development environment.
The environment must support the following requirements:
– support Python and Scala
– compose data storage, movement, and processing services into automated data pipelines
– the same tool should be used for the orchestration of both data engineering and data science
– support workload isolation and interactive workloads
– enable scaling across a cluster of machines
You need to create the environment.
What should you do?
A. Build the environment in Apache Hive for HDInsight and use Azure Data Factory for orchestration.
B. Build the environment in Azure Databricks and use Azure Data Factory for orchestration.
C. Build the environment in Apache Spark for HDInsight and use Azure Container Instances for orchestration.
D. Build the environment in Azure Databricks and use Azure Container Instances for orchestration.
Answer: B
Explanation:
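In Azure Databricks, we can create two types of clusters. Both run Apache Spark across a set of worker nodes and can autoscale.
– Standard: the default cluster mode, recommended for a single user; it can run workloads written in Python, R, Scala, and SQL.
– High Concurrency: a managed cluster mode that shares resources among many concurrent users with fine-grained isolation, which suits interactive workloads; it supports Python, R, and SQL.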
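As an illustration of the two requirements on isolation and scaling, the sketch below builds a Databricks Clusters API create payload for a High Concurrency cluster with autoscaling. It assumes the legacy cluster-mode settings (`spark.databricks.cluster.profile` set to `serverless`, the `ResourceClass` tag, and an allowed-languages list); the runtime version, VM size, and cluster name are hypothetical placeholders, and newer workspaces may express cluster modes differently.

```python
import json

# Hypothetical placeholders: use a runtime version and VM size available in your workspace.
SPARK_VERSION = "<runtime-version>"
NODE_TYPE_ID = "Standard_DS3_v2"  # assumed Azure VM size


def high_concurrency_cluster_spec(name: str, min_workers: int, max_workers: int) -> dict:
    """Return a Clusters API create payload for a (legacy) High Concurrency cluster."""
    # Simple sanity check for this sketch, not an API rule.
    if min_workers < 1 or min_workers > max_workers:
        raise ValueError("expected 1 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": SPARK_VERSION,
        "node_type_id": NODE_TYPE_ID,
        # Autoscaling lets Spark add or remove worker nodes as load changes.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "spark_conf": {
            # Legacy settings that mark the cluster as High Concurrency.
            "spark.databricks.cluster.profile": "serverless",
            # Scala is not in this list, matching the High Concurrency language limit.
            "spark.databricks.repl.allowedLanguages": "sql,python,r",
        },
        "custom_tags": {"ResourceClass": "Serverless"},
    }


spec = high_concurrency_cluster_spec("shared-interactive", 2, 8)  # hypothetical name and sizes
print(json.dumps(spec, indent=2))
```

Scala notebooks would run on a Standard cluster instead, which is why a workspace meeting all the listed requirements typically uses both modes.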
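Azure Databricks is integrated with Azure Data Factory, which can run Databricks notebooks, JARs, and Python files as pipeline activities. A single orchestration tool can therefore compose data movement and Databricks processing steps for both data engineering and data science.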
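To make the orchestration side concrete, the sketch below builds an Azure Data Factory pipeline definition containing one Databricks Notebook activity, following the ADF pipeline JSON shape (`type: DatabricksNotebook`, `notebookPath`, `baseParameters`, and a linked service reference). The pipeline name, linked service name, notebook path, and parameter are hypothetical; in practice the linked service would point at your Databricks workspace.

```python
import json


def databricks_notebook_pipeline(pipeline_name: str, linked_service: str,
                                 notebook_path: str, params: dict) -> dict:
    """Return an ADF pipeline definition with a single Databricks Notebook activity."""
    activity = {
        "name": "RunNotebook",
        "type": "DatabricksNotebook",
        # Reference to an Azure Databricks linked service (assumed to exist).
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,
            # Values passed to the notebook's widgets at run time.
            "baseParameters": dict(params),
        },
    }
    return {"name": pipeline_name, "properties": {"activities": [activity]}}


pipeline = databricks_notebook_pipeline(
    "TrainModelPipeline",            # hypothetical pipeline name
    "AzureDatabricksLinkedService",  # hypothetical linked service name
    "/Shared/train_model",           # hypothetical notebook path
    {"run_date": "2024-01-01"},      # hypothetical parameter
)
print(json.dumps(pipeline, indent=2))
```

A data movement step, such as a Copy activity, could be added to the same `activities` list and chained with a `dependsOn` entry, so engineering and data science steps share one pipeline.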
Incorrect Answers:
C, D: Azure Container Instances runs individual containers and is good for development or testing. It is not a data pipeline orchestration service and is not suitable for production workloads.
References: https://docs.microsoft.com/en-us/azure/architecture/data-guide/technology-choices/data-science-and-machine-learning