Each day, the company plans to store hundreds of files in Azure Blob Storage and Azure Data Lake Storage. The company uses the Parquet format.
You must develop a pipeline that meets the following requirements:
✑ Process data every six hours
✑ Offer interactive data analysis capabilities
✑ Offer the ability to process data using solid-state drive (SSD) caching
✑ Use Directed Acyclic Graph (DAG) processing mechanisms
✑ Provide support for REST API calls to monitor processes
✑ Provide native support for Python
✑ Integrate with Microsoft Power BI
You need to select the appropriate data technology to implement the pipeline.
Which data technology should you implement?
A. Azure SQL Data Warehouse
B. HDInsight Apache Storm cluster
C. Azure Stream Analytics
D. HDInsight Apache Hadoop cluster using MapReduce
E. HDInsight Spark cluster
Answer: E
Explanation:
An HDInsight Spark cluster meets all of the stated requirements. Apache Spark is an open-source, parallel-processing framework that can read Parquet files directly from Azure Blob Storage and Azure Data Lake Storage, so the six-hourly batch loads can be processed in place. Spark plans each job as a directed acyclic graph (DAG) of stages, offers interactive data analysis through Jupyter and Zeppelin notebooks and Spark SQL, and can cache data in memory or on the cluster nodes' local solid-state drives (SSDs). Jobs can be submitted and monitored through the Apache Livy REST API, Python is natively supported through PySpark, and Power BI can connect directly to Spark on HDInsight.
The other options fall short: Storm and Stream Analytics are stream-processing services that do not offer interactive data analysis or SSD caching, Hadoop MapReduce does not provide interactive analysis, and Azure SQL Data Warehouse does not use DAG processing mechanisms or offer native Python support.
References: https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-overview
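To illustrate how such a pipeline might look, the sketch below shows a PySpark job of the kind that could run on the six-hour schedule and then be explored interactively from a notebook on the cluster. It is only a minimal sketch: the storage account, container, path, and column names are hypothetical placeholders, and authentication to the storage account is assumed to already be configured on the cluster.

    # Minimal PySpark sketch (hypothetical names throughout).
    # Reads one batch of Parquet files from Azure Data Lake Storage,
    # runs an aggregation, and caches the result for interactive queries.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("six-hourly-batch").getOrCreate()

    # "contoso" storage account and "ingest" container are placeholder names.
    df = spark.read.parquet("abfss://ingest@contoso.dfs.core.windows.net/files/2024/06/01/")

    daily_summary = (df.groupBy("category")   # assumes a "category" column exists in the files
                       .count()
                       .cache())              # keep the result cached for follow-up notebook queries

    daily_summary.show()

Monitoring from outside the cluster could go through the Livy REST endpoint that HDInsight Spark exposes; a rough example of polling a submitted batch job, again with a placeholder cluster name and credentials:

    # Rough sketch of checking a Spark batch job's state via the Livy REST API.
    import requests

    # Hypothetical cluster name and batch id.
    livy_url = "https://contoso-spark.azurehdinsight.net/livy/batches/0"
    resp = requests.get(livy_url, auth=("admin", "cluster-password"))  # HTTP basic auth to the gateway
    print(resp.json().get("state"))                                    # e.g. "running" or "success"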