What should you do?

Posted by: Pdfprep Category: DP-100 Tags: Microsoft Data Certification, DP-100 exam questions, DP-100 practice exam Post Date: October 30, 2020

A set of CSV files contains sales records. All the CSV files have the same data schema.

Each CSV file contains the sales records for a particular month and has the filename sales.csv. Each file is stored in a folder that indicates the month and year when the data was recorded. The folders are in an Azure blob container for which a datastore has been defined in an Azure Machine Learning workspace. The folders are organized in a parent folder named sales, giving a hierarchical structure of the form sales/mm-yyyy/sales.csv.

At the end of each month, a new folder with that month’s sales file is added to the sales folder.

You plan to use the sales data to train a machine learning model based on the following requirements:

– You must define a dataset that loads all of the sales data to date into a structure that can be easily converted to a dataframe.

– You must be able to create experiments that use only data that was created before a specific previous month, ignoring any data that was added after that month.

– You must register the minimum number of datasets possible.

You need to register the sales data as a dataset in the Azure Machine Learning service workspace.

What should you do?
A. Create a tabular dataset that references the datastore and explicitly specifies each ‘sales/mm-yyyy/sales.csv’ file every month. Register the dataset with the name sales_dataset each month, replacing the existing dataset and specifying a tag named month indicating the month and year it was registered. Use this dataset for all experiments.
B. Create a tabular dataset that references the datastore and specifies the path ‘sales/*/sales.csv’, register the dataset with the name sales_dataset and a tag named month indicating the month and year it was registered, and use this dataset for all experiments.
C. Create a new tabular dataset that references the datastore and explicitly specifies each ‘sales/mm-yyyy/sales.csv’ file every month. Register the dataset with the name sales_dataset_MM-YYYY each month with appropriate MM and YYYY values for the month and year. Use the appropriate month-specific dataset for experiments.
D. Create a tabular dataset that references the datastore and explicitly specifies each ‘sales/mm-yyyy/sales.csv’ file. Register the dataset with the name sales_dataset each month as a new version and with a tag named month indicating the month and year it was registered. Use this dataset for all experiments, identifying the version to be used based on the month tag as necessary.

Answer: B

Explanation:

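Specify the path with a wildcard. A single tabular dataset created with the path ‘sales/*/sales.csv’ matches the sales file in every month folder, including folders added later, so only one dataset has to be registered.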
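To make the wildcard concrete, here is a minimal plain-Python sketch of how a pattern such as sales/*/sales.csv selects one file per month folder. It only illustrates the matching idea and is not how Azure Machine Learning resolves paths internally. The blob paths and the matches helper are hypothetical, and the sketch assumes that * stands for exactly one folder level.

```python
from fnmatch import fnmatchcase


def matches(pattern: str, path: str) -> bool:
    """Match one folder level at a time, so '*' never spans a '/'."""
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(pattern_parts) != len(path_parts):
        return False
    return all(fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts))


# Hypothetical blob paths; folder names follow the mm-yyyy form from the question.
blobs = [
    "sales/01-2019/sales.csv",
    "sales/02-2019/sales.csv",
    "sales/03-2019/sales.csv",
    "sales/03-2019/notes.txt",
    "inventory/03-2019/sales.csv",
]

# Only the monthly sales.csv files under sales/ are selected.
selected = [blob for blob in blobs if matches("sales/*/sales.csv", blob)]
print(selected)
```

A folder added next month, for example sales/04-2019/sales.csv, would match the same pattern without any change to the pattern itself.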

Example:

The following code gets the existing workspace and the desired datastore by name. It then passes the datastore and file locations to the path parameter to create a new TabularDataset, weather_ds.

from azureml.core import Workspace, Datastore, Dataset

datastore_name = 'your datastore name'

# get existing workspace
workspace = Workspace.from_config()

# retrieve an existing datastore in the workspace by name
datastore = Datastore.get(workspace, datastore_name)

# create a TabularDataset from 3 file paths in datastore
datastore_paths = [(datastore, 'weather/2018/11.csv'),
                   (datastore, 'weather/2018/12.csv'),
                   (datastore, 'weather/2019/*.csv')]

weather_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)
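
Applied to the sales scenario in the question, the approach in answer B might look like the sketch below. It assumes the azureml-core (SDK v1) package. The helper names month_tag and register_sales_dataset are hypothetical, and the mm-yyyy tag format is an assumption, since the question only asks for a tag named month.

```python
from datetime import date


def month_tag(registered_on: date) -> str:
    """Format the registration month as mm-yyyy (the tag format is an assumption)."""
    return registered_on.strftime("%m-%Y")


def register_sales_dataset(workspace, datastore, registered_on: date):
    """Create one TabularDataset over every monthly sales.csv and register it."""
    # Imported here so month_tag can be used without azureml-core installed.
    from azureml.core import Dataset

    # A single wildcard path covers every month folder under sales/.
    sales_ds = Dataset.Tabular.from_delimited_files(
        path=(datastore, "sales/*/sales.csv")
    )
    # Register under one name, tagged with the month of registration.
    return sales_ds.register(
        workspace=workspace,
        name="sales_dataset",
        tags={"month": month_tag(registered_on)},
    )
```

With a workspace and datastore retrieved as in the weather example, register_sales_dataset(workspace, datastore, date.today()) would return the registered dataset, and calling to_pandas_dataframe() on it loads the rows into a pandas dataframe.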
