Microsoft Data Scientist: ‘Common Dilemma’ Led to Creation of Azure Automated ML

NEW YORK — Automated machine learning (ML) is at the forefront of Microsoft’s push to make Azure ML an end-to-end solution for anyone who wants to build and train models that make predictions from data and then deploy them anywhere — in the cloud, on-premises, or at the edge, according to Danielle Dean, a principal data science lead for the company’s AzureCAT team.

A “common dilemma” is what “led to the birth of automated machine learning on Azure,” she said during a keynote called “Automated ML: A journey from CRISPR.ML to Azure ML” at the O’Reilly Artificial Intelligence (AI) Conference on April 17.

“Trial and error” plays a key role in finding solutions to many problems, she said, noting: “At the end of the day, we just have to try stuff.” But “the data scientist’s dilemma,” she pointed out, is deciding “how much of this do you do” before stopping when following that trial-and-error philosophy.

Dean discussed the specific challenges that a Microsoft team faced in its CRISPR gene-editing project. “The hard part” was figuring out “where to cut the genome” because “it turns out that there’s 18 billion combinations of genes,” guide ribonucleic acids (RNAs) and off-target locations, “which are important to understand in order to understand where to cut,” she said.

The Microsoft team approached the challenge as a “machine learning problem, looking to predict the likelihood of gene-edit success,” she told the conference, noting the team spent six months on ML predictive modelling to solve the problem. Using ML for CRISPR resulted in a 20% improvement in accuracy and 50% savings in cost and time per gene, Dean said, calling it “amazing progress in gene editing.”

The same ML predictive modelling has since been applied to other problems, she said, noting that the same automated ML process is now available in Azure ML and in Power BI, Microsoft’s business analytics service, and has also been used by other companies, including the oil and gas company BP.

Automated ML is just one example of Microsoft turning research into products, she said, citing Microsoft “AI breakthroughs” in object recognition in 2016, speech recognition in 2017, machine reading comprehension in January 2018 and machine translation in March 2018.

In separate sessions at the conference, Microsoft senior data scientists Mathew Salvaris and Fidan Boylu Uz discussed how deep learning models can be deployed on graphics processing unit (GPU)-enabled Kubernetes clusters, and Sarah Bird, a principal program manager for AI research and products, provided an overview of ML Ops (DevOps for ML), sharing solutions and best practices for building an end-to-end pipeline for data preparation, model training and model deployment while maintaining a comprehensive audit trail.

In another keynote, Tony Jebara, director of an ML team at Netflix, discussed how the company uses ML to personalize the featured box-art images for movies and TV shows that subscribers see while using its streaming service. As an example, he noted that for the film “Pulp Fiction,” if a user has shown a preference for Uma Thurman movies on the service, the image they see for the Quentin Tarantino film may feature that actress, but if the user seems to prefer John Travolta movies, that person will instead see an image of the actor.

Similar image customization is applied based on other subscriber preferences, such as which genres a subscriber likes best, he noted.

Since implementing ML for image personalization, Netflix has seen increased views of content by subscribers, according to Jebara, who said the goal is to “maximize the take rate” for content. Echoing Microsoft’s Dean, he noted that “trial and error” plays a key role in implementing the technology.