Squeezing the last DRiP: AutoML for cost-constrained product classification

Published in Amazon Machine Learning Conference (AMLC), 2021

Citation: Abhishek Divekar*, Gaurav Manchanda*, Prit Raj, Abhishek Das, Karan Tanwar, Akshay Jagatap, Vinayak Puranik, Jagannathan Srinivasan, Ramakrishna Nalam, and Nikhil Rasiwasia. "Squeezing the last DRiP: AutoML for cost-constrained product classification". 9th conference of Amazon Machine Learning (AMLC 2021) (internal)

Recent progress in automated machine learning (AutoML) has shown that both hyperparameter search and model stacking can achieve performance that beats the median Kaggle competitor on standard classification metrics [13]. However, solutions that directly optimize the cost of executing such models with minimal loss of performance have been explored only sparingly. The challenge faced by current techniques is the lack of a priori knowledge of the deployment environment in which the model must operate and its associated constraints (such as request batch size or prediction latency). This shortcoming affects a critical subset of users: business teams with cost-sensitive use cases, who cannot meet their goals if an AutoML-suggested model is expensive during training or inference. To bridge this gap, we propose DRiP, a versatile ML framework that encapsulates the phases needed for an automated system to select and iteratively refine candidate models while constrained by an inference budget. We find that existing tools have individual capabilities that map to phases in our framework, but lack the overall capability to optimize against such constraints. We therefore present our implementation of this framework as a new AutoML tool. Across 38 product classification datasets and various business use cases, DRiP obtains 99.96% of the ROC-AUC performance of the best SOTA AutoML systems (AutoGluon and H2O.ai) at 37% of the cost. When tuned to minimize cost, DRiP offers better cost and performance than comparably optimized AutoML systems (an average ROC-AUC increase of 0.21 at 44% of the cost of distilled AutoGluon). If no cost constraints are imposed, “Unrestricted” DRiP provides the best overall performance (98.42 ROC-AUC vs. 98.18 for the next-best AutoML system). Although we evaluate only classification, the framework can be used to automatically optimize any machine learning task with well-defined performance and cost.
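The central constraint described above, picking the strongest candidate that still fits an inference budget, can be illustrated with a minimal Python sketch. This is not DRiP's implementation: the `Candidate` type, the `select_under_budget` function, the cost unit, and all numbers are hypothetical, and only a single selection step is shown, not the iterative refinement phases.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """A trained model option (hypothetical structure, for illustration only)."""
    name: str
    roc_auc: float  # validation ROC-AUC; higher is better
    cost: float     # hypothetical inference cost, e.g. dollars per 1M predictions


def select_under_budget(
    candidates: Sequence[Candidate], budget: float
) -> Optional[Candidate]:
    """Return the highest-ROC-AUC candidate whose cost fits within `budget`.

    Ties on ROC-AUC are broken in favor of the cheaper model.
    Returns None when no candidate is affordable.
    """
    feasible = [c for c in candidates if c.cost <= budget]
    if not feasible:
        return None
    # Maximize ROC-AUC first; among equal scores, prefer lower cost.
    return max(feasible, key=lambda c: (c.roc_auc, -c.cost))


# Illustrative, made-up candidates (not results from the paper).
CANDIDATES = [
    Candidate("stacked-ensemble", roc_auc=0.985, cost=10.0),
    Candidate("gbm", roc_auc=0.982, cost=3.0),
    Candidate("logreg", roc_auc=0.960, cost=0.5),
]

if __name__ == "__main__":
    choice = select_under_budget(CANDIDATES, budget=4.0)
    print(choice.name if choice else "no feasible model")  # prints "gbm"
```

With a budget of 4.0 the sketch returns `gbm`; raising the budget to 20.0 returns `stacked-ensemble`, and a budget below every candidate's cost returns `None`. Breaking ties toward the cheaper model reflects the goal of matching performance at lower cost.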

Presented as a poster at the 9th Amazon Machine Learning Conference (AMLC 2021) (acceptance rate: 30%).