Stop Obsessing Over the “Best-Performing” Model

Nov 21, 2019

Makers in artificial intelligence today obsess over having the best-performing model. They chase state-of-the-art performance on a well-curated dataset and pay too little attention to how that model will perform in production or whether it fits into the end-user’s existing workflow.

If you have the “best-performing” model in an isolated training and testing environment, but it does not integrate well into the end-user’s existing workflow and you do not identify and address data drift in production, the model will fail to deliver the return on investment your organization needs.

Why is there so much obsession with model performance?

Part of the obsession stems from how artificial intelligence research is done in academia. To get published, academics strive to beat the performance of the latest state-of-the-art model. If these researchers eke out better model performance than their peers, they are able to publish their work and claim their model is the latest and greatest.

Research published by the tech giants (e.g., Amazon, Facebook, Google, and Microsoft) fuels this obsession. Each of these companies wants to achieve fame for having the best model on the market.

In the natural language community, we’ve seen this obsession with Google’s release of BERT last fall, followed by OpenAI’s release of GPT-2 earlier this year, which was then followed by XLNet (from Carnegie Mellon and Google) and then Facebook’s RoBERTa. All of these “advances” largely use the same architecture (i.e., the Transformer) but train for longer and on more data to beat the last best model on the market.

The academics and the big tech giant researchers are inadvertently propagating the myth that having the best-performing model is what matters most.

To be blunt, having the absolute best-performing model does not matter that much, at least not in enterprise.

To date, companies that have designed, funded, and tried to execute AI strategies have not realized the return on investment they had expected.

It is not because the developers of the models failed to achieve good model performance. Most seasoned data scientists are very methodical about setting up a proper training and testing environment for their models and squeezing as much performance from their model as possible.

Despite these good intentions, the models still fail an overwhelming majority of the time in production.

What are common reasons why AI strategies underwhelm?

First, models may fail because the data in production drifts from the data used to train and test the model. If data scientists obsess over having the best training and test performance, they may lose sight of what truly matters: how the model will handle new data in production, especially if that data drifts from the type of data the model was trained on.

What matters more than having the best-performing model in the training and testing environment is building robustness and resiliency into the system around it, so that it is able to detect data drift, alert data scientists and other stakeholders, and re-train on new data so that inference stays reliable.
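As a rough illustration of the "detect data drift" step, here is a minimal sketch that compares one numeric feature in production against the training data using the Population Stability Index (PSI). The choice of PSI, the function names, the ten bins, and the 0.2 alert threshold (a common rule of thumb) are all assumptions made for this example, not something prescribed above; a real system would likely monitor many features and route the alert to people.

```python
import math
import random


def population_stability_index(reference, production, n_bins=10, eps=1e-6):
    """Compare the binned distribution of one numeric feature across two samples."""
    lo, hi = min(reference), max(reference)
    # Fall back to a unit width if the reference sample is constant.
    width = (hi - lo) / n_bins or 1.0

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp so values outside the reference range land in the edge bins.
            idx = min(max(int((v - lo) // width), 0), n_bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref = bin_fractions(reference)
    prod = bin_fractions(production)
    return sum((p - r) * math.log(p / r) for r, p in zip(ref, prod))


def has_drifted(reference, production, threshold=0.2):
    """Flag drift when PSI exceeds the assumed rule-of-thumb threshold."""
    return population_stability_index(reference, production) > threshold


# Synthetic data for illustration only.
rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(5000)]
stable = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.5, 1.0) for _ in range(5000)]

print(has_drifted(training, stable))   # production looks like training
print(has_drifted(training, shifted))  # production mean moved by 1.5 std devs
```

A check like this is cheap enough to run on every batch of production data. When it fires, the alert and the re-training decision can stay with the data scientists rather than being fully automated.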

Second, models may fail because they do not provide the type of insight the end-user truly needs. A high-performing model that doesn’t solve the problem end-to-end is a model with great, albeit unrealized, potential.

For example, consider an anomaly detection system that does a great job flagging anomalous events but does little to provide interpretability of the results to its human end-users. If there is no investigation interface to this anomaly detection model, a lot of the value is left on the table, and the company does not realize the full potential of the model.

Third, AI strategies may fail because the models cannot perform all of the work satisfactorily using machine intelligence alone. While it is more elegant to design a fully automated, 100% machine-driven solution, in practice this is usually not possible. There will be a healthy number of cases that require human intervention.

Instead of seeing this as a failure, the designers of the models should view this machine + human solution as a stepping stone for a more machine-driven solution in the future. Also, sometimes clever engineering using rules-based logic has a place and should be part of the solution. Don’t shoot for the perfect, most elegant solution and fail; settle for good enough, show good return on investment to key stakeholders at your company, and then improve the solution from there.
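One way to picture a machine + human solution with a place for rules-based logic is a simple router: explicit rules get the first say, confident model predictions are automated, and everything else goes to a person. The sketch below is only an illustration; the `route` function, the 0.9 confidence threshold, and the `large_amount_rule` business rule are hypothetical names and values chosen for the example.

```python
from typing import Callable, Optional, Tuple

# A rule inspects the case and returns a label, or None if it does not apply.
Rule = Callable[[dict], Optional[str]]


def route(features: dict, model_label: str, confidence: float,
          rule: Optional[Rule] = None,
          auto_threshold: float = 0.9) -> Tuple[str, str]:
    """Return (handled_by, label) for a single case."""
    # 1. Rules-based logic takes precedence when it applies.
    if rule is not None:
        rule_label = rule(features)
        if rule_label is not None:
            return ("rule", rule_label)
    # 2. Automate only when the model is confident enough.
    if confidence >= auto_threshold:
        return ("model", model_label)
    # 3. Otherwise a human reviews the case, with the model's label as a suggestion.
    return ("human", model_label)


def large_amount_rule(features: dict) -> Optional[str]:
    """Hypothetical business rule: large amounts always need sign-off."""
    if features.get("amount", 0) > 10_000:
        return "needs_approval"
    return None


print(route({"amount": 50_000}, "approve", 0.99, large_amount_rule))
print(route({"amount": 50}, "approve", 0.95, large_amount_rule))
print(route({"amount": 50}, "approve", 0.60, large_amount_rule))
```

Tracking the share of cases that end up with a human gives a concrete measure of how machine-driven the solution is today, and raising automation over time becomes a matter of tuning the threshold as trust in the model grows.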

Deliver Complete End-to-End Solutions, Not Just Cool Tech

This brings us to my final point on the AI landscape today. Enterprises typically need customized solutions to solve their specific business problems. General purpose artificial intelligence products with little or no room for customization will not cut it for most enterprises.

Yet the overwhelming majority of SaaS-based vendors in artificial intelligence today build and deliver generalized AI products that require the end-user to do a lot of legwork to properly integrate the products into their workflow. Unless they put in this customization work themselves, end-users will not realize a proper return on investment.

These SaaS-based vendors design AI products that scale well to a broad base of end-users because they want to scale quickly and without a lot of professional services-type legwork. Typically, VC investors will want these SaaS-based vendors to pursue product strategies that do not require an army of consultants.

But, without the level of customization that consultants provide, enterprises with custom needs will not find true value from generalized AI products.

Where Do We Go From Here?

Focus less on model performance during training and testing and obsess more over how the model will adapt to unforeseen data drift in production.
Develop models with the end-user in mind. The model is a product, and the product should receive feedback from the end-user early and often as part of the model development and workflow integration.
Design a solution that leverages both machine intelligence and human intelligence. Keep humans in the loop to manage edge cases, which will almost certainly occur in production.
If you are a maker of an AI product, focus on making AI products that solve a specific end-to-end functional need (e.g., an AI-based CRM solution or invoice and expense management) instead of building generalized AI products that require a lot of customization before the end-user is able to realize any value. Or, build a generalized AI product but have a team of solutions-focused consultants who help clients integrate the product into their workflow and realize the return on investment they need.

Cool tech that has state-of-the-art performance is a great accomplishment, but the tech needs to be properly packaged into a complete end-to-end solution for enterprises to get true value from it. Obsess less over cool, benchmark-beating tech; obsess more over how the tech could solve the high-value problems businesses have. This is how more enterprises will realize the type of return on investment they need from their AI strategies.

Ankur’s Newsletter
