2025’s Career and Hiring Trends - part 2

part 2 (last).

Upwinds for DS Jobs

Engineers can use AutoML to create models

I was chatting with my friend Gregory Azuolas about what it really takes to train an ML model at Google. The company has gone all-in on automating ML training—not just by streamlining data pipelines and automating code deployments, but by eliminating much of the manual effort traditionally required to build models.

Google’s AutoML infrastructure allows engineers with no background in machine learning or data science to train classifiers across a wide range of data types and scales. This shift reduces the need for specialized ML engineers to fine-tune models manually.

Gregory acknowledged that AutoML models might not be the most efficient in terms of power consumption or latency. But in his view, the real cost in ML isn’t inference—it’s model creation. While companies will continue optimizing inference efficiency (see DeepSeek-R1), the number of jobs dedicated to this task is relatively small. The bigger trend is clear: the barrier to training ML models is getting lower, and the role of traditional ML engineers is evolving fast.

Engineers can user LLMs to replace models

If performance and cost isn’t super critical, engineers can just use an GPT-based model to perform their classification. Before ChatGPT, custom classifiers must be trained and retrained as the business / regulations change. But since ChatGPT can do everything and is ready to do everything, there is no need to spend weeks or months trying to design and train a custom model for each business cases.

Instead of a team of DS people outputting 1-2 models per month, 1 engineer can modify a prompt to chatgpt in days and now there is a now classification ready for production.

Once the AI has classified enough production data and inference costs (either dollar or latency) exceed budget, a custom model can easily be trained using automl by an engineer.

Data cleaning and generation

Historically, data scientists spent a significant portion of their time collecting, labeling, and cleaning data—often writing complex scripts to sanitize and validate datasets before they were ready for model training. But today, the bottleneck isn’t human effort; it’s inference time. Advances like DeepSeek and Stanford’s Alpaca have made major strides in leveraging inference for data generation and refinement.

As a result, each data scientist can accomplish more with fewer resources, reducing the overall demand for headcount.

[0] - Some people use the term “cheaper”, but I don’t think the talent should be considered cheap, b/c that has negative connotations of lower quality.






More posts

Posts

subscribe via RSS