by Alexandre Matton and Adrian Lam on December 17th, 2020

At Scale AI, we use Machine Learning models in a wide range of applications to empower our data labeling pipeline. We strive for speed and efficiency, and always try to get the best out of the models. Here, we will discuss some tricks we discovered that drastically improve over the PyTorch Transformer implementation in just a few lines of code.

Transformers are Here to Stay

Transformers have become ubiquitous. They were first introduced in Attention Is All You Need (Vaswani et al., 2017) and were quickly added to PyTorch. Their popularity increased even more with…

by Nishant Subramani on October 7th, 2020


The ML Team at Scale hosts a weekly reading group where members choose papers from the broad AI/ML community and discuss them, with topics ranging from Computer Vision to NLP to Active Learning. Here, we briefly summarize some of the insights we gained from various papers and how we aim to use that knowledge in future research projects and applications to Scale AI’s business.

Discovering Useful Sentence Representations from Large Pretrained Language Models
Presenter: Nishant Subramani

This is Scale’s first research paper and we wrote a blog post summarizing the paper. This paper focuses…

by James Lennon on August 18th, 2020


Machine learning is a once-in-a-generation technological shift that is creating huge value by unlocking insights from data. But algorithmic bias remains a major concern as ML becomes more widespread. Unless ML models are trained on representative data, they can develop serious biases, significantly harming underrepresented groups and leading to ineffective products. We investigated the CoNLL-2003 dataset — a standard for building algorithms that recognize named entities in text — and found that the data is highly skewed toward male names. Using Scale’s technology we were able to systematically mitigate this bias by:

  1. Enriching…

by Scale Team on August 5th, 2020

Introducing: Scale Nucleus

We founded Scale to create the infrastructure needed to build AI in any industry, by anyone. We started tackling this complex problem at the root — turning raw data into high quality training data for models. In this pursuit, we spent the last four years building ML-augmented annotation products for all data types, expanding our solutions for major industries, and making significant technological strides in scaling our use of ML in doing so.

But the problem of building effective, accurate, unbiased ML models still remains. To do this, aggregate metrics in ML are not good enough. Better ML starts with…

by Aatish Nayak on April 22nd, 2020

Scale AI is pleased to announce the launch of Scale Document — our endpoint for the secure processing of documents.

Advancements in natural language processing (NLP) technologies have spurred initiatives across a wide range of industries to digitize document-based content. Led by the logistics, banking, financial services, and insurance industries, the intelligent document processing (IDP) market is expected to reach $1.1 billion by the end of 2020 according to research conducted by the Everest Group.

Scale Document builds on the previously released Scale Text product to better address customers with document processing and…

by Daniel Lee on May 18th, 2020


At Scale AI, we label on the order of 10 million annotations per week. To deliver high-quality annotations for this enormous volume of data, we’ve developed a number of techniques including advanced sensor fusion to provide rich detail about complex environments, active tooling to accelerate the labeling process, and automated benchmarks to measure and maintain labeler (Tasker) quality. As we work with more customers, more Taskers, and more data, we continue to refine these methods to improve our labeling quality, efficiency, and scalability.

How we use ML

While this vast quantity of data provides Scale AI with invaluable…

by Scale Team on May 20th, 2020

In these unprecedented times, COVID-19 has brought out a renewed and inspiring sense of collaboration in AI and research communities as we work toward solving pressing issues. But the pandemic has also exacerbated some of the difficulties of developing new technologies at scale.

For example, as we shelter in place around the world, the promise of autonomous vehicles (AVs) to improve access to critical goods and services has never felt more relevant. …

