Diffgram Engineering Insider: Optimizing For Millions of Annotations on a Single Machine

How we scaled up a newly popular feature.

Pablo Estrada
Diffgram

--

At Diffgram we recently published a new feature which was an instant hit — The Import Wizard!

It allows non-technical users to skip the normal “API/Integration” step and import their model pre-labels and other annotations straight from the UI. Diffgram was the first major provider to offer this!

Today I will talk about how we scaled this new feature.

The feature we are optimizing.

“We were on a call with a CTO in Israel a few days ago. He was super excited to show us that he had installed Diffgram.

He also had an Amazon cloud permissions blocker we were helping him with, and during the process it required redoing a step in Diffgram.

I thought it was annoying to redo. But he was super excited!! Nearly giddy to show us how quickly he could step through the UI Import Wizard. It was quite magical to see someone so enthusiastic about the product. It totally fit the cliché of ‘gives a user superpowers.’” (Anthony, June 24, 2021)

The Problem: How Can We Handle Millions of Annotations?

The new feature is getting used heavily and people are sending larger and larger files to it. How can we scale up the technical aspects of this new feature?

Today I will cover:

  1. Optimizations to make it possible to stream millions of annotations from a single machine.
  2. Technical specifics that are likely applicable to other applications.

TL;DR: We created a new streaming-based process so massive JSON files can be processed easily by a single average-memory machine.

Optimization Summary:

  • Added streaming capabilities to the Frontend Import Wizard, allowing the upload of hundreds of thousands of instances from a single huge JSON file.
  • Testing this concept by efficiently parsing and uploading 823,901 bounding boxes from the COCO dataset on a single average-memory machine.
  • And on the receiving side, maintaining the performance of our test bench server. As the charts below show, server-side memory and CPU usage never went above 25%.
Test Bench Memory Usage during the upload of 800,000+ bounding boxes (32GB)
Test Bench CPU Usage during the upload of 800,000+ bounding boxes (16 core)

A few notes and design principles were really useful in achieving these numbers.

Understanding the Process & Context

The big idea here is that we want to transform an arbitrary JSON schema into the Diffgram schema; a small sketch of this mapping follows the list below.

There are two parts to the process:

  1. Pre-processing in the UI, e.g. validating that labels and keys exist, counting instances, etc.
  2. Server processing, e.g. matching existing files (to be updated) and a variety of other processing.
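
To make the mapping idea concrete, here is a minimal, hypothetical sketch. The config keys (label_key, bbox_key) and the output field names are illustrative assumptions, not Diffgram’s actual schema; in the real wizard, the Admin’s form choices drive this mapping.

// Hypothetical sketch: apply the Admin's mapping config to one raw
// instance, e.g. a COCO-style annotation with bbox = [x, y, width, height].
function map_instance(raw_instance, mapping) {
  const [x, y, width, height] = raw_instance[mapping.bbox_key];
  return {
    type: 'box', // illustrative output shape, not the actual Diffgram schema
    label: raw_instance[mapping.label_key],
    x_min: x,
    y_min: y,
    x_max: x + width,
    y_max: y + height,
  };
}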

Admin Context — Why Frontend?

Generally there’s an assumption that a client (Frontend) should rarely do much processing, and that all processing should be offloaded to the server.

We have been finding a lot of success in reversing this, and pushing more compute back to the client.

For this case, we have found that defaulting to running this on the client solves two of the biggest bottlenecks for loading data:

  1. The mapping and configuring side. The UI allows non-technical administrators to “fill a form” and gracefully map their existing Schema to Diffgram. Since the data is already present on the client side, we can do this in near real time, with less technical effort and less risk.
  2. Previewing that the result will be as the Admin intended. Because we can parse aspects of the data immediately, we can provide very useful checks and balances (e.g. how many instances are there?), as sketched below. For many use cases this is a huge benefit versus having to wait for some remote process to finish and then checking whether it parsed correctly.

Keep in mind that the “real” processing is still happening server side; the client handles the pre-processing, pre-mapping, and validation. If you imagine an SDK/API wrapper, you are on the right high-level track.
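
As a sketch of the kind of client-side checks this enables (the function and field names are hypothetical, and a COCO-style annotations array is assumed):

function validate_and_count(parsed_json, mapping, known_labels) {
  const unknown_labels = new Set();
  let instance_count = 0;
  for (const raw_instance of parsed_json.annotations) {
    const label = raw_instance[mapping.label_key];
    if (!known_labels.has(label)) {
      unknown_labels.add(label);
    }
    instance_count += 1;
  }
  // The preview can show the count and flag unknown labels in near
  // real time, before a single network call is made.
  return { instance_count, unknown_labels: [...unknown_labels] };
}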

A brief tangent: directionally, we plan to make this mapping wizard also usable for API/SDK-level technical integrations. The service architecture is already horizontally scalable, so for automation cases, or even more extreme cases (e.g. billions of annotations per import), the Admin’s mapping choices may be remotely loaded into the system. OK, back to the main event!

Optimizing Performance for the Uploads

When a user creates a new upload batch, all the instances are validated, formatted, and saved to the database.

Usually these instances are sent along with the file data in a single request. But when the user uploads a really big file, this can cause issues on both the backend and the frontend UI.

For context, the JSON file may be remote, but we may still need to download it in order to provide the advanced parsing and checks, for example validating that labels and keys exist prior to uploading.
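
A rough sketch of that download step (the function name and error handling are simplified assumptions); fetching the remote file as a Blob keeps it compatible with the slicing and chunked upload path described below:

async function fetch_remote_json_as_blob(url) {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Could not download file: ${response.status}`);
  }
  // A Blob supports slice(), so the chunked upload path below can reuse it
  return await response.blob();
}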

A quick refresher on Vue before we dive in:

  1. Diffgram’s frontend is based on regular JS, HTML and VueJS. Diffgram is a “large” VueJS app and we have written about some of our other little insights regarding 3 years of Vue here.
  2. Vue’s reactivity system is a powerful tool that helps your apps re-render based on changes to the state of your components.

VueJS and Huge JSON files

The first problem we encountered when trying to upload was that the UI froze whenever a big enough JSON file was added.

Computed Props and Object.freeze()

  • The first cause of this issue was that the JSON was being parsed and added to the component state for a preview of the data about to be uploaded.
  • Any time a related prop or data attribute in the component changed, Vue had to re-render an 800,000+ element array. This was completely unnecessary, as it was a read-only property that we never mutated.
  • Changing the computed property into a method, and saving the result of the method in a data attribute, gave us more control over when it was called and helped us avoid unnecessary re-renders of the component.
  • We did not want reactivity on the 800,000 element array. So we used Object.freeze() to remove reactivity from that object. That gave a huge performance boost and removed the UI freezing problem completely.

Basically, think of it like making an object “read only”. Since the parsing we were doing was always on top of the data, there was no need to modify the original object.

But Vue didn’t know that, so by default it was trying to watch this giant file and slowing everything down. Object.freeze() fixed it in this context.
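
Here is a minimal sketch of the pattern (the component shape and names are hypothetical, not the actual wizard code): parse once inside a method, freeze the result, and store it in a plain data attribute rather than a computed property.

export default {
  data() {
    return {
      parsed_preview: null, // Holds the frozen, parsed JSON for the preview
    };
  },
  methods: {
    parse_file_for_preview(raw_json_text) {
      const parsed = JSON.parse(raw_json_text);
      // Object.freeze() makes the object non-extensible, so Vue's
      // reactivity system skips it instead of walking 800,000+ elements.
      this.parsed_preview = Object.freeze(parsed);
    },
  },
};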

While in retrospect this may appear to be a fairly simple fix, it was the first time in the system we had needed it.

Stream the Data (Chunking)

We don’t want to send huge requests to the server. This is probably self-explanatory, but generally big files mean more network risk, more memory requirements, etc. At a minimum we want to be able to control each request’s size. This may sound obvious, but keep in mind the client may provide a JSON file of arbitrary size (in the GB range).

So we used a slicing technique built on the Blob class and our cloud storage, along with the HTTP Content-Range header.

This was in part inspired by our existing resumable media uploading process which is implemented in all 3 major cloud providers.

// Headers for one chunk; chunk_start, chunk_end, and uuid are recomputed per chunk
const headers = {
  'Content-Type': 'multipart/form-data',
  'Content-Range': `bytes ${chunk_start}-${chunk_end - 1}/${file.size}`,
  'chunk-file-id': `${uuid}`,
}

Here is an example of how a huge file was sent to the server:

const chunk_size_bytes = 1 * 1024 * 1024; // Chunk in 1 MB pieces
const fileClone = blob.slice();
let bytes_sent = 0;
let chunk_start = 0;
let chunk_end = chunk_start + chunk_size_bytes;
let index = 0; // Which chunk we are on, useful for progress reporting
let total_chunks = Math.ceil(parseFloat(fileClone.size) / chunk_size_bytes);
while (bytes_sent < fileClone.size) {
  const uuid = uuidv4();
  const formData = new FormData();
  const slicedPart = blob.slice(chunk_start, chunk_end);
  formData.append('file', slicedPart); // Attach the current chunk
  const headers = {
    'Content-Type': 'multipart/form-data',
    'Content-Range': `bytes ${chunk_start}-${chunk_start + slicedPart.size - 1}/${fileClone.size}`,
    'chunk-file-id': `${uuid}`,
  };
  try {
    const response = await axios.post(
      `your/server/url`,
      formData,
      {headers: headers}
    );
  }
  catch (e) {
    console.error(e);
    this.batch_error = this.$route_api_errors(e)
    return
  }
  // Advance the counters so the loop terminates and the next chunk
  // picks up exactly where this one ended.
  bytes_sent += slicedPart.size;
  chunk_start = chunk_end;
  chunk_end = chunk_start + chunk_size_bytes;
  index += 1;
}

This allowed us to send 823,901 instances without server timeouts and without draining our system resources.

Because Diffgram is installed on your hardware, this saves you resources directly too!

Streaming Improves Parallel Processing

An added benefit of streaming the file processing was that our backend servers can now load balance the instances and process them in parallel.

We have to be careful, though, not to send all the chunked requests in parallel, as that could degrade your system or lead to an outage if a file is big enough. Here are 2 ways we keep this in mind:

  1. We rate limit the backend server. A simple decorator helps us rate limit endpoints in Flask, like this:
@limiter.limit("5 per second")  # Flask-Limiter decorator applied to an endpoint

2. Limit the concurrency of your requests on the client side too. If you are doing something like this:

const result = await Promise.all(promises)

Where each promise can have a request similar to this:

const data = await axios.post(`your/server/url`, {key1: value1});

If you chunk your data into 50,000 promises, then you will potentially hit your servers with 50,000 simultaneous requests. This can easily break your entire system and leave your servers returning 502 errors.

To mitigate this, control the concurrency using a library like p-limit:

import pLimit from 'p-limit';

const limit = pLimit(10); // At most 10 concurrent requests
const file_keys = Object.keys(file_data); // An 800,000+ element array of keys
const promises = file_keys.map(file_key => {
  return limit(() => this.create_input_for_update(file_data, file_key));
});
await Promise.all(promises); // Still awaits everything, but only 10 run at once

Now your client will keep at most 10 requests in flight at a time, limiting the load that your backend servers receive.

Final Thoughts

The Diffgram Import Wizard is capable of ingesting close to a million instances from the UI on a single machine, all from an Admin filling out a form.

Before, the user would have needed to chunk the JSON manually or use an SDK to accomplish this.

Now, with a few memory management, concurrency control, and reactivity management techniques, we improved the UI and service to “just handle it”.

If you are interested in learning more about the Diffgram code, check out our GitHub repo.

See an error in this article, or the code itself? Feel free to open an issue here!

Found this interesting? Want to help us solve more problems like this?

We are actively growing our team! Feel free to choose the role that’s the nearest match, or contact us.

Thanks for reading!

Follow us on Twitter

Follow us on LinkedIn
