glTF in Unity optimization - 6. Asynchronous Programming

27 Jan 2021

This is part 6 of a mini-series.

TL;DR: C# asynchronous programming opens up opportunities to make your code clean and fast.

Disclaimer

Before we even start, two things:

This is not an extensive comparison between Coroutines and asynchronous programming. I highly recommend watching the presentation Best practices: Async vs. coroutines - Unite Copenhagen.
Async is not generally a replacement for Coroutines in Unity. The two have their respective pros and cons. Always choose the right tool for the task!

Cool? Then let's get started!

Why do we need coroutines?

When a glTF file exceeds a certain size, loading it might take longer than there is time available within a single render cycle (frame). If that happens, certain frames take very long, your framerate starts to drop and you will see visual stuttering (which is especially bad in immersive XR). A way to prevent this is to split up the work into smaller portions and spread those evenly across frames.

Another use case is I/O-bound tasks, like loading a file into memory or performing a download. While the data loads, rendering should continue without hiccups.

Unity Coroutines

One way to achieve this is to use Unity Coroutines. They creatively use an enumerator pattern to defer the execution of parts of a method to a later point in time. Whenever your coroutine reaches yield return null;, execution stops at this point and continues in the next frame.
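To make the enumerator pattern more tangible, here is a minimal sketch in plain C# that runs without Unity. The class EnumeratorDemo and its RunFrames loop are hypothetical stand-ins for the engine's frame loop, not Unity's actual implementation. The point is only that the method body advances one yield at a time, each time the caller invokes MoveNext().

```csharp
using System.Collections;
using System.Collections.Generic;

public static class EnumeratorDemo {
    // Collects log lines so the order of execution can be inspected afterwards.
    public static readonly List<string> Log = new List<string>();

    // A coroutine-style method: every `yield return` hands control back to the caller.
    static IEnumerator Work() {
        Log.Add("step 1");
        yield return null; // pause here
        Log.Add("step 2");
        yield return null; // pause here again
        Log.Add("step 3");
    }

    // Hypothetical stand-in for a frame loop: advances the routine by one step
    // per simulated frame and returns the number of frames that were needed.
    public static int RunFrames() {
        Log.Clear();
        IEnumerator routine = Work(); // nothing has executed yet at this point
        int frames = 0;
        while (true) {
            frames++;
            Log.Add("frame " + frames);
            if (!routine.MoveNext()) break; // false once the method has run to its end
        }
        return frames;
    }
}
```

Calling EnumeratorDemo.RunFrames() interleaves the "frame" and "step" entries in the log, which is the same effect StartCoroutine gives you with real frames.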

A simplified example demonstrating how coroutines were used in glTFast:


using System.Collections;
using UnityEngine;

// Hypothetical example class; attach it to a GameObject to run it
public class CoroutineExample : MonoBehaviour {
  // The lack of return values is compensated by storing them in class variables
  int globalVar;

  void Start() {
    // Notice: you need a reference to a MonoBehaviour (this)
    // to start the coroutine
    StartCoroutine( SplitTheWorkCoroutine() );
  }

  IEnumerator SplitTheWorkCoroutine() {
    Debug.Log("I'm shown right away");
    yield return null; // Pause here and return a frame later
    Debug.Log("I'm shown on the next frame");
    yield return new WaitForSeconds(1); // Pause here and return a second later
    Debug.Log("I'm shown a second later");
    // Storing some result state
    globalVar = 42;
  }
}

Coroutines are lean and effective. They have served me well many times. With regard to future glTFast optimizations, they raise some concerns though:

You need a MonoBehaviour to run the coroutine
Coroutines must be run exclusively on the main thread
No support for return values, so you need a workaround (storing results in class variables mostly)
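Besides class variables, another common workaround for the missing return value is to pass a callback into the coroutine. The sketch below is plain C# and only illustrates the idea. CallbackDemo, ComputeRoutine and the manual MoveNext() loop are hypothetical, and this is not necessarily how glTFast handled it.

```csharp
using System;
using System.Collections;

public static class CallbackDemo {
    // The routine cannot `return 42;`, so it reports its result through a callback.
    static IEnumerator ComputeRoutine(Action<int> onDone) {
        yield return null; // pretend the work is spread across two steps
        onDone(42);
    }

    public static int Run() {
        int result = -1;
        IEnumerator routine = ComputeRoutine(value => result = value);
        // Hypothetical driver; in Unity, StartCoroutine would advance the routine once per frame.
        while (routine.MoveNext()) { }
        return result;
    }
}
```

Either way, the result travels through a side channel instead of the method signature, which is the boilerplate the concern above refers to.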

The future of high performance computing in Unity is called DOTS. Part of this philosophy is about replacing GameObjects and MonoBehaviours with Entities and (data-only) Components. At the time of writing, DOTS is still under development and I haven't found the time to experiment with it. What seems clear, though, is that depending on MonoBehaviour and coroutines is not future-proof.

glTFast already uses the C# Job System to offload work to other threads. Code that operates on managed data cannot be run with the C# Job System though.

Async to the rescue

The more I learned about asynchronous programming, the more it appeared to be a viable option for glTFast. Replacing coroutines made changes to the API necessary, but that turned out not to be too much work.

Example above converted:


using System.Threading.Tasks;
using UnityEngine;

// Hypothetical example class; the MonoBehaviour is only needed here for the Start() entry point
public class AsyncExample : MonoBehaviour {
  async void Start() {
    // This method runs non-blocking, but is "awaited". Its return value can be used afterwards.
    var result = await SplitTheWork();
    Debug.Log($"Result: {result}");
  }

  async Task<int> SplitTheWork() {
    Debug.Log("I'm shown right away");
    await Task.Yield(); // Pause here and return a frame later
    Debug.Log("I'm shown on the next frame");
    await Task.Delay(1000); // Pause here and return a second later
    Debug.Log("I'm shown a second later");
    return 42; // You can return values like any regular method
  }
}

Benefits

To sum up some of the benefits of async:

No more MonoBehaviour
Pushing work to other threads should become easier (more on that in a future article)
Async methods can have return values
- This resulted in less boilerplate code and removal of class variables for result state
More meaningful stack traces (also in profiler)
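As a rough illustration of the threading and return value points, here is a minimal sketch in plain C#. It assumes the heavy work touches no Unity API, since most Unity APIs may only be called from the main thread. OffloadDemo and SumUpTo are hypothetical names, and SumUpTo is just a stand-in for CPU-heavy work such as parsing.

```csharp
using System.Threading.Tasks;

public static class OffloadDemo {
    // Hypothetical stand-in for CPU-heavy work that operates on managed data.
    static int SumUpTo(int n) {
        int sum = 0;
        for (int i = 1; i <= n; i++) sum += i;
        return sum;
    }

    public static async Task<int> LoadAsync() {
        // Task.Run schedules the work on the thread pool;
        // `await` hands the result back once the work has finished.
        int result = await Task.Run(() => SumUpTo(100));
        return result; // a regular return value, no class variable needed
    }
}
```

A caller simply writes var sum = await OffloadDemo.LoadAsync(); and gets the value back, whichever thread did the actual work.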

Benchmarking the results

I did not expect any significant change in performance between the two approaches. To make sure I wasn't introducing performance regressions, I ran a few ad-hoc benchmarks.

For the tests I used a couple of different glTFs with unique characteristics, loaded them repeatedly and picked average values. The differences were similar across models.

Example model: Buggy. This glTF has a somewhat complex scene graph (and thus bigger JSON part), but no textures.

glTFast 2.5.1: 80 ms (17.6 ms max frame time)
glTFast async: 44 ms (21 ms max frame time)

Wow, that's a LOT of unexpected difference. Looking at the profiling data, it seems that the coroutine version renders another two frames between the start of loading and the instantiation of the glTF. The CPU cycles of those two frames could have been put to use. It turns out I was a little too aggressive in the past when it came to spreading work across frames. The code waited for the next frame twice for no reason.

On the other hand, notice the increased max frame time, which botches the desired frame rate. The async version does not defer work as aggressively and thus executes too much work within one cycle. This was especially bad on models with many JPEG/PNG textures.

The conclusion here is that the decision at what point execution should pause until the next frame is subject to optimization.
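One possible way to make that decision less arbitrary is to measure elapsed time and only yield once a per-frame budget is used up. The following is a minimal sketch of that idea in plain C#, not what glTFast does at this point. FrameBudgetDemo, ProcessAsync and the budgetMs parameter are hypothetical.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class FrameBudgetDemo {
    // Runs `workItem` `itemCount` times and yields whenever the time budget
    // has been used up. Returns how often it yielded.
    public static async Task<int> ProcessAsync(int itemCount, double budgetMs, Action workItem) {
        var stopwatch = Stopwatch.StartNew();
        int yields = 0;
        for (int i = 0; i < itemCount; i++) {
            workItem();
            if (stopwatch.Elapsed.TotalMilliseconds >= budgetMs) {
                await Task.Yield();  // give control back to the caller's scheduler
                yields++;
                stopwatch.Restart(); // start measuring the next slice
            }
        }
        return yields;
    }
}
```

With a budget of 0 ms the loop yields after every item, which mirrors the overly aggressive coroutine behavior. With a very large budget it never yields, which mirrors the frame time spikes of the async version. A sensible value lies somewhere in between and depends on the target frame rate.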

Conclusion

I am not happy with the fact that I discovered these unexpected changes by coincidence and not systematically. The regressions of the async branch need to be eliminated or at least reduced.

But even more important, I'm gonna need some automated performance testing, so that regressions don't go (un)detected by coincidence in the future.