<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Andreas Atteneder&#39;s website</title>
  <subtitle>Creating pixels and polygons</subtitle>
  <link href="https://pixel.engineer/feed.xml" rel="self"/>
  <link href="https://pixel.engineer/"/>
  <updated>2025-12-07T00:00:00Z</updated>
  <id>https://pixel.engineer/</id>
  <author>
    <name>Andreas Atteneder</name>
    <email>andreas.atteneder@gmail.com</email>
  </author>
  
  <entry>
    <title>Breaking radio silence</title>
    
    <summary>This blog came to a halt. Why?</summary>
    
    <link href="https://pixel.engineer/posts/breaking-radio-silence/"/>
    <updated>2025-12-07T00:00:00Z</updated>
    <id>https://pixel.engineer/posts/breaking-radio-silence/</id>
    <content type="html">&lt;p&gt;I noticed I didn&#39;t write a blog post in over 3 years and I reflected on why I think that is.&lt;/p&gt;
&lt;p&gt;TL;DR: Founding a family, focusing on work instead of public relations, and impostor syndrome.&lt;/p&gt;
&lt;h2 id=&quot;a-blog-is-born&quot; tabindex=&quot;-1&quot;&gt;A blog is born&lt;/h2&gt;
&lt;p&gt;I started this blog in 2019. At the time I wasn&#39;t finding much joy in the direction my work-related projects were taking, so I had started hobby projects. Those scratched the itch for technological excitement and gave me a sense of freedom I hadn&#39;t felt in my job. Naturally I wanted to write about what excited me and, inspired by other great developers who were blogging and sharing gems of wisdom, I followed their path.&lt;/p&gt;
&lt;p&gt;It didn&#39;t take long for my work to get noticed. I was contracted to work on Khronos Group&#39;s &lt;a href=&quot;https://github.com/KhronosGroup/KTX-Software&quot;&gt;KTX-Software&lt;/a&gt; and to implement features for &lt;a href=&quot;https://github.com/atteneder/glTFast&quot;&gt;glTFast&lt;/a&gt; (my very own open source project), and I got solid job offers, one of which was from &lt;a href=&quot;https://unity.com/&quot;&gt;Unity&lt;/a&gt;. They asked me to continue working on my hobby projects on their payroll, a dream-come-true offer that I gladly took. I&#39;m not sure if my blog played a significant part in these things happening. While I think my work on open source projects spoke the loudest, I&#39;d like to believe that the blog helped ever so slightly.&lt;/p&gt;
&lt;h2 id=&quot;activity-going-down&quot; tabindex=&quot;-1&quot;&gt;Activity going down&lt;/h2&gt;
&lt;p&gt;Before I joined Unity I dished out 11 blog posts over the course of 1.5 years. In the following 1.5 years there were only three more posts, and then I stopped completely. What happened?&lt;/p&gt;
&lt;h2 id=&quot;pack-up-all-you-belongings&quot; tabindex=&quot;-1&quot;&gt;Pack up all your belongings&lt;/h2&gt;
&lt;p&gt;Starting at Unity forced me to relocate to a country where they had a legal entity that could employ me. While I was interviewing with Unity, my girlfriend got pregnant. We decided to move to Hamburg, Germany, as my girlfriend is German and we would live close to my in-laws. Our first child was born 4 months into my new role. We married later that year; two summers later our second child was born and we moved yet again, to a place big enough for the four of us.&lt;/p&gt;
&lt;h2 id=&quot;there&#39;s-no-guarantee-for-health&quot; tabindex=&quot;-1&quot;&gt;There&#39;s no guarantee for health&lt;/h2&gt;
&lt;p&gt;During her second pregnancy my wife developed &lt;a href=&quot;https://en.wikipedia.org/wiki/Gestational_diabetes&quot;&gt;gestational diabetes&lt;/a&gt;. Definitely something to be taken seriously, but usually not too concerning long-term. When she continued to check her blood sugar levels after childbirth (out of pure curiosity), they were still way too high. She was then diagnosed with type 1 diabetes, a permanent condition. As you can imagine, it turned her life, and ours, around.&lt;/p&gt;
&lt;p&gt;Before becoming a parent I used to stay up late and sleep until shortly before work. After working hours I had plenty of time and energy for hobbies, and even more so on the weekends. It was in those hours that I worked on blog posts. Now those precious hours and my energy go towards something much more important: supporting my family.&lt;/p&gt;
&lt;h2 id=&quot;doing-vs.-writing&quot; tabindex=&quot;-1&quot;&gt;Doing vs. writing&lt;/h2&gt;
&lt;p&gt;Most people have heard this public relations quote at some point:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Do something good and talk about it.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Though promotion might help gain more support for my projects and possibly advance my career as well, I enjoy hacking on projects much more than rambling on about them. It&#39;s a bit different when it&#39;s purely about sharing knowledge, but even that takes time away from doing productive work.&lt;/p&gt;
&lt;h2 id=&quot;never-good-enough&quot; tabindex=&quot;-1&quot;&gt;Never good enough&lt;/h2&gt;
&lt;p&gt;I rarely feel that my work is good enough to hold up in the spotlight, and I tend to focus on the remaining flaws. That applies to my software as well as to my blog posts.&lt;/p&gt;
&lt;p&gt;By now I have started many blog posts that ended up unpublished and abandoned in a half-finished state, as I didn&#39;t feel they were of substantial value.&lt;/p&gt;
&lt;p&gt;A little reminder for me and my fellow &lt;a href=&quot;https://en.wikipedia.org/wiki/Impostor_syndrome&quot;&gt;impostor syndrome&lt;/a&gt; victims: Perfect is the enemy of done.&lt;/p&gt;
&lt;h2 id=&quot;a-no-reason-to-highlight&quot; tabindex=&quot;-1&quot;&gt;A non-reason worth highlighting&lt;/h2&gt;
&lt;p&gt;One might think that the fear of being exposed as a fraud must stem from having experienced malevolent, hostile criticism.&lt;/p&gt;
&lt;p&gt;Quite the opposite is true. In all those years of putting out my work, all I&#39;ve received has been acknowledgement, words of encouragement and praise. Criticism has only ever been phrased in a positive and constructive manner. Even when I haven&#39;t answered issue reports or pull requests on GitHub for weeks, I was never met with any hostility whatsoever.&lt;/p&gt;
&lt;p&gt;Given that there are so many stories about open source maintainers who have to endure toxic behaviour, I&#39;m super grateful that this never happened to me, not even once.&lt;/p&gt;
&lt;p&gt;Thank you, people of the internet ❤️, for keeping my hope in humanity intact.&lt;/p&gt;
&lt;h2 id=&quot;so%2C-where-does-that-lead-us&quot; tabindex=&quot;-1&quot;&gt;So, where does that lead us&lt;/h2&gt;
&lt;p&gt;I don&#39;t plan on becoming a volume writer anytime soon, but I&#39;d like to slowly resurrect the habit of writing posts to share my learnings.&lt;/p&gt;
&lt;p&gt;The kids grow and become more independent by the day. My wife has gotten somewhat comfortable with her condition and we are steadily learning to better balance our individual needs within the family. All that leads me to believe I&#39;ll be able to spend more time on things like sports and hobbies, including this blog.&lt;/p&gt;
&lt;p&gt;I might broaden the topics I write about, starting with this blog post. Coding, personal stories, non-work-related hobbies, you name it.&lt;/p&gt;
&lt;p&gt;At the very least some AI bots will enjoy crawling it. Maybe my kids will dig it up once they&#39;re older. Who knows.&lt;/p&gt;
&lt;p&gt;Thanks for sticking around.&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>glTF in Unity optimization - 8. Asynchronous Scene Instantiation</title>
    
    <summary>Read how asynchronous scene instantiation helped improve the frame rate</summary>
    
    <link href="https://pixel.engineer/posts/gltfast-async-instantiation/"/>
    <updated>2022-09-16T00:00:00Z</updated>
    <id>https://pixel.engineer/posts/gltfast-async-instantiation/</id>
    <content type="html">&lt;p&gt;This is part 8 of a &lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;mini-series&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;TL;DR: &lt;a href=&quot;https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/&quot;&gt;Asynchronous&lt;/a&gt; instantiation reduces frame rate drops on big scenes (i.e. scenes with many nodes and lots of content), but it required breaking more of the API than I presumed.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&quot;introduction&quot; tabindex=&quot;-1&quot;&gt;Introduction&lt;/h2&gt;
&lt;p&gt;glTFast currently loads a glTF file in two phases:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Reading the source glTF and converting it to Unity resources (meshes, materials, textures)&lt;/li&gt;
&lt;li&gt;Instantiation: Actually creating GameObjects/Entities of a glTF scene&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The whole process might exceed the computational time budget available for a single frame and thus cause stutter. That&#39;s why async programming was used to spread work across frames, but unfortunately this was only done for the first phase. This post is about the adventure of making phase 2 async as well.&lt;/p&gt;
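&lt;p&gt;To illustrate the principle behind spreading work across frames, here is a minimal sketch in Python (glTFast itself does this in C# with async methods). It assumes the work is split into small steps provided by a generator and that each frame may spend a fixed time budget on them; &lt;code&gt;run_frames&lt;/code&gt; and its 4 ms default budget are hypothetical choices for illustration, not glTFast&#39;s API.&lt;/p&gt;

```python
import time
from typing import Callable, Iterator


def run_frames(steps: Iterator[None], budget_s: float = 0.004,
               clock: Callable[[], float] = time.perf_counter) -> int:
    """Drive a generator of small work steps, moving on to the next
    frame whenever this frame's time budget is used up.

    Returns the number of frames the work was spread across.
    """
    frames = 0
    finished = False
    while not finished:
        frames += 1
        frame_start = clock()
        # Run as many steps as fit into the budget, but at least one,
        # so that progress is guaranteed even for slow steps.
        while True:
            try:
                next(steps)
            except StopIteration:
                finished = True
                break
            if clock() - frame_start >= budget_s:
                break
        # A real engine would update and render the frame here.
    return frames
```

&lt;p&gt;A smaller budget keeps individual frames shorter at the cost of spreading the work over more frames.&lt;/p&gt;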
&lt;h2 id=&quot;preparation&quot; tabindex=&quot;-1&quot;&gt;Preparation&lt;/h2&gt;
&lt;p&gt;First, we need proper test scenes. From previous day-to-day observations I know that scenes with many nodes (and lots of content in general) perform worst and make the frame rate stutter apparent. To create such a test glTF I used Blender and a small Python script that creates thousands of cubes.&lt;/p&gt;
&lt;pre class=&quot;language-python&quot;&gt;&lt;code class=&quot;language-python&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;import&lt;/span&gt; bpy

&lt;span class=&quot;token comment&quot;&gt;# count - final count will be 15 to the power of 3 = 3375&lt;/span&gt;
c &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;15&lt;/span&gt;
size &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;.95&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;c
&lt;span class=&quot;token comment&quot;&gt;# Pick a mesh from a previously created cube&lt;/span&gt;
m &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; bpy&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;meshes&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;Cube&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; x &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;c&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; y &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;c&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; z &lt;span class=&quot;token keyword&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;token builtin&quot;&gt;range&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;c&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt;
            bpy&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;ops&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;object&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;add&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token builtin&quot;&gt;type&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&#39;MESH&#39;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;location&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;x&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;c&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; y&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;c&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; z&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;c&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
            obj &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; bpy&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;context&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;active_object
            obj&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;scale&lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;size&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; size&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
            obj&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;data &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; m&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;To my unpleasant surprise, this script took quite a while to execute. It could probably be sped up by creating the objects in &lt;code&gt;bpy.data&lt;/code&gt; and linking them to the scene instead of using the &lt;code&gt;bpy.ops.object.add&lt;/code&gt; operator, but I&#39;m here to optimize glTF loading, not this helper script. So I decided to refill my glass of water in the meantime, and afterwards I manually duplicated the resulting cubes a couple more times to get an even bigger scene. It ended up with 13,500 nicely arranged cubes.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/8/01-test-scene-blender.png&quot; alt=&quot;Test scene in Blender showing thousands of cubes&quot; /&gt;&lt;/p&gt;
&lt;h2 id=&quot;baseline&quot; tabindex=&quot;-1&quot;&gt;Baseline&lt;/h2&gt;
&lt;p&gt;Overall it takes between 550 and 580 ms to load the model. As expected, the frame rate is smooth until instantiation. Here&#39;s that nasty last frame that takes ages:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/8/02-sync-profile.png&quot; alt=&quot;CPU Usage in Profiler showing one long frame&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The instantiation blocks execution for 353 ms and the frame overall takes almost half a second. Breakdown of the key observations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;54 ms creating GameObjects&lt;/li&gt;
&lt;li&gt;250 ms populating the hierarchy (e.g. adding &lt;code&gt;Renderer&lt;/code&gt; components)&lt;/li&gt;
&lt;li&gt;33 ms adding the scene (22 ms of that was parenting root nodes to the scene GameObject; more on that later)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;baseline-dots&quot; tabindex=&quot;-1&quot;&gt;Baseline DOTS&lt;/h3&gt;
&lt;p&gt;I also tested the experimental DOTS support. Loading time in the Editor is really bad (almost one second), but quite fast in builds:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;250 ms to 300 ms overall&lt;/li&gt;
&lt;li&gt;133 ms for the last (instantiation) frame&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let&#39;s see if we can do something about it.&lt;/p&gt;
&lt;h2 id=&quot;first-async-attempt&quot; tabindex=&quot;-1&quot;&gt;First Async Attempt&lt;/h2&gt;
&lt;p&gt;The first approach was to make the instantiation methods async and to add yield points (where work is suspended until the next frame) between creating or altering GameObjects.&lt;/p&gt;
&lt;p&gt;A problem that surfaced right away was that partially loaded scenes were now rendered. A quick solution was to deactivate all GameObjects right after they are created and activate them at the end, once the whole scene is ready. The &lt;code&gt;IInstantiator&lt;/code&gt; API does not offer a final callback, so I crammed that into &lt;code&gt;AddScene&lt;/code&gt;, which is not very clean, but does the job for now.&lt;/p&gt;
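&lt;p&gt;The quick visibility fix can be sketched as follows, again in Python and under simplifying assumptions: &lt;code&gt;Node&lt;/code&gt; is a hypothetical stand-in for a GameObject with an active flag, and &lt;code&gt;instantiate&lt;/code&gt; is a hypothetical generator that yields after each created node. Nodes start out inactive and are only activated in one final step, once the whole scene exists.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical stand-in for a GameObject."""
    name: str
    active: bool = True


def instantiate(names: List[str], created: List[Node]) -> Iterator[None]:
    """Create one node per step and keep everything hidden until the end."""
    for name in names:
        created.append(Node(name, active=False))  # hide half-loaded content
        yield  # let the frame loop render (nothing of the scene is visible)
    # Final step: reveal the complete scene at once
    # (the role AddScene took over in this first attempt).
    for node in created:
        node.active = True
```

&lt;p&gt;The final activation loop still has to run within a single frame, which is why activating all nodes shows up in the last frame&#39;s profile.&lt;/p&gt;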
&lt;p&gt;Here&#39;s the first result:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/8/03-cpu-usage.gif&quot; alt=&quot;CPU usage async - first attempt&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Although the frame rate drops from 60 to 30 fps (which could be tweaked), most of the work is now spread evenly. The first highlighted section is GameObject creation, the following one is adding components.&lt;/p&gt;
&lt;p&gt;What&#39;s problematic is that there is still a sharp spike in the end. Let&#39;s look at this frame in detail.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/8/04-async-profile.png&quot; alt=&quot;CPU usage async - last frame&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Breakdown of causes of this 200 ms frame time:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Scripting
&lt;ul&gt;
&lt;li&gt;56 ms &lt;code&gt;AddScene&lt;/code&gt;
&lt;ul&gt;
&lt;li&gt;20 ms Re-parenting root nodes onto scene GameObject&lt;/li&gt;
&lt;li&gt;22 ms activating all nodes so that they are rendered correctly&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Engine
&lt;ul&gt;
&lt;li&gt;122 ms &lt;code&gt;PostLateUpdate.FinishFrameRendering&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&#39;s a bit scary that the engine takes 122 ms, but I assume it&#39;s a consequence of the way the scene is assembled. For now, let&#39;s focus on the scripting part and &lt;code&gt;AddScene&lt;/code&gt;, since we&#39;re in direct control of that.&lt;/p&gt;
&lt;p&gt;Re-parenting thousands of nodes takes some time. We could go ahead and make &lt;code&gt;AddScene&lt;/code&gt; itself async to split it up, but I&#39;d like to address a deeper conceptual problem here. Re-parenting a GameObject has a certain computational overhead (which grows even bigger if it has a deep hierarchy of children). To avoid this overhead, it&#39;s best to build hierarchies starting at the root and working down towards the leaves, without any unnecessary re-parenting. glTFast currently does not do this.&lt;/p&gt;
&lt;p&gt;IIRC glTFast instantiation historically worked like this:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create all nodes upfront (i.e. nodes of all scenes; &lt;code&gt;IInstantiator.CreateNode&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Parent non-root nodes to their respective parent node (in &lt;code&gt;IInstantiator.SetParent&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Add components&lt;/li&gt;
&lt;li&gt;Add the scene, which parents all root-level nodes/GameObjects to the scene GameObject (&lt;code&gt;IInstantiator.AddScene&lt;/code&gt;)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This was nice (I thought) since it did not require the overhead of iterating the hierarchy tree in a particular order. At some point support for scenes was added, and from then on iterating a scene&#39;s node tree had to be done anyway (see &lt;code&gt;GltfImport.InstantiateSceneInternal.IterateNodes&lt;/code&gt;).&lt;/p&gt;
&lt;h2 id=&quot;refactor-api&quot; tabindex=&quot;-1&quot;&gt;Refactor API&lt;/h2&gt;
&lt;p&gt;The obvious thing to do now is to create the scene hierarchy starting from the root, so that a parent is always created before its children and parented right away. &lt;code&gt;GltfImport.InstantiateSceneInternal&lt;/code&gt; already iterates nodes in that order, but we have to refactor &lt;code&gt;IInstantiator&lt;/code&gt; to pull this off. The new approach is:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create the scene (which ensures the root-level GameObject exists and prepares it)&lt;/li&gt;
&lt;li&gt;Create nodes (in correct order and parented right away)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Also, previously the scene&#39;s node tree was iterated twice: once for creating the raw hierarchy and a second time to add components (like &lt;code&gt;Renderer&lt;/code&gt;, etc.). Now we can do it in one go.&lt;/p&gt;
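&lt;p&gt;A simplified sketch of the new order, assuming glTF-like input where each node lists its child indices (as in glTF&#39;s &lt;code&gt;children&lt;/code&gt; property) and the scene lists its root nodes: an iterative depth-first traversal creates every parent before its children, so each node gets its final parent at creation time and its components in the same visit. &lt;code&gt;instantiate_scene&lt;/code&gt; is hypothetical and only records what would be created; it is not glTFast&#39;s &lt;code&gt;InstantiateSceneInternal&lt;/code&gt;.&lt;/p&gt;

```python
from typing import List, Optional, Tuple


def instantiate_scene(children: List[List[int]],
                      meshes: List[Optional[int]],
                      roots: List[int]) -> List[Tuple[int, Optional[int], bool]]:
    """Create a scene's nodes top-down in a single pass.

    children[i] lists the child indices of node i, meshes[i] is its mesh
    index or None. Returns one (node, parent, has_renderer) tuple per
    created node, in creation order; parent is None for root nodes,
    which are attached to the scene object directly.
    """
    created = []
    # Explicit stack of (node, parent) pairs. Pushing in reverse keeps
    # the glTF order when popping.
    stack = [(root, None) for root in reversed(roots)]
    while stack:
        index, parent = stack.pop()
        # The parent was created earlier, so the node is attached to it
        # right away and never re-parented; components are added in
        # the same visit instead of in a second traversal.
        has_renderer = meshes[index] is not None
        created.append((index, parent, has_renderer))
        for child in reversed(children[index]):
            stack.append((child, index))
    return created
```

&lt;p&gt;An explicit stack is used instead of recursion so that very deep hierarchies cannot exhaust the call stack.&lt;/p&gt;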
&lt;p&gt;The overall loading time hardly suffered, and the worrying last frame improved a bit.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/8/06-async-frame.png&quot; alt=&quot;CPU usage async - improved attempt&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;code&gt;EndScene&lt;/code&gt;, the logical successor of &lt;code&gt;AddScene&lt;/code&gt;, now takes less than 16 ms instead of 56 ms. I book that as a success, even though the engine itself still spends a lot of time preparing the rendering. All in all, the last frame now takes around 100 ms instead of 450 ms for this corner-case benchmark file.&lt;/p&gt;
&lt;h3 id=&quot;dots&quot; tabindex=&quot;-1&quot;&gt;DOTS&lt;/h3&gt;
&lt;p&gt;Besides adapting the &lt;code&gt;EntityInstantiator&lt;/code&gt; to the new &lt;code&gt;IInstantiator&lt;/code&gt; API, I had to fix the visibility problem of partially loaded scenes as well.&lt;/p&gt;
&lt;p&gt;In the Editor this still performs really badly: up to 4 full seconds to load the test file. I haven&#39;t verified it, but I blame the DOTS Editor UI for the overhead.&lt;/p&gt;
&lt;p&gt;Briefly I thought about optimizing this API for DOTS right away. At the moment I create the nodes one at a time via &lt;code&gt;EntityManager.CreateEntity&lt;/code&gt;, about which the documentation says:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Creating entities with the EntityManager is typically the least efficient method&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Spawning them all at once in a job seems to be a better solution, but it requires knowing the number of entities (which is not necessarily the number of nodes) upfront, and calculating that is a bit more complicated.&lt;/p&gt;
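&lt;p&gt;One way to get that number upfront is a counting pass followed by a prefix sum, which also hands every node a fixed slot in a single batch allocation that parallel jobs could then fill independently. The sketch below is in Python and rests on an explicitly hypothetical rule: a node needs one entity, or one per mesh primitive if its mesh has several. The actual mapping in glTFast&#39;s &lt;code&gt;EntityInstantiator&lt;/code&gt; may differ.&lt;/p&gt;

```python
from itertools import accumulate
from typing import List, Tuple


def plan_entities(primitive_counts: List[int]) -> Tuple[int, List[int]]:
    """Count the entities a scene needs and assign each node a slot.

    primitive_counts[i] is the number of mesh primitives of node i
    (0 if it has no mesh). ASSUMPTION, for illustration only: a node
    needs max(primitives, 1) entities. Returns (total, offsets), where
    offsets[i] is the first index reserved for node i in one batch.
    """
    per_node = [max(count, 1) for count in primitive_counts]
    # Exclusive prefix sum: each node starts where the previous ended.
    offsets = list(accumulate(per_node, initial=0))[:-1]
    return sum(per_node), offsets
```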
&lt;p&gt;Taking a look at an actual build (which you should always do), the test file loads in ~400 ms overall with maximum frame times below 33 ms.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/8/07-async-dots.png&quot; alt=&quot;CPU usage async - DOTS&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Even if it&#39;s slower overall, the frame rate is quite solid and it&#39;s still better than the GameObjects workflow.&lt;/p&gt;
&lt;p&gt;For now I decided to not optimize further, but once DOTS nears maturity, I&#39;ll definitely have to revise this part.&lt;/p&gt;
&lt;h2 id=&quot;side-quest---materials-count&quot; tabindex=&quot;-1&quot;&gt;Side Quest - Materials Count&lt;/h2&gt;
&lt;p&gt;While benchmarking the test scene I noticed an absurdly high number of materials, one per cube to be precise.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/8/05-materials.png&quot; alt=&quot;Memory profiler showing too many materials&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The glTF does not contain any materials, so all cubes fall back to the default material. What&#39;s going on?&lt;/p&gt;
&lt;p&gt;It turns out the fallback material was not cached but re-generated every time it was needed. Another small fix along the way.&lt;/p&gt;
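&lt;p&gt;The fix boils down to lazy creation plus caching. A minimal Python sketch, where &lt;code&gt;MaterialProvider&lt;/code&gt; and the &lt;code&gt;create_default&lt;/code&gt; factory are hypothetical stand-ins for the importer&#39;s material handling: all primitives without a material share one fallback instance that is created on first use.&lt;/p&gt;

```python
from typing import Callable, List, Optional


class MaterialProvider:
    """Hands out materials by glTF material index.

    Primitives without a material (index None) all share one fallback
    material, created on first use and cached afterwards instead of
    being re-generated for every primitive.
    """

    def __init__(self, materials: List[object],
                 create_default: Callable[[], object]) -> None:
        self._materials = materials
        self._create_default = create_default
        self._default: Optional[object] = None

    def get(self, index: Optional[int]) -> object:
        if index is not None:
            return self._materials[index]
        if self._default is None:
            self._default = self._create_default()
        return self._default
```

&lt;p&gt;In this sketch the cache lives on the provider instance rather than in a global, which ties the fallback material&#39;s lifetime to a single import.&lt;/p&gt;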
&lt;h2 id=&quot;conclusion&quot; tabindex=&quot;-1&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;Reducing the maximum frame time from 450 ms to 100 ms (GameObjects) and from 133 ms to 33 ms (DOTS) is exactly the kind of result I was aiming for. There&#39;s still potential for improvements (especially for the DOTS implementation), but that has to wait for later.&lt;/p&gt;
&lt;p&gt;This work landed in glTFast&#39;s main branch and will ship in the upcoming 5.x release.&lt;/p&gt;
&lt;h2 id=&quot;next-up&quot; tabindex=&quot;-1&quot;&gt;Next up&lt;/h2&gt;
&lt;p&gt;I have a couple of half-finished ideas/posts lying around and haven&#39;t decided which one to finish next, or when, but I strive to put out posts more regularly in the future.&lt;/p&gt;
&lt;p&gt;Follow me on &lt;a href=&quot;https://mastodon.gamedev.place/@tteneder&quot;&gt;Mastodon&lt;/a&gt; or &lt;a href=&quot;https://bsky.app/profile/tteneder.bsky.social&quot;&gt;Bluesky&lt;/a&gt; or &lt;a href=&quot;https://pixel.engineer/feed.xml&quot;&gt;subscribe to the feed&lt;/a&gt; so you don&#39;t miss any updates.&lt;/p&gt;
&lt;p&gt;If you liked this read, feel free to&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ko-fi.com/C0C3BW7G&quot;&gt;&lt;img src=&quot;https://www.ko-fi.com/img/githubbutton_sm.svg&quot; alt=&quot;ko-fi&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;Overview of this mini-series&lt;/a&gt;&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>glTF in Unity optimization - 7. Performance Tests</title>
    
    <link href="https://pixel.engineer/posts/gltfast-perf-tests/"/>
    <updated>2021-02-08T00:00:00Z</updated>
    <id>https://pixel.engineer/posts/gltfast-perf-tests/</id>
    <content type="html">&lt;p&gt;This is part 7 of a &lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;mini-series&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;TL;DR: Being able to check and see how changes to your code affect the performance is super helpful and fun.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I have to admit that lots of my work is not &lt;a href=&quot;https://en.wikipedia.org/wiki/Test-driven_development&quot;&gt;test-driven&lt;/a&gt;, but whenever tests save my ass (which has happened numerous times) I promise to improve.&lt;/p&gt;
&lt;h2 id=&quot;status-quo&quot; tabindex=&quot;-1&quot;&gt;Status Quo&lt;/h2&gt;
&lt;p&gt;Right now glTFast has a couple of Editor and Play Mode tests. The most important ones are the &amp;quot;(Quick)Load&amp;quot; play mode tests. Given a set of sample glTF files, one test per file is created. The only set that is tested currently is Khronos&#39; official &lt;a href=&quot;https://github.com/KhronosGroup/glTF-Sample-Models&quot;&gt;glTF-Sample-Models&lt;/a&gt; (this could and should be extended to further sample files in the future).&lt;/p&gt;
&lt;p&gt;In Unity&#39;s &lt;em&gt;Test Runner&lt;/em&gt; window it (hopefully) looks like this:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/7/tests.png&quot; alt=&quot;Test Runner window&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The tests themselves only trigger loading the file, wait until it has finished and check that no exceptions or error logs occurred. What they do not consider is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The speed or duration of the loading process&lt;/li&gt;
&lt;li&gt;The resulting data&lt;/li&gt;
&lt;li&gt;The visual result&lt;/li&gt;
&lt;/ul&gt;
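&lt;p&gt;The check these tests perform can be sketched in Python with the standard &lt;code&gt;logging&lt;/code&gt; module: capture every record at error level while loading, and fail if the loader raised or logged anything. &lt;code&gt;check_loads_cleanly&lt;/code&gt; and the &lt;code&gt;load&lt;/code&gt; callable are hypothetical; the actual glTFast tests run in Unity&#39;s Test Runner.&lt;/p&gt;

```python
import logging
from typing import Callable, List


class _ErrorCollector(logging.Handler):
    """Collects all log records at ERROR level or above."""

    def __init__(self) -> None:
        super().__init__(level=logging.ERROR)
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def check_loads_cleanly(load: Callable[[str], object], path: str) -> None:
    """Load `path` and fail if loading raises or logs an error.

    Like the tests described above, this says nothing about loading
    speed, the resulting data or the visual result.
    """
    collector = _ErrorCollector()
    root = logging.getLogger()
    root.addHandler(collector)
    try:
        load(path)  # an exception propagates and fails the test
    finally:
        root.removeHandler(collector)
    if collector.records:
        messages = [record.getMessage() for record in collector.records]
        raise AssertionError(f"{path} logged errors: {messages}")
```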
&lt;p&gt;In the &lt;a href=&quot;https://pixel.engineer/posts/gltfast-async-1&quot;&gt;last post&lt;/a&gt; I already mentioned that I need a way to automatically and systematically measure performance changes to avoid regressions, and this is what the rest of this post is about.&lt;/p&gt;
&lt;p&gt;Checking the resulting data and the visual results will be a topic for follow-up posts.&lt;/p&gt;
&lt;h2 id=&quot;measuring-performance&quot; tabindex=&quot;-1&quot;&gt;Measuring Performance&lt;/h2&gt;
&lt;p&gt;At first I made the mistake of trying to create my own time measuring tests, only to realize that this is &lt;em&gt;a lot&lt;/em&gt; of work (if done well). I started searching for existing solutions and quickly found the &lt;a href=&quot;https://docs.unity3d.com/Packages/com.unity.test-framework.performance@2.6&quot;&gt;Performance Testing Extension for Unity Test Runner&lt;/a&gt;, a Unity package in preview.&lt;/p&gt;
&lt;p&gt;To get started, you decorate your tests with the &lt;code&gt;Performance&lt;/code&gt; attribute:&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token attribute&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;Test&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Performance&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token return-type class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;MyTest&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token range operator&quot;&gt;..&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There are a couple of things you can measure. I, for one, wanted the following:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Repeat the test procedure ~10 times to even out deviations&lt;/li&gt;
&lt;li&gt;Warmup (run the test at least once first without capturing)&lt;/li&gt;
&lt;li&gt;Capture total duration&lt;/li&gt;
&lt;li&gt;Capture frame times&lt;/li&gt;
&lt;/ul&gt;
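&lt;p&gt;Independent of any framework, the four requirements above amount to a simple scheme, sketched here in Python: run the procedure without recording first (warmup), then repeat it a number of times while sampling both the total duration and each frame&#39;s duration. &lt;code&gt;measure&lt;/code&gt; and its parameters are hypothetical; this is not the Performance Testing Extension&#39;s API.&lt;/p&gt;

```python
import time
from typing import Callable, Dict, Iterator, List


def measure(run: Callable[[], Iterator[None]], repetitions: int = 10,
            warmup: int = 1,
            clock: Callable[[], float] = time.perf_counter
            ) -> Dict[str, List[float]]:
    """Sample total duration and per-frame durations of a procedure.

    `run()` must return a fresh generator that yields once per frame.
    The first `warmup` runs are executed but not recorded; the
    following `repetitions` runs are. All samples are in milliseconds.
    """
    samples: Dict[str, List[float]] = {"load_time_ms": [], "frame_time_ms": []}
    for iteration in range(warmup + repetitions):
        frames = []
        start = frame_start = clock()
        for _ in run():
            now = clock()
            frames.append(now - frame_start)  # work done before this yield
            frame_start = now
        end = clock()
        frames.append(end - frame_start)  # work after the last yield
        if iteration >= warmup:
            samples["load_time_ms"].append((end - start) * 1000)
            samples["frame_time_ms"].extend(f * 1000 for f in frames)
    return samples
```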
&lt;p&gt;The performance testing framework offers all of these out of the box, but I couldn&#39;t find a straightforward way to combine them all, so I took care of warmup and repetition manually. Here&#39;s one of the results:&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token attribute&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;UnityTest&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token attribute&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;UseGltfSampleSetTestCase&lt;/span&gt;&lt;span class=&quot;token attribute-arguments&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;glTFSampleSetJsonPath&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token attribute&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;Performance&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token return-type class-name&quot;&gt;IEnumerator&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;SmoothLoading&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;SampleSetItem&lt;/span&gt; testCase&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    Debug&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token interpolation-string&quot;&gt;&lt;span class=&quot;token string&quot;&gt;$&quot;Testing &lt;/span&gt;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;&lt;span class=&quot;token expression language-csharp&quot;&gt;testCase&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;path&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt;&lt;/span&gt; go &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token constructor-invocation class-name&quot;&gt;GameObject&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt;&lt;/span&gt; deferAgent &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; go&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token generic-method&quot;&gt;&lt;span class=&quot;token function&quot;&gt;AddComponent&lt;/span&gt;&lt;span class=&quot;token generic class-name&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;TimeBudgetPerFrameDeferAgent&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// Set up measuring total duration&lt;/span&gt;
    &lt;span class=&quot;token class-name&quot;&gt;SampleGroup&lt;/span&gt; loadTime &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token constructor-invocation class-name&quot;&gt;SampleGroup&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;LoadTime&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; SampleUnit&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Millisecond&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token comment&quot;&gt;// First time without measuring&lt;/span&gt;
    &lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt;&lt;/span&gt; task &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;LoadGltfSampleSetItem&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;testCase&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; go&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; deferAgent&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;yield&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;WaitForTask&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;task&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;Measure&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Frames&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Scope&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token comment&quot;&gt;// Repeat test `k_Repetitions` times&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; k_Repetitions&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
            task &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;LoadGltfSampleSetItem&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;testCase&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; go&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; deferAgent&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; loadTime&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;token keyword&quot;&gt;yield&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;WaitForTask&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;task&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
            &lt;span class=&quot;token comment&quot;&gt;// Wait one more frame. Usually some more action happens in this one.&lt;/span&gt;
            &lt;span class=&quot;token keyword&quot;&gt;yield&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    Object&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Destroy&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;go&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&quot;inspecting-results&quot; tabindex=&quot;-1&quot;&gt;Inspecting Results&lt;/h2&gt;
&lt;p&gt;The package comes with a brilliant, dedicated window for test inspection.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/7/test-report.png&quot; alt=&quot;Test Runner window&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The top bar chart shows the frame times. This particular test has extreme outliers which should be investigated.&lt;/p&gt;
&lt;p&gt;The second chart shows the overall loading times with 10 repetitions.&lt;/p&gt;
&lt;h2 id=&quot;comparing-results&quot; tabindex=&quot;-1&quot;&gt;Comparing Results&lt;/h2&gt;
&lt;p&gt;So far the package has been very useful, but the next step is to re-run the tests after making changes and compare the results, or to run them on different hardware or build configurations.&lt;/p&gt;
&lt;h3 id=&quot;performance-benchmark-reporter&quot; tabindex=&quot;-1&quot;&gt;Performance Benchmark Reporter&lt;/h3&gt;
&lt;p&gt;The documentation refers to a tool called &lt;a href=&quot;https://github.com/Unity-Technologies/PerformanceBenchmarkReporter&quot;&gt;Performance Benchmark Reporter&lt;/a&gt;. It&#39;s a .NET-based CLI tool that takes the result files created by the performance tests and produces an HTML page with comparison bar charts for each test. Here&#39;s what it looks like:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/7/img-perf-reporter.png&quot; alt=&quot;Screenshot of the Performance Benchmark Reporter HTML page&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Initially I was keen on it, since it provided a solution out of the box. The more I played with it, the more flaws I found:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It doesn&#39;t really give you a quick overview across all tests. You have to disable &amp;quot;filter by failed&amp;quot; and scroll through the results a lot to get a feel for them.&lt;/li&gt;
&lt;li&gt;It gets &lt;strong&gt;very&lt;/strong&gt; slow to view and respond once the number of tests goes into the hundreds. Resizing the window took seconds. It uses HTML canvases to draw the charts for each test on a single, non-lazy-loaded page.&lt;/li&gt;
&lt;li&gt;There&#39;s no way to filter or sort by total load times or frame times (those are called &lt;em&gt;sample groups&lt;/em&gt; in performance test speak).&lt;/li&gt;
&lt;li&gt;Result files have to be copied and building the page has to be triggered manually.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;custom-charts&quot; tabindex=&quot;-1&quot;&gt;Custom Charts&lt;/h3&gt;
&lt;p&gt;Disappointed that I couldn&#39;t quickly make sense of all the cool results I had gathered, I figured I could make some charts myself, if only I had the data in a spreadsheet.&lt;/p&gt;
&lt;p&gt;Since the test data is written to disk as a JSON file, I quickly hacked together a script that does exactly that: it outputs the data I want as CSV, ready to be copy-pasted into my spreadsheet app of choice.&lt;/p&gt;
&lt;p&gt;The output of the &lt;a href=&quot;https://github.com/atteneder/PerformanceBenchmarkReporterPython&quot;&gt;Python script&lt;/a&gt; allowed me to create my own charts.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/7/result-chart.png&quot; alt=&quot;Screenshot of a spreadsheet with custom charts&quot; /&gt;&lt;/p&gt;
&lt;p&gt;I still have to trigger the creation of CSV files and copy-paste them manually, but making and updating meaningful charts is doable now.&lt;/p&gt;
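&lt;p&gt;The linked script is written in Python, but the core transformation is small. Here is a minimal C# sketch of the same idea. It is not the linked script and does not parse Unity&#39;s result file; it assumes the JSON has already been read into a hypothetical, simplified &lt;code&gt;TestResult&lt;/code&gt; record (a test name plus the samples of each sample group). It writes one CSV row per test, with the median of each requested sample group as a column.&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical, simplified shape of one test&#39;s results (not Unity&#39;s actual schema).
public record TestResult(string Name, Dictionary&amp;lt;string, List&amp;lt;double&amp;gt;&amp;gt; SampleGroups);

public static class CsvExport
{
    // The median is less sensitive to outlier frames than the mean.
    public static double Median(IReadOnlyList&amp;lt;double&amp;gt; samples)
    {
        if (samples.Count == 0) return double.NaN; // no data for this group
        double[] sorted = samples.OrderBy(x =&amp;gt; x).ToArray();
        int mid = sorted.Length / 2;
        return sorted.Length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    // Header row, then one row per test holding the median of each requested sample group.
    public static string ToCsv(IEnumerable&amp;lt;TestResult&amp;gt; results, IReadOnlyList&amp;lt;string&amp;gt; groups)
    {
        var sb = new StringBuilder();
        sb.AppendLine(&quot;Test,&quot; + string.Join(&quot;,&quot;, groups));
        foreach (TestResult result in results)
        {
            // Missing sample groups become empty cells; invariant culture keeps &#39;.&#39; as decimal point.
            IEnumerable&amp;lt;string&amp;gt; cells = groups.Select(g =&amp;gt;
                result.SampleGroups.TryGetValue(g, out List&amp;lt;double&amp;gt; samples)
                    ? Median(samples).ToString(CultureInfo.InvariantCulture)
                    : &quot;&quot;);
            sb.AppendLine(result.Name + &quot;,&quot; + string.Join(&quot;,&quot;, cells));
        }
        return sb.ToString();
    }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Using the median rather than the mean is a deliberate choice here: a single extreme frame, like the outliers mentioned above, barely moves it.&lt;/p&gt;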
&lt;h2 id=&quot;putting-it-to-use&quot; tabindex=&quot;-1&quot;&gt;Putting it to use&lt;/h2&gt;
&lt;p&gt;Before I created this setup, I used to compare a handful of manually selected models and look at the total time all tests took, which was tedious and made it easy to overlook things.&lt;/p&gt;
&lt;p&gt;Creating this kind of performance benchmark set me back a bit, but I cannot stress enough how much more fun it is to tweak performance now and see actual, exact results across all test files in a matter of minutes.&lt;/p&gt;
&lt;p&gt;It has already helped me spot regressions, and I&#39;m sure it will continue to do so.&lt;/p&gt;
&lt;h2 id=&quot;follow-up&quot; tabindex=&quot;-1&quot;&gt;Follow-Up&lt;/h2&gt;
&lt;p&gt;There&#39;s still a lot of room for improvement:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Add more real-world and corner case glTF models to tests (e.g. high polygon models)&lt;/li&gt;
&lt;li&gt;Measure memory usage&lt;/li&gt;
&lt;li&gt;Run benchmarks on mobile and WebGL platforms (at the moment it&#39;s only Editor and standalone)&lt;/li&gt;
&lt;li&gt;Automate creating comparison charts (further)&lt;/li&gt;
&lt;li&gt;Automate running the tests&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When it comes to quality assurance in general, there are even more missing pieces:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Code Coverage&lt;/li&gt;
&lt;li&gt;Render Tests&lt;/li&gt;
&lt;li&gt;Integration of all tests into Continuous Integration&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I hope I&#39;ll be able to push in this direction as we go. It certainly won&#39;t get boring soon.&lt;/p&gt;
&lt;h2 id=&quot;next&quot; tabindex=&quot;-1&quot;&gt;Next&lt;/h2&gt;
&lt;p&gt;In the next post I&#39;ll try to show how the performance benchmarks helped me improve frame rate and loading speed alike.&lt;/p&gt;
&lt;p&gt;Follow me on &lt;a href=&quot;https://mastodon.gamedev.place/@tteneder&quot;&gt;Mastodon&lt;/a&gt; or &lt;a href=&quot;https://bsky.app/profile/tteneder.bsky.social&quot;&gt;Bluesky&lt;/a&gt; or &lt;a href=&quot;https://pixel.engineer/feed.xml&quot;&gt;subscribe to the feed&lt;/a&gt; so you don&#39;t miss updates.&lt;/p&gt;
&lt;p&gt;If you liked this read, feel free to&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ko-fi.com/C0C3BW7G&quot;&gt;&lt;img src=&quot;https://www.ko-fi.com/img/githubbutton_sm.svg&quot; alt=&quot;ko-fi&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Next: &lt;a href=&quot;https://pixel.engineer/posts/gltfast-async-instantiation&quot;&gt;Asynchronous Scene Instantiation&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;Overview of this mini-series&lt;/a&gt;&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>glTF in Unity optimization - 6. Asynchronous Programming</title>
    
    <link href="https://pixel.engineer/posts/gltfast-async-1/"/>
    <updated>2021-01-27T00:00:00Z</updated>
    <id>https://pixel.engineer/posts/gltfast-async-1/</id>
    <content type="html">&lt;p&gt;This is part 6 of a &lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;mini-series&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;TL;DR: &lt;a href=&quot;https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/&quot;&gt;C# Asynchronous Programming&lt;/a&gt; opens opportunities to make your code clean and fast&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&quot;disclaimer&quot; tabindex=&quot;-1&quot;&gt;Disclaimer&lt;/h2&gt;
&lt;p&gt;Before we even start, two things:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;This is not an extensive comparison between &lt;a href=&quot;https://docs.unity3d.com/Manual/Coroutines.html&quot;&gt;Coroutines&lt;/a&gt; and &lt;a href=&quot;https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/&quot;&gt;asynchronous programming&lt;/a&gt;. I highly recommend watching the presentation &lt;a href=&quot;https://www.youtube.com/watch?v=7eKi6NKri6I&quot;&gt;&lt;em&gt;Best practices: Async vs. coroutines - Unite Copenhagen&lt;/em&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Async is not generally a replacement for Coroutines in Unity. The two have their respective pros and cons. Always choose the right tool for the task!&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Cool? Then let&#39;s get started!&lt;/p&gt;
&lt;h2 id=&quot;why-do-we-need-coroutines&quot; tabindex=&quot;-1&quot;&gt;Why do we need coroutines&lt;/h2&gt;
&lt;p&gt;When a glTF file exceeds a certain size, loading it might take longer than there is time available within a single render cycle (frame). If that happens, certain frames take very long, your framerate starts to drop and you will see visual stuttering (which is especially bad in immersive XR). A way to prevent this is to split up the work into smaller portions and spread those evenly across frames.&lt;/p&gt;
&lt;p&gt;Another use case is I/O-bound tasks, like loading a file into memory or performing a download. While the data loads, rendering should continue without hiccups.&lt;/p&gt;
&lt;h2 id=&quot;unity-coroutines&quot; tabindex=&quot;-1&quot;&gt;Unity Coroutines&lt;/h2&gt;
&lt;p&gt;One way to achieve this is to use &lt;a href=&quot;https://docs.unity3d.com/Manual/Coroutines.html&quot;&gt;Unity Coroutines&lt;/a&gt;. They creatively use an enumerator pattern to defer the execution of parts of a method to a later point in time. Whenever your coroutine executes &lt;code&gt;yield return null;&lt;/code&gt;, execution pauses at that point and resumes in the next frame.&lt;/p&gt;
&lt;p&gt;Here is a simplified example of how coroutines were used in glTFast:&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;
&lt;span class=&quot;token keyword&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;System&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Collections&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;UnityEngine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;CoroutineExample&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;MonoBehaviour&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;// The lack of return values is compensated by storing them in class variables&lt;/span&gt;
  &lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;/span&gt; globalVar&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

  &lt;span class=&quot;token return-type class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Start&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;// Notice: you need a reference to a MonoBehaviour (this)&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;// to start the coroutine&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;this&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;StartCoroutine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;SplitTheWorkCoroutine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;token return-type class-name&quot;&gt;IEnumerator&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;SplitTheWorkCoroutine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    Debug&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;I&#39;m shown right away&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;yield&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;null&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Pause here and return a frame later&lt;/span&gt;
    Debug&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;I&#39;m shown on the next frame&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;yield&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token constructor-invocation class-name&quot;&gt;WaitForSeconds&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Pause here and return a second later&lt;/span&gt;
    Debug&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;I&#39;m shown a second later&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;// storing some result state&lt;/span&gt;
    globalVar &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;42&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
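&lt;p&gt;To make the enumerator pattern more tangible, here is a toy scheduler in plain C#. It is a sketch for illustration only, not Unity&#39;s implementation, and the class name &lt;code&gt;ToyScheduler&lt;/code&gt; is made up. Like &lt;code&gt;StartCoroutine&lt;/code&gt;, it runs a coroutine up to its first &lt;code&gt;yield&lt;/code&gt; immediately; after that, each simulated frame calls &lt;code&gt;MoveNext()&lt;/code&gt; once per running coroutine. It ignores yielded values, so a &lt;code&gt;WaitForSeconds&lt;/code&gt; would simply be treated as &amp;quot;wait one frame&amp;quot;.&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;
using System.Collections;
using System.Collections.Generic;

// Toy coroutine runner: advances every enumerator by one step per simulated frame.
public class ToyScheduler
{
    readonly List&amp;lt;IEnumerator&amp;gt; running = new List&amp;lt;IEnumerator&amp;gt;();

    public int Frame { get; private set; }
    public bool HasWork =&amp;gt; running.Count &amp;gt; 0;

    public void Start(IEnumerator routine)
    {
        // Run up to the first yield right away; keep it only if it isn&#39;t finished yet.
        if (routine.MoveNext())
            running.Add(routine);
    }

    // Simulates one frame: resume each coroutine until its next yield, drop finished ones.
    public void Tick()
    {
        // Iterate backwards so removing finished entries is safe.
        for (int i = running.Count - 1; i &amp;gt;= 0; i--)
        {
            if (!running[i].MoveNext())
                running.RemoveAt(i);
        }
        Frame++;
    }
}&lt;/code&gt;&lt;/pre&gt;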
&lt;p&gt;Coroutines are lean and effective, and they have served me well many times. With regard to future glTFast optimizations, though, they raise some concerns:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;You need a MonoBehaviour to run the coroutine&lt;/li&gt;
&lt;li&gt;Coroutines must be run exclusively on the main thread&lt;/li&gt;
&lt;li&gt;No support for return values, so you need a workaround (storing results in class variables mostly)&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The future of high-performance computing in Unity is called &lt;a href=&quot;https://unity.com/dots&quot;&gt;DOTS&lt;/a&gt;. Part of this philosophy is replacing GameObjects and MonoBehaviours with Entities and (data-only) Components. At the time of writing, DOTS is still under development and I haven&#39;t found the time to experiment with it. What seems clear, though, is that depending on MonoBehaviour and coroutines is not future-proof.&lt;/p&gt;
&lt;p&gt;glTFast already uses the C# Job System to offload work to other threads. However, jobs can only operate on unmanaged data, so code that works with managed data cannot run as a C# job.&lt;/p&gt;
&lt;h2 id=&quot;async-to-the-rescue&quot; tabindex=&quot;-1&quot;&gt;Async to the rescue&lt;/h2&gt;
&lt;p&gt;The more I learned about &lt;a href=&quot;https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/&quot;&gt;asynchronous programming&lt;/a&gt;, the more it appeared to be a viable option for glTFast. Replacing coroutines required changes to the API, but it turned out not to be too much work.&lt;/p&gt;
&lt;p&gt;Here is the example above, converted to async:&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;
&lt;span class=&quot;token keyword&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;System&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Threading&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Tasks&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;using&lt;/span&gt; &lt;span class=&quot;token namespace&quot;&gt;UnityEngine&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;class&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;AsyncExample&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;MonoBehaviour&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;// The MonoBehaviour only provides the Start() entry point; SplitTheWork itself doesn&#39;t need one.&lt;/span&gt;
  &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token return-type class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Start&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token comment&quot;&gt;// This method runs non-blocking, but is &quot;awaited&quot;. Its return value can be used afterwards.&lt;/span&gt;
    &lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt;&lt;/span&gt; result &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;SplitTheWork&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    Debug&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;$&quot;Result: {result}&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

  &lt;span class=&quot;token keyword&quot;&gt;async&lt;/span&gt; &lt;span class=&quot;token return-type class-name&quot;&gt;Task&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;SplitTheWork&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    Debug&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;I&#39;m shown right away&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; Task&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Yield&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Pause here and return a frame later&lt;/span&gt;
    Debug&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;I&#39;m shown on the next frame&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;await&lt;/span&gt; Task&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Delay&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1000&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// Pause here and return a second later&lt;/span&gt;
    Debug&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Log&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;I&#39;m shown a second later&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;42&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// You can return values like any regular method&lt;/span&gt;
  &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&quot;benefits&quot; tabindex=&quot;-1&quot;&gt;Benefits&lt;/h2&gt;
&lt;p&gt;To sum up some of the benefits of async:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;No dependency on MonoBehaviour&lt;/li&gt;
&lt;li&gt;Pushing work onto other threads should become easier (more on that in a future article)&lt;/li&gt;
&lt;li&gt;Async methods can have return values
&lt;ul&gt;
&lt;li&gt;This resulted in less boilerplate code and removal of class variables for result state&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;More meaningful stack traces (also in profiler)&lt;/li&gt;
&lt;/ul&gt;
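&lt;p&gt;As a rough illustration of the second and third points, here is a plain C# sketch (not glTFast code). CPU-heavy work on managed data runs on a thread-pool thread via &lt;code&gt;Task.Run&lt;/code&gt;, and the result comes back as an ordinary return value. &lt;code&gt;ParseNumbers&lt;/code&gt; is a hypothetical stand-in for real work such as parsing JSON. In Unity, code after an &lt;code&gt;await&lt;/code&gt; that was started on the main thread resumes on the main thread, because Unity installs its own synchronization context; the work inside &lt;code&gt;Task.Run&lt;/code&gt; itself must not call Unity APIs, as most of them are main-thread only.&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;
using System.Linq;
using System.Threading.Tasks;

public static class OffloadExample
{
    // Hypothetical stand-in for expensive work on managed data (e.g. parsing JSON).
    // It runs on a worker thread, so it must not touch Unity APIs.
    static int[] ParseNumbers(string csv) =&amp;gt;
        csv.Split(&#39;,&#39;).Select(s =&amp;gt; int.Parse(s.Trim())).ToArray();

    // The parsing happens on the thread pool; callers simply await the returned value.
    public static async Task&amp;lt;int&amp;gt; SumAsync(string csv)
    {
        int[] numbers = await Task.Run(() =&amp;gt; ParseNumbers(csv));
        return numbers.Sum();
    }
}&lt;/code&gt;&lt;/pre&gt;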
&lt;h2 id=&quot;benchmarking-the-results&quot; tabindex=&quot;-1&quot;&gt;Benchmarking the results&lt;/h2&gt;
&lt;p&gt;I did not expect any significant change in performance between the two approaches. Still, to make sure I wasn&#39;t introducing performance regressions, I ran some quick benchmarks.&lt;/p&gt;
&lt;p&gt;For the tests I used a couple of different glTFs with unique characteristics, loaded them repeatedly and took the average values. The differences were similar across models.&lt;/p&gt;
&lt;p&gt;Example model: Buggy. This glTF has a somewhat complex scene graph (and thus bigger JSON part), but no textures.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;glTFast 2.5.1: 80 ms (17.6 ms max frame time)&lt;/li&gt;
&lt;li&gt;glTFast async: 44 ms (21 ms max frame time)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Wow, that&#39;s a LOT more difference than expected. Looking at the profiling data, it seems that the coroutine version renders two additional frames between starting to load and instantiating the glTF. The CPU cycles of those two frames should be put to use. It turns out I was a little too aggressive in the past when it came to spreading work across frames: twice, the code waited for the next frame for no reason.&lt;/p&gt;
&lt;p&gt;On the other hand, notice the increased max frame time, which botches the desired frame rate. The async version does not defer work as aggressively and thus executes too much work within one cycle. This was especially bad for models with many JPEG/PNG textures.&lt;/p&gt;
&lt;p&gt;The conclusion here is that the decision at what point execution should pause until the next frame is subject to optimization.&lt;/p&gt;
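&lt;p&gt;One way to make that decision systematically, instead of hard-coding where to pause, is to give each frame a time budget and only pause once it is used up. Below is a minimal plain C# sketch of such a check; the &lt;code&gt;FrameBudget&lt;/code&gt; class is hypothetical and not glTFast&#39;s implementation. In Unity, &lt;code&gt;BeginFrame()&lt;/code&gt; would be called once per frame (for example from a MonoBehaviour&#39;s &lt;code&gt;Update&lt;/code&gt;), and loading code would &lt;code&gt;await Task.Yield()&lt;/code&gt; only when &lt;code&gt;ShouldDefer()&lt;/code&gt; returns true.&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;
using System.Diagnostics;

// Hypothetical per-frame time budget: loading work continues until the budget is spent.
public class FrameBudget
{
    readonly Stopwatch stopwatch = Stopwatch.StartNew();
    readonly double budgetMilliseconds;

    public FrameBudget(double budgetMilliseconds)
    {
        this.budgetMilliseconds = budgetMilliseconds;
    }

    // Call once at the start of every frame to reset the clock.
    public void BeginFrame() =&amp;gt; stopwatch.Restart();

    // True once the time spent in the current frame exceeds the budget.
    public bool ShouldDefer() =&amp;gt; stopwatch.Elapsed.TotalMilliseconds &amp;gt; budgetMilliseconds;
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The budget is a tuning knob: at 60 fps a whole frame lasts about 16.7 ms, so the budget has to leave enough of that for rendering and everything else.&lt;/p&gt;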
&lt;h2 id=&quot;conclusion&quot; tabindex=&quot;-1&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;I&#39;m not happy that I discovered these unexpected changes by coincidence rather than systematically. The regressions in the async branch need to be eliminated, or at least reduced.&lt;/p&gt;
&lt;p&gt;Even more importantly, I&#39;m gonna need some automated performance testing, so that spotting regressions no longer depends on luck.&lt;/p&gt;
&lt;h2 id=&quot;next-up&quot; tabindex=&quot;-1&quot;&gt;Next up&lt;/h2&gt;
&lt;p&gt;I&#39;ve already started working on both performance test automation and optimizing async processes, and I&#39;ll very likely write about both.&lt;/p&gt;
&lt;p&gt;Follow me on &lt;a href=&quot;https://mastodon.gamedev.place/@tteneder&quot;&gt;Mastodon&lt;/a&gt; or &lt;a href=&quot;https://bsky.app/profile/tteneder.bsky.social&quot;&gt;Bluesky&lt;/a&gt; or &lt;a href=&quot;https://pixel.engineer/feed.xml&quot;&gt;subscribe to the feed&lt;/a&gt; so you don&#39;t miss updates.&lt;/p&gt;
&lt;p&gt;If you liked this read, feel free to&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ko-fi.com/C0C3BW7G&quot;&gt;&lt;img src=&quot;https://www.ko-fi.com/img/githubbutton_sm.svg&quot; alt=&quot;ko-fi&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Next: &lt;a href=&quot;https://pixel.engineer/posts/gltfast-perf-tests&quot;&gt;Performance Tests&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;Overview of this mini-series&lt;/a&gt;&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>Matrix Decomposition in Unity</title>
    
    <link href="https://pixel.engineer/posts/matrix-decomposition-unity/"/>
    <updated>2020-11-16T00:00:00Z</updated>
    <id>https://pixel.engineer/posts/matrix-decomposition-unity/</id>
    <content type="html">&lt;blockquote&gt;
&lt;p&gt;TL;DR: Unity&#39;s &lt;a href=&quot;https://docs.unity3d.com/ScriptReference/Matrix4x4-rotation.html&quot;&gt;Matrix4x4.rotation&lt;/a&gt; and &lt;a href=&quot;https://docs.unity3d.com/ScriptReference/Matrix4x4-lossyScale.html&quot;&gt;Matrix4x4.lossyScale&lt;/a&gt; deliver incorrect results in some corner cases. Here&#39;s an alternative &lt;a href=&quot;https://pixel.engineer/posts/matrix-decomposition-unity/#solution&quot;&gt;solution&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;In this article I&#39;ll walk you through the journey from detecting the problem to solving it, without going into the math details. Feel free to jump ahead to the &lt;a href=&quot;https://pixel.engineer/posts/matrix-decomposition-unity/#problem-definition&quot;&gt;problem definition&lt;/a&gt; and the &lt;a href=&quot;https://pixel.engineer/posts/matrix-decomposition-unity/#solution&quot;&gt;solution&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;the-issue&quot; tabindex=&quot;-1&quot;&gt;The Issue&lt;/h2&gt;
&lt;p&gt;A while ago, an &lt;a href=&quot;https://github.com/atteneder/glTFast/issues/99&quot;&gt;issue&lt;/a&gt; was raised about strange rotation errors when loading a glTF 3D asset in Unity with glTFast, and I was able to reproduce it:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/matrix_decomp/3d_incorrect.png&quot; alt=&quot;3D scene showing a car in Unity with incorrect rotations on some objects&quot; title=&quot;Some objects are misplaced&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Some objects, like tires, doors and headlights, are rotated incorrectly ☹️. To compare (and rule out content errors), I viewed the file in the &lt;a href=&quot;https://gltf-viewer.donmccurdy.com/&quot;&gt;glTF Viewer&lt;/a&gt; and imported it into Blender. In both cases it looked fine:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/matrix_decomp/blender.png&quot; alt=&quot;3D scene showing a car in Blender with correct rotations and the transform values of the tire object&quot; title=&quot;The asset imported in Blender looks correct. Notice the negative scaling!&quot; /&gt;&lt;/p&gt;
&lt;p&gt;What&#39;s cool about Blender is that you can inspect and compare the transform values. I picked one of the tires to take a closer look. In the transform panel (on the right side of the image), you&#39;ll notice that the scale is a uniform negative 1. While that&#39;s not good practice, it certainly shouldn&#39;t break the positioning of objects. The rotation is shown as a quaternion. I switched it to Euler angles to make it easier to compare with the Transform component in Unity&#39;s Inspector.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/matrix_decomp/blender_euler.png&quot; alt=&quot;Transform values of the tire object with rotation as XYZ Euler angles&quot; title=&quot;Rotation in XYZ Euler angles&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Let&#39;s compare this to the transform values in Unity:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/matrix_decomp/insp_incorrect.png&quot; alt=&quot;Transform component of the tire object in Unity&quot; title=&quot;The values differ from Blender&quot; /&gt;&lt;/p&gt;
&lt;p&gt;They differ! Even when considering that Unity and Blender have different coordinate systems (Y-up versus Z-up), in Unity…&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;…the scale is negative in X only&lt;/li&gt;
&lt;li&gt;…rotation values are not equal&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Let&#39;s find out why!&lt;/p&gt;
&lt;h2 id=&quot;lucky-guess&quot; tabindex=&quot;-1&quot;&gt;Lucky guess&lt;/h2&gt;
&lt;p&gt;In the past I rarely had problems with glTF files exported from Blender, so I re-exported the asset and imported this second version into Unity. This time it worked, so I started comparing the original with the Blender export. glTF&#39;s scene definition is JSON-based and therefore human-readable.&lt;/p&gt;
&lt;p&gt;In glTF, a node&#39;s transformation can be defined either by a tuple of &lt;em&gt;translation&lt;/em&gt;, &lt;em&gt;rotation&lt;/em&gt;, and &lt;em&gt;scale&lt;/em&gt; or by a (4-by-4) &lt;em&gt;transformation matrix&lt;/em&gt;. It turned out the original used matrices exclusively, while the re-export used separate transformations 💡.&lt;/p&gt;
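&lt;p&gt;To make the two representations concrete, here is a minimal C# sketch (plain .NET, not glTFast code) of how a translation/rotation/scale tuple maps to a matrix. The glTF specification composes them as T * R * S and stores matrices in column-major order. The class name &lt;code&gt;GltfTrs&lt;/code&gt;, the method &lt;code&gt;ComposeTrs&lt;/code&gt; and the use of plain &lt;code&gt;float[]&lt;/code&gt; arrays are hypothetical choices for illustration.&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;static class GltfTrs
{
    // Hypothetical helper: builds a column-major 4x4 matrix (glTF layout) from
    // translation t, unit quaternion q = (x, y, z, w) and scale s,
    // following glTF&#39;s composition order matrix = T * R * S.
    public static float[] ComposeTrs(float[] t, float[] q, float[] s)
    {
        float x = q[0], y = q[1], z = q[2], w = q[3];
        // Rotation matrix of the unit quaternion, indexed as r3[row, column]
        var r3 = new float[3, 3]
        {
            { 1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w) },
            { 2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w) },
            { 2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y) },
        };
        var m = new float[16]; // bottom-row entries 3, 7 and 11 stay 0
        for (int c = 0; c &amp;lt; 3; c++)
        {
            // R * S multiplies column c of R by s[c]; column-major index = c * 4 + row
            for (int r = 0; r &amp;lt; 3; r++)
                m[c * 4 + r] = r3[r, c] * s[c];
        }
        // Translation occupies the fourth column
        m[12] = t[0]; m[13] = t[1]; m[14] = t[2]; m[15] = 1;
        return m;
    }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With an identity rotation and a uniform scale of -1, for example, the diagonal entries 0, 5 and 10 become -1. The negative scale ends up baked into the matrix&#39;s basis columns, and a decomposition has to recover it from there.&lt;/p&gt;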
&lt;h2 id=&quot;enter-the-matrix&quot; tabindex=&quot;-1&quot;&gt;Enter the Matrix&lt;/h2&gt;
&lt;p&gt;I decided to dig into the matrix import code, which consists of two steps:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Convert the matrix from glTF&#39;s coordinate space into Unity&#39;s (by flipping signs on certain values)&lt;/li&gt;
&lt;li&gt;Decompose the matrix into separate &lt;em&gt;translation&lt;/em&gt;, &lt;em&gt;rotation&lt;/em&gt;, and &lt;em&gt;scale&lt;/em&gt; (since you cannot assign a full matrix to a Unity &lt;code&gt;Transform&lt;/code&gt;).&lt;/li&gt;
&lt;/ol&gt;
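&lt;p&gt;Step 1 can be expressed compactly. Mirroring one axis with S = diag(1, 1, -1, 1) turns a matrix M into S * M * S, which negates every element that lies in that axis&#39;s row or column, but not in both. The C# sketch below assumes the conversion mirrors the Z axis and takes a column-major &lt;code&gt;float[16]&lt;/code&gt; as stored in glTF. The &lt;code&gt;FlipZ&lt;/code&gt; helper is hypothetical, and which axis glTFast actually mirrors is an assumption here.&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;static class SpaceConversion
{
    // Hypothetical helper: mirrors the Z axis of a column-major 4x4 matrix,
    // i.e. computes S * M * S with S = diag(1, 1, -1, 1).
    // Element (row r, column c) is stored at index c * 4 + r and is multiplied
    // by S[r] * S[c], so it changes sign if exactly one of r and c equals 2.
    public static float[] FlipZ(float[] m)
    {
        var result = new float[16];
        for (int c = 0; c &amp;lt; 4; c++)
        {
            for (int r = 0; r &amp;lt; 4; r++)
            {
                bool flip = (r == 2) != (c == 2);
                result[c * 4 + r] = flip ? -m[c * 4 + r] : m[c * 4 + r];
            }
        }
        return result;
    }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Since det(S * M * S) = det(S)&amp;#178; * det(M) = det(M), this kind of conversion cannot change whether the matrix contains a reflection (a negative scale). That fits with tinkering on the conversion not fixing the issue.&lt;/p&gt;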
&lt;p&gt;I tinkered with the space conversion without any luck. A different approach (multiplying by a &lt;a href=&quot;https://gist.github.com/atteneder/594d4d6ac8bbf88d3c4efd0564fea75e&quot;&gt;conversion matrix&lt;/a&gt;) yielded the same result as before, so it didn&#39;t solve the issue either.&lt;/p&gt;
&lt;p&gt;I then wanted to see whether omitting the space conversion resulted in the desired uniform scale of -1, which it didn&#39;t. I started to suspect that the error lay in the matrix decomposition, which looked something like this:&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;// Given a matrix (already converted to Unity space):&lt;/span&gt;
&lt;span class=&quot;token class-name&quot;&gt;Matrix4x4&lt;/span&gt; m&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;m&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;ValidTRS&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
  &lt;span class=&quot;token comment&quot;&gt;// Translation is the first three values in the last column&lt;/span&gt;
  position &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token constructor-invocation class-name&quot;&gt;Vector3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt; m&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;m03&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; m&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;m13&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; m&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;m23 &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  rotation &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; m&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;rotation&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
  scale &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; m&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;lossyScale&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The implementations of both &lt;code&gt;Matrix4x4.rotation&lt;/code&gt; and &lt;code&gt;Matrix4x4.lossyScale&lt;/code&gt; are not public, so I couldn&#39;t investigate how they work.&lt;/p&gt;
&lt;p&gt;Before going further, I had to brush up on my linear algebra, so I picked up &lt;a href=&quot;https://www.amazon.com/dp/0985811749/?tag=terathon-20&quot;&gt;&lt;em&gt;Foundations of Game Engine Development, Volume 1: Mathematics&lt;/em&gt;&lt;/a&gt; by &lt;a href=&quot;http://terathon.com/lengyel&quot;&gt;&lt;em&gt;Eric Lengyel&lt;/em&gt;&lt;/a&gt; (great book!), which gave me some useful input. It was immensely helpful for understanding other people&#39;s code.&lt;/p&gt;
&lt;p&gt;There are &lt;a href=&quot;https://en.wikipedia.org/wiki/Matrix_decomposition&quot;&gt;many different ways&lt;/a&gt; to decompose a matrix. I found some algorithms and tried one of them. The error remained, but the result was identical to Unity&#39;s (which is also a valuable observation), so I now had a reference point for comparison. Unfortunately I discarded this interim result and can no longer find its source.&lt;/p&gt;
&lt;p&gt;My next thought was: &amp;quot;What does Blender do differently from this Unity script?&amp;quot; Since Blender is open source, it seemed natural to look it up. It turns out Blender performs an additional check on the rotation matrix: if it is negative (i.e. its determinant is negative), both scale and rotation get flipped. That sounded exactly like what was missing, so I decided to port this code using the &lt;a href=&quot;https://docs.unity3d.com/Manual/com.unity.mathematics.html&quot;&gt;Unity.Mathematics&lt;/a&gt; package. It already contains the types and methods I needed (like &lt;code&gt;float3x3&lt;/code&gt;, a 3-by-3 matrix), which saved a lot of time. It&#39;s also said to be well optimized, so yay.&lt;/p&gt;
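&lt;p&gt;The core of that negativity check can be sketched as follows. This is &lt;em&gt;not&lt;/em&gt; the ported Blender code but an independent, minimal C# sketch. It uses plain .NET arrays instead of Unity.Mathematics, and the names &lt;code&gt;TrsDecomposition&lt;/code&gt; and &lt;code&gt;Decompose&lt;/code&gt; are hypothetical. It assumes a matrix without shear or projection and with non-zero scale. The steps are: take the column lengths as the scale and normalize the columns. If the result has a negative determinant, negate both the rotation and the scale, then convert the rotation matrix to a quaternion.&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;using System;

static class TrsDecomposition
{
    // Hypothetical sketch. m is indexed as m[row, column]; assumes no shear/projection
    // and non-zero scale. Outputs translation t, quaternion q = (x, y, z, w), scale s.
    public static void Decompose(float[,] m, out float[] t, out float[] q, out float[] s)
    {
        // Translation: first three values of the last column
        t = new[] { m[0, 3], m[1, 3], m[2, 3] };

        // Scale magnitudes are the lengths of the three basis columns
        s = new float[3];
        var rot = new float[3, 3];
        for (int c = 0; c &amp;lt; 3; c++)
        {
            s[c] = (float)Math.Sqrt(m[0, c] * m[0, c] + m[1, c] * m[1, c] + m[2, c] * m[2, c]);
            for (int r = 0; r &amp;lt; 3; r++)
                rot[r, c] = m[r, c] / s[c];
        }

        // A negative determinant means the normalized matrix is a reflection, not a
        // rotation. Negating a 3x3 matrix flips the sign of its determinant, so negate
        // both rotation and scale; the product R * diag(s) stays the same.
        if (Determinant(rot) &amp;lt; 0)
        {
            for (int r = 0; r &amp;lt; 3; r++)
                for (int c = 0; c &amp;lt; 3; c++)
                    rot[r, c] = -rot[r, c];
            for (int c = 0; c &amp;lt; 3; c++)
                s[c] = -s[c];
        }

        q = ToQuaternion(rot);
    }

    static float Determinant(float[,] a) =&gt;
        a[0, 0] * (a[1, 1] * a[2, 2] - a[1, 2] * a[2, 1])
        - a[0, 1] * (a[1, 0] * a[2, 2] - a[1, 2] * a[2, 0])
        + a[0, 2] * (a[1, 0] * a[2, 1] - a[1, 1] * a[2, 0]);

    // Standard rotation-matrix-to-quaternion conversion, branching on the
    // largest diagonal term for numerical stability.
    static float[] ToQuaternion(float[,] a)
    {
        float trace = a[0, 0] + a[1, 1] + a[2, 2];
        if (trace &gt; 0)
        {
            float k = 2 * (float)Math.Sqrt(trace + 1); // k = 4 * w
            return new[] { (a[2, 1] - a[1, 2]) / k, (a[0, 2] - a[2, 0]) / k, (a[1, 0] - a[0, 1]) / k, 0.25f * k };
        }
        else if (a[0, 0] &gt; Math.Max(a[1, 1], a[2, 2]))
        {
            float k = 2 * (float)Math.Sqrt(1 + a[0, 0] - a[1, 1] - a[2, 2]); // k = 4 * x
            return new[] { 0.25f * k, (a[0, 1] + a[1, 0]) / k, (a[0, 2] + a[2, 0]) / k, (a[2, 1] - a[1, 2]) / k };
        }
        else if (a[1, 1] &gt; a[2, 2])
        {
            float k = 2 * (float)Math.Sqrt(1 + a[1, 1] - a[0, 0] - a[2, 2]); // k = 4 * y
            return new[] { (a[0, 1] + a[1, 0]) / k, 0.25f * k, (a[1, 2] + a[2, 1]) / k, (a[0, 2] - a[2, 0]) / k };
        }
        else
        {
            float k = 2 * (float)Math.Sqrt(1 + a[2, 2] - a[0, 0] - a[1, 1]); // k = 4 * z
            return new[] { (a[0, 2] + a[2, 0]) / k, (a[1, 2] + a[2, 1]) / k, 0.25f * k, (a[1, 0] - a[0, 1]) / k };
        }
    }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Take a uniform scale of -1 combined with a rotation R: the upper 3x3 block is simply -R. Its columns all have length 1 and its determinant is -1, so the check flips it back to R and reports a scale of (-1, -1, -1), which matches what Blender shows.&lt;/p&gt;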
&lt;p&gt;Behold the results:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/matrix_decomp/3d_correct.png&quot; alt=&quot;3D scene showing a car in Unity with correct rotations on all objects&quot; title=&quot;Now it&#39;s correct&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/matrix_decomp/insp_correct.png&quot; alt=&quot;Transform component of the tire object in Unity&quot; title=&quot;Now rotation and scale are correct&quot; /&gt;&lt;/p&gt;
&lt;p&gt;🎉🚙😎&lt;/p&gt;
&lt;h2 id=&quot;performance&quot; tabindex=&quot;-1&quot;&gt;Performance&lt;/h2&gt;
&lt;p&gt;While I was at it, I ran the conversions in a loop to see how they perform:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/matrix_decomp/profile.png&quot; alt=&quot;Screenshot of Unity Profiler showing timings and memory consumption&quot; title=&quot;Correct, but slower&quot; /&gt;&lt;/p&gt;
&lt;p&gt;So the downside of the correct solution (&lt;code&gt;Matrix4x4.DecomposeCustom&lt;/code&gt;) is that it&#39;s ~2.3 times slower. Since it runs only once per node and a typical scene doesn&#39;t have a large number of nodes, this minor performance loss is negligible.&lt;/p&gt;
&lt;p&gt;There&#39;s also a slight increase in memory allocations. I made another variant based purely on &lt;code&gt;Unity.Mathematics&lt;/code&gt; types (&lt;code&gt;float4x4.Decompose&lt;/code&gt;), which eliminates this flaw. That&#39;s another good reason to switch to &lt;code&gt;Unity.Mathematics&lt;/code&gt; types overall.&lt;/p&gt;
&lt;h2 id=&quot;problem-definition&quot; tabindex=&quot;-1&quot;&gt;Problem Definition&lt;/h2&gt;
&lt;p&gt;The original problem turned out to be a corner-case matrix with…&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Negative scale&lt;/li&gt;
&lt;li&gt;Rotations around multiple axes, one of them being 45° (exactly halfway between quarter rotations)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Separating rotation and scale is a non-trivial problem, and it only gets harder if you cannot assume that the scale is positive. I assume Unity&#39;s &lt;a href=&quot;https://docs.unity3d.com/ScriptReference/Matrix4x4-rotation.html&quot;&gt;Matrix4x4.rotation&lt;/a&gt; and &lt;a href=&quot;https://docs.unity3d.com/ScriptReference/Matrix4x4-lossyScale.html&quot;&gt;Matrix4x4.lossyScale&lt;/a&gt; were written with only positive scales in mind. It&#39;s also worth mentioning that they are two separate, independent calculations (hidden behind properties), so they may not be consistent with each other in corner cases.&lt;/p&gt;
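&lt;p&gt;A small example shows why scale and rotation have to be determined together once negative scales are allowed. In the C# sketch below (plain .NET, with hypothetical names), two decompositions produce exactly the same matrix. The first is a uniform scale of -1 without rotation. The second is a 180° rotation around X combined with a scale that&#39;s negative in X only. Both are valid, but only as a pair: combining the X-only negative scale with the wrong rotation yields a different matrix.&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;using System;

static class NegativeScaleAmbiguity
{
    // Hypothetical helper: rebuilds the upper 3x3 block R * diag(s)
    // from a rotation matrix rot[row, column] and a scale s.
    public static float[,] Compose(float[,] rot, float[] s)
    {
        var m = new float[3, 3];
        for (int r = 0; r &amp;lt; 3; r++)
            for (int c = 0; c &amp;lt; 3; c++)
                m[r, c] = rot[r, c] * s[c];
        return m;
    }

    // Element-wise comparison with a small tolerance
    public static bool Same(float[,] a, float[,] b)
    {
        for (int r = 0; r &amp;lt; 3; r++)
            for (int c = 0; c &amp;lt; 3; c++)
                if (Math.Abs(a[r, c] - b[r, c]) &gt; 1e-6f)
                    return false;
        return true;
    }

    public static readonly float[,] Identity = { { 1, 0, 0 }, { 0, 1, 0 }, { 0, 0, 1 } };
    // 180° around X: cos = -1, sin = 0
    public static readonly float[,] RotX180 = { { 1, 0, 0 }, { 0, -1, 0 }, { 0, 0, -1 } };
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With these, &lt;code&gt;Same(Compose(Identity, new float[] { -1, -1, -1 }), Compose(RotX180, new float[] { -1, 1, 1 }))&lt;/code&gt; is &lt;code&gt;true&lt;/code&gt;, whereas &lt;code&gt;Same(Compose(Identity, new float[] { -1, -1, -1 }), Compose(Identity, new float[] { -1, 1, 1 }))&lt;/code&gt; is &lt;code&gt;false&lt;/code&gt;. A scale taken from one calculation only makes sense together with the rotation that was computed alongside it.&lt;/p&gt;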
&lt;h2 id=&quot;solution&quot; tabindex=&quot;-1&quot;&gt;Solution&lt;/h2&gt;
&lt;p&gt;The solution was to port Blender&#39;s matrix decomposition algorithm to C#.&lt;/p&gt;
&lt;p&gt;Here&#39;s the &lt;a href=&quot;https://gist.github.com/atteneder/b6675c9a73860c00d795dcea7149e8d2&quot;&gt;Solution Source Code in C#&lt;/a&gt; ✅&lt;/p&gt;
&lt;p&gt;It can be used like so:&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;// Given this matrix&lt;/span&gt;
&lt;span class=&quot;token class-name&quot;&gt;Matrix4x4&lt;/span&gt; m&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token comment&quot;&gt;// It could also be this Unity.Mathematics type instead:&lt;/span&gt;
&lt;span class=&quot;token comment&quot;&gt;// float4x4 m;&lt;/span&gt;

m&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Decompose&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt;&lt;/span&gt; t&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt;&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;out&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt;&lt;/span&gt; s&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
position &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; t&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
rotation &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; r&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
scale &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; s&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I hope that was helpful.&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>glTF in Unity optimization - 5. New Mesh API - The Refactor</title>
    
    <link href="https://pixel.engineer/posts/gltfast-new-mesh-api-2/"/>
    <updated>2020-04-13T00:00:00Z</updated>
    <id>https://pixel.engineer/posts/gltfast-new-mesh-api-2/</id>
    <content type="html">&lt;p&gt;This is part 5 of a &lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;mini-series&lt;/a&gt;.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;TL;DR: The speed improvements from the advanced Mesh API for glTFast are marginal and had to be earned through a time-consuming refactor.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id=&quot;goal&quot; tabindex=&quot;-1&quot;&gt;Goal&lt;/h2&gt;
&lt;p&gt;In the &lt;a href=&quot;https://pixel.engineer/posts/gltfast-new-mesh-api-1&quot;&gt;last post&lt;/a&gt; I concluded that in order to use the advanced Mesh API to its fullest, I&#39;d have to refactor a lot of things. In this post I want to show you what I did and compare the results.&lt;/p&gt;
&lt;h2 id=&quot;problem&quot; tabindex=&quot;-1&quot;&gt;Problem&lt;/h2&gt;
&lt;p&gt;Mesh vertices come with several attributes. They have at least positions (in 3D space), but most often also normals, texture coordinates and tangents, and sometimes colors, weights and even more texture coordinates. The advanced Mesh API only supports up to 4 vertex streams.&lt;/p&gt;
&lt;h2 id=&quot;solution&quot; tabindex=&quot;-1&quot;&gt;Solution&lt;/h2&gt;
&lt;p&gt;Once you have more than 4 vertex attributes, you have to combine some of them into one interleaved vertex stream. From a memory layout perspective, this means that instead of having one array per vertex attribute, you now have to create an array of structs (AoS), where each struct contains multiple vertex attributes.&lt;/p&gt;
&lt;p&gt;So instead of this…&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;/span&gt; vertexCount &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt;&lt;/span&gt; positions &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token constructor-invocation class-name&quot;&gt;Vector3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;vertexCount&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt;&lt;/span&gt; normals &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token constructor-invocation class-name&quot;&gt;Vector3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;vertexCount&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt;&lt;/span&gt; tangents &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token constructor-invocation class-name&quot;&gt;Vector4&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;vertexCount&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;…you create something like this:&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token attribute&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;StructLayout&lt;/span&gt;&lt;span class=&quot;token attribute-arguments&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;LayoutKind&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Sequential&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Vertex&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Vector3&lt;/span&gt; position&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Vector3&lt;/span&gt; normal&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;Vector4&lt;/span&gt; tangent&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

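&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;/span&gt; vertexCount &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;100&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;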
&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt;&lt;/span&gt; vertexData &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token constructor-invocation class-name&quot;&gt;NativeArray&lt;span class=&quot;token punctuation&quot;&gt;&amp;lt;&lt;/span&gt;Vertex&lt;span class=&quot;token punctuation&quot;&gt;&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;vertexCount&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;Allocator&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;TempJob&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
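&lt;p&gt;With this layout, each vertex occupies a fixed number of bytes (the stride), and every attribute sits at a fixed byte offset inside it. The sketch below computes both for the same &lt;code&gt;Vertex&lt;/code&gt; layout. It uses &lt;code&gt;System.Numerics&lt;/code&gt; vectors instead of Unity&#39;s so it runs as plain .NET. That substitution is an assumption of this example, though Unity&#39;s &lt;code&gt;Vector3&lt;/code&gt; and &lt;code&gt;Vector4&lt;/code&gt; are also 12 and 16 bytes. The &lt;code&gt;VertexLayout&lt;/code&gt; helper is hypothetical.&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;using System.Numerics;
using System.Runtime.InteropServices;

// Same field order as the Vertex struct above, using System.Numerics vectors
[StructLayout(LayoutKind.Sequential)]
struct Vertex
{
    public Vector3 position; // 3 floats = 12 bytes
    public Vector3 normal;   // 3 floats = 12 bytes
    public Vector4 tangent;  // 4 floats = 16 bytes
}

// Hypothetical helper for inspecting the interleaved layout
static class VertexLayout
{
    // Bytes from one vertex to the next in the interleaved array (12 + 12 + 16 = 40)
    public static int Stride =&gt; Marshal.SizeOf&amp;lt;Vertex&gt;();

    // Byte offset of a field within one vertex (0, 12 and 24 for the fields above)
    public static int Offset(string field) =&gt; (int)Marshal.OffsetOf&amp;lt;Vertex&gt;(field);
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The stride is what an interleaved Job needs as its output byte stride. The offset tells it where inside each vertex a particular attribute has to be written.&lt;/p&gt;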
&lt;p&gt;The next step is to retrieve the data from the glTF buffers into the NativeArray. This was already done via C# Jobs, but the Jobs had to be changed to respect the interleaved layout of the output array. For that reason I introduced an &lt;code&gt;outputByteStride&lt;/code&gt; parameter. A new C# Job looks something like this:&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;unsafe&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;GetVector3sInterleavedJob&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token type-list&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;IJobParallelFor&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;/span&gt; inputByteStride&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;byte&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; input&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;/span&gt; outputByteStride&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; Vector3&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; result&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token return-type class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Execute&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;/span&gt; i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; resultV &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;byte&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;result&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;i&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;outputByteStride&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;byte&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; off &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; input &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;inputByteStride&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;Vector2&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;resultV&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;Vector2&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;off&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;resultV&lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;float&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;off&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
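&lt;p&gt;Stripped of pointer arithmetic, the Job&#39;s per-index work looks like the safe, single-threaded C# sketch below, which operates on byte arrays. The names are hypothetical. The &lt;code&gt;outputOffset&lt;/code&gt; parameter (the attribute&#39;s byte offset inside one vertex) is an addition of this sketch; the Job above presumably receives a &lt;code&gt;result&lt;/code&gt; pointer that already includes it. Like the Job, it negates Z. &lt;code&gt;BitConverter&lt;/code&gt; uses the machine&#39;s byte order, so this assumes a little-endian machine, matching glTF&#39;s little-endian buffers.&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;using System;

static class InterleavedCopy
{
    // Hypothetical helper. For each vertex i: read a float3 at i * inputByteStride,
    // negate its Z component and write it at i * outputByteStride + outputOffset.
    public static void CopyVector3sFlipZ(
        byte[] input, int inputByteStride,
        byte[] output, int outputByteStride, int outputOffset, int count)
    {
        for (int i = 0; i &amp;lt; count; i++)
        {
            int src = i * inputByteStride;
            int dst = i * outputByteStride + outputOffset;
            WriteFloat(output, dst, BitConverter.ToSingle(input, src));          // x
            WriteFloat(output, dst + 4, BitConverter.ToSingle(input, src + 4));  // y
            WriteFloat(output, dst + 8, -BitConverter.ToSingle(input, src + 8)); // -z
        }
    }

    static void WriteFloat(byte[] buffer, int offset, float value) =&gt;
        Buffer.BlockCopy(BitConverter.GetBytes(value), 0, buffer, offset, 4);
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Writing normals into the &lt;code&gt;Vertex&lt;/code&gt; struct from above, for example, would use an output stride of 40 bytes and an offset of 12.&lt;/p&gt;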
&lt;p&gt;I had to change all of the more than 50 existing Jobs to support output byte strides. On top of that, I had to create new data structures/classes and determine how vertex data is clustered and retrieved before being able to schedule these new Jobs.&lt;/p&gt;
&lt;p&gt;This better be worth it 😃&lt;/p&gt;
&lt;h2 id=&quot;benchmarks&quot; tabindex=&quot;-1&quot;&gt;Benchmarks&lt;/h2&gt;
&lt;h3 id=&quot;high-polygon-scene&quot; tabindex=&quot;-1&quot;&gt;High polygon scene&lt;/h3&gt;
&lt;p&gt;The first test scene is a high-resolution mesh (4 million triangles) with normals, tangents and texture coordinates. These are the most important timings when loading it with glTFast 1.0.0:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Prepare&lt;/td&gt;
&lt;td&gt;12 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieve Data&lt;/td&gt;
&lt;td&gt;19 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set Positions&lt;/td&gt;
&lt;td&gt;21 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set Indices&lt;/td&gt;
&lt;td&gt;65 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set UVs&lt;/td&gt;
&lt;td&gt;34 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set Normals&lt;/td&gt;
&lt;td&gt;59 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set Tangents&lt;/td&gt;
&lt;td&gt;101 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Initial tests were not very promising. Overall loading times were identical or worse, not better. A little investigation showed that setting the data was taking too long. This led to a first lesson: the advanced Mesh API is faster because it makes certain sanity checks (like index out of bounds) optional. But to benefit from that, you have to actually disable those checks via &lt;code&gt;MeshUpdateFlags&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;MeshUpdateFlags&lt;/span&gt; flags &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt;
  MeshUpdateFlags&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;DontNotifyMeshUsers
  &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; MeshUpdateFlags&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;DontRecalculateBounds
  &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; MeshUpdateFlags&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;DontResetBoneBounds
  &lt;span class=&quot;token operator&quot;&gt;|&lt;/span&gt; MeshUpdateFlags&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;DontValidateIndices&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

msh&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;SetVertexBufferParams&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;…&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
msh&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;SetVertexBufferData&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;vertexData&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;vertexData&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Length&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;stream&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;flags&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
msh&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;SetIndexBufferData&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;indices&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;indexCount&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;indices&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Length&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;flags&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
msh&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;SetSubMesh&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token constructor-invocation class-name&quot;&gt;SubMeshDescriptor&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;indexCount&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;indices&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;Length&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;topology&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;flags&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;After doing this, the timings improved:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phase&lt;/th&gt;
&lt;th&gt;Duration&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Alloc VB array&lt;/td&gt;
&lt;td&gt;55 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alloc UV array&lt;/td&gt;
&lt;td&gt;8 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Alloc Index array&lt;/td&gt;
&lt;td&gt;11 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Retrieve Data&lt;/td&gt;
&lt;td&gt;30 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set VB Params&lt;/td&gt;
&lt;td&gt;47 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set VB Data&lt;/td&gt;
&lt;td&gt;14 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Apply UV&lt;/td&gt;
&lt;td&gt;3 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set Index Data&lt;/td&gt;
&lt;td&gt;31 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Set SubMesh&lt;/td&gt;
&lt;td&gt;0.004 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Recalculate Bounds&lt;/td&gt;
&lt;td&gt;38 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Since so many things changed at once here (the structure of the code base and the loading procedure in general), we have totally different load phases, and to some extent this is like comparing apples and oranges. I&#39;ll try to interpret this as fairly as possible and figure out what actually caused the differences.&lt;/p&gt;
&lt;p&gt;The former &lt;em&gt;Prepare&lt;/em&gt; phase consisted mostly of allocating C# arrays for mesh data (like &lt;code&gt;Vector3[]&lt;/code&gt; for positions) and was surprisingly fast compared to the new NativeArray allocations (&lt;em&gt;Alloc VB&lt;/em&gt; for the vertex buffer, &lt;em&gt;Alloc UV&lt;/em&gt; for texture coordinates and &lt;em&gt;Alloc Index&lt;/em&gt; for indices). Since I had previously gotten errors when holding NativeArrays for more than 4 frames (which occasionally happens when bulk-loading many scenes), I used the slower persistent allocator. But even changing the allocator type to TempJob didn&#39;t bring much relief.&lt;/p&gt;
&lt;p&gt;Retrieving data (via C# Jobs) became up to 50% slower. I assume the addition of the output byte stride is the reason. On the positive side, the vertex buffer data jobs now start before the index buffers are allocated and run in parallel to that (5 ms to 10 ms of overlap). Because of that, data retrieval was not slower overall in some test runs. It makes me wonder how much potential lies in restructuring the Coroutines.&lt;/p&gt;
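&lt;p&gt;To illustrate what an output byte stride means, here is a minimal, Unity-independent sketch in plain C#. The &lt;code&gt;WriteStrided&lt;/code&gt; helper is hypothetical and not glTFast&#39;s actual job code: element &lt;code&gt;i&lt;/code&gt; is written to &lt;code&gt;byteOffset + i * byteStride&lt;/code&gt; in a shared, interleaved buffer, so consecutive writes are no longer contiguous. That is one plausible (unverified) explanation for the slower jobs.&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;using System;

static class StridedCopy
{
    // Hypothetical helper: copies tightly packed float3 values (x, y, z per element)
    // into an interleaved byte buffer. Element i starts at byteOffset + i * byteStride.
    public static void WriteStrided(float[] src, byte[] dst, int byteOffset, int byteStride)
    {
        int count = src.Length / 3;
        for (int i = 0; i &amp;lt; count; i++)
        {
            int pos = byteOffset + i * byteStride;
            for (int c = 0; c &amp;lt; 3; c++)
            {
                // Each float component takes 4 bytes, written in native machine byte order.
                BitConverter.TryWriteBytes(dst.AsSpan(pos + c * 4, 4), src[i * 3 + c]);
            }
        }
    }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For example, with positions and normals interleaved in one stream (24 bytes per vertex), positions would be written with &lt;code&gt;byteOffset = 0&lt;/code&gt; and normals with &lt;code&gt;byteOffset = 12&lt;/code&gt;, both using &lt;code&gt;byteStride = 24&lt;/code&gt;.&lt;/p&gt;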
&lt;p&gt;Let&#39;s compare setting the data. The new &lt;em&gt;Set VB params&lt;/em&gt; and &lt;em&gt;Set VB Data&lt;/em&gt; (61 ms combined) are replacing the former &lt;em&gt;Set Positions&lt;/em&gt;, &lt;em&gt;Set Normals&lt;/em&gt; and &lt;em&gt;Set Tangents&lt;/em&gt; (181 ms combined). Hard to tell why it&#39;s so much faster. Maybe the promised benefits of the advanced Mesh API, maybe the old interface triggers additional, dispensable calculations.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Set UVs&lt;/em&gt; was replaced by &lt;em&gt;Apply UV&lt;/em&gt; and is 11 times faster!&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Set Indices&lt;/em&gt; (65 ms) was replaced by &lt;em&gt;Set Index Data&lt;/em&gt; and &lt;em&gt;Set SubMesh&lt;/em&gt; (31 ms combined). I assume the difference is mostly because the old interface calculated the mesh bounds automatically. I added that step explicitly (after getting errors about invalid bounds) and it takes 38 ms, so overall (69 ms vs. 65 ms) this is comparable or slightly slower.&lt;/p&gt;
&lt;p&gt;What matters most is the overall load time, so I did 10 runs each: it got ~14% faster.&lt;/p&gt;
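&lt;p&gt;Computing bounds boils down to one pass over all positions, tracking the per-axis minimum and maximum. The following plain C# sketch (a hypothetical &lt;code&gt;ComputeBounds&lt;/code&gt; helper, not Unity&#39;s implementation of &lt;code&gt;RecalculateBounds&lt;/code&gt;) shows the idea. Since the glTF 2.0 specification requires &lt;code&gt;min&lt;/code&gt; and &lt;code&gt;max&lt;/code&gt; on &lt;code&gt;POSITION&lt;/code&gt; accessors, those values could in principle replace this pass, assuming the files are valid.&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;using System;

static class BoundsSketch
{
    // Hypothetical helper: axis-aligned bounding box of tightly packed float3 positions.
    // Returns the per-axis minimum and maximum as two float[3] arrays.
    public static (float[] Min, float[] Max) ComputeBounds(float[] positions)
    {
        if (positions.Length &amp;lt; 3)
            throw new ArgumentOutOfRangeException(nameof(positions));

        var min = new[] { positions[0], positions[1], positions[2] };
        var max = new[] { positions[0], positions[1], positions[2] };
        for (int i = 3; i + 2 &amp;lt; positions.Length; i += 3)
        {
            for (int c = 0; c &amp;lt; 3; c++)
            {
                min[c] = Math.Min(min[c], positions[i + c]);
                max[c] = Math.Max(max[c], positions[i + c]);
            }
        }
        return (min, max);
    }
}&lt;/code&gt;&lt;/pre&gt;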
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;1.0.0&lt;/th&gt;
&lt;th&gt;new Mesh API&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;High Polygon Scene&lt;/td&gt;
&lt;td&gt;716 ms&lt;/td&gt;
&lt;td&gt;618 ms&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3 id=&quot;test-sample-set&quot; tabindex=&quot;-1&quot;&gt;Test sample set&lt;/h3&gt;
&lt;p&gt;Even with a specific test scene it is hard to put a finger on where exactly things got better or worse without knowing the inner workings. So I tried to get a feeling for the overall impact on a wide variety of less complex scenes. I took the &lt;a href=&quot;https://github.com/KhronosGroup/glTF-Sample-Models&quot;&gt;glTF Sample Model set&lt;/a&gt;, removed irrelevant scenes (embedded buffers, Draco compression) and loaded the 114 files all at once.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;1.0.0&lt;/th&gt;
&lt;th&gt;new Mesh API&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;glTF Sample Models&lt;/td&gt;
&lt;td&gt;10.6 sec&lt;/td&gt;
&lt;td&gt;10.1 sec&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;That&#39;s a 4% to 5% improvement.&lt;/p&gt;
&lt;h2 id=&quot;conclusion&quot; tabindex=&quot;-1&quot;&gt;Conclusion&lt;/h2&gt;
&lt;p&gt;The way I implemented the advanced Mesh API brings a slight improvement, no doubt. It is not the silver bullet I hoped it would be, though.&lt;/p&gt;
&lt;p&gt;I still think it is a good foundation to build upon and to try the next couple of things. Maybe I missed something and am using it in a non-optimal way. If so, I&#39;d be delighted to be corrected.&lt;/p&gt;
&lt;p&gt;This work is not in a release yet, as I think it needs some more work and polishing before going live.&lt;/p&gt;
&lt;h2 id=&quot;where-to-go-from-here&quot; tabindex=&quot;-1&quot;&gt;Where to go from here&lt;/h2&gt;
&lt;p&gt;With this refactor work being out of the way, I can move on to try other ideas for improvement. Some candidates that popped up as a result of this post:&lt;/p&gt;
&lt;p&gt;Maybe it is better to set a mesh&#39;s vertex buffer parameters before or in parallel with data retrieval.&lt;/p&gt;
&lt;p&gt;Another candidate is the Jobs, which are now slower:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Schedule them as soon as possible for better parallelization&lt;/li&gt;
&lt;li&gt;Burst and Mathematics&lt;/li&gt;
&lt;li&gt;Specialized, optimized Jobs for the most common cases&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Draco meshes have not been covered so far, so they should be switched to the advanced Mesh API as well.&lt;/p&gt;
&lt;p&gt;Looking at the profiling data, more specifically the worker threads, it seems that the C# Jobs are not the bottleneck. So a smarter flow of the async loading process seems to be the most promising endeavour.&lt;/p&gt;
&lt;h2 id=&quot;next-up&quot; tabindex=&quot;-1&quot;&gt;Next up&lt;/h2&gt;
&lt;p&gt;I haven&#39;t decided yet. It&#39;ll be a surprise 😃&lt;/p&gt;
&lt;p&gt;Follow me on &lt;a href=&quot;https://mastodon.gamedev.place/@tteneder&quot;&gt;Mastodon&lt;/a&gt; or &lt;a href=&quot;https://bsky.app/profile/tteneder.bsky.social&quot;&gt;Bluesky&lt;/a&gt; or &lt;a href=&quot;https://pixel.engineer/feed.xml&quot;&gt;subscribe to the feed&lt;/a&gt; to not miss updates.&lt;/p&gt;
&lt;p&gt;If you liked this read, feel free to&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ko-fi.com/C0C3BW7G&quot;&gt;&lt;img src=&quot;https://www.ko-fi.com/img/githubbutton_sm.svg&quot; alt=&quot;ko-fi&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Next: &lt;a href=&quot;https://pixel.engineer/posts/gltfast-async-1&quot;&gt;Asynchronous Programming&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;Overview of this mini-series&lt;/a&gt;&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>glTF in Unity optimization - 4. New Mesh API - The Failed Attempt</title>
    
    <link href="https://pixel.engineer/posts/gltfast-new-mesh-api-1/"/>
    <updated>2020-04-06T00:00:00Z</updated>
    <id>https://pixel.engineer/posts/gltfast-new-mesh-api-1/</id>
    <content type="html">&lt;p&gt;This is part 4 of a &lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;mini-series&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;goal&quot; tabindex=&quot;-1&quot;&gt;Goal&lt;/h2&gt;
&lt;p&gt;With version 2019.3, Unity introduced a new &lt;a href=&quot;https://docs.unity3d.com/ScriptReference/Mesh.html&quot;&gt;Advanced Mesh API&lt;/a&gt; for creating meshes and announced the following advantages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Faster (yay!)&lt;/li&gt;
&lt;li&gt;Flexible vertex attribute data layout&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&#39;s supposed to be faster, since it skips validation checks that the simple API performs. In this post we will see whether that&#39;s true for my use cases.&lt;/p&gt;
&lt;p&gt;My plan is to start from the rear end (mesh data submission) and gradually improve the workflow towards data retrieval from glTF buffers:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Replace simple API calls with advanced ones&lt;/li&gt;
&lt;li&gt;Replace existing data structures (C# arrays) with Unity NativeArrays&lt;/li&gt;
&lt;li&gt;Experiment with data types (instead of using floats for everything, use smaller types; esp. if the original glTF type is not a float)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;step-1%3A-advanced-api-calls&quot; tabindex=&quot;-1&quot;&gt;Step 1: Advanced API calls&lt;/h2&gt;
&lt;p&gt;The first thing I did was replace the simple API calls with the advanced ones (see &lt;a href=&quot;https://github.com/atteneder/glTFast/commit/140ec1d0df2894a91d85e38482af27fbc346adb2&quot;&gt;commit&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;This is an approach from the rear end, where all vertex data is already retrieved from buffers in the form of arrays (e.g. &lt;code&gt;Vector3[]&lt;/code&gt;) and ready to be pushed. I created one vertex stream per attribute.&lt;/p&gt;
&lt;h3 id=&quot;test-1%3A-full-high-resolution-mesh&quot; tabindex=&quot;-1&quot;&gt;Test 1: Full high resolution mesh&lt;/h3&gt;
&lt;p&gt;The first comparison loads a high resolution mesh with UVs, normals and tangents (repeated 10 times):&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;glTFast 0.11.0&lt;/th&gt;
&lt;th&gt;glTFast dev&lt;/th&gt;
&lt;th&gt;speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SetVertices&lt;/td&gt;
&lt;td&gt;20.51 ms&lt;/td&gt;
&lt;td&gt;3.17 ms&lt;/td&gt;
&lt;td&gt;6.5 x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SetIndices&lt;/td&gt;
&lt;td&gt;66.42 ms&lt;/td&gt;
&lt;td&gt;29.34 ms&lt;/td&gt;
&lt;td&gt;2.3 x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SetUVs&lt;/td&gt;
&lt;td&gt;31.99 ms&lt;/td&gt;
&lt;td&gt;2.07 ms&lt;/td&gt;
&lt;td&gt;15.5 x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SetNormals&lt;/td&gt;
&lt;td&gt;54.14 ms&lt;/td&gt;
&lt;td&gt;3.16 ms&lt;/td&gt;
&lt;td&gt;17.1 x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SetTangents&lt;/td&gt;
&lt;td&gt;93.80 ms&lt;/td&gt;
&lt;td&gt;4.42 ms&lt;/td&gt;
&lt;td&gt;21.2 x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RecalculateBounds&lt;/td&gt;
&lt;td&gt;34.63 ms&lt;/td&gt;
&lt;td&gt;30.02 ms&lt;/td&gt;
&lt;td&gt;1.2 x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UploadMeshData&lt;/td&gt;
&lt;td&gt;117.70 ms&lt;/td&gt;
&lt;td&gt;121.69 ms&lt;/td&gt;
&lt;td&gt;1.0 x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Those are some great improvements 😀! Setting vertex data is 6x to 21x faster and setting indices is more than twice as fast. As a result, the total loading time for 10 huge meshes went down from 8.0 sec to 5.5 sec (45% faster).&lt;/p&gt;
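&lt;p&gt;Note that &amp;quot;45% faster&amp;quot; refers to the ratio of the two durations (8.0 / 5.5 ≈ 1.45), while the loading time itself dropped by about 31%. This tiny sketch (hypothetical helper names) spells out both conventions:&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;static class SpeedupMath
{
    // Ratio convention: 8.0 s -&amp;gt; 5.5 s gives 8.0 / 5.5 = ~1.45, i.e. 45% faster.
    public static double Speedup(double before, double after) =&amp;gt; before / after;

    // Reduction convention: 8.0 s -&amp;gt; 5.5 s gives (8.0 - 5.5) / 8.0 = 0.3125, i.e. ~31% less time.
    public static double TimeReduction(double before, double after) =&amp;gt; (before - after) / before;
}&lt;/code&gt;&lt;/pre&gt;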
&lt;h3 id=&quot;test-2%3A-high-resolution-mesh-without-normals-and-tangents&quot; tabindex=&quot;-1&quot;&gt;Test 2: High resolution mesh without normals and tangents&lt;/h3&gt;
&lt;p&gt;The second test uses the same mesh, but without normals and tangents.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Note: In previous posts of this miniseries it became clear that normal/tangent calculations are a bottleneck. Still I&#39;d like to see if the new mesh API improves the situation.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;glTFast 0.11.0&lt;/th&gt;
&lt;th&gt;glTFast dev&lt;/th&gt;
&lt;th&gt;speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;SetVertices&lt;/td&gt;
&lt;td&gt;32.64 ms&lt;/td&gt;
&lt;td&gt;3.09 ms&lt;/td&gt;
&lt;td&gt;10.6 x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SetIndices&lt;/td&gt;
&lt;td&gt;69.80 ms&lt;/td&gt;
&lt;td&gt;27.38 ms&lt;/td&gt;
&lt;td&gt;2.5 x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;SetUVs&lt;/td&gt;
&lt;td&gt;35.42 ms&lt;/td&gt;
&lt;td&gt;2.27 ms&lt;/td&gt;
&lt;td&gt;15.6 x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RecalculateNormals&lt;/td&gt;
&lt;td&gt;130.61 ms&lt;/td&gt;
&lt;td&gt;75.62 ms&lt;/td&gt;
&lt;td&gt;1.7 x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RecalculateTangents&lt;/td&gt;
&lt;td&gt;960.36 ms&lt;/td&gt;
&lt;td&gt;892.66 ms&lt;/td&gt;
&lt;td&gt;1.1 x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RecalculateBounds&lt;/td&gt;
&lt;td&gt;35.81 ms&lt;/td&gt;
&lt;td&gt;26.77 ms&lt;/td&gt;
&lt;td&gt;1.3 x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;UploadMeshData&lt;/td&gt;
&lt;td&gt;134.27 ms&lt;/td&gt;
&lt;td&gt;132.60 ms&lt;/td&gt;
&lt;td&gt;1.0 x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;The normal and especially the tangent calculations are still devastating, but they got a bit faster. Other than that, we see similar results; setting positions is even 10 times faster. The total time went down from 16.4 sec to 14.5 sec (13% faster).&lt;/p&gt;
&lt;h3 id=&quot;test-3%3A-generic-sample-sets&quot; tabindex=&quot;-1&quot;&gt;Test 3: Generic sample sets&lt;/h3&gt;
&lt;p&gt;Moving on to sample sets with more variation and practical, real-world features.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;glTFast 0.11.0&lt;/th&gt;
&lt;th&gt;glTFast dev&lt;/th&gt;
&lt;th&gt;speedup&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href=&quot;https://github.com/KhronosGroup/glTF-Sample-Models&quot;&gt;glTF sample models&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;9.48 sec&lt;/td&gt;
&lt;td&gt;8.82 sec&lt;/td&gt;
&lt;td&gt;+7.5%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;furniture set&lt;/td&gt;
&lt;td&gt;9.08 sec&lt;/td&gt;
&lt;td&gt;8.32 sec&lt;/td&gt;
&lt;td&gt;+9%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;This looked promising at first sight, but then I noticed that I had introduced regressions (like not supporting a second UV set or vertex colors). When I tried to fix those to re-run the tests, Unity suddenly crashed 😱.&lt;/p&gt;
&lt;p&gt;I tracked down the crash to &lt;a href=&quot;https://docs.unity3d.com/ScriptReference/Mesh.SetVertexBufferParams.html&quot;&gt;&lt;code&gt;Mesh.SetVertexBufferParams&lt;/code&gt;&lt;/a&gt;. It turns out the problematic mesh primitive uses positions, normals, tangents and two sets of texture coordinates. Reading the docs carefully, I found that Unity supports a maximum of four vertex streams, and my &amp;quot;one stream per attribute&amp;quot; approach (five streams here) exceeds this limit and causes Unity to crash.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;I filed a &lt;a href=&quot;https://issuetracker.unity3d.com/issues/editor-crashes-when-mesh-dot-setvertexbufferparams-function-is-provided-with-a-vertexattributedescriptor-of-length-5-or-greater&quot;&gt;bug report&lt;/a&gt; and Unity fixed it in 2020.2. It still won&#39;t work, but at least fellow developers don&#39;t have to wonder why it crashes anymore 💯.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;I dug into the code base a bit and concluded that, for the moment, this is a dead end. The experimental Mesh API support stops here 🛑.&lt;/p&gt;
&lt;h2 id=&quot;next-up&quot; tabindex=&quot;-1&quot;&gt;Next up&lt;/h2&gt;
&lt;p&gt;In order to use the advanced Mesh API, I have to group vertex attributes into four or fewer vertex streams, no matter how many attributes there are.&lt;/p&gt;
&lt;p&gt;This totally screwed up my plan of starting at the &amp;quot;rear end&amp;quot; and improving from there in tiny steps. I&#39;ll have to refactor the data retrieval from start to end to support this. The positive initial results motivate me to do exactly that, so I decided to draw a line under the current version of glTFast and &lt;a href=&quot;https://pixel.engineer/posts/gltfast-1.0&quot;&gt;call it version 1.0&lt;/a&gt; before I proceed with this major refactor.&lt;/p&gt;
&lt;p&gt;On the plus side, I&#39;ll build the refactored version on NativeArrays from the start, so that&#39;s two things in one sweep.&lt;/p&gt;
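&lt;p&gt;As a rough sketch of what such a grouping could look like, the plain C# below uses a hypothetical &lt;code&gt;Attr&lt;/code&gt; enum and &lt;code&gt;StreamLayout&lt;/code&gt; helper, with a grouping rule chosen purely for illustration: geometric attributes share stream 0, everything else shares stream 1, and each stream&#39;s per-vertex stride is the sum of its attribute sizes (assuming all components are 32-bit floats). For the crashing primitive above (positions, normals, tangents and two UV sets) this yields two streams instead of five.&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;// Hypothetical attribute set; names mirror common vertex semantics for illustration only.
enum Attr { Position, Normal, Tangent, TexCoord0, TexCoord1, Color }

static class StreamLayout
{
    const int MaxStreams = 4; // Unity supports at most four vertex streams

    // Size in bytes when every component is stored as a 32-bit float.
    public static int SizeOf(Attr a) =&amp;gt; a switch
    {
        Attr.Position or Attr.Normal =&amp;gt; 12, // float3
        Attr.Tangent or Attr.Color =&amp;gt; 16,   // float4
        _ =&amp;gt; 8,                             // float2 texture coordinates
    };

    // Illustrative grouping rule: geometric attributes share stream 0,
    // all remaining attributes (UVs, colors) share stream 1.
    public static int StreamOf(Attr a) =&amp;gt;
        a is Attr.Position or Attr.Normal or Attr.Tangent ? 0 : 1;

    // Per-vertex stride in bytes for each of the four streams (0 means unused).
    public static int[] Strides(params Attr[] attrs)
    {
        var strides = new int[MaxStreams];
        foreach (var a in attrs)
            strides[StreamOf(a)] += SizeOf(a);
        return strides;
    }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With this rule, &lt;code&gt;Strides(Attr.Position, Attr.Normal, Attr.Tangent, Attr.TexCoord0, Attr.TexCoord1)&lt;/code&gt; returns strides of 40 bytes for stream 0 and 16 bytes for stream 1.&lt;/p&gt;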
&lt;p&gt;Follow me on &lt;a href=&quot;https://mastodon.gamedev.place/@tteneder&quot;&gt;Mastodon&lt;/a&gt; or &lt;a href=&quot;https://bsky.app/profile/tteneder.bsky.social&quot;&gt;Bluesky&lt;/a&gt; or &lt;a href=&quot;https://pixel.engineer/feed.xml&quot;&gt;subscribe to the feed&lt;/a&gt; to not miss updates.&lt;/p&gt;
&lt;p&gt;If you liked this read, feel free to&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ko-fi.com/C0C3BW7G&quot;&gt;&lt;img src=&quot;https://www.ko-fi.com/img/githubbutton_sm.svg&quot; alt=&quot;ko-fi&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Next: &lt;a href=&quot;https://pixel.engineer/posts/gltfast-new-mesh-api-2&quot;&gt;New Mesh API - The Refactor&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;Overview of this mini-series&lt;/a&gt;&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>glTFast Next Steps</title>
    
    <link href="https://pixel.engineer/posts/gltfast-1.0/"/>
    <updated>2020-03-13T00:00:00Z</updated>
    <id>https://pixel.engineer/posts/gltfast-1.0/</id>
    <content type="html">&lt;h2 id=&quot;state-of-gltfast&quot; tabindex=&quot;-1&quot;&gt;State of glTFast&lt;/h2&gt;
&lt;p&gt;A while back I started a &lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;mini-series&lt;/a&gt;, in which I document my efforts to make my Unity package for loading glTF (&lt;a href=&quot;https://github.com/atteneder/glTFast&quot;&gt;glTFast&lt;/a&gt;) more efficient, modern and elegant.&lt;/p&gt;
&lt;p&gt;I&#39;m far from running out of ideas, but I have to balance backwards compatibility against innovation speed.&lt;/p&gt;
&lt;p&gt;First of all, through extensive profiling I figured out where the biggest potential for improvements is and followed that trail. I started using Unity&#39;s new advanced Mesh API and made first progress (up to 45% faster loading of big meshes). I also figured out that in order to get ideal results without introducing regressions, I have to do a significant refactor.&lt;/p&gt;
&lt;p&gt;I wasn&#39;t in the mood for that yet, so I took a quick stab at Unity&#39;s new Burst compiler. Again I could shave off some milliseconds, and retrieving data from buffers got ~50% faster in a test case. But again, this is not backwards compatible with the currently supported Unity 2018.2.&lt;/p&gt;
&lt;p&gt;Next I tried to tackle some of the last missing features (apart from animation), like support for multiple UV sets, but working on features that are not important to me and possibly obsolete in upcoming tech stacks did not appeal to me at all.&lt;/p&gt;
&lt;h2 id=&quot;so-what-to-do-then&quot; tabindex=&quot;-1&quot;&gt;So what to do then&lt;/h2&gt;
&lt;p&gt;Today (on Friday the 13th) I freeze glTFast&#39;s feature set and &lt;strong&gt;release version 1.0.0!&lt;/strong&gt; 🎉🎉🎉&lt;/p&gt;
&lt;p&gt;I think the project is at a point where it can be used in production. Possible fixes will be released in 1.0.x cycles.&lt;/p&gt;
&lt;p&gt;Future development will be based on Unity 2019.3/4 and break backwards compatibility with older versions. Dropping the baggage of legacy support allows me to innovate faster. The focus will be on speed and efficiency, foremost. I&#39;m thinking about:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Advanced Mesh API&lt;/li&gt;
&lt;li&gt;Burst&lt;/li&gt;
&lt;li&gt;DOTS&lt;/li&gt;
&lt;li&gt;Loading improvements (maybe async)&lt;/li&gt;
&lt;li&gt;Streaming&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Secondary features/goals that might come in are&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Newer glTF next gen material features, as they evolve&lt;/li&gt;
&lt;li&gt;Universal Render Pipeline support&lt;/li&gt;
&lt;li&gt;High Definition Render Pipeline support&lt;/li&gt;
&lt;li&gt;Better interface and more developer convenience&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I&#39;ll publish those innovations in 1.x releases until they are ready for production in a final 2.0 release.&lt;/p&gt;
&lt;p&gt;I think this project has some exciting times ahead. I&#39;d be glad if you follow and star it &lt;a href=&quot;https://github.com/atteneder/glTFast&quot;&gt;on GitHub&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Cheers!&lt;/p&gt;
&lt;p&gt;Follow me on &lt;a href=&quot;https://mastodon.gamedev.place/@tteneder&quot;&gt;Mastodon&lt;/a&gt; or &lt;a href=&quot;https://bsky.app/profile/tteneder.bsky.social&quot;&gt;Bluesky&lt;/a&gt; or &lt;a href=&quot;https://pixel.engineer/feed.xml&quot;&gt;subscribe to the feed&lt;/a&gt; to not miss updates.&lt;/p&gt;
&lt;p&gt;If you liked this read, feel free to&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ko-fi.com/C0C3BW7G&quot;&gt;&lt;img src=&quot;https://www.ko-fi.com/img/githubbutton_sm.svg&quot; alt=&quot;ko-fi&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>glTF in Unity optimization - 3. Parallel Jobs</title>
    
    <link href="https://pixel.engineer/posts/gltfast-parallel-jobs/"/>
    <updated>2020-02-28T00:00:00Z</updated>
    <id>https://pixel.engineer/posts/gltfast-parallel-jobs/</id>
    <content type="html">&lt;p&gt;This is part 3 of a &lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;mini-series&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;goal&quot; tabindex=&quot;-1&quot;&gt;Goal&lt;/h2&gt;
&lt;p&gt;In this episode I wanted to investigate whether converting index/vertex data from binary buffers into Unity structures could be sped up by using parallel jobs.&lt;/p&gt;
&lt;p&gt;The abstract steps of converting data in glTFast look like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Prepare Jobs
&lt;ul&gt;
&lt;li&gt;Allocate memory for the destination Unity data structures&lt;/li&gt;
&lt;li&gt;Schedule Jobs&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Execute Jobs&lt;/li&gt;
&lt;li&gt;Create Meshes from retrieved data structures&lt;/li&gt;
&lt;li&gt;Instantiate GameObjects&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The job execution is already threaded via the C# Job system, but one glTF Accessor is always retrieved in one thread and not split up further. For smaller meshes this is probably fine, but I can imagine large meshes may benefit if the work is spread across CPU cores more evenly.&lt;/p&gt;
&lt;h2 id=&quot;analysis&quot; tabindex=&quot;-1&quot;&gt;Analysis&lt;/h2&gt;
&lt;p&gt;Let&#39;s do some profiling to get a measure of the status quo. I used a 4 million triangle mesh with indices, positions and texture coordinates only.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/3/no_parallel.png&quot; alt=&quot;&amp;quot;HighRes scene part 1: preparation and job execution&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;The interesting bit is in the lower half. After scheduling the jobs, the worker threads start execution at roughly the same time. Three workers have nothing to do; the UV job is done after 12.56 ms, while the indices job takes the longest (24.55 ms). Since we can only continue building the mesh after all jobs are finished, 24.55 ms is our baseline.&lt;/p&gt;
&lt;p&gt;What I try to achieve is to spread the work more evenly across the worker threads, so the total computation is finished earlier. If I add up the jobs&#39; execution times and divide the sum by the number of workers (6), I get 9.21 ms, which would be the absolute optimum if there was no overhead for spreading the work. Let&#39;s see how close we can get.&lt;/p&gt;
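&lt;p&gt;The 9.21 ms optimum assumes the work can be split arbitrarily. As long as each accessor is processed by a single job, the total can never be shorter than the longest job. A small sketch in plain C# (hypothetical &lt;code&gt;JobBounds&lt;/code&gt; helper) computing both lower bounds:&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;using System;
using System.Linq;

static class JobBounds
{
    // Lower bound if the work could be split perfectly across all workers (no overhead).
    public static double IdealSplit(double[] jobMs, int workers) =&amp;gt; jobMs.Sum() / workers;

    // Lower bound if every job has to run on a single worker: never shorter than
    // the longest job, and never shorter than the perfect split.
    public static double Unsplit(double[] jobMs, int workers) =&amp;gt;
        Math.Max(jobMs.Max(), IdealSplit(jobMs, workers));
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For the profile above, the unsplit bound is dominated by the 24.55 ms indices job, while the split bound is the 9.21 ms figure.&lt;/p&gt;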
&lt;h2 id=&quot;status-quo%3A-job-implementation&quot; tabindex=&quot;-1&quot;&gt;Status Quo: Job implementation&lt;/h2&gt;
&lt;p&gt;There are over 30 different C# Jobs in glTFast, all of which implement the &lt;code&gt;IJob&lt;/code&gt; interface and most of which have a main for-loop at their heart.&lt;/p&gt;
&lt;p&gt;To make them parallel, they have to implement the &lt;code&gt;IJobParallelFor&lt;/code&gt; interface. Let me demonstrate this on one simplified example. This Job retrieves the mesh indices in one loop:&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;unsafe&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;GetIndicesUInt32Job&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token type-list&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;IJob&lt;/span&gt;&lt;/span&gt;  &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;/span&gt; count&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
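    &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token attribute&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;NativeDisableUnsafePtrRestriction&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; System&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;UInt32&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; input&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token attribute&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;NativeDisableUnsafePtrRestriction&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; result&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;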

    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token return-type class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Execute&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;/span&gt; triCount &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; count&lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token comment&quot;&gt;// This is the main loop we want to spread&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt;&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; triCount&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
            result&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;input&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
            result&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;input&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
            result&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;input&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&quot;let&#39;s-go-parallel&quot; tabindex=&quot;-1&quot;&gt;Let&#39;s go parallel&lt;/h2&gt;
&lt;p&gt;Instead, we want the job scheduler to split this loop up across the worker threads.&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;unsafe&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;GetIndicesUInt32Job&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token type-list&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;IJobParallelFor&lt;/span&gt;&lt;/span&gt;  &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;

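    &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token attribute&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;NativeDisableUnsafePtrRestriction&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; System&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;UInt32&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; input&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token attribute&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;NativeDisableUnsafePtrRestriction&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; result&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token return-type class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Execute&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;/span&gt; index&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token comment&quot;&gt;// Keep the winding flip of the IJob version: corners 1 and 2 of each triangle swap&lt;/span&gt;
        &lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;/span&gt; corner &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; index &lt;span class=&quot;token operator&quot;&gt;%&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;/span&gt; source &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; corner &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;?&lt;/span&gt; index &lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; corner &lt;span class=&quot;token operator&quot;&gt;==&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;?&lt;/span&gt; index &lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; index &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        result&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;index&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;input&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;source&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;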
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Scheduling this job changes a bit:&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;// old way for IJob&lt;/span&gt;
&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt;&lt;/span&gt; jobHandle &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; job&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Schedule&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;span class=&quot;token comment&quot;&gt;// new way for IJobParallelFor&lt;/span&gt;
&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt;&lt;/span&gt; jobHandle &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; job&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;Schedule&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;indexCount&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;We now have to provide the number of iterations (the index count in this example) to the &lt;code&gt;Schedule&lt;/code&gt; method. Previously it was a member of the Job struct (&lt;code&gt;public int count;&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;The second parameter is the batch count. We&#39;ll get to it shortly; for now we leave it at &lt;code&gt;1&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&quot;first-test&quot; tabindex=&quot;-1&quot;&gt;First test&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/3/parallel_batch_1.png&quot; alt=&quot;&amp;quot;First attempt at parallel jobs with moderate results&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Now the work is spread evenly across the worker threads, but the total execution time went up to over 32 ms! 😱&lt;/p&gt;
&lt;p&gt;Reading the &lt;a href=&quot;https://docs.unity3d.com/2018.3/Documentation/Manual/JobSystemParallelForJobs.html&quot;&gt;documentation&lt;/a&gt; explains why. Our new job has little computational complexity but is repeated millions of times. Distributing these iterations across threads comes with an overhead, and if it&#39;s done at a very fine granularity, the overhead outweighs the gain.&lt;/p&gt;
&lt;h3 id=&quot;batch-count-counts&quot; tabindex=&quot;-1&quot;&gt;Batch count counts&lt;/h3&gt;
&lt;p&gt;This is where the batch count parameter comes in. It tells the Job System how many consecutive iterations to group into one batch, so it won&#39;t make batches with fewer iterations than this number (apart from a possibly smaller last one). So the next thing I did was increase the batch count over several test runs to find a sweet spot. I settled on 50000 for this Job, but the best value may differ for other (more complex) jobs or different hardware.&lt;/p&gt;
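&lt;p&gt;To put the overhead argument into numbers, here is a small stand-alone C# sketch (plain .NET, not Unity code) that counts how many batches have to be handed out, assuming each batch covers &lt;code&gt;batchCount&lt;/code&gt; consecutive indices. The helper name and the iteration count are hypothetical and only serve as an illustration.&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;using System;

static class BatchMath
{
    // Number of batches when count iterations are split into chunks of
    // batchCount consecutive indices (ceiling division; the last chunk may be smaller).
    public static int BatchCount(int count, int batchCount)
    {
        return (count + batchCount - 1) / batchCount;
    }

    static void Main()
    {
        // Hypothetical iteration count: the 12 million indices of a 4 million triangle mesh.
        int count = 12000000;
        Console.WriteLine(BatchCount(count, 1));     // 12000000
        Console.WriteLine(BatchCount(count, 50000)); // 240
    }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;With a batch count of 1 every single index is its own unit of work, which is where the scheduling overhead comes from. With 50000 there are only a couple of hundred batches, still plenty to keep all worker threads busy.&lt;/p&gt;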
&lt;p&gt;Let&#39;s see how it performs now:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/3/parallel.png&quot; alt=&quot;&amp;quot;Second attempt at parallel jobs&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Down to 14.6 ms. That&#39;s not as low as the targeted 9.21 ms, but 40% less than the original 24.55 ms. And look how neatly aligned the blue bars on the worker threads are 😍. Good job 😎!&lt;/p&gt;
&lt;p&gt;I think the result can be improved further by tweaking the batch count more carefully on the actual target hardware (mainly mobile in my case). That&#39;s something I still have to do.&lt;/p&gt;
&lt;h3 id=&quot;hot-loops&quot; tabindex=&quot;-1&quot;&gt;Hot loops&lt;/h3&gt;
&lt;p&gt;Some jobs had an option for normalization, like this one:&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token return-type class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Execute&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;normalize&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt;&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; count&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
            result&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;x &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; input&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;255f&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
            result&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;y &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; input&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;255f&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;else&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        &lt;span class=&quot;token keyword&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;var&lt;/span&gt;&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i &lt;span class=&quot;token operator&quot;&gt;&amp;lt;&lt;/span&gt; count&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; i&lt;span class=&quot;token operator&quot;&gt;++&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
            result&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;x &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; input&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
            result&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;y &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; input&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;There were two design decisions I made when writing this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One Job for both the normalized and non-normalized variants, for more convenient Job usage.&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;normalize&lt;/code&gt; condition is outside of the hot loop. This makes the code a bit more redundant, but it&#39;s faster.&lt;/li&gt;
&lt;/ul&gt;
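&lt;p&gt;When making them parallel, I had to either create two Job types or move the &lt;code&gt;if&lt;/code&gt; condition into &lt;code&gt;Execute&lt;/code&gt;, which is now called once per iteration and has effectively become the hot loop. I chose the first option for performance reasons:&lt;/p&gt;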
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;// &quot;regular&quot; variant:&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;unsafe&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;GetUVsUInt8Job&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token type-list&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;IJobParallelFor&lt;/span&gt;&lt;/span&gt;  &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;byte&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; input&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; Vector2&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; result&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token return-type class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Execute&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;/span&gt; i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        result&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;x &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; input&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        result&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;y &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; input&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;// And the second, normalized variant:&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;unsafe&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;struct&lt;/span&gt; &lt;span class=&quot;token class-name&quot;&gt;GetUVsUInt8NormalizedJob&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;token type-list&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;IJobParallelFor&lt;/span&gt;&lt;/span&gt;  &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;byte&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; input&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; Vector2&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; result&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

    &lt;span class=&quot;token keyword&quot;&gt;public&lt;/span&gt; &lt;span class=&quot;token return-type class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt;&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;Execute&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;int&lt;/span&gt;&lt;/span&gt; i&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        result&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;x &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; input&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;255f&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
        result&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;y &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;-&lt;/span&gt; input&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;i&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;token operator&quot;&gt;+&lt;/span&gt;&lt;span class=&quot;token number&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;/&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;255f&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;I applied this everywhere and ended up with 43 parallel jobs (see the &lt;a href=&quot;https://github.com/atteneder/glTFast/commit/884aaaa27a3b3db227ef5936604ba16eae56cc5c&quot;&gt;commit&lt;/a&gt; for details). This was released in &lt;a href=&quot;https://github.com/atteneder/glTFast/releases/tag/v0.11.0&quot;&gt;glTFast 0.11.0&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;putting-it-in-perspective&quot; tabindex=&quot;-1&quot;&gt;Putting it in perspective&lt;/h2&gt;
&lt;p&gt;Shaving off a couple of milliseconds sure feels nice, but looking at the overall loading time (which is still hundreds of milliseconds), frankly it doesn&#39;t really matter that much 🙁&lt;/p&gt;
&lt;p&gt;The demo scene profiled here is also an extreme case. Yes, content with high triangle counts will benefit more, but I guess most content out there doesn&#39;t have triangle counts that high.&lt;/p&gt;
&lt;p&gt;Originally I wanted to go on and see how much &lt;a href=&quot;https://docs.unity3d.com/Packages/com.unity.mathematics@1.0/api/Unity.Mathematics.html&quot;&gt;Unity.Mathematics&lt;/a&gt; and the &lt;a href=&quot;https://docs.unity3d.com/Packages/com.unity.burst@0.2/manual/index.html&quot;&gt;Burst&lt;/a&gt; compiler can speed up the Jobs, but it seems my energy is better spent elsewhere first, so I&#39;ll get back to that later.&lt;/p&gt;
&lt;h2 id=&quot;next-up&quot; tabindex=&quot;-1&quot;&gt;Next up&lt;/h2&gt;
&lt;p&gt;I&#39;ll probably tackle the new Mesh API in an attempt to get these crazy mesh instantiation times down. Stay tuned!&lt;/p&gt;
&lt;p&gt;Follow me on &lt;a href=&quot;https://mastodon.gamedev.place/@tteneder&quot;&gt;Mastodon&lt;/a&gt; or &lt;a href=&quot;https://bsky.app/profile/tteneder.bsky.social&quot;&gt;Bluesky&lt;/a&gt; or &lt;a href=&quot;https://pixel.engineer/feed.xml&quot;&gt;subscribe to the feed&lt;/a&gt; to not miss updates.&lt;/p&gt;
&lt;p&gt;If you liked this read, feel free to&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ko-fi.com/C0C3BW7G&quot;&gt;&lt;img src=&quot;https://www.ko-fi.com/img/githubbutton_sm.svg&quot; alt=&quot;ko-fi&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Next: &lt;a href=&quot;https://pixel.engineer/posts/gltfast-new-mesh-api-1&quot;&gt;New Mesh API - The Failed Attempt&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;Overview of this mini-series&lt;/a&gt;&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>glTF in Unity optimization - 2. Avoid Tangents and Normals Calculation</title>
    
    <link href="https://pixel.engineer/posts/gltfast-no-tangents/"/>
    <updated>2020-02-26T00:00:00Z</updated>
    <id>https://pixel.engineer/posts/gltfast-no-tangents/</id>
    <content type="html">&lt;p&gt;This is part 2 of a &lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;mini-series&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;instant-side-quest&quot; tabindex=&quot;-1&quot;&gt;Instant side quest&lt;/h2&gt;
&lt;p&gt;Originally I wanted to investigate the benefit of parallel jobs in this post. I started by creating test cases and profiling the status quo. That&#39;s when I realized that other optimizations had higher potential.&lt;/p&gt;
&lt;h2 id=&quot;analysis&quot; tabindex=&quot;-1&quot;&gt;Analysis&lt;/h2&gt;
&lt;p&gt;I created two test scenes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &amp;quot;HighRes&amp;quot; scene with one 4 million triangle mesh&lt;/li&gt;
&lt;li&gt;The &amp;quot;Sixfold&amp;quot; scene with six meshes with ~252k triangles each (1.5 million triangles total)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is the HighRes scene in Unity&#39;s profiler. More specifically, the first frame:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/2/glTFast_0.10.0_highres_1_PrepareJobs.png&quot; alt=&quot;&amp;quot;HighRes scene part 1: preparation and job execution&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Most of the time is spent/lost allocating Unity&#39;s data structures (a hint to switch to the new Mesh API, maybe). Let&#39;s look at the second part in the following frame:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/2/glTFast_0.10.0_highres_2_CreatePrimitive.png&quot; alt=&quot;&amp;quot;HighRes scene part 2: Mesh creation&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Here the Unity Mesh is created, and it takes a devastating 1.279 seconds!!! Most of that time (1.06 seconds) is spent calculating tangents. Overall this scene loads in ~1.6 seconds.&lt;/p&gt;
&lt;p&gt;For the Sixfold scene the result is similar, except that the six Mesh creations are spread amongst six frames, so at least the frame stall is not as bad.&lt;/p&gt;
&lt;h2 id=&quot;optimize&quot; tabindex=&quot;-1&quot;&gt;Optimize&lt;/h2&gt;
&lt;p&gt;So tangent recalculation is expensive. The first and best way to make code faster is to not execute it at all, so why do we need tangents in the first place? Some materials/shaders rely on them being correct. I know they are needed for consistent normal mapping, but I&#39;m not sure whether they&#39;re needed for anything else (maybe anisotropic shading?). For now I assume we only need them for normal-mapped materials.&lt;/p&gt;
&lt;p&gt;Calculating normals is less expensive than calculating tangents, but while we&#39;re at it, let&#39;s think about them as well. They are necessary for all shaded materials, but not for unlit ones.&lt;/p&gt;
&lt;p&gt;So in &lt;a href=&quot;https://github.com/atteneder/glTFast/releases/tag/v0.10.2&quot;&gt;glTFast 0.10.2&lt;/a&gt; I changed the importer to only calculate normals/tangents if the material requires them. Let&#39;s have a look at the final frame when loading the same scene again:&lt;/p&gt;
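&lt;p&gt;The rule above boils down to two small predicates. Below is a minimal C# sketch; &lt;code&gt;MaterialInfo&lt;/code&gt; and its fields are hypothetical stand-ins, not glTFast&#39;s actual types, and it assumes the glTF file does not already provide normals or tangents (if it does, nothing needs to be calculated).&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;using System;

// Hypothetical material summary (not a glTFast type).
struct MaterialInfo
{
    public bool Unlit;            // e.g. uses the KHR_materials_unlit extension
    public bool HasNormalTexture; // material uses a normal map
}

static class AttributeNeeds
{
    // Every shaded (lit) material needs normals; unlit ones do not.
    public static bool NeedsNormals(MaterialInfo m)
    {
        return !m.Unlit;
    }

    // Assumption from the text: tangents are only needed for normal mapping.
    public static bool NeedsTangents(MaterialInfo m)
    {
        if (m.Unlit) return false;
        return m.HasNormalTexture;
    }

    static void Main()
    {
        var unlit = new MaterialInfo { Unlit = true };
        var normalMapped = new MaterialInfo { Unlit = false, HasNormalTexture = true };
        Console.WriteLine(NeedsNormals(unlit));         // False
        Console.WriteLine(NeedsTangents(normalMapped)); // True
    }
}&lt;/code&gt;&lt;/pre&gt;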
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/2/glTFast_0.10.2_highres_2_CreatePrimitive.png&quot; alt=&quot;&amp;quot;HighRes scene: optimized mesh creation&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Boom, down a full second. The whole scene now loads in ~700 ms. It&#39;s even faster when using unlit materials:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/2/glTFast_0.10.2_highres_2_CreatePrimitive_unlit.png&quot; alt=&quot;&amp;quot;HighRes scene unlit: optimized mesh creation&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;It now consistently loads in under 400 ms. A nice achievement for a couple of lines of code.&lt;/p&gt;
&lt;p&gt;This was released in &lt;a href=&quot;https://github.com/atteneder/glTFast/releases/tag/v0.10.2&quot;&gt;glTFast 0.10.2&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;next-up&quot; tabindex=&quot;-1&quot;&gt;Next up&lt;/h2&gt;
&lt;p&gt;What about all the cases where normals/tangents are absolutely necessary? The Unity Mesh API does not provide a threaded version of &lt;a href=&quot;https://docs.unity3d.com/ScriptReference/Mesh.RecalculateNormals.html&quot;&gt;RecalculateNormals&lt;/a&gt; or &lt;a href=&quot;https://docs.unity3d.com/ScriptReference/Mesh.RecalculateTangents.html&quot;&gt;RecalculateTangents&lt;/a&gt;, so the only way around this would be to write a custom C# job that does the normal/tangent calculations on a worker thread.&lt;/p&gt;
&lt;p&gt;I definitely plan to do this, but only after glTFast has switched to the new Mesh API (based on NativeArrays) and received other improvements, so stay tuned!&lt;/p&gt;
&lt;p&gt;Follow me on &lt;a href=&quot;https://mastodon.gamedev.place/@tteneder&quot;&gt;Mastodon&lt;/a&gt; or &lt;a href=&quot;https://bsky.app/profile/tteneder.bsky.social&quot;&gt;Bluesky&lt;/a&gt; or &lt;a href=&quot;https://pixel.engineer/feed.xml&quot;&gt;subscribe to the feed&lt;/a&gt; to not miss updates.&lt;/p&gt;
&lt;p&gt;If you liked this read, feel free to&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ko-fi.com/C0C3BW7G&quot;&gt;&lt;img src=&quot;https://www.ko-fi.com/img/githubbutton_sm.svg&quot; alt=&quot;ko-fi&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Next: &lt;a href=&quot;https://pixel.engineer/posts/gltfast-parallel-jobs&quot;&gt;3 Parallel Jobs&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;Overview of this mini-series&lt;/a&gt;&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>glTF in Unity optimization - 1. Buffers, Accessors and Primitives</title>
    
    <link href="https://pixel.engineer/posts/gltf-in-unity-optimization-1.-buffers-accessors-and-primitives/"/>
    <updated>2020-02-18T00:00:00Z</updated>
    <id>https://pixel.engineer/posts/gltf-in-unity-optimization-1.-buffers-accessors-and-primitives/</id>
    <content type="html">&lt;p&gt;This is part 1 of a &lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;mini-series&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;TL;DR:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reading the glTF specification carefully right away would have spared me performance issues in corner cases later on.&lt;/li&gt;
&lt;li&gt;How to get from 8 GB down to 0.034 GB of mesh memory usage. glTF may be a bit too generous and allows exporters to create valid but troublesome data.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;observations&quot; tabindex=&quot;-1&quot;&gt;Observations&lt;/h2&gt;
&lt;p&gt;Though glTFast did a decent job on all of the assets in the official sample model repository, some users reported poor performance with certain scenes of theirs.&lt;/p&gt;
&lt;p&gt;The investigation showed that I was approaching loading (mesh) primitive data from the wrong side. Let&#39;s look at how a glTF scene is structured.&lt;/p&gt;
&lt;p&gt;I&#39;ll give a brief overview of the &lt;a href=&quot;https://github.com/KhronosGroup/glTF/blob/master/specification/2.0/README.md&quot;&gt;glTF 2.0 specification&lt;/a&gt;. If you already know the intrinsic structure of glTFs, you might want to skip ahead to the &lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-1.-buffers-accessors-and-primitives/#finding-a-solution&quot;&gt;solution&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;In this article I&#39;ll use words that can stand for a general concept and a specific glTF schema entity as well (like a mesh). I&#39;ll use capitalized words when I talk about the glTF schema entity in particular and lower-case when it&#39;s about the general idea/concept.&lt;/p&gt;
&lt;h2 id=&quot;gltf-scene-structure&quot; tabindex=&quot;-1&quot;&gt;glTF scene structure&lt;/h2&gt;
&lt;p&gt;A glTF scene consists of a hierarchy of Nodes. Nodes have a transform (position, rotation and scale) and can have any number of child Nodes. They can also reference a Mesh, which means the Mesh&#39;s geometry is supposed to be rendered with the Node&#39;s transformation. A Mesh can be referenced by multiple Nodes in a Scene.&lt;/p&gt;
&lt;h2 id=&quot;gltf-data-structure&quot; tabindex=&quot;-1&quot;&gt;glTF data structure&lt;/h2&gt;
&lt;p&gt;In glTF all geometry data is inside of Buffers, which are plain binary data. Several entities in the glTF&#39;s JSON describe how this data is structured and how it can be accessed.&lt;/p&gt;
&lt;p&gt;The smallest unit of visible geometry is a Primitive. It consists of triangle indices, references (via Accessors) to the vertex attributes those indices point into (positions, normals, texture coordinates, colors and tangents) and a material. A Primitive belongs to a Mesh, which can contain multiple Primitives.&lt;br /&gt;
The Primitive&#39;s references to vertex attribute buffers are actually references to Accessors. Accessors define the attribute&#39;s type (scalar, or vectors of 2, 3 or 4 dimensions, etc.) and its component data type (byte, short, float, etc.). They also reference the BufferView that contains the described vertex attributes. The BufferView is the final link to the actual data inside the Buffers: it defines in which Buffer the data is, at what byte offset and (in case of interleaved attributes) with what byte stride.&lt;/p&gt;
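&lt;p&gt;In other words, finding element &lt;code&gt;i&lt;/code&gt; of an Accessor is plain offset arithmetic. Here is a minimal C# sketch of that calculation as I read the glTF 2.0 specification: the BufferView&#39;s byteOffset plus the Accessor&#39;s byteOffset plus &lt;code&gt;i&lt;/code&gt; times the stride, where an undefined byteStride means tightly packed elements (stride equals element size). The function name and parameters are made up for illustration.&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;using System;

static class AccessorMath
{
    // Byte position of element index inside the Buffer.
    // byteStride == 0 stands in for an undefined byteStride (tightly packed elements).
    public static int ElementOffset(int bufferViewByteOffset, int accessorByteOffset,
                                    int byteStride, int elementSize, int index)
    {
        int stride = byteStride != 0 ? byteStride : elementSize;
        return bufferViewByteOffset + accessorByteOffset + index * stride;
    }

    static void Main()
    {
        // Interleaved example: per vertex a float VEC3 position (12 bytes)
        // followed by a float VEC3 normal (12 bytes), so byteStride is 24.
        // The normal Accessor starts 12 bytes into the BufferView.
        Console.WriteLine(ElementOffset(0, 12, 24, 12, 2));  // 60
        // Tightly packed float VEC3 positions in a BufferView at byte 1024.
        Console.WriteLine(ElementOffset(1024, 0, 0, 12, 3)); // 1060
    }
}&lt;/code&gt;&lt;/pre&gt;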
&lt;p&gt;This structure gives DCC (digital content creation) tool architects a lot of possibilities and freedom to put 3D scenes into glTFs. It allows efficient re-use of Accessors and Meshes, so there&#39;s no need to duplicate data.&lt;/p&gt;
&lt;h2 id=&quot;gltfast&#39;s-behavior-status-quo&quot; tabindex=&quot;-1&quot;&gt;glTFast&#39;s behavior status quo&lt;/h2&gt;
&lt;p&gt;When I sketched out glTFast, I apparently was approaching things from the final scene structure&#39;s perspective and not from a data perspective. I can hear &lt;a href=&quot;https://unity.com/dots&quot;&gt;DOTS&lt;/a&gt; folks shaking their heads now 😉&lt;/p&gt;
&lt;p&gt;glTFast iterates over all Primitives and then retrieves their data through Accessors and BufferViews from the Buffer. Retrieving means the raw byte values are transferred into C# data types / Unity data structures (like Vector3 arrays for positions) and converted into Unity&#39;s coordinate system.&lt;/p&gt;
&lt;p&gt;This is simple and in many cases fast enough (or not slower), but as soon as multiple Primitives reference the same attributes or indices, those are retrieved from the Buffer redundantly, multiple times. This is bad for two reasons:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It takes longer&lt;/li&gt;
&lt;li&gt;It wastes memory since more copies of the same data are created&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I can&#39;t remember why I designed it this way (other than not being aware of this problem), but I probably thought that this way unused Accessors would never get imported. A fair point, but common/sane glTF files don&#39;t contain unused data in the first place 😃&lt;/p&gt;
&lt;h2 id=&quot;analysis-of-the-problem&quot; tabindex=&quot;-1&quot;&gt;Analysis of the problem&lt;/h2&gt;
&lt;p&gt;I didn&#39;t want to repeat the mistake of jumping into coding before understanding the problem, so I started researching the glTF structure as well as Unity&#39;s Mesh API (the legacy and the new 2019.3 version), because it all comes down to bringing those two together efficiently.&lt;/p&gt;
&lt;p&gt;I had two poorly performing scenes at hand, which illustrate two different problems.&lt;/p&gt;
&lt;h3 id=&quot;case-1%3A-smart-re-usage&quot; tabindex=&quot;-1&quot;&gt;Case 1: Smart re-usage&lt;/h3&gt;
&lt;p&gt;The first case had Primitives of one Mesh reference the same vertex attributes, but with different materials (and indices). This is actually very common with assets. Triangles that share the Node&#39;s transform and even vertices, but have different materials. It&#39;s glTF&#39;s counterpart to &lt;a href=&quot;https://docs.unity3d.com/ScriptReference/Mesh-subMeshCount.html&quot;&gt;Unity subMeshes&lt;/a&gt; and that&#39;s what they should end up being imported to.&lt;/p&gt;
&lt;h3 id=&quot;case-2%3A-let&#39;s-go-bonkers&quot; tabindex=&quot;-1&quot;&gt;Case 2: Let&#39;s go bonkers&lt;/h3&gt;
&lt;p&gt;The second case took the concept of re-using Accessors even further. The scene had ~400 Primitives, but only two very large vertex attribute arrays (each with Accessors for positions, normals and UVs). One buffer counted ~234,000 vertices and the other ~138,000. This is pretty much the worst-case scenario for glTFast and thus an excellent test case.&lt;/p&gt;
&lt;p&gt;When turned into Unity meshes, Primitives would occupy ~18 MB or ~13 MB each, but most of them consisted of only a dozen to a few hundred triangles. The hundreds of vertex data copies took their toll: overall, this 30 MB glTF binary file ate up over 7 GB of memory after import 😱&lt;/p&gt;
&lt;p&gt;The official &lt;a href=&quot;https://github.khronos.org/glTF-Validator&quot;&gt;glTF Validator&lt;/a&gt; thinks this glTF is perfectly valid. &lt;a href=&quot;https://gltf-viewer.donmccurdy.com/&quot;&gt;Don McCurdy&#39;s glTF Viewer&lt;/a&gt; reports 97 million vertices!&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/1/modelo_validator.gif&quot; alt=&quot;&amp;quot;Result of the glTF Validator&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;That&#39;s roughly the number of Primitives multiplied by the vertex count. So either this viewer also duplicates all vertex attributes, or the number in the report is wrong. It loads the file very quickly, though (~3 seconds on my laptop).&lt;/p&gt;
&lt;p&gt;My gut feeling tells me this scene is not using glTF&#39;s ability to re-use Accessors to its benefit.&lt;/p&gt;
&lt;h2 id=&quot;finding-a-solution&quot; tabindex=&quot;-1&quot;&gt;Finding a solution&lt;/h2&gt;
&lt;p&gt;The conclusion of the analysis is that an ideal glTF importer retrieves the data once per Accessor instead of once per Primitive.&lt;/p&gt;
&lt;h3 id=&quot;proper-sub-meshes&quot; tabindex=&quot;-1&quot;&gt;Proper sub meshes&lt;/h3&gt;
&lt;p&gt;I refactored the loading routine into the following (simplified) steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Iterate over all Primitives
&lt;ul&gt;
&lt;li&gt;Assign each referenced Accessor its usage type (positions, normals, …)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Iterate over all Accessors
&lt;ul&gt;
&lt;li&gt;Now that we know how an Accessor is used, we can properly retrieve and convert its data&lt;/li&gt;
&lt;li&gt;This step is still fast, because it is threaded via the C# Job System&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Wait for all Accessor data to be ready&lt;/li&gt;
&lt;li&gt;Iterate over all Meshes
&lt;ul&gt;
&lt;li&gt;Iterate over a Mesh&#39;s Primitives and cluster them by identical vertex attribute usage&lt;/li&gt;
&lt;li&gt;Create one Unity mesh per Primitive cluster
&lt;ul&gt;
&lt;li&gt;In case of multiple Primitives per cluster, create subMeshes for them&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Iterate over the Scene&#39;s Nodes
&lt;ul&gt;
&lt;li&gt;Create a GameObject/MeshFilter that references the corresponding Mesh.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;As a consequence, Accessor data is retrieved only once in any case 🎉🎉🎉&lt;/p&gt;
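&lt;p&gt;The steps above can be sketched in plain C++. This is only an illustration under simplifying assumptions, not glTFast&#39;s actual (C#) implementation. The &lt;code&gt;Primitive&lt;/code&gt; struct, the &lt;code&gt;Usage&lt;/code&gt; flags and all function names are hypothetical. Accessors are identified by their index (-1 meaning &amp;quot;not present&amp;quot;), and only positions, normals and one UV set are modeled. The first pass records how each Accessor is used, so its data can be retrieved once. The second pass groups a Mesh&#39;s Primitives by identical vertex attribute Accessors, which gives one cluster per Unity mesh and one subMesh per Primitive.&lt;/p&gt;

```cpp
#include "cassert"

// Hypothetical, simplified model of a glTF Primitive: the indices of the
// vertex attribute Accessors it references (-1 means the attribute is not
// present), plus its index Accessor and material.
struct Primitive {
    int position;
    int normal;
    int uv0;
    int indices;
    int material;
};

// Bit flags describing how an Accessor is used.
enum Usage : unsigned { kNone = 0, kPosition = 1, kNormal = 2, kUV = 4, kIndices = 8 };

// Pass 1: record the usage of every referenced Accessor, so each Accessor
// can later be retrieved and converted exactly once.
void mark_accessor_usage(const Primitive* prims, int count, unsigned* usage) {
    for (int i = 0; i != count; ++i) {
        const Primitive p = prims[i];
        if (p.position >= 0) usage[p.position] |= kPosition;
        if (p.normal >= 0) usage[p.normal] |= kNormal;
        if (p.uv0 >= 0) usage[p.uv0] |= kUV;
        if (p.indices >= 0) usage[p.indices] |= kIndices;
    }
}

// Primitives may share one Unity mesh if their vertex attribute Accessors
// are identical; indices and materials are allowed to differ.
bool same_vertex_attributes(Primitive a, Primitive b) {
    return !(a.position != b.position || a.normal != b.normal || a.uv0 != b.uv0);
}

// Pass 2 (per Mesh): assign every Primitive a cluster id. Each cluster
// becomes one Unity mesh. Returns the number of clusters.
int cluster_primitives(const Primitive* prims, int count, int* cluster_of) {
    assert(count >= 0);
    int cluster_count = 0;
    for (int i = 0; i != count; ++i) {
        cluster_of[i] = -1;
        // Re-use the cluster of an earlier Primitive with the same attributes.
        for (int j = 0; j != i; ++j) {
            if (same_vertex_attributes(prims[i], prims[j])) {
                cluster_of[i] = cluster_of[j];
                break;
            }
        }
        if (cluster_of[i] == -1) cluster_of[i] = cluster_count++;
    }
    return cluster_count;
}

// Number of subMeshes (one per Primitive) in the Unity mesh of cluster c.
int submesh_count(const int* cluster_of, int count, int c) {
    int n = 0;
    for (int i = 0; i != count; ++i) {
        if (cluster_of[i] == c) ++n;
    }
    return n;
}
```

&lt;p&gt;Take a cube made of one Mesh with six Primitives that share the position, normal and UV Accessors but have their own indices and materials. For that input, &lt;code&gt;cluster_primitives&lt;/code&gt; yields a single cluster with six subMeshes. The pairwise comparison is quadratic in the number of Primitives per Mesh. A real importer might hash the attribute set instead.&lt;/p&gt;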
&lt;p&gt;This solution has a bit more pre-computation overhead than the old per-Primitive approach, but benchmarks showed that this is negligible from a performance perspective. Scenes that don&#39;t re-use vertex attributes might load a couple of milliseconds slower.&lt;/p&gt;
&lt;p&gt;Scenes with similar Primitives per Mesh are faster and more memory efficient now. For example, this cube consists of one Mesh with six Primitives (one per side). Instead of importing the vertex data six times and creating six Unity meshes/GameObjects, it now does everything only once.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/1/test_cube.png&quot; alt=&quot;&amp;quot;A cube with a single Mesh and one Primitive for each side.&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;These changes will soon be released in &lt;a href=&quot;https://github.com/atteneder/glTFast/releases/tag/v0.10.0&quot;&gt;glTFast version 0.10.0&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;arbitrary-accessors-dilemma&quot; tabindex=&quot;-1&quot;&gt;Arbitrary accessors dilemma&lt;/h3&gt;
&lt;p&gt;Unfortunately, problem case #2 cannot be resolved 😦&lt;/p&gt;
&lt;p&gt;In Unity you render geometry via a Unity Mesh, a data structure that tightly bundles vertex attribute buffers and (sub-)mesh indices. You cannot keep them separate, even with the improved Mesh API from 2019.3 (Aras &lt;a href=&quot;https://forum.unity.com/threads/feedback-wanted-mesh-scripting-api-improvements.684670/page-2#post-5457570&quot;&gt;verified&lt;/a&gt; that). If Accessors are re-used across multiple Meshes, their imported Unity data still gets duplicated.&lt;/p&gt;
&lt;p&gt;If your rendering API or game engine gives you access to low-level functionality like manually binding vertex attribute buffers, there&#39;s no problem loading glTF data &amp;quot;as-is&amp;quot;.&lt;/p&gt;
&lt;h4 id=&quot;solution-idea-1&quot; tabindex=&quot;-1&quot;&gt;Solution idea 1&lt;/h4&gt;
&lt;p&gt;Detect non-overlapping fragments of large vertex buffers and split them up at run-time.&lt;/p&gt;
&lt;p&gt;Even if this can be done fairly efficiently (which I doubt), it will increase the import&#39;s computation time. It also means more code to write, stabilize and maintain.&lt;/p&gt;
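&lt;p&gt;Here is a minimal C++ sketch of this idea&#39;s core building block, assuming a Primitive&#39;s triangle indices are available as a plain array of 32-bit integers. &lt;code&gt;VertexRange&lt;/code&gt; and the helper functions are hypothetical. The sketch finds the contiguous span of vertices a Primitive references. If the spans of different Primitives don&#39;t overlap, each span could be copied into its own, much smaller vertex buffer, with its indices shifted to start at zero.&lt;/p&gt;

```cpp
#include "cassert"
#include "cstddef"
#include "cstdint"

// Inclusive range of vertex indices referenced by one Primitive.
struct VertexRange {
    std::uint32_t first;
    std::uint32_t last;
};

// Scans a Primitive's index buffer for the lowest and highest vertex it uses.
VertexRange referenced_range(const std::uint32_t* indices, std::size_t count) {
    assert(count != 0);
    VertexRange r{indices[0], indices[0]};
    for (std::size_t i = 1; i != count; ++i) {
        if (r.first > indices[i]) r.first = indices[i];
        if (indices[i] > r.last) r.last = indices[i];
    }
    return r;
}

// Number of vertices that would be copied into the fragment's own buffer.
std::uint32_t vertex_count(VertexRange r) {
    return r.last - r.first + 1;
}

// Two fragments can only be split apart cleanly if they share no vertices.
bool ranges_overlap(VertexRange a, VertexRange b) {
    return !(b.first > a.last || a.first > b.last);
}

// Shifts indices so the fragment's first vertex becomes vertex 0 of the
// new, smaller vertex buffer (vertices first..last of the big buffer).
void rebase_indices(std::uint32_t* indices, std::size_t count, VertexRange r) {
    for (std::size_t i = 0; i != count; ++i) {
        indices[i] -= r.first;
    }
}
```

&lt;p&gt;This only works out when each Primitive&#39;s vertices are packed contiguously. If vertices are interleaved or shared between Primitives, the ranges overlap or contain unused vertices. An additional per-vertex remapping pass would then be needed, which adds to the import time.&lt;/p&gt;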
&lt;h4 id=&quot;solution-idea-2&quot; tabindex=&quot;-1&quot;&gt;Solution idea 2&lt;/h4&gt;
&lt;p&gt;Create one Mesh per vertex attribute combination and use &lt;a href=&quot;https://docs.unity3d.com/ScriptReference/Graphics.DrawMesh.html&quot;&gt;Graphics.DrawMesh&lt;/a&gt; instead of MeshRenderers for rendering.&lt;/p&gt;
&lt;p&gt;In problem case #2 that would mean two giant Unity meshes with hundreds of sub-meshes and a scene hierarchy with custom/non-standard MonoBehaviours for rendering.&lt;/p&gt;
&lt;p&gt;I presume this approach is faster than solution 1 at run-time and maybe even more memory efficient. Still, I dislike the non-standard components.&lt;/p&gt;
&lt;h4 id=&quot;finally%3A-the-%22no%22-solution&quot; tabindex=&quot;-1&quot;&gt;Finally: the &amp;quot;no&amp;quot; solution&lt;/h4&gt;
&lt;p&gt;This is a classic example of &amp;quot;garbage in, garbage out&amp;quot;.&lt;/p&gt;
&lt;p&gt;The vast majority of glTF files I came across (including the official sample models) have a sane structure that doesn&#39;t cause inefficiencies. As a matter of fact, when I tried to produce worst-case example glTFs with the Blender glTF add-on, I couldn&#39;t, since it always created efficient files with separate vertex attribute buffers.&lt;/p&gt;
&lt;p&gt;So I concluded that it&#39;s a content problem created by the glTF generator.&lt;/p&gt;
&lt;h4 id=&quot;workaround&quot; tabindex=&quot;-1&quot;&gt;Workaround&lt;/h4&gt;
&lt;p&gt;Fortunately, there are tools that optimize glTFs. I used the excellent &lt;a href=&quot;https://github.com/zeux/meshoptimizer/tree/master/gltf&quot;&gt;gltfpack&lt;/a&gt; to optimize the scene in question. Here are the results.&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;glTFast 0.9&lt;/th&gt;
&lt;th&gt;time&lt;/th&gt;
&lt;th&gt;memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;original&lt;/td&gt;
&lt;td&gt;16.3 sec&lt;/td&gt;
&lt;td&gt;8.000 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gltfpack&lt;/td&gt;
&lt;td&gt;2.49 sec&lt;/td&gt;
&lt;td&gt;0.034 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;glTFast 0.10&lt;/th&gt;
&lt;th&gt;time&lt;/th&gt;
&lt;th&gt;memory&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;original&lt;/td&gt;
&lt;td&gt;9.20 sec&lt;/td&gt;
&lt;td&gt;6.010 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gltfpack&lt;/td&gt;
&lt;td&gt;2.53 sec&lt;/td&gt;
&lt;td&gt;0.034 GB&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;Optimized glTFs make all the difference! Apart from being ~40 milliseconds (1.6%) slower on optimized content (I assume that&#39;s the additional pre-computation overhead), glTFast 0.10 improved quite a bit overall. On the original file it loads ~44% faster and uses ~25% less memory.&lt;/p&gt;
&lt;h4 id=&quot;where-to-go-with-these-problems&quot; tabindex=&quot;-1&quot;&gt;Where to go with these problems&lt;/h4&gt;
&lt;p&gt;The next steps I intend to take eventually:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Figure out if other glTF importers have problems with this as well&lt;/li&gt;
&lt;li&gt;Verify that this is not an implementation problem in glTFast&lt;/li&gt;
&lt;li&gt;Let affected glTF generators&#39; developers know that their output is troublesome&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If this proves to be a generic problem, there&#39;s gotta be a discussion about possible solutions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Make the next iteration of the glTF specification more rigorous&lt;/li&gt;
&lt;li&gt;Make the glTF validators throw warnings (or even errors) that inform exporter developers about potential performance problems&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;next-up&quot; tabindex=&quot;-1&quot;&gt;Next up&lt;/h2&gt;
&lt;p&gt;This optimization step, although good and necessary for some scenes, made simple scenes a tiny bit slower. I&#39;m tempted to optimize the nested structure of Coroutines right now to make up for this, but I know there are lower-hanging fruits in other areas first. This has to wait, maybe until after switching to .NET 4 only and async/await.&lt;/p&gt;
&lt;p&gt;Follow me on &lt;a href=&quot;https://mastodon.gamedev.place/@tteneder&quot;&gt;Mastodon&lt;/a&gt; or &lt;a href=&quot;https://bsky.app/profile/tteneder.bsky.social&quot;&gt;Bluesky&lt;/a&gt; or &lt;a href=&quot;https://pixel.engineer/feed.xml&quot;&gt;subscribe to the feed&lt;/a&gt; so you don&#39;t miss updates.&lt;/p&gt;
&lt;p&gt;If you liked this read, feel free to&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ko-fi.com/C0C3BW7G&quot;&gt;&lt;img src=&quot;https://www.ko-fi.com/img/githubbutton_sm.svg&quot; alt=&quot;ko-fi&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Next: &lt;a href=&quot;https://pixel.engineer/posts/gltfast-no-tangents&quot;&gt;2. Avoid Tangents and Normals Calculation&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;Overview of this mini-series&lt;/a&gt;&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>glTF in Unity optimization - 0. Introduction</title>
    
    <link href="https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction/"/>
    <updated>2020-02-17T00:00:00Z</updated>
    <id>https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction/</id>
    <content type="html">&lt;p&gt;I see some optimization potentials for my glTF loading library &lt;a href=&quot;https://github.com/atteneder/glTFast&quot;&gt;glTFast&lt;/a&gt; for &lt;a href=&quot;https://unity.com/&quot;&gt;Unity&lt;/a&gt;. I decided to document the process and results in a mini-series:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-0.-introduction&quot;&gt;0. Introduction&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-1.-buffers-accessors-and-primitives&quot;&gt;1. Buffers, Accessors and Primitives&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltfast-no-tangents&quot;&gt;2. Avoid Tangents and Normals Calculation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltfast-parallel-jobs&quot;&gt;3. Parallel Jobs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltfast-new-mesh-api-1&quot;&gt;4. New Mesh API - The Failed Attempt&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltfast-new-mesh-api-2&quot;&gt;5. New Mesh API - The Refactor&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltfast-async-1&quot;&gt;6. Asynchronous Programming&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltfast-perf-tests&quot;&gt;7. Performance Tests&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href=&quot;https://pixel.engineer/posts/gltfast-async-instantiation&quot;&gt;8. Asynchronous Scene Instantiation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;history-of-gltfast&quot; tabindex=&quot;-1&quot;&gt;History of glTFast&lt;/h2&gt;
&lt;p&gt;&lt;a href=&quot;https://www.khronos.org/gltf/&quot;&gt;glTF&lt;/a&gt;, in a nutshell, is an open 3D asset format that is striving to become a standard, the &amp;quot;Jpeg of 3D&amp;quot;.&lt;/p&gt;
&lt;p&gt;In early 2018 I was investigating possibilities to load glTF files at run-time in a Unity project. Back then the only option was &lt;a href=&quot;https://github.com/KhronosGroup/UnityGLTF&quot;&gt;UnityGLTF&lt;/a&gt;, the official importer/exporter from the Khronos Group.&lt;/p&gt;
&lt;p&gt;While on the surface it seemed to work nicely pretty much out of the box, I noticed that it wasn&#39;t very fast 😦&lt;br /&gt;
First of all, the WebGL build&#39;s JS/WebAssembly file was big, so starting the project took a long time. Loading glTF files also wasn&#39;t blazing fast.&lt;/p&gt;
&lt;p&gt;During my investigation, what stood out to me was that UnityGLTF uses &lt;a href=&quot;https://www.newtonsoft.com/json&quot;&gt;Newtonsoft&#39;s Json.NET library&lt;/a&gt; for parsing. From previous projects I knew that feature-wise it&#39;s a great, flexible library, but it takes its toll by pulling in a lot of dependencies, which bloat the build by several megabytes.&lt;/p&gt;
&lt;p&gt;I also knew Unity has its own &lt;a href=&quot;https://docs.unity3d.com/ScriptReference/JsonUtility.html&quot;&gt;JSON parser&lt;/a&gt;, which is less powerful but:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It&#39;s fast&lt;/li&gt;
&lt;li&gt;It does not use a lot of memory&lt;/li&gt;
&lt;li&gt;It&#39;s already built-in (or nowadays: a package)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So around May 2018 I started tinkering with my own solution based on that parser. Initial results were promising. This is a screenshot (made in May 2018) of WebGL builds of UnityGLTF and glTFast loading the same asset.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/gltfaster/0/glTFast_vs_UnityGLTF_2018-05-20.jpg&quot; alt=&quot;&amp;quot;Screenshot. UnityGLTF vs. glTFast&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Admittedly, this is an old screenshot and not a rigorously measured test result. The test wasn&#39;t repeated or averaged, and I cannot reconstruct the exact setup. Still, loading 56% quicker seems pretty good to me.&lt;/p&gt;
&lt;p&gt;Over time I kept on improving:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Speed-up through threading via the C# Job System&lt;/li&gt;
&lt;li&gt;Using custom shaders that can consume glTF textures without re-processing&lt;/li&gt;
&lt;li&gt;Improved standard feature compliance with official sample files&lt;/li&gt;
&lt;li&gt;Added support for official extensions like Draco or mesh quantization&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;future&quot; tabindex=&quot;-1&quot;&gt;Future&lt;/h2&gt;
&lt;p&gt;Topics I want to tackle in the future that may be worth a post:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Treating Buffers and Accessors correctly&lt;/li&gt;
&lt;li&gt;Parallel jobs&lt;/li&gt;
&lt;li&gt;Unity Math and Burst&lt;/li&gt;
&lt;li&gt;New Mesh API&lt;/li&gt;
&lt;li&gt;Speed up tangent calculations&lt;/li&gt;
&lt;li&gt;DOTS&lt;/li&gt;
&lt;li&gt;Coroutines vs. async&lt;/li&gt;
&lt;li&gt;Custom DownloadHandler&lt;/li&gt;
&lt;li&gt;Streaming support&lt;/li&gt;
&lt;li&gt;Draw call reduction due to Material batching (Property blocks)&lt;/li&gt;
&lt;li&gt;Comparisons with recent alternatives and benchmarks.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Stay tuned!&lt;/p&gt;
&lt;p&gt;Follow me on &lt;a href=&quot;https://mastodon.gamedev.place/@tteneder&quot;&gt;Mastodon&lt;/a&gt; or &lt;a href=&quot;https://bsky.app/profile/tteneder.bsky.social&quot;&gt;Bluesky&lt;/a&gt; or &lt;a href=&quot;https://pixel.engineer/feed.xml&quot;&gt;subscribe to the feed&lt;/a&gt; so you don&#39;t miss updates.&lt;/p&gt;
&lt;p&gt;If you liked this read, feel free to&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ko-fi.com/C0C3BW7G&quot;&gt;&lt;img src=&quot;https://www.ko-fi.com/img/githubbutton_sm.svg&quot; alt=&quot;ko-fi&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Next: &lt;a href=&quot;https://pixel.engineer/posts/gltf-in-unity-optimization-1.-buffers-accessors-and-primitives&quot;&gt;1. Buffers, Accessors and Primitives&lt;/a&gt;&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>Benchmarking Basis Universal Transcoding in Unity</title>
    
    <link href="https://pixel.engineer/posts/benchmarking-basis-universal-transcoding-in-unity/"/>
    <updated>2019-11-10T00:00:00Z</updated>
    <id>https://pixel.engineer/posts/benchmarking-basis-universal-transcoding-in-unity/</id>
    <content type="html">&lt;p&gt;Basis Universal super-compressed textures are awesome (read the &lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction&quot;&gt;details&lt;/a&gt;). Let me show you some benchmark results of it being integrated into Unity.&lt;/p&gt;
&lt;p&gt;In these tests, 1024 by 1024 pixel images are loaded. One is color only, the other has an alpha channel. Each is loaded first from Jpeg/PNG and then from a BasisU super-compressed KTX 2.0 file (both pre-loaded into memory).&lt;/p&gt;
&lt;p&gt;Feel free to run the &lt;a href=&quot;https://gitlab.com/atteneder/ktxunity&quot;&gt;benchmark project&lt;/a&gt; yourself. It uses the &lt;a href=&quot;https://gitlab.com/atteneder/ktxunity&quot;&gt;KtxUnity package&lt;/a&gt; for loading the KTX files. Let me know if your results differ.&lt;/p&gt;
&lt;h2 id=&quot;macos-standalone-build&quot; tabindex=&quot;-1&quot;&gt;macOS standalone build&lt;/h2&gt;
&lt;p&gt;Let&#39;s load the image 50 times.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/basisu_benchmark/trout_jpeg_screenshot.jpg&quot; alt=&quot;&amp;quot;Screenshot of the benchmark demo loaded 50 jpeg images&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Tests are run on a MacBook Pro (2017) with an Intel(R) Core(TM) i7-7920HQ CPU @ 3.10GHz (4 cores, 8 threads).&lt;/p&gt;
&lt;h3 id=&quot;jpeg-%2F-png&quot; tabindex=&quot;-1&quot;&gt;Jpeg / PNG&lt;/h3&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/basisu_benchmark/trout_jpeg_profiler.jpg&quot; alt=&quot;&amp;quot;Profiling data&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;For the color-only trout Jpeg image:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Decoded to RGB 8, which occupies 4 MB per image&lt;/li&gt;
&lt;li&gt;Blocks the main thread for ~564 ms&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For the PNG image with alpha channel it&#39;s even worse:&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/basisu_benchmark/alpha_png_profiler.jpg&quot; alt=&quot;&amp;quot;Profiling data&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Decoded to ARGB 8, occupying 5.3 MB each&lt;/li&gt;
&lt;li&gt;Blocks the main thread for ~1064 ms (yikes)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So there are two takeaways.&lt;/p&gt;
&lt;p&gt;First, all the work happens on the main thread (done via &lt;a href=&quot;https://docs.unity3d.com/ScriptReference/ImageConversion.LoadImage.html&quot;&gt;Texture2D.LoadImage&lt;/a&gt;), which freezes the demo for up to a second. Maybe Unity will add an async version of LoadImage some day that runs on a worker thread, but right now it&#39;s a bottleneck.&lt;/p&gt;
&lt;p&gt;Secondly, the amount of RAM needed for each texture is huge!&lt;/p&gt;
&lt;h3 id=&quot;super-compressed-ktx-files&quot; tabindex=&quot;-1&quot;&gt;Super-compressed KTX files&lt;/h3&gt;
&lt;p&gt;Let&#39;s load the same content from KTX 2.0 files with Basis Universal compression.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/basisu_benchmark/trout_ktx_profiler.jpg&quot; alt=&quot;&amp;quot;Profiling data&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;This is the first frame of action. You can immediately see that the transcoding part is off-loaded to the worker threads (via the C# job system). Only the actual GPU upload has to be done on the main thread.&lt;/p&gt;
&lt;p&gt;In one of the following frames there&#39;s a lot of transcoding going on in the worker threads. While the frame time is too high, it&#39;s far from as bad as with Jpeg/PNG.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;https://pixel.engineer/static/img/basisu_benchmark/trout_ktx_profiler2.jpg&quot; alt=&quot;&amp;quot;Profiling data&amp;quot;&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Results&lt;/p&gt;
&lt;p&gt;Color-only trout image:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Transcoded to BC1 (DXT1) occupying 0.5 MB per image&lt;/li&gt;
&lt;li&gt;Takes ~95 ms spread across 4-5 frames&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Star image with alpha channel:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Transcoded to BC7 RGBA occupying 1 MB per image&lt;/li&gt;
&lt;li&gt;Takes ~85 ms spread across 4-5 frames&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It&#39;s mind-blowing. The transcoded images take up to 8 times less memory and load up to an order of magnitude faster.&lt;/p&gt;
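&lt;p&gt;The memory numbers above can be cross-checked with a bit of arithmetic. The following C++ sketch (function names are made up) computes texture sizes. Uncompressed formats need a fixed number of bytes per pixel (3 for RGB 8, 4 for ARGB 8). BC1 and BC7 store every 4 by 4 pixel block in 8 and 16 bytes respectively. Optionally, all mipmap levels down to 1 by 1 are added up.&lt;/p&gt;

```cpp
#include "cstdint"

// Bytes for one mip level of an uncompressed texture.
std::uint64_t uncompressed_level_size(std::uint32_t w, std::uint32_t h,
                                      std::uint32_t bytes_per_pixel) {
    return std::uint64_t{w} * h * bytes_per_pixel;
}

// Bytes for one mip level of a 4x4 block-compressed texture
// (BC1: 8 bytes per block, BC7: 16 bytes per block).
// Partial blocks at the edges are rounded up to whole blocks.
std::uint64_t block_level_size(std::uint32_t w, std::uint32_t h,
                               std::uint32_t bytes_per_block) {
    const std::uint64_t blocks_x = (w + 3) / 4;
    const std::uint64_t blocks_y = (h + 3) / 4;
    return blocks_x * blocks_y * bytes_per_block;
}

// Total size of a texture. `bytes_per_unit` is bytes per pixel for
// uncompressed formats, or bytes per 4x4 block for BC1/BC7.
std::uint64_t texture_size(std::uint32_t w, std::uint32_t h, std::uint32_t bytes_per_unit,
                           bool block_compressed, bool with_mips) {
    std::uint64_t total = 0;
    for (;;) {
        total += block_compressed ? block_level_size(w, h, bytes_per_unit)
                                  : uncompressed_level_size(w, h, bytes_per_unit);
        // Stop after the base level, or once the 1x1 level has been added
        // (w and h never drop below 1, so w + h == 2 means 1x1).
        if (!with_mips || w + h == 2) break;
        w = w > 1 ? w / 2 : 1;
        h = h > 1 ? h / 2 : 1;
    }
    return total;
}
```

&lt;p&gt;For a 1024 by 1024 image this gives 3 MB for RGB 8 and 0.5 MB for BC1 (base level only). Including a full mipmap chain, RGB 8 and ARGB 8 come to about 4 MB and 5.3 MB. So the Jpeg/PNG figures above line up with mipmapped textures, and the BC1/BC7 figures line up with the base level alone. This is an assumption about how they were measured, not something the profiler screenshots state. Counted the same way, the savings are 6× for RGB 8 versus BC1 and 4× for ARGB 8 versus BC7.&lt;/p&gt;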
&lt;h2 id=&quot;ios&quot; tabindex=&quot;-1&quot;&gt;iOS&lt;/h2&gt;
&lt;p&gt;Let&#39;s look at the memory consumption on an iPhone 8. The benchmark loads one image per frame until the app runs out of memory and is stopped.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://www.youtube.com/watch?v=sYzkwQvaFSc&quot;&gt;https://www.youtube.com/watch?v=sYzkwQvaFSc&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;I think the video speaks for itself. The sheer amount of additional image data you can consume with this technology is amazing.&lt;/p&gt;
&lt;h2 id=&quot;recap&quot; tabindex=&quot;-1&quot;&gt;Recap&lt;/h2&gt;
&lt;p&gt;If you have to load image data on WebGL or on more than one platform, and you can encode it at build time or later on the server side, super-compression like Basis Universal is a fantastic solution!&lt;/p&gt;
&lt;p&gt;At the moment the tools are rather raw. For example, you have to compile the CLI tools that encode those files yourself. I hope and assume that this will get easier over time and that more technology providers will offer these benefits to developers and users.&lt;/p&gt;
&lt;p&gt;If you liked this read, feel free to&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ko-fi.com/C0C3BW7G&quot;&gt;&lt;img src=&quot;https://www.ko-fi.com/img/githubbutton_sm.svg&quot; alt=&quot;ko-fi&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>Cross-platform C/C++ Plugins in Unity</title>
    
    <link href="https://pixel.engineer/posts/cross-platform-cc++-plugins-in-unity/"/>
    <updated>2019-06-24T00:00:00Z</updated>
    <id>https://pixel.engineer/posts/cross-platform-cc++-plugins-in-unity/</id>
    <content type="html">&lt;p&gt;Have you ever wanted to use a native library in Unity or write parts of your game in super portable, efficient C/C++?&lt;/p&gt;
&lt;p&gt;Recently I stumbled upon an interesting C++ library and integrated it into a Unity project. Let me show you what I did.&lt;/p&gt;
&lt;p&gt;I&#39;ll focus more on the workflow and tools to create comprehensive multi-platform support than on the actual interface source code. There&#39;s already quite good documentation about that&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://pixel.engineer/posts/cross-platform-cc++-plugins-in-unity/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;h2 id=&quot;the-target&quot; tabindex=&quot;-1&quot;&gt;The target&lt;/h2&gt;
&lt;p&gt;The library in focus is &lt;a href=&quot;https://github.com/BinomialLLC/basis_universal&quot;&gt;Basis Universal&lt;/a&gt; (see this &lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction&quot;&gt;other post&lt;/a&gt; for more info). It consists of:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;an encoder that converts images into the basis file format&lt;/li&gt;
&lt;li&gt;a transcoder that converts the basis file into a GPU-friendly format at runtime&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The transcoder is what I was interested in. It&#39;s also a great example, since it has a straightforward API, consists of a single source file (not counting headers) and has no third-party dependencies.&lt;/p&gt;
&lt;h2 id=&quot;let&#39;s-get-started&quot; tabindex=&quot;-1&quot;&gt;Let&#39;s get started&lt;/h2&gt;
&lt;p&gt;In this post I&#39;ll only give code excerpts and examples. To see the complete code look at &lt;a href=&quot;https://github.com/atteneder/BasisUniversalUnityBuild&quot;&gt;BasisUniversalUnityBuild&lt;/a&gt; (for building the native libraries) and &lt;a href=&quot;https://github.com/atteneder/BasisUniversalUnity&quot;&gt;BasisUniversalUnity&lt;/a&gt; (the matching C# interface).&lt;/p&gt;
&lt;h2 id=&quot;inspect-the-library&quot; tabindex=&quot;-1&quot;&gt;Inspect the library&lt;/h2&gt;
&lt;p&gt;First, let&#39;s look at how it&#39;s supposed to be used&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://pixel.engineer/posts/cross-platform-cc++-plugins-in-unity/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;. In a nutshell, it goes like this (warning: pseudo code):&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;
&lt;span class=&quot;token macro property&quot;&gt;&lt;span class=&quot;token directive-hash&quot;&gt;#&lt;/span&gt;&lt;span class=&quot;token directive keyword&quot;&gt;include&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;transcoder/basisu_transcoder.h&quot;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;token comment&quot;&gt;// one-time initialization at startup&lt;/span&gt;
basist&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;basisu_transcoder_init&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
basist&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;etc1_global_selector_codebook &lt;span class=&quot;token function&quot;&gt;sel_codebook&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;basist&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;g_global_selector_cb_size&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; basist&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;g_global_selector_cb&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;// load the basis file content into a buffer&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;uint8_t&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; data &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;
size_t length &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;// creating a basis_file instance&lt;/span&gt;
basis_file&lt;span class=&quot;token operator&quot;&gt;*&lt;/span&gt; new_basis &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;new&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;basis_file&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;data&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;length&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;// and now you can do things like retrieving the image size&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;uint32_t&lt;/span&gt; image_index &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// first image&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;uint32_t&lt;/span&gt; level_index &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt; &lt;span class=&quot;token comment&quot;&gt;// highest mipmap level&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;uint32_t&lt;/span&gt; width &lt;span class=&quot;token operator&quot;&gt;=&lt;/span&gt; new_basis&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;getImageWidth&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;image_index&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;level_index&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;// and finally transcoding&lt;/span&gt;
new_basis&lt;span class=&quot;token operator&quot;&gt;-&gt;&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;transcodeImage&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;destination_buffer&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; dst_size&lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; image_index &lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt;level_index &lt;span class=&quot;token punctuation&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;
&lt;h2 id=&quot;wrapping&quot; tabindex=&quot;-1&quot;&gt;Wrapping&lt;/h2&gt;
&lt;p&gt;You can call native C functions from C# by using the &lt;code&gt;DllImport&lt;/code&gt; method attribute&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://pixel.engineer/posts/cross-platform-cc++-plugins-in-unity/#fn1&quot; id=&quot;fnref1:1&quot;&gt;[1:1]&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;
&lt;pre class=&quot;language-csharp&quot;&gt;&lt;code class=&quot;language-csharp&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;token attribute&quot;&gt;&lt;span class=&quot;token class-name&quot;&gt;DllImport&lt;/span&gt; &lt;span class=&quot;token attribute-arguments&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token string&quot;&gt;&quot;PluginName&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;]&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;private&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;static&lt;/span&gt; &lt;span class=&quot;token keyword&quot;&gt;extern&lt;/span&gt; &lt;span class=&quot;token return-type class-name&quot;&gt;&lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt;&lt;/span&gt; some_function &lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;No, I did not forget the &lt;code&gt;++&lt;/code&gt;. From C# you can only call plain C functions and use plain C data types (no C++ classes). Luckily, C++ can expose functions with C linkage, so we need to wrap the logic in C functions like this:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;// the extern &quot;C&quot; block gives these functions C linkage, i.e. no C++ name mangling&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;extern&lt;/span&gt; &lt;span class=&quot;token string&quot;&gt;&quot;C&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;token function&quot;&gt;aa_basis_init&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;{&lt;/span&gt;
        basist&lt;span class=&quot;token double-colon punctuation&quot;&gt;::&lt;/span&gt;&lt;span class=&quot;token function&quot;&gt;basisu_transcoder_init&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;;&lt;/span&gt;
    &lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
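&lt;p&gt;The same approach also works for C++ classes, which C# cannot use directly. Below is a minimal sketch of a common pattern, assuming a hypothetical &lt;code&gt;TextureInfo&lt;/code&gt; class and hypothetical &lt;code&gt;aa_texture_info_*&lt;/code&gt; functions (neither is part of Basis Universal or my wrapper): the object is created and destroyed through C functions and handed out as an opaque pointer, which C# would typically store as an &lt;code&gt;IntPtr&lt;/code&gt;.&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;// Hypothetical C++ class that C# cannot use directly
class TextureInfo {
public:
    TextureInfo(int width, int height) : width_(width), height_(height) {}
    int pixelCount() const { return width_ * height_; }

private:
    int width_;
    int height_;
};

// C linkage only affects how the symbols are named and exported,
// so these functions can still use C++ internally. The caller
// treats the returned pointer as an opaque handle.
extern "C" {
    TextureInfo* aa_texture_info_create(int width, int height)
    {
        return new TextureInfo(width, height);
    }

    int aa_texture_info_pixel_count(TextureInfo* info)
    {
        return info->pixelCount();
    }

    void aa_texture_info_destroy(TextureInfo* info)
    {
        delete info;
    }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Every handle obtained this way has to be released through the matching destroy function, because the .NET garbage collector does not manage native memory.&lt;/p&gt;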
&lt;h2 id=&quot;build-via-cmake&quot; tabindex=&quot;-1&quot;&gt;Build via CMake&lt;/h2&gt;
&lt;p&gt;Once the C wrapper code is written, we need to build the libraries. In the end, I want this to work on the following platforms:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;WebGL&lt;/li&gt;
&lt;li&gt;macOS 64-bit&lt;/li&gt;
&lt;li&gt;Windows
&lt;ul&gt;
&lt;li&gt;32-bit&lt;/li&gt;
&lt;li&gt;64-bit&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;iOS
&lt;ul&gt;
&lt;li&gt;armv7&lt;/li&gt;
&lt;li&gt;armv7s&lt;/li&gt;
&lt;li&gt;arm64&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Android
&lt;ul&gt;
&lt;li&gt;armeabi-v7a&lt;/li&gt;
&lt;li&gt;arm64-v8a&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Linux 64-bit&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The library needs to be built for each one of these, so it makes sense to use a build system. Since I&#39;m familiar with it, I chose &lt;a href=&quot;https://cmake.org/&quot;&gt;CMake&lt;/a&gt;.&lt;/p&gt;
&lt;h3 id=&quot;cmake-rules&quot; tabindex=&quot;-1&quot;&gt;CMake rules&lt;/h3&gt;
&lt;p&gt;All you have to do is describe to CMake what you want to build and it takes care of creating projects and invoking the right compiler/linker commands. This description follows a certain syntax and is saved to a &lt;code&gt;CMakeLists.txt&lt;/code&gt; file. This is what a minimal version looks like:&lt;/p&gt;
&lt;pre class=&quot;language-cmake&quot;&gt;&lt;code class=&quot;language-cmake&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;# define a CMake version&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;cmake_minimum_required&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token property&quot;&gt;VERSION&lt;/span&gt; &lt;span class=&quot;token number&quot;&gt;3.0&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# give the project a name&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;project&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;basisu_transcoder&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# create a list of files to compile&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;BASISU_SRC_LIST
    &lt;span class=&quot;token comment&quot;&gt;# this is the transcoder itself&lt;/span&gt;
    basis_universal/transcoder/basisu_transcoder.cpp
    &lt;span class=&quot;token comment&quot;&gt;# this is my wrapper code&lt;/span&gt;
    basisu_wrapper/basisu_wrapper.cpp
    &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# add a library target&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;add_library&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;basisu &lt;span class=&quot;token namespace&quot;&gt;SHARED&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;${&lt;/span&gt;BASISU_SRC_LIST&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# Tell the compiler which folders to search for header files (include directories)&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;target_include_directories&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
    basisu
    &lt;span class=&quot;token namespace&quot;&gt;PUBLIC&lt;/span&gt;
    basis_universal/transcoder
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In theory you can now &lt;a href=&quot;https://pixel.engineer/posts/cross-platform-cc++-plugins-in-unity/#configure-build-and-generate&quot;&gt;invoke CMake&lt;/a&gt; and build the library. Unfortunately, to get libraries for all platforms working in Unity, I had to fix a few more things:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;macOS: Unity requires the library as a &lt;code&gt;.bundle&lt;/code&gt; without the &lt;code&gt;lib&lt;/code&gt; prefix.&lt;/li&gt;
&lt;li&gt;iOS
&lt;ul&gt;
&lt;li&gt;The library has to be static&lt;/li&gt;
&lt;li&gt;A special toolchain is needed&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Android
&lt;ul&gt;
&lt;li&gt;A special toolchain is needed&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Windows
&lt;ul&gt;
&lt;li&gt;I had to set up everything on another machine&lt;/li&gt;
&lt;li&gt;Enable &lt;code&gt;__declspec&lt;/code&gt; via a compile (preprocessor) definition when using the MSVC compiler&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Installing
&lt;ul&gt;
&lt;li&gt;Install the libraries to the correct destination folders in the Unity project&lt;/li&gt;
&lt;li&gt;Copy the sources for WebGL builds&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;
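&lt;p&gt;Regarding the Windows item above: MSVC only exports functions from a DLL if they are marked with &lt;code&gt;__declspec(dllexport)&lt;/code&gt; (or listed in a module-definition file), whereas GCC and Clang on the other platforms export symbols by default (unless built with &lt;code&gt;-fvisibility=hidden&lt;/code&gt;). A minimal sketch of how a compile definition can toggle this, assuming a hypothetical definition &lt;code&gt;AA_USE_DECLSPEC&lt;/code&gt;, a hypothetical macro &lt;code&gt;AA_BASIS_API&lt;/code&gt; and a hypothetical function &lt;code&gt;aa_basis_api_version&lt;/code&gt;:&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;// AA_USE_DECLSPEC is a hypothetical definition that the build
// script sets only when compiling with MSVC.
#if defined(AA_USE_DECLSPEC)
#define AA_BASIS_API __declspec(dllexport)
#else
#define AA_BASIS_API
#endif

extern "C" {
    // Hypothetical exported function, marked with the macro
    AA_BASIS_API int aa_basis_api_version()
    {
        return 1;
    }
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In &lt;code&gt;CMakeLists.txt&lt;/code&gt; the definition could then be set for MSVC only, e.g. with &lt;code&gt;target_compile_definitions(basisu PRIVATE AA_USE_DECLSPEC)&lt;/code&gt; inside an &lt;code&gt;if(MSVC)&lt;/code&gt; block.&lt;/p&gt;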
&lt;p&gt;I won&#39;t go into all details, but I want to point out the &lt;em&gt;Installing&lt;/em&gt; part.&lt;/p&gt;
&lt;h3 id=&quot;install-to-correct-destination&quot; tabindex=&quot;-1&quot;&gt;Install to correct destination&lt;/h3&gt;
&lt;p&gt;Within the consuming Unity project&#39;s Assets folder, the libraries need to be at certain destinations. Here&#39;s an (incomplete) overview of how it should look:&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;Plugins
├── Android
│   └── libs
│       └── arm64-v8a
│           └── libbasisu.so
├── WebGL
│   ├── basisu_wrapper.cpp
│   ├── basisu_transcoder.cpp
│   └── ...some more header files
├── iOS
│   └── libbasisu.a
├── x86
│   └── basisu.dll
└── x86_64
    ├── basisu.dll
    ├── basisu.bundle
    └── basisu.so
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;In order not to have to manually copy every library file into the Unity project, I tweaked the CMake install target.&lt;/p&gt;
&lt;pre class=&quot;language-cmake&quot;&gt;&lt;code class=&quot;language-cmake&quot;&gt;...
&lt;span class=&quot;token comment&quot;&gt;# first, add a path parameter/option where the user can provide the path&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt; BASIS_UNIVERSAL_UNITY_PATH &lt;span class=&quot;token string&quot;&gt;&quot;&quot;&lt;/span&gt; &lt;span class=&quot;token variable&quot;&gt;CACHE&lt;/span&gt; PATH &lt;span class=&quot;token string&quot;&gt;&quot;Path locating the BasisUniversalUnity package source. When installing, native libraries will get injected there&quot;&lt;/span&gt; &lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# Set the output sub-path within the package project&lt;/span&gt;
&lt;span class=&quot;token comment&quot;&gt;# In this case for standalone x64 targets&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt; DEST_PLUGIN_PATH &lt;span class=&quot;token string&quot;&gt;&quot;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;token variable&quot;&gt;BASIS_UNIVERSAL_UNITY_PATH&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;/Runtime/Plugins/x86_64&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# We can check if it&#39;s correct by logging it&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;message&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;STATUS &lt;span class=&quot;token string&quot;&gt;&quot;Will install native libs to &lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;token variable&quot;&gt;DEST_PLUGIN_PATH&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# Check if the path actually exists&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;if&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt; &lt;span class=&quot;token operator&quot;&gt;NOT&lt;/span&gt; EXISTS &lt;span class=&quot;token string&quot;&gt;&quot;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;token variable&quot;&gt;DEST_PLUGIN_PATH&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;token keyword&quot;&gt;message&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;SEND_ERROR &lt;span class=&quot;token string&quot;&gt;&quot;Invalid path!&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;endif&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# Finally tell CMake to install to our path&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;install&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;TARGETS basisu DESTINATION &lt;span class=&quot;token string&quot;&gt;&quot;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;token variable&quot;&gt;DEST_PLUGIN_PATH&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;This was a huge time saver!&lt;/p&gt;
&lt;h3 id=&quot;webgl&quot; tabindex=&quot;-1&quot;&gt;WebGL&lt;/h3&gt;
&lt;p&gt;WebGL is a special case when it comes to native libraries. Instead of a pre-compiled library, you can place all C++ source files in the &lt;code&gt;Assets/Plugins/WebGL&lt;/code&gt; folder. Unity will detect and include them when compiling the rest of the game code (which is also C++, generated from C# via &lt;a href=&quot;https://docs.unity3d.com/Manual/IL2CPP.html&quot;&gt;IL2CPP&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;All we need to do is copy the right source files over to this destination, which I also did via CMake:&lt;/p&gt;
&lt;pre class=&quot;language-cmake&quot;&gt;&lt;code class=&quot;language-cmake&quot;&gt;&lt;span class=&quot;token comment&quot;&gt;# Set the right sub-folder&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt; UNITY_PLUGIN_DIR_WEBGL &lt;span class=&quot;token string&quot;&gt;&quot;&lt;span class=&quot;token interpolation&quot;&gt;&lt;span class=&quot;token punctuation&quot;&gt;${&lt;/span&gt;&lt;span class=&quot;token variable&quot;&gt;BASIS_UNIVERSAL_UNITY_PATH&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;&lt;/span&gt;/Runtime/Plugins/WebGL&quot;&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;token comment&quot;&gt;# Let the install target copy the listed source files&lt;/span&gt;
&lt;span class=&quot;token keyword&quot;&gt;install&lt;/span&gt;&lt;span class=&quot;token punctuation&quot;&gt;(&lt;/span&gt;
FILES
    &lt;span class=&quot;token comment&quot;&gt;# The two source files&lt;/span&gt;
    basisu_wrapper/basisu_wrapper.cpp
    basis_universal/transcoder/basisu_transcoder.cpp

    &lt;span class=&quot;token comment&quot;&gt;# Plus all the header files they reference&lt;/span&gt;
    basis_universal/transcoder/basisu_transcoder.h
    basis_universal/transcoder/basisu_transcoder_internal.h
    basis_universal/transcoder/basisu_global_selector_cb.h
    basis_universal/transcoder/basisu_transcoder_tables_bc7_m6.inc
    basis_universal/transcoder/basisu_global_selector_palette.h
    basis_universal/transcoder/basisu.h
    basis_universal/transcoder/basisu_transcoder_tables_dxt1_6.inc
    basis_universal/transcoder/basisu_file_headers.h
    basis_universal/transcoder/basisu_transcoder_tables_dxt1_5.inc
DESTINATION
    &lt;span class=&quot;token punctuation&quot;&gt;${&lt;/span&gt;UNITY_PLUGIN_DIR_WEBGL&lt;span class=&quot;token punctuation&quot;&gt;}&lt;/span&gt;
&lt;span class=&quot;token punctuation&quot;&gt;)&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;The only downside is that these files get copied as part of every other platform&#39;s install step, which is redundant, but that&#39;s not really a problem.&lt;/p&gt;
&lt;h3 id=&quot;configure-build-and-generate&quot; tabindex=&quot;-1&quot;&gt;Configure build and generate&lt;/h3&gt;
&lt;p&gt;Now we have a final &lt;a href=&quot;https://github.com/atteneder/BasisUniversalUnityBuild/blob/main/CMakeLists.txt&quot;&gt;CMakeLists.txt&lt;/a&gt; and we are ready to generate all Xcode / Visual Studio projects or Makefiles and fire up the compilers.&lt;/p&gt;
&lt;p&gt;Detailed documentation on how this is done can be found on the &lt;a href=&quot;https://github.com/atteneder/BasisUniversalUnityBuild&quot;&gt;BasisUniversalUnityBuild project site&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;future-work&quot; tabindex=&quot;-1&quot;&gt;Future work&lt;/h2&gt;
&lt;h3 id=&quot;linux&quot; tabindex=&quot;-1&quot;&gt;Linux&lt;/h3&gt;
&lt;p&gt;I haven&#39;t built the Linux libraries yet, but I definitely will, especially since the Unity Linux Editor is coming out of preview later this year.&lt;/p&gt;
&lt;h3 id=&quot;continuous-integration&quot; tabindex=&quot;-1&quot;&gt;Continuous Integration&lt;/h3&gt;
&lt;p&gt;Although I made some small improvements, building a library for so many platforms still requires a lot of small steps. It would be nice to further automate the build process, so updating the library becomes faster and less error-prone.&lt;/p&gt;
&lt;p&gt;This means either having multiple build workers on different platforms or doing more cross-compiling. I already cross-compiled for iOS and Android from my machine (macOS). Adding Windows and Linux as target platforms would be nice for development. On the other hand, I&#39;d love to be able to cross-compile all variants from one server platform (preferably Linux). Maybe I&#39;ll look into that some day.&lt;/p&gt;
&lt;h3 id=&quot;unity-package&quot; tabindex=&quot;-1&quot;&gt;Unity Package&lt;/h3&gt;
&lt;p&gt;To easily re-use the library (all native libs plus the C# interface code) across multiple Unity projects, the proper way is to create a custom Unity package. I did this in this particular case (see &lt;a href=&quot;https://github.com/atteneder/BasisUniversalUnity&quot;&gt;BasisUniversalUnity on GitHub&lt;/a&gt;) and plan to write about it in the future.&lt;/p&gt;
&lt;p&gt;If you liked this read, feel free to&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ko-fi.com/C0C3BW7G&quot;&gt;&lt;img src=&quot;https://www.ko-fi.com/img/githubbutton_sm.svg&quot; alt=&quot;ko-fi&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://docs.unity3d.com/Manual/NativePlugins.html&quot;&gt;https://docs.unity3d.com/Manual/NativePlugins.html&lt;/a&gt; &lt;a href=&quot;https://pixel.engineer/posts/cross-platform-cc++-plugins-in-unity/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt; &lt;a href=&quot;https://pixel.engineer/posts/cross-platform-cc++-plugins-in-unity/#fnref1:1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://github.com/BinomialLLC/basis_universal#transcoder-details&quot;&gt;https://github.com/BinomialLLC/basis_universal#transcoder-details&lt;/a&gt; &lt;a href=&quot;https://pixel.engineer/posts/cross-platform-cc++-plugins-in-unity/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
  
  <entry>
    <title>Basis Universal Texture Format Introduction</title>
    
    <link href="https://pixel.engineer/posts/basis-universal-texture-format-introduction/"/>
    <updated>2019-06-19T00:00:00Z</updated>
    <id>https://pixel.engineer/posts/basis-universal-texture-format-introduction/</id>
    <content type="html">&lt;blockquote&gt;
&lt;p&gt;TL;DR: Basis Universal is &lt;strong&gt;the&lt;/strong&gt; image format for cross platform/API usage in graphics/GPU contexts.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;It is also super useful for creating a platform/API/framework agnostic 3D content format. For textures, it is the missing piece in that puzzle.&lt;/p&gt;
&lt;h3 id=&quot;intro&quot; tabindex=&quot;-1&quot;&gt;Intro&lt;/h3&gt;
&lt;p&gt;It&#39;s pretty straightforward to put an image on a website. Save it as a JPEG or PNG, add an &lt;code&gt;&amp;lt;img&amp;gt;&lt;/code&gt; tag to your HTML and it&#39;ll work on all browsers across all platforms.&lt;/p&gt;
&lt;p&gt;Some people (me included) also want to do this in graphics applications that run on GPUs (e.g. 2D/3D games). To do this, you need the texture in a special format that the graphics API and GPU can work with.&lt;/p&gt;
&lt;p&gt;The objectives are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Images/Textures to load as fast as possible
&lt;ul&gt;
&lt;li&gt;Small file/download/storage size&lt;/li&gt;
&lt;li&gt;Small runtime load overhead&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Straightforward content creation pipeline&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;gpu-texture-formats&quot; tabindex=&quot;-1&quot;&gt;GPU texture formats&lt;/h3&gt;
&lt;p&gt;GPUs can handle only certain types of texture data.&lt;br /&gt;
Apart from uncompressed formats (bitmaps; huge in size) there are a couple of compressed formats, also called GPU-friendly formats. They use lossy compression and are therefore small (a 4-bits-per-pixel format like BC1 is 8 times smaller than a 32-bit RGBA bitmap), yet GPU texture units can still access them very efficiently (random access). Examples are DXT1/BC1&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fn1&quot; id=&quot;fnref1&quot;&gt;[1]&lt;/a&gt;&lt;/sup&gt;, PVRTC&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fn2&quot; id=&quot;fnref2&quot;&gt;[2]&lt;/a&gt;&lt;/sup&gt;, ETC2&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fn3&quot; id=&quot;fnref3&quot;&gt;[3]&lt;/a&gt;&lt;/sup&gt; or ASTC&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fn4&quot; id=&quot;fnref4&quot;&gt;[4]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
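&lt;p&gt;The size difference is simple arithmetic. Block-compressed formats like BC1 or ETC1 store every 4x4 pixel block in a fixed 8 bytes (4 bits per pixel), while an uncompressed RGBA bitmap needs 4 bytes per pixel. A minimal sketch of the calculation for a single mip level (the function names are made up for illustration):&lt;/p&gt;
&lt;pre class=&quot;language-cpp&quot;&gt;&lt;code class=&quot;language-cpp&quot;&gt;// Size of an uncompressed RGBA8 image: 4 bytes per pixel
unsigned long long rgba8_size(unsigned long long width, unsigned long long height)
{
    return width * height * 4;
}

// Size of a block-compressed image. Width and height are rounded up
// to whole 4x4 blocks, since partial edge blocks are stored in full.
unsigned long long block_compressed_size(unsigned long long width,
                                         unsigned long long height,
                                         unsigned long long bytes_per_block)
{
    unsigned long long blocks_x = (width + 3) / 4;
    unsigned long long blocks_y = (height + 3) / 4;
    return blocks_x * blocks_y * bytes_per_block;
}&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;For a 1024x1024 texture this gives 4,194,304 bytes uncompressed versus 524,288 bytes as BC1 (8 bytes per block), a factor of 8. Formats with 16-byte blocks, such as BC7 or ASTC 4x4, come out at a factor of 4.&lt;/p&gt;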
&lt;p&gt;Unfortunately, there isn&#39;t a single compressed GPU texture format that is supported by all GPUs and graphics APIs.&lt;/p&gt;
&lt;p&gt;If you want to provide a broad range of target devices/platforms with their respective ideal format choice, you have to encode every texture into a bunch of different formats.&lt;/p&gt;
&lt;p&gt;To make matters worse, I&#39;m not aware of a single encoding tool that supports all or most formats, so you&#39;ll also need to install a handful of encoders.&lt;/p&gt;
&lt;p&gt;Note: If you&#39;re using a game engine or graphics framework (like Unity3D), chances are it assists you with automatic conversions by abstracting them away, though maybe not always in the most efficient way.&lt;/p&gt;
&lt;h3 id=&quot;decoding-%2F-encoding-at-runtime&quot; tabindex=&quot;-1&quot;&gt;Decoding / Encoding at runtime&lt;/h3&gt;
&lt;p&gt;If your GPU does not directly support the format you have, you can convert it at runtime, for example by decoding a PNG file into an uncompressed texture. The small PNG file gives you a decent download size, but this approach has two downsides:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Decoding is slow&lt;/li&gt;
&lt;li&gt;The decompressed bitmap ends up using a lot of video RAM&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The second point could be fixed if you encode the bitmap into a supported GPU-friendly format after decoding, but that would make the loading process even slower.&lt;/p&gt;
&lt;h3 id=&quot;enter-basis-universal-and-transcoding&quot; tabindex=&quot;-1&quot;&gt;Enter Basis Universal and Transcoding&lt;/h3&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/BinomialLLC/basis_universal&quot;&gt;Basis Universal&lt;/a&gt; is a supercompressed intermediate texture format. BasisU files can be transcoded into GPU-friendly formats at runtime.&lt;/p&gt;
&lt;p&gt;The transcoding is quite fast and it does &lt;strong&gt;not&lt;/strong&gt; create an interim bitmap.&lt;/p&gt;
&lt;p&gt;For technical details, jump right to the repository/code&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fn5&quot; id=&quot;fnref5&quot;&gt;[5]&lt;/a&gt;&lt;/sup&gt; or read this related paper&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fn6&quot; id=&quot;fnref6&quot;&gt;[6]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;So BasisU fulfills all requirements:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Small file sizes for storage/download (compression)&lt;/li&gt;
&lt;li&gt;Fast transcoding&lt;/li&gt;
&lt;li&gt;Low video RAM usage (due to GPU-friendly formats)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Seriously, these &lt;a href=&quot;https://github.com/BinomialLLC/basis_universal/blob/lion_bench/bench/bench.txt&quot;&gt;benchmarks&lt;/a&gt; speak for themselves.&lt;/p&gt;
&lt;p&gt;And on top of that:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It&#39;s open source! (Apache License 2.0)&lt;/li&gt;
&lt;li&gt;The creators of it (Binomial&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fn7&quot; id=&quot;fnref7&quot;&gt;[7]&lt;/a&gt;&lt;/sup&gt;) are working with the Khronos Group on the specification to make it an industry standard.&lt;/li&gt;
&lt;li&gt;It is supported by Google&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fn8&quot; id=&quot;fnref8&quot;&gt;[8]&lt;/a&gt;&lt;/sup&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;basis-universal-for-unity&quot; tabindex=&quot;-1&quot;&gt;Basis Universal for Unity&lt;/h3&gt;
&lt;p&gt;When BasisU was released, I got excited and created a small wrapper/library so you can load it in the &lt;a href=&quot;http://unity3d.com/&quot;&gt;Unity3D&lt;/a&gt; game engine. You can find it on my GitHub:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/atteneder/BasisUniversalUnity&quot;&gt;https://github.com/atteneder/BasisUniversalUnity&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Let me know what you think. I&#39;ll try to prevent it from going to the graveyard of side projects before it is usable.&lt;/p&gt;
&lt;p&gt;I think that if BasisU gains the traction it has the potential to, eventually Unity will support it by default and maybe even use it for texture/asset transmission internally.&lt;/p&gt;
&lt;h3 id=&quot;generic-3d-assets%3A-gltf&quot; tabindex=&quot;-1&quot;&gt;Generic 3D assets: glTF&lt;/h3&gt;
&lt;p&gt;As mentioned above, efficient texture transmission is key for having a universal 3D asset format.&lt;/p&gt;
&lt;p&gt;We&#39;re talking about a &amp;quot;last mile&amp;quot; format, with the intention to deliver final content to the end user. Not a format to deliver assets between digital content creation applications.&lt;/p&gt;
&lt;p&gt;The Khronos Group created such a format, called glTF&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fn9&quot; id=&quot;fnref9&quot;&gt;[9]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;It has been getting some attention lately. For example, you can load it in Google&#39;s &lt;a href=&quot;https://developers.google.com/web/updates/2019/02/model-viewer&quot;&gt;model-viewer web component&lt;/a&gt; and view it directly in 3D or augmented reality. Facebook also supports posting 3D content via glTF-binary&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fn10&quot; id=&quot;fnref10&quot;&gt;[10]&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;
&lt;p&gt;At the moment, textures in glTF are encoded as either PNG or JPEG, which suck, as I explained before.&lt;/p&gt;
&lt;p&gt;But Khronos is working on including BasisU in glTF&lt;sup class=&quot;footnote-ref&quot;&gt;&lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fn11&quot; id=&quot;fnref11&quot;&gt;[11]&lt;/a&gt;&lt;/sup&gt;, so the future for this format is bright once the industry adopts it.&lt;/p&gt;
&lt;p&gt;Oh yeah, I&#39;ve also created this Unity glTF loading library called glTFast (because it&#39;s really fast). I&#39;ll try to get BasisU support in there as well. Also on my GitHub:&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://github.com/atteneder/glTFast&quot;&gt;https://github.com/atteneder/glTFast&lt;/a&gt;&lt;/p&gt;
&lt;h3 id=&quot;final-notes&quot; tabindex=&quot;-1&quot;&gt;Final notes&lt;/h3&gt;
&lt;p&gt;To stay up to date on this topic, I recommend following these sources:&lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;https://github.com/KhronosGroup/glTF/pull/1612&quot;&gt;glTF draft pull request&lt;/a&gt; regarding compressed texture formats.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://x.com/sehurlburt&quot;&gt;Stephanie Hurlburt&lt;/a&gt; and &lt;a href=&quot;https://x.com/richgel999&quot;&gt;Richard Geldreich&lt;/a&gt;, the people behind &lt;a href=&quot;https://x.com/_binomial&quot;&gt;Binomial&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Richard also has a &lt;a href=&quot;http://richg42.blogspot.com/&quot;&gt;blog&lt;/a&gt;, where (amongst other things) he gives lots of insight into texture encoding details.&lt;/p&gt;
&lt;p&gt;Star/follow the repositories of &lt;a href=&quot;https://github.com/BinomialLLC/basis_universal&quot;&gt;Basis Universal&lt;/a&gt; and &lt;a href=&quot;https://github.com/KhronosGroup/glTF&quot;&gt;glTF&lt;/a&gt; on GitHub.&lt;/p&gt;
&lt;p&gt;If you liked this read, feel free to&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;https://ko-fi.com/C0C3BW7G&quot;&gt;&lt;img src=&quot;https://www.ko-fi.com/img/githubbutton_sm.svg&quot; alt=&quot;ko-fi&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;hr class=&quot;footnotes-sep&quot; /&gt;
&lt;section class=&quot;footnotes&quot;&gt;
&lt;ol class=&quot;footnotes-list&quot;&gt;
&lt;li id=&quot;fn1&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/S3_Texture_Compression&quot;&gt;https://en.wikipedia.org/wiki/S3_Texture_Compression&lt;/a&gt; &lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fnref1&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn2&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://www.imgtec.com/blog/pvrtc-the-most-efficient-texture-compression-standard-for-the-mobile-graphics-world/&quot;&gt;https://www.imgtec.com/blog/pvrtc-the-most-efficient-texture-compression-standard-for-the-mobile-graphics-world/&lt;/a&gt; &lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fnref2&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn3&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://en.wikipedia.org/wiki/Ericsson_Texture_Compression&quot;&gt;https://en.wikipedia.org/wiki/Ericsson_Texture_Compression&lt;/a&gt; &lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fnref3&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn4&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://www.khronos.org/opengl/wiki/ASTC_Texture_Compression&quot;&gt;https://www.khronos.org/opengl/wiki/ASTC_Texture_Compression&lt;/a&gt; &lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fnref4&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn5&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://github.com/BinomialLLC/basis_universal&quot;&gt;https://github.com/BinomialLLC/basis_universal&lt;/a&gt; &lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fnref5&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn6&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;http://gamma.cs.unc.edu/GST/gst.pdf&quot;&gt;http://gamma.cs.unc.edu/GST/gst.pdf&lt;/a&gt; &lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fnref6&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn7&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;http://www.binomial.info/&quot;&gt;http://www.binomial.info&lt;/a&gt; &lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fnref7&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn8&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://opensource.googleblog.com/2019/05/google-and-binomial-partner-to-open.html&quot;&gt;https://opensource.googleblog.com/2019/05/google-and-binomial-partner-to-open.html&lt;/a&gt; &lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fnref8&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn9&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://www.khronos.org/gltf/&quot;&gt;https://www.khronos.org/gltf/&lt;/a&gt; &lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fnref9&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn10&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://developers.facebook.com/docs/sharing/3d-posts/glb-tutorials/&quot;&gt;https://developers.facebook.com/docs/sharing/3d-posts/glb-tutorials/&lt;/a&gt; &lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fnref10&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li id=&quot;fn11&quot; class=&quot;footnote-item&quot;&gt;&lt;p&gt;&lt;a href=&quot;https://www.khronos.org/blog/google-and-binomial-contribute-basis-universal-texture-format-to-khronos-gltf-3d-transmission-open-standard&quot;&gt;https://www.khronos.org/blog/google-and-binomial-contribute-basis-universal-texture-format-to-khronos-gltf-3d-transmission-open-standard&lt;/a&gt; &lt;a href=&quot;https://pixel.engineer/posts/basis-universal-texture-format-introduction/#fnref11&quot; class=&quot;footnote-backref&quot;&gt;↩︎&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;/section&gt;
</content>
  </entry>
</feed>