Jekyll2022-04-08T13:50:26+00:00https://iamgweej.github.io//feed.xmlgweej-nostdlib development for fun and profit!Zig Ray Tracing In A Weekend2020-09-28T01:00:00+00:002020-09-28T01:00:00+00:00https://iamgweej.github.io//jekyll/update/2020/09/28/zig-ray-tracing<p>For a while, I’ve been wanting to hone my Zig skills and also have some cool product to show for it. So I tried following Peter Shirley’s <a href="https://raytracing.github.io/books/RayTracingInOneWeekend.html"><em>Ray Tracing in One Weekend</em></a>, but of course, adapting it to Zig.</p>
<p>First of all, I must say that Shirley’s treatment is just <em>amazing</em>. I learnt a lot, and the hands-on experience of generating images at each stage, after adding more and more features, really motivated me to keep going.</p>
<p>Secondly, doing this mini-project made me appreciate Zig’s simplicity a lot. I really like the “interface pattern” that Zig uses. I find it more verbose than other languages’ implementations of runtime polymorphism, but in exchange nothing is hidden. No more sneaky vtables and dynamic dispatch lurking in the shadows. What you see is what you get.</p>
<p>It’s a pretty short post, so I think I’ll just add an image I generated, and <a href="https://github.com/iamgweej/zigrtrc">link</a> to the project’s GitHub:</p>
<p><img src="https://raw.githubusercontent.com/iamgweej/iamgweej.github.io/master/_images/ZigRayTracingInAWeekend.png" alt="Outcome" /></p>Zigtastic Async2020-07-07T04:00:00+00:002020-07-07T04:00:00+00:00https://iamgweej.github.io//jekyll/update/2020/07/07/zigtastic-async<h2 id="goal">Goal</h2>
<p>The goal of this post is to give us some basic intuition about the machine code generated by the <a href="https://ziglang.org/">Zig</a> compiler when we use <em>asynchronous functions</em>. We’ll start by digging into the simplest asynchronous function there is, slowly adding features like local variables, parameters, multi-threaded support, <code class="language-plaintext highlighter-rouge">await</code> and more.</p>
<p>The only tool I’ll use in this expedition is the <a href="https://godbolt.org/">Godbolt Compiler Explorer</a>, and maybe an <a href="https://www.felixcloutier.com/x86/">x86_64 instruction reference</a>.</p>
<p>I’m going to assume some standard knowledge and mention concepts like registers, memory, stack frames and the heap without any elaboration. I’m also not going to go into details about the syntax of Zig, C or Intel x86_64 assembly. I don’t think knowing Zig is a prerequisite for this blogpost, since the syntax is pretty straightforward in my opinion, but feel free to check the <a href="https://ziglang.org/documentation/0.6.0/">official Zig language documentation</a>.</p>
<h2 id="what-the-fuck-is-zig">What the fuck is Zig</h2>
<p>So. What is Zig? According to their site:</p>
<blockquote>
<p>Zig is a general-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.</p>
</blockquote>
<p>Sounds cool, doesn’t tell you a whole lot when you think about it. This is not a Zig tutorial or spotlight, so I won’t go into a long monologue about why I think it’s actually a great programming language, I’ll just mention the main features:</p>
<ul>
<li>It’s a <a href="https://en.wikipedia.org/wiki/Compiled_language">compiled language</a>. That is, Zig source code is used to generate machine code that runs <em>natively</em> on the target machine. No VM, no runtime.</li>
<li>It has no hidden control flow: if something doesn’t look like a function call, it isn’t a function call.</li>
<li>It has four build modes: “debug”, “release-safe”, “release-fast” and “release-small”. In this blogpost, we’ll compile with “debug”.</li>
<li>It supports coroutines.</li>
</ul>
<h2 id="why-the-fuck-is-zig">Why the fuck is Zig</h2>
<p>The reason I chose Zig as the focus of my first few blogposts about asynchronous programming in compiled languages is its <em>simplicity</em>.</p>
<p>Zig supports very “low-level” features of coroutines, using only four keywords: <code class="language-plaintext highlighter-rouge">async</code>, <code class="language-plaintext highlighter-rouge">suspend</code>, <code class="language-plaintext highlighter-rouge">resume</code> and <code class="language-plaintext highlighter-rouge">await</code>. Essentially, it is exactly the “bare-minimum” of asynchronous programming. No hidden event loop, no <code class="language-plaintext highlighter-rouge">yield</code>, just “napping functions”. This is a great candidate for the first implementation to look into, since C++ for example supports generators and <code class="language-plaintext highlighter-rouge">yield</code>s in its standard.</p>
<h2 id="environment">Environment</h2>
<p>I’m going to use Godbolt’s Zig 0.6.0 compiler, and compile in <code class="language-plaintext highlighter-rouge">debug</code> and <code class="language-plaintext highlighter-rouge">--single-threaded</code>.</p>
<h2 id="a-simple-program">A simple program</h2>
<p>Let’s start with the simplest async program imaginable:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">afoo</span><span class="p">()</span> <span class="k">void</span> <span class="p">{</span>
<span class="k">suspend</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">export</span> <span class="k">fn</span> <span class="n">caller</span><span class="p">()</span> <span class="k">void</span> <span class="p">{</span>
<span class="k">var</span> <span class="n">frame</span> <span class="o">=</span> <span class="k">async</span> <span class="n">afoo</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>We are going to go through the <em>entire</em> blob of machine code generated by this small snippet. It might be a bit long, but I found it very educational. Afterwards, we’ll sum up our findings and list the questions that we didn’t answer yet.</p>
<p>We’ll compile it with <code class="language-plaintext highlighter-rouge">--single-threaded</code>, since there is a slight difference I saw when compiling without this flag, which might complicate things. Don’t worry, we’ll get back to that!</p>
<p>Stick this sucker in Godbolt, and let’s look at the <a href="https://godbolt.org/z/8NuCF2">output</a>…</p>
<p>
<details>
<summary><b>Click me for assembly and shit</b></summary>
<iframe width="100%" height="1000px" src="https://godbolt.org/e#g:!((g:!((g:!((h:codeEditor,i:(fontScale:14,j:1,lang:zig,source:'//+Type+your+code+here,+or+load+an+example.%0Afn+afoo()+void+%7B%0A++++suspend%3B%0A%7D%0A%0Aexport+fn+caller()+void+%7B%0A++++var+frame+%3D+async+afoo()%3B%0A%7D%0A'),l:'5',n:'0',o:'Zig+source+%231',t:'0')),k:100,l:'4',m:32.861635220125784,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:z060,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'0',intel:'0',libraryCode:'0',trim:'1'),fontScale:14,j:1,lang:zig,libs:!(),options:'--single-threaded',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'zig+0.6.0+(Editor+%231,+Compiler+%231)+Zig',t:'0')),header:(),l:'4',m:67.13836477987421,n:'0',o:'',s:0,t:'0')),l:'3',n:'0',o:'',t:'0')),version:4"></iframe>
</details>
</p>
<p>Well. That’s a lot of code. We’ll go through it step by step.</p>
<h2 id="analysis">Analysis</h2>
<p>It starts with the standard prolog:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">caller:</span>
<span class="nf">push</span> <span class="nb">rbp</span>
<span class="nf">mov</span> <span class="nb">rbp</span><span class="p">,</span> <span class="nb">rsp</span>
<span class="nf">sub</span> <span class="nb">rsp</span><span class="p">,</span> <span class="mi">32</span>
</code></pre></div></div>
<p>which suggests the <code class="language-plaintext highlighter-rouge">frame</code> variable is at most 32 bytes wide. We’ll write that down, and update it as we go.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">afoo_frame</span> <span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">unknown</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
<span class="p">};</span>
</code></pre></div></div>
<p>Godbolt tells us which line generates which instructions. So we see that the line:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">var</span> <span class="n">frame</span> <span class="o">=</span> <span class="k">async</span> <span class="n">afoo</span><span class="p">();</span>
</code></pre></div></div>
<p>generates:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nf">movabs</span> <span class="nb">rax</span><span class="p">,</span> <span class="nv">offset</span> <span class="nv">afoo</span>
<span class="nf">mov</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">32</span><span class="p">],</span> <span class="nb">rax</span>
<span class="nf">mov</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">24</span><span class="p">],</span> <span class="mi">0</span>
<span class="nf">mov</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">16</span><span class="p">],</span> <span class="mi">0</span>
<span class="nf">lea</span> <span class="nb">rdi</span><span class="p">,</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">32</span><span class="p">]</span>
<span class="nf">mov</span> <span class="nb">rsi</span><span class="p">,</span> <span class="o">-</span><span class="mi">3</span>
<span class="nf">call</span> <span class="nv">afoo</span>
</code></pre></div></div>
<p>There are quite a few things going on here. We put the address of <code class="language-plaintext highlighter-rouge">afoo()</code> in the first 8 bytes of the <code class="language-plaintext highlighter-rouge">afoo_frame</code> struct. We also put zeros at offsets 8 and 16 of the structure. This makes us suspect that the structure is actually 24 bytes wide, and <code class="language-plaintext highlighter-rouge">sub rsp, 32</code> was emitted to keep the stack 16-byte aligned (or something of that sort). This makes us guess it looks a bit like this:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="nc">afoo_frame</span> <span class="p">{</span>
<span class="n">fptr_t</span> <span class="n">func</span><span class="p">;</span> <span class="c1">// The async function the frame holds</span>
<span class="kt">uint64_t</span> <span class="n">unknown2</span><span class="p">;</span> <span class="c1">// Initialized to 0</span>
<span class="kt">uint64_t</span> <span class="n">unknown3</span><span class="p">;</span> <span class="c1">// Initialized to 0</span>
<span class="p">};</span>
</code></pre></div></div>
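<p>The 24-versus-32 guess is easy to sanity check. Here is a minimal sketch, assuming an x86_64 target with 8-byte pointers. The <code class="language-plaintext highlighter-rouge">fptr_t</code> signature and the helpers <code class="language-plaintext highlighter-rouge">round_up()</code> and <code class="language-plaintext highlighter-rouge">frame_stack_reservation()</code> are made up for illustration; they are not anything the Zig compiler emits.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder signature, assumed for illustration only. */
typedef void (*fptr_t)(void);

struct afoo_frame {
    fptr_t   func;     /* written from rax at [rbp - 32] */
    uint64_t unknown2; /* zeroed at [rbp - 24] */
    uint64_t unknown3; /* zeroed at [rbp - 16] */
};

/* Round n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Stack space needed for one frame if the stack stays 16-byte aligned. */
size_t frame_stack_reservation(void) {
    return round_up(sizeof(struct afoo_frame), 16);
}
```

<p>With 8-byte pointers the struct is 24 bytes, and rounding that up to a multiple of 16 gives the 32 we saw in <code class="language-plaintext highlighter-rouge">sub rsp, 32</code>. That is consistent with the alignment guess, though it doesn’t prove it.</p>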
<p>What’s the point of putting the function pointer in the frame? Remember, when we use <code class="language-plaintext highlighter-rouge">async</code>, we can continue a frame without knowing which function we’re calling. We’re continuing a <em>frame</em>, not a specific <em>function</em>. This means the frame has to hold, in some manner, the function we’re about to call.</p>
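<p>To make that concrete, here is a rough C sketch of continuing a frame without knowing which function it belongs to. Everything in it (<code class="language-plaintext highlighter-rouge">any_frame</code>, <code class="language-plaintext highlighter-rouge">resume_frame()</code>, the <code class="language-plaintext highlighter-rouge">calls</code> counter) is a hypothetical name I made up; it only models the idea, not Zig’s actual implementation.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct any_frame;
typedef void (*resume_fn)(struct any_frame *frame);

struct any_frame {
    resume_fn func;  /* first field, like func in the frame layout above */
    uint64_t  calls; /* hypothetical counter, only here to observe the call */
};

/* Hypothetical body of some async function. */
void some_async_body(struct any_frame *frame) {
    frame->calls += 1;
}

/* Continuing needs only the frame: the function to call is read from it. */
void resume_frame(struct any_frame *frame) {
    assert(frame != NULL && frame->func != NULL);
    frame->func(frame);
}
```

<p>Note that <code class="language-plaintext highlighter-rouge">resume_frame()</code> never mentions <code class="language-plaintext highlighter-rouge">some_async_body()</code>. Whoever holds the frame can continue it.</p>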
<p>Now, the actual call to <code class="language-plaintext highlighter-rouge">afoo()</code> looks like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">afoo</span><span class="p">(</span>
<span class="o">&</span><span class="n">frame</span><span class="p">,</span> <span class="c1">// passed in %rdi%</span>
<span class="o">-</span><span class="mi">3</span> <span class="c1">// passed in %rsi%</span>
<span class="p">);</span>
</code></pre></div></div>
<p>That’s odd. <code class="language-plaintext highlighter-rouge">afoo()</code> is supposed to be a <code class="language-plaintext highlighter-rouge">void (*)(void)</code> function, so why is it taking two parameters?</p>
<p>Well, the <code class="language-plaintext highlighter-rouge">&frame</code> parameter actually makes sense. Remember, a <em>function frame</em> holds all of its local variables and arguments. Since coroutines are invoked <em>asynchronously</em>, they can’t push that data on the stack, because the stack might change by the time they are called again. They <em>have</em> to receive a pointer to their frame, and trust their caller that this pointer is valid throughout all of their execution. My guess is that in a “real” asynchronous program those frames will be <em>heap allocated</em>, to ensure the frame’s lifetime throughout the program. In our case, the compiler knows that this frame will only exist while <code class="language-plaintext highlighter-rouge">caller()</code> is running, so it’s making it stack allocated instead. We’ll come back to this topic later to check our assumption.</p>
<p>Now, the <code class="language-plaintext highlighter-rouge">rsi</code> parameter is a bit peculiar. I think it’s related to Zig’s <em>safety guarantees</em> or the fact we’re in debug mode. I think <code class="language-plaintext highlighter-rouge">rsi</code> is used as a parameter for the <code class="language-plaintext highlighter-rouge">panic()</code> function. This assumption is based on the fact that the only reference to it is in the generated <code class="language-plaintext highlighter-rouge">panic</code> assembly:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">panic:</span>
<span class="nf">push</span> <span class="nb">rbp</span>
<span class="nf">mov</span> <span class="nb">rbp</span><span class="p">,</span> <span class="nb">rsp</span>
<span class="nf">sub</span> <span class="nb">rsp</span><span class="p">,</span> <span class="mi">16</span>
<span class="nf">mov</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">8</span><span class="p">],</span> <span class="nb">rsi</span>
<span class="nf">call</span> <span class="nv">zig_panic</span>
</code></pre></div></div>
<p>With that assumption in mind, I’m going to rudely ignore it throughout this article (please forgive me).</p>
<p>Now, let’s take a look at the start of the <code class="language-plaintext highlighter-rouge">afoo()</code> function:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">afoo:</span>
<span class="nf">push</span> <span class="nb">rbp</span>
<span class="nf">mov</span> <span class="nb">rbp</span><span class="p">,</span> <span class="nb">rsp</span>
<span class="nf">sub</span> <span class="nb">rsp</span><span class="p">,</span> <span class="mi">32</span>
<span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rdi</span>
<span class="nf">add</span> <span class="nb">rax</span><span class="p">,</span> <span class="mi">16</span>
<span class="nf">mov</span> <span class="nb">rcx</span><span class="p">,</span> <span class="nb">rdi</span>
<span class="nf">add</span> <span class="nb">rcx</span><span class="p">,</span> <span class="mi">8</span>
<span class="nf">mov</span> <span class="nb">rdx</span><span class="p">,</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rdi</span> <span class="o">+</span> <span class="mi">8</span><span class="p">]</span>
<span class="nf">test</span> <span class="nb">rdx</span><span class="p">,</span> <span class="nb">rdx</span>
<span class="nf">mov</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">8</span><span class="p">],</span> <span class="nb">rax</span>
<span class="nf">mov</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">16</span><span class="p">],</span> <span class="nb">rcx</span>
<span class="nf">mov</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">24</span><span class="p">],</span> <span class="nb">rdx</span>
<span class="nf">je</span> <span class="nv">.LBB2_1</span>
<span class="nf">jmp</span> <span class="nv">.LBB2_8</span>
</code></pre></div></div>
<p>I trust you all to know your basic x86_64, so let’s zoom through this. We have the standard prolog, allocating 32 bytes on the stack. Remembering that <code class="language-plaintext highlighter-rouge">&frame</code> was passed in <code class="language-plaintext highlighter-rouge">%rdi%</code>, we understand that this code:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rdi</span>
<span class="nf">add</span> <span class="nb">rax</span><span class="p">,</span> <span class="mi">16</span>
<span class="nf">mov</span> <span class="nb">rcx</span><span class="p">,</span> <span class="nb">rdi</span>
<span class="nf">add</span> <span class="nb">rcx</span><span class="p">,</span> <span class="mi">8</span>
<span class="nf">mov</span> <span class="nb">rdx</span><span class="p">,</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rdi</span> <span class="o">+</span> <span class="mi">8</span><span class="p">]</span>
<span class="c1">; -- snip --</span>
<span class="nf">mov</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">8</span><span class="p">],</span> <span class="nb">rax</span>
<span class="nf">mov</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">16</span><span class="p">],</span> <span class="nb">rcx</span>
<span class="nf">mov</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">24</span><span class="p">],</span> <span class="nb">rdx</span>
</code></pre></div></div>
<p>translates to something like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">afoo</span><span class="p">(...)</span> <span class="p">{</span>
<span class="kt">uint64_t</span> <span class="o">*</span><span class="n">unknown3_ptr</span> <span class="o">=</span> <span class="o">&</span><span class="n">frame_ptr</span><span class="o">-></span><span class="n">unknown3</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="o">*</span><span class="n">unknown2_ptr</span> <span class="o">=</span> <span class="o">&</span><span class="n">frame_ptr</span><span class="o">-></span><span class="n">unknown2</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">unknown2</span> <span class="o">=</span> <span class="n">frame_ptr</span><span class="o">-></span><span class="n">unknown2</span><span class="p">;</span>
<span class="c1">// -- snip --</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Cool. Now, we have a branch:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="nf">mov</span> <span class="nb">rcx</span><span class="p">,</span> <span class="nb">rdi</span>
<span class="c1">; -- snip --</span>
<span class="nf">mov</span> <span class="nb">rdx</span><span class="p">,</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rdi</span> <span class="o">+</span> <span class="mi">8</span><span class="p">]</span>
<span class="nf">test</span> <span class="nb">rdx</span><span class="p">,</span> <span class="nb">rdx</span>
<span class="c1">; -- snip -- </span>
<span class="nf">je</span> <span class="nv">.LBB2_1</span>
<span class="nf">jmp</span> <span class="nv">.LBB2_8</span>
</code></pre></div></div>
<p>This suggests that <code class="language-plaintext highlighter-rouge">frame.unknown2</code> is actually a <em>branch identifier</em>. I like to think of coroutines like this: every <em>suspension point</em> “splits” the function into two code blocks: the code preceding the suspension point and the code following it. When <code class="language-plaintext highlighter-rouge">resume</code>ing the frame, we have to know which block it has to continue from. Our current assumption is that Zig achieves this using a <em>branch identifier</em> saved in the frame, telling it where to resume next.</p>
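<p>Here is how I picture that “split” in plain C. This is only a sketch of the idea: <code class="language-plaintext highlighter-rouge">demo_frame</code>, <code class="language-plaintext highlighter-rouge">demo_step()</code>, the two observation flags and the enum values are all made up, and the numeric identifiers Zig really uses may well differ.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Made-up identifiers for the two blocks around one suspension point. */
enum demo_block {
    DEMO_BEFORE_SUSPEND = 0,
    DEMO_AFTER_SUSPEND  = 1
};

struct demo_frame {
    uint64_t branch_id; /* which block runs on the next call */
    int before_ran;     /* hypothetical flags, only here to observe the flow */
    int after_ran;
};

/* One call runs exactly one block, then returns to whoever called it. */
void demo_step(struct demo_frame *frame) {
    assert(frame != NULL);
    if (frame->branch_id == DEMO_BEFORE_SUSPEND) {
        frame->before_ran = 1;                 /* code before the suspension point */
        frame->branch_id = DEMO_AFTER_SUSPEND; /* remember where to continue */
        return;                                /* hand control back to the caller */
    }
    frame->after_ran = 1;                      /* code after the suspension point */
}
```

<p>The first call runs the first block and remembers where it stopped. The second call, on the same frame, picks up in the second block. All the state that survives between the two calls lives in the frame.</p>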
<p>Let’s rewrite our structure according to this assumption:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">afoo_frame</span> <span class="p">{</span>
<span class="n">fptr_t</span> <span class="n">func</span><span class="p">;</span> <span class="c1">// The async function the frame holds</span>
<span class="kt">uint64_t</span> <span class="n">branch_id</span><span class="p">;</span> <span class="c1">// The branch to execute the next time the frame gets resumed</span>
<span class="kt">uint64_t</span> <span class="n">unknown3</span><span class="p">;</span>
<span class="p">};</span>
<span class="kt">void</span> <span class="nf">afoo</span><span class="p">(...)</span> <span class="p">{</span>
<span class="kt">uint64_t</span> <span class="o">*</span><span class="n">unknown3_ptr</span> <span class="o">=</span> <span class="o">&</span><span class="n">frame_ptr</span><span class="o">-></span><span class="n">unknown3</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="o">*</span><span class="n">branch_id_ptr</span> <span class="o">=</span> <span class="o">&</span><span class="n">frame_ptr</span><span class="o">-></span><span class="n">branch_id</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">stored_branch_id</span> <span class="o">=</span> <span class="n">frame_ptr</span><span class="o">-></span><span class="n">branch_id</span><span class="p">;</span>
<span class="c1">// -- snip --</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Let’s keep going! First, let’s look at what happens in our flow, that is, when <code class="language-plaintext highlighter-rouge">stored_branch_id</code> is 0. In that case, we jump to <code class="language-plaintext highlighter-rouge">.LBB2_1</code>:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.LBB2_1:</span>
<span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">16</span><span class="p">]</span>
<span class="nf">mov</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rax</span><span class="p">],</span> <span class="mi">1</span>
<span class="nf">mov</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rax</span><span class="p">],</span> <span class="mi">2</span>
<span class="nf">add</span> <span class="nb">rsp</span><span class="p">,</span> <span class="mi">32</span>
<span class="nf">pop</span> <span class="nb">rbp</span>
<span class="nf">ret</span>
</code></pre></div></div>
<p>Ok, there’s something odd happening here. We set <code class="language-plaintext highlighter-rouge">*branch_id_ptr</code> to 1, and then immediately set it to 2. My guess is that this is a result of us using the <code class="language-plaintext highlighter-rouge">--single-threaded</code> flag. My assumption is that this is some sort of <a href="https://en.wikipedia.org/wiki/Spinlock">spinlock</a> or <em>condition variable</em>, to prevent the same frame from running in several threads at once. Let’s check that assumption later! For now, we’ll ignore it.</p>
<p>We can now “split” our <code class="language-plaintext highlighter-rouge">afoo()</code> function into two branches:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="n">afoo</span><span class="p">()</span> <span class="k">void</span> <span class="p">{</span>
<span class="c">// branch_id 0</span>
<span class="k">suspend</span><span class="p">;</span>
<span class="c">// branch_id 2</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Now, we clean up the stack frame and <code class="language-plaintext highlighter-rouge">ret</code>. This is important: A <code class="language-plaintext highlighter-rouge">suspend</code> translates to a <code class="language-plaintext highlighter-rouge">ret</code>. It doesn’t “suspend” anything. It just returns to the caller, like any “normal” function!</p>
<p>Let’s check out the other branch:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.LBB2_8:</span>
<span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">24</span><span class="p">]</span>
<span class="nf">sub</span> <span class="nb">rax</span><span class="p">,</span> <span class="mi">1</span>
<span class="nf">je</span> <span class="nv">.LBB2_3</span>
<span class="nf">jmp</span> <span class="nv">.LBB2_9</span>
</code></pre></div></div>
<p>Hmm. It checks whether <code class="language-plaintext highlighter-rouge">stored_branch_id</code> is 1. Our standing assumption is that <code class="language-plaintext highlighter-rouge">frame.branch_id == 1</code> is a <em>flag</em>, implying that <em>another thread is currently running this frame</em>. Indeed, if we follow the case <code class="language-plaintext highlighter-rouge">stored_branch_id == 1</code> we arrive at:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.LBB2_3:</span>
<span class="nf">xor</span> <span class="nb">eax</span><span class="p">,</span> <span class="nb">eax</span>
<span class="nf">mov</span> <span class="nb">esi</span><span class="p">,</span> <span class="nb">eax</span>
<span class="nf">movabs</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nv">offset</span> <span class="nv">__unnamed_2</span>
<span class="nf">call</span> <span class="nv">panic</span>
<span class="nl">__unnamed_5:</span>
<span class="nf">.asciz</span> <span class="err">"</span><span class="nv">resumed</span> <span class="nv">a</span> <span class="nv">non</span><span class="o">-</span><span class="nv">suspended</span> <span class="nv">function</span><span class="err">"</span>
<span class="nl">__unnamed_2:</span>
<span class="nf">.quad</span> <span class="nv">__unnamed_5</span>
<span class="nf">.quad</span> <span class="mi">32</span>
</code></pre></div></div>
<p>We see that in that case, <code class="language-plaintext highlighter-rouge">panic()</code> is called with <code class="language-plaintext highlighter-rouge">%esi% = 0</code> and <code class="language-plaintext highlighter-rouge">%rdi%</code> pointing to a pointer-and-length pair (a <em>Pascal String</em> of sorts) describing the error message <code class="language-plaintext highlighter-rouge">"resumed a non-suspended function"</code>. This supports our suspicion: in a multithreaded build, <code class="language-plaintext highlighter-rouge">branch_id == 1</code> implies that the frame is <em>currently running</em>.</p>
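<p>The two <code class="language-plaintext highlighter-rouge">.quad</code> values after <code class="language-plaintext highlighter-rouge">__unnamed_2</code> read naturally as a pointer followed by a length. A small C sketch of that reading follows; the <code class="language-plaintext highlighter-rouge">byte_slice</code> struct and <code class="language-plaintext highlighter-rouge">slice_from_cstr()</code> are names I made up, and the layout is my interpretation of the listing, not something taken from Zig’s documentation.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C view of the two .quad values: a pointer, then a length. */
struct byte_slice {
    const char *ptr;
    size_t      len;
};

/* The two messages that appear in the generated assembly. */
const char not_suspended_msg[] = "resumed a non-suspended function";
const char already_returned_msg[] = "resumed an async function which already returned";

/* Build the pair from a NUL-terminated string; the length excludes the NUL. */
struct byte_slice slice_from_cstr(const char *s) {
    assert(s != NULL);
    struct byte_slice out = { s, strlen(s) };
    return out;
}
```

<p>The first message is 32 characters long, which matches the <code class="language-plaintext highlighter-rouge">.quad 32</code> stored right after the pointer, so the second quad being a length looks plausible.</p>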
<p>This brings up another thing that has been bothering me. Why are both <code class="language-plaintext highlighter-rouge">&frame_ptr->branch_id</code> and <code class="language-plaintext highlighter-rouge">frame_ptr->branch_id</code> stored in the local stack frame of <code class="language-plaintext highlighter-rouge">afoo()</code>? My guess is that the storing and loading of <code class="language-plaintext highlighter-rouge">frame_ptr->branch_id</code> in multithreaded builds happens <em>atomically</em>, to prevent race conditions. Let’s add that assumption to our ever-growing list of stuff to check.</p>
<p>Well, let’s continue. If no one else is running the frame, we arrive at another check:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.LBB2_9:</span>
<span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">24</span><span class="p">]</span>
<span class="nf">sub</span> <span class="nb">rax</span><span class="p">,</span> <span class="mi">2</span>
<span class="nf">je</span> <span class="nv">.LBB2_4</span>
<span class="nf">jmp</span> <span class="nv">.LBB2_2</span>
</code></pre></div></div>
<p>This checks if the current branch identifier is 2, that is, we are just after the first (and only) suspend point. If we’re not after that one, we arrive at the following <code class="language-plaintext highlighter-rouge">panic()</code> call:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.LBB2_2:</span>
<span class="nf">xor</span> <span class="nb">eax</span><span class="p">,</span> <span class="nb">eax</span>
<span class="nf">mov</span> <span class="nb">esi</span><span class="p">,</span> <span class="nb">eax</span>
<span class="nf">movabs</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nv">offset</span> <span class="nv">__unnamed_1</span>
<span class="nf">call</span> <span class="nv">panic</span>
<span class="nl">__unnamed_4:</span>
<span class="nf">.asciz</span> <span class="err">"</span><span class="nv">resumed</span> <span class="nv">an</span> <span class="nv">async</span> <span class="nv">function</span> <span class="nv">which</span> <span class="nb">al</span><span class="nv">ready</span> <span class="nv">returned</span><span class="err">"</span>
<span class="nl">__unnamed_1:</span>
<span class="nf">.quad</span> <span class="nv">__unnamed_4</span>
<span class="nf">.quad</span> <span class="mi">48</span>
</code></pre></div></div>
<p>Makes sense: If the branch identifier is not 0 (the first branch), not 1 (the coroutine is currently running) and not 2 (the last branch), it means that this frame has already returned. Cool.</p>
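<p>Putting the checks so far together, the dispatch at the top of <code class="language-plaintext highlighter-rouge">afoo()</code> can be modeled roughly like this. It’s a sketch under my assumptions: it returns a status code where the real code calls <code class="language-plaintext highlighter-rouge">panic()</code>, and <code class="language-plaintext highlighter-rouge">MODEL_RETURNED</code> is a placeholder value I picked, not the one the compiler stores. All the names are hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum resume_status {
    RESUME_OK,
    RESUME_NOT_SUSPENDED,   /* "resumed a non-suspended function" */
    RESUME_ALREADY_RETURNED /* "resumed an async function which already returned" */
};

/* Placeholder: any value other than 0, 1 and 2 would do for this model. */
#define MODEL_RETURNED 3u

struct afoo_frame_model {
    uint64_t branch_id;
};

enum resume_status afoo_model(struct afoo_frame_model *frame) {
    assert(frame != NULL);
    uint64_t stored = frame->branch_id;
    if (stored == 0) {                 /* .LBB2_1: first call */
        frame->branch_id = 1;          /* the two stores seen in the listing */
        frame->branch_id = 2;          /* next call continues after the suspend */
        return RESUME_OK;
    }
    if (stored == 1) {                 /* .LBB2_3: panics in the real code */
        return RESUME_NOT_SUSPENDED;
    }
    if (stored == 2) {                 /* .LBB2_4: code after the suspend point */
        frame->branch_id = MODEL_RETURNED;
        return RESUME_OK;
    }
    return RESUME_ALREADY_RETURNED;    /* .LBB2_2: panics in the real code */
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">afoo_model()</code> three times on a zeroed frame gives OK, OK, and then the “already returned” status, which is the same sequence of outcomes the assembly walks through.</p>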
<p>If the branch identifier is 2, we arrive at:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.LBB2_4:</span>
<span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">16</span><span class="p">]</span>
<span class="nf">mov</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rax</span><span class="p">],</span> <span class="mi">1</span>
<span class="nf">mov</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rax</span><span class="p">],</span> <span class="o">-</span><span class="mi">1</span>
<span class="nf">mov</span> <span class="nb">rcx</span><span class="p">,</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">8</span><span class="p">]</span>
<span class="nf">mov</span> <span class="nb">rdx</span><span class="p">,</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rcx</span><span class="p">]</span>
<span class="nf">mov</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nb">rdx</span>
<span class="nf">not</span> <span class="nb">rsi</span>
<span class="nf">mov</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rcx</span><span class="p">],</span> <span class="nb">rsi</span>
<span class="nf">mov</span> <span class="nb">rsi</span><span class="p">,</span> <span class="nb">rdx</span>
<span class="nf">sub</span> <span class="nb">rsi</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span>
<span class="nf">mov</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">32</span><span class="p">],</span> <span class="nb">rdx</span>
<span class="nf">je</span> <span class="nv">.LBB2_5</span>
<span class="nf">jmp</span> <span class="nv">.LBB2_10</span>
</code></pre></div></div>
<p>Let’s break this down:</p>
<ul>
<li>We put 1 in <code class="language-plaintext highlighter-rouge">frame->branch_id</code>. This means we’re letting other threads know that <em>this frame is now running</em>.</li>
<li>We put -1 in <code class="language-plaintext highlighter-rouge">frame->branch_id</code>. That value is probably used to indicate that <em>this frame is done</em>. The next time someone tries to <code class="language-plaintext highlighter-rouge">resume</code> this frame, the program will panic, since its branch identifier is not 0, 1 or 2.</li>
<li>We load the value that <code class="language-plaintext highlighter-rouge">unknown3_ptr</code> points to into a local variable: <code class="language-plaintext highlighter-rouge">original_u3 = *unknown3_ptr</code>.</li>
<li>We take the value behind <code class="language-plaintext highlighter-rouge">unknown3_ptr</code>, and <em>bitwise not</em> it: <code class="language-plaintext highlighter-rouge">*unknown3_ptr = ~(*unknown3_ptr)</code>.</li>
<li>We check if <code class="language-plaintext highlighter-rouge">original_u3</code> was equal to -1, and branch accordingly.</li>
</ul>
<p>Now, this is <em>strange</em>. Really strange. The first two steps make sense, we already understand them pretty well: we set the flag saying that this frame is running, and then mark the frame as complete. It’s not that clear what’s going on with the other three, so let’s add them to our mystery list.</p>
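<p>To keep the five steps straight, here is how I would write them in C. It’s only a sketch: the struct is the layout guessed earlier in this post, and <code class="language-plaintext highlighter-rouge">finish_frame</code> is a made-up name for the stores in <code class="language-plaintext highlighter-rouge">.LBB2_4</code>:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Frame layout guessed in this post. */
struct afoo_frame {
    void    *func;       /* the async function the frame holds */
    uint64_t branch_id;  /* next branch to run on resume */
    uint64_t unknown3;   /* purpose still unknown */
};

/* Hypothetical model of the stores in .LBB2_4; returns original_u3. */
uint64_t finish_frame(struct afoo_frame *frame)
{
    assert(frame != NULL);

    frame->branch_id = 1;             /* "this frame is running" */
    frame->branch_id = (uint64_t)-1;  /* "this frame is done" */

    uint64_t original_u3 = frame->unknown3;  /* mov rdx, [rcx] */
    frame->unknown3 = ~original_u3;          /* not rsi; mov [rcx], rsi */
    return original_u3;
}
```

<p>One thing this makes visible: <code class="language-plaintext highlighter-rouge">unknown3</code> starts out as 0, so the first pass reads 0 and leaves all ones (that is, -1) behind. That is just arithmetic, not an explanation of what <code class="language-plaintext highlighter-rouge">unknown3</code> is for.</p>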
<p>The branching is even more peculiar. If <code class="language-plaintext highlighter-rouge">original_u3 == -1</code>, we <code class="language-plaintext highlighter-rouge">panic()</code> with the following message: <code class="language-plaintext highlighter-rouge">"async function returned twice"</code>, as we can see in:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.LBB2_5:</span>
<span class="nf">xor</span> <span class="nb">eax</span><span class="p">,</span> <span class="nb">eax</span>
<span class="nf">mov</span> <span class="nb">esi</span><span class="p">,</span> <span class="nb">eax</span>
<span class="nf">movabs</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nv">offset</span> <span class="nv">__unnamed_3</span>
<span class="nf">call</span> <span class="nv">panic</span>
<span class="nl">__unnamed_6:</span>
<span class="nf">.asciz</span> <span class="err">"</span><span class="nv">async</span> <span class="nv">function</span> <span class="nv">returned</span> <span class="nv">twice</span><span class="err">"</span>
<span class="nl">__unnamed_3:</span>
<span class="nf">.quad</span> <span class="nv">__unnamed_6</span>
<span class="nf">.quad</span> <span class="mi">29</span>
</code></pre></div></div>
<p>Otherwise, we check if <code class="language-plaintext highlighter-rouge">original_u3</code> is actually 0:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.LBB2_10:</span>
<span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">32</span><span class="p">]</span>
<span class="nf">test</span> <span class="nb">rax</span><span class="p">,</span> <span class="nb">rax</span>
<span class="nf">je</span> <span class="nv">.LBB2_6</span>
<span class="nf">jmp</span> <span class="nv">.LBB2_7</span>
</code></pre></div></div>
<p>If it is, we just return, nothing special:</p>
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.LBB2_6:</span>
<span class="nf">add</span> <span class="nb">rsp</span><span class="p">,</span> <span class="mi">32</span>
<span class="nf">pop</span> <span class="nb">rbp</span>
<span class="nf">ret</span>
</code></pre></div></div>
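<p>Now here’s the weird part. If <code class="language-plaintext highlighter-rouge">original_u3</code> <em>isn’t</em> 0, we tear down the stack frame and perform an <em>absolute jump</em> to the address stored in the memory that <code class="language-plaintext highlighter-rouge">original_u3</code> points to, with <code class="language-plaintext highlighter-rouge">original_u3</code> itself in <code class="language-plaintext highlighter-rouge">rdi</code> and -2 in <code class="language-plaintext highlighter-rouge">rsi</code>:</p>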
<div class="language-nasm highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nl">.LBB2_7:</span>
<span class="nf">mov</span> <span class="nb">rax</span><span class="p">,</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">32</span><span class="p">]</span>
<span class="nf">mov</span> <span class="nb">rcx</span><span class="p">,</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rbp</span> <span class="o">-</span> <span class="mi">32</span><span class="p">]</span>
<span class="nf">mov</span> <span class="nb">rdx</span><span class="p">,</span> <span class="kt">qword</span> <span class="nv">ptr</span> <span class="p">[</span><span class="nb">rcx</span><span class="p">]</span>
<span class="nf">mov</span> <span class="nb">rsi</span><span class="p">,</span> <span class="o">-</span><span class="mi">2</span>
<span class="nf">mov</span> <span class="nb">rdi</span><span class="p">,</span> <span class="nb">rax</span>
<span class="nf">add</span> <span class="nb">rsp</span><span class="p">,</span> <span class="mi">32</span>
<span class="nf">pop</span> <span class="nb">rbp</span>
<span class="nf">jmp</span> <span class="nb">rdx</span>
</code></pre></div></div>
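<p>Putting the three exits of this branch side by side in C helps me see the shape of it. Everything named here (<code class="language-plaintext highlighter-rouge">frame_fn</code>, <code class="language-plaintext highlighter-rouge">leave_afoo</code>, <code class="language-plaintext highlighter-rouge">fake_frame</code>, <code class="language-plaintext highlighter-rouge">record_call</code>) is hypothetical, and the tail jump is modelled as a plain call:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed signature of whatever the first qword behind original_u3 is:
 * a function taking a pointer in rdi and a number in rsi. */
typedef void (*frame_fn)(void *frame, int64_t arg);

enum return_path { PANIC_RETURNED_TWICE, PLAIN_RET, JUMPED };

/* Hypothetical model of .LBB2_5, .LBB2_6 and .LBB2_7. */
enum return_path leave_afoo(uint64_t original_u3)
{
    if (original_u3 == UINT64_MAX)  /* sub rsi, -1; je .LBB2_5 */
        return PANIC_RETURNED_TWICE;
    if (original_u3 == 0)           /* test rax, rax; je .LBB2_6 */
        return PLAIN_RET;

    /* .LBB2_7: rdi = original_u3, rsi = -2, jump to [original_u3] */
    void *target = (void *)(uintptr_t)original_u3;
    frame_fn fn = *(frame_fn *)target;
    assert(fn != NULL);
    fn(target, -2);  /* a jmp in the real code, a call in this sketch */
    return JUMPED;
}

/* Stand-in target so the sketch can be exercised. */
struct fake_frame {
    frame_fn func;        /* first qword: where the jump lands */
    void    *seen_frame;  /* what arrived in "rdi" */
    int64_t  seen_arg;    /* what arrived in "rsi" */
};

void record_call(void *frame, int64_t arg)
{
    struct fake_frame *f = frame;
    f->seen_frame = frame;
    f->seen_arg = arg;
}
```

<p>Pointing <code class="language-plaintext highlighter-rouge">original_u3</code> at a <code class="language-plaintext highlighter-rouge">fake_frame</code> whose first member is <code class="language-plaintext highlighter-rouge">record_call</code> shows the target receiving the pointer itself plus -2, the same pair of registers that <code class="language-plaintext highlighter-rouge">caller()</code> used when it passed <code class="language-plaintext highlighter-rouge">&frame</code> and -3.</p>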
<p>So that’s it! We went over the <em>entire</em> machine code generated by the simplest asynchronous program. Let’s sum up what our assumptions are so far.</p>
<ul>
<li>An asynchronous function takes at least one parameter, which is a pointer to its <em>frame</em>.</li>
<li>An asynchronous function is composed of <em>branches</em>, one for each code segment between two suspension points.</li>
<li>Each branch has a specific <em>branch identifier</em>: a number that corresponds to that specific branch.</li>
<li>There are a couple of special identifiers:
<ul>
<li>0 is the first branch identifier, which represents the first call to the function.</li>
<li>1 is a flag, which signals that this frame is <em>currently running</em>, maybe on <em>another thread</em>.</li>
</ul>
</li>
<li>The frame of an empty function holds the following data:
<ul>
<li>A pointer to the asynchronous function the frame holds, which we will call <code class="language-plaintext highlighter-rouge">func</code>.</li>
<li>The frame’s current branch identifier. This is the next branch that will execute when <code class="language-plaintext highlighter-rouge">resume</code> is used on the frame. We will call this member <code class="language-plaintext highlighter-rouge">branch_id</code>.</li>
<li>An unknown qword, which we will call <code class="language-plaintext highlighter-rouge">unknown3</code>.</li>
</ul>
</li>
<li>In “debug” builds, there are several <em>safety guarantees</em>: A frame can’t run in multiple threads at the same time, and you can’t <code class="language-plaintext highlighter-rouge">resume</code> a frame that has already <code class="language-plaintext highlighter-rouge">return</code>ed, or <code class="language-plaintext highlighter-rouge">return</code> from the same frame twice.</li>
</ul>
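<p>Written down as C, the frame part of that summary looks roughly like this. The enum names and <code class="language-plaintext highlighter-rouge">make_frame</code> are made up, and the layout checks assume an x86_64 target with 8-byte pointers:</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Special branch identifiers seen so far (names invented here). */
enum {
    BRANCH_FIRST   = 0,  /* first call to the function */
    BRANCH_RUNNING = 1   /* frame is currently running */
};

/* The frame of our empty async function, as guessed in this post. */
struct afoo_frame {
    void    *func;       /* the async function the frame holds */
    uint64_t branch_id;  /* next branch to run on resume */
    uint64_t unknown3;   /* purpose still unknown */
};

/* Offsets the generated code uses: [rdi], [rdi + 8], [rdi + 16]. */
_Static_assert(offsetof(struct afoo_frame, func) == 0, "func at 0");
_Static_assert(offsetof(struct afoo_frame, branch_id) == 8, "branch_id at 8");
_Static_assert(offsetof(struct afoo_frame, unknown3) == 16, "unknown3 at 16");
_Static_assert(sizeof(struct afoo_frame) == 24, "frame is 24 bytes");

/* Hypothetical helper mirroring what caller() stores before the call. */
struct afoo_frame make_frame(void *func)
{
    struct afoo_frame frame = { func, BRANCH_FIRST, 0 };
    return frame;
}
```

<p>The 24-byte size fits the earlier guess that <code class="language-plaintext highlighter-rouge">sub rsp, 32</code> is just the frame rounded up for stack alignment.</p>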
<p>We also have a few questions that arose during our research:</p>
<ul>
<li>What is the purpose of <code class="language-plaintext highlighter-rouge">unknown3</code>?</li>
<li>Is access to <code class="language-plaintext highlighter-rouge">branch_id</code> in multithreaded builds atomic?</li>
<li>What happens when multiple threads try to <code class="language-plaintext highlighter-rouge">resume</code> the same frame?</li>
<li>What is the absolute jump to <code class="language-plaintext highlighter-rouge">*unknown3</code> used for?</li>
</ul>
<p>Those are some pretty tough questions, and we’ll try to tackle them one by one. Next, we’re going to look at what happens when we <code class="language-plaintext highlighter-rouge">resume</code> a frame, and then we’ll start adding <em>arguments</em> and <em>local variables</em> to our coroutine.</p>
<p>That was exhausting, but if you reached this far, good job! I hope you’ve learnt something new (I sure did).</p>
Async Explorer2020-06-20T18:50:00+00:002020-06-20T18:50:00+00:00https://iamgweej.github.io//jekyll/update/2020/06/20/async-explorer
<p>Have you ever looked up into the stars and wondered, “How the fuck is that feature implemented”? In this series, I’ll (hopefully) dive into the implementation of coroutines for several (compiled) programming languages.</p>
<p>A short disclaimer: I’m not too sharp on the details of some (or actually any) of these implementations. Most of this will just be me rambling and looking at compiler source code/the output of <a href="https://godbolt.org/">the Godbolt Compiler Explorer</a>. I’ll try to validate every claim I post here, but some mistakes are sure to sneak their way into one of these. Feel free to point them out and I’ll fix them as soon as I can.</p>
<h2 id="purpose">Purpose</h2>
<p>The purpose of this post is to set a few definitions for my upcoming posts. I’ll explain what I mean when I use words like “coroutine” or “asynchronous”, which will hopefully allow me to use them freely in the following posts.</p>
<p>Afterwards, I’ll give a few short examples of what coroutines “look like” in several programming languages, like C++, Rust, Zig and the LLVM IR.</p>
<p>I hope I’ll also be able to give you a rough idea of what coroutines are useful for, but it’s not the main goal of this post.</p>
<p>This is not an “asyncio” tutorial or anything like that, just me messing with some programming languages.</p>
<h2 id="what-is-async">What is async?</h2>
<p>If you’re like me, you probably heard terms like “async” and “coroutines” mentioned a lot, and maybe even read (or wrote) some asynchronous code, but didn’t really understand what’s going on behind the scenes. At least I didn’t (and still don’t). This series of posts will try to fix that!</p>
<p>Let’s consult <a href="https://en.wikipedia.org/wiki/Coroutine">Wikipedia</a>!</p>
<blockquote>
<p>Coroutines are computer program components that generalize subroutines for <a href="https://en.wikipedia.org/wiki/Cooperative_multitasking">non-preemptive multitasking</a>, by allowing execution to be suspended and resumed.</p>
</blockquote>
<p>Let’s break it down a little.</p>
<p>For me, a subroutine is a piece of code that accepts parameters, and “executes”. That is, it processes those parameters, produces side effects, and returns a value. Now, according to Wikipedia, a coroutine is a subroutine that can be “suspended and resumed”. The way I understand that, I’m getting a picture of a subroutine “taking a nap”, which we can later wake up and continue. Notice that I’m explicitly choosing to not mention <a href="https://en.wikipedia.org/wiki/Multithreading_(computer_architecture)">multithreading</a> or <a href="https://en.wikipedia.org/wiki/Scheduling_(computing)">scheduling</a>. These concepts, which can be similar (and maybe I’ll explore that in a future post), are not in my scope currently.</p>
<p>Let’s look at an example written in <a href="https://ziglang.org/">Zig</a>. I think the way Zig supports async is great for a first example, more so than Python or C++, because it’s very “low-level” - It shows pretty clearly what we’re dealing with when we’re using coroutines:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// This prints stuff</span>
<span class="k">const</span> <span class="n">warn</span> <span class="o">=</span> <span class="nb">@import</span><span class="p">(</span><span class="s">"std"</span><span class="p">).</span><span class="py">debug</span><span class="p">.</span><span class="py">warn</span><span class="p">;</span>
<span class="k">fn</span> <span class="n">print_in_parts</span><span class="p">(</span><span class="n">x</span><span class="p">:</span> <span class="kt">i32</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="kt">i32</span><span class="p">)</span> <span class="kt">i32</span> <span class="p">{</span>
<span class="n">warn</span><span class="p">(</span><span class="s">"1) [print_in_parts] Hello there! Im taking a nap.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="o">.</span><span class="p">{});</span>
<span class="k">suspend</span><span class="p">;</span>
<span class="n">warn</span><span class="p">(</span><span class="s">"3) [print_in_parts] Why did you wake me up? Im going back to sleep.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="o">.</span><span class="p">{});</span>
<span class="k">suspend</span><span class="p">;</span>
<span class="n">warn</span><span class="p">(</span><span class="s">"5) [print_in_parts] Fine, here you go.</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="o">.</span><span class="p">{});</span>
<span class="k">return</span> <span class="n">x</span><span class="o">+</span><span class="n">y</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="n">async_main</span><span class="p">()</span> <span class="k">void</span> <span class="p">{</span>
<span class="k">var</span> <span class="n">print_in_parts_frame</span> <span class="o">=</span> <span class="k">async</span> <span class="n">print_in_parts</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span>
<span class="n">warn</span><span class="p">(</span><span class="s">"2) [async_main] Wake up!</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="o">.</span><span class="p">{});</span>
<span class="k">resume</span> <span class="n">print_in_parts_frame</span><span class="p">;</span>
<span class="n">warn</span><span class="p">(</span><span class="s">"4) [async_main] Grab a brush and put a little (makeup)!</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="o">.</span><span class="p">{});</span>
<span class="k">resume</span> <span class="n">print_in_parts_frame</span><span class="p">;</span>
<span class="k">var</span> <span class="n">result</span> <span class="o">=</span> <span class="k">await</span> <span class="n">print_in_parts_frame</span><span class="p">;</span>
<span class="n">warn</span><span class="p">(</span><span class="s">"6) [async_main] print_in_parts(1,2) == {}</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="o">.</span><span class="p">{</span><span class="n">result</span><span class="p">});</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Let’s review the flow of this program. We call <code class="language-plaintext highlighter-rouge">print_in_parts()</code> using the <code class="language-plaintext highlighter-rouge">async</code> keyword, and at that point, <code class="language-plaintext highlighter-rouge">print_in_parts()</code> begins executing. Afterwards, using the keyword <code class="language-plaintext highlighter-rouge">suspend</code>, <code class="language-plaintext highlighter-rouge">print_in_parts()</code> forfeits its context, which gives <code class="language-plaintext highlighter-rouge">async_main()</code> an opaque <code class="language-plaintext highlighter-rouge">frame</code>, which it can use to “wake up” <code class="language-plaintext highlighter-rouge">print_in_parts()</code>. As we can see, <code class="language-plaintext highlighter-rouge">print_in_parts()</code> forfeits its context again, and finally, it returns a value using the <code class="language-plaintext highlighter-rouge">return</code> keyword. At that point, <code class="language-plaintext highlighter-rouge">async_main()</code> retrieves that value using the <code class="language-plaintext highlighter-rouge">await</code> keyword.</p>
<p>So the expected output is something like this:</p>
<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code>D:\Projects\AsyncExplorer\examples> .\simple_async.exe
1) [print_in_parts] Hello there! Im taking a nap.
2) [async_main] Wake up!
3) [print_in_parts] Why did you wake me up? Im going back to sleep.
4) [async_main] Grab a brush and put a little (makeup)!
5) [print_in_parts] Fine, here you go.
6) [async_main] print_in_parts(1,2) == 3
</code></pre></div></div>
<p>Also, it might be worth noting that this program is completely <em>single threaded</em>. There is no hidden thread creation or synchronization. Pretty cool in my opinion.</p>
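<p>To make that flow a bit more tangible, here’s a rough sketch of it in Rust, with the coroutine written out by hand as a state machine. This is only my mental model, not what the Zig compiler actually generates: the <code class="language-plaintext highlighter-rouge">PrintInPartsFrame</code> struct, the <code class="language-plaintext highlighter-rouge">State</code> enum and the <code class="language-plaintext highlighter-rouge">resume()</code> method are names I made up, and I’m assuming <code class="language-plaintext highlighter-rouge">print_in_parts()</code> simply adds its two arguments.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Hypothetical stand-in for the frame of `print_in_parts(a, b)`.
// Everything the function must remember across a suspend lives here.
#[derive(Clone, Copy)]
enum State {
    NotStarted,
    FirstNap,
    SecondNap,
    Done,
}

struct PrintInPartsFrame {
    state: State,
    a: i32,
    b: i32,
    result: Option&lt;i32&gt;,
}

impl PrintInPartsFrame {
    fn new(a: i32, b: i32) -&gt; Self {
        PrintInPartsFrame { state: State::NotStarted, a, b, result: None }
    }

    // Runs the function up to its next suspension point (or to its return).
    fn resume(&amp;mut self, log: &amp;mut Vec&lt;String&gt;) {
        match self.state {
            State::NotStarted =&gt; {
                log.push("1) [print_in_parts] Hello there! Im taking a nap.".to_string());
                self.state = State::FirstNap;
            }
            State::FirstNap =&gt; {
                log.push("3) [print_in_parts] Why did you wake me up? Im going back to sleep.".to_string());
                self.state = State::SecondNap;
            }
            State::SecondNap =&gt; {
                log.push("5) [print_in_parts] Fine, here you go.".to_string());
                // Assumption: the function just adds its two arguments.
                self.result = Some(self.a + self.b);
                self.state = State::Done;
            }
            State::Done =&gt; panic!("resumed a frame that already returned"),
        }
    }
}

// Plays the role of `amain()`: start the callee, wake it up twice, take the result.
fn async_main() -&gt; Vec&lt;String&gt; {
    let mut log = Vec::new();
    let mut frame = PrintInPartsFrame::new(1, 2);
    frame.resume(&amp;mut log); // like `async print_in_parts(1, 2)`
    log.push("2) [async_main] Wake up!".to_string());
    frame.resume(&amp;mut log); // like the first `resume`
    log.push("4) [async_main] Grab a brush and put a little (makeup)!".to_string());
    frame.resume(&amp;mut log); // like the second `resume`
    let result = frame.result.expect("frame has not returned yet"); // like `await`
    log.push(format!("6) [async_main] print_in_parts(1,2) == {}", result));
    log
}

fn main() {
    for line in async_main() {
        println!("{}", line);
    }
}
</code></pre></div></div>
<p>Each call to <code class="language-plaintext highlighter-rouge">resume()</code> runs one chunk of the function and then hands control back to the caller, and the “frame” is nothing more than the data that has to survive between two chunks.</p>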
<p>Phew. That was exhausting. But I think we have a pretty solid grip on what the basics of “async” mean.</p>
<h2 id="use-cases">Use cases</h2>
<p>Now I hope we understand the <em>definition</em> of a coroutine, but what are the uses of these lazy little gremlins? Are they in some way <em>stronger</em> than our classical subroutines?</p>
<p>The short answer is <em>No</em>. Coroutines (and more generally, every “code construct” of that sort) don’t allow us to calculate <em>more things</em>. But we are developers, not computer scientists. What we care about is having extensible, modular, well-designed solutions to our problems. That means we like <em>modelling</em> our problems in ways that let us interact with them conveniently through code, for example through Object-Oriented design, Design Patterns, and in our case, coroutines.</p>
<p>I’m not going to go into the use cases of coroutines too much, since it’s not the point of this series. I will, though, give a quick rundown of what our “break taking” functions can be useful for:</p>
<ul>
<li>Modelling <a href="https://en.wikipedia.org/wiki/Generator_(computer_programming)">Generators</a>.</li>
<li>Implementing <a href="https://en.wikipedia.org/wiki/Cooperative_multitasking">Cooperative Multitasking</a>.</li>
<li>Using <a href="https://en.wikipedia.org/wiki/Asynchronous_I/O">Asynchronous IO</a>.</li>
<li>A lot of other cool stuff.</li>
</ul>
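<p>To make the cooperative multitasking bullet a bit more concrete, here’s a minimal sketch of a round-robin scheduler in Rust. <code class="language-plaintext highlighter-rouge">Counter</code>, <code class="language-plaintext highlighter-rouge">Step</code> and <code class="language-plaintext highlighter-rouge">run_all()</code> are all names I made up for the illustration, and the tasks are hand-written state machines rather than real coroutines.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::collections::VecDeque;

// What a task reports back each time the scheduler lets it run.
enum Step {
    Yielded, // taking a break, come back to me later
    Done,    // finished, drop me
}

// A hypothetical task: counts up to `limit`, taking a break after every number.
struct Counter {
    name: &amp;'static str,
    next: u32,
    limit: u32,
}

impl Counter {
    // Does one slice of work, then voluntarily hands control back.
    fn step(&amp;mut self, log: &amp;mut Vec&lt;String&gt;) -&gt; Step {
        log.push(format!("{}:{}", self.name, self.next));
        self.next += 1;
        if self.next &gt; self.limit { Step::Done } else { Step::Yielded }
    }
}

// Round-robin scheduler: run the task at the front of the queue for one step,
// and push it to the back if it still has work left.
fn run_all(tasks: Vec&lt;Counter&gt;) -&gt; Vec&lt;String&gt; {
    let mut log = Vec::new();
    let mut queue: VecDeque&lt;Counter&gt; = tasks.into_iter().collect();
    while let Some(mut task) = queue.pop_front() {
        match task.step(&amp;mut log) {
            Step::Yielded =&gt; queue.push_back(task),
            Step::Done =&gt; {}
        }
    }
    log
}

fn main() {
    let tasks = vec![
        Counter { name: "a", next: 1, limit: 3 },
        Counter { name: "b", next: 1, limit: 2 },
    ];
    println!("{:?}", run_all(tasks));
}
</code></pre></div></div>
<p>With these two tasks the log comes out as <code class="language-plaintext highlighter-rouge">a:1, b:1, a:2, b:2, a:3</code>: the tasks interleave on a single thread, and nobody gets preempted. Every switch happens because a task chose to give up control.</p>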
<h2 id="examples">Examples</h2>
<p>So now I have a solid grasp of what “async” and “coroutines” mean. But I’m still missing the “feel” for what they look like in some other programming languages. So let’s take a look!</p>
<h3 id="zig">Zig</h3>
<p>We already saw an example of the basic usage of async in Zig, but let’s take another one from the <a href="https://ziglang.org/documentation/0.6.0/">official documentation</a>:</p>
<div class="language-zig highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="n">std</span> <span class="o">=</span> <span class="nb">@import</span><span class="p">(</span><span class="s">"std"</span><span class="p">);</span>
<span class="k">const</span> <span class="n">assert</span> <span class="o">=</span> <span class="n">std</span><span class="p">.</span><span class="py">debug</span><span class="p">.</span><span class="py">assert</span><span class="p">;</span>
<span class="k">var</span> <span class="n">the_frame</span><span class="p">:</span> <span class="k">anyframe</span> <span class="o">=</span> <span class="k">undefined</span><span class="p">;</span>
<span class="k">var</span> <span class="n">final_result</span><span class="p">:</span> <span class="kt">i32</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">test</span> <span class="s">"async function await"</span> <span class="p">{</span>
<span class="n">seq</span><span class="p">(</span><span class="sc">'a'</span><span class="p">);</span>
<span class="mi">_</span> <span class="o">=</span> <span class="k">async</span> <span class="n">amain</span><span class="p">();</span>
<span class="n">seq</span><span class="p">(</span><span class="sc">'f'</span><span class="p">);</span>
<span class="k">resume</span> <span class="n">the_frame</span><span class="p">;</span>
<span class="n">seq</span><span class="p">(</span><span class="sc">'i'</span><span class="p">);</span>
<span class="n">assert</span><span class="p">(</span><span class="n">final_result</span> <span class="o">==</span> <span class="mi">1234</span><span class="p">);</span>
<span class="n">assert</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="py">mem</span><span class="p">.</span><span class="nf">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span> <span class="o">&</span><span class="n">seq_points</span><span class="p">,</span> <span class="s">"abcdefghi"</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="n">amain</span><span class="p">()</span> <span class="k">void</span> <span class="p">{</span>
<span class="n">seq</span><span class="p">(</span><span class="sc">'b'</span><span class="p">);</span>
<span class="k">var</span> <span class="n">f</span> <span class="o">=</span> <span class="k">async</span> <span class="n">another</span><span class="p">();</span>
<span class="n">seq</span><span class="p">(</span><span class="sc">'e'</span><span class="p">);</span>
<span class="n">final_result</span> <span class="o">=</span> <span class="k">await</span> <span class="n">f</span><span class="p">;</span>
<span class="n">seq</span><span class="p">(</span><span class="sc">'h'</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">fn</span> <span class="n">another</span><span class="p">()</span> <span class="kt">i32</span> <span class="p">{</span>
<span class="n">seq</span><span class="p">(</span><span class="sc">'c'</span><span class="p">);</span>
<span class="k">suspend</span> <span class="p">{</span>
<span class="n">seq</span><span class="p">(</span><span class="sc">'d'</span><span class="p">);</span>
<span class="n">the_frame</span> <span class="o">=</span> <span class="nb">@frame</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">seq</span><span class="p">(</span><span class="sc">'g'</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">1234</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">var</span> <span class="n">seq_points</span> <span class="o">=</span> <span class="p">[</span><span class="mi">_</span><span class="p">]</span><span class="kt">u8</span><span class="p">{</span><span class="mi">0</span><span class="p">}</span> <span class="o">**</span> <span class="s">"abcdefghi"</span><span class="p">.</span><span class="py">len</span><span class="p">;</span>
<span class="k">var</span> <span class="n">seq_index</span><span class="p">:</span> <span class="kt">usize</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">fn</span> <span class="n">seq</span><span class="p">(</span><span class="n">c</span><span class="p">:</span> <span class="kt">u8</span><span class="p">)</span> <span class="k">void</span> <span class="p">{</span>
<span class="n">seq_points</span><span class="p">[</span><span class="n">seq_index</span><span class="p">]</span> <span class="o">=</span> <span class="n">c</span><span class="p">;</span>
<span class="n">seq_index</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This example is a bit more complicated, but try to follow the flow! We can see in the <code class="language-plaintext highlighter-rouge">test</code> section what the output sequence looks like, so it should be pretty simple to follow the state of the program.</p>
<h3 id="c">C++</h3>
<p>C++20 adds support for <em>coroutines</em>. Let’s take a look at the examples given by <a href="https://en.cppreference.com/w/cpp/language/coroutines">cppreference</a>:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// uses the co_await operator to suspend execution until resumed </span>
<span class="n">task</span><span class="o"><></span> <span class="n">tcp_echo_server</span><span class="p">()</span> <span class="p">{</span>
<span class="kt">char</span> <span class="n">data</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(;;)</span> <span class="p">{</span>
<span class="kt">size_t</span> <span class="n">n</span> <span class="o">=</span> <span class="n">co_await</span> <span class="n">socket</span><span class="p">.</span><span class="n">async_read_some</span><span class="p">(</span><span class="n">buffer</span><span class="p">(</span><span class="n">data</span><span class="p">));</span>
<span class="n">co_await</span> <span class="n">async_write</span><span class="p">(</span><span class="n">socket</span><span class="p">,</span> <span class="n">buffer</span><span class="p">(</span><span class="n">data</span><span class="p">,</span> <span class="n">n</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// uses the keyword co_yield to suspend execution returning a value</span>
<span class="n">generator</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">iota</span><span class="p">(</span><span class="kt">int</span> <span class="n">n</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="k">while</span><span class="p">(</span><span class="nb">true</span><span class="p">)</span>
<span class="n">co_yield</span> <span class="n">n</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">// uses the keyword co_return to complete execution returning a value</span>
<span class="n">lazy</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">f</span><span class="p">()</span> <span class="p">{</span>
<span class="n">co_return</span> <span class="mi">7</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Well, this seems pretty similar to how we would use coroutines in Zig, with one added functionality: we can <code class="language-plaintext highlighter-rouge">co_yield</code> intermediate values during our execution. That wasn’t in our description of coroutines earlier, but it’s a feature a lot of languages and frameworks choose to support, so it’s pretty interesting to see how they implement that.</p>
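<p>For comparison, here’s a sketch of what a generator like <code class="language-plaintext highlighter-rouge">iota</code> boils down to when written by hand in Rust. As far as I know, stable Rust has no <code class="language-plaintext highlighter-rouge">yield</code> I could use here, so the generator is spelled out as an <code class="language-plaintext highlighter-rouge">Iterator</code>. The <code class="language-plaintext highlighter-rouge">Iota</code> struct and the <code class="language-plaintext highlighter-rouge">iota()</code> function are my own stand-ins, not a translation of the C++ machinery.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// A hand-written counterpart of `iota`: the only state that must survive
// between two "yields" is the counter, so it lives in a struct.
struct Iota {
    n: i32,
}

impl Iterator for Iota {
    type Item = i32;

    // Each call runs until the next "yield": hand out the current value
    // and remember where to continue from.
    fn next(&amp;mut self) -&gt; Option&lt;i32&gt; {
        let current = self.n;
        self.n += 1;
        Some(current)
    }
}

fn iota(n: i32) -&gt; Iota {
    Iota { n }
}

fn main() {
    // The generator never finishes, so the caller decides when to stop pulling.
    let first: Vec&lt;i32&gt; = iota(5).take(3).collect();
    println!("{:?}", first); // prints [5, 6, 7]
}
</code></pre></div></div>
<p>The caller “resumes” the generator by calling <code class="language-plaintext highlighter-rouge">next()</code>, and every intermediate value comes back as a return value of that call.</p>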
<p>Another thing to note about coroutines in C++ is that C++ supports <a href="https://en.cppreference.com/w/cpp/language/throw"><em>exceptions</em></a>. That’s also worth keeping in mind while investigating coroutines. How do these two features of the language work together? What are their interactions?</p>
<p>Also, cppreference goes into a lot of detail about the internals of the C++20 coroutines implementation, which is really interesting. When I get around to understanding this particular implementation, it will be a great resource.</p>
<h3 id="rust">Rust</h3>
<p>Rust also has built-in support for async programming, using the <code class="language-plaintext highlighter-rouge">async</code> and <code class="language-plaintext highlighter-rouge">.await</code> keywords. There is actually a lot going on under the hood of the Rust async implementation, but let’s have a quick look. This example is taken from the <a href="https://rust-lang.github.io/async-book/01_getting_started/01_chapter.html">Asynchronous Programming in Rust book</a>:</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">async</span> <span class="k">fn</span> <span class="nf">get_two_sites_async</span><span class="p">()</span> <span class="p">{</span>
<span class="c">// Create two different "futures" which, when run to completion,</span>
<span class="c">// will asynchronously download the webpages.</span>
<span class="k">let</span> <span class="n">future_one</span> <span class="o">=</span> <span class="nf">download_async</span><span class="p">(</span><span class="s">"https://www.foo.com"</span><span class="p">);</span>
<span class="k">let</span> <span class="n">future_two</span> <span class="o">=</span> <span class="nf">download_async</span><span class="p">(</span><span class="s">"https://www.bar.com"</span><span class="p">);</span>
<span class="c">// Run both futures to completion at the same time.</span>
<span class="nd">join!</span><span class="p">(</span><span class="n">future_one</span><span class="p">,</span> <span class="n">future_two</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>As you can see, this looks a bit more involved than the last example I took a look at. This example downloads the contents of two web pages in a single-threaded, asynchronous manner.</p>
<p>Rust’s implementation of asynchronous programming relies on the <a href="https://doc.rust-lang.org/beta/std/future/trait.Future.html">Future Trait</a>. Its definition is a bit much for this short example, so I think I’ll postpone diving into the internals of Rust async to a later post, dedicated solely to that. Actually, I really enjoyed reading into the details of this implementation and what’s going on here “under the hood”, and I really look forward to sticking this in a compiler and seeing what comes out.</p>
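<p>Without going into the internals, the core idea of the trait fits in a few lines: a future is something you <em>poll</em> until it says it’s ready. Here’s a small sketch using only the standard library. <code class="language-plaintext highlighter-rouge">Countdown</code>, <code class="language-plaintext highlighter-rouge">NoopWaker</code> and this toy <code class="language-plaintext highlighter-rouge">block_on()</code> are hypothetical names of mine, and a real executor would go to sleep until it is woken instead of polling in a busy loop.</p>
<div class="language-rust highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

// A hypothetical future that answers "not yet" `remaining` times
// before it produces its value.
struct Countdown {
    remaining: u32,
}

impl Future for Countdown {
    type Output = &amp;'static str;

    fn poll(mut self: Pin&lt;&amp;mut Self&gt;, cx: &amp;mut Context&lt;'_&gt;) -&gt; Poll&lt;Self::Output&gt; {
        if self.remaining == 0 {
            Poll::Ready("liftoff")
        } else {
            self.remaining -= 1;
            // Ask to be polled again. A real future would hand the waker to
            // whatever event source it is waiting on instead.
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}

// A waker that does nothing: enough for a loop that polls unconditionally.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc&lt;Self&gt;) {}
}

// A toy executor: keep polling until the future is ready, counting the polls.
fn block_on&lt;F: Future&gt;(future: F) -&gt; (F::Output, u32) {
    let mut future = Box::pin(future);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&amp;waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(value) = future.as_mut().poll(&amp;mut cx) {
            return (value, polls);
        }
    }
}

fn main() {
    let (value, polls) = block_on(Countdown { remaining: 2 });
    println!("{} after {} polls", value, polls); // prints "liftoff after 3 polls"
}
</code></pre></div></div>
<p>Every <code class="language-plaintext highlighter-rouge">Poll::Pending</code> is the future “taking a break”, and the next call to <code class="language-plaintext highlighter-rouge">poll()</code> is the resume.</p>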
<h3 id="llvm-ir">LLVM IR</h3>
<p>This one is probably a bit weird, since <a href="https://llvm.org/">LLVM</a> IR is not really a “Programming Language” most people use. I won’t go into the details of LLVM and its structure (because I have no clue about that), but the important thing here is that a lot of modern languages use LLVM for their compilation and/or optimization. According to <a href="https://llvm.org/docs/LangRef.html">the official documentation</a>:</p>
<blockquote>
<p>LLVM is a Static Single Assignment (SSA) based representation that provides type safety, low-level operations, flexibility, and the capability of representing ‘all’ high-level languages cleanly. It is the common code representation used throughout all phases of the LLVM compilation strategy.</p>
</blockquote>
<p>So yeah, it’s pretty cool. What’s interesting for my purposes is that LLVM <a href="https://llvm.org/docs/Coroutines.html">supports coroutines</a> as a part of its IR. It’s supposed to look a bit like this (don’t be afraid if you don’t understand every piece of code here, neither do I):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>define i32 @main() {
entry:
%hdl = call i8* @f(i32 4)
call void @llvm.coro.resume(i8* %hdl)
call void @llvm.coro.resume(i8* %hdl)
call void @llvm.coro.destroy(i8* %hdl)
ret i32 0
}
</code></pre></div></div>
<p>This reminds me a bit of the Zig code we started with: we call an async function, receive a handle (or a <em>frame</em>), resume it to our heart’s content, and in the end, destroy it. Seems pretty harmless.</p>
<h3 id="libraries-and-apis">Libraries and APIs</h3>
<p>The examples I’ve looked into so far are <em>compiled languages natively supporting async programming</em>. That’s pretty cool, but there are also a lot of frameworks and operating system APIs that allow those fun shenanigans.</p>
<p>For example, there are <a href="https://www.boost.org/doc/libs/1_57_0/libs/coroutine/doc/html/index.html">Boost.Coroutine</a> and <a href="https://www.boost.org/doc/libs/1_61_0/libs/coroutine2/doc/html/index.html">Boost.Coroutine2</a> for C++’s <a href="https://www.boost.org/">Boost</a>, Rust’s <a href="https://docs.rs/tokio/0.2.21/tokio/">tokio</a>, D’s <a href="https://tour.dlang.org/tour/en/multithreading/fibers">Fiber</a> and a lot of other <a href="https://en.wikipedia.org/wiki/Coroutine#Implementations">cool stuff</a>.</p>
<p>Operating systems like Windows expose cooperative multitasking APIs through <a href="https://docs.microsoft.com/en-us/windows/win32/procthread/fibers">Fibers</a>, and Linux’s <a href="https://www.man7.org/linux/man-pages/man2/getcontext.2.html">ucontext</a> can be used to implement coroutines as well. Actually, some cool guy wrapped both of those up into a cross-platform <a href="https://github.com/tonbit/coroutine">C++ library</a>. Maybe I’ll cover those later, as it’s always fun to poke into those pesky little Windows DLLs.</p>
<h2 id="conclusion">Conclusion</h2>
<p>The point of this post was to “set the stage” for a couple of posts I’ll publish here in the future. I hope I managed to give you an idea of “what” coroutines are, and maybe a touch of their use cases. In the next posts we’ll be digging into some of that oh-so-sweet <a href="https://en.wikipedia.org/wiki/X86_assembly_language">x86 assembly</a>, taking a look at some of the <em>implementations</em> of coroutines provided by some compiled languages.</p>Have you ever looked up into the stars and wondered, “How the fuck is that feature implemented”? In this series, I’ll (hopefully) dive into the implementation of coroutines for several (compiled) programming languages.