Fixing the EndFrame function

Today we’ll be looking a compiler bug, and I don’t want it to sound like I’m airing grievances when we do. The compilers that coders have created for QuakeC over the years are awesome tools, they’ve added loads of new features, even added optimisation passes. One of the under-appreciated things they’ve done is improve the syntax and parsing of QuakeC – from tiny changes like making spaces between operations optional, to big changes like allowing nested function calls instead of using a temporary variable between the calls. So I love the work you guys have done, and this report is so you can keep on helping mod authors.
Advice (24/05/2014): The complier bug investigated in this article has already been fixed, read Easily fixing the EndFrame function to get a fixed version of the compiler and the bug-fixed version of the EndFrame code.

Symptoms

After I’d posted the Endframe function tutorial, I ran through all the steps and began playing around with it. I noticed a worrying glitch if I listed the entities right at the start of the map, the backstop would be at the end of the list as required, but its classname would be corrupted.

EDICT 139:
classname      ms/itembk2.wav
think          EndFrame()
nextthink      0.0500

Notice that not only has the classname been replaced, but that it’s been replaced by a fraction of a string. QuakeC doesn’t have string handling functions, you aren’t supposed to ever get a substring, so it was a big red flag that things have gone wrong somewhere. It did seem like it resolved itself after a few more entities were spawned, but the presence of a bug like this bothered me.

Investigation

The next step was to try and find out what triggered it initially. It turned out that the bug was triggered extremely early on, in InitBodyQue – the first function to spawn any entities. The line bodyque_head.owner = spawn(); was causing the bug – the backstop was uncorrupted at the end of the spawn() wrapper we wrote last time, and corrupted by the time we returned to InitBodyQue. How could returning from a function corrupt an entity?

The answer is found in the difference between what the QuakeC looks like, and what the compiler creates from it. So to understand where the bug came from we need to know about bytecode.

Bytecode

QuakeC is not loaded as text by the engine, the compile process turns it into a stream of bytecodes. Each bytecode represents a simple instruction, like loading values from memory, storing values somewhere, adding or subtracting them, or running another function. All the complex behaviour in QuakeC is built from just a few dozen basic instructions like this. Probably the most similar widely-used language is Java.

Today the crucial aspect of bytecodes is how they handle an assignment like self.health = 100. This is actually split into two stages in bytecode. The first instruction calculates the correct memory location to store health on the self entity. The second stores 100 in the location calculated by the first. So what goes wrong in InitBodyQue?

Diagnosis

The crucial difference in the line bodyque_head.owner = spawn(); is that this line is assigning the result of a function to the entity. When FTEQCC calculates the sequence of bytecodes needed for this (with-O2 optimisations), it creates

  • Find memory location
  • Call function
  • Store result in location

When spawn() was a built-in function, this arrangement was safe. Our wrapper function was the change that exposed the bug, because now a bunch of QuakeC operations run during that second bullet point. One of the operations is e.classname = "backstop";. This involves calculating another address, which overwrites the address from the first bulletpoint due to a combination of optimisations.

Result: the entity which is returned from spawn gets stored in the classname field of the backstop! For reasons I won’t go into, it’s not entirely crazy that assigning an entity to a string would result in the back half of a random string. To sum up, the bug occurs under the following conditions.

  • The result of a function is assigned to a field on an entity
  • The function assigns a value to a field on an entity
  • The code is compiled with fteqcc at -O2 or above

The bug is that the result of the function gets assigned to the field and entity last referenced in the function in question.

Mitigation

To avoid encountering this bug, one solution is to turn off the optimisations. The first problem with this is that it exposes a bug in CreateBackstop which causes an infinite loop, but we’ll fix that shortly…We can fix this issue without having to lose the optimisations, so I wouldn’t choose this workaround from the options available.

The next way to avoid this bug is to use a temporary variable in these kind of assignments: assign the result of the function to a local variable, then assign that local variable to the field of the entity. This is the workaround I will use to fix up the EndFrame code.

changes

The code that needs to change from last time is CreateBackstop:

void() CreateBackstop =
{
    local entity e, f, head;
//begin our list with the temp backstop
    e = f = head = backstop;
// spawn() already contains the code to make a backstop
// so we just burn off enough entities that one appears
// at the back
    do
    {
        f = spawn();
        e.chain = f;
        e = e.chain;
    }
    while(nextent(backstop) != world);
    while(head != f)
    {
        e = head;
        head = e.chain;
        remove(e);
    };
}

Notice how the second loop has a new terminating condition. The old one didn’t actually ever terminate, except when the bug was failing to create a linked list in the first place…

We also need to change any existing code which might trigger the bug. In stock Quake I think the only function that need a tweak is InitBodyQue, but in your own mod there may be more. Here’s how InitBodyQue looks when fixed

void() InitBodyQue =
{
    local entity    e;
    e = spawn();
    bodyque_head = e;
    bodyque_head.classname = "bodyque";
    e = spawn();
    bodyque_head.owner = e;
    bodyque_head.owner.classname = "bodyque";
    e = spawn();
    bodyque_head.owner.owner = e;
    bodyque_head.owner.owner.classname = "bodyque";
    e = spawn();
    bodyque_head.owner.owner.owner = e;
    bodyque_head.owner.owner.owner.classname = "bodyque";
    bodyque_head.owner.owner.owner.owner = bodyque_head;
};

Also consider that you may have written other functions whose return values are assigned to entity fields. Any of these functions could trigger this bug if they assign a value to an entity before returning. It is a surprisingly rare pattern though, functions can often be divided into those that mutate the state of the world but don’t return anything important, and those that return computation results with no side effects.

Fixes

As for a compiler side fix, I think the best way to fix it would be just to re-order the bytecodes for this situation to look like

  • Call function
  • Find memory location
  • Store result in location

But if you told me easier said than done then I would not be surprised.

Advertisements

2 thoughts on “Fixing the EndFrame function

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s