Today we’ll be looking a compiler bug, and I don’t want it to sound like I’m airing grievances when we do. The compilers that coders have created for QuakeC over the years are awesome tools, they’ve added loads of new features, even added optimisation passes. One of the under-appreciated things they’ve done is improve the syntax and parsing of QuakeC – from tiny changes like making spaces between operations optional, to big changes like allowing nested function calls instead of using a temporary variable between the calls. So I love the work you guys have done, and this report is so you can keep on helping mod authors.
Advice (24/05/2014): The complier bug investigated in this article has already been fixed, read Easily fixing the EndFrame function to get a fixed version of the compiler and the bug-fixed version of the EndFrame code.
Symptoms
After I’d posted the Endframe function tutorial, I ran through all the steps and began playing around with it. I noticed a worrying glitch if I listed the entities right at the start of the map, the backstop would be at the end of the list as required, but its classname would be corrupted.
EDICT 139: classname ms/itembk2.wav think EndFrame() nextthink 0.0500
Notice that not only has the classname been replaced, but that it’s been replaced by a fraction of a string. QuakeC doesn’t have string handling functions, you aren’t supposed to ever get a substring, so it was a big red flag that things have gone wrong somewhere. It did seem like it resolved itself after a few more entities were spawned, but the presence of a bug like this bothered me.
Investigation
The next step was to try and find out what triggered it initially. It turned out that the bug was triggered extremely early on, in InitBodyQue – the first function to spawn any entities. The line bodyque_head.owner = spawn(); was causing the bug – the backstop was uncorrupted at the end of the spawn() wrapper we wrote last time, and corrupted by the time we returned to InitBodyQue. How could returning from a function corrupt an entity?
The answer is found in the difference between what the QuakeC looks like, and what the compiler creates from it. So to understand where the bug came from we need to know about bytecode.
Bytecode
QuakeC is not loaded as text by the engine, the compile process turns it into a stream of bytecodes. Each bytecode represents a simple instruction, like loading values from memory, storing values somewhere, adding or subtracting them, or running another function. All the complex behaviour in QuakeC is built from just a few dozen basic instructions like this. Probably the most similar widely-used language is Java.
Today the crucial aspect of bytecodes is how they handle an assignment like self.health = 100. This is actually split into two stages in bytecode. The first instruction calculates the correct memory location to store health on the self entity. The second stores 100 in the location calculated by the first. So what goes wrong in InitBodyQue?
Diagnosis
The crucial difference in the line bodyque_head.owner = spawn(); is that this line is assigning the result of a function to the entity. When FTEQCC calculates the sequence of bytecodes needed for this (with-O2 optimisations), it creates
- Find memory location
- Call function
- Store result in location
When spawn() was a built-in function, this arrangement was safe. Our wrapper function was the change that exposed the bug, because now a bunch of QuakeC operations run during that second bullet point. One of the operations is e.classname = "backstop";. This involves calculating another address, which overwrites the address from the first bulletpoint due to a combination of optimisations.
Result: the entity which is returned from spawn gets stored in the classname field of the backstop! For reasons I won’t go into, it’s not entirely crazy that assigning an entity to a string would result in the back half of a random string. To sum up, the bug occurs under the following conditions.
- The result of a function is assigned to a field on an entity
- The function assigns a value to a field on an entity
- The code is compiled with fteqcc at -O2 or above
The bug is that the result of the function gets assigned to the field and entity last referenced in the function in question.
Mitigation
To avoid encountering this bug, one solution is to turn off the optimisations. The first problem with this is that it exposes a bug in CreateBackstop which causes an infinite loop, but we’ll fix that shortly…We can fix this issue without having to lose the optimisations, so I wouldn’t choose this workaround from the options available.
The next way to avoid this bug is to use a temporary variable in these kind of assignments: assign the result of the function to a local variable, then assign that local variable to the field of the entity. This is the workaround I will use to fix up the EndFrame code.
changes
The code that needs to change from last time is CreateBackstop:
void() CreateBackstop =
{
local entity e, f, head;
//begin our list with the temp backstop
e = f = head = backstop;
// spawn() already contains the code to make a backstop
// so we just burn off enough entities that one appears
// at the back
do
{
f = spawn();
e.chain = f;
e = e.chain;
}
while(nextent(backstop) != world);
while(head != f)
{
e = head;
head = e.chain;
remove(e);
};
}
Notice how the second loop has a new terminating condition. The old one didn’t actually ever terminate, except when the bug was failing to create a linked list in the first place…
We also need to change any existing code which might trigger the bug. In stock Quake I think the only function that need a tweak is InitBodyQue, but in your own mod there may be more. Here’s how InitBodyQue looks when fixed
void() InitBodyQue =
{
local entity e;
e = spawn();
bodyque_head = e;
bodyque_head.classname = "bodyque";
e = spawn();
bodyque_head.owner = e;
bodyque_head.owner.classname = "bodyque";
e = spawn();
bodyque_head.owner.owner = e;
bodyque_head.owner.owner.classname = "bodyque";
e = spawn();
bodyque_head.owner.owner.owner = e;
bodyque_head.owner.owner.owner.classname = "bodyque";
bodyque_head.owner.owner.owner.owner = bodyque_head;
};
Also consider that you may have written other functions whose return values are assigned to entity fields. Any of these functions could trigger this bug if they assign a value to an entity before returning. It is a surprisingly rare pattern though, functions can often be divided into those that mutate the state of the world but don’t return anything important, and those that return computation results with no side effects.
Fixes
As for a compiler side fix, I think the best way to fix it would be just to re-order the bytecodes for this situation to look like
- Call function
- Find memory location
- Store result in location
But if you told me easier said than done then I would not be surprised.
[…] ← Quoth 2.2 Released Fixing the EndFrame function → […]
[…] ← Fixing the EndFrame function […]