Platforms: Windows x86, x64, Unix, Linux, OSX, iOS
VMem - Fast and efficient
C++ malloc replacement
There are various sorts of memory corruption, but the most common is something writing to memory after it has been freed, and this is the case that we will be dealing with here.
This sort of corruption can be very difficult to track down. Most allocators re-allocate the most recently freed allocation first, so the chances are that the freed allocation you are writing to is already being used by something else. When you write to that memory, you will be corrupting another object.
The result of that corruption can be anything from slightly strange behaviour to a crash; it all depends on what happens to own that memory. Even worse, the corruption doesn’t always show up where it happened. It can cause unexpected behaviour or crashes in completely unrelated code.
Some memory corruption bugs can go unnoticed in code for a long time, causing subtle artefacts and the odd crash now and then. They can mysteriously come and go over time as your memory layout changes. The first thing to do is to catch these corruptions before they have knock-on effects, and confirm that you really are dealing with a memory corruption.
VMem has a significant amount of integrity checking, memory guards around allocations and additional debug options. It isn’t uncommon for VMem to find problems when integrated into a new codebase simply because it does more checking than most allocators. The first thing to enable in VMem if you suspect a memory corruption is the trail guards. You can enable them by setting the debug level to 3 (define VMEM_DEBUG_LEVEL in VMem.hpp).
When trail guards are enabled, freed allocations are not immediately re-used. Freed allocations are memset to 0xbd and put onto a queue. This queue is periodically checked to ensure that the memory is still 0xbd. When an allocation gets to the end of the queue it is checked again, and then made available to be re-allocated. The idea is that if something writes to an allocation after it has been freed, the allocation will still be on the trail guard queue, and VMem will assert when it checks this memory.
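To make the mechanism concrete, here is a minimal sketch of a trail guard queue. It is not VMem's implementation: the TrailGuard class, its method names and the fixed queue length are all made up for illustration, and it returns false where VMem would assert.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <deque>

// Hypothetical sketch of a trail guard queue; not VMem's actual code.
class TrailGuard {
public:
    explicit TrailGuard(std::size_t max_entries) : max_entries_(max_entries) {
        assert(max_entries > 0);
    }

    ~TrailGuard() {
        for (const Entry& e : queue_) std::free(e.ptr);
    }

    // Call instead of free(): fill the block with the guard byte and queue it.
    // Returns false if the block leaving the queue was written to after free.
    bool Free(void* ptr, std::size_t size) {
        std::memset(ptr, kGuardByte, size);
        queue_.push_back({ptr, size});
        bool ok = true;
        if (queue_.size() > max_entries_) {
            Entry oldest = queue_.front();
            queue_.pop_front();
            ok = IsIntact(oldest);   // final check at the end of the queue
            std::free(oldest.ptr);   // only now can the memory be re-used
        }
        return ok;
    }

    // Periodic check: every queued block must still hold the guard byte.
    bool CheckAll() const {
        for (const Entry& e : queue_)
            if (!IsIntact(e)) return false;
        return true;
    }

private:
    struct Entry { void* ptr; std::size_t size; };
    static constexpr unsigned char kGuardByte = 0xbd;

    static bool IsIntact(const Entry& e) {
        const unsigned char* bytes = static_cast<const unsigned char*>(e.ptr);
        for (std::size_t i = 0; i < e.size; ++i)
            if (bytes[i] != kGuardByte) return false;
        return true;
    }

    std::size_t max_entries_;
    std::deque<Entry> queue_;
};
```

The longer the queue, the longer a stale pointer keeps hitting guarded memory rather than a live object, at the cost of holding on to more freed memory.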
The trail guard asserts won’t tell you what corrupted the memory, only that it has been corrupted. This is still valuable: it’s much better to corrupt something on the trail guard queue than an object in your application. You can also see the value that the memory was corrupted with, and the size of the allocation. This is valuable information for the next step, which is to find what is doing the corruption.
One way to catch the corruption as it happens is to use Page Protection. You can use Page Protection by setting the VMem debug level to 4. Then change the VMemShouldProtect function to protect the allocation size that you think is being corrupted, for example 128 bytes.
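A size filter of this kind might look something like the sketch below. The function name, its single size parameter and the constant are placeholders for illustration only. Check VMem.hpp for the real VMemShouldProtect signature, which may take different parameters.

```cpp
#include <cstddef>

// Hypothetical stand-in for the predicate; the real VMemShouldProtect
// signature in VMem.hpp may differ.
constexpr std::size_t kSuspectSize = 128;  // size reported by the trail guard assert

// Protect only allocations of the size that is being corrupted.
inline bool ShouldProtectSize(std::size_t size) {
    return size == kSuspectSize;
}
```

Filtering on the exact size keeps the number of protected pages down, which matters because every protected allocation costs at least one page of address space.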
VMem will then use the ProtectedHeap for all allocations of size 128. The protected heap works by allocating a separate system page for each allocation. When the allocation is freed, the page is decommitted but it isn’t re-used. This uses up virtual memory, but not physical memory. Anything that tries to write to a decommitted page will cause an access violation at the time of the write, allowing you to catch the memory corruption as it happens.
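The one-page-per-allocation idea can be sketched with the POSIX mmap and mprotect calls. This is not VMem's ProtectedHeap, and the function names are invented for the example. Note also that mprotect with PROT_NONE only removes access. It does not release the physical page the way a true decommit does (on Windows, VirtualFree with MEM_DECOMMIT).

```cpp
#include <cstddef>
#include <sys/mman.h>
#include <unistd.h>

// POSIX-only sketch of a page-per-allocation heap; not VMem's ProtectedHeap.
inline std::size_t PageSize() {
    return static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
}

// Round a request up to a whole number of pages.
inline std::size_t RoundUpToPages(std::size_t size) {
    const std::size_t page = PageSize();
    return ((size + page - 1) / page) * page;
}

// Map fresh pages from the OS for this allocation alone.
inline void* ProtectedAlloc(std::size_t size) {
    void* p = mmap(nullptr, RoundUpToPages(size), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : p;
}

// Revoke all access but keep the address range mapped, so it is never handed
// out again and any later read or write through a stale pointer faults.
inline bool ProtectedFree(void* p, std::size_t size) {
    return mprotect(p, RoundUpToPages(size), PROT_NONE) == 0;
}
```

After ProtectedFree, a write through the old pointer raises SIGSEGV at the faulting instruction, so the debugger stops on the code that is doing the corruption.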
This obviously uses up a lot of virtual memory, which can be a problem for 32-bit processes. Many real-time applications such as games make many thousands of allocations per second, and if each allocation uses up one 4K page that’s a lot of memory! By default, the page heap is set to a maximum size of 100MB, but you can change this to anything you want by changing the following define:
#define VMEM_PROTECTED_HEAP_SIZE (100*1024*1024)
Ideally you want to make this as large as possible. Another thing you can do is to only protect every n’th allocation, for example every 10th allocation.
This will use up 10 times less memory, but you might have to run your application 10 times to have a good chance of catching the bug. You could also only turn the protection on at a specific point where you think the corruption is happening.
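As a rough sketch of the sampling idea, the counter below protects one allocation in every n, and the helper shows how far a given heap size goes. Both names are hypothetical and not part of VMem. The counter is not thread-safe, so a real version would need an atomic counter or a lock.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sampling predicate: returns true for every n'th call.
class NthAllocationSampler {
public:
    explicit NthAllocationSampler(std::size_t n) : n_(n) { assert(n > 0); }

    bool ShouldProtect() { return ++count_ % n_ == 0; }

private:
    std::size_t n_;
    std::size_t count_ = 0;
};

// Number of single-page allocations that fit in the protected heap.
constexpr std::size_t ProtectedPageCapacity(std::size_t heap_bytes,
                                            std::size_t page_bytes) {
    return heap_bytes / page_bytes;
}
```

With the default 100MB heap and 4K pages, ProtectedPageCapacity gives 25,600 protected allocations. Sampling every 10th allocation stretches that over roughly 256,000 allocations of the chosen size before the heap is exhausted.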
Sometimes you simply have too many allocations and not enough memory to find the corruption using the protected heap. The next option is to use MemPro. When you find a corruption, it is almost always after the corruption has happened. Wouldn’t it be great if you could just rewind time and see what was previously at the memory address that had been corrupted? With MemPro, you can do exactly this. MemPro is a memory profiler that tracks every allocation and free. You can specify a range of memory, and then ask MemPro to show you everything that has been allocated at that memory address. It shows you the time and callstack of each allocation and free that intersects the memory range.
Enable MemPro in your code (see the MemPro documentation on how to do this) and connect to your application or set it to write out a dump file. Run your application with the VMem debug level set to 3 and catch the memory corruption. The VMem assert will tell you the address of the corruption. In the MemPro application, disconnect from your application (or load up the dump file) and go to the ‘Rewind Memory’ tool. Enter the address range of the corruption and hit ‘Get History’.
MemPro will list all allocations and frees that intersect with the specified range. You can use the slider to scrub backwards and forwards through time to see the different allocations, and click on the allocations to see their callstacks.
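Conceptually, this kind of history lookup is an interval overlap query over a log of allocations. The sketch below only illustrates that idea. The AllocRecord layout and the GetHistory function are assumptions made up for the example, not MemPro's data model or API.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record of one allocation's lifetime.
struct AllocRecord {
    std::uintptr_t address;
    std::size_t size;
    std::uint64_t alloc_time;
    std::uint64_t free_time;   // 0 if the allocation is still live
    std::string callstack;
};

// Return every record whose [address, address + size) range overlaps
// [begin, end), ordered from the oldest allocation to the newest.
std::vector<AllocRecord> GetHistory(const std::vector<AllocRecord>& log,
                                    std::uintptr_t begin, std::uintptr_t end) {
    std::vector<AllocRecord> hits;
    for (const AllocRecord& r : log) {
        if (r.address < end && begin < r.address + r.size)
            hits.push_back(r);
    }
    std::sort(hits.begin(), hits.end(),
              [](const AllocRecord& a, const AllocRecord& b) {
                  return a.alloc_time < b.alloc_time;
              });
    return hits;
}
```

The last entry in the returned list is the most recent owner of the corrupted address, which is usually the first suspect.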
Now, it could obviously be any of these allocations that has written to the memory after it had been freed, but more often than not it’s the most recent allocation that has done the damage. This is the first one to check. Look through the code that owns it to see if you can find anything that looks suspect. Sometimes you will be lucky and only one or two allocations will ever have lived at the corrupted address, and you can be sure that one of these allocations is the culprit.
If you have many allocations at the address and you are not sure which allocation did the corruption, it’s a good idea to make a note of the last 10 allocations. Run your application again, repeat the process and look at the allocations again. If the allocations differ from run to run, but one of them shows up in every run, you can be fairly sure that it is the one doing the corruption.
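If you note the allocations down as callstack identifiers, the comparison across runs is just a set intersection. Here is a small sketch of that bookkeeping. The CommonSuspects helper and the use of strings as callstack identifiers are assumptions for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Each run is the set of callstack identifiers noted for the last few
// allocations at the corrupted address. Returns those present in every run.
std::set<std::string> CommonSuspects(
        const std::vector<std::set<std::string>>& runs) {
    if (runs.empty()) return {};
    std::set<std::string> common = runs.front();
    for (std::size_t i = 1; i < runs.size(); ++i) {
        std::set<std::string> next;
        std::set_intersection(common.begin(), common.end(),
                              runs[i].begin(), runs[i].end(),
                              std::inserter(next, next.begin()));
        common.swap(next);
    }
    return common;
}
```

If the result narrows to a single callstack after a few runs, that allocation is the one to investigate first.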
In my time in the games industry I’ve fixed many memory corruption bugs using the techniques described here. The most difficult thing is often to re-create the corruption, so it’s important to have every possible debug check enabled, and to track all allocations using MemPro. This makes the corruption more likely to be caught, and makes it more likely that you will be able to find its cause. Not only is VMem a high performance memory manager, it is also one of the most powerful memory diagnostics tools out there. I often use it simply to track down issues. Used together with the MemPro memory profiler, it makes a powerful combination. I hope you also find them useful.