Garbage Collector


This article is part 2 in a series: Memory management



The examples in the descriptions of the stack and the heap (and some of the more language-specific parts, like new and delete) are written in C++, but most of it applies in one way or another to other languages as well.

In short, the garbage collector (GC) is a mechanism that cleans up garbage: memory on the heap that the program can no longer reach. I won’t go into the low-level details of the GC, because I can’t say that I’m an expert on it, but I’ll try to describe in short what it does…

A program basically makes use of two “types” of memory:

  • The Stack
  • The Heap

The Stack

Normally, the stack is a chunk of memory that is used during a specific thread’s lifetime. When the thread is done, the stack is gone. Whenever a function is called, a piece of the stack is used for its local variables; when the function returns, that memory is free to be used by another part of the thread. That means: when an object allocated on the stack goes out of scope, its memory is released.

void StackExampleFunction() {
  int tempValue = 10;
}

As stated in earlier posts, the scope starts with the { character and ends with the matching } character. Within the scope of the example function, a variable is initialized. It’s a local variable that only lives while the function executes, and the memory it uses on the stack is no longer reserved once the function is done. That piece of memory can then be overwritten with other data.
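To make the scope rule a bit more visible, here is a minimal sketch. The Tracked type and the CountInsideScope function are made up for this illustration; they just count how many instances are alive at a given moment.

```cpp
#include <cassert>

// Hypothetical helper type: counts how many instances are currently alive.
struct Tracked {
  static int alive;
  Tracked() { ++alive; }
  ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Returns the number of live Tracked objects seen inside the inner scope.
int CountInsideScope() {
  int seenInside = 0;
  {
    Tracked local;                // lives on the stack
    seenInside = Tracked::alive;  // the object exists inside the scope
  }                               // scope ends: local is destroyed here
  assert(Tracked::alive == 0);    // nothing is left once the scope has ended
  return seenInside;
}
```

Calling CountInsideScope() reports one live object inside the braces, and none is left afterwards, without any cleanup code being written.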

The Heap

The heap is the place in memory where all “dynamically allocated” data is stored. While data on the stack is “disposed” when the scope ends, the data on the heap is not. Anything allocated has to be removed explicitly, or it will stay there until the application terminates and the heap memory it used is released.

In the following example, an object of the type MyClass is created. The pointer variable lives on the stack and is freed when the end of the function scope is reached. While the variable itself is freed (normally 4 bytes on x86 and 8 bytes on x64), the memory it points to is not. A pointer is just an object that points to a part of memory: the pointer takes a few bytes on the stack, and the object it points to takes as much memory on the heap as the class needs. When a pointer goes out of scope, the data on the stack is removed, but the data on the heap is not. To remove the data on the heap, the delete keyword is used.

void HeapExampleFunction() {
  MyClass* obj = new MyClass();

  // delete obj;
}

In the example, the line that deletes the object from the heap is commented out, so the code creates a memory leak. The GC exists to make sure this type of thing doesn’t happen. There are, of course, still memory leaks in a LOT of programs written in languages that use a GC, but this particular type of leak is easier to avoid.
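The difference between the leaking and the non-leaking variant can be sketched like this. The instance counter inside MyClass and the LeftBehindBy helper are assumptions added for the illustration, so the leak becomes something that can be counted.

```cpp
#include <cassert>

// Hypothetical stand-in for MyClass; it counts live instances.
class MyClass {
public:
  static int liveInstances;
  MyClass() { ++liveInstances; }
  ~MyClass() { --liveInstances; }
};
int MyClass::liveInstances = 0;

// Allocates on the heap and never deletes: the object outlives the call.
void LeakingFunction() {
  MyClass* obj = new MyClass();
  (void)obj;  // the pointer goes out of scope, the heap object does not
}

// Same allocation, but released before the pointer goes out of scope.
void CleanFunction() {
  MyClass* obj = new MyClass();
  delete obj;
}

// Returns how many instances a call to fn leaves behind on the heap.
int LeftBehindBy(void (*fn)()) {
  assert(fn != nullptr);
  int before = MyClass::liveInstances;
  fn();
  return MyClass::liveInstances - before;
}
```

LeftBehindBy(LeakingFunction) reports one object left on the heap, while LeftBehindBy(CleanFunction) reports none.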

So what does the GC do?

The GC is a form of automatic memory management that cleans up the heap. Pointers are usually not used in the same sense in languages like C# or Java as in C++; in those languages, most objects created from classes are of a kind called “reference types”.
Simply put, variables of those types are pretty much pointers. All instances of reference types are “known” by the GC upon creation, and once the variables holding an object have gone out of scope and all references to it are gone (i.e., it can no longer be used because nothing refers to it anymore), it becomes eligible for removal.
The next time the GC runs its cleanup routine, the data can be removed from memory.
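As a rough illustration of the idea, here is a toy mark-and-sweep collector, written in C++ to stay in the same language as the earlier examples. This is only a sketch under my own assumptions: the ToyHeap and Object names are invented, and real collectors in C# or Java are far more sophisticated than this.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical toy types, not how any real runtime is implemented.
struct Object {
  std::vector<Object*> references;  // outgoing references to other objects
  bool marked = false;              // set while an object is proven reachable
};

class ToyHeap {
public:
  // Every object is "known" to the collector from the moment it is created.
  Object* Allocate() {
    objects_.push_back(std::make_unique<Object>());
    return objects_.back().get();
  }

  // Roots stand in for variables that are still in scope.
  void AddRoot(Object* obj) {
    assert(obj != nullptr);
    roots_.push_back(obj);
  }
  void ClearRoots() { roots_.clear(); }

  // One cleanup run: mark everything reachable from the roots, free the rest.
  // Returns the number of objects that were freed.
  std::size_t Collect() {
    for (auto& obj : objects_) obj->marked = false;
    for (Object* root : roots_) Mark(root);

    const std::size_t before = objects_.size();
    std::vector<std::unique_ptr<Object>> survivors;
    for (auto& obj : objects_) {
      if (obj->marked) survivors.push_back(std::move(obj));
    }
    objects_ = std::move(survivors);  // unmarked objects are destroyed here
    return before - objects_.size();
  }

  std::size_t LiveCount() const { return objects_.size(); }

private:
  // Follows references recursively; the marked flag also stops cycles.
  void Mark(Object* obj) {
    if (obj == nullptr || obj->marked) return;
    obj->marked = true;
    for (Object* next : obj->references) Mark(next);
  }

  std::vector<std::unique_ptr<Object>> objects_;
  std::vector<Object*> roots_;
};
```

If three objects are allocated, the first one is added as a root and it references the second, a call to Collect() frees only the third. After ClearRoots(), the next Collect() frees the remaining two, since nothing refers to them anymore.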

This sounds nice, and it is quite nice. One does not have to remember to delete pointers, and everything is fine and dandy!
The GC has some downsides too, though.
If the implementation of the GC is bad, it will use more resources than one would like, and even a good implementation usually adds some overhead compared to managing memory by hand. The developer has little (in some cases none at all) control over when the GC collects and removes the data from RAM, which can be quite a pain. And in some implementations the GC can cause hiccups in the application while it’s freeing memory, which can be quite annoying, especially in programs that must run without any freezes (for example equipment used at a hospital, or a game).

I personally like to code in languages with a GC. When it’s not extremely important to have full control over the memory, and the amount of resources the application uses isn’t a big deal, a language with a GC could (in my opinion) easily be preferred over a lower-level language.