When programs run, they may accumulate data objects that are no longer of any use. These unnecessary objects, known as garbage in computing terms, take up memory and can hamper performance. Garbage collection (GC) is the process of cleaning up these unnecessary objects. This post delves into the mechanics of garbage collection in Python.

Python offers a lot of flexibility in how it deals with garbage: it supports both automatic and manual garbage collection. So, what are we waiting for? Let’s get a move on!

Memory Space Management

Memory space management is the process by which memory is allocated and released as a program reads and writes data. A memory manager decides where in memory each data object lives. The process of providing memory for objects is called memory allocation.

Now, let’s dive deeper. Our systems contain physical hardware, such as RAM and the hard drive, that stores data coming from various programs. Between that hardware and your Python code sit several layers of abstraction. The operating system (OS) carries out the actual requests to read and write memory.

On top of the operating system sit applications. One of them is the Python implementation (for example, CPython), which handles memory management for your Python code. It uses various algorithms and data structures to manage memory.
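To make the layers above a little more concrete, here is a minimal sketch that asks the running interpreter which Python implementation it is and inspects one object it has allocated. The variable names are illustrative only, and treating id() as a memory address is a CPython detail, not a language guarantee.

```python
import platform
import sys

# Name of the Python implementation that manages memory for this program
# (for example "CPython" or "PyPy").
implementation = platform.python_implementation()

# A sample object; the implementation decides where it lives in memory.
value = [1, 2, 3]

# Size of the list object itself in bytes (not counting the integers it holds).
size_in_bytes = sys.getsizeof(value)

# On CPython, id() happens to be the object's memory address.
identity = id(value)

print(f"{implementation}: list of {len(value)} items uses "
      f"{size_in_bytes} bytes, id={identity}")
```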

Garbage Collection in Python

The Garbage Collection 

As we have already seen, garbage collection is the process of cleaning up objects that are no longer needed.

It therefore calls for a systematic approach to memory management, because unwanted data objects that accumulate over time degrade system performance. This problem is known as a memory leak.

Techniques such as tracing, reference counting, timestamps, and escape analysis are used for garbage collection. They may be integrated into the compiler or the runtime system of a programming language. Python (specifically CPython) relies primarily on reference counting, supplemented by a generational collector that handles reference cycles.

Working of the Garbage Collection in Python

Python’s garbage collector primarily uses the reference counting method. It uses the number of references to an object to work out whether the object can be deleted.

Reference Counting

Python tracks how many references point to each object in your code. The reference count goes up whenever a new reference to the object is created and goes down whenever a reference is removed. Once the count reaches 0, Python deletes the object and reclaims the memory it occupied. We can read an object’s reference count using the sys.getrefcount() function. Let’s fiddle with it using the Python shell in our terminal.

>>> import sys
>>> phrase = "Hello world!"
>>> sys.getrefcount(phrase)
2

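The reference count here is 2. Why? One reference is the “phrase” variable itself; the other is the temporary reference created when “phrase” is passed as an argument to “getrefcount”. The exact numbers can vary between Python versions, so focus on how the count changes.

Let’s see some more examples: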

>>> arr = []
>>> arr.append(phrase) 
>>> sys.getrefcount(phrase)
3
>>> dict = {}
>>> dict['phrase'] = phrase 
>>> sys.getrefcount(phrase)
4

We see here that the reference count increases because the object has been added to the list and to the dictionary.

Now, let’s see the effect of de-referencing through an example.

>>> arr = []    # resetting the array
>>> sys.getrefcount(phrase)
3
>>> dict['phrase'] = "My name is Shubham"
>>> sys.getrefcount(phrase)
2
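The same experiment can be written as a small script that reports changes relative to a baseline instead of absolute counts, which sidesteps the extra reference that getrefcount itself adds. This is a minimal sketch assuming a standard CPython build; the function and variable names are made up for the illustration.

```python
import sys


def refcount_deltas():
    """Return how the reference count of one object changes, relative to a baseline."""
    target = object()                      # a fresh object nobody else refers to
    baseline = sys.getrefcount(target)

    holder = [target]                      # the list now holds one more reference
    after_list = sys.getrefcount(target) - baseline

    mapping = {"key": target}              # the dictionary holds another one
    after_dict = sys.getrefcount(target) - baseline

    holder.clear()                         # the list drops its reference
    after_clear = sys.getrefcount(target) - baseline

    mapping["key"] = None                  # overwriting the value drops the last extra one
    after_overwrite = sys.getrefcount(target) - baseline

    return after_list, after_dict, after_clear, after_overwrite


deltas = refcount_deltas()
print(deltas)
```

On CPython the four values should come out as +1, +2, +1 and 0, mirroring the shell session above.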


The Reference Cycle

We saw that when the reference count of an object becomes 0, Python deletes it from memory. However, when an object refers to itself, or when several objects refer to each other, the reference count can never drop to 0 on its own. This is what is known as a reference cycle. What do we do in that case? Reference counting alone cannot reclaim such objects, so they have to be cleaned up explicitly.

Let’s see some examples:

>>> arr = ['Shubham', 'Rakshit', 'Manya']
>>> arr
['Shubham', 'Rakshit', 'Manya']
>>> arr.append(arr) 
>>> arr
['Shubham', 'Rakshit', 'Manya', [...]]
>>> class ownClass:
...     pass
...
>>> Obj = ownClass()
>>> Obj.my_prop = Obj 

This is one situation in which the reference count will never reach 0, because the object references itself. Let’s look at a situation in which several data objects reference each other.

>>> dict1 = {}
>>> dict2 = {}
>>> dict1['dict2'] = dict2
>>> dict2['dict1'] = dict1

Since both objects reference each other, the reference count of each will never drop below 1. Let’s look at a clever hack that shows this:

>>> import ctypes  # lets us read an object's memory directly, even after its names are deleted

>>> class myObject(ctypes.Structure):
...     _fields_ = [("refcnt", ctypes.c_long)]
...
>>> dict1 = {}
>>> dict2 = {}
>>> dict1['dict2'] = dict2
>>> dict2['dict1'] = dict1
>>> address_obj = id(dict1)      # getting the memory address 
>>> address_obj
140730947711232
>>> del dict1, dict2 # deleting both objects
>>> print(myObject.from_address(address_obj).refcnt) 
1
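One common way to avoid creating such a cycle in the first place is to make one of the two links a weak reference, which does not add to the reference count. The sketch below is only an illustration under that assumption: the Parent and Child classes are hypothetical, and the immediate cleanup after del relies on CPython's reference counting.

```python
import weakref


class Parent:
    """Hypothetical container that holds strong references to its children."""

    def __init__(self):
        self.children = []


class Child:
    """Hypothetical child that points back to its parent through a weak reference."""

    def __init__(self, parent):
        # weakref.ref does not increase the parent's reference count,
        # so parent -> child -> parent no longer forms a strong cycle.
        self._parent = weakref.ref(parent)
        parent.children.append(self)

    @property
    def parent(self):
        # Calling the weak reference returns the parent, or None once it is gone.
        return self._parent()


p = Parent()
c = Child(p)
parent_seen_before = c.parent is p   # the back-link works while the parent is alive

del p                                # drop the only strong reference to the parent
parent_seen_after = c.parent         # the weak reference is now dead

print(parent_seen_before, parent_seen_after)
```

Because the back-link is weak, deleting the last strong reference to the parent lets reference counting reclaim it right away, with no leftover count of 1.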

The gc module: another way to collect garbage in Python

>>> import gc 
>>> import ctypes

>>> class myObject(ctypes.Structure):
...     _fields_ = [("refcnt", ctypes.c_long)]
...
>>> dict1 = {}
>>> dict2 = {}
>>> dict1['dict2'] = dict2
>>> dict2['dict1'] = dict1

>>> collection = gc.collect()
>>> print("Values : collection {} objects.".format(collection))
Values : collection 0 objects.

>>> address_obj = id(dict1)      # getting the memory address 
>>> address_obj
140730947711232

>>> del dict1, dict2 # deleting both objects

>>> print(myObject.from_address(address_obj).refcnt) 
1

>>> collection = gc.collect()
>>> print("Values : collection {} objects.".format(collection))
Values : collection 2 objects.
>>> print(myObject.from_address(address_obj).refcnt)
0
# the value read after gc.collect(); the object has been freed by now, so treat this number as illustrative only
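Reading raw memory with ctypes after an object has been freed is fragile, so here is a gentler sketch of the same experiment that uses a weak reference as the probe. The Node class is hypothetical (plain dicts cannot be weakly referenced), and automatic collection is paused with gc.disable() only so that the result is predictable.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.partner = None


gc.disable()                 # pause automatic collection for a predictable demo
try:
    gc.collect()             # start from a clean slate

    a = Node("a")
    b = Node("b")
    a.partner = b            # a -> b
    b.partner = a            # b -> a, which completes the cycle

    probe = weakref.ref(a)   # watch the object without keeping it alive
    del a, b                 # the names are gone, but the cycle remains

    alive_after_del = probe() is not None
    found = gc.collect()     # number of unreachable objects the collector found
    alive_after_collect = probe() is not None
finally:
    gc.enable()              # always restore automatic collection

print(alive_after_del, found, alive_after_collect)
```

The object survives del because its partner still refers to it, and disappears only once gc.collect() has run. The exact number returned by gc.collect() depends on the Python version, but it should cover at least the two Node objects.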

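Reference counting in Python is automatic and happens in real time, whereas the generational garbage collector runs periodically. Let’s check out what else we can do with the gc module. For example, gc.get_threshold() returns the collection thresholds for the three generations; the exact values depend on your Python version and configuration.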

>>> import gc
>>> gc.get_threshold()
(800, 15, 15)

Let’s look at the current collection counts for each generation using “gc.get_count()”:

>>> gc.get_count()
(788, 19, 0)

As we see in the example above, the count for the youngest generation is 788. That’s quite high, isn’t it? Now let’s manually invoke a collection and see the difference it makes.

>>> gc.collect()
0
>>> gc.get_count()
(35, 0, 0)
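The rise and fall of the first counter can also be reproduced in a script. The sketch below is written under a few assumptions: it runs on CPython, automatic collection is paused so nothing resets the counter mid-demo, and the variable names are illustrative. It also uses the optional generation argument of gc.collect() to collect only the youngest generation.

```python
import gc

gc.disable()                                # keep automatic collection out of the way
try:
    gc.collect()                            # full collection, counters start low
    before = gc.get_count()[0]

    keep_alive = [[] for _ in range(100)]   # allocate 100 new container objects
    after_alloc = gc.get_count()[0]         # the youngest-generation count has grown

    gc.collect(0)                           # collect only the youngest generation
    after_collect = gc.get_count()[0]       # the counter has dropped again
finally:
    gc.enable()                             # restore automatic collection

print(before, after_alloc, after_collect)
```

The absolute numbers will differ from machine to machine; what matters is that the count grows as containers are allocated and falls back after a collection.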

The count comes down to 35 from its earlier value of 788. Quite a significant difference!

Now, let’s change the threshold values manually.

>>> gc.set_threshold(900,25,25)
>>> gc.get_threshold()
(900, 25, 25)
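Changing thresholds affects the whole process, so it can be handy to restore the previous values afterwards. The helper below is a hypothetical sketch (temporary_gc_threshold is not part of the gc module) built only on gc.get_threshold() and gc.set_threshold().

```python
import gc
from contextlib import contextmanager


@contextmanager
def temporary_gc_threshold(*thresholds):
    """Hypothetical helper: apply new thresholds, then restore the old ones."""
    original = gc.get_threshold()
    gc.set_threshold(*thresholds)
    try:
        yield
    finally:
        gc.set_threshold(*original)   # restore even if the body raises


saved = gc.get_threshold()
with temporary_gc_threshold(900, 25, 25):
    inside = gc.get_threshold()       # thresholds in effect inside the block
restored = gc.get_threshold()         # back to the earlier values

print(saved, inside, restored)
```

Wrapping the change in a context manager keeps the adjustment local to the code that needs it, which is usually safer than leaving custom thresholds in place for the rest of the program.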

CLOSING WORDS

In this post, we started with what garbage collection is and then moved on to memory space management. We then came back for a closer look at the garbage collection concept and saw how it works in Python, along with the functions and modules associated with it. Along the way, we worked through quite a few hacks and examples.

I have tried my best to make this topic clear to you. If you still have some doubts lingering, please write to me in the comments section; I am, as always, ready to help with your queries and problems.
