Wednesday, July 14, 2021

Python Memory Management

Unlike many other languages, Python (specifically CPython) does not necessarily release memory back to the operating system. Instead, it has a dedicated object allocator for small objects (512 bytes or less), which keeps chunks of already allocated memory for future use. The amount of memory that Python holds on to depends on usage patterns. In some cases, allocated memory is released only when the Python process terminates.


CPython's standard garbage collector has two components: the reference counting collector and the generational garbage collector, which is exposed through the gc module.


The reference counting algorithm is efficient and straightforward, but it cannot detect reference cycles. That is why Python has a supplemental algorithm, the generational cyclic GC, which deals only with reference cycles.
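As a minimal sketch of the point above, the snippet below builds a two-object reference cycle and checks that only the cyclic collector reclaims it. The Node class and the make_cycle helper are hypothetical names used for illustration, and the behavior shown assumes CPython.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only for this illustration."""

    def __init__(self):
        self.other = None


def make_cycle():
    # Two objects that reference each other form a reference cycle.
    a, b = Node(), Node()
    a.other = b
    b.other = a
    # Return only a weak reference, so no strong reference escapes.
    return weakref.ref(a)


gc.disable()  # keep automatic collection from running mid-example
try:
    probe = make_cycle()
    # Reference counting alone cannot free the cycle.
    alive_before = probe() is not None
    # An explicit pass of the cyclic collector finds and frees it.
    collected = gc.collect()
    alive_after = probe() is not None
finally:
    gc.enable()

print(alive_before, alive_after)
```

Disabling the collector for the duration of the example is only there to make the "before" check deterministic; normal programs rarely need to do this.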


Every variable in Python is a reference (a pointer) to an object, not the value itself. For example, an assignment statement just adds a new reference to the object on the right-hand side. A single object can have many references (variable names).
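A short sketch of what "variables are references" means in practice. The names a, b, and c are arbitrary.

```python
a = [1, 2, 3]
b = a              # binds a second name to the same list; nothing is copied
b.append(4)        # the change is visible through both names

same_object = a is b

c = list(a)        # an explicit copy creates a new, separate object
copy_is_distinct = c is not a

print(a, same_object, copy_is_distinct)
```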


EXAMPLES WHERE THE REFERENCE COUNT INCREASES:

  • assignment operator
  • argument passing
  • appending an object to a list (the object's reference count is increased).
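Two of the cases listed above (assignment and appending to a list) can be sketched with sys.getrefcount. Absolute counts vary between CPython versions, so this sketch only looks at the change relative to a baseline; the variable names are arbitrary.

```python
import sys

obj = object()
base = sys.getrefcount(obj)   # baseline; the absolute value is version-dependent

alias = obj                   # assignment adds one reference
after_assignment = sys.getrefcount(obj) - base

container = []
container.append(obj)         # the list now holds one more reference
after_append = sys.getrefcount(obj) - base

container.clear()             # removing it from the list drops that reference
after_clear = sys.getrefcount(obj) - base

print(after_assignment, after_append, after_clear)
```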



When an object's reference count reaches zero, CPython automatically calls the object-specific deallocation function. If the object contains references to other objects, their reference counts are decremented too, so those objects may be deallocated in turn. For example, when a list is deleted, the reference count of each of its items is decreased. If another variable still references an item from the list, that item won't be deallocated.
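The list example above can be sketched with weak references, which let us observe whether an object is still alive without keeping it alive. Item is a hypothetical placeholder class, and the immediate deallocation shown here assumes CPython.

```python
import weakref


class Item:
    """Hypothetical placeholder class; its instances support weak references."""


kept = Item()
dropped = Item()
kept_probe = weakref.ref(kept)
dropped_probe = weakref.ref(dropped)

items = [kept, dropped]
del dropped   # the second item is now referenced only by the list
del items     # the list is deallocated and decrements each item's count

kept_alive = kept_probe() is not None        # still referenced by the name kept
dropped_alive = dropped_probe() is not None  # its count reached zero

print(kept_alive, dropped_alive)
```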


Variables declared outside of functions and classes are called globals. Usually, such variables live until the end of the Python process. Thus, the reference count of an object referred to by a global variable does not drop to zero while that variable stays bound to it. All globals are stored inside a dictionary, which keeps them alive; you can get it by calling the globals() function.
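A minimal sketch of the globals dictionary, assuming the code runs at module level (for example as a script or in a notebook cell). The name config is hypothetical.

```python
config = {"debug": True}   # hypothetical module-level (global) variable

namespace = globals()      # the dictionary that stores this module's globals

# The dictionary entry refers to the very same object as the global name.
is_same = namespace["config"] is config

print("config" in namespace, is_same)
```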



It's important to understand that as long as your program stays in a block, the Python interpreter assumes that all variables inside it are in use. To remove something from memory, you need to either assign a new value to the variable or exit the block of code. In Python, the most common block of code is a function; this is where most of the garbage collection happens. That is another reason to keep functions small and simple.


import sys

foo = []

# 2 references: 1 from the foo variable and 1 from getrefcount's argument
print(sys.getrefcount(foo))


def bar(a):
    # 4 references: the foo variable, the function argument,
    # getrefcount's argument, and Python's function stack
    # (the exact number can vary between CPython versions)
    print(sys.getrefcount(a))


bar(foo)

# Back to 2 references: the function scope has been destroyed
print(sys.getrefcount(foo))




Sometimes you need to remove a global or a local variable early. To do so, you can use the del statement, which removes the variable and its reference (not the object itself). This is often useful when working in Jupyter notebooks, because all cell variables live in the global scope.
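A small sketch of what del does and does not do: it unbinds the name, but the object survives as long as another reference exists. The names data and alias are arbitrary.

```python
data = [0] * 1000
alias = data          # a second reference to the same list

del data              # removes the name data and the reference it held

try:
    data              # the name no longer exists
except NameError:
    name_removed = True
else:
    name_removed = False

# The list itself is still alive because alias references it.
still_reachable = len(alias) == 1000

print(name_removed, still_reachable)
```

Only when the last reference goes away (here, after a later del alias) can the list's memory be reclaimed.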


The main reason why CPython uses reference counting is historical. There is a lot of debate nowadays about the weaknesses of this technique. Some people claim that modern garbage collection algorithms can be more efficient without reference counting at all. Reference counting has several known issues: it cannot handle circular references, it requires locking to stay correct across threads, and it adds memory and performance overhead. It is also one of the reasons why removing the GIL from CPython is difficult.




References 

https://rushter.com/blog/python-garbage-collector/

