Wednesday, June 30, 2021

importlib.reload(module): some learnings

Reload a previously imported module. The argument must be a module object, so it must have been successfully imported before. This is useful if you have edited the module source file using an external editor and want to try out the new version without leaving the Python interpreter. The return value is the module object (which can be different if re-importing causes a different object to be placed in sys.modules).


When reload() is executed:


  • Python module’s code is recompiled and the module-level code re-executed, defining a new set of objects which are bound to names in the module’s dictionary by reusing the loader which originally loaded the module. The init function of extension modules is not called a second time.
  • As with all other objects in Python the old objects are only reclaimed after their reference counts drop to zero.
  • The names in the module namespace are updated to point to any new or changed objects.
  • Other references to the old objects (such as names external to the module) are not rebound to refer to the new objects and must be updated in each namespace where they occur if that is desired.

When a module is reloaded, its dictionary (containing the module’s global variables) is retained. Redefinitions of names will override the old definitions, so this is generally not a problem. If the new version of a module does not define a name that was defined by the old version, the old definition remains. This feature can be used to the module’s advantage if it maintains a global table or cache of objects — with a try statement it can test for the table’s presence and skip its initialization if desired:


try:
    cache
except NameError:
    cache = {}



It is generally not very useful to reload built-in or dynamically loaded modules. Reloading sys, __main__, builtins and other key modules is not recommended. In many cases extension modules are not designed to be initialized more than once, and may fail in arbitrary ways when reloaded.


If a module imports objects from another module using from … import …, calling reload() for the other module does not redefine the objects imported from it — one way around this is to re-execute the from statement, another is to use import and qualified names (module.name) instead.
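A minimal sketch of this caveat, using a throwaway module written to a temporary directory (the module name mymod_demo is made up for illustration):

```python
import importlib
import os
import sys
import tempfile

# Write a throwaway module to a temp directory (name is hypothetical).
d = tempfile.mkdtemp()
sys.path.insert(0, d)
path = os.path.join(d, "mymod_demo.py")
with open(path, "w") as f:
    f.write("VALUE = 1\n")
importlib.invalidate_caches()  # make sure the new file is seen

import mymod_demo
from mymod_demo import VALUE

# Edit the module on disk (different length, so the stale .pyc is not reused).
with open(path, "w") as f:
    f.write("VALUE = 2\nCHANGED = True\n")
importlib.reload(mymod_demo)

print(VALUE)             # still 1: the from-import binding is not rebound
print(mymod_demo.VALUE)  # 2: the qualified name sees the reloaded module
```

The qualified name goes through the module object, which reload() updated in place; the bare name VALUE was bound once at import time and is never touched again.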


If a module instantiates instances of a class, reloading the module that defines the class does not affect the method definitions of the instances — they continue to use the old class definition. The same is true for derived classes.
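The mechanism can be seen without any files: a reload merely rebinds names in the module's namespace, exactly as redefining a class rebinds its name, and existing instances keep pointing at the old class object:

```python
class Greeter:
    def hello(self):
        return "old"

g = Greeter()

# Simulate a reload: the name Greeter is rebound to a brand-new class object,
# just as re-executing the module's code would rebind it.
class Greeter:
    def hello(self):
        return "new"

print(g.hello())          # "old": the instance still references the old class
print(Greeter().hello())  # "new": fresh instances use the new definition
```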




References:

https://docs.python.org/3/library/importlib.html#importlib.invalidate_caches




Python loaders, finders

name

In this proposal, a module's "name" refers to its fully-qualified name, meaning the fully-qualified name of the module's parent (if any) joined to the simple name of the module by a period.



finder

A "finder" is an object that identifies the loader that the import system should use to load a module. Currently this is accomplished by calling the finder's find_module() method, which returns the loader; the import system then uses that loader to load the module.



loader

A "loader" is an object that is used to load a module during import. Currently this is done by calling the loader's load_module() method. A loader may also provide APIs for getting information about the modules it can load, as well as about data from sources associated with such a module.

Right now loaders (via load_module()) are responsible for certain boilerplate, import-related operations. These are:

  1. Perform some (module-related) validation
  2. Create the module object
  3. Set import-related attributes on the module
  4. "Register" the module to sys.modules
  5. Exec the module
  6. Clean up in the event of failure while loading the module

This all takes place during the import system's call to Loader.load_module().
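The six boilerplate steps above can be sketched with a toy PEP 302-style finder/loader pair that loads a module from an in-memory source string. StringFinder and StringLoader are invented names, and the pair is driven by hand here (rather than registered on sys.meta_path, where the legacy find_module()/load_module() protocol is no longer honored on recent Pythons):

```python
import sys
import types

class StringLoader:
    """Toy loader: loads a module from an in-memory source string."""
    def __init__(self, source):
        self.source = source

    def load_module(self, fullname):
        # 1. validation: reuse an already-loaded module if present
        if fullname in sys.modules:
            return sys.modules[fullname]
        # 2. create the module object
        mod = types.ModuleType(fullname)
        # 3. set import-related attributes
        mod.__loader__ = self
        mod.__file__ = "<string>"
        # 4. register the module in sys.modules *before* executing it
        sys.modules[fullname] = mod
        try:
            # 5. exec the module body in its own namespace
            exec(self.source, mod.__dict__)
        except BaseException:
            # 6. clean up in the event of failure
            del sys.modules[fullname]
            raise
        return mod

class StringFinder:
    """Toy finder: maps module names to source strings."""
    def __init__(self, sources):
        self.sources = sources

    def find_module(self, fullname, path=None):
        if fullname in self.sources:
            return StringLoader(self.sources[fullname])
        return None  # "I don't handle this module"

finder = StringFinder({"greeting_demo": "MESSAGE = 'hi'"})
loader = finder.find_module("greeting_demo")
mod = loader.load_module("greeting_demo")
print(mod.MESSAGE)  # hi
```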



origin

This is a new term and concept. The idea of it exists subtly in the import system already, but this proposal makes the concept explicit.

"origin" in an import context means the system (or resource within a system) from which a module originates. For the purposes of this proposal, "origin" is also a string which identifies such a resource or system. "origin" is applicable to all modules.

For example, the origin for built-in and frozen modules is the interpreter itself. The import system already identifies this origin as "built-in" and "frozen", respectively. This is demonstrated in the following module repr: "<module 'sys' (built-in)>".

In fact, the module repr is already a relatively reliable, though implicit, indicator of a module's origin. Other modules also indicate their origin through other means, as described in the entry for "location".

It is up to the loader to decide on how to interpret and use a module's origin, if at all.




location

This is a new term. However the concept already exists clearly in the import system, as associated with the __file__ and __path__ attributes of modules, as well as the name/term "path" elsewhere.

A "location" is a resource or "place", rather than a system at large, from which a module is loaded. It qualifies as an "origin". Examples of locations include filesystem paths and URLs. A location is identified by the name of the resource, but may not necessarily identify the system to which the resource pertains. In such cases the loader would have to identify the system itself.

In contrast to other kinds of module origin, a location cannot be inferred by the loader just by the module name. Instead, the loader must be provided with a string to identify the location, usually by the finder that generates the loader. The loader then uses this information to locate the resource from which it will load the module. In theory you could load the module at a given location under various names.

The most common example of locations in the import system are the files from which source and extension modules are loaded. For these modules the location is identified by the string in the __file__ attribute. Although __file__ isn't particularly accurate for some modules (e.g. zipped), it is currently the only way that the import system indicates that a module has a location.

A module that has a location may be called "locatable".
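Since PEP 451 landed, origin and location are visible directly on a module's spec; a quick check against two stdlib modules:

```python
import importlib.util

# A built-in module: its origin is the interpreter itself, and it has no location.
sys_spec = importlib.util.find_spec("sys")
print(sys_spec.origin)        # built-in
print(sys_spec.has_location)  # False

# A source module: its origin is a filesystem path, which is also its location.
json_spec = importlib.util.find_spec("json")
print(json_spec.origin)        # e.g. /usr/lib/python3.x/json/__init__.py
print(json_spec.has_location)  # True
```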



cache

The import system stores compiled modules in the __pycache__ directory as an optimization. This module cache that we use today was provided by PEP 3147. For this proposal, the relevant API for module caching is the __cached__ attribute of modules and the cache_from_source() function in importlib.util. Loaders are responsible for putting modules into the cache (and loading out of the cache). Currently the cache is only used for compiled source modules. However, loaders may take advantage of the module cache for other kinds of modules.
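The source-to-cache path mapping can be seen with importlib.util.cache_from_source(); the exact interpreter tag in the filename varies by version:

```python
from importlib.util import cache_from_source

# Map a source path (hypothetical) to its PEP 3147 cache path.
pyc = cache_from_source("/tmp/pkg/mod.py")
print(pyc)  # e.g. /tmp/pkg/__pycache__/mod.cpython-312.pyc
```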




 The importlib module, introduced with Python 3.1, now exposes a pure Python implementation of the APIs described by PEP 302, as well as of the full import system. It is now much easier to understand and extend the import system. While a benefit to the Python community, this greater accessibility also presents a challenge.



References

https://www.python.org/dev/peps/pep-0451/#finder 

Tuesday, June 29, 2021

Python building C and C++ extensions

A C extension for CPython is a shared library (e.g. a .so file on Linux, .pyd on Windows), which exports an initialization function.


To be importable, the shared library must be available on PYTHONPATH, and must be named after the module name, with an appropriate extension. When using distutils, the correct filename is generated automatically.


The initialization function has the signature:


PyObject *PyInit_modulename(void)


It returns either a fully-initialized module, or a PyModuleDef instance. See Initializing C modules for details.

For modules with ASCII-only names, the function must be named PyInit_<modulename>, with <modulename> replaced by the name of the module. When using Multi-phase initialization, non-ASCII module names are allowed. In this case, the initialization function name is PyInitU_<modulename>, with <modulename> encoded using Python’s punycode encoding with hyphens replaced by underscores. In Python:



def initfunc_name(name):
    try:
        suffix = b'_' + name.encode('ascii')
    except UnicodeEncodeError:
        suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
    return b'PyInit' + suffix
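Repeating the helper for a self-contained run, it produces the following names (the non-ASCII result depends on the punycode encoding, so only its prefix is shown):

```python
def initfunc_name(name):
    try:
        suffix = b'_' + name.encode('ascii')
    except UnicodeEncodeError:
        suffix = b'U_' + name.encode('punycode').replace(b'-', b'_')
    return b'PyInit' + suffix

print(initfunc_name('spam'))   # b'PyInit_spam'
print(initfunc_name('héllo'))  # b'PyInitU_...' (punycode-encoded)
```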



It is possible to export multiple modules from a single shared library by defining multiple initialization functions. However, importing them requires using symbolic links or a custom importer, because by default only the function corresponding to the filename is found. See the “Multiple modules in one library” section in PEP 489 for details.




References:

https://docs.python.org/3/extending/building.html

How to create a Python .whl file?

A Python wheel is a built distribution that pip can install directly. It lets you organize all your Python modules in a clean way, so the underlying functions and classes are easy to import in other code.

A Python module is essentially a script, generally consisting of some functions and/or classes, which can be referenced in other code to keep that code concise, more readable, and easy to upgrade, enhance, and maintain, since all of it is kept in a single place.


Once a Python wheel is created (a file with the .whl extension), you can install it with a simple pip install [name of wheel file].


Below are the steps to create a simple, basic wheel file.


Keep all the modules (Python scripts) and packages (folders/directories that contain the modules) in a parent directory. Name this root directory whatever you like, typically something related to the project.


Preferably, create an empty .py file named __init__.py, and place it under every package directory and sub-package/sub-directory. There is no need to keep one in the root directory. Note that this is not mandatory, but it will be helpful.
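The effect of those __init__.py files can be checked with setuptools' find_packages(); the directory names below are made up for illustration:

```python
import os
import tempfile
from setuptools import find_packages

root = tempfile.mkdtemp()

# A proper package tree: every package directory gets an (empty) __init__.py.
os.makedirs(os.path.join(root, "pkga", "sub"))
open(os.path.join(root, "pkga", "__init__.py"), "w").close()
open(os.path.join(root, "pkga", "sub", "__init__.py"), "w").close()

# A plain directory without __init__.py is not picked up by find_packages().
os.makedirs(os.path.join(root, "notapkg"))

found = find_packages(root)
print(sorted(found))  # ['pkga', 'pkga.sub']
```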


Create a file named setup.py and place it in the root directory. At a minimum, its content should include the distribution name, version number, and list of package names. An example is below.



from setuptools import setup, find_packages

setup(
    # this will be the package name you will see, e.g. in the output of 'conda list' in the Anaconda prompt
    name='ruleswhl',

    # some version number you may wish to add - increment this after every update
    version='1.0',

    # Use one of the approaches below to define package and/or module names:

    # if there are only a handful of modules placed in the root directory, and no
    # packages/directories exist, then you can use:
    #     packages=[''],
    # modules then have to be imported directly after installing this wheel, like
    # `import mod2` (the respective file being mod2.py); the distribution name is
    # not used when importing

    # or list each package name explicitly (no need to keep __init__.py under the
    # packages/directories):
    #     packages=['<list of name of packages>'],
    # importing is then like: from package1 import mod2, or import package1.mod2 as m2

    # this approach automatically finds all directories (packages); each must
    # contain a file named __init__.py (which can be empty); the include/exclude
    # arguments take * as a wildcard and . for sub-package names
    packages=find_packages(),
)



Now, from the command line, run the following to build the .whl file:


python setup.py bdist_wheel


The built .whl file can be installed as usual using pip:


pip install <whl file>


References

https://medium.com/swlh/beginners-guide-to-create-python-wheel-7d45f8350a94

Django Model How to have a hook to deletion of the model

For the DefaultAdminSite, delete_queryset is called if the user has the correct permissions. The only difference is that the original function calls queryset.delete(), which doesn't trigger the model's delete() method. The override is less efficient, since it is no longer a bulk operation, but it keeps the filesystem clean.


Below are an example Model and ModelAdmin.


models.py

class MyModel(models.Model):
    file = models.FileField(upload_to=<path>)

    def save(self, *args, **kwargs):
        if self.pk is not None:
            old_file = MyModel.objects.get(pk=self.pk).file
            if old_file.path != self.file.path:
                self.file.storage.delete(old_file.path)
        return super(MyModel, self).save(*args, **kwargs)

    def delete(self, *args, **kwargs):
        ret = super(MyModel, self).delete(*args, **kwargs)
        self.file.storage.delete(self.file.path)
        return ret

admin.py

class MyModelAdmin(admin.ModelAdmin):

    def delete_queryset(self, request, queryset):
        for obj in queryset:
            obj.delete()


References:

https://stackoverflow.com/questions/1471909/django-model-delete-not-triggered

Model Meta Options in Django

db_table

Options.db_table

The name of the database table to use for the model:

db_table = 'music_album'



Table names

To save you time, Django automatically derives the name of the database table from the name of your model class and the app that contains it. A model’s database table name is constructed by joining the model’s “app label” – the name you used in manage.py startapp – to the model’s class name, with an underscore between them.

For example, if you have an app bookstore (as created by manage.py startapp bookstore), a model defined as class Book will have a database table named bookstore_book.

To override the database table name, use the db_table parameter in class Meta.



verbose_name

Options.verbose_name

A human-readable name for the object, singular:

verbose_name = "pizza"

If this isn’t given, Django will use a munged version of the class name: CamelCase becomes camel case.



verbose_name_plural

Options.verbose_name_plural

The plural name for the object:

verbose_name_plural = "stories"

If this isn’t given, Django will use verbose_name + "s".



Read-only Meta attributes

label

Options.label

Representation of the object, returns app_label.object_name, e.g. 'polls.Question'.

label_lower

Options.label_lower

Representation of the model, returns app_label.model_name, e.g. 'polls.question'.



constraints

Options.constraints

A list of constraints that you want to define on the model:

from django.db import models


class Customer(models.Model):
    age = models.IntegerField()

    class Meta:
        constraints = [
            models.CheckConstraint(check=models.Q(age__gte=18), name='age_gte_18'),
        ]



A few others are:


index_together = [
    ["pub_date", "deadline"],
]

unique_together = [['driver', 'restaurant']]



The indexes option should be used in place of index_together:


class Meta:
    indexes = [
        models.Index(fields=['last_name', 'first_name']),
        models.Index(fields=['first_name'], name='first_name_idx'),
    ]



ordering

Options.ordering

The default ordering for the object, for use when obtaining lists of objects:



ordering = ['-order_date']
ordering = ['pub_date']
ordering = ['-pub_date']
ordering = ['-pub_date', 'author']
ordering = [F('author').asc(nulls_last=True)]




References:

https://docs.djangoproject.com/en/3.2/ref/models/options/

Monday, June 28, 2021

Python importlib main methods



importlib.import_module(name, package=None)

Import a module. The name argument specifies what module to import in absolute or relative terms (e.g. either pkg.mod or ..mod). If the name is specified in relative terms, then the package argument must be set to the name of the package which is to act as the anchor for resolving the package name (e.g. import_module('..mod', 'pkg.subpkg') will import pkg.mod).


The import_module() function acts as a simplifying wrapper around importlib.__import__(). This means all semantics of the function are derived from importlib.__import__(). The most important difference between these two functions is that import_module() returns the specified package or module (e.g. pkg.mod), while __import__() returns the top-level package or module (e.g. pkg).
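The difference in return value is easy to see with a dotted stdlib name:

```python
import importlib
import os

# import_module() returns the leaf module...
leaf = importlib.import_module("os.path")
print(leaf is os.path)  # True

# ...while __import__() returns the top-level package.
top = __import__("os.path")
print(top is os)  # True
```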


If you are dynamically importing a module that was created since the interpreter began execution (e.g., created a Python source file), you may need to call invalidate_caches() in order for the new module to be noticed by the import system.
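A short sketch of that situation: a source file created after interpreter startup, followed by invalidate_caches() so the path finders notice it (the module name fresh_mod_demo is made up):

```python
import importlib
import os
import sys
import tempfile

# Create a brand-new module file after the interpreter has started.
d = tempfile.mkdtemp()
sys.path.insert(0, d)
with open(os.path.join(d, "fresh_mod_demo.py"), "w") as f:
    f.write("CREATED = True\n")

# Without this, cached directory listings in the path finders
# might not show the new file yet.
importlib.invalidate_caches()

mod = importlib.import_module("fresh_mod_demo")
print(mod.CREATED)  # True
```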


importlib.reload(module)

Reload a previously imported module. The argument must be a module object, so it must have been successfully imported before. This is useful if you have edited the module source file using an external editor and want to try out the new version without leaving the Python interpreter. The return value is the module object (which can be different if re-importing causes a different object to be placed in sys.modules).

When reload() is executed:

  • Python module’s code is recompiled and the module-level code re-executed, defining a new set of objects which are bound to names in the module’s dictionary by reusing the loader which originally loaded the module. The init function of extension modules is not called a second time.
  • As with all other objects in Python the old objects are only reclaimed after their reference counts drop to zero.
  • The names in the module namespace are updated to point to any new or changed objects.
  • Other references to the old objects (such as names external to the module) are not rebound to refer to the new objects and must be updated in each namespace where they occur if that is desired.



References:

https://docs.python.org/3/library/importlib.html


Metaprogramming in Python

The term metaprogramming refers to the potential for a program to have knowledge of or manipulate itself. Python supports a form of metaprogramming for classes called metaclasses. Metaclasses are an esoteric OOP concept, lurking behind virtually all Python code. You are using them whether you are aware of it or not.


When the need arises, however, Python provides a capability that not all object-oriented languages support: you can get under the hood and define custom metaclasses.


Understanding Python metaclasses is worthwhile, because it leads to a better understanding of the internals of Python classes in general. You never know: you may one day find yourself in one of those situations where you just know that a custom metaclass is what you want.
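As a concrete (if contrived) sketch, here is a custom metaclass that registers every class created with it; RegistryMeta and the class names are invented for illustration:

```python
class RegistryMeta(type):
    """Metaclass that records every class it creates."""
    registry = {}

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.registry[name] = cls  # runs at class-definition time
        return cls

class Plugin(metaclass=RegistryMeta):
    pass

class CsvPlugin(Plugin):  # subclasses inherit the metaclass from Plugin
    pass

print(sorted(RegistryMeta.registry))  # ['CsvPlugin', 'Plugin']
```

Registration happens as a side effect of the class statement itself, which is the kind of hook only a metaclass can provide.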


Django Model_meta API 


The model _meta API is at the core of the Django ORM. It enables other parts of the system such as lookups, queries, forms, and the admin to understand the capabilities of each model. The API is accessible through the _meta attribute of each model class, which is an instance of a django.db.models.options.Options object.



Methods that it provides can be used to:

  • Retrieve all field instances of a model
  • Retrieve a single field instance of a model by name


Field access API


Retrieving a single field instance of a model by name


Options.get_field(field_name)

Returns the field instance given a name of a field.

field_name can be the name of a field on the model, a field on an abstract or inherited model, or a field defined on another model that points to the model. In the latter case, the field_name will be (in order of preference) the related_query_name set by the user, the related_name set by the user, or the name automatically generated by Django.


Hidden fields cannot be retrieved by name.


If a field with the given name is not found a FieldDoesNotExist exception will be raised.


Example is given below 


>>> from django.contrib.auth.models import User
>>> User._meta.get_field('username')


A field from another model that has a relation with the current model:

>>> User._meta.get_field('logentry')
<ManyToOneRel: admin.logentry>

>>> User._meta.get_field('does_not_exist')
Traceback (most recent call last):
    ...
FieldDoesNotExist: User has no field named 'does_not_exist'


Retrieving all field instances of a model

Options.get_fields(include_parents=True, include_hidden=False)



include_parents

True by default. Recursively includes fields defined on parent classes. If set to False, get_fields() will only search for fields declared directly on the current model. Fields from models that directly inherit from abstract models or proxy classes are considered to be local, not on the parent.



include_hidden

False by default. If set to True, get_fields() will include fields that are used to back other fields' functionality. This will also include any fields that have a related_name (such as on a ManyToManyField or ForeignKey) that starts with a "+".



>>> from django.contrib.auth.models import User
>>> User._meta.get_fields()
(<ManyToOneRel: admin.logentry>,
 <django.db.models.fields.AutoField: id>,
 <django.db.models.fields.CharField: password>,
 <django.db.models.fields.DateTimeField: last_login>,
 <django.db.models.fields.BooleanField: is_superuser>,
 <django.db.models.fields.CharField: username>,
 <django.db.models.fields.CharField: first_name>,
 <django.db.models.fields.CharField: last_name>,
 <django.db.models.fields.EmailField: email>,
 <django.db.models.fields.BooleanField: is_staff>,
 <django.db.models.fields.BooleanField: is_active>,
 <django.db.models.fields.DateTimeField: date_joined>,
 <django.db.models.fields.related.ManyToManyField: groups>,
 <django.db.models.fields.related.ManyToManyField: user_permissions>)