Wednesday, June 30, 2021

Python loaders, finders

name

In this proposal, a module's "name" refers to its fully-qualified name, meaning the fully-qualified name of the module's parent (if any) joined to the simple name of the module by a period.
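As a small illustration of that definition, the sketch below joins and splits names on the period. The helper names `qualified_name` and `split_name` are hypothetical and exist only for this example.

```python
def qualified_name(parent, simple):
    """Join a parent's fully-qualified name (if any) to a simple name."""
    return f"{parent}.{simple}" if parent else simple


def split_name(name):
    """Split a fully-qualified name into (parent name, simple name)."""
    parent, _, simple = name.rpartition(".")
    return parent, simple


nested = qualified_name("xml.etree", "ElementTree")
top_level = qualified_name("", "sys")
```

A top-level module has no parent, so its name and its simple name are the same string.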



finder

A "finder" is an object that identifies the loader that the import system should use to load a module. Currently this is accomplished by calling the finder's find_module() method, which returns the loader.

Finders are strictly responsible for providing the loader, which they do through their find_module() method. The import system then uses that loader to load the module.
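A minimal sketch of that contract follows. `DictFinder` and `DictLoader` are hypothetical classes invented for illustration, and the finder is called directly rather than registered with the import system, since recent Python versions no longer consult the legacy find_module() method.

```python
class DictLoader:
    """Hypothetical placeholder loader: it only remembers the source text."""

    def __init__(self, source):
        self.source = source


class DictFinder:
    """Hypothetical finder backed by an in-memory {name: source} mapping."""

    def __init__(self, sources):
        self.sources = sources

    def find_module(self, fullname, path=None):
        # Return a loader when this finder can handle the name, else None.
        source = self.sources.get(fullname)
        if source is None:
            return None
        return DictLoader(source)


finder = DictFinder({"demo_mod": "answer = 42"})
loader = finder.find_module("demo_mod")      # a DictLoader instance
missing = finder.find_module("no_such_mod")  # None: not handled here
```

The finder does no loading itself; it only hands back the object that will.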



loader

A "loader" is an object that is used to load a module during import. Currently this is done by calling the loader's load_module() method. A loader may also provide APIs for getting information about the modules it can load, as well as about data from sources associated with such a module.

Right now loaders (via load_module()) are responsible for certain boilerplate, import-related operations. These are:

  1. Perform some (module-related) validation
  2. Create the module object
  3. Set import-related attributes on the module
  4. "Register" the module in sys.modules
  5. Exec the module
  6. Clean up in the event of failure while loading the module

This all takes place during the import system's call to Loader.load_module().
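The six steps above can be sketched roughly as follows. `SourceStringLoader` is a hypothetical class that loads from a string instead of a file, the validation shown is only a stand-in, and the example calls load_module() directly instead of going through an import statement.

```python
import sys
import types


class SourceStringLoader:
    """Hypothetical loader that builds a module from a source string."""

    def __init__(self, source):
        self.source = source

    def load_module(self, fullname):
        # 1. Perform some (module-related) validation.
        if not isinstance(fullname, str) or not fullname:
            raise ImportError("invalid module name")
        # 2. Create the module object.
        module = types.ModuleType(fullname)
        # 3. Set import-related attributes on the module.
        module.__loader__ = self
        module.__package__ = fullname.rpartition(".")[0]
        # 4. Register the module in sys.modules before executing it.
        sys.modules[fullname] = module
        try:
            # 5. Exec the module's code in its own namespace.
            exec(self.source, module.__dict__)
        except BaseException:
            # 6. Clean up if loading fails.
            sys.modules.pop(fullname, None)
            raise
        return module


demo = SourceStringLoader("answer = 42").load_module("gr_demo_loader_mod")
```

Registering the module before executing it is what lets circular imports see a partially initialized module, which is also why the cleanup step is needed on failure.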



origin

This is a new term and concept. The idea of it exists subtly in the import system already, but this proposal makes the concept explicit.

"origin" in an import context means the system (or resource within a system) from which a module originates. For the purposes of this proposal, "origin" is also a string which identifies such a resource or system. "origin" is applicable to all modules.

For example, the origin for built-in and frozen modules is the interpreter itself. The import system already identifies these origins as "built-in" and "frozen", respectively. This is demonstrated in the following module repr: "<module 'sys' (built-in)>".

In fact, the module repr is already a relatively reliable, though implicit, indicator of a module's origin. Other modules also indicate their origin through other means, as described in the entry for "location".

It is up to the loader to decide on how to interpret and use a module's origin, if at all.
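The built-in case is easy to check interactively. The sketch below assumes a Python version (3.4 or later) in which modules carry a spec with an `origin` attribute, something this glossary only anticipates.

```python
import importlib.util
import sys

# The repr already hints at where the module comes from.
sys_repr = repr(sys)

# On Python 3.4+, the same information is available explicitly.
sys_spec = importlib.util.find_spec("sys")
sys_origin = sys_spec.origin

print(sys_repr)    # <module 'sys' (built-in)>
print(sys_origin)  # built-in
```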




location

This is a new term. However, the concept already exists clearly in the import system, as associated with the __file__ and __path__ attributes of modules, as well as the name/term "path" elsewhere.

A "location" is a resource or "place", rather than a system at large, from which a module is loaded. It qualifies as an "origin". Examples of locations include filesystem paths and URLs. A location is identified by the name of the resource, but may not necessarily identify the system to which the resource pertains. In such cases the loader would have to identify the system itself.

In contrast to other kinds of module origin, a location cannot be inferred by the loader from the module name alone. Instead, the loader must be provided with a string that identifies the location, usually by the finder that generates the loader. The loader then uses this information to locate the resource from which it will load the module. In theory, the module at a given location could be loaded under various names.

The most common examples of locations in the import system are the files from which source and extension modules are loaded. For these modules the location is identified by the string in the __file__ attribute. Although __file__ isn't particularly accurate for some modules (e.g. zipped), it is currently the only way that the import system indicates that a module has a location.

A module that has a location may be called "locatable".
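To make the point about names and locations concrete, here is a sketch that loads one file under two different module names. It assumes the modern importlib.util helpers (`spec_from_file_location`, `module_from_spec`), which are newer than the APIs this glossary calls current. The helper `load_from_location` and the file name `example.py` are hypothetical.

```python
import importlib.util
import os
import tempfile


def load_from_location(name, location):
    """Load the source file at `location` as a module called `name`."""
    spec = importlib.util.spec_from_file_location(name, location)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    location = os.path.join(tmp, "example.py")
    with open(location, "w") as f:
        f.write("value = 1\n")

    # Same location, two different module names.
    first = load_from_location("first_name", location)
    second = load_from_location("second_name", location)
```

Both modules report the same __file__, yet they are distinct module objects with distinct names.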



cache

The import system stores compiled modules in the __pycache__ directory as an optimization. The module cache in use today was introduced by PEP 3147. For this proposal, the relevant API for module caching is the __cached__ attribute of modules and the cache_from_source() function in importlib.util. Loaders are responsible for putting modules into the cache (and loading them out of the cache). Currently the cache is only used for compiled source modules. However, loaders may take advantage of the module cache for other kinds of modules.
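As a quick illustration, cache_from_source() maps a source path to the path of its cached bytecode file. The path `pkg/mod.py` is a made-up example and does not need to exist. The exact result depends on the interpreter's cache tag, so no specific output is shown.

```python
import importlib.util
import os

# Hypothetical source path; the file itself is never touched.
source_path = os.path.join("pkg", "mod.py")

# Compute where the compiled form of that source would be cached.
cached_path = importlib.util.cache_from_source(source_path)
print(cached_path)
```

On a default CPython setup the result sits in a __pycache__ directory next to the source file and ends in ".pyc".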




The importlib module, introduced with Python 3.1, now exposes a pure Python implementation of the APIs described by PEP 302, as well as of the full import system. It is now much easier to understand and extend the import system. While a benefit to the Python community, this greater accessibility also presents a challenge.



References

https://www.python.org/dev/peps/pep-0451/#finder 
