"from stackoverflow import quick_sort will go through the search results of [python] quick sort looking for the largest code block that doesn’t syntax error in the highest voted answer from the highest voted question and return it as a module. If that answer doesn’t have any valid python
code, it checks the next highest voted answer for code blocks."
I once implemented a custom importer as part of a system where the Python interpreter never touched the filesystem.
Can I just say that the introduction to Lisp you made in your README.md is really good!
I keep trying to get into Lisp (and JavaScript, and TypeScript, etc etc) but I've been a sysadmin my whole professional life and also a chronic pain sufferer. That translates into mostly having energy only for work and that's it, not much motivation to learn after work or on the weekend.
In my DevOps job, I write Terraform, plus read javascript and cloudformation yaml. I do wish I could convert my current stuff to AWS CDK, but I don't want to fragment the multiple projects that are using Terraform. (I haven't looked into tf-cdk much at all yet)
If you have any questions about it, I'd be happy to answer. This stuff is pure fun mixed with a shot of professionalism.
For what it's worth, as someone with narcolepsy, I relate quite a lot to your chronic pain. (https://twitter.com/theshawwn/status/1392213804684038150) For me, it mostly translated into wandering aimlessly from job to job, since I thought no one would have me. I hope that you find your way -- there's nothing wrong at all with taking it slow and spending years on something that takes others a few months. Everyone is different, and it's all about the fun.
Hah! Good catch! That readme typo has been in there since Lumen’s inception.
It evaluates to -500000, as you’d expect.
(Just kidding, it’s -50000. Amusingly, the https://docs.ycombinator.lol version gets it right, since it has to; every expression is actually evaluated in the browser.)
Lots of things you can do with Python but probably shouldn't and people typically don't. That's one reason I prefer it to Ruby, or even Node, where monkey-patching or otherwise exposing bad magical behaviours is common and even encouraged -- the power is all there, but the ecosystem encourages you to use it for good, not evil.
This sounds very much like the good kind of magic, though.
To add to this: bad magic is magic that the user has to be aware of in order to use safely. Good magic is an implementation detail that the user doesn't need to know anything about.
Then it's punted over to compiler.l https://github.com/shawwn/pymen/blob/ml/compiler.l where it's passed through `expand`, which does a `macroexpand` followed by a `lower`. E.g. (do (do (do (print 'hi)))) becomes ["print", "\"hi\""]
Then the final thing is thrown to the compile function, which spits out print("hi") -- the final valid Python code that gets passed into the standard python `exec` function.
Works with all the standard python things, like async functions and `with` contexts. Been screwing around with it for a few years.
I have a homemade language that's not a lisp, but is lispy in some ways. I've only got to the point where I expand the code to an abstract syntax tree, and deciding how to go from there is the hard part for me right now. It's never crossed my mind to just compile it to valid python code. Thanks for the inspiration!
If the import system allows your code to run instead of the ‘import’ statement, and to produce the module however you want, then of course you can do whatever: load code from Google or StackOverflow results, if you wish.
> As in existing language supplied eg. Perl, Java, etc., or literally anything? Like bootstrapping your own home made language from scratch?
Should work with anything as long as you can ultimately generate Python bytecode (and provide a module object). The import system is not simple, but it's really open.
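For a sense of how open it is, here's a minimal sketch of a custom finder/loader pair registered on sys.meta_path (the module name and source string are made up):

import sys
import importlib.abc
import importlib.util

class StringLoader(importlib.abc.Loader):
    """Loads a module whose source is a hard-coded string."""
    def __init__(self, source):
        self.source = source

    def create_module(self, spec):
        return None  # defer to the default module creation

    def exec_module(self, module):
        # Anything that populates module.__dict__ works here -- exec'ing
        # generated Python, compiled bytecode, or something else entirely.
        exec(self.source, module.__dict__)

class StringFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname == "hello_world":  # hypothetical module name
            return importlib.util.spec_from_loader(
                fullname, StringLoader("def greet():\n    return 'hi'\n"))
        return None  # let the normal machinery handle everything else

sys.meta_path.insert(0, StringFinder())

import hello_world
print(hello_world.greet())  # -> hi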
NodeJS allows this as well. I think this is pretty much a must-have feature for any serious dynamic language.
Edit: A must have for any prolific dynamic language. But now I’m not sure that’s true, because even though it apparently works in Python, it’s certainly not widely used. In NodeJS this feature is used quite heavily for typescript, coffeescript (etcetera) interop.
> But now I’m not sure that’s true, because even though it apparently works in Python, it’s certainly not widely used.
I mean, it is used—even in the standard library—but often for alternative packaging (e.g., loading python modules in zip files) rather than alternative languages. It may be used less prominently than in Node, but it definitely is used for a variety of things.
It's pretty much the exact feature that's behind the saying that ‘to parse Perl you must have the Perl interpreter’, because Perl allows this kind of language/import handler, as exhibited by tricks like Lingua::Romana::Perligata.
You mean that you can override the import mechanism, which means that it allows you to do just about anything, including making it work with other languages.
There are two things that are annoying about Python's import system.
Number one is that relative imports are weird. My intuition about imports is good enough that I never bothered to learn all the rules explicitly, but sometimes something simple is just not possible and it bites me. I think the problem case is importing files relative to a script (when not running with python -m ...).
Number two is, in order to do package management, you have to create a fake python installation and bend PYTHONPATH. Virtualenvs are the canonical way to do it, but to me it feels like a hack - the core language seems to want all packages installed in /usr. So now I have all these virtualenvs lying around and they are detached from the scripts.
Why couldn't the import system resolve versions, too? You could say `import foo >= 1.0` and it would download it to some global cache, and then import the correct version from the cache.
Once I wrapped my head around when you can and can't use relative imports, I've been pretty OK with them. The thing that irks me is that whether they work changes based on where you've invoked Python from. `./bin/my_script.py` behaves differently from `./my_script.py`.
Coming from JS, that was a pretty frustrating realization.
> The thing that irks me is that whether they work changes based on where you've invoked Python from. `./bin/my_script.py` behaves differently from `./my_script.py`.
It does not though, unless you have altered the default sys.path to always contain `.`.
When running a file / script, Python will add that file's directory as the first lookup path, so these two invocations should have the exact same sys.path, and thus the same module resolution throughout[0].
If you have added `.` to sys.path (which is not the default), then the first invocation will also have the "grand-parent" folder on the sys.path, while the second won't.
This also doesn't seem to have anything to do with relative imports, you need to already be inside a package for relative imports to do anything.
[0] that is not the case of `-c` and `-m`, both add CWD to the path, and they differ in what they do: `-c` adds `''` to sys.path, which is the in-process CWD. `-m` stores the actual value of CWD at invocation, so changes to CWD while the program runs don't influence module resolution
Assuming `foo.bar` is a script inside the package `foo` that you want to both import and run directly (without `python -m ...`), this lets you do so without too much hassle.
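A rough sketch of that kind of dual-use setup, placed at the top of foo/bar.py (the package layout and sibling module here are assumed):

# foo/bar.py -- works both as `python foo/bar.py` and as `from foo import bar`
if __name__ == "__main__" and __package__ in (None, ""):
    import os, sys
    # Put the directory *containing* the package on sys.path so that
    # absolute imports of the package itself resolve when run as a script.
    sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

from foo import baz  # hypothetical sibling module

def main():
    print(baz.do_something())

if __name__ == "__main__":
    main()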
> The thing that irks me is that whether they work changes based on where you've invoked Python from. `./bin/my_script.py` behaves differently from `./my_script.py`.
This.
IMO that's also one of the main issues of Bash – you can't modularize a script unless you make sure your working directory is the directory containing the script. (And good luck with finding out the latter! [0])
Relative imports used to work much more naturally IMHO in python2, but then they broke it in python3 because Guido wanted scripts and modules to always be separate codebases. So, whereas it used to be easy to have a module that could also be run as a script inside a package, this is now very difficult to implement -- to the extent that any python2 code that does this should probably be refactored when being ported to python3.
I want to organize my code logically in directories. As a script grows, I want the ability to spin out parts of that file to separate files.
In order to do that in python, I need separate directories between the script and the spun-out functionality. This ends with a script that says "do function from module" and all code being in the module.
Having code in different directories for no reason except "the import system" sucks. How is this supposed to go?
FWIW, my command-line tools and any support code are all in a subdirectory under my package directory.
I used to include a simple stub Python program (~10 lines) as a "script" in my setup.py that would import the right code and call it.
Then I learned that "entry_points" implemented most of the dispatch behavior I wanted.
I no longer have those scripts, just an entry that basically says "from abc.cli.prog1 import main; main()". The prog1.py, prog2.py, etc. look like normal command-line scripts, assuming the usual:
if __name__ == "__main__":
    main()
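For reference, a minimal setup.py sketch of that entry_points wiring (project and package names are just placeholders):

from setuptools import setup, find_packages

setup(
    name="abc-tools",  # placeholder project name
    version="0.1",
    packages=find_packages(),
    entry_points={
        "console_scripts": [
            # installs a `prog1` command that runs abc.cli.prog1:main()
            "prog1=abc.cli.prog1:main",
            "prog2=abc.cli.prog2:main",
        ],
    },
)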
The major downside is I can't suppress SIGPIPE and SIGINT until main() is called, leaving a wider window where something like ^C gives an unwanted Python stack trace.
So as your script grows, don't you eventually rename it to whatever/__main__.py and run it with -m? Seems like a fairly trivial transform that then allows you to spin out as many modules as you need.
Are you sure you're not confusing changes to the import system with changes to the default sys.path (especially when executing a file)?
Because unless I'm missing something, the changes to the import system make this more reliable, not less: in Python 2, implicit relative imports mean `import x` might depend on the file's location. In Python 3, it does not; it depends solely on the setup of sys.path.
Nice, do you know of anything similarly-comprehensive that is updated for Python 3? IIRC, Python 3 simplified things a good deal by removing "implicit" relative imports, but I'm a little foggy on exactly what that means.
Explicit relative imports use a "." to indicate the file/package is from the current directory and not found elsewhere in sys.path:
from . import foo
from .foo import bar
Implicit relative imports don't have such an indicator:
from foo import bar
In py2, that second one could be a relative import or from anywhere in sys.path, while in py3 those implicit relative imports were removed, so it'll only look in sys.path and not the local directory.
> the core language seems to want all packages installed in /usr.
There's also ~/.local/lib/python3/site-packages (or whatever your distribution made of that). Virtualenvs are only necessary if you want to isolate dependencies between environments. That's useful if you have projects with conflicting dependencies, because Python doesn't allow you to install multiple versions of the same package, for better or worse. However, if you've written some simple scripts that don't care much about the exact version of their dependencies, it's perfectly fine to install their dependencies globally.
Quick shout-out to nix-shell shebangs, which allow a script to specify exact versions of all dependencies, Python or otherwise, which will be cached into a temporary sandbox.
Earnest question: why are you all trying to use relative imports? What problem is that solving for you? I've never even bothered to try it out because it seems potentially problematic in the way all relative references can be, e.g., relative file paths.
I do use them quite often without much issue, not sure why some people are struggling with it.
Say you have a main package "mylib" with a subpackage "mylib.utils". Typically I like to see "mylib.utils" imports as being in one of (roughly) 4 categories:
- standard imports, that I would put first in the file (e.g. "import logging")
- external imports, that I would put second in the file (e.g. "import requests")
- library local imports, bits and pieces from the "mylib" package that you want to reuse in "mylib.utils", but are external to the current package (e.g. "from mylib.email import client")
- and package local imports, which I see as implementation details of the current subpackage, and should be agnostic from the overall architecture of "mylib" (e.g. "from .helpers import help_function")
The last category is modules that only make sense from within "mylib.utils", should relocate with it even if it is renamed or moved elsewhere, and shouldn't require a change whatever the structure of "mylib" becomes, which is why I would use relative imports inside "mylib.utils".
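Concretely, the top of a module inside "mylib.utils" ends up looking something like this (module names invented):

# mylib/utils/core.py

# 1. standard library
import logging

# 2. external dependencies
import requests

# 3. library-local imports: other parts of mylib, addressed absolutely
from mylib.email import client

# 4. package-local imports: implementation details of mylib.utils,
#    addressed relatively so they move with the subpackage
from .helpers import help_function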
People have a script run with python, and want to use code in other files.
This is not supported in python. For reasons beyond my understanding, you are supposed to put the script you run with python (or with the shebang) in a different directory.
Alternatively, you can always use `python -m` to run your code.
I missed one detail. The files and the script are in the same folder.
I say script here because the apparent intention is that scripts and modules are separate. That is why it's not easy to import functions from a file in the same directory as a script.
This restriction is very unexpected, and it's made worse by the fact that it's not obvious the distinction even exists.
When you run a script, the directory containing it is on the path. You can just write "from somefile import something". What you can't do is "from .somefile import something" because it is not a package.
When you run a module, the root package containing it is on the path by definition and imports work.
The directory containing the file is added to the path, and the named file is imported as the __main__ module. That's why "if __name__ == '__main__':" works in script files.
I don't think Python will resolve symlinks here, so if that needs to work, you probably need to do it yourself before touching other code / files.
In JS land everything is a relative import. It's nice because you can move a whole directory of code from one project to another (or around in the same project) and it still works because all the imports pointing to other files inside that directory were relative, you only have to fix imports that go outside the directory.
Also, the way that JS imports are just relative paths is very nice because it means that the imports are statically determinable; your editor can understand them and fix them automatically, and you can trust that refactoring. Python has Turing-complete imports because there's so much dynamic messing about with sys.path that goes on in Python due to inadequacies of the import system.
> It's nice because you can move a whole directory of code from one project to another (or around in the same project) and it still works
Yes, just like absolute imports...
> relative paths is very nice because it means that the imports are statically determinable, your editor can understand them and fix them automatically and you can trust that refactoring
1) Just like absolute imports?
2) I absolutely contest that relative imports are easier for an IDE to refactor. I've never had VSCode hang while refactoring a Java package; I've had VSCode hang while refactoring a "create-react-app" app...
There is no way in general for Python tooling to figure out where an import points. All it can do is guess. It's a very regrettable situation.
Yes in Java it's fine because Java has a build stage. So tooling can figure out where imports point to from your static configuration.
However from the tool's perspective there is nothing better than a relative path. Relative paths require no messing about with configuration files at all to resolve. It's just a path to another file or directory on disk.
When a tool sees
import c from "./a/b/c"
It can resolve it immediately.
So what is the advantage of absolute imports really? If import lines are mainly written and maintained by tooling then shouldn't we pick the representation that is easiest for the tooling? Then we can have more and better tools and the tools will be more reliable.
And it turns out that relative paths are easy for humans to understand too. The same configuration-free resolution algorithm also works in your head when you are reading code! At least when the language doesn't overcomplicate them too much (JS is guilty of this to a certain extent, although nowhere near as bad as Python).
IIRC, it was done for security, to avoid inadvertently picking up a library from an unexpected place. For instance, drop a file named test.py in the same directory as your python 2 script and have "fun" figuring out what went wrong.
> Number two is, in order to do package management, you have to create a fake python installation and bend PYTHONPATH.
Well whatever the language or runtime, you need to tell it how to find its dependencies. Be it with a configuration file, environment variables or parameters.
Your statement is not exactly correct either. You don't need a fake installation _and_ to bend PYTHONPATH.
Virtualenv leverages the fact that "../lib/python<version>/site-packages/" relative to the interpreter is a default import path (default value of PYTHONHOME). It doesn't use PYTHONPATH.
> Why couldn't the import system resolve versions, too?
Not sure that would be really great. I prefer to have my dependencies all grouped together in a setup.py rather than scattered in various files at import time.
> Why couldn't the import system resolve versions, too? You could say `import foo >= 1.0` and it would download it to some global cache, and the import the correct versions from the cache.
What do you do about conflicts? Or say you have `import foo >= 1.0` in one file and `import foo == 2.4` in another, but the latest version is 2.5, so the first import grabbed the latest version, and you later realize you need 2.4?
Imagine running a report generator for 5 hours, only to have the formatting module require a conflicting version of something and erroring out at run time...
> Number one is that relative imports are weird. My intuition about imports is good enough that I never bothered to learn all the rules explicitly, but sometimes something simple is just not possible and it bites me. I think the case is importing files relative to a script (and not running with python -m ...).
I think the thing to remember about using relative imports is that they require packages. Using relative imports to import something into a script will fail because the script doesn't belong to a package.
> Number two is, in order to do package management, you have to create a fake python installation and bend PYTHONPATH. Virtualenvs are the canonical way to do it, but to me it feels like a hack - the core language seems to wants all packages installed in /usr. So now I have all these virtualenvs lying around and they are detached from the scripts.
This is a pain. You can abstract this away with pyenv or virtualenvwrapper, though.
Python's import system is by far the worst one I've dealt with: setup.py with regular or namespace packages, relative imports, complex subpackages and cross-importing, running a script from somewhere inside one of your subpackages, and plenty more headaches like these. An import system should be intuitive and easy to use!
Yeah it really tripped me up as a beginner. I think the hardest part to get used to was that the import syntax actually changes based on how, and from where, you are running your code. So depending on how you call your code, your imports might or might not work. This is ESPECIALLY painful when you are building a distribution. There is no syntax that works for all situations, which seems like it would be pretty important for an import system. I had to bookmark this tab, and still refer to it often.
It led me to read the source of the Python "types" standard library module, which really does just create a bunch of different Python objects and then use type() to extract their types: https://github.com/python/cpython/blob/v3.9.6/Lib/types.py
Some examples from that file:
async def _ag():
    yield
_ag = _ag()
AsyncGeneratorType = type(_ag)

class _C:
    def _m(self): pass
MethodType = type(_C()._m)

BuiltinFunctionType = type(len)
BuiltinMethodType = type([].append)  # Same as BuiltinFunctionType
I’m always amazed just how tolerant javascript’s import system is when I have circular imports. I guess maybe because it doesn’t care about modules and just cares about specific elements that are being imported/exported.
When I do have a nasty circular dependency Webpack usually does a bad job telling me what’s wrong.
Though I should still treat circular imports as, at the very least, an organization code smell.
Circular imports in JS are fine and not a code smell, for instance in typescript you may have two classes that reference each other's types - obviously this is not a real import from JS's perspective but the point is that you should not have to care whether it is real or not. That's a world we don't want to live in.
Circular imports are only ever a problem when you have code running when the module loads. Then you run into module load ordering issues. So avoid any side effects on module load and make all setup explicit.
IIRC an explicit design goal was to enable circular deps (hence why imported bindings are considered "live"). It's interesting to see that this works in practice, though; I've never tried using them myself.
Circular imports in JS matter when the imported code is being called immediately at import time. If a file defines functions that later call functions from another file (and vice-versa), but those symbols will be populated before the function actually gets called, there's no problem
Yeah. It’s so flexible. I get frustrated at Python (Django) serializers that legitimately need to depend on each other. And the answer on the forums is to create a near duplicate class.
JavaScript does not provide namespaces by default, which allows this.
Python is built upon namespaces, and cycles in dependencies -- whether viewed through graph theory or k-SAT -- introduce that fun NP-complete problem of version hell.
Using `need` in Javascript maintains the directed acyclic graph structure, but if you get fancy you will run into the same problems with circular depends in Python.
Karp's 21 and/or SAT will catch up with you at some point if you don't respect the constraints that make the problem tractable.
Note I am not saying I prefer or like Python's choices... but that they had to make one.
I usually deal with this by defining a cell with the actual contents of the class/module code so that I can just re-execute that cell any time I make changes to it. Then I simply copy/paste all the code back into the module.py file once I'm done tweaking it and playing with it. Thus for me Jupyter sort of operates as almost an IDE of sorts.
> three modules cannot depend on each other in a circular way
What would the purpose of circular modules like this be? You may as well collapse into a single module and the situation would not be any different would it?
> What would the purpose of circular modules like this be? You may as well collapse into a single module and the situation would not be any different would it?
What is the purpose of modules? You may as well collapse into a single script and the situation would not be any different, would it?
I'm not being facetious here. The answer to the second is the answer to the first.
A common example might go like this. You have a module for each kind of thing you have in the database. But now if someone loads a Company, they need to get to Employees. And if someone loads Employee they need to get to Accounting for the salary, payments, etc. And Accounting needs to be able to get to Company.
Those are all large and complicated enough that it makes sense to make them modules. But you just created a circular dependency!
The standard solution is to load a base library that loads all kinds of objects so they all can assume that all the others already exist and don't need a circular dependency. But of course someone won't like wasting all that memory for things you don't need and ...
Only if those things not only need to be able to “get to” each other, but also need to know, at compile time, about the concrete implementation of the others.
That can be a real need, but it's also something that often happens because of excessive and unnecessary coupling.
The coupling that I described needs to be in the software because it exists in the real world that the software is trying to describe.
However your "compile time" point is important. There is another solution, which is to implement lazy loading of those classes.
So you put your import in the method of each that needs to know the other. This breaks the circular dependency without the up-front memory cost. However it can also become a maintenance issue where a forgotten import in one function is masked by a successful import in another, until something changes the call and previously working code mysteriously goes boom.
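A sketch of that deferred-import style, with made-up module names:

# company.py -- would otherwise need `import employee` at the top,
# while employee.py imports company.py, creating a cycle.
class Company:
    def __init__(self, name):
        self.name = name

    def hire(self, person_name):
        # Deferred import: employee.py is only loaded the first time
        # this method runs, so module load order no longer matters.
        from employee import Employee
        return Employee(person_name, company=self)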
To me the purpose of modules is to help humans manage code. Our brains don't hold much at once, so the more we can forget about in a given circumstance, the easier it is. So I think btilly is correct: the reason I want modules, ignoring things I don't care about, is the same reason I want them to deal reasonably with circular references.
In the example that I gave, the design described will handle complex edge cases such as a part time employee working for multiple companies. And will do so without programmers having to think through the whole system at all points.
Independence of modules has no importance in a codebase that ships as a whole. But modularity does.
In my experience they creep in over time as the system grows. Coupling between parts of the system that was previously unnecessary is added, and the cycles form.
I don't think purposeful circular dependency, but you can end up with circular imports after refactoring for instance.
The common approach to solving this is pulling everything that is used by all the modules into leaf libraries, effectively creating a directed acyclic graph, but this is not obvious nor easy to do the first time.
A Customer has a BillingContact, which references a Person, which has primary Customer.
Boom, circular dependency.
Happens in basically all corporate code bases that grow over the years, with varying path lengths.
Throwing all potentially circular types into one big module isn't a great solution.
(In practice, we tend to rely on run-time imports to make it work. Not really great, but better than throwing several 10k or 100k lines of code into a single module).
Suppose you have two classes, A and B. They are sufficiently complex to merit their own modules.
Suppose you have some method of A which does something special if it gets an instance of B, and vice versa. Now you have a circular import problem; glhf
I generally solve this problem by having a module specifically containing the abstract base classes of each of the classes I will be working with that implements either no or bare minimum functionality for these objects. That way, any other module can import this module and have visibility of every other class I will be working with.
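A sketch of that layout (names invented): the bases live in one dependency-free module, and the concrete classes only ever import that.

# bases.py -- imports nothing from the rest of the project
from abc import ABC, abstractmethod

class ABase(ABC):
    @abstractmethod
    def combine(self, other: "BBase"): ...

class BBase(ABC):
    @abstractmethod
    def combine(self, other: "ABase"): ...

# a.py -- the concrete A; b.py mirrors this, so a.py and b.py never import each other
from bases import ABase, BBase

class A(ABase):
    def combine(self, other: BBase):
        return f"A combined with {type(other).__name__}"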
I ran into this when A and B had many derived classes. I wanted to put A and its derived classes in one module, and B and its derived classes in another. It was messy.
I wound up putting A and B in a single module and having a separate one for the derived classes. Not ideal.
It does sound ideal to me, or at least better than the initial proposal.
A and B both need to know about the other's base definition, neither cares about the details about the other's derived classes. Splitting it into three modules shares as little surface area as possible.
> Suppose you have some method of A which does something special if it gets an instance of B.
While that’s in rare circumstances the right thing to do, it's mostly an anti-pattern—you should be taking an object supporting a protocol, with the behavior difference depending on a field or method of (or actually implemented in a method of) that protocol. If you do that, you don't create a dependency on a concrete class that happens to require the special behavior.
IMHO that's code smell. Modules shouldn't depend on each other, because that creates a web of tangled dependency where you have to understand everything before you can understand one of them. Circular dependency is to modules what goto is to control flow.
Besides, if you are in a "well, fuck it, deadline is tomorrow" mode, you can always do something horrible like:
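For instance (module names invented), the time-honoured sin of shoving the import to the bottom of the file so it only runs after the names the other module needs already exist:

# a.py
def a_func():
    return b_func() + 1

# ... rest of a.py ...

# Import at the *bottom*: by the time b.py (which does `from a import a_func`)
# executes, a_func is already defined. Works only if a is imported before b,
# and good luck explaining it in six months.
from b import b_func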
I think bad code gives rise to more dependencies in general, and hence to circular dependencies.
But the truth is sometimes it has happened to me, and the only solution I found was creating a small module with maybe one or two functions, which is not exactly ideal.
In my experience, typically a codebase that has grown organically in a different direction than the original design, and where the cost of refactoring is not deemed worth it.
I've never had a circular dependency I couldn't resolve. Just organize your modules into a DAG. I have a pretty standard flow:
- protocols and abcs. Depend only on builtins, this sits at the top of the hierarchy and can be used anywhere
- top-level types/models, data structures. Only depends on external libraries
- config and logging. Depends on pydantic, which handles all init config validation, sets up log level. Many things import this
- specific class implementations, pure functions, transforms. imports from all the above
- dependency-injected stuff
- application logic
- routers and endpoints
I've had some close calls since I type check heavily, but protocols (typing.Protocol, added in 3.8) are a fantastic tool for both structuring and appeasing mypy.
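A tiny example of the protocol trick (names made up): the consumer depends on a Protocol defined at the top of the hierarchy instead of on the concrete class, so no cycle forms.

# protocols.py -- top of the hierarchy, depends only on the standard library
from typing import Protocol

class SupportsSave(Protocol):
    def save(self) -> None: ...

# service.py -- never imports the module defining the concrete models
from typing import Iterable
from protocols import SupportsSave

def persist_all(items: Iterable[SupportsSave]) -> None:
    for item in items:
        item.save()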
If you have A -> B -> C -> A, then you need all three of A, B and C to be defined if you're going to use any one of those modules. When that's the case, the only thing you're gaining by using modules is the organisation of the file.
Probably half of the commenters here know this, but since we're here, this is my go-to boilerplate for starting a python script. (Probably won't work on Windows.)
#!/bin/sh
# Run the interpreter with -u so that stdout isn't buffered.
"exec" "python3" "-u" "$0" "$@"
import os
import sys
curdir = os.path.dirname(os.path.realpath(sys.argv[0]))
# Add enough .. to point to the top-level project directory.
sys.path.insert(0, '%s/../..' % curdir)
# Your main program starts here ...
> # Add enough .. to point to the top-level project directory.
This suggests that there is more than one entry point to the Python project?
While I'm sure there are good reasons for this, and while I'm not criticising your instance of this specifically, as a general point of advice I've found this sort of thing to be a bit of an anti-pattern.
Having one entry that handles things like path setup and other global concerns, before delegating out to subcommands or whatever construct works best makes it much easier to keep the whole codebase aligned in many ways.
Django has a system for this and while it has its flaws, it is nice to have it. Using this, on our main Python codebase of ~400k lines, we have a single manual entry point, plus one server entry point. Coordinating things like configuration, importing, and the application startup process, are therefore essentially non-issues for us for almost all development, even though we have a hundred different operations that a dev can run, each of which could have been a separate tool like this.
For additional bonus points, have your single entry point expose a CLI that properly documents everything the developer can do, i.e. what features are available, what environment variables and config flags can be set, etc. That way, the code essentially documents itself and you no longer have to keep your README file updated (which people always tend to forget).
I use a very similar flow. The highly-opinionated-yet-effective pattern I use involves pydantic, cleo, and entry_points/console_scripts in setup.py.
- everything is structured as a module
- options and args are stored in their own module for easy reuse
- the whole stack has one cleo.Application, with however many subcommands. Usually of the form "mytool domain verb" e.g. "mytool backend start."
- cleo args/options are parsed into pydantic objects for automatic validation (you could do this with argparse and dataclasses to skip the deps but it's more work)
- each subcommand has a `main(args: CliArgsModel)` which takes that parsed structure and does its thing. This makes it super easy to unit test
I install into a venv with `poetry install` or `pip install -e` for editable installs.
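Roughly what the argparse-and-dataclasses variant of that pattern looks like (all names are placeholders):

import argparse
from dataclasses import dataclass

@dataclass
class CliArgs:
    host: str
    port: int
    verbose: bool

def parse_args(argv=None) -> CliArgs:
    parser = argparse.ArgumentParser(prog="mytool backend start")
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8000)
    parser.add_argument("--verbose", action="store_true")
    ns = parser.parse_args(argv)
    # Converting the namespace into a typed object makes main() trivially unit-testable.
    return CliArgs(host=ns.host, port=ns.port, verbose=ns.verbose)

def main(args: CliArgs) -> None:
    print(f"starting backend on {args.host}:{args.port} (verbose={args.verbose})")

if __name__ == "__main__":
    main(parse_args())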
There's no well known module to do most of this for you? In perl, the recent canonical way is to use the FindBin module to find the current binary's running dir, and the local::lib module to set the import path (or just use lib for older style library dirs). That always seemed cumbersome to me at 2-3 lines that weren't very clean looking.
Also, say what you will about Perl and esoteric global variables, but it's kinda nice to be able to toggle buffered output on and off on the fly. Is there really no way to do this in python without re-executing the script like that?
Ya... if you're trying to get the path of the script, you can use the `__file__` special variable (instead of loading it from bash $0 and grabbing sys.argv[0]).
For adding current directory to front of path, the sys.path.insert() call is a pretty sound way of doing it.
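I.e. a sketch combining both suggestions:

import os
import sys

# Same idea as the boilerplate above, but anchored on __file__.
curdir = os.path.dirname(os.path.realpath(__file__))
# Add enough .. to reach the top-level project directory.
sys.path.insert(0, os.path.join(curdir, "..", ".."))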
Yeah, I think you're right - didn't know about __file__.
(To clarify, using "$0" with bash is just standard method to invoke the same script with an interpreter - sys.argv[0] will work with or without bash exec part.)
That's crafty; it reminds me of the suggested similar tactic with tclsh since tcl honors backslashes in comment strings(!): https://wiki.tcl-lang.org/page/exec+magic
This is sometimes what you want, but it will always look at this exact path, and won't play nicely with virtualenv/conda.
#!/usr/bin/env python
This works - it will use Python found in $PATH. Unfortunately you can't add any more parameters to the interpreter.
The contraption I wrote allows adding arbitrary parameters - I was burnt one too many times by Python silently buffering my debug messages, so I use it to always add "-u".
I'm just grateful I'm not a core Python dev after reading this thread. I've never seen so much negativity concentrated in one place in quite some time, for a feature of a programming language which is fairly innocuous.
Python is the language everyone at HN loves to hate. One presumes it has something to do with the fact that it's Y Combinator's recommendation for most startups.
Despite using python for the past 4 years, it still takes me several tries to set up packages and imports correctly when I make them myself. Honestly, I wish that python had an import system similar to JS (where you can just say “I want this file” <insert path to file> and specify the exports yourself). For me, it just feels more intuitive and less “magic”-like when dealing with custom scripts you want to import.
I have seen way too many `ModuleNotFoundError`s. It is moderately infuriating when the two files are in the same directory and python can't find the module.
Honestly that error is misnamed. It should be `ModuleImportRefusedError`.
And the frustration caused by getting PyTest to work in a project is likely responsible for a large percentage of the untested python projects in the world...
Years ago, I thought I'd try learning Python, I'd heard it was supposed to be easy, good for beginners and everything. I read one of those beginner Python type books and followed along with a roguelike tutorial. Everything was going pretty alright, until I started trying to split everything into different files and use imports.
I ended up just giving up. I read Programming in Lua, and rewrote my entire project in Lua and actually finished it.
Some day I'd like to go back and maybe learn Python but I really didn't enjoy my experience with it. I even found C headers easier to figure out than Python imports.
It's one of those things people who have been using Python for so long forget about: some people try to run individual files and cd around the place. I never do that anymore. I have a test suite and breakpoints and that's it. But before you've learned those tools it feels natural to "run that file there" and then say "oh hey why doesn't it work any more?"
It has some wildly frustratingly unintuitive behaviours in precisely the wrong place for beginners: in between having everything in a single script and building a proper package, especially when you are invoking your script with 'python script.py' as opposed to say 'python -m scripts.script'.
Yeah. Start writing a program in 'myprogram.py'; as things grow, do the right thing and split a function out to its own file and import it. It doesn't work. Suddenly you need to learn a whole bunch about python modules and the import system and scripts vs modules, and some of the questions you have just literally have no good answer.
I was using Python 2 at the time. Python 3 was still relatively new. Not sure how much difference it makes for your example, but the import systems are different between 2 and 3.
It was probably something I did. The original tutorial I followed had everything in one file and didn't get into anything about imports. I started splitting everything up arbitrarily and started tossing imports into the files that complained about missing dependencies and ended up getting overwhelmed because nothing worked.
I'm sure if I'd taken the time to try and fix it I eventually could have and at this point i've had more experience with a bunch of different languages, so I'm sure it's not as bad as I remember.
I imagine it's one of those cases where, if I were to go back, I'd laugh about how stupid I was, but ya know, those first impressions.
No, your impression was right. Reading this blog post made me realize how little I know about the python import system (and I use python daily), and at the same time how little I want to learn it. It is completely unintuitive and probably one of the worst aspects of otherwise beautiful and useful language. Fortunately, sys.path hack works reliably - one can just add that one line and imports work as expected.
I've been programming in Python professionally for more than five years. I consider myself quite a good programmer. I still don't have a good grasp on Python's import system. Does anyone else have similar experience?
When in doubt, put some more dots in front of your import statements. Or remove them? Maybe I need an extra __init__.py somewhere? Oh I'm importing from a subfolder, what do I need to do to get that to work again? I can't remember.
To be fair, Google's python avoids ~99% of the complexity of Python's import system by making all imports absolute and doing most things through blaze/bazel.
I’ve only been writing Python for about a year, but I’ve found it much harder to grasp how dependency resolution and imports work than other languages I’ve picked up (JVM, Node, Go, C).
Same. Occasionally I will get into some sort of mess, learn how it works under the hood enough to get myself out, and then promptly forget everything.
And I think that's for the best. I'd much rather have a happy path that I stay on than use some sort of dark magic that nobody who comes after me will understand.
I have >10y python exp. Vanilla import system? I grasp it quite well.
- Entrypoints? Getting there.
- Namespace packages? Ehh. Murky
- site-packages/mypackage.pth - I get it but I don't know why sometimes it appears and other times not
- c extensions? Ehhh.
- .so loading? Kinda magic.
- the confluence of editable installs, namespace packages, foo.pth, PYTHONPATH, sys.path, relative imports, entry points, virtualenvs, LD_LIBRARY_PATH, PATH, shell initialization, setup.py vs pyproject.toml? Um yeah that's some heavy wizardry.
Tbf, you don't need the vast majority of that to be effective in python.
The average python programmer does not really need to deal with pythons import system that much (just be aware of how it does its module loading and that you can conditionally do stuff sometimes with __import__ etc). As someone who has messed around with it a lot (dynamically loading/unloading modules, modifying on the fly etc) I would NOT recommend doing that stuff for anything in production.
I sometimes fall into the trap of using pip to install dependencies and then things break after an os update. That is, my python version has changed from 3.8 to 3.9 and my dependencies are sitting in the wrong directory. I never know if I should use pip and requirements.txt or rely on Ubuntu's packaged versions.
I used to have a rule of, never use pip and only use Ubuntu's/Debian's packaged versions. That works pretty well if you're happy with the packaged versions and you don't need unpackaged libraries.
I now have the rule of, only ever use pip inside a venv. If your venv is more than a little bit complex, write a requirements.txt file so you can generate it. So it's something like
Then when your Python version changes, or you get confused about what's installed, or whatever, you can just blow away the entire venv and recreate it:
For me the rule is to always use pipenv locally and pip + requirements.txt (generated by pipenv) for production (in docker container usually). No complaints.
same! I've always been shielded from it with Django's conventions. (the ecosystem I mainly work in). I used a lot of '.' and '..' imports but I think something changed in python3 that made that strategy a lot less forgiving... now I _really_ should read the entirety of this article!
I created this fun hack that taught me a LOT about the import system. Basically it allows you to import anything you want, even specify a version, and it will fetch it from PyPI live. Might be interesting to flesh this out in a way that's deployable.
Basically instead of
import tornado
and hoping and praying that the user has read you README and pip installed the right version of tornado, you can do
from magicimport import magicimport
tornado = magicimport("tornado", version = "4.5")
and you have exactly what you need, as long as there is an internet connection.
Because then you'd still have to install the package.
It's nice to have something that "just works" without having to install it. I like to call it Level 5 autonomous software -- software that can figure out how to run itself with zero complaints.
I actually use this for a lot of personal non-production scripts, I can just clone the script on any system and just run it, it will figure itself out.
Also, packages with version dependencies fuck up other packages with other version dependencies, unless you set up virtualenvs or dockers or condas for them, and those take additional steps.
magicimport.py uses virtualenv behind the scenes, so when it imports tornado 4.5 because some script wants 4.5, it won't mess up the 6.0 that's already on your system. In some sense it just automates the virtualenv and pip install process on-the-fly, to give the effect that the script "just works".
There's a very cool (and succinct) blog post[1] showing how to abuse this in an interesting way where you can put the code of a module into a string and load it that way.
Not something I'd use in production, but it's a very clear way to see how both "finding a module" and "loading a module" works under the covers.
Edit: As an aside, I much prefer the way Perl does things in this space. It's much easier to define multiple packages either within a single file or across different files, and much clearer about what's happening underneath.
But despite being terrible, the Python import system is remarkably easy to get started in. And generally easy for beginners to work with (put all the code in the same folder or "pip install").
There are some lessons here that other languages would do well to learn. Trouble importing 3rd party libraries must be a kiss of death for beginner engagement.
I disagree. I find it hard to get started in python. There are loads of package managers, so I don't know which one to pick. There are multiple different rules for how imports work with local files. The standard library is full of functions that one should not use any more, and you need to know which is which. Defining a main entrypoint consists of checking a magic __name__ constant.
You have internalised all these quirks, and know how to work with/around them. Beginners haven't.
Start with pip, the official package manager. You don't have to start with virtual envs: I didn't.
I haven't run into any function which I "should not use".
You don't need an entry point, you can just write code in a file. Besides that, I don't see how much it differs from other languages with implicit entry points, where you need to match a certain name in your function.
Imports: yes, those are annoying and confusing. I still struggle with those.
Additionally: package managers and virtual envs are a pain in the ass. Every year we're getting a new one which is supposed to solve the problems from the previous one, but doesn't, and the cycle goes on. The language should really solve this at the core instead of requiring community fixing, as it is a core part of any serious development.
Honestly I have had very few issues with poetry, and the ones I've had, it's because I'm trying to use plugins, which are alpha right now.
Dependency resolution just works. Editable installs just work. Building just works.
Before that, I only had problems with virtual envs once, and it was due to bad hygiene with system python libs and deps. Moral of the story: don't. Unless it's necessary to bootstrap virtualenv or compiled libs, don't system install python. Good ol' get-pip.py and virtualenv.
I disagree too... you can't even refer to a file in the same folder without using some magic (adding current folder to the path), which is a huge barrier when starting.
I vividly remember the frustration when trying to make some very simple application imports working. There's a couple of gotchas, the biggest one being that you need to write imports based on where you expect to run your applications from. Perhaps obvious for experienced python devs, very much surprising for newcomers.
And then I was quite shocked by the state of package managers in python. You need to learn pip, venv (with references to "virtualenv"), these are too low level, so you find pipenv, which is unbelievably slow (installing a single dependency can take 10 minutes), so you need to learn to use it with "--skip-lock", but then you lose locking ...
I've never appreciated node's bundled "npm" so much before which mostly "just works".
Poetry is pretty great comparatively, since it handles both dependencies, locking and virtual environments for you, but it has slow resolution just like pip (since the new update), pyenv, pipenv, etc.
No. Python is a language I have to use infrequently, but I give up half of the time using a project found on GitHub because of missing dependencies, the need to install a package manager to install another package manager to install dependencies, etc. The other day I spent some time fixing a docker image that was working fine a few months ago but which was now failing because some Python package install returned an error.
On the contrary, C projects tend to build with 3 commands and C# (often way bigger) with a single command, and without having to do magic things around "virtual environments".
One of the reasons I gradually fell out of love with Python. To get Python right you need to remember more protocol than Queen Victoria's master of tea. And it is truly protocol in the sense that there is always this arbitrariness hanging around it.
Hi! I'm the author of this article. Thanks for posting it.
I've been programming in Python for quite a while but didn't really understand how the import system works: what modules and packages are exactly; what relative imports are relative to; what's in sys.path and so on. My goal with this post was to answer all these questions.
It's certainly a lot better than Ruby's require which just executes code and alters global virtual machine state. Not too different from C's #include.
My favorite is Javascript's. Modules are objects containing functions and data, require returns such objects. Simple, elegant. Completely reifies the behind-the-scenes complexity of Python's import, making it easily understandable.
I have no idea why they added a module system to the Javascript language itself. The old one was so awesome. Not sure what advantages the new system offers.
One of the simplest import systems I've seen was in q/kdb (like with most things in that language, everything is as simple as possible).
Imports work by simply calling `\l my_script.q`, which is similar to taking the file `my_script.q` and running it line by line (iirc, it does so eagerly, so it reruns the entire file whenever you do `\l my_script.q`, even if the file has been loaded before, which may affect your state. By contrast, Python `import` statements are no-ops if the module has already been imported).
The main disadvantage is that you risk your imported script overwriting your global variables. This is solved by following the strong (unenforced) convention that scripts should only affect their own namespaces (which works by having the script declare `\d .my_namespace` at the top).
I never found this system limiting and always appreciated its simplicity - whenever things go wrong debugging is fairly easy.
What does Python gain by having a more sophisticated import system?
Python avoids having to rerun the import over and over again.
Suppose that you are importing a larger project. Where your one import (say of your standard database interface) pulls in a group of modules (say one for each table in your database), all of which import a couple of base libraries (to make your database objects provide a common interface) that themselves pull in common Python modules (like psycopg2) which themselves do further imports of standard Python modules.
The combinatorial explosion of ways to get to those standard Python modules would turn the redundant work of parsing them into a significant time commitment without a caching scheme.
From the point of view of a small script, this kind of design may seem like overkill. But in a large project, it is both reasonable and common.
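A quick way to see the caching in action (the module chosen is arbitrary):

import sys
import json

# The first import executed the json package; later imports are
# just dictionary lookups in sys.modules.
print("json" in sys.modules)              # True

import json as json_again                 # no re-execution
print(json_again is sys.modules["json"])  # True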
Python imports aren't necessarily always importing Python source code -- they can be pyc (bytecode) files, or C API extensions, etc.
These are slightly more complicated than "load the script at this path".
There's probably a more detailed answer, in that historically, decisions were made that we're now stuck with. Python packages can and sometimes intentionally have import-time side effects, for example. They must be only run once, without relying on convention, or we break existing code.
Everything is an object, and it “just works” for most purposes. (Not nearly as many as I would like, but still; it has backwards-compatibility to consider, so I'll cut it some slack.)
If you need to start installing your own user packages, you need `pip` and then `venv` and then things get ugly, but for the usual case where the sysadmin deals with all that (or you're on Windows), it works quite well.
My least favorite is that import ordering matters in some situations. Like if I just run "organize imports", all of a sudden dependency cycles pop up and everything is broken. Certainly a sign of things being misimported/misorganized, but stuff happens as systems grow fast. And solving these issues is always incredibly time consuming.
Unless anyone knows of magical tools to help solve import issues?
Yes, it is unfortunate. Loading modules can have side effects as the loaded module is allowed to execute arbitrary code at load time. This is also a source of ordering issues.
Maybe some think this is only a theoretical problem and doesn't happen with "well-written" libraries. Well, here is one example which bit me in the past: https://stackoverflow.com/a/4706614/767442
The breakage described can only occur if there are dependency cycles, so topological sorting can't fix it. If there are no cycles, then the order of imports doesn't matter.
edit:
Actually I'm not even sure what kind of error we're talking about here. If two modules import each other and they both need access to the other's contents upon initialization, there is no ordering that will work. And if at most one needs access to the other, it will always work, no matter in which order they are imported. So I don't really know what the OP was talking about.
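For what it's worth, the unfixable case looks like this (made-up modules): both sides need the other's contents at init time, so no import order helps.

# a.py
import b
VALUE_A = b.VALUE_B + 1   # needs b to be fully initialised

# b.py
import a
VALUE_B = a.VALUE_A + 1   # needs a to be fully initialised

# `import a` dies with an AttributeError because b sees a half-initialised a;
# `import b` fails symmetrically.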
One interesting use case of overwriting `builtins.__import__` I've encountered was the automatic hooking by ClearML [0] (experiment tracking, ...) into all sorts of common libraries like Matplotlib, Tensorflow, Pytorch, and friends.
The implementation is surprisingly straightforward, once you've come to terms with the basic idea, see [1] and the rest of the `clearml.binding` package.
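The general shape of such a hook is small; a generic sketch (not ClearML's actual code):

import builtins

_original_import = builtins.__import__

def _patched_import(name, globals=None, locals=None, fromlist=(), level=0):
    module = _original_import(name, globals, locals, fromlist, level)
    if name == "matplotlib":  # react to libraries of interest
        print("matplotlib imported -- attach instrumentation here")
    return module

builtins.__import__ = _patched_import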
I use this for my Python-based Lisp: https://github.com/shawwn/pymen/blob/ml/importer.py
checks for "foo.l", and if found, compiles it to foo.py on the fly and imports that instead.It's so cursed. I love it.