A Heisenbug lurking in async Python

613 points by willm 2 years ago on hackernews | 201 comments

dataflow | 2 years ago

The notion of fire-and-forget is itself the problem. Even with threads, you should have them join the main thread before the program exits. Which implies you should hold strong references to them until then. Most people don't go out of their way to do this even when they're able to, but that's what you're supposed to do.

SamReidHughes | 2 years ago

I came here to write this comment. Also, you usually need to have some means of canceling the task -- otherwise you have to wait for them to finish, or you leak these stray lost tasks that are doing stuff, like manipulating the state of things.

This. Even if you hold a reference to the task, your program very likely has a bug. At some point you should always await it to see if it failed or not.

It's easy to miss this if you observe completion via a side-channel, for example an item removed from a queue. But this is also a bad way to write tasks in the first place: let them return meaningful data rather than mutate shared objects. That way you are forced to await them and your code becomes much more straightforward. It's counter-intuitive at first if you think in threads, because there you are more used to worker pools and such, whereas asyncio tasks can be written in a more linear way and don't need background workers to the same extent.

After having made this mistake several times, I've concluded one should almost never use a bare create_task. It's much better to place tasks into a top-level list of background tasks that is always awaited: that way they are both started automatically and always checked for errors.
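
For illustration, a minimal sketch of that pattern (the names `background_tasks` and `worker` are placeholders, not anything from the article):

    import asyncio

    background_tasks = []  # top-level list holding strong references

    async def worker(n):
        await asyncio.sleep(0.1)
        return n * 2

    async def main():
        for n in range(3):
            background_tasks.append(asyncio.create_task(worker(n)))

        # ... other work ...

        # always awaited: errors surface here instead of vanishing
        print(await asyncio.gather(*background_tasks))

    asyncio.run(main())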

dataflow | 2 years ago

> At some point you should always await it to see if it failed or not.

What you're saying is correct, but doesn't quite imply what I'm saying. I'm saying everything that you spawn asynchronously (be they threads, tasks, whatever) needs to be joined - even if they're no-ops whose success or failure is irrelevant. This is similar to how you should always make sure to deallocate memory that you dynamically allocate whenever you can, as a matter of good practice and good hygiene. Sometimes you can get away with not doing so, but you shouldn't really skip it unless you don't have a choice, as it makes the program logic clearer and can make the program more robust too. (e.g., imagine running your main() in a loop where threads are spawned each time but never guaranteed to join.)
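
The same hygiene with threads might look like this rough sketch (the `spawn` helper is hypothetical):

    import threading

    workers = []  # strong references to every spawned thread

    def spawn(target, *args):
        t = threading.Thread(target=target, args=args)
        t.start()
        workers.append(t)
        return t

    def main():
        spawn(print, "hello from a worker")
        # ... program logic ...
        for t in workers:  # join everything before exiting
            t.join()

    main()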

sidlls | 2 years ago

Conveniences like this library and other threading libraries make it easy for people to trivialize something (concurrent programming) that ought not be trivialized.

fernandotakai | 2 years ago

One of the reasons I was never "bitten" by this bug: whenever I use Python tasks, I save them in a collection so I can cancel all of them when the program is ready to quit.

It makes cleaning up after myself a LOT easier.
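
Roughly like this, as a sketch (the names are illustrative):

    import asyncio

    tasks = set()

    def track(coro):
        task = asyncio.create_task(coro)
        tasks.add(task)  # strong reference: the task can't be collected
        return task

    async def shutdown():
        for task in tasks:
            task.cancel()
        # wait for the cancellations; exceptions are returned, not raised
        await asyncio.gather(*tasks, return_exceptions=True)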

bornfreddy | 2 years ago

Wow. What a strange design decision, as evidenced by the sheer number of developers who don't / didn't know about this (myself included). I hope this gets fixed instead of just documented.

jcheng | 2 years ago

Agreed, I’m really surprised at all the comments defending this behavior. I suspect there is a non-obvious reason why it’s this way, but “you should’ve read the docs” and “but why wouldn’t you hold your own strong reference” are weird takes IMHO.

hitekker | 2 years ago

Bugs stemming from the architecture of a poorly specified system become insecurities for the people who rely on that system.

One of the major reasons why Python's leadership refused to optimize Python's performance, besides Guido's intransigence, was that they treated the CPython implementation as the specification.

Legions of script kiddies built their programming identities around the belief that python must be slow, because to admit otherwise would require changing the system.

remram | 2 years ago

Not just that, it is also annoying to work around. Sometimes you really do want to kick off a background task and have it run to completion, even if you can't keep a handle on it, just like a thread. This is surprisingly hard to do.

You can always do something extremely goofy like this (but you shouldn't):

    import asyncio

    _running_tasks = set()  # strong references keep the tasks alive

    def my_create_task(coro, **kwargs):
        async def _coro():
            try:
                return await coro
            finally:
                # drop our reference once the task has finished
                _running_tasks.discard(task)

        task = asyncio.create_task(_coro(), **kwargs)
        _running_tasks.add(task)
        return task
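
For comparison, the pattern the asyncio docs themselves recommend uses a done callback instead of a wrapper coroutine (`some_coro` is a stand-in, as in the docs):

    background_tasks = set()

    task = asyncio.create_task(some_coro())
    background_tasks.add(task)                        # strong reference
    task.add_done_callback(background_tasks.discard)  # dropped when done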

boomskats | 2 years ago

As someone who happens to be eternally grateful to the author for his contribution to the Python ecosystem [0], I kinda feel like this comment thread is overreacting to his overreaction. When I look at this post all I see is a useful, well explained, bite-size writeup that a search engine might recommend to someone looking for help in writing async Python.

Maybe it's because a bunch of my friends are Scottish and I get their sense of humour.

[0]: https://rich.readthedocs.io/ (yes I'm talking about the fancy new progress bar that pip got recently)

quietbritishjim | 2 years ago

I do wish he dwelt on task groups a bit more at the end. Many comments here seem to have missed that bit. They're not just a handy way of executing a hack. Instead, they're a revolutionary way (ok, maybe that's a bit strong, but not much) to structure your async program to avoid a whole host of bugs.

A code snippet would have been nice, or a link to the blog post that introduced them (in trio, another async library): https://vorpus.org/blog/notes-on-structured-concurrency-or-g...

ravloony | 2 years ago

I also came here to say this. Structured concurrency gets rid of these types of problems.

quietbritishjim | 2 years ago

To me, the surprise here is that usually you don’t expect Python finalizers to do something like this: when they dispose something, it’s usually unobservable from the perspective of the program, e.g. an unreachable file descriptor. Here, the runtime is disposing something that is still observably in progress, which is surprising behavior.

[Deleted] | 2 years ago

[Empty / deleted comment]

This issue doesn't exist with Trio's structured concurrency model. In other words, the problem is already solved.

nbadg | 2 years ago

I'll +1 the Trio shoutout [1], but it's worth emphasizing that the core concept of Trio (nurseries) now exists in the stdlib in the form of task groups [2]. The article mentions this very briefly, but it's easy to miss, and I wouldn't describe it as a solution to this bug, anyways. Rather, it's more of a different way of writing multitasking code, which happens to make this class of bug impossible.

[1] https://github.com/python-trio/trio

[2] https://docs.python.org/3/library/asyncio-task.html#task-gro...
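
A minimal example of the stdlib task groups from [2] (Python 3.11+; `work` is a made-up coroutine):

    import asyncio

    async def work(n):
        await asyncio.sleep(0.1)
        return n * 2

    async def main():
        async with asyncio.TaskGroup() as tg:
            tasks = [tg.create_task(work(n)) for n in range(3)]
        # the `async with` block only exits once every task has finished,
        # so a task can never be garbage collected mid-flight
        print([t.result() for t in tasks])

    asyncio.run(main())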

Tanjreeve | 2 years ago

Oh good so now we can all move to this years Async flavour in Python.

mhils | 2 years ago

The note in the official Python documentation was only added in September 2022 [1], so no wonder this comes as a surprise to many!

[1] https://github.com/python/cpython/commit/6281affee6423296893...

edfletcher_t137 | 2 years ago

This is a great blog post. Concise, lacking fluff or extraneous prose, it gets right to the point, presents the primary-source reference and then gets right to the solution. A bit of editorializing in the middle but that's completely allowed when writing this tightly. Well damn done, OP.

And also it's great information that I - like I'm sure many of you - also never noticed. THANK YOU!

[Deleted] | 2 years ago

[Empty / deleted comment]

What does this add that isn't already right there in the documentation?

nkrisc | 2 years ago

If there was nothing to add then there wouldn’t be loads of projects on GitHub making exactly this mistake.

Jtsummers | 2 years ago

It draws attention to a problem that a lot of people have created for themselves by not reading the documentation (or not recalling it if they read it). I guess the author could have just linked the documentation but then they couldn't have added the additional context of the github search demonstrating how common it is.

newaccount74 | 2 years ago

I must have looked through the docs for create_task a dozen times while trying to figure out how async/await works in Python but still managed to overlook this part.

edflsafoiewq | 2 years ago

That is unsurprising. It was first added as a brief note only in 3.9, and expanded to its present length only in 3.10.

wnolens | 2 years ago

Same.

klyrs | 2 years ago

The author doesn't go into much detail on that point: this warning should be present in the documentation of every Python library that uses create_task and returns the task to the user, unless the library stores those tasks in a collection as recommended -- at which point the library author had better roll their own garbage collection!

isoprophlex | 2 years ago

Well, I don't know, I kinda miss the human angle. I'd have loved to first read six paragraphs about how the author's grandmother raised them on home grown threads and greenlets :^)

nickjj | 2 years ago

> I'd have loved to first read six paragraphs about how the author's grandmother raised them on home grown threads and greenlets.

With recipes, often your problem is that you want to learn how to make something, so having the steps listed out is the most important thing. The story behind the recipe isn't needed to solve your problem, but in tech the story around the choice is important. The "why" matters, and I really like hearing about what led someone to use something first. Often that's more important than, or equally as important as, the implementation details.

It wouldn't make sense for this post given its title but if someone were making a post about why they chose to use async in Python I'd expect and hope that half of the post goes into the gory details of how they tried alternatives and what their shortcomings were for their specific use cases. That would help me as the reader generalize their post to my specific use cases and see if it applies.

bialpio | 2 years ago

Off-topic but the life story is there to make them eligible to be protected by copyright. IANAL.

Source: https://copyrightalliance.org/are-recipes-cookbooks-protecte...

iudqnolq | 2 years ago

For some reason whenever this comes up there'll be one person saying "I bet you didn't know it's for copyright" and another saying "I bet you didn't know it's for SEO". I've yet to see either prove anything beyond that it's a plausible explanation that could fit the minimal known facts.

jackthetab | 2 years ago

I can't find the reference atm, but Jeff Jarvis (https://en.wikipedia.org/wiki/Jeff_Jarvis) says that this _is_ to get around copyright law; the same technique was being done over a hundred years ago (possibly more, which is why I want the reference!). Instead of blogs, think pamphlets.

The More You Know...

gigatexal | 2 years ago

Really? Hmm. Had no idea

flandish | 2 years ago

Interesting. I always thought it was search engine optimization.

aidenn0 | 2 years ago

SEO is definitely a big part of it; Google penalized pages where people closed or navigated away quickly.

fbdab103 | 2 years ago

I immediately bounce from those Stackoverflow clones that keep appearing at the top of searches. So I am wondering how much this is still weighted in the scores.

jonas21 | 2 years ago

You might. But many people don't. They just want an answer and don't care if it's a clone or not.

[Deleted] | 2 years ago

[Empty / deleted comment]

kevin_thibedeau | 2 years ago

Leaked Google code:

  if(bounced && hosts_doubleclick_ads) pagerank++;

rmbyrro | 2 years ago

SEO makes total sense. I always add grandma keywords when I'm searching for Python stuff on Google.

Like: "grandma, how the hell have I still not memorized the API and keep needing to resort to the same doc pages again and again?"

Now I trained ChatGPT with grandma letters from when I was young, so it will answer just like if it was my grandma.

water-your-self | 2 years ago

It's engagement optimization. AdSense pays more if you spend more time on the page.

yunohn | 2 years ago

When is the last time you heard of online recipe blogs enforcing copyright claims on other blogspam? Ridiculous.

The real reason is simple: people who write recipes aren't robots - they're expressing their stories and emotions while explaining how to make food that's dear to them.

fsckboy | 2 years ago

> people who write recipes aren't robots - they're expressing their stories and emotions while explaining how to make food that's dear to them.

the people who write recipes aren't robots, they're narcissists and various forms of insecure and seek validation in the form of attention and adulation from others. That's not a bad thing, it's all too human and we should embrace, not stigmatize, the needy, but if all you want is a recipe rather than to be an acolyte it can seem like a big ask.

You enjoyed time with your grandparents, and you remember it? Welcome to the club! And I remember family as much more complicated than simply being all fun, and I feel like you might be Norman Rockwelling a bit.

yunohn | 2 years ago

> they're narcissists and various forms of insecure and seek validation in the form of attention and adulation from others

This is incredibly insensitive and judgemental. Not sure what I expected from HN, I guess...

Why are these "narcissistic" people obliged to provide you with formulaic recipes for free? If the cost is perusing their feel-good story, I feel it's a fair trade-off.

afiori | 2 years ago

The theory that recipes are written to make you scroll at least a full screen to show more ads seems much more plausible

rendaw | 2 years ago

For recipes, it also signals effort and provides a hint about quality. There's a lot of low effort, broken, "would anyone enjoy eating this?" recipes out there dumped on recipe sites. A few pages of text says that the author thought the recipe was at least worth that amount of effort, and usually confirms that the author thinks the recipe is at least as good as X other recipes, etc.

throwaway81523 | 2 years ago

There's a similar thing in tkinter but I guess users discover it faster, since the failure if you don't save the reference shows up fairly quickly.
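
That tkinter gotcha, for reference: a PhotoImage whose last reference is a local variable gets garbage collected when the function returns, and the image silently fails to render. A sketch (the file name is made up):

    import tkinter as tk

    def build_ui(root):
        photo = tk.PhotoImage(file="icon.png")
        label = tk.Label(root, image=photo)
        label.image = photo  # keep a reference, or the image is collected
        label.pack()         # and silently fails to render

    root = tk.Tk()
    build_ui(root)
    root.mainloop()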

Lammy | 2 years ago

I experienced a heisenbug exactly like this in Ruby when trying to `while case Ractor::receive`: https://github.com/okeeblow/DistorteD/blob/dd2a99285072982d3...

sidlls | 2 years ago

A better title would be “Bug lurks in incorrect usage of async Python”.

The library documentation clearly calls this out, and incorrect implementations, while buggy, do not mean that async Python is itself buggy.

perlgeek | 2 years ago

Thank you! I just did a quick `git grep` in a work code base and found one clear instance of this bug, and two more locations where I'm not 100% certain whether references are kept around long enough. Made a note to open a bug on Monday :-)

Another surprise in python's base library: I knew that re.search searches for a regex match in a string, so I thought that re.match would match the whole string. I was wrong, re.match only anchors the regex at the start, not the end. re.fullmatch anchors it at both sides.

I felt very stupid when I found out; I started at my current work as a Perl developer and learned Python for a new job; but there are two more Python developers (with previous Python experience outside this company) on the same project, and none of them noticed the mistakes I made based on this misunderstanding.
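
For anyone with the same misconception, the difference in one snippet:

    import re

    re.search(r"\d+", "abc123def")   # matches "123" anywhere in the string
    re.match(r"\d+", "123def")       # matches "123"; anchored at the start only
    re.match(r"\d+", "abc123")       # None: no match at the start
    re.fullmatch(r"\d+", "123def")   # None: the whole string must match
    re.fullmatch(r"\d+", "123")      # matches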

zzzeek | 2 years ago

I think asyncio is kind of neat for what it's good at, but beginner programmers who have never written code before are going directly to using Python asyncio (I know this because they tell me so when they post in SQLAlchemy discussions). This is just wrong.

slewis | 2 years ago

The response here makes me think most commenters don’t have experience with this particular footgun.

To clarify: Python can gc your task before it starts, or during its execution if you don’t hold a ref to it yourself.

I can’t think of any scenario in which you’d want that behavior, and it is very surprising as a user.

Python should hold a ref to tasks until they are complete to prevent this. This also “feels right”, in that if I submit a task to a loop, I’d think the loop would have a ref to the task!

It’d be interesting to dig up the internal dev discussions to see why the “fix this” camp hasn’t won.

kevincox | 2 years ago

I can see this behaviour being useful if you are no longer interested in the result of a "pure" task. For example, imagine fetching some data via HTTP. If you no longer need the response, canceling the request could make sense.

But I agree that this is unexpected and most code probably isn't ready for being cancelled at random points. (Although I guess in Python your code should be exception safe which is often similar)
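
Cancellation in that style might look like this sketch (`fetch` is a stand-in for a real request):

    import asyncio

    async def fetch():
        await asyncio.sleep(10)  # stands in for a slow HTTP request
        return "response"

    async def main():
        task = asyncio.create_task(fetch())
        await asyncio.sleep(0.1)
        task.cancel()            # we no longer need the response
        try:
            await task
        except asyncio.CancelledError:
            print("request cancelled cleanly")

    asyncio.run(main())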

BerislavLopac | 2 years ago

A possible solution might be a built-in dunder flag to explicitly tell the interpreter not to get rid of a specific object. Something like __keep_alive__ or similar.

hgomersall | 2 years ago

That would make tidying up rogue tasks impossible. Of course we all like to think we do cancellation perfectly, but it's nice to know that the task scheduler has your back.

Edit: I don't quite understand why a user would expect a task to remain live _after_ the last reference to it has been dropped...

syngrog66 | 2 years ago

Java had a similar but inverse problem in early versions. A counter-intuitive behavior that bit people and caused leaks.

If you instantiated a Thread, and then start() was never called, that thread object would leak. And thus potentially an entire graph of objects, via the references chains beneath it.

Obviously a thread that is never started seems pointless by design. But it could happen easily if, for example, an error happened or an exception was thrown at some point between the instantiation line and the call of start().

The root cause was because Sun's programmers had made the early implementations of Thread get added to a ThreadGroup by default, under the hood. What would happen is that ThreadGroup stayed alive/reachable and thus it kept your app's thread object reachable too, and thus the GC would never clean it up. It was never eligible.

It ended up being the cause of a few weird leaks we saw in production.

IIRC in Java 1.4 or 1.5 Sun fixed it by ensuring the thread got cleaned up in those cases.

qwertox | 2 years ago

I'm using a lot of `asyncio.get_event_loop().create_task(...)` calls without assigning the task to a variable, but the docs [0] don't mention anything regarding this method on the loop object. Can I assume that I'm safe?

[0] https://docs.python.org/3/library/asyncio-eventloop.html?hig...

quietbritishjim | 2 years ago

I'm pretty sure that's equivalent to asyncio.create_task(), it just gives you the opportunity to specify a different loop if needed. I think the docs are just less explicit because they're aimed more at power users (since most people don't deal with multiple event loops).

samwillis | 2 years ago

This is one of many reasons I'm sceptical of the current trend in Python to "async all the things". The nuance to how it operates is often opaque to the developer, particularly those less experienced.

GUI toolkits (like Textual) however are a really good use case for Asyncio. Human interaction with a program is inherently asynchronous, using async/await so that you can more cleanly specify your control flow is so much better than complicated callbacks. Using async/await in front end JS code for example is a delight.

Where I'm particularly unconvinced of their use is in server side view and api end point processing. The majority of the time you have maybe a couple of IO ops that depend on each other. There is often little that can be parallelised (within a request) and so there are few performance gains to be made. Traditional synchronous imperative code run with a multithreaded server is proven, scalable and much easier to debug.

There are always places where it's useful though, things such as long running requests (websockets, long polling), or those very rare occurrences where you do have many easily parallelizable IO ops within one short request.

heavyset_go | 2 years ago

> Where I'm particularly unconvinced of their use is in server side view and api end point processing. The majority of the time you have maybe a couple of IO ops that depend on each other. There is often little that can be parallelised (within a request) and so there are few performance gains to be made. Traditional synchronous imperative code run with a multithreaded server is proven, scalable and much easier to debug.

Python doesn't have multithreading that scales or supports real parallelism. asyncio has very measurable performance benefits for exactly that use case you've mentioned versus threaded servers.

zzzeek | 2 years ago

Sorry, that's not accurate. Asyncio and threading offer the same variety of "parallelism", which is that both can wait on multiple io streams at once (the gil is released waiting on io). Neither offers CPU parallelism, unless lots of your CPU work is in native extensions that release the gil. In that unusual case, threading would offer parallelism where asyncio wouldn't.

Asyncio's single advantage is you can wait on lots of io streams, like many thousands, very cheaply without having to roll non blocking IO queueing code directly.

heavyset_go | 2 years ago

I didn't say that asyncio offered parallelism, I'm pointing out that normal assumptions about multithreading you'd make with other languages don't always apply to Python. You'd typically assume that threads offer parallelism, a property you might choose to use them for over something like single-threaded asyncio.

I've found that for even IO bound workloads, the amount of throughput plateaus when using a relatively small amount of threads despite the GIL being released on IO.

zzzeek | 2 years ago

sorry for misreading that from your post! my own benching with threads vs. asyncio has never found any performance difference between the two approaches (asyncio slightly slower). if you need very wide throughput, then yes asyncio is better. otherwise, it's very difficult to create equitable comparisons between threaded and asyncio code.

Topgamer7 | 2 years ago

These days with graphql, or complex microservices architectures, you could have multiple hops to fulfill the original request.

Flask sync will hold that thread hostage until the request is done, whereas async, with properly used async libs, will allow other requests to proceed.

We often have medium sized reports take seconds. That is a lot of time to wait. And would just end up bloating your service scaling to handle more connections.

Any service with decently long lived network requests will benefit from event loop handled scheduling.

jsyolo | 2 years ago

I don't think he was referring to python when he mentioned server-side, but to real multi-threaded runtimes.

traverseda | 2 years ago

>Where I'm particularly unconvinced of their use is in server side view and api end point processing.

Sure, performance isn't going to get better, but for websockets and server sent events the occasional long-lived async task can be great. Especially when you need to poll something, or check in on a subprocess.

nbadg | 2 years ago

The thing is, there's a lot more nuance to it than this. Async/await is part of the language syntax in python, but asyncio is only one particular implementation of an event loop framework to power it. But really what async/await provides is a general-purpose cooperative multitasking syntax. This allows other libraries to implement their own event loop frameworks, each with their own different semantics and considerations (the two best-known alternatives being Curio and Trio). At a language level, there's nothing even forcing you to use async/await for async IO -- you could, if you really wanted, probably write a library that used it to start threads and await their completion.

So you have, from highest-level to lowest-level: application code, async/await language syntax, the event loop framework, and then the implementation of the event loop itself. The OP article concerns a peculiar implementation detail in the lowest level that makes it very easy to write bugs at the highest level.

But that means that even if you do "async all the things", you'll only encounter this situation if you write your application code in a particular way. It just so happens that "in a particular way" is, in this case, the overwhelming majority of how people write it, which is, of course, why the OP article is relevant.

heavyset_go | 2 years ago

> The OP article concerns a peculiar implementation detail in the lowest level that makes it very easy to write bugs at the highest level.

Are other async implementations using the asyncio.Task abstraction? I haven't looked into it, but I assumed that asyncio.Task was tied to the asyncio implementation and event loop.

nbadg | 2 years ago

asyncio.Task is part of the asyncio event loop framework. So any event loop implementation that conforms to that framework will have to have one (including the default event loop implementation that ships with asyncio). So for example, uvloop, which is an alternative event loop implementation that works with the asyncio event loop framework, also uses asyncio Tasks.

Other event loop frameworks can do whatever they want, and presumably, wouldn't be importing from asyncio. Whether they have a similar abstraction is completely up to the framework itself. Trio, for example, doesn't have a concept of a task object at all, because it enforces a strict tree structure for tasks.

LtWorf | 2 years ago

> This allows other libraries to implement their own event loop frameworks

At work someone replaced the default library with another faster implementation.

Then the unix socket listener task was not working.

A few hours of git bisect later, I found out the offending commit was the 1 line switching the event loop. Seems the fast implementation didn't implement unix sockets and just had "pass" in the function.

pdonis | 2 years ago

> GUI toolkits (like Textual) however are a really good use case for Asyncio.

Only if the GUI toolkit is explicitly written to be asyncio-aware and use asyncio's event loop. Textual appears to be written specifically to do that.

However, other GUI toolkits that I'm aware of that have Python bindings aren't written that way. Qt, for example, uses its own event loop, and if you want anything other than a GUI event to be fed into Qt's event loop so your event-driven code can process it, you have to do that by hand and make sure it works. There is no point in even trying to use another event loop, such as Python's asyncio event loop, since that loop will never run while Qt's event loop is running.

samsquire | 2 years ago

I am a huge fan of parallel and async code. I spend a lot of time researching it and trying to design software that is easily parallelisable.

Many GUIs use the event/message pump pattern, such as Windows 32 API. Qt does something with its event loop (QEventLoop)

Threads are a rather low level instrument to get background tasks going because the interface between the main thread and the threads is rather omitted.

In Java you could use a ConcurrentLinkedQueue. And in Python you can use JoinableQueue.

I am heavily interested in this space because I want to write understandable software that anybody can pick up and work with. I worked on a JMS log viewer that used threads but would crash with ConcurrentModificationException due to not being thread safe. I changed it to be thread safe but its performance dropped through the floor. In my learnings since then, I should have sharded each JMS connection topic to its own thread, or multiplexed multiple JMS topics per thread and looped over them. The main thread can interrogate the threads with a lock, which should be faster than every thread trying to acquire the lock. It would be driven by the main thread but the work is done in the background. The threads can keep the fetched messages in memory until the main thread is ready for them.

I think with the right abstraction, thread safety can be achieved and concurrency shouldn't be something to be afraid of. It is very difficult and challenging working at the low levels of concurrency such as a concurrent browser engine. (I've not done that though.)

This is why languages such as Pony lang, Inko, Cyber and Erlang, Elixir are so promising. We can build high performance systems that parallelise.

Writing an async/await pipeline that looks synchronous is far easier to understand and maintain than nested callbacks. So I can see where async is useful. I just hope we can design async software to be simpler to maintain and extend.

whoopdeepoo | 2 years ago

I don't write any colored function code in python, I'd much rather work with process/thread pools

hgomersall | 2 years ago

I really don't get the whole coloured function thing. How's it not just the function signature? You might as well claim a new function argument makes a new colour. Granted, all of my use of async is in Rust in which the compiler picks this stuff up, so maybe in python there are other concerns I'm missing.

Animats | 2 years ago

Me too, but threading is botched in Python. Not just the Global Interpreter Lock. Some Python packages are not thread-safe, and it's not documented which ones are not. Years ago I discovered that CPickle was not thread safe, and that wasn't considered a problem.

jsyolo | 2 years ago

Since the GIL is there and it prevents many multithreading issues from happening, why wouldn't Python packages be thread-safe?

Jorge1o1 | 2 years ago

You can still have thread safety issues with the GIL in place, because globals and other data is shared between threads.

For example, you can put a dictionary at the module level, thread A can set a key in that dictionary like "name", thread B can overwrite it, and then thread A comes back, does dct["name"] and gets an unexpected answer.

This is a relatively easy mistake to make, a lot of python code has module level variables.
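
A tiny demonstration of that class of bug (illustrative names; the race only fires nondeterministically, which is what makes it nasty):

    import threading

    state = {}  # module-level shared dict

    def visitor(name):
        state["name"] = name
        # another thread can run between the write above and the read below
        print(f"hello, {state['name']}")

    threads = [threading.Thread(target=visitor, args=(n,))
               for n in ("alice", "bob")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()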

michael_j_x | 2 years ago

I am not sure I agree that the GUI is a good use case for async. A human interaction with the program must almost always pre-empt whatever the program was running, so I cannot see how a cooperative multi-threading runtime like async Python can work in such a scenario.

kodablah | 2 years ago

It is for this reason in Temporal Python[0], where we wrote a custom durable asyncio event loop, that we maintain strong references to tasks that are created in workflows. This wouldn't be hard for other event loop implementations to do too.

0 - https://github.com/temporalio/sdk-python

make3 | 2 years ago

he never said it was hard, his point is that it's unintuitive & a lot of people don't know or don't remember

kodablah | 2 years ago

I mean the default asyncio event loop can be replaced/extended where you won't have to know/remember on each create_task. But yes, it is an unintuitive default.

NelsonMinar | 2 years ago

Does anyone understand why the event loop only keeps weak references to tasks? It'd seem wise to do something to stop it from being garbage collected while running, maybe also while waiting to run.

coopsmoss | 2 years ago

I agree, I think this is very unpythonic behavior

masklinn | 2 years ago

Only guess I'd have is to protect the system against infinite-loop tasks, but I don't remember any other runtime caring, and a task which never terminates seems easier to diagnose than one which disappears on you.

kortex | 2 years ago

Because it's almost always the case that the consumer is going to keep a reference to the task in some way, so that is the logical choice for the "primary owner" of the task. Python doesn't have ownership per se like rust, but if you keep more than one hard reference to an object around, it'll prevent collection, so in cases such as this it makes sense to designate one primary owner and have all other references be weakref.

skitter | 2 years ago

> if you keep more than one hard reference to an object around, it'll prevent collection

Which is the behavior the parent comment asks for.

kevin_thibedeau | 2 years ago

That creates a new problem that you have to remember to kill unwanted threads.

[Deleted] | 2 years ago

[Empty / deleted comment]

JonChesterfield | 2 years ago

Python's reference counted - if the event loop holds a reference until the task has run, then drops it, then everything behaves sanely. That's not a cycle. It just means the task that was scheduled will execute, which seems like the right default.

anthomtb | 2 years ago

Well, looks like I know what I am doing first thing on Monday. I converted a bunch of code to asyncio a while back. I have yet to run into any heisenbug in that code and want to keep it that way.

cpburns2009 | 2 years ago

I've been working on a PySide6 application recently using asyncio. I read the docs but totally overlooked the requirement to hold references to tasks created with `create_task()`.

dehrmann | 2 years ago

Eww. What's especially nasty is this is the opposite behavior of threads.

aeturnum | 2 years ago

I really think this writer doth protest too much.

Yes, the base async interface is confusing and overly complex. It's a downside! As they note lots of people have stepped in to provide better helpers (like TaskGroups) - but these are the docs for the base library!

> But who reads all the docs? And who has perfect recall if they do?

Everyone reads the docs? That is why you don't need perfect recall because you can read them whenever you want.

Python has lots of confusing corner cases ("" is truthy, you need to remember to call copy [or maybe deepcopy!] sometimes, all the other situations where you confuse weak v.s. strong references). They cause really common bugs. It's just a hazard of the language in general and the choices it makes (much like tasks being objects is a hazard). I do understand why people think they can throw away task references (based on other languages) - but this is Python! The garbage collector exists and you gotta check if you own the object or something else does.

Edit: this feels like an experienced Python developer, who has already internalized all the older, non-async Python weirdness, being taken aback by weirdness they didn't expect. Like, I feel you, it does suck - but it's not a bug that values you don't retain may get garbage collected.

He didn't even have to read "all the docs" - just the ones that pertain to the function he is using. And then not ignore the section marked "Important" and the highlighted "Note".

richbell | 2 years ago

What if he read the docs for that function prior to the "important" note being added?

Karunamon | 2 years ago

>Everyone reads the docs?

The author goes on to say they found this pattern lurking in various projects on github. So, no. The problem is that this behavior is subtle, not intuitive, and unless you are reading the actual documentation top to bottom (and not just the function signature and first paragraph from the pop up in your IDE) you will likely get bitten by this.

What is the point of your comment? The author shouldn't have called out the upturned rake in the darkened shed?

rollcat | 2 years ago

> The author goes on to say they found this pattern lurking in various projects on github.

I'd call it an anti-pattern. If you spawn a process/thread, and never wait/join it, it means you don't actually care what it does, if it crashes, etc. I don't see a problem with Python's behavior here.

mannykannot | 2 years ago

The chances are that most people doing this are introducing some nondeterminism that they did not expect, and will have a hard time dealing with it when it bites them in the ass.

What's more to the point is that I am going to have a hard time when it leads to a serious outage or security violation at some major corporation that has become too pervasive in its reach for me to avoid its influence. No amount of schadenfreude is going to compensate for that.

hn_throwaway_99 | 2 years ago

Seriously?? I think the vast majority of developers would find it very surprising that the Python runtime would GC a task in the middle of execution. I would expect that the runtime would by default do what the example in the doc says, which is keep a strong reference to the Task object until it finished execution.

aeturnum | 2 years ago

I wouldn't say shouldn't - they are free to do what they want. But this is a blog post about something that can trip you up that the docs highlight - which the author calls a "heisenbug". The author doesn't even have a suggestion for the docs, which already calls out the problem they encountered, they just note that there are helpers for this problem (which is true).

The point of my comment is that subtle, non intuitive things like this are all over Python and, while this one is particularly bad, this blog post makes it seem like more of an aberration than it is.

IshKebab | 2 years ago

> Everyone reads the docs?

Wow I've heard people say that everyone should read all of the docs (which isn't really true) but I've never heard anyone claim that everyone does read all of the docs! Wild.

raverbashing | 2 years ago

> "" is truthy

Humm, no? Unless you mean ("",)

    >>> not ""
    True

aeturnum | 2 years ago

Oh, sorry, you are right - "" is false-y, even though it's a valid empty value. So it's hard to tell the difference between a value not being filled and a value being filled with an empty value.

ex:

  answers = {}
  answers["I exist"] = ""
  if answers["I exist"]:
      print("a")
does not print.

fbdab103 | 2 years ago

I guess I am too deeply in the Python ecosystem to see a problem here. Unless you want to check for the existence of "I exist"? In which case, the Python Way would be

  answers = {}
  answers["I exist"] = ""
  if "I exist" in answers:
      print("a")

pacaro | 2 years ago

Maybe

  ...
  if answers.get('I exist'):
    print('a')
Which is why you should always explicitly check for None if that is your intent.

aeturnum | 2 years ago

It's not a problem? The async interface isn't a problem either. It's just a thing you have to remember about python: "most input is truthy except for the input that isn't"

"Most of the time you don't disrupt your program by not keeping the returned reference in scope except for when you do"

It's just a thing that trips people up.

The alternative here (which would make your example work) is that dictionary lookups return some falsy sentinel value when the key does not exist, just like in javascript. In javascript, this has made a lot of people very angry and been widely regarded as a bad move.

Those values tend to propagate out and make a mess, sneaking into your stored data or causing errors in unexpected places -- fixing this feels like one of the main advantages of using things like typescript, and it's just not an issue in python.

It's not a problem, because the alternative is way worse. It's just a different language design than the one you're used to.

You could argue about whether or not truthiness makes sense as a concept (personally I think not), but the way it's defined in python is quite coherent and useful in practice.

heavyset_go | 2 years ago

> The alternative here (which would make your example work) is that dictionary lookups return some falsy sentinel value when the key does not exist, just like in javascript. In javascript, this has made a lot of people very angry and been widely regarded as a bad move.

You get that behavior by using dict.get(): answers.get("I exist") returns None, which is falsy.

dwattttt | 2 years ago

> It's just a thing you have to remember ...

The more of these things there are, the more brainpower you devote to remembering the right way to do things; if you don't you introduce bugs, a subtle, painful one here.

heavyset_go | 2 years ago

"Empty containers are falsy" is a Python fundamental, this isn't a subtle bug, but an obvious one.

> "most input is truthy except for the input that isn't"

Can you think of any other value that has len(x) == 0 but is truthy?

It’s quite simple. Empty collections and zeros are false, almost everything else is true.

The real head scratcher is that midnight is false.

fbdab103 | 2 years ago

Truthy is a Pythonic core principle of the language. It is not an edge case phenomenon in the language which I would expect a regular practitioner to confuse.

https://docs.python.org/3/library/stdtypes.html#truth-value-...

aeturnum | 2 years ago

I mean, I've seen bugs around that in code I've worked on and I've created bugs where it's a factor.

Weakrefs are also a core part of the language: https://docs.python.org/3/library/weakref.html . You can't use python without using them.

fiddlerwoaroof | 2 years ago

What I learned when I wrote Python professionally was "never rely on truthiness": explicitly writing out a boolean expression that does what you want is more explicit ("explicit is better than implicit", PEP 8) and prevents a whole class of bugs down the line.

nemetroid | 2 years ago

PEP 8, which you mention, explicitly recommends relying on truthiness:

> For sequences, (strings, lists, tuples), use the fact that empty sequences are false:

  # Correct:
  if not seq:
  if seq:

  # Wrong:
  if len(seq):
  if not len(seq):

AeroNotix | 2 years ago

PEP8 is touted a lot as if it is a perfectly correct tome of ... correctness. I've worked in Python long enough to know that it both doesn't cover everything and the advice is sometimes actively bad.

raverbashing | 2 years ago

Amen

I mean, I'm not blaming PEP8 per se (A Foolish Consistency is the Hobgoblin of Little Minds) but it has a tendency to be taken as gospel by a lot of people

Funny enough, those who push for more strict adherence are also the ones that neglect other aspects (speed, alg. complexity, etc) especially readability (and no, PEP8 compliant code is not necessarily the most readable)

heavyset_go | 2 years ago

> if answers["I exist"]:

    if "I exist" in answers:
         ...

wizzwizz4 | 2 years ago

> So it's hard to tell the difference between a value not being filled and a value being filled with an empty value.

  >>> answers = {}
  >>> if answers["I don't exist"]:
  ...     print("a")

  Traceback (most recent call last):
    File "<pyshell#3>", line 1, in <module>
      if answers["I don't exist"]:
  KeyError: "I don't exist"
The method you're trying to use doesn't work anyway: it doesn't matter that it's confusing. You'd have the same problem with the value False.

hn_throwaway_99 | 2 years ago

I mean, that's the fundamental reason it's called falsey, that is, Python does automatic type coercion if it is evaluating things in a boolean context. FWIW this is the exact same behavior in javascript.

Etheryte | 2 years ago

I think you may be too bold with the assumption here, personally I would wager that the majority of people who write Python don't even know Python has official docs outside of a site called Stack Overflow.

leni536 | 2 years ago

Considering how many times I need to add site:python.org to my python search queries to actually get to the docs, I assume that a surprisingly low number of python developers actually read the docs.

0x008 | 2 years ago

If you use DuckDuckGo you can prefix your search with "!py3".

brundolf | 2 years ago

In 15 years of programming I’ve never read the docs for anything front-to-back. I look at the docs when I have a specific question

It’s crazy to suggest something this surprising wouldn’t be a problem just because it’s technically in the docs

sidlls | 2 years ago

This is unfortunately all too common. I am often dinged by my manager about my throughput when investigating the integration of new libraries because I read the docs, front-to-back, as part of my research and that takes time.

brundolf | 2 years ago

I don't think it's really appropriate or feasible or even helpful for most things. At least for myself, if I read through the entirety of a language's documentation I wouldn't remember half of it, and I probably wouldn't need everything I remembered.

Language and library designs should optimize for least-surprise.

sidlls | 2 years ago

Most libraries are quite poorly documented (to my taste, anyhow). It is certainly worth spending the time reading, at the very least, the documentation pertinent to the components of the library you intend to use.

iforgotpassword | 2 years ago

> Everyone reads the docs?

For Python? The language where everyone just cobbles together random code from the internet and other repos? I can totally see how this mistake happens left and right. The bar of entry for this language is way too low to assume only rigorous senior devs use it.

bandyaboot | 2 years ago

He doesn’t really get into what makes this a Heisenbug, only that it’s indeterminate in nature. Would attaching a debugger/stepping through the code make it less likely that your task would get garbage collected out from under you?

Izkata | 2 years ago

You're probably going to need a reference to the task in order to inspect it in the debugger. Creating that reference prevents the bug.

foobarbecue | 2 years ago

Yeah, he seems to be re-defining the term to mean "a bug that occurs occasionally depending on system state" as opposed to "a bug that changes behavior when you observe it closely e.g. in a debugger."

macintux | 2 years ago

The first is a common way of using the term Heisenbug. I first heard it used that way 10 years ago when discussing Erlang’s error handling model.

foobarbecue | 2 years ago

TIL. I guess I assumed it would hew more closely to the Uncertainty Principle.

Edit: actually, come to think of it, I first heard of it in about 2006 from Jamie Brandon and at the time assumed it was something he'd made up. For a second there I forgot that 2006 is more than 10 years ago! (It was a python bug that went away when run in a debugger.)

throwaway81523 | 2 years ago

CPython does most of its memory management by reference counting, which fails to reclaim circular structure. So to make sure it gets everything, it occasionally runs a conventional tracing GC. If the GC happens to run just after you create that async task, the task itself can get collected, it sounds like. It's good to know about this and is (my own editorializing) yet another reason Python3 should have used Erlang-style concurrency instead of this async stuff.

remram | 2 years ago

It's actually very much not a Heisenbug, because if you enable asyncio's debug mode [1], it will tell you what's going on.

[1]: https://docs.python.org/3/library/asyncio-dev.html#debug-mod...
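
Per [1], debug mode can be enabled in a few ways, e.g.:

    import asyncio

    async def main():
        ...

    # any one of these turns on asyncio's debug mode:
    asyncio.run(main(), debug=True)
    # or run with: python -X dev program.py
    # or set the environment variable: PYTHONASYNCIODEBUG=1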

epakai | 2 years ago

I missed this in a little curses program launcher I wrote.

It looks obvious when he puts a big orange box around it, but in the actual docs it's an unassuming paragraph between two border-wrapped blocks with the only distinguishing feature being the bold "Important".

It should probably be referenced immediately next to the "Return the Task object" sentence.

His argument hinges on "I can't be bothered to read the docs on the stuff I'm using." So instead of reading the docs on coroutines and tasks before using them, he writes a rant about how it's all wrong because he didn't understand how it works.

On a more fundamental level, why would anyone assume that a coroutine is guaranteed to complete if it is never awaited? There is no reason a scheduler could not be totally lazy and only execute the coroutine once awaited.

At least he bothered to make note of TaskGroups, also clearly shown in his documentation screenshot, immediately above the section marked Important that went ignored, and finishes with "As long as all the tasks you spin up are in TaskGroups, you should be fine." Yep, that's all there was to it.

> There is no reason a scheduler could not be totally lazy and only execute the coroutine once awaited.

Isn't the point of create_task (which is what the article is about) to launch concurrent tasks without immediately awaiting them? The example in the docs [1] wouldn't work (in the stated manner) if the task didn't start until it was awaited.

> At least he bothered to make note of TaskGroups [...] Yep, that's all there was to it.

That only works on Python 3.11, which was released just a few months ago. Debian still uses 3.9, for example, so the TaskGroups solution can't be used everywhere yet.

[1] https://docs.python.org/3/library/asyncio-task.html#coroutin...

The reason I said "on a more fundamental level" is that I'm not talking specifically about Python and asyncio, but coroutines in general. Even for Python, there are multiple event loop libraries available, they do not all work identically, which is why multiple ones exist. Someone here mentioned Temporal Python which works differently from asyncio, and would have avoided the author's problem. If you don't know how the scheduler works, you can't assume that a coroutine is guaranteed to complete just because you yoloed it into the scheduler, no matter how convenient that might be for you.

Yes, TaskGroups are a recent addition. If you can't use Python 3.11 for whatever reason, there is also the clearly written code sample at the bottom of the create_task documentation, which the author did not bother to mention. Probably didn't make it that far.

m3047 | 2 years ago

Hrmmmm.

> But who reads all the docs?

asyncio.create_task() doesn't exist in 3.6, and I can't find the string "to avoid a task disappearing" in the doc, so I'll go out on a limb: there is no such doc. However I see the reference to weakref.WeakSet.

Jtsummers | 2 years ago

The world didn't end in 2016. Welcome to seven years in the future where this documentation does, in fact, exist:

https://docs.python.org/3/library/asyncio-task.html#asyncio....

m3047 | 2 years ago

Some of us have been writing python since 2.x, and quite unsurprisingly wrote asyncio code at 3.6, and still happily support it. Some of us have even asked on HN about maintaining compatibility backwards and forwards between 3.6 and 3.11.

The documentation didn't exist at 3.6, when I wrote the code. I went and checked the source code, and the documentation and reported my findings. Good to know that there's a potential problem, don't you agree? What would you do differently?

cutler | 2 years ago

Maybe grafting async onto a single threaded dynamic language just isn't such a good idea in the first place.

hn_throwaway_99 | 2 years ago

Works absolutely fine in JavaScript, and the language is certainly much better for it than it was before async/await.

murphy214 | 2 years ago

bingo

postultimate | 2 years ago

This one seems quite easy to fix - just have the scheduler check that task objects have a positive refcount before running them, including the first time.

remram | 2 years ago

A positive refcount doesn't mean the object is not dead.

postultimate | 2 years ago

But the lack of a positive refcount means that it is, so this solves the problem the article was complaining about.

qxmat | 2 years ago

Python has a few weird issues like this. The last one I encountered was with a class inheriting Thread, join and the SQL Server ODBC driver on Linux. Fairly sure I hit page faults thanks to a shallow copy on driver allocated string data but didn't have the time to investigate like the hero of this blog post.

whoopdeepoo | 2 years ago

> But who reads all the docs

Why is this so common? Do people seriously not read a language/library documentation? That's the absolute first thing I do when evaluating a technology.

adamckay | 2 years ago

Because people have deadlines and need to get things working. You read enough to figure out how to do what you need to do and then mostly move on.

This function was added in 3.7 with no note on the importance of saving a reference. In 3.9 a note was added "Save a reference to the result of this function, to avoid a task disappearing mid execution." which was then expanded with the explanation of a weak reference in 3.10.

skitter | 2 years ago

It absolutely is common. People see there is a len function that takes one argument, they call len(some_collection), see that it indeed returns the number of items in the collection like they expect and move on. They don't expect len to return a negative number instead on Thursdays, and of course it doesn't because that would be a pretty big footgun. People also see that there is a create_task function that takes a coroutine, they call create_task(some_coroutine), see that the coroutine indeed runs like they expect, and move on. Sure, you're supposed to await the result, but maybe they don't need the awaited value anymore, only the side effects, and see that it still works.

throwaway81523 | 2 years ago

I had a manager who actually told me not to read docs. I was a bad report and read them anyway.

winter_blue | 2 years ago

This article just makes me feel like Python, while a language with nice-ish syntax, is a language that was poorly hacked and put together with little concern/thought about the real-world implications of poor design decisions like this async design decision (and also dynamic typing – a terrible thing in any language).

crdrost | 2 years ago

Most languages have something like this, usually around async.

For instance NodeJS has had a bit of this around promises, and eventually needed to institute the rule "if a promise rejects with an error, and nobody is around to hear it, we will crash your program on the assumption that you probably needed to clean up some resources but didn't and now they're going to leak. Listen to the error with a handler that does nothing, if we are wrong about that."

macintux | 2 years ago

One of many reasons I like Erlang: everything is async, so you have plenty of tooling/libraries/core language features to support you.

philwelch | 2 years ago

Python 2.7 was a nice little language. Python 3 is a bit of an overwrought monster at this point.

photochemsyn | 2 years ago

'async footguns' returns 20,000+ hits on Google. Top one happens to be:

https://news.ycombinator.com/item?id=32086973

> "Async seems to be the first big "footgun" of Rust. It's widespread enough that you can't really avoid interacting with it, yet it's bad enough that it makes..."

deschutes | 2 years ago

Fun stuff. Why aren't unfinished tasks gc roots?

[Deleted] | 2 years ago

[Empty / deleted comment]

kyrofa | 2 years ago

Great post. Feels like something a linter should catch.

[Deleted] | 2 years ago

[Empty / deleted comment]

dehrmann | 2 years ago

Another common async footgun I see is unthrottled gathering, and no throttling mechanism in the standard library. Once you gather an unspecified number of awaitables, bad things start to happen, either with CPU starvation, local IO starvation, or hammering an external service.

What I like about threads is they make dangerous things like this harder, and you have to put more thought into how much concurrent work you want outstanding. They also handle CPU starvation better for things that are latency-sensitive. I've seen degenerate requests tie up the event loop with 500 ms of processing time.

rednafi | 2 years ago

Huh! Unless you're using semaphores, you can also recreate a similar situation with threads. Spin up a whole bunch of threads and send all of them towards some shared object, or make 100s of requests with them.

There's not much difference between spinning up threads explicitly and creating async tasks with asyncio.create_task. In either case, you can throttle them with semaphores.
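
A sketch of that semaphore throttle for the asyncio case (the limit and names are arbitrary):

    import asyncio

    async def fetch(i, sem):
        async with sem:               # at most N coroutines past this point
            await asyncio.sleep(0.1)  # stands in for real IO
            return i

    async def main():
        sem = asyncio.Semaphore(10)
        results = await asyncio.gather(*(fetch(i, sem) for i in range(100)))
        print(len(results))

    asyncio.run(main())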

dehrmann | 2 years ago

I don't have a source or affected versions, but semaphores can scale poorly. I vaguely remember each blocked acquire getting checked on every event loop iteration, or something silly like that.

acjohnson55 | 2 years ago

Something linters can help with, you would think?

ryanianian | 2 years ago

C++ has nodiscard which is super useful for scenarios like this where ownership can be tricky.

jasonlaster11 | 2 years ago

Unfortunately we don't support Python yet, but if you ever encounter a similar heisenbug in JS, you can try recording it with Replay.io.

smetj | 2 years ago

Start a thread/greenthread/fiber/process/task without holding a reference that would at least let you tie up all loose ends at exit? Hmm, dunno.

You can do that in go. You don't even get a reference to the thread/goroutine.

nixpulvis | 2 years ago

Fire and forget.

crabbone | 2 years ago

In the many years since asyncio was added, I have never used it willingly, outside of cases where a third-party library required it. There has never been a practical benefit to any of that stuff compared to select. It always worked poorly and never justified the effort one has to put into writing code that uses the library. The behavior OP describes is just one of the many bad design decisions that are so characteristic of this library.

29athrowaway | 2 years ago

Reading this was like the first time I read "for/else". "wtf?!" was my first reaction, then I read the documentation.
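
For anyone who hasn't met it: the else clause runs only when the loop completes without hitting break.

    items = [1, 2, 3]
    target = 4

    for item in items:
        if item == target:
            print("found it")
            break
    else:
        # no break happened
        print("target not found")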

pyuser583 | 2 years ago

I don’t find this behavior odd at all. Collecting unreferenced values is normal Python garbage collector behavior. Threads are an exception (no pun intended), but they’re an exception in lots of ways - just try pickling them.

samsquire | 2 years ago

Thank you for this. This is really useful information.

I recently adapted some garbage collection code to add register scanning.

I can imagine all sorts of subtle bugs where things go away randomly. One problem I have with my multithreaded code is that sometimes a thread crashes and the logs are so long I don't notice. From my perspective the thread is just not doing anything.

Sometimes the absence of behaviour can be really tricky to debug!

Is this something go developers also have to be careful with when using goroutines?

gerad | 2 years ago

No. But sometimes goroutines have the opposite problem, where they never terminate and never get cleaned up.

https://betterprogramming.pub/common-goroutine-leaks-that-yo...

candiddevmike | 2 years ago

Is there an (easy?) test for checking goroutine leaks?

Snawoot | 2 years ago

Yes, it's visible in the goroutine profile provided by the built-in profiler pprof. E.g.: https://github.com/mysteriumnetwork/node/issues/5311#issueco...

Jtsummers | 2 years ago

No. Goroutines don’t generate a reference to hold onto, either. They just run until they finish or the program terminates.

makomk | 2 years ago

Well, this explains that one really annoying intermittent bug that I was having in some asyncio-based code.

magicalhippo | 2 years ago

Delphi had the opposite bug in its thread pool. The worker threads would dequeue a work item and process it in a loop. The work items were reference counted.

Now, Delphi doesn't have scoped variable declarations like, say, C++ or C#, so the dequeued work item was stored in a local variable. However, it didn't drop (nil/null) the work item reference before it looped, so it would hang on to that reference until the next work item got dequeued or the pool was destroyed.

The result was that if, in a function, you started a task that captured a local reference (e.g. using an anonymous function) and then waited for it, that reference could live on after your function returned if the pool didn't have anything else to do. Not what most people would expect.
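
The same shape of bug is easy to reproduce in Python, where a local variable keeps its last value alive until reassigned. A hedged sketch (item.process() is a stand-in, not the Delphi code):

    import queue

    def worker(q: queue.Queue):
        while True:
            item = q.get()
            item.process()  # stand-in for real work
            # Without this, `item` stays referenced while the worker blocks
            # on the next q.get(), keeping the last work item (and anything
            # it captured) alive even though it's done.
            del item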

aardvark179 | 2 years ago

The same problem or something similar exists in many languages. Threads are GC roots because the OS knows about them, but this may not be true for lightweight threads or async callbacks.

It is hard to fix because you don’t want to introduce references from an old object (such as a list of callbacks) to many new objects as that will introduce GC issues, and many other potential leaks.

collinvandyck76 | 2 years ago

It seems like the library should retain a handle on the task until it completes.

nerdponx | 2 years ago

This is partly the goal of structured concurrency, implemented as Nursery in Trio and TaskGroup in Asyncio.
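
On Python 3.11+ that looks roughly like this; the group holds the references and won't exit until every child finishes, so nothing can be collected mid-flight:

    import asyncio

    async def job(name):
        await asyncio.sleep(1)
        return name

    async def main():
        async with asyncio.TaskGroup() as tg:
            tg.create_task(job("a"))
            tg.create_task(job("b"))
        # reaching this line means both tasks completed;
        # any exception would have propagated out of the block

    asyncio.run(main())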

jarboot | 2 years ago

If I want to create a task that runs even after the function returns, i.e. `async def f(): asyncio.create_task(ten_second_coro.run())`, is there any way to mitigate this? A function-scoped set of tasks?

nhumrich | 2 years ago

Yes, read the last part of the included documentation and hold onto background tasks.
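
Roughly the pattern the docs describe: a long-lived set holds a strong reference while the task runs, and a done-callback discards it afterwards (spawn is an illustrative helper name):

    import asyncio

    background_tasks = set()

    def spawn(coro):
        task = asyncio.create_task(coro)
        background_tasks.add(task)  # strong reference while running
        task.add_done_callback(background_tasks.discard)  # allow GC when done
        return task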

jmholla | 2 years ago

Your task is implicitly not function-scoped, as you want it to survive exiting the function. What you're doing here would be better done architecturally with threads. async is not a direct replacement for threading.

But, you could also return the task object to the caller and have them manage it. There's also nothing async about your function, so you don't need the async or to await it.

mkarliner | 2 years ago

Thank you!

phyzome | 2 years ago

That seems very odd for a default behavior. There might be good reasons to allow GCing of a dropped task reference, but it doesn't seem like that would be the most common case.

cmstodd | 2 years ago

Thanks for posting.

nixpulvis | 2 years ago

Hey, at least it's documented... good developers actually RTFM.

I can't comment on the design of this API, because I don't feel like learning the library, but in some performance-critical applications these sorts of contracts aren't all that uncommon. Granted, this is Python, so I guess it's a bit more suspicious. IDK.

vbernat | 2 years ago

The documentation update is quite recent (Python 3.11). It was added after this ticket: https://bugs.python.org/issue44665 (not the first ticket around this problem).

osigurdson | 2 years ago

A little pedantic, but the HUP (Heisenberg uncertainty principle) concerns the fundamental limits of simultaneously knowing a particle's position and momentum, not observation impacting outcomes.

blacklight | 2 years ago

I've used asyncio.create_task forever, and I've admittedly never read its documentation in depth.

However, I've ALWAYS assigned the return value of create_task to some variable.

To me this is just good programming practice. The OP says "tasks are not like threads - that you can just launch and forget" - no! Even a thread should not simply be launched and forgotten! You always need a reference to it, so that when your application is terminated you can join all the threads that are still running and exit cleanly! Same goes for tasks: when your application exits, it's just good practice to stop the event loop and cancel any pending tasks.

And, in general, one should always keep in mind the reference counting rule in Python (the author incorrectly calls it "garbage collection", btw): if something you created in your function isn't referenced/assigned to anything, then its reference count will be zero and it will be removed when the function stack unwinds. This is totally expected behaviour to me.
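
A shutdown sketch along those lines, assuming it runs inside the event loop:

    import asyncio

    async def shutdown():
        pending = [t for t in asyncio.all_tasks()
                   if t is not asyncio.current_task()]
        for task in pending:
            task.cancel()
        # give the cancellations a chance to run; collect rather than raise
        await asyncio.gather(*pending, return_exceptions=True)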

notatoad | 2 years ago

wow. yeah, this absolutely explains a heisenbug that i've been chasing for a while. and i can't count the number of times i've had that exact doc page open on my screen in the last few months, and never bothered to read that block of text that starts with "important"...

thanks

fsckboy | 2 years ago

The OOP model (or any "client" model of state), where state outside your code is encapsulated/contained inside an object within your program, is always confusing. It's not particular to this library or to Python.

Take a GUI window handle within your program, or simply an open file handle: if the OS does something to your object, it's hard for you to know about it, and your object continually needs to refresh its state if you are concerned about it. I don't know what is referred to as a "task" in this case, but I don't think the lifetime of the actual task is the issue; it's the lifetime of your object.

It's always the case that if you instantiate an object with a ctor, you can't count on anything about it continuing once the dtor is invoked. The problem is much more general than this API, this library, this language, this use case. Just as you need to structure your C code so malloc and free always match up spatially and temporally, you need to structure your OOP code so ctors and dtors match up sensibly. Otherwise the confusion in your head will spread to your code.

And those who always want the compiler and tools to automatically do as much as possible to free them from the burden are the ones most surprised by heisenbugs. Computers can (or will try to) do anything; it's up to you to make sure they do what they need to. Maybe someday AIs will do it better than us, but right now you need to provide the I.

tbrownaw | 2 years ago

So a `Task` object is in fact the actual task, rather than a handle or pointer to the actual task.

aldenpage | 2 years ago

That's extremely insidious. I suppose I never encountered this issue because I almost always call `asyncio.gather(*tasks)`, which makes having a collection of tasks natural.

kortex | 2 years ago

This is good form. It makes top-level control flow easier to follow, and keeps the concurrency scoped.

BiteCode_dev | 2 years ago

And this is why trio got it right, and why I think task groups (nurseries from trio) can't arrive soon enough in the stdlib.

Because not only must you maintain a reference to any task, you should also explicitly await it somewhere, using something like asyncio.wait() or asyncio.gather().

Most people don't know this, and it makes asyncio very difficult to use for them.

postultimate | 2 years ago

> task groups (nurseries from trio) can't arrive soon enough in the stdlib

Please, no. Asyncio is horrible, and bodging it to make it less horrible just means we will be forced to live with the remaining horror. Far better to replace it with something that works properly (yes, Trio).