What we'll cover
Never mind Python for a moment: how do computers do two things at once?
Back in the olden times, any computer you could afford had only one CPU, and that CPU had only one core.
But even back in Windows 3.11 days, you could run more than one program at a time. How was such black magic possible?
Well, each running program is really called a process, and each process has at least one thread. It's the thread that actually runs on the CPU. "Thread" is short for "thread of execution".
The OS is in charge of scheduling each thread. But how can a single core run the OS and multiple threads?
The OS has hardware support: it sets a timer, and when the timer fires, an interrupt forces the CPU back into the OS so it can schedule something else. This preemptive support was a major difference between the Intel 286 and 386 back in the mid-'80s.
So, basically, the OS just schedules threads to run. Those threads may belong to different processes, or one process may have lots of threads; it doesn't really matter.
Let's talk about scale for a minute.
So with one core, only one thread is ever running at any one time. With multiple cores, multiple threads can run at the same time; this is called parallelism.
Switching from one thread to another is called a context switch. Alternating between threads of the same process is concurrency.
| Operation | Actual time | Human scale (1 cycle = 1 s) |
|---|---|---|
| One CPU cycle | 0.4 ns | 1 s |
| Level 1 cache access | 0.9 ns | 2 s |
| Memory access | ~100 ns | 4 min |
| SSD access | 50–150 μs | 1.5–4 days |
| Spinning rust (HDD) | 1–10 ms | 1–9 months |
| Packet, SFO to NYC | 65 ms | 5 years |
| Packet, SFO to HK | 141 ms | 11 years |
As interesting as that was, the reason for it was...
| Operation | Actual time | Human scale (1 cycle = 1 s) |
|---|---|---|
| Context switch | ~30 μs | ~1 day |
Whenever software interacts with the real world (the network, the hard drive, clocks, etc.), the application doesn't actually touch anything itself; it asks the operating system to do it on its behalf.
Your app took the blue pill, happy in its sandbox. The OS is the red pill.
Calling into the OS is usually like calling any other function: the caller passes control to the callee, and when the callee finishes, the caller resumes.
That's a blocking call and hopefully the concept is familiar.
Calling into the OS is a context switch.
But what if, instead of blocking and waiting for the OS (for days or years on our human scale), we keep running, and the OS just lets us know when it has done that thing for us?
That's a non-blocking call. Async is the name of the pattern where a program only makes non-blocking calls.
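The difference can be sketched with the standard-library `socket` module; the socketpair here stands in for a real network peer. In blocking mode, `recv()` would park our thread in the OS until data arrives; in non-blocking mode, the same call returns immediately, raising `BlockingIOError` when nothing is ready.

```python
import socket

# A connected pair of sockets standing in for a real network peer.
left, right = socket.socketpair()
left.setblocking(False)  # switch our end to non-blocking mode

try:
    left.recv(1024)        # nothing has been sent yet...
    got_data = True
except BlockingIOError:    # ...so the OS says "not ready, come back later"
    got_data = False

right.sendall(b"hello")
data = left.recv(1024)     # data is now waiting, so recv returns at once

left.close()
right.close()
```

In a real async program, instead of retrying in a loop, you'd hand the socket to an event loop and get told when it's ready.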
Let's see how far this rabbit hole goes.
So if we want to do lots of things at once, like handling web requests if we're a web server, the straightforward approach is one thread per request.
That would totally work, but it'd be slow as hell because of all the context switching.
With optimizations, this is more or less what the Apache web server does.
But what if we write our program in such a way as to never do a context switch?
That's the heart of asynchronous programming.
This is essentially what the Nginx web server does, and it can handle hundreds of thousands of connections with a single thread.
If async is so great why isn't everybody doing it?
Well a lot more people are. However...
Super confusing at first. Super confusing the rest of the time. Callback hell. And the one thread can't actually do much work of its own.
We haven't talked about implementation yet, but going back to the Nginx example with 100,000 connections: there isn't a lot of time to do anything else while you're servicing that many connections.
So if Nginx were to make a blocking call to that totally free service in China, the other 99,999 connections would wait until that very long round trip finished.
In an async program you're always racing back to the event loop.
```python
while True:
    tasks = os.wake_me_if_something_happens()  # blocking call
    for task in tasks:
        task.do_callback()  # this hogs the cpu
```
Examples of work: new incoming network connection, network packet to read, network packet successfully sent, timer went off, a file changed
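The pseudocode loop above can be made concrete with Python's standard-library `selectors` module: `sel.select()` is the "wake me if something happens" blocking call, and the registered callback is the work we race through. This is a minimal sketch; the socketpair stands in for a real network connection.

```python
import selectors
import socket

sel = selectors.DefaultSelector()
left, right = socket.socketpair()  # stand-in for a real network peer
received = []

def on_readable(sock: socket.socket) -> None:
    # Only called when the loop says the socket is ready, so recv won't block.
    received.append(sock.recv(1024))

# Register the callback to run when `left` has data to read.
sel.register(left, selectors.EVENT_READ, on_readable)
right.sendall(b"ping")

events = sel.select(timeout=1.0)  # blocks until something happens
for key, _mask in events:
    key.data(key.fileobj)  # run the callback registered for this socket

sel.close()
left.close()
right.close()
```

Real event-loop libraries wrap exactly this kind of loop, plus timers and callback scheduling.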
The heart of any async program is the event loop. Lots of libraries implement an event loop that you then build your app on top of.
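In Python, the standard-library event loop is `asyncio`. A minimal sketch of "lots of waiting on one thread", using `asyncio.sleep` as a stand-in for a network wait: three 0.1-second waits overlap instead of adding up.

```python
import asyncio
import time

results = []

async def fake_request(name: str) -> None:
    await asyncio.sleep(0.1)  # non-blocking: yields to the event loop
    results.append(name)

async def main() -> None:
    # Schedule all three "requests" concurrently on a single thread.
    await asyncio.gather(
        fake_request("a"), fake_request("b"), fake_request("c")
    )

start = time.monotonic()
asyncio.run(main())
elapsed = time.monotonic() - start  # roughly 0.1 s, not 0.3 s
```

Three sequential blocking sleeps would take ~0.3 s; here the event loop interleaves them in ~0.1 s.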
Maybe, but probably not. It depends.
The naive view: work only ever happens in a thread, so a program with two threads should be twice as fast as one. Sometimes that's even true, up to the number of cores.
But a program with 1,000 threads will spend so much time context switching that hardly any real work gets done.
If you're doing lots of work (CPU bound) then use threads
If you're doing waiting work (IO bound) then use async.
Since Python has the GIL (Global Interpreter Lock), only one thread ever runs Python bytecode at a time (even with lots of cores).
That's why Python is "slow" but totally kicks ass as a web server. Being CPU-slow doesn't matter when network time completely dwarfs everything else.
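A small sketch of why IO-bound work still overlaps under the GIL: blocking operations like `time.sleep` (and real network calls) release the GIL while they wait, so four 0.1-second waits on four threads finish in roughly 0.1 s, not 0.4 s.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def waiting_work(i: int) -> int:
    time.sleep(0.1)  # stand-in for a network call; the GIL is released here
    return i * 2

start = time.monotonic()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(waiting_work, range(4)))
elapsed = time.monotonic() - start  # roughly 0.1 s, not 0.4 s
```

Replace the sleep with a pure-Python number-crunching loop and the speedup disappears, because CPU-bound Python code holds the GIL.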
Because events can come in at any time, we can't know what order our code will run in. This is the opposite of synchronous code, hence asynchronous.
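A tiny illustration of that ordering with asyncio: the task started first finishes last, because its event (a longer timer) fires later. Completion order follows the events, not the order the code was written in.

```python
import asyncio

order = []

async def job(name: str, delay: float) -> None:
    await asyncio.sleep(delay)  # the "event" is this timer going off
    order.append(name)

async def main() -> None:
    # "first" is started first but has the longer timer.
    await asyncio.gather(job("first", 0.2), job("second", 0.1))

asyncio.run(main())
# order is ["second", "first"]: completion order, not start order
```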
Good stuff: YouTube: Node.js Is Bad Ass Rock Star Tech