Concurrency and Parallelism using the asyncio, threading, and concurrent libraries in Python: Part 1



Before we dive into a tutorial and start deliberating over which method is better suited for a given application, I want to make sure that you understand the difference between concurrency and parallelism.


As I was learning about these two topics, I came across several definitions and comparisons that were, more often than not, very confusing. When I began writing this post, I decided to approach the explanation differently, in a way I would’ve liked when I was learning. To explain the difference, I’m going to step away from Python for two minutes and give you a real-life example. It’s gonna seem off topic, but just bear with me.


Understanding the Difference between Concurrency and Parallelism


You could be in your home kitchen, cooking, watching TV, talking on the phone with a friend, and trying to feed your niece all at the same time. However, even though we say this is all done at the same time, the truth is that at any single moment, you can only really be doing one thing. If you’re consciously watching TV and diverting all your senses to it, you cannot simultaneously use those same senses for your friend on the phone or your hungry niece. To put this example into perspective, what you’re really doing is handling all of those things concurrently. That is, you’re dealing with several things at the same time, but that doesn’t necessarily mean you’re doing all of them at once. Doing them all at once is called parallelism.


Parallelism in this context would be if a human could use the full power of their senses on more than one task at any given moment. If Jane starts speaking with you while you’re talking with Joe, you cannot divert your full attention to both simultaneously. At some point, you will lose track of one, or even both.


On the other hand, computers are capable of parallelism. In fact, most modern personal computers are able to handle four different processes at the same time. Yes, a computer can speak with Joe, Jane, Don, and Bob all at the same time, and respond to all of them simultaneously and adequately. Such computers have four CPU cores. I will not go into what exactly a CPU core is, in order to keep this post simple and adequately informative.


Now, to keep painting this picture, imagine a CPU core as a human brain. From the example above, we saw that as humans, we’re able to deal with several things at once; in other words, concurrently. And we have one brain. Modern personal computers have four. So not only are they able to do four things at once, but they’re also capable of dealing with so much more around each thing they’re doing. That speaks to the level of complexity these machines are capable of maintaining.
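As a quick aside, you can check how many cores (brains) your own machine has straight from the standard library. This is just a one-line check, nothing you need for the rest of the post:

```python
import os

# Number of logical CPU cores the operating system reports.
# This is roughly the upper bound on how many things the machine
# can truly do at the exact same time, i.e. in parallel.
print(os.cpu_count())
```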


With all that being said, let's pour it all into Python. The first question to ask yourself is, what type of processing do I need? Do I need parallel processing or concurrent processing? Well, how do I know?

Which one for my application?


It’s simple, really. The answer lies within the task at hand, or the objective of your application. Ask yourself: does my application’s progress rely on the speed of the CPU or the speed of the I/O subsystem? In other words, is it CPU bound or I/O bound?

If the task requires a lot of intense processing, such as crunching numbers for sophisticated algorithms, data processing, or encrypting and decrypting, then this is a CPU-bound application and you’re better off parallelizing. To be very clear on this point, you’re choosing parallelism because the rate at which your application progresses is dictated by the speed of the CPU.

On the other hand, if the application’s progress relies on the speed of the I/O subsystem, then this is an I/O-bound process and you should use concurrency. Again, to be clear, you’re selecting this method because the application would run faster if the I/O were faster. An I/O subsystem could be a variety of things, spanning from your hard disk drive (writing to a file on it) all the way to a third-party network server (making HTTP requests).
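To make the distinction concrete, here is a minimal sketch of the two kinds of work. The function names and the URL are made up for illustration; the point is simply where each function spends its time.

```python
import hashlib
import urllib.request

def cpu_bound_work(data: bytes) -> str:
    # Progress here is limited by how fast the CPU can crunch:
    # hashing the payload over and over keeps the processor busy.
    digest = data
    for _ in range(100_000):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()

def io_bound_work(url: str) -> int:
    # Progress here is limited by the network: the CPU mostly sits
    # idle, waiting for bytes to arrive from the remote server.
    with urllib.request.urlopen(url) as response:
        return len(response.read())

if __name__ == "__main__":
    print(cpu_bound_work(b"some payload"))
    print(io_bound_work("https://example.com"))  # placeholder URL
```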


Your reasoning for selecting concurrency should stem from the need to execute several tasks within a process in a time-efficient way. For example, if an HTTP request is going to take some time to respond, then I should do something else while I’m waiting, such as make another request, execute another function that perhaps writes to disk, or whatever else your application can do while waiting. Try to imagine you were working in a restaurant kitchen and were given a day’s worth of cooking and cleaning. A concurrent you would start the cooking first and then start cleaning while the pot roast is in the oven. That way, you’re using time efficiently, and you’ll ultimately reach the goal faster. A sequential you would start and finish cooking before even thinking about cleaning. That person would also do absolutely nothing while the pot roast is cooking. What good is that?
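Here is a minimal asyncio sketch of that idea. The slow HTTP request is simulated with asyncio.sleep so the example runs on its own; in a real application that await would be an actual network call or a disk write. The function and task names are made up for illustration.

```python
import asyncio

async def fake_request(name: str, seconds: float) -> str:
    # Stand-in for a slow HTTP request: while this coroutine is
    # "waiting on the network", the event loop is free to run others.
    print(f"{name}: request sent, waiting...")
    await asyncio.sleep(seconds)
    print(f"{name}: response received")
    return name

async def main() -> None:
    # Start all three "requests" concurrently instead of one after
    # another; the total time is roughly the slowest one, not the sum.
    results = await asyncio.gather(
        fake_request("request-a", 2.0),
        fake_request("request-b", 1.0),
        fake_request("request-c", 1.5),
    )
    print("done:", results)

asyncio.run(main())
```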


On the other hand, when there is intense work that needs to be done for a specific task, concurrency would not be the ideal choice, simply because I don’t need a person who can cook and clean at the same time. A better example would be if the kitchen is severely unclean. I don’t need someone to work concurrently, because there is no concurrency needed. I need to hire a really good cleaner who can handle this intense amount of cleaning work, or maybe even four of them to work in parallel! (See what I did there?)
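If you want a small preview of what hiring four cleaners looks like in Python (the details are for Part II), here is a minimal sketch using concurrent.futures. The clean_section function and its inputs are made up for illustration; the loop just burns CPU cycles to stand in for the heavy work.

```python
from concurrent.futures import ProcessPoolExecutor

def clean_section(section: str) -> str:
    # Stand-in for CPU-intensive work: pure computation, no waiting.
    total = 0
    for i in range(10_000_000):
        total += i * i
    return f"{section} cleaned (checksum {total})"

if __name__ == "__main__":
    sections = ["stove", "floor", "counters", "fridge"]
    # Four worker processes, one per "cleaner", running truly in
    # parallel on a machine with at least four CPU cores.
    with ProcessPoolExecutor(max_workers=4) as pool:
        for result in pool.map(clean_section, sections):
            print(result)
```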


Now that we understand the difference between the two, and hopefully are able to select the appropriate method for the task at hand, let’s see how these are done in Python. That will be covered in the next post (Part II).