Parallel Processing to Reduce Processing Time
Prologue
Parallel processing is a computation method in which a large task is broken down into several independent, smaller, and usually similar subtasks that run simultaneously on different processors of the same machine.
This can significantly reduce the execution time of a program that works on a very large data set. In this article, we will take a detailed look at parallel processing concepts and the relevant Python methods, with examples.
multiprocessing module in Python
In this article, we are going to use the multiprocessing module. It launches subprocesses instead of threads, and each subprocess has its own independent memory space. Hence, the subprocesses can be scheduled on different physical processors (also called CPU cores; we will use CPU core and processor interchangeably throughout this article) on both Unix and Windows machines.
Parallel processing can decrease the execution time only if the machine running the script has more than one processor. It is worth noting that creating subprocesses brings its own overhead: on a single-processor machine, launching subprocesses actually increases the total execution time compared to running without them.
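To see this overhead concretely, here is a minimal sketch (the helper name identity is our own) that runs the same trivial job serially and through a pool; the pool produces the identical result but pays the cost of starting and tearing down worker processes:

```python
import multiprocessing as mp
import time

def identity(x):
    return x

if __name__ == '__main__':
    # Serial: a plain list comprehension over a tiny input.
    start = time.time()
    serial = [identity(i) for i in range(4)]
    serial_time = time.time() - start

    # Parallel: the same tiny job through a pool. Starting and
    # tearing down the worker processes dominates the runtime here.
    start = time.time()
    with mp.Pool() as p:
        parallel = p.map(identity, range(4))
    pool_time = time.time() - start

    print(serial == parallel)       # identical results
    print(pool_time > serial_time)  # the pool pays startup overhead
```

For a job this small, the pool version is dramatically slower; the overhead only pays off when each worker has substantial independent work to do.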
Number of Processors on a Machine
Naturally, a follow-up question arises: how do I find the number of processors on my machine?
The answer varies with the type of machine. We discuss each type in this section.
- Linux machines: Run the nproc utility in a terminal to get the CPU core count. Detailed information about each core is available via the command cat /proc/cpuinfo.
- Apple Mac: The CPU core count is shown in the System Report. Navigate: Apple menu > About This Mac > System Report > Hardware. The count is listed against “Total Number of Cores”.
- Windows machines: There are multiple ways to find the CPU core count in Windows; we discuss two major ones.
- Task Manager: Task Manager > Performance > the core count is listed against “Cores” below the performance graph.
- Note that if the Windows machine is running as a cloud server or a virtual machine, the count is listed against “Virtual processors”.
- Any machine with Python installed: the multiprocessing module provides a method, multiprocessing.cpu_count(), that returns the CPU processor count. So the following command can be used in a terminal: python -c 'import multiprocessing as mp; print(mp.cpu_count())'
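The same count is also available from within a script; a quick sketch:

```python
import multiprocessing as mp
import os

# Both report the number of logical CPUs visible to the OS;
# mp.cpu_count() simply delegates to os.cpu_count().
print(mp.cpu_count())
print(os.cpu_count())
```

On Linux, len(os.sched_getaffinity(0)) is a useful alternative, since it also accounts for any CPU affinity restrictions placed on the current process.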
Getting hands dirty
Enough talking; now it’s time to get our hands dirty with examples.
As discussed earlier, we are going to use the multiprocessing module. We will leverage the Pool object to create parallel processes. The Pool object creates a pool of worker processes to which jobs can be assigned. The maximum number of worker processes that can run simultaneously on different cores is the total number of cores available on the machine.
The Pool constructor takes a processes argument, which is essentially the number of worker processes to use. If no value is given, it defaults to the total number of available cores on the machine.
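As a small sketch of the processes argument (the helper name double is our own), the pool below is capped at 2 workers regardless of how many cores the machine has:

```python
import multiprocessing as mp

def double(x):
    return 2 * x

if __name__ == '__main__':
    # Cap the pool at 2 worker processes, even on a 4-core machine.
    with mp.Pool(processes=2) as p:
        print(p.map(double, [1, 2, 3]))  # prints [2, 4, 6]
```

Capping the pool below the core count is useful when the machine must stay responsive for other work while the script runs.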
Creating parallel processes: the following shows a basic example of using Pool.map() to find the squares of numbers.
```python
import multiprocessing as mp
import os

def squareCal(x):
    print(mp.current_process())
    print("Process ID: ", os.getpid())
    print("This string is for intermixing the outputs.")
    return x * x

if __name__ == '__main__':
    print("Processor count: ", os.cpu_count())
    with mp.Pool() as p:
        print(p.map(squareCal, [1, 2, 3, 4]))
```
Note that in the above example, we have not passed any value to mp.Pool(), so it defaults to a pool of 4 worker processes, as the machine used in this example has 4 CPU cores. The p.map() method assigns the jobs to the worker processes.
In the output of the above code, the process ID and the last string printed by each call of squareCal() do not appear sequentially. The reason is that workers running on different CPU cores write to the same output stream concurrently, which interleaves their outputs.
So, if we use a single worker process (mp.Pool(1)), each execution of the above snippet produces perfectly ordered output, as only one worker process writes to the output stream.
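One related point worth sketching: even when the printed output interleaves, Pool.map() still returns its results in input order. In the sketch below (the helper slow_square and its sleep times are our own illustration), smaller inputs deliberately finish later, yet the result list stays ordered:

```python
import multiprocessing as mp
import time

def slow_square(x):
    # Make smaller inputs finish later, so workers complete out of order.
    time.sleep(0.05 * (4 - x))
    return x * x

if __name__ == '__main__':
    with mp.Pool() as p:
        # Despite out-of-order completion, results match input order.
        print(p.map(slow_square, [1, 2, 3, 4]))  # prints [1, 4, 9, 16]
```

So only the side effects (prints) interleave; the return values collected by map() are safe to rely on.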
Now, let’s take the example of finding the sum of consecutive integers from 0 to n-1. Keep in mind that the sum has to be computed for a very large value of n, so that the gain from parallelization outweighs its small overhead. If n is small, it takes less time to run on a single processor, as that avoids the overhead.
In the following example, we take a very large value of n (10⁹).
Sum without Parallelization
```python
import os
import time

if __name__ == '__main__':
    print("Processor count: ", os.cpu_count())
    sum = 0
    i = 0
    startTime = time.time()
    while i < 1000000000:
        sum = sum + i
        i = i + 1
    print("Sum without parallel processing: ", sum)
    print("Time Elapsed without parallel processing: ", (time.time() - startTime))
```
The above code takes 133+ seconds to find the sum.
Sum with Parallelization
The following snippet uses all four cores of the machine to find the sum.
```python
import multiprocessing as mp
import os
import time

def findSum(x):
    sum = 0
    i = x
    while i < (x + 250000000):
        sum = sum + i
        i = i + 1
    return sum

if __name__ == '__main__':
    print("Processor count: ", os.cpu_count())
    startTime = time.time()
    with mp.Pool() as p:
        sum1, sum2, sum3, sum4 = p.map(findSum, [0, 250000000, 500000000, 750000000])
    sum = sum1 + sum2 + sum3 + sum4
    print("Sum with parallel processing: ", sum)
    print("Time Elapsed with parallel processing: ", (time.time() - startTime))
```
The above code with parallelization took a little over 56 seconds.
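The hard-coded chunk bounds above only work for 4 cores and this particular n. As a sketch of a more general version (the helper names range_sum and parallel_sum are our own), the input range can be split according to the actual core count:

```python
import multiprocessing as mp
import os

def range_sum(bounds):
    # Sum the integers in the half-open range [lo, hi).
    lo, hi = bounds
    return sum(range(lo, hi))

def parallel_sum(n, workers=None):
    workers = workers or os.cpu_count()
    step = n // workers
    # One contiguous chunk per worker; the last chunk absorbs the
    # remainder when n is not evenly divisible by the worker count.
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], n)
    with mp.Pool(workers) as p:
        return sum(p.map(range_sum, chunks))

if __name__ == '__main__':
    n = 10_000_000
    # Check against the closed form n*(n-1)/2.
    print(parallel_sum(n) == n * (n - 1) // 2)  # prints True
```

This keeps the per-worker workload balanced on any machine, rather than assuming exactly four cores.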
Conclusion
The test results clearly show that with parallel processing, the sum is computed in a little over 56 seconds, which is less than half the time it takes to find the sum without parallelization (133 seconds). And the difference grows even larger for larger computations.
So, if you have large data sets that can be processed independently and a multicore CPU on the machine, parallel processing is a good way to decrease the execution time.