We’re here to discuss parallel programming, but we want to set the scene with burritos to make the topic easier to digest. If you had to make 100 burritos for a potluck party, it would take a very long time. But what if you got a group of nine friends and gave everyone proper burrito-assembly instructions? The same task that would’ve taken you tens of hours can now be done in maybe thirty minutes, depending on how fast you roll.
Setting up the burrito rolling production requires prepping ingredients, gathering friends, giving instructions, and testing for quality – but distributing the task among 10 people and running the tasks in parallel undoubtedly saved you a lot of time. We can do the same thing in coding with parallel programming.
Not sure what this means? Don’t worry. This article will cover the basics of parallel programming, its common uses, and its pros and cons.
Parallel programming is often used interchangeably with the terms parallel processing, parallel computing, and parallel computing solutions. Parallel programming is like running 10 burrito-rolling stations in parallel instead of slowly making 100 burritos yourself. In computer science terms, parallel programming is the process of splitting a problem into smaller tasks that can be executed at the same time – in parallel – using multiple computing resources. In other words, parallel programming lets programmers tackle large-scale projects that demand both speed and accuracy.
You can use parallel processing techniques on various devices, from mobile phones to laptops to supercomputers. Different programming languages rely on different technologies to enable parallelism. Open Multi-Processing (OpenMP), for example, provides a cross-platform API for developing parallel applications in C, C++, and Fortran across the cores of a single shared-memory machine.
On the other hand, technologies such as the Message Passing Interface (MPI) enable parallel processes to communicate across different computers, or nodes.
Parallel programming is an umbrella concept that can describe many types of processes that run simultaneously on the same machine or on different machines. Before we dive in, let’s distinguish between some popular parallel programming models: data parallelism, task parallelism, multithreading, shared memory, message passing, and the partitioned global address space (PGAS).
These models describe how processes interact with one another in parallel programming. Let’s look at each of these, as well as some of the principles of parallel programming, in more detail below.
Data parallelism means taking a given task and splitting its execution across the data to be processed. Let’s continue with the burrito example. Say you and your two friends need to make 100 burritos. One way to split this up would be for each of you to make roughly 33 burritos at the same time.
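As a rough sketch of what that looks like in code, here’s a minimal C# example – the MakeBurrito method and the burrito count are placeholders we’re inventing for illustration – where Parallel.For hands different burrito numbers to different threads:

using System;
using System.Threading.Tasks;

public class BurritoKitchen
{
    // Stand-in for the real work of rolling one burrito
    static void MakeBurrito(int burritoNumber)
    {
        Console.WriteLine($"Burrito #{burritoNumber} rolled on thread {Environment.CurrentManagedThreadId}");
    }

    public static void Main()
    {
        // Data parallelism: the same operation (MakeBurrito) is applied
        // to different data items (burrito numbers) on multiple threads at once.
        Parallel.For(0, 100, i => MakeBurrito(i));
    }
}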
Task parallelism is splitting a job’s execution into distinct subtasks. This time, instead of dividing the work by number of burritos, one friend makes the tortillas, another cooks the chorizo, and the third assembles the burritos.
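Here’s a minimal C# sketch of the same idea, with each burrito station represented by a different delegate passed to Parallel.Invoke (the station messages are purely illustrative):

using System;
using System.Threading.Tasks;

public class BurritoStations
{
    public static void Main()
    {
        // Task parallelism: each delegate is a *different* job,
        // and all three can run at the same time on separate threads.
        Parallel.Invoke(
            () => Console.WriteLine("Friend 1 is warming tortillas"),
            () => Console.WriteLine("Friend 2 is cooking chorizo"),
            () => Console.WriteLine("Friend 3 is assembling burritos")
        );
    }
}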
Multithreaded programming is a subset of parallel programming in which more than one set of sequential instructions (a “thread”) executes concurrently. Multithreading can happen on a single core or across multiple cores. If the threads run on a single processor core, the processor rapidly switches between them – that isn’t true parallel execution, just the CPU interleaving the threads’ work. When the threads run on multiple processor cores, they’re executed simultaneously.
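A bare-bones C# illustration, using nothing more than the standard Thread class: two threads each run their own sequence of instructions, and whether they truly overlap depends on how many cores are available.

using System;
using System.Threading;

public class MultithreadingDemo
{
    public static void Main()
    {
        // Two threads, each with its own sequence of instructions.
        // On multiple cores they can run truly in parallel; on a single core
        // the processor rapidly switches between them instead.
        var first = new Thread(() => Console.WriteLine("Thread 1 running"));
        var second = new Thread(() => Console.WriteLine("Thread 2 running"));

        first.Start();
        second.Start();

        first.Join();   // wait for both threads to finish before exiting
        second.Join();
    }
}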
With the shared memory model, the program is a collection of processes that use common, or shared, variables. The program and its associated data are stored in main memory, and all the processes access them there. Each process is assigned a different part of the program and data, and the main program creates a separate process for each processor. After the processes have run, they rejoin the main program.
The processes share a global address space where they perform read and write functions asynchronously.
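In C#, a rough equivalent of the shared memory model is several parallel iterations reading and writing one variable in the same process. The sketch below uses a lock to keep those writes from colliding; the counter and lock object are illustrative names, not part of any standard pattern.

using System;
using System.Threading.Tasks;

public class SharedMemoryDemo
{
    // Shared state that every parallel iteration can read and write
    static int burritosRolled = 0;
    static readonly object gate = new object();

    public static void Main()
    {
        Parallel.For(0, 100, i =>
        {
            // The lock keeps concurrent writers from corrupting the shared counter
            lock (gate)
            {
                burritosRolled++;
            }
        });

        Console.WriteLine($"Total burritos rolled: {burritosRolled}");
    }
}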
In the message-passing model, parallel processes exchange data by passing messages to one another. These messages can be asynchronous or synchronous.
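True message passing between machines typically relies on a library such as MPI, but we can sketch the same idea inside one C# process: a producer task posts messages to a queue, and a consumer task reads them. Here, BlockingCollection stands in for the message channel.

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class MessagePassingDemo
{
    public static void Main()
    {
        // The queue plays the role of the message channel between two workers
        using var mailbox = new BlockingCollection<string>();

        var producer = Task.Run(() =>
        {
            mailbox.Add("10 tortillas ready");
            mailbox.Add("chorizo ready");
            mailbox.CompleteAdding();   // tell the consumer no more messages are coming
        });

        var consumer = Task.Run(() =>
        {
            foreach (var message in mailbox.GetConsumingEnumerable())
                Console.WriteLine("Received: " + message);
        });

        Task.WaitAll(producer, consumer);
    }
}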
Partitioned Global Address Space (PGAS) models live somewhere between the shared memory and message passing models. PGAS provides a global memory address space logically partitioned for each process. Parallel processes “talk” by performing asynchronous operations like read and write functions on the global address space.
Since parallel programming is great for decomposing complex tasks, it shines brightest on complex calculations, large datasets, and large simulations.
Here are some use cases for parallel programming:
You can use parallel programming when you want to quickly process large amounts of data. Done well, it helps you complete projects quickly and efficiently. While parallel programming can create technical debt and be time-intensive to set up – after all, programmers need to design efficient parallel algorithms and write the code to support them – the process saves time overall. By leveraging parallel processing power, parallel programming runs a specific program across multiple compute nodes and CPU cores at the same time.
Data processing doesn’t have to be difficult, and with the help of parallel programming, you can take your to-do list to the next level.
The most significant benefit of parallel programming is faster code execution, saving you runtime and effort. Instead of running sequential code, you run parallel code. These rewards are especially evident in large-scale data parallelism, as shown in our example above. In data parallelism, each thread performs the same task on a different subset of the data – the data itself is being parallelized, not the tasks. Each thread handles a smaller share of the work, so the whole job finishes sooner, which means more time to spend on other details and projects.
Parallel programming is not limited to data parallelism, however. We can also speed up execution by distributing distinct tasks across different threads and different processors. By doing so, we give the program more computing resources to work with and thus increase its capacity. In short, we get things done faster.
For all its virtues, speed does have its downsides. In parallel programming, the code doesn’t execute in a fixed sequence. If an operation depends on the previous statement finishing first, running the two in parallel will produce incorrect results.
Since multiple threads run at once, parallel code can also introduce a few new bugs. The two big ones to look out for are: data races, where two threads access the same data at the same time and create unexpected mutations, and deadlocks, where threads each hold a resource the other needs and end up waiting on each other forever.
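Here’s a small C# sketch of a data race and one way to avoid it. The counter names are illustrative; on most machines the unsynchronized counter will lose updates, while the atomically incremented one always lands on the expected total.

using System;
using System.Threading;
using System.Threading.Tasks;

public class DataRaceDemo
{
    static int unsafeCounter = 0;   // incremented with a plain ++, no synchronization
    static int safeCounter = 0;     // incremented atomically

    public static void Main()
    {
        Parallel.For(0, 100_000, i =>
        {
            unsafeCounter++;                          // data race: read-modify-write can interleave
            Interlocked.Increment(ref safeCounter);   // atomic update avoids the race
        });

        // unsafeCounter usually comes up short; safeCounter is always 100000
        Console.WriteLine($"Unsafe: {unsafeCounter}, Safe: {safeCounter}");
    }
}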
Let’s move from the concept into something concrete. In this section, we’ll cover an introduction to parallel computing.
For this example, we’ll use C# and its Task Parallel Library to take a loop that would normally run sequentially and execute it across several threads.
To understand parallel programming in a business context, imagine you’re trying to process a large group of payroll data.
//model
public class Employees
{
public string name {get; set; }
public double salary {get; set; }
}
//Instantiate Class
List<Employees> employeeList = new List<Employees>();
employeeList.Add(new Employees { name = "Sam", salary = 1000.0 });
employeeList.Add(new Employees { name = "Dean", salary = 1000.0 });
employeeList.Add(new Employees { name = "Bob", salary = 2500.0 });
To process this data, you can divide it into smaller pieces and execute them in parallel. In each parallel iteration, data items like person.name and person.salary are processed on a different thread.
//Parallel Simple Example
//Requires: using System.Collections.Generic; using System.Diagnostics; using System.Threading.Tasks;
public void CalculateSalary(List<Employees> employeeList){
    //Each employee is handed to whichever worker thread is available next
    Parallel.ForEach(employeeList, person =>
    {
        Debug.WriteLine("Name: " + person.name + ", Salary: $" + person.salary);
    });
}
Processing these smaller chunks of work simultaneously is faster than grinding through one long sequential loop. And, as we all know, faster payroll-related anything is a win for everyone.
Easy and efficient, right? In this article, we briefly introduced parallel programming and how you can use it in your email development practices. At Mailgun, our IT and Engineering teams are always exploring and sharing tools and ways to optimize.
Want a backstage pass to our engineers’ brains? Check out our software engineering tutorials and educational materials. Or, sign up for our newsletter so you never miss our tips and tricks.