Distributed Memory Model
Hi y’all, have you been searching the internet for distributed memory models without finding a piece that fits your needs? Then you’ve reached your destination. This is Samyak Shukla, and I am presenting a brief but interesting article on distributed memory models. So brace yourself and, without wasting any more time, let’s dive into the topic.
Before jumping to the main topic, distributed memory models, we should first look at shared memory models. They make the working of the distributed memory architecture, which we will discuss shortly, much easier to understand.
What is shared memory architecture? In this architecture there is one large memory unit, and multiple processors, numbered 1 to ’n’, all access it. Let’s take an example. As students we all used blackboards in school, and several students can read from and write on the same blackboard at the same time. Here the blackboard acts as the shared memory unit and the students act as the processors. Simply put, multiple processors use one shared memory unit, and every processor can read and write that same memory directly.
Now that we have understood the shared memory architecture, understanding the distributed memory architecture will be a little easier.
In this architecture there are multiple processors, each with its own local memory, connected through an interconnection network. A parallel program in the distributed memory approach is built from many processes spread across several computers, processors, or cores, and each process works on its own small part. Simply put, the memory is not shared anymore; it is distributed among the processors, and each one has its own private memory.
Let’s assume there are two processors, P1 and P2, with their respective memory units mem1 and mem2. If P1 needs data that lives in P2’s memory unit (mem2), which is remote from P1’s point of view, P1 cannot read it directly. Instead, P1 sends a request over the interconnection network. P2 reads the request, fetches the data from its local memory (mem2), and puts it back onto the interconnection network, where P1 finally reads it. That is the whole working of the distributed memory model.
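The P1/P2 exchange described above can be sketched as a small simulation. This is only a toy model that runs inside one Python program, not real distributed code: the names `Node`, `Interconnect`, `serve_one_request` and `remote_read` are made up for illustration, and the contents of mem1 and mem2 are arbitrary.

```python
from collections import deque


class Node:
    """A processor with its own private memory (a plain dict here)."""

    def __init__(self, name, memory):
        self.name = name
        self.memory = memory   # private: other nodes never read this directly
        self.inbox = deque()   # messages delivered by the interconnect


class Interconnect:
    """Toy network: delivers messages between nodes and counts them."""

    def __init__(self):
        self.nodes = {}
        self.messages_sent = 0

    def attach(self, node):
        self.nodes[node.name] = node

    def send(self, src, dst, payload):
        self.nodes[dst].inbox.append((src, payload))
        self.messages_sent += 1


def serve_one_request(node, net):
    """Owner side: read one request, look it up locally, send the reply."""
    requester, key = node.inbox.popleft()
    net.send(node.name, requester, node.memory[key])


def remote_read(requester, owner, key, net):
    """Requester side: ask the owner for `key` and read the answer."""
    net.send(requester.name, owner.name, key)  # request goes onto the network
    serve_one_request(owner, net)              # owner fetches and replies
    _, value = requester.inbox.popleft()       # requester reads the reply
    return value


net = Interconnect()
p1 = Node("P1", {"a": 1})    # mem1 (arbitrary contents)
p2 = Node("P2", {"x": 42})   # mem2 (arbitrary contents)
net.attach(p1)
net.attach(p2)

value = remote_read(p1, p2, "x", net)
print(value, net.messages_sent)  # prints: 42 2
```

Even this single remote read costs two messages, one request and one reply, which is why access to remote memory is so much more expensive than access to local memory.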
It is much easier to build networks that connect large numbers of computers than it is to put large numbers of CPU cores into a single shared-memory computer. The difficulty with having many separate computers, each with its own private memory, is that we now have to write a program that can take advantage of all those thousands of CPU cores, and this can be quite challenging.
Suppose a scientist is working on a large computational problem. With this architecture, more memory and more computational power are available, so they have two options. The first is to increase the problem size: by adding processors (which come with their own private memory) and keeping the workload per process (i.e. the size of the subproblem and the number of operations) at the same level, a larger problem can be solved in the same amount of time. This is known as weak scaling. The second option is to keep the overall problem size fixed and divide the problem into smaller subproblems spread over a larger number of processes. Every processor then deals with a smaller workload and can finish its task much faster. In the best-case scenario, a problem of fixed size distributed over P processes finishes P times faster, so instead of one simulation per time unit (one hour, one day, etc.), P simulations can be run per time unit. This is known as strong scaling.
Simply put, distributed memory architecture can help you solve larger computational problems in the same amount of time, or solve problems of fixed size in less time, than would be possible with a single shared-memory machine.
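The two options above can be written down as a little arithmetic. The sketch below covers only the ideal, best-case scenario with no communication cost at all; the function names and the example numbers (8 hours, 1000 cells per process) are made up for illustration.

```python
def strong_scaling_time(time_one_process, num_processes):
    """Ideal run time when a FIXED problem is split over num_processes."""
    return time_one_process / num_processes


def weak_scaling_problem_size(size_per_process, num_processes):
    """Total problem size when every process keeps the same workload."""
    return size_per_process * num_processes


t1 = 8.0  # hypothetical: one simulation takes 8 hours on one process
for p in (1, 2, 4, 8):
    tp = strong_scaling_time(t1, p)
    print(f"P={p}: time={tp} h, speedup={t1 / tp}")

# Fixed workload per process (often called weak scaling): with a
# hypothetical 1000 grid cells per process, 8 processes handle 8000 cells.
print(weak_scaling_problem_size(1000, 8))  # prints: 8000
```

Real programs fall short of these ideal numbers, because the processes also have to spend time exchanging messages.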
OK, so now let us see how messages are conveyed. How does one process know what the other processes of the program are doing? As we saw above, the processes have to explicitly send and receive the information, data and variables that they or other processes need through the interconnection network. This brings some drawbacks with it, especially the time it takes to send messages over the network.
Let’s take a scenario where a team collaborates around a table in a meeting room. All the information is freely available to everyone sitting at that table, and everyone can use it and work on it, even in parallel. Now suppose the shared table and the common meeting room are replaced with individual offices, so that every employee sits in a private office and makes changes to papers or documents there.
In this scenario, an employee named Rohan makes a few changes to the document he is working on and wants to tell his colleagues about them. He has to stop working, leave his office, and go to each colleague’s office one by one to inform them that the document has changed. Only after he has informed everyone can he return to his desk and continue his work. This takes a huge toll on both the time and the efficiency of the employees, and it requires much more effort than sliding the document across the table in the common meeting room, which is what happens in shared memory architecture. In the worst-case scenario, Rohan spends more time informing his colleagues about the changes than actually doing his work.
So we can clearly see that communication, i.e. message passing, is the new bottleneck in our analogy, and it slows down the progress of the whole team. If we could reduce the time spent on communication between co-workers (maybe by installing telephones in the offices, or by using mobile phones, a real-time chat application, or an even faster kind of network), we would spend less time waiting for messages to be delivered and more time solving the big computational problems.
In distributed memory computing, the bottleneck is mostly the technology that passes data between the nodes, i.e. the wires of the interconnection network. (Each computer is called a node here; a node has its own processor, its own private memory unit and its own OS.) A widely used industry standard for high throughput and low latency is InfiniBand, which typically delivers messages a lot quicker than standard Ethernet.
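To see why latency and throughput matter, here is a minimal sketch of a common first-order cost model: the time for one message is a fixed latency plus the message size divided by the bandwidth. The model is a simplification, and every number below is invented purely for illustration; none of them is a measurement of InfiniBand, Ethernet or any other real network.

```python
def message_time(size_bytes, latency_s, bandwidth_bytes_per_s):
    """Simplified cost of one message: fixed latency plus transfer time."""
    return latency_s + size_bytes / bandwidth_bytes_per_s


def time_to_inform_all(num_colleagues, size_bytes, latency_s,
                       bandwidth_bytes_per_s):
    """Like Rohan visiting each office in turn: one message per colleague."""
    one = message_time(size_bytes, latency_s, bandwidth_bytes_per_s)
    return num_colleagues * one


# Hypothetical networks (made-up numbers, not real hardware figures).
slow = {"latency_s": 0.5, "bandwidth_bytes_per_s": 1000.0}
fast = {"latency_s": 0.125, "bandwidth_bytes_per_s": 8000.0}

size = 1000  # bytes per message, also made up
print(time_to_inform_all(4, size, **slow))  # prints: 6.0
print(time_to_inform_all(4, size, **fast))  # prints: 1.0
```

With the faster hypothetical network, the same four messages take a fraction of the time, which leaves more of the run for actual computation.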
Advantages:
Distributed memory architecture has a considerable number of advantages. The main reason for using it is the same as for shared memory architecture: when we add more computing power, in the form of additional processor cores, sockets, or nodes in a cluster, we can start more processes and use the added resources to get the results of our simulations faster.
With the distributed memory approach we get another advantage: every compute node added to the cluster brings more memory, since each processor has its own private memory unit. We are no longer constrained by the amount of memory a single mainboard allows us to use, so in theory we can compute models that need far more memory than one machine could hold. Most of the time, distributed memory architecture is therefore more scalable than shared memory computing, i.e. the speedup saturates at a much larger number of processes than the number of threads at which shared memory speedup saturates.
Disadvantages:
However, we should be aware of the limitations of the distributed memory model as well. Just as in the shared memory case, some problems are poorly suited to this approach. This time, we need to look not only at whether the problem is easily parallelized, but also at the amount of communication needed to solve it.
Let’s take a time-dependent problem in which a large number of elements interact in such a way that, after each time step, every element needs information about every other element. Assume each element is computed by its own process with a private memory unit. With n processes, each one must then hear from the other n - 1 after every iteration, so the number of messages grows rapidly as the number of elements and processes increases, and this causes a major communication bottleneck.
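How fast the message count grows in this scenario can be checked with a few lines. The sketch assumes the simplest possible scheme, in which every process sends one direct message to every other process after each step; the function name is made up for illustration.

```python
def all_to_all_messages(num_processes):
    """Messages per step if every process messages every other process once."""
    return num_processes * (num_processes - 1)


for n in (2, 4, 8, 16):
    print(n, all_to_all_messages(n))
# prints:
# 2 2
# 4 12
# 8 56
# 16 240
```

Doubling the number of processes roughly quadruples the number of messages per step, so communication quickly outweighs the time saved on computation.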
So this was my blog about the distributed memory model. I hope you liked it. Thank you!