Multicore Applications
Multi-Core Applications
The following guide was created using a 75G5
Summary
Multi-core applications are applications that use multiple cores on a CPU.
There are 2 main ways to create a multi-core application:
Multi-threading
Multithreading creates multiple threads within a single process to boost computing speed.
-
Threads share the same memory space and resources within a process.
-
Threads execute concurrently, improving throughput.
-
Creating a thread is cheaper than creating a process.
-
Example: A web server handling multiple client requests concurrently using threads.
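As a minimal sketch of the shared-memory point above (it is not one of the sample apps), the following uses two threads that update one global counter. The names shared_counter, counter_lock, and run_shared_counter_demo are hypothetical, and the mutex is only one possible way to keep the two threads from overwriting each other's updates.

```c
#include <pthread.h>
#include <stddef.h>

#define INCREMENTS_PER_THREAD 100000

/* Hypothetical demo state: visible to every thread in the process. */
static long shared_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread adds to the same counter; the mutex prevents lost updates. */
static void *add_to_counter(void *unused)
{
    int i;
    (void)unused;
    for (i = 0; i < INCREMENTS_PER_THREAD; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs two threads and returns the final counter value, or -1 on error. */
long run_shared_counter_demo(void)
{
    pthread_t first, second;

    shared_counter = 0;
    if (pthread_create(&first, NULL, add_to_counter, NULL) != 0)
        return -1;
    if (pthread_create(&second, NULL, add_to_counter, NULL) != 0) {
        pthread_join(first, NULL);
        return -1;
    }
    pthread_join(first, NULL);
    pthread_join(second, NULL);
    return shared_counter;
}
```

Calling run_shared_counter_demo() from main should return 200000, because both threads write to the same variable. Build with -pthread.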
Multi-processing (Forking)
Multiprocessing runs two or more processes, which the operating system can schedule on separate CPU cores, to enhance system performance.
-
Multiple processes run simultaneously, each with its own memory space and resources.
-
It increases CPU utilization by distributing tasks across different processors.
-
Process creation is time-consuming.
-
Example: Rendering frames in a video editing software using separate CPU cores for each frame.
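To illustrate that each process has its own memory space, here is a small sketch (not part of the sample apps). The function name value_after_child_write is hypothetical. The child process changes its copy of a variable, and the parent's copy stays untouched.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Forks a child that modifies its own copy of `value`, then returns the
 * parent's copy. Returns -1 if fork() fails.
 */
int value_after_child_write(int value)
{
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        value = value + 100; /* changes the child's private copy only */
        _exit(0);
    }
    waitpid(child, NULL, 0); /* wait until the child has finished */
    return value;            /* still the value the parent passed in */
}
```

Calling value_after_child_write(5) returns 5 in the parent. With threads, the same write would have been visible to every thread.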
Context Switching
-
Suspension and State Storage: When the operating system decides to switch from executing one process (or thread) to another, it suspends the current process. It saves the CPU’s state (context) for that process in memory. This context includes information like register values, program counter, and other relevant data.
-
Context Retrieval: The OS then retrieves the context of the next process (or thread) that needs to run. This context was previously stored in memory during a previous context switch.
-
Resuming Execution: Finally, the OS restores the retrieved context into the CPU’s registers. It sets the program counter to the location where the process was interrupted, allowing it to resume execution from that point.
Context switching enables efficient multitasking: the CPU can switch between different processes or threads while each one keeps its own state.
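On Linux, a process can observe its own context switches. The sketch below is an illustration under assumptions, not part of the sample apps: it reads the voluntary context switch counter that Linux maintains in getrusage() before and after a short sleep. The function names are hypothetical, and the exact counts depend on the kernel and system load.

```c
#include <sys/resource.h>
#include <time.h>

/* Voluntary context switches of this process so far, or -1 on error. */
long voluntary_switches(void)
{
    struct rusage usage;

    if (getrusage(RUSAGE_SELF, &usage) != 0)
        return -1;
    return usage.ru_nvcsw;
}

/*
 * Sleeps for the given time and returns how many voluntary context
 * switches were counted while sleeping, or -1 on error.
 */
long switches_during_sleep(long milliseconds)
{
    struct timespec delay;
    long before, after;

    delay.tv_sec = milliseconds / 1000;
    delay.tv_nsec = (milliseconds % 1000) * 1000000L;

    before = voluntary_switches();
    nanosleep(&delay, NULL); /* the process gives up the CPU while it sleeps */
    after = voluntary_switches();

    if (before < 0 || after < 0)
        return -1;
    return after - before;
}
```

A sleeping process normally gives up the CPU, so the returned difference is typically at least 1 on a standard Linux kernel.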
Applications
Here are some examples of applications:
-
Multithreading:
-
GUI Applications: Improve responsiveness by handling UI events in separate threads.
-
File I/O: Read/write files concurrently without blocking the main thread.
-
Network Servers: Handle multiple client connections simultaneously.
-
Parallel Data Processing: Divide large datasets into smaller chunks for parallel processing.
-
Process Failure Testing: A monitoring thread on a separate core can check that a process is operating correctly without hurting the performance of that process.
-
-
Multiprocessing:
-
Image Processing: Transformations to images using multiple processes.
-
Scientific Computing: Solve complex mathematical problems by distributing computations across processors.
-
Video Encoding: Encode videos faster by splitting frames across multiple processes.
-
Simulation: Simulate real-world scenarios (e.g., physics simulations) using parallel processes.
-
Data Parallelism: Train machine learning models on subsets of data in parallel.
-
Project Guide
Overview
We will be creating 3 different applications
-
The first 2 applications do the same work using different forms of multi-core programming. In this case it doesn’t matter which you use, as you will get the same result either way.
-
A multi-threaded sorting algorithm that sorts 2 identical 10,000 integer lists
-
A multi-processed sorting algorithm that sorts 2 identical 10,000 integer lists
Here’s an example of what this looks like
-
-
The third one is a state machine that is set to fail at a specific state. The failure is detected by a tester running in a thread on a different core
-
This will use multi-threading as it requires shared memory to detect failures
Here’s an example of the third sample app:
-
What you will learn
-
How to create a multi-threaded application
-
How to create a multi-processed application
-
Reasons to pick either multi-processing or multi-threading
-
How to synchronize multi-threaded applications
Requirements
Required Hardware:
-
Any NAI Multi-Core Board
For this guide I used a 75G5, as it uses a dual-core processor (specifically the Xilinx Zynq 7015)
Xilinx Setup
Note
|
This is the setup for an empty project. If you are using an NAI SSK, please follow the proper project setup for your SSK version. |
-
Press "Browse"
-
Pick a location for your workspace, create a new folder with your workspace name, then press "Ok"
-
Expand your workspace
Setup your application
-
Right click on the empty space in the "C/C++ Projects" tab on the left side of the screen
-
Select New → Project..
-
Select "C Project", Then click "Next"
-
Now pick a project name, then in "Project Type:" under "Makefile Project" select Empty Project
Note: Your toolchain depends on your board’s processor. See the board-specific documentation for the correct option. The option shown is for a 75G5 (Xilinx Zynq 7015)
-
Now right click on the project and click "properties"
-
In "C/C++ Build" turn on "Generate Makefiles Automatically"
-
Now right click on your project click New → Source File
-
You can name it whatever you want. For this example I am naming it "main.c"
-
Create a main function. I am using the following code
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int32_t main(void)
{
    printf("Hello World\n");
    return 0;
}
-
When you are done, press CTRL+B to build the project
-
Now go back to the project properties (right click the project, then "Properties") and turn off "Generate Makefiles Automatically"
-
Now we have to make sure that the compiler uses the "pthreads" library for multi-threading
-
In the "C/C++ Projects" tab, open the default folder of your project, then open the makefile
The makefile should contain the following lines
Multicore_Multiprocessed: $(OBJS) $(USER_OBJS)
	@echo 'Building target: $@'
	@echo 'Invoking: ARM Linux gcc linker'
	arm-xilinx-linux-gnueabi-gcc -o "Multicore_Multiprocessed" $(OBJS) $(USER_OBJS) $(LIBS)
	@echo 'Finished building target: $@'
	@echo ' '
-
In the following line
arm-xilinx-linux-gnueabi-gcc -o "Multicore_Multiprocessed" $(OBJS) $(USER_OBJS) $(LIBS)
add -pthread after arm-xilinx-linux-gnueabi-gcc, so the section should look like
Multicore_Multiprocessed: $(OBJS) $(USER_OBJS)
	@echo 'Building target: $@'
	@echo 'Invoking: ARM Linux gcc linker'
	arm-xilinx-linux-gnueabi-gcc -pthread -o "Multicore_Multiprocessed" $(OBJS) $(USER_OBJS) $(LIBS)
	@echo 'Finished building target: $@'
	@echo ' '
Now you can start making your application
-
Sample Apps
All the sample apps will use the following base template:
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <ctype.h>
#include <stdbool.h>
#include <pthread.h>
#include <sched.h>
#include <unistd.h>
int32_t main(void)
{
//code goes here
}
Sorting
For the 2 sorting applications we will use some of the same functions:
-
Let’s start with a bubble sort function. Bubble sort is slow, which helps visualize the processing time
//Bubble Sort Helper
void swap(int* xp, int* yp)
{
    int temp = *xp;
    *xp = *yp;
    *yp = temp;
}

void bubbleSort(int arr[], int n)
{
    int i, j;
    bool swapped;
    for (i = 0; i < n - 1; i++)
    {
        swapped = false;
        for (j = 0; j < n - i - 1; j++)
        {
            if (arr[j] > arr[j + 1])
            {
                swap(&arr[j], &arr[j + 1]);
                swapped = true;
            }
        }
        //Stop early if a full pass made no swaps
        if (swapped == false)
            break;
    }
}
-
This is optional, but a print array function can be useful for testing
// Function to print an array
void printArray(int arr[], int size)
{
    int i;
    for (i = 0; i < size; i++)
    {
        printf("%d ", arr[i]);
    }
}
Multi-processed Sorting
To start with the multi-processed sorting, add the following variables and macros that will be used in the application
#define TIME ((int)time(NULL) - startTime) //seconds since startTime; needs a local startTime in scope
# define CPU_ZERO(cpusetp) __CPU_ZERO_S (sizeof (cpu_set_t), cpusetp)
# define CPU_SET(cpu, cpusetp) __CPU_SET_S (cpu, sizeof (cpu_set_t), cpusetp)
bool bQuit = false;
To have multiple processes we need to create a function that each process will run. In this process we will be creating our array and sorting it, while keeping track of the time.
/*
* Work of a single process.
*/
void ProcessWork(int core)
{
    cpu_set_t mask; // A CPU set is a bit mask of CPU cores
    CPU_ZERO(&mask); // Clear the set so that it contains no cores
    CPU_SET(core, &mask); // Add only the requested core to the set
    // A pid of 0 applies the mask to the calling process
    int result = sched_setaffinity(0, sizeof(mask), &mask);
    if (result != 0)
    {
        printf("Failed to set affinity\n");
        return;
    }
    int startTime = (int)time(NULL); // Keeps track of the start time
    // Create an array of integers from 10000 down to 1
    int arr[10000];
    int i;
    for (i = 10000; i > 0; i--) {
        arr[10000 - i] = i;
    }
    int n = sizeof(arr) / sizeof(arr[0]);
    bubbleSort(arr, n); // Sort the array
    printf("Sorted array created: \n");
    fprintf(stdout, "Timestamp: %d seconds", TIME);
    // A cpu_set_t cannot be printed with %d; sched_getcpu() returns the core this process is running on
    printf(" on core %d\n", sched_getcpu());
}
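The affinity calls used in ProcessWork can be tried on their own. The following is a minimal sketch, assuming Linux with glibc. The helper names pin_self_to_core and first_allowed_core are hypothetical. It looks up a core the process is allowed to use before pinning, since not every core number is valid on every system.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Pins the calling process to `core`. Returns 0 on success, -1 on failure. */
int pin_self_to_core(int core)
{
    cpu_set_t mask;

    if (core < 0 || core >= CPU_SETSIZE)
        return -1;
    CPU_ZERO(&mask);      /* start with an empty set */
    CPU_SET(core, &mask); /* allow exactly one core */
    /* a pid of 0 means "the calling process" */
    return sched_setaffinity(0, sizeof(mask), &mask);
}

/* Returns the lowest-numbered core the caller may run on, or -1 on error. */
int first_allowed_core(void)
{
    cpu_set_t allowed;
    int core;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (core = 0; core < CPU_SETSIZE; core++) {
        if (CPU_ISSET(core, &allowed))
            return core;
    }
    return -1;
}
```

After a successful pin_self_to_core(first_allowed_core()), sched_getcpu() can be used to confirm which core the process is running on.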
Now that we have a function for the work each process does, we have to create the processes. We will do this in the main function.
#include <sys/wait.h> // needed for waitpid()

int main(int argc, const char * argv[])
{
    int i = 0;
    while (!bQuit) // this loop runs 30 times
    {
        printf("\n set: %d\n", i); // prints which set of processes we're at
        fflush(stdout); // flush before fork() so buffered output is not repeated by the children
        pid_t pid[2]; // process IDs of the two children, stored by the parent
        int core1 = 0;
        int core2 = 1;
        pid[0] = fork(); // fork() returns 0 in the child and the child's pid in the parent
        if (pid[0] == 0)
        {
            ProcessWork(core1);
            exit(0); // the child exits here and never runs the code below
        }
        pid[1] = fork();
        if (pid[1] == 0)
        {
            ProcessWork(core2);
            exit(0);
        }
        int j; // wait for both child processes to finish before continuing
        for (j = 0; j < 2; j++)
        {
            if (pid[j] > 0) // a negative pid means that fork() failed
            {
                waitpid(pid[j], NULL, 0); // the parent is switched out while it waits
            }
            else
            {
                printf("Failed to fork\n");
            }
        }
        if (i == 29)
        {
            bQuit = true;
        }
        i++;
    }
    return 0;
}
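Because processes do not share memory, a child cannot simply write its result into a variable of the parent. One simple option for small results is the exit status, which the parent reads with waitpid(). The sketch below is an illustration rather than part of the sample apps. The function name and the exit codes 10 and 11 are hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define WORKER_COUNT 2

/*
 * Starts WORKER_COUNT child processes. Child i exits with status 10 + i.
 * Returns the sum of all exit statuses, or -1 if anything failed.
 */
int run_workers_and_sum_exit_codes(void)
{
    pid_t children[WORKER_COUNT];
    int started = 0, failed = 0, sum = 0, status, i;

    for (i = 0; i < WORKER_COUNT; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            failed = 1;
            break;
        }
        if (pid == 0)
            _exit(10 + i); /* child: report a small result and stop */
        children[started++] = pid; /* parent: remember the child */
    }

    /* Collect every child that was started, even after a failure. */
    for (i = 0; i < started; i++) {
        if (waitpid(children[i], &status, 0) < 0 || !WIFEXITED(status))
            failed = 1;
        else
            sum += WEXITSTATUS(status);
    }
    return failed ? -1 : sum;
}
```

With two workers the function returns 21 (10 + 11). An exit status only carries values from 0 to 255, so larger results need another mechanism such as a pipe or a file.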
Multi-threaded Sorting
Similar to the multi-processed version, we need to declare our variables and macros.
#define TIME ((int)time(NULL) - startTime) //seconds since startTime; needs a local startTime in scope
# define CPU_ZERO(cpusetp) __CPU_ZERO_S (sizeof (cpu_set_t), cpusetp)
# define CPU_SET(cpu, cpusetp) __CPU_SET_S (cpu, sizeof (cpu_set_t), cpusetp)
bool bQuit = false;
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER; //the lock is optional; while one thread holds it, other threads are kept out of the shared memory
Now we need to create a function to select a core
int selectCore(int core_id)
{
const int num_cores = sysconf(_SC_NPROCESSORS_ONLN); //gets num of cores
if (core_id < 0 || core_id >= num_cores)
return -1;
cpu_set_t cpuset; //creates cpu set
CPU_ZERO(&cpuset); //zeros the cpu set
CPU_SET(core_id, &cpuset); //set the core of the cpu set
pthread_t current_thread = pthread_self(); //gets the current thread
int toReturn = pthread_setaffinity_np(current_thread, sizeof(cpu_set_t), &cpuset);
//sets the core for the thread to run at
return toReturn;
}
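A cpu_set_t is a bit mask that can contain several cores, so it cannot be printed as a single integer. One possible way to display it is to walk the set with CPU_ISSET. This is a sketch assuming glibc on Linux, and the helper name format_cpu_set is hypothetical.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes the cores contained in `set` into `out` as a comma-separated
 * list such as "0,2". Returns the number of cores in the set, or -1 if
 * the buffer is too small.
 */
int format_cpu_set(const cpu_set_t *set, char *out, size_t out_size)
{
    size_t used = 0;
    int count = 0;
    int core, written;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (core = 0; core < CPU_SETSIZE; core++) {
        if (!CPU_ISSET(core, set))
            continue;
        /* put a comma in front of every entry except the first */
        written = snprintf(out + used, out_size - used,
                           count ? ",%d" : "%d", core);
        if (written < 0 || (size_t)written >= out_size - used)
            return -1;
        used += (size_t)written;
        count++;
    }
    return count;
}
```

For a thread pinned with selectCore, the set read back by pthread_getaffinity_np would normally contain one core, so the output is a single number.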
Now we need to make the function that each thread will run. The commented out lines are optional.
void* runThread(void* data)
{
// pthread_mutex_lock(&lock); OPTIONAL (slower): Enable the lock on shared memory
int startTime = (int)time(NULL); //start reading time
int* corePtr = (int*)data; //integer of what core to use
if (!corePtr) //check null pointer
{
printf("Channel Doesn't Exist\n");
return (void*)1;
}
int core_id = *corePtr;
// Sets the core of the thread
if (selectCore(core_id) != 0)
{
printf("Failed to set thread affinity for channel %d\n", core_id);
return (void*)1;
}
//Reads back the affinity mask of this thread (the set of cores it may run on)
pthread_t current_thread = pthread_self();
cpu_set_t setcpuset;
CPU_ZERO(&setcpuset);
CPU_SET(core_id, &setcpuset);
pthread_getaffinity_np(current_thread, sizeof(cpu_set_t), &setcpuset);
//create the array
int arr[10000];
srand(0);
int i;
for (i = 10000; i > 0; i--)
{
arr[10000 - i] = i;
}
int n = sizeof(arr) / sizeof(arr[0]);
bubbleSort(arr, n);
printf("Sorted array created: \n");
// printArray(arr, n);
fprintf(stdout, "Timestamp: %d seconds", TIME);
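printf(" on core %d\n", sched_getcpu()); //a cpu_set_t cannot be printed with %d; sched_getcpu() returns the core this thread is running on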
// pthread_mutex_unlock(&lock); Unlock the shared memory lock
return (void*)0;
}
Now we just need the main method to create the threads
int32_t main(void)
{
int i = 0;
while (!bQuit)
{
printf("\n set: %d\n", i);
int core1 = 0;
int core2 = 1;
pthread_t th1, th2;
//create the threads
pthread_create(&th1, NULL, runThread, &core1);
pthread_create(&th2, NULL, runThread, &core2);
//wait for the threads to be complete before continuing
pthread_join(th1, NULL);
pthread_join(th2, NULL);
//while main is blocked in pthread_join the scheduler switches it out, which is an example of context switching
if (i == 29)
{
bQuit = true;
}
i++;
}
return 0;
}
And now you have a multi-threaded sorting application
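The main function above passes one pointer to each thread and then joins it. The same pattern can carry results back, because threads share memory. The sketch below is not part of the sample apps: the struct sum_task and the function parallel_sum are hypothetical names. Each thread sums one half of an array into its own task structure, so no lock is needed.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical work description for one thread. */
struct sum_task {
    const int *values; /* first element this thread handles */
    size_t count;      /* number of elements this thread handles */
    long result;       /* written by the thread, read after the join */
};

static void *sum_range(void *arg)
{
    struct sum_task *task = arg;
    size_t i;

    task->result = 0;
    for (i = 0; i < task->count; i++)
        task->result += task->values[i];
    return NULL;
}

/* Sums `values` using two threads. Returns 0 on success, -1 on failure. */
int parallel_sum(const int *values, size_t count, long *total)
{
    struct sum_task first, second;
    pthread_t t1, t2;

    first.values = values;
    first.count = count / 2;
    first.result = 0;
    second.values = values + count / 2;
    second.count = count - count / 2;
    second.result = 0;

    if (pthread_create(&t1, NULL, sum_range, &first) != 0)
        return -1;
    if (pthread_create(&t2, NULL, sum_range, &second) != 0) {
        pthread_join(t1, NULL);
        return -1;
    }
    pthread_join(t1, NULL); /* results are safe to read after the join */
    pthread_join(t2, NULL);
    *total = first.result + second.result;
    return 0;
}
```

For the array {1, 2, 3, 4, 5, 6} the total is 21. Giving each thread its own result field avoids the shared-memory conflicts that would otherwise require a mutex.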
Multi-threaded Failure Detection
For multi-threaded failure detection we need the following variables and macros
#define TIME ((int)time(NULL) - startTime) //seconds since startTime; needs a local startTime in scope
# define CPU_ZERO(cpusetp) __CPU_ZERO_S (sizeof (cpu_set_t), cpusetp)
# define CPU_SET(cpu, cpusetp) __CPU_SET_S (cpu, sizeof (cpu_set_t), cpusetp)
//shared memory
volatile bool bQuit = false; //volatile so each thread re-reads the value written by the other thread
volatile int currentState = 1; //keeping track of states
pthread_t th1, th2; //Declaring this as a global variable so we can cancel the thread
Note
|
We will be using the sorting algorithm that we made earlier for this application |
Now we need to add a function to select a core. We will use the same function as in the multi-threaded sorting application
int selectCore(int core_id)
{
const int num_cores = sysconf(_SC_NPROCESSORS_ONLN); //gets num of cores
if (core_id < 0 || core_id >= num_cores)
return -1;
cpu_set_t cpuset; //creates cpu set
CPU_ZERO(&cpuset); //zeros the cpu set
CPU_SET(core_id, &cpuset); //set the core of the cpu set
pthread_t current_thread = pthread_self(); //gets the current thread
int toReturn = pthread_setaffinity_np(current_thread, sizeof(cpu_set_t), &cpuset);
//sets the core for the thread to run at
return toReturn;
}
Next we need to create a function to run the sorting. This function will take the parameters for cpuset and state for logging purposes so we can make sure the application is working properly
void* RunSorting(cpu_set_t* setcpuset, int state)
{
int startTime = (int)time(NULL);
int arr[10000];
srand(0);
int i;
for (i = 10000; i > 0; i--)
{
arr[10000 - i] = i;
}
int n = sizeof(arr) / sizeof(arr[0]);
bubbleSort(arr, n);
printf("Sorted array created: \n");
fprintf(stdout, "Timestamp: %d seconds", TIME);
// A cpu_set_t is a bit mask that can hold several cores, so it cannot be passed to %d; sched_getcpu() returns the core this thread is running on
printf(" on core %d ran on state %d\n", sched_getcpu(), state);
return (void*)0;
}
Now we need to create our main program. It will set a core and switch between states.
void* mainProgram(void* data)
{
//set the core of the thread
int* corePtr = (int*)data;
if (!corePtr)
{
printf("Channel Doesn't Exist\n");
return (void*)1;
}
int core_id = *corePtr;
// Set thread affinity to core based on channel number
if (selectCore(core_id) != 0)
{
printf("Failed to set thread affinity for channel %d\n", core_id);
return (void*)1;
}
//keep track of the current core for logging
pthread_t current_thread = pthread_self();
cpu_set_t setcpuset;
CPU_ZERO(&setcpuset);
CPU_SET(core_id, &setcpuset);
pthread_getaffinity_np(current_thread, sizeof(cpu_set_t), &setcpuset);
int i = 0;
//switch between the states
while (bQuit == false)
{
//state 1
currentState = 1;
RunSorting(&setcpuset, 1);
//state 2
currentState = 2;
if (i != 2) //skip state 2 on the 3rd cycle to test error detection
{
RunSorting(&setcpuset, 2);
}
i++;
}
return (void*)0; //reached once bQuit has been set
}
Next we have to create the state error testing code that will run on the other core
void* stateTest(void* data)
{
//set core of the thread
int* corePtr = (int*)data;
if (!corePtr)
{
printf("Channel Doesn't Exist\n");
return (void*)1;
}
int core_id = *corePtr;
// Set thread affinity to core based on channel number
if (selectCore(core_id) != 0)
{
printf("Failed to set thread affinity for channel %d\n", core_id);
return (void*)1;
}
//create local variables
bool stateFail = false;
int lastState = 1;
int startTime = (int)time(NULL);
//loop until program closes
while(bQuit == false)
{
//detect state change
if (lastState != currentState)
{
//log failure if it changed too quickly
if (TIME < 2) //NOTE: THIS NUMBER IS FOR 75G5 PROCESSING TIME
{
stateFail = true;
}
startTime = (int)time(NULL);
lastState = currentState;
}
//Log failure if processing time is too long
if (TIME > 6) //NOTE: THIS NUMBER IS FOR 75G5 PROCESSING TIME
{
stateFail = true;
}
//If failure is detected cancel main thread and close program
if (stateFail == true)
{
printf("FAILURE DETECTED\n");
pthread_cancel(th1);
bQuit = true;
}
else
{ //This is optional, I added it for visualisation, but it just tells us that it's working properly
printf("Working Properly\n");
}
//check for errors every second
sleep(1);
}
return (void*)0; //reached once bQuit has been set
}
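The timing rule inside the loop of stateTest can be isolated into a small function, which makes it easier to check without running the threads. This is a sketch of the same rule as it is written above. The function name and the two constants are hypothetical, and the limits of 2 and 6 seconds are the 75G5 values from the guide.

```c
#include <stdbool.h>

#define MIN_STATE_SECONDS 2 /* a state change sooner than this is a failure */
#define MAX_STATE_SECONDS 6 /* staying in one state longer than this is a failure */

/*
 * state_changed:    true if the state differs from the last one seen
 * seconds_in_state: seconds since the previous state change
 */
bool state_timing_failed(bool state_changed, int seconds_in_state)
{
    if (state_changed) {
        /* the timer restarts on a change, so only the minimum applies */
        return seconds_in_state < MIN_STATE_SECONDS;
    }
    return seconds_in_state > MAX_STATE_SECONDS;
}
```

For example, a change after 1 second is reported as a failure, while a change after 3 seconds is not. A state that has not changed after 7 seconds is also reported as a failure.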
The main function is simple: it creates the threads, assigns the core each one runs on, and waits for the threads to finish executing
int32_t main(void)
{
int core1 = 0;
int core2 = 1;
pthread_create(&th1, NULL, mainProgram, &core1);
pthread_create(&th2, NULL, stateTest, &core2);
pthread_join(th1, NULL);
pthread_join(th2, NULL);
return 0;
}
And now you have a simple application that tests for a state machine’s failure based on execution time.
Conclusion
-
There are 2 ways to handle multi-core applications
-
Multi-threading
-
Multi-threading has shared memory
-
-
Multi-processing
-
Multi-processing has separated memory
-
Multi-processing is good for avoiding memory conflicts between processes
-
-
-
Multi-core applications can substantially improve processing speed when the work can be split across cores