Understanding fork() and Process Duplication
In Unix-like systems, fork() is the fundamental system call for creating a new process. When invoked from a C++ program (via #include <unistd.h>), the operating system spawns a child process that is an almost exact copy of the parent.
How fork() Works
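- One call, two returns: The parent receives the child's PID (a positive integer) and the child receives 0. On failure, -1 is returned to the parent, no child is created, and errno is set.
- Copy-on-Write (CoW): Memory pages are shared between parent and child until either process writes to one; the kernel then gives the writing process a private copy of that page. This keeps fork() cheap, because pages that are never modified are never copied.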
- Independent execution: Both processes continue from the point after fork(), but scheduling order is unpredictable.
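The private-copy behavior described above can be observed directly. The following is a minimal sketch, not a definitive test of CoW itself: the helper name value_seen_by_parent_after_child_write is hypothetical, and the child uses _exit() so that it terminates without running any of the parent's cleanup code.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks, lets the child overwrite a local variable,
// and returns the value the parent still sees afterwards.
// Returns -1 if fork() or waitpid() fails.
int value_seen_by_parent_after_child_write() {
    int value = 42;                  // exists in both processes after fork()
    pid_t pid = fork();
    if (pid == -1) {
        return -1;                   // fork failed, no child was created
    }
    if (pid == 0) {
        value = 99;                  // the write lands in the child's private copy
        _exit(value == 99 ? 0 : 1);  // terminate the child immediately
    }
    int status = 0;
    if (waitpid(pid, &status, 0) == -1) {
        return -1;
    }
    return value;                    // the parent's copy is untouched
}
```

Because each process has its own copy of the variable once the child writes to it, the parent's value stays at 42 no matter what the child stores.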
Basic Example with Error Handling
#include <iostream>
#include <unistd.h>
#include <sys/wait.h>
#include <cstdlib>

int main() {
    pid_t pid = fork();
    if (pid == -1) { // Fork failed
        std::cerr << "Fork error!" << std::endl;
        exit(EXIT_FAILURE);
    } else if (pid == 0) { // Child process
        std::cout << "Child PID: " << getpid()
                  << ", Parent PID: " << getppid() << std::endl;
        exit(EXIT_SUCCESS); // Terminate child
    } else { // Parent process
        std::cout << "Parent PID: " << getpid()
                  << ", Created child PID: " << pid << std::endl;
        wait(nullptr); // Wait for child to exit
        std::cout << "Child done." << std::endl;
    }
    return 0;
}
Use getpid() to get the current process ID, getppid() to get the parent's, and wait() to reap the child so that it does not linger as a zombie. The child calls exit() so that it never falls through into the parent's code.
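Reaping a child can also tell the parent how the child ended. The sketch below assumes the parent wants the child's exit code; it uses waitpid() with the WIFEXITED and WEXITSTATUS macros from <sys/wait.h>. The helper name run_child_and_get_exit_code is hypothetical.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that exits with `code` and returns the
// exit code as observed by the parent, or -1 on any failure.
int run_child_and_get_exit_code(int code) {
    pid_t pid = fork();
    if (pid == -1) {
        return -1;                   // fork failed
    }
    if (pid == 0) {
        _exit(code);                 // child: terminate with the requested code
    }
    int status = 0;
    if (waitpid(pid, &status, 0) == -1) {
        return -1;                   // waiting failed
    }
    if (WIFEXITED(status)) {
        return WEXITSTATUS(status);  // low 8 bits of the child's exit code
    }
    return -1;                       // child was terminated by a signal
}
```

Only the low 8 bits of the exit code reach the parent, so values outside 0 to 255 will not round-trip.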
Creating Multiple Children
Calling fork() in a loop requires care: each child must terminate as soon as its work is done. Otherwise the child continues the loop and forks children of its own, creating an unintended process tree.
#include <iostream>
#include <unistd.h>
#include <sys/wait.h>
#include <cstdio>   // perror
#include <cstdlib>

int main() {
    static const int N = 3;
    for (int i = 0; i < N; ++i) {
        pid_t pid = fork();
        if (pid == -1) {
            perror("fork");
            exit(EXIT_FAILURE);
        }
        if (pid == 0) {
            std::cout << "Child " << i + 1 << " PID: " << getpid() << std::endl;
            exit(0); // Child leaves the loop here, so it never forks
        }
    }
    // Parent waits for all children; wait() returns -1 once none remain
    while (wait(nullptr) > 0);
    std::cout << "All children terminated." << std::endl;
    return 0;
}
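When the parent needs to know which child finished and how, it can remember each PID and reap the children one by one. The following is a sketch under that assumption; spawn_and_reap is a hypothetical helper, and each child simply exits with its loop index as a stand-in for real work.

```cpp
#include <cstddef>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: forks `n` children, each exiting with its index, and
// returns how many of them the parent reaped with the expected exit code.
int spawn_and_reap(int n) {
    std::vector<pid_t> children;
    for (int i = 0; i < n; ++i) {
        pid_t pid = fork();
        if (pid == -1) {
            break;                   // stop forking, but still reap earlier children
        }
        if (pid == 0) {
            _exit(i);                // child exits at once and never reaches the next fork()
        }
        children.push_back(pid);     // parent records the child's PID
    }

    int clean_exits = 0;
    for (std::size_t i = 0; i < children.size(); ++i) {
        int status = 0;
        if (waitpid(children[i], &status, 0) == children[i] &&
            WIFEXITED(status) &&
            WEXITSTATUS(status) == static_cast<int>(i)) {
            ++clean_exits;
        }
    }
    return clean_exits;
}
```

Waiting on specific PIDs keeps the bookkeeping explicit, at the cost of reaping in creation order rather than completion order.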
Inter-Process Communication (IPC) After fork()
Since each process has a private memory space, any data exchange requires explicit IPC. The most common methods are pipes and shared memory.
Anonymous Pipe (Single Direction)
A pipe gives two file descriptors: one for reading (fd[0]) and one for writing (fd[1]). After fork(), close the unused end in each process to create a unidirectional channel.
#include <iostream>
#include <unistd.h>
#include <cstdio>   // perror
#include <cstring>
#include <sys/wait.h>

int main() {
    int fd[2];
    if (pipe(fd) == -1) { perror("pipe"); return 1; }
    pid_t pid = fork();
    if (pid == -1) { // fork failed: release both pipe ends
        perror("fork");
        close(fd[0]);
        close(fd[1]);
        return 1;
    }
    if (pid == 0) {
        close(fd[1]); // close write end in child
        char buf[256];
        ssize_t n = read(fd[0], buf, sizeof(buf) - 1);
        if (n > 0) {
            buf[n] = '\0';
            std::cout << "Child received: " << buf << std::endl;
        }
        close(fd[0]);
        return 0;
    } else {
        close(fd[0]); // close read end in parent
        const char *msg = "Hello from parent!";
        if (write(fd[1], msg, strlen(msg)) == -1) perror("write");
        close(fd[1]); // closing the write end signals end-of-file to the child
        wait(nullptr);
    }
    return 0;
}
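Since a pipe carries data in one direction only, a reply from the child needs a second pipe. The sketch below is one way to arrange that, not the only one: echo_through_child, write_all, and read_all are hypothetical helpers, the process is assumed to be single-threaded, and interrupted system calls (EINTR) are treated as plain errors for brevity.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Write the whole buffer, retrying after short writes. Returns false on error.
static bool write_all(int fd, const char* data, size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, data, len);
        if (n <= 0) return false;
        data += n;
        len -= static_cast<size_t>(n);
    }
    return true;
}

// Read until end-of-file, i.e. until every write end of the pipe is closed.
static std::string read_all(int fd) {
    std::string out;
    char buf[256];
    ssize_t n;
    while ((n = read(fd, buf, sizeof(buf))) > 0) {
        out.append(buf, static_cast<size_t>(n));
    }
    return out;
}

// Hypothetical helper: sends `msg` to a child over one pipe and returns what
// the child sends back over a second pipe. Returns "" on failure.
std::string echo_through_child(const std::string& msg) {
    int to_child[2];
    int from_child[2];
    if (pipe(to_child) == -1) return "";
    if (pipe(from_child) == -1) {
        close(to_child[0]); close(to_child[1]);
        return "";
    }
    pid_t pid = fork();
    if (pid == -1) {
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        return "";
    }
    if (pid == 0) {
        close(to_child[1]);          // child reads from the first pipe...
        close(from_child[0]);        // ...and writes to the second
        std::string received = read_all(to_child[0]);
        write_all(from_child[1], received.data(), received.size());
        _exit(0);
    }
    close(to_child[0]);              // parent writes to the first pipe...
    close(from_child[1]);            // ...and reads from the second
    write_all(to_child[1], msg.data(), msg.size());
    close(to_child[1]);              // end-of-file tells the child the message is complete
    std::string reply = read_all(from_child[0]);
    close(from_child[0]);
    waitpid(pid, nullptr, 0);
    return reply;
}
```

Closing the unused ends matters here: if the parent kept to_child[1] open, the child's read loop would never see end-of-file and both processes would block.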
Shared Memory (Fastest IPC)
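With the System V calls shmget(), shmat(), and shmdt(), processes can read and write the same physical memory directly, without copying data through the kernel. Nothing orders those accesses automatically, so synchronization (e.g., semaphores, or waiting for the writer to exit) is needed to avoid race conditions.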
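#include <iostream>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <cstdio>   // perror
#include <cstring>
#include <sys/wait.h>

int main() {
    key_t key = IPC_PRIVATE;
    int shmid = shmget(key, 1024, IPC_CREAT | 0666);
    if (shmid == -1) { perror("shmget"); return 1; }
    char *data = (char *)shmat(shmid, nullptr, 0);
    if (data == (char *)-1) {
        perror("shmat");
        shmctl(shmid, IPC_RMID, nullptr); // do not leak the segment
        return 1;
    }
    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        shmdt(data);
        shmctl(shmid, IPC_RMID, nullptr);
        return 1;
    }
    if (pid == 0) {
        // Child writes into the segment. Reading here instead would race
        // with a write in the parent, since scheduling order is unpredictable.
        strcpy(data, "Child wrote this.");
        shmdt(data);
        return 0;
    } else {
        wait(nullptr); // acts as synchronization: the child has finished writing
        std::cout << "Parent sees: " << data << std::endl;
        shmdt(data);
        shmctl(shmid, IPC_RMID, nullptr); // remove the segment
    }
    return 0;
}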
fork() vs. Threads
fork() creates a new process with its own address space (heavy, but isolated). Threads share the same address space (light, but risky). Use fork() when you need strong isolation, or when the child will call execve() to run a different program. Use threads when low overhead and easy data sharing are desired, but beware of synchronization issues.
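The fork-then-exec pattern mentioned above can be sketched as follows. This example uses execvp(), a standard wrapper from the exec family that searches PATH, rather than calling execve() directly. The helper name run_and_wait is hypothetical, and the exit code 127 for a failed exec follows the shell convention rather than any requirement.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <vector>

// Hypothetical helper: runs the program named by args[0] with the given
// arguments and returns its exit code, or -1 on failure.
int run_and_wait(const std::vector<std::string>& args) {
    if (args.empty()) return -1;

    // Build the argv array before forking so the child allocates nothing.
    std::vector<char*> argv;
    for (const std::string& a : args) {
        argv.push_back(const_cast<char*>(a.c_str()));
    }
    argv.push_back(nullptr);         // argv must be null-terminated

    pid_t pid = fork();
    if (pid == -1) return -1;
    if (pid == 0) {
        execvp(argv[0], argv.data()); // replaces the child's program image
        _exit(127);                   // reached only if execvp() failed
    }
    int status = 0;
    if (waitpid(pid, &status, 0) == -1) return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_and_wait({"sh", "-c", "exit 3"}) would return 3 on a system where sh is on PATH.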
Key Differences
| Feature | fork() (Process) | pthread (Thread) |
|---|---|---|
| Memory space | Private (CoW) | Shared |
| Communication | Requires IPC (pipe, shm, etc.) | Shared variables + synchronization |
| Creation overhead | High (copy page tables, etc.) | Low (only stack and context) |
| Crash isolation | Independent | Whole process may crash |
Note: If a multithreaded process calls fork(), only the calling thread is duplicated in the child. Any lock held by another thread at that moment stays locked in the child with no thread left to release it, which can cause deadlocks. Use pthread_atfork() to handle such cases.
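A common way to use pthread_atfork() is to take the lock just before the fork and release it on both sides afterwards, so the child never inherits a lock owned by a thread that no longer exists. The sketch below assumes a single global mutex (g_lock) with default attributes; all names are hypothetical, and real programs with several locks need a consistent locking order in the prepare handler.

```cpp
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical lock that other threads in the program might hold.
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;

// Runs in the forking thread before fork(): wait until the lock is free.
static void before_fork() { pthread_mutex_lock(&g_lock); }
// Run after fork() in the parent and in the child: release the lock.
static void after_fork_parent() { pthread_mutex_unlock(&g_lock); }
static void after_fork_child() { pthread_mutex_unlock(&g_lock); }

// Hypothetical helper: registers the handlers once, forks, and reports
// whether the child was able to acquire g_lock.
bool child_can_lock_after_fork() {
    static const bool registered =
        pthread_atfork(before_fork, after_fork_parent, after_fork_child) == 0;
    if (!registered) return false;

    pid_t pid = fork();
    if (pid == -1) return false;
    if (pid == 0) {
        int rc = pthread_mutex_lock(&g_lock); // would block forever if inherited locked
        if (rc == 0) pthread_mutex_unlock(&g_lock);
        _exit(rc == 0 ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) == -1) return false;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Handlers registered this way stay in effect for the life of the process and apply to every later fork(), which is why the registration is guarded to run only once.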