OSecrate

Designing a Simple Real-Time Operating System from Scratch

Real Time OS Article

Log Entry 1: Defining Core Objectives and Hardware Abstraction

The initial phase involved establishing a clear set of requirements to prevent scope creep. I decided that this RTOS would target a single-core ARM Cortex-M class microcontroller (like the STM32F4 series) due to its prevalent use in real-time embedded systems. The primary objectives were to implement preemptive, priority-based scheduling with a fixed number of tasks (up to 16), provide inter-task communication via message queues and semaphores, and ensure deterministic interrupt handling with a worst-case latency below 10 microseconds. I began by writing a hardware abstraction layer (HAL) in C and assembly to manage the SysTick timer (for generating periodic ticks), the PendSV exception (for context switching), and the NVIC (Nested Vectored Interrupt Controller) for priority management. The first milestone was successfully configuring the SysTick timer to fire an interrupt every 1 millisecond, which would serve as the system’s heartbeat.
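In outline, the tick configuration reduces to computing the SysTick reload value. A minimal host-side sketch, assuming a 168 MHz STM32F4 core clock (the clock constant and helper name are illustrative, not the project's actual code):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed core clock for an STM32F4 running at full speed (illustrative). */
#define SYSTEM_CORE_CLOCK_HZ 168000000u
#define OS_TICK_RATE_HZ      1000u      /* 1 ms system heartbeat */

/* SysTick counts down from the reload value to 0 and then fires its
 * interrupt, so one tick period needs (core_hz / tick_hz) counts and
 * the reload register gets that count minus one. */
static uint32_t systick_reload(uint32_t core_hz, uint32_t tick_hz)
{
    return core_hz / tick_hz - 1u;
}
```

On target, this value would go into SysTick's reload register before enabling the counter and its interrupt; at 168 MHz with a 1 kHz tick the reload works out to 167999.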

Log Entry 2: Task Control Block and Context Switching Mechanism

The next step was designing the Task Control Block (TCB), a C structure that holds a task’s stack pointer, program counter, priority, state (ready, running, blocked, or suspended), and a unique identifier. Each task requires its own private stack space, which I allocated statically as an array to avoid the complexity of dynamic memory allocation. The critical challenge was implementing the context switch: saving the current task’s CPU registers (including the program counter, link register, and general-purpose registers) onto its stack and restoring the next task’s registers. On exception entry, Cortex-M hardware automatically stacks xPSR, PC, LR, R12, and R0-R3, the caller-saved registers under ARM’s AAPCS convention. For the remaining registers (R4-R11), I wrote an assembly routine, PendSV_Handler, that pushes them onto the current task’s stack, calls the scheduler to select the next task, and finally restores the new task’s registers. I tested this with two dummy tasks that toggle GPIO pins, verifying that the PendSV interrupt could be triggered manually via software.
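The TCB and the initial stack frame can be sketched as follows. This is a host-testable simplification under the conventions above: field names are illustrative, and the entry point is passed as a raw address so the frame layout can be checked off-target.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the TCB described above (field names illustrative). */
typedef enum { TASK_READY, TASK_RUNNING, TASK_BLOCKED, TASK_SUSPENDED } task_state_t;

typedef struct {
    uint32_t    *sp;        /* saved stack pointer, top of the saved frame */
    uint8_t      priority;  /* 0 = highest, 15 = lowest */
    task_state_t state;
    uint8_t      id;
} tcb_t;

#define XPSR_THUMB_BIT (1u << 24)   /* Thumb bit must be set in xPSR */

/* Build an initial stack frame that looks like a suspended task:
 * the 8-word hardware frame (xPSR, PC, LR, R12, R3-R0) that exception
 * return pops automatically, on top of the 8 words (R4-R11) that
 * PendSV_Handler saves and restores in software. */
static uint32_t *stack_init(uint32_t *stack_top, uint32_t entry_addr)
{
    uint32_t *sp = stack_top;
    *(--sp) = XPSR_THUMB_BIT;   /* xPSR */
    *(--sp) = entry_addr;       /* PC: task entry point */
    *(--sp) = 0u;               /* LR: tasks are not expected to return */
    for (int i = 0; i < 5; ++i) /* R12, R3, R2, R1, R0 */
        *(--sp) = 0u;
    for (int i = 0; i < 8; ++i) /* R4-R11 (software-saved) */
        *(--sp) = 0u;
    return sp;                  /* stored in the TCB by os_task_create() */
}
```

The 16-word frame means the first "return" into a new task is indistinguishable from resuming a preempted one, which is what lets one PendSV handler serve both cases.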

Log Entry 3: Scheduler Implementation and Priority Management

With the context switch working, I turned to the scheduler. I chose a fixed-priority preemptive scheduling algorithm with round-robin for equal priorities. The scheduler maintains a bitmap of ready tasks (16 bits, where each bit represents a priority level 0-15, with 0 being highest). Finding the highest-priority ready task reduces to a single __CLZ (count leading zeros) CPU instruction, making it O(1). The tick handler (SysTick ISR) increments a counter and, if the currently running task’s time slice expires (default 10 ms) or a higher-priority task becomes ready, sets the PendSV flag to trigger a context switch at the end of the interrupt. I implemented two functions: os_task_create(), which initializes a TCB and sets up the initial stack frame (mimicking an interrupt return), and os_start(), which launches the first task by manually triggering PendSV. A bug emerged where a task with priority 0 (highest) would starve lower-priority tasks; I fixed this by ensuring the idle task (lowest priority, always ready) is never blocked.
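The O(1) lookup can be sketched by anchoring priority 0 at the most significant bit, so that the scan is a single count-leading-zeros. This is a host-side sketch with GCC/Clang's __builtin_clz standing in for the Cortex-M __CLZ instruction; a 32-bit word is used so the builtin maps directly, with only the top 16 bits needed for 16 priority levels (all names illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Ready bitmap sketch: priority p (0 = highest) occupies bit (31 - p),
 * so the highest-priority ready task is simply CLZ(bitmap). */
static uint32_t ready_bitmap;

static void mark_ready(uint8_t prio)   { ready_bitmap |=  (1u << (31 - prio)); }
static void mark_blocked(uint8_t prio) { ready_bitmap &= ~(1u << (31 - prio)); }

static uint8_t highest_ready(void)
{
    /* The idle task (priority 15) is always ready, so the bitmap is
     * never zero and CLZ's undefined-for-zero case cannot occur. */
    return (uint8_t)__builtin_clz(ready_bitmap);
}
```

The idle-task invariant mentioned above does double duty here: it fixes the starvation bug and guarantees the bitmap is never zero when the scheduler runs.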

Log Entry 4: Inter-task Communication and Synchronization

To enable cooperation between tasks, I implemented semaphores (binary and counting) and message queues. The semaphore structure holds a count and a waiting list of TCBs. The os_sem_take() function briefly disables interrupts to protect the critical section, then checks whether the count is greater than zero; if so, it decrements the count and returns. If the count is 0, the calling task is blocked (its state set to WAITING) and added to the semaphore’s list, and a yield occurs. os_sem_give() increments the count and, if any tasks are waiting, moves the highest-priority waiter to the ready queue and triggers a reschedule. Message queues followed a similar pattern with a ring buffer. I verified correctness by creating a producer task that sends integers and a consumer task that receives them, ensuring that the consumer blocks when the queue is empty. A subtle issue was priority inversion: a low-priority task holding a semaphore could block a high-priority task. I addressed this by implementing a simple priority inheritance protocol in the semaphore code: when a high-priority task waits on a semaphore held by a lower-priority task, the lower task temporarily inherits the higher priority.
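The take/give logic can be exercised off-target by stubbing out blocking, yielding, and interrupt masking. A simplified sketch under those assumptions, where the waiting list is a small array and priority inheritance is elided for brevity (all names illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_WAITERS 4

typedef struct {
    uint8_t prio;      /* 0 = highest */
    bool    waiting;
} task_t;

typedef struct {
    int     count;
    task_t *waiters[MAX_WAITERS];
    int     n_waiters;
} sem_sketch_t;

/* Returns true if the semaphore was taken, false if the caller blocked.
 * On target, interrupts are disabled around this body and a blocked
 * caller yields immediately afterwards. */
static bool sem_take(sem_sketch_t *s, task_t *t)
{
    if (s->count > 0) { s->count--; return true; }
    t->waiting = true;
    s->waiters[s->n_waiters++] = t;
    return false;
}

static void sem_give(sem_sketch_t *s)
{
    if (s->n_waiters == 0) { s->count++; return; }
    /* Wake the highest-priority waiter (lowest prio number); the count
     * stays at 0 because ownership transfers directly to the waiter. */
    int best = 0;
    for (int i = 1; i < s->n_waiters; ++i)
        if (s->waiters[i]->prio < s->waiters[best]->prio) best = i;
    s->waiters[best]->waiting = false;
    s->waiters[best] = s->waiters[--s->n_waiters];
    /* On target: move the woken task to READY and trigger a reschedule. */
}
```

Handing the count directly to the woken waiter, rather than incrementing and letting it race, is what makes the wake-up deterministic for the highest-priority waiter.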

Log Entry 5: Timer Services and Idle Task Optimization

Real-time systems often require delays and timeouts. I added a 32-bit system tick counter, incremented every 1 ms, and an os_sleep() function that puts the calling task into a WAITING_DELAY state for a specified number of ticks. The tick handler checks a delta list of delayed tasks and moves those whose delay has expired back to READY. I also implemented os_sem_take_timeout(), allowing a task to wait for a semaphore for a bounded time. The idle task (priority 15) runs when no other task is ready; initially it simply executed a WFI (wait for interrupt) instruction to save power. However, I noticed that tick interrupts could wake the CPU unnecessarily, so I prototyped a dynamic tick mechanism that computes the next wake-up time and stretches the SysTick period accordingly. I later reverted it due to its complexity, opting instead for a simple counter in the idle task that puts the system into deep sleep until the next tick. This reduced power consumption by ~70% in simulation.
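The delta list stores each delay relative to its predecessor, so the tick handler only ever touches the head node. A host-side sketch of the insert and tick operations (names illustrative; waking several tasks on the same tick is left out for brevity):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct delay_node {
    uint32_t           delta;   /* ticks remaining relative to predecessor */
    struct delay_node *next;
} delay_node_t;

static delay_node_t *delay_head;

/* Insert a node that should wake `ticks` ticks from now, keeping the
 * list sorted by absolute wake-up time via relative deltas. */
static void delay_insert(delay_node_t *n, uint32_t ticks)
{
    delay_node_t **pp = &delay_head;
    while (*pp && (*pp)->delta <= ticks) {
        ticks -= (*pp)->delta;        /* walk past earlier wake-ups */
        pp = &(*pp)->next;
    }
    n->delta = ticks;
    n->next = *pp;
    if (*pp) (*pp)->delta -= ticks;   /* successor is now relative to n */
    *pp = n;
}

/* Called once per tick; returns the node that just expired, or NULL.
 * On target, an expired task is moved back to READY here. */
static delay_node_t *delay_tick(void)
{
    if (!delay_head) return NULL;
    if (delay_head->delta > 0) delay_head->delta--;
    if (delay_head->delta == 0) {
        delay_node_t *expired = delay_head;
        delay_head = expired->next;
        return expired;
    }
    return NULL;
}
```

Because only the head delta is decremented, the per-tick cost stays constant no matter how many tasks are sleeping, which matters for the worst-case latency budget.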

Log Entry 6: Debugging, Profiling, and Final Validation

The final phase involved rigorous testing for race conditions and real-time guarantees. I wrote a test suite that creates 8 tasks with different priorities, each toggling a pin, and measured jitter with a logic analyzer. The worst-case context switch time measured 3.2 µs (including PendSV handling). A memory footprint analysis showed the RTOS kernel occupied 2.8 KB of ROM and 600 bytes of RAM, plus per-task stacks (256 bytes each). I discovered a critical bug: if an interrupt occurred while a task was modifying the ready-queue bitmap, the scheduler could see an inconsistent state. I fixed this by using a “global interrupt disable” pattern in all kernel entry points, but later optimized it to BASEPRI-based priority masking, which disables only interrupts at or below the kernel’s priority level and lets higher-priority hardware interrupts still fire. After passing 48 hours of stress testing with random task creation and deletion (though dynamic deletion was not originally planned, I added it for robustness), the RTOS achieved all design goals. The final documentation includes a 25-page guide covering the API, memory configuration, and porting to other ARM Cortex-M devices.
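The priority-masking kernel lock described above, which on Cortex-M is typically done through the BASEPRI register, can be sketched with a plain variable standing in for the register so the nesting behavior is checkable on a host. The mask value is illustrative; the comments show where the real MRS/MSR accesses would go:

```c
#include <assert.h>
#include <stdint.h>

/* Interrupts at or below this priority (numerically >=) are masked
 * inside the kernel; higher-priority interrupts still fire.
 * Value illustrative. */
#define KERNEL_MASK_LEVEL 0x40u

static uint32_t fake_basepri;   /* 0 = no masking; stand-in for BASEPRI */

/* Enter a kernel critical section; returns the previous mask so that
 * nested lock/unlock pairs restore the correct state. */
static uint32_t kernel_lock(void)
{
    uint32_t old = fake_basepri;       /* target: MRS r0, BASEPRI   */
    fake_basepri = KERNEL_MASK_LEVEL;  /* target: MSR BASEPRI, mask */
    return old;
}

static void kernel_unlock(uint32_t old)
{
    fake_basepri = old;                /* target: MSR BASEPRI, r0   */
}
```

Returning the previous mask from every lock is what makes the pattern safe in nested kernel entry points: the innermost unlock does not accidentally re-enable interrupts that an outer critical section still needs masked.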

Tags: Operating System from Scratch

Copyright OSecrate 2026 | Theme by ThemeinProgress | Proudly powered by WordPress