Virtual Memory With 256 Bytes of RAM - Interactive Demo
2016-01-10 - By Robert Elder
Introduction
I just finished updating my C compiler and minimal CPU specification to include virtual memory. The process of adding virtual memory support to a CPU, a kernel, and a compiler toolchain has made me realize that there is a lack of good explanations for how all of these parts fit together. I also realized that a surprising number of programming decisions end up relating back to virtual memory in a way that beginners might not realize. Therefore, I will review some of the aspects of virtual memory that I feel aren't explained elsewhere using this interactive explanation:
Complete Virtual View |
Complete Physical (Linear) View |
Sizes in Bytes
|
What You See Above
You can click on a memory location above for a detailed explanation. The two views you see above represent the full physical, and full virtual views of a 256 byte address space. The virtual view represents what a typical program would see, and the physical view represents what states are actually sitting in physical RAM. This representation does not include a TLB (translation lookaside buffer). Here are a few descriptions that reference what you're seeing above. The objects as they are represented below will respond to any size changes you make using the controls above.
This is a pointer that points to the current top level paging structure. In our example, that is the page directory table. On Intel machines this value is stored in the CR3 register. On ARM it is more complicated[1] | |
First-level paging structure: These are often called 'page directories' in a 2-level paging scheme. You will generally have a page directory for each process in your system. It is common on newer machines that must support more than 4gb of memory to use additional levels of page tables, such as that found in Physical Address Extension[1]. In our example, each entry in the directory is only 8 bits, or 1 byte. Each entry contains information about the location of a page table, and some bits that tell us information about that table. | |
Second-level paging structure: These are what you'd refer to as 'page tables'. It is possible to have one page mapped into multiple page directories. This would be a good idea if you wanted to allow a large number or processes to access a read-only shared library without copying the library itself multiple times. Once again, each entry in the page table is only 8 bits, or 1 byte. Each entry contains information about the location of a physical page, and some bits that tell us information about that memory page. | |
Physical page currently being accessed. | |
The currently active page directory entry, or page table entry. The index of this entry comes from the decomposition of the currently selected virtual address as shown above. | |
Currently Selected Memory Location | |
Readable Memory. Permissions aren't actually checked in this example, but in a real implementation this bit would be checked against the required method of access. | |
Writable Memory. Permissions aren't actually checked in this example, but in a real implementation this bit would be checked against the required method of access. | |
Executable Memory. Permissions aren't actually checked in this example, but in a real implementation this bit would be checked against the required method of access. | |
Inaccessible Virtual Memory. In this example, if we traverse a page table or page directory entry where the initialised bit is not set, we assume that this represents a page fault. In a real implementation, there could be many reasons for a page fault, and each would be handled differently. | |
Unitialised Physical Memory. This represents memory in the physical view that would result in a page fault if it was accessed from the virtual memory address space. | |
Inaccessible Physical Memory. This represents memory locations that we cannot even attempt to access because there are no page table mappings that would translate to addresses here. |
Identity Mapping
An identity mapping is one of the simplest virtual memory mappings that you can think of: Every physical address is mapped to a virtual address that is exactly the same. This type of mapping isn't very useful for a fully featured OS, but is a useful step in bootstrapping some systems. This is the type of mapping that is currently used in the toy microkernel I've been working on.
Recursive Mapping
When implementing virtual memory in your kernel, in order to be able to manage memory you will need to know the physical memory location of your paging structures. Otherwise, how would you know what to set the contents of a page directory entry to? Once you enable the MMU you're only able to interact directly with virtual memory addresses, unless you disable it or find a way to go around it. This can make keeping track of physical addresses very complicated. One solution is to use recursive page tables. When using recursive page tables, we can answer one very useful question: "What is the current virtual address that corresponds to the physical address of any page, page table entry, or page directory entry?". We can answer this question by considering the indices in the virtual address itself that would allow us to traverse down to that entry or page.
Think about it this way: With the top level paging structure pointer, you already know at least one physical memory pointer, and you know what it points to. If you set a recursive entry in the top level paging structure, then you can easily predict how to create a virtual address that would access any physical address within that top level paging structure. All you need to do is think about what virtual address you would have to generate that would hit that recursive entry (when inspecting the first paging structure index from the virtual address). You'll then need to make sure that the remaining indices and offset reference memory within your page directory or page table too (this will depend on your implementation details, like page sizes, number of entries etc.). As you generate virtual addresses that match up with entries in your page directory (by going through the recursive entry), your page directory entries will be treated as page table entries. As long as the structure of your page table entries is compatible with the structure of page director entries, then treating them the same in the address translation should not cause a problem, and you'll end up being able to reference any paging structure directly from virtual memory, including the page directory table. For referencing page table entries, you'll generate a virtual address that goes through the recursive directory entry once. For referencing page directory entries, you'll generate a virtual address that goes through the recursive directory entry twice.
One disadvantage of a recursive mapping is that it can waste some address space. The exact quantity that will be wasted depends on your implementation.
Everything Mapped to the Same Page
One important feature of virtual memory is that it allows you to map physical pages to multiple virtual addresses in memory. This is great because it allows you do things like map pages that belong to a piece of shared read only memory into multiple processes. Because virtual memory also allows you to check the access permissions, we can also enforce the read-only policy for everything that has access to this page. This is something that just wouldn't be possible with purely physical memory access.
Page Faults Everywhere
The example above for 'Page Faults Everywhere', is meant to demonstrate an example case where we could detect a page fault from traversing the page tables. In this toy implementation, a page fault occurs when the initialised bit is not set on a page directory or page table. In addition, it would also be a page fault to attempt to access a memory location with the wrong permissions (for example attempting to execute data). In this implementation there is no way to trigger a page fault based on access permissions, they are just there for show.
Context Switching Between 2 Processes
The example above for 'Context Switching Between 2 Processes', is meant to demonstrate how you only need to switch a single pointer in order to change the entire virtual memory layout. By changing the top level paging pointer, we point to a different paging directory and the contents of the virtual view change completely. You can see that the addresses that are available in the virtual view are still the same, but their contents are different. This explains why in an OS with virtual memory, multiple processes can use the same virtual pointer, but in reality it can point to different physical memory locations.
Solving External Fragmentation
There are yet more applications to virtual memory, as seen in the above example 'Solving External Fragmentation'. External fragmentation is a nasty problem, and for some cases there is no way implement a reasonable solution with pure physical memory. Image the following case: Your computer has 4gb of memory, no hard disk, and you happen to have some sequence of memory allocations and de-allocations (calls to malloc and free) that cause you to end up in a situation where the entire memory space is free, except for a 1 byte entry in the very middle of memory. If you then get a request to allocate a large 3gb block, you can't do it even though you have more than that much memory unused! There are a couple things that you might consider doing: 1) Just move the 1 byte entry to the end, or 2) Somehow give 2 disconnected pieces of memory and let the process figure it out. The first possible solution would cause terrible performance if the 1 byte were in fact a larger amount of memory (say 1gb) that we had to copy around, but there is an even bigger problem with the idea of moving the memory: If we change what is under the original pointer location, then we would need to reach into the process that we originally gave the pointer to, and somehow update every reference to that memory location! This isn't likely to be practical because the process could do all sorts of crazy things like sorting based on pointers returned from malloc, storing pointers or metadata in a file etc. There is no reliable way to tell the process that the pointer changed. The other idea of giving 2 disconnected pieces of memory to the process won't work, because the process expects the memory block to be contiguous. If this stops being true, it would have to generate completely new instructions for how things like array member access works, and maintain some bookkeeping information about how to get the correct address for its data items. Virtual memory allows us to solve this problem very efficiently, because we can simply re-map the virtual address space to make disconnnected chunks of physical memory appear connected in the virtual address space. To do this, we don't move any data around, we just update page table entries.
Swapping to Disk
If you ask many people what the purpose of virtual memory is, they will often mention something that involves the fact that virtual memory is used to implement swapping to disk. Although this is true, as you can see from the rest of this article, virtual memory is about far more than being able to swap to a hard disk.
Copy on Write
One of the places where the use of virtual memory is extremely useful for performance is in the implementation of fork[1]. The reason is because at the moment of forking a process, it is true that both the parent and child are duplicates of each other. If we performed a deep copy of every memory page that the forked process was using, this would eat up a lot of CPU cycles and waste a lot of RAM. Instead, fork uses copy on write. The idea is that we simply copy the page tables of the parent process, and let the new child process reference the same underlying physical pages. In order to prevent writes from one process showing up in the other, each page table entry is marked with a 'copy-on-write' flag. When one of the processes attempts to do a write to one of these pages, it is interrupted, and a copy of the page in question is made only when needed. The process doing the write will have no knowledge that this copy process occured. In practice, most pages will never be modified after a fork, so this can make forking very processor and memory efficient.
The Need for Executable Packages Like ELF
You may have wondered at times why we have 'executable package' like files, such as 'ELF' and '.EXE'. After all, a program is really just a stream of instructions and data, why not just lay it out as one flat image? There are a few good answers, such as packaging debugging symbols, specifying the Application binary interface, but I would argue that the main reason is to support virtual memory (or segmentation, as we used to use). In order to take advantage of things like re-using read-only pages from multiple process instances, or disallowing write access for constants, we need to take advantage of virtual memory. In order to make the overhead of virtual memory realistic, memory has to be split up into pages that are of reasonable size (4096 bytes for example). If we allowed memory pages of an arbitrary number of bytes, we could do it, but then we need larger amounts of metadata to support the page tables. This requirement on page size also brings with it a lot of other requirements like page alignment, and the need to collect data in the executable into groupings according to what kind of data it is.
Scanning Pages From Inside A Process
Let's do some experimenting on my machine to see what we can learn about the virtual memory layout of simple processes on my machine (Ubuntu 14.04) with GCC. Note that these examples are very platform dependent, so you will almost certainly have to modify them.
Let's declare a few variables beside each other, and see if they end up beside each other in the process according to their pointers:
#include <stdio.h>
const char a = 'a';
char b = 'b';
char c(void){return 0;};
const char d = 'd';
char e = 'e';
char f(void){return 0;};
int main(){
printf("a: %p, b: %p, c: %p, d: %p, e: %p, f: %p\n", (void *)&a, (void *)&b, (void *)&c, (void *)&d, (void *)&e, (void *)&f);
return 0;
}
And the result I get is
a: 0x400618, b: 0x601040, c: 0x40052d, d: 0x400619, e: 0x601041, f: 0x400538
These pointers definitely aren't all beside each other as they were declared, but you'll notice that the constants are beside each other, the writable characters are beside each other, and the functions are beside each other. Let's snoop around those memory locations to see what we can find until we get a page fault. In the next code example, you will likely have to adjust your value of VARIABLES_OFFSET, and CONSTANTS_OFFSET. If you're on a different platform than me, this might not work at all, but this was the example that worked for me. Because I happen to know that pages on my system are exactly 4096 bytes, I can scan the entire page without getting a page fault, but if I get the offset wrong by even one byte, then the program will cause a page fault and be terminated with the message 'Segmentation Fault'. If I get a page fault before any printing happens, I know the offset is too large. If I get a page fault at the end, I know that it is too small.
#include <stdio.h>
#define PAGE_SIZE 4096
#define CONSTANTS_OFFSET 2216
#define VARIABLES_OFFSET 88
const char a = 'a';
char b = 'b';
char c(void){return 0;};
const char d = 'd';
char e = 'e';
char f(void){return 0;};
void nice_print(unsigned int i, char c){
if(i != 0 && (i % 128) == 0){ printf("\n"); }
if(c > 31 && c != 127){ printf("%c", c); }else{ printf("."); }
}
int main(){
signed int i;
printf("%p\n", (void *)&a);
printf("%p\n", (void *)&b);
printf("%p\n", (void *)&c);
printf("%p\n", (void *)&d);
printf("%p\n", (void *)&e);
printf("%p\n", (void *)&f);
printf("Memory contents of constants page:\n");
for(i = 0; i < 4096; i++){
nice_print(i,*(&a + (i - CONSTANTS_OFFSET)));
fflush(stdout);
}
printf("\nConstants begin in page at address %p\n", (void *)(&a - CONSTANTS_OFFSET));
printf("Memory contents of variables page:\n");
for(i = 0; i < 4096; i++){
char c = *(&b + (i - VARIABLES_OFFSET));
*(&b + (i - VARIABLES_OFFSET)) = c; /* Test write permission */
nice_print(i,c);
fflush(stdout);
}
printf("\nVariables begin in page at address %p\n", (void *)(&b - VARIABLES_OFFSET));
return 0;
}
And here is the result of running the above on my computer. Note that I've highlighted the variables I declared above in red:
0x4008a8
0x601058
0x40064d
0x4008a9
0x601059
0x400658
Memory contents of constants page:
.ELF..............<.....`.@.....@...................@.8...@.............@.......@.@.....@.@.....................................
8.......8.@.....8.@...............................................@.......@....................... .......................`.....
..`.....J.......`......... .............(.......(.`.....(.`.....................................T.......T.@.....T.@.....D.......
D...............P.td....P.......P.@.....P.@.....L.......L...............Q.td....................................................
R.td..............`.......`............................./lib64/ld-linux-x86-64.so.2.............GNU.............................
GNU......&...#..b..t...)................................).......................................................................
................................-.......................?... ...........................................&.......`.`.............
.libc.so.6.fflush.puts.putchar.printf.stdout.__libc_start_main.__gmon_start__.GLIBC_2.2.5.......................................
u.i.....N.........`.....................`.`.......................`..................... .`.....................(.`.............
........0.`.....................8.`.....................@.`.....................H...H.... .H..t..[...H...........5.. ..%.. ...@.
.%.. .h..........%.. .h..........%.. .h..........%.. .h..........%.. .h..........%.. .h.........1.I..^H..H...PTI....@.H.. .@.H..
..@.......f..D...g.`.UH-`.`.H...H..w.]......H..t.].`.`...........`.`.UH-`.`.H...H..H..H..?H..H..u.]......H..t.]H...`.`..........
.=a. ..u.UH...~...]..N. ......@.H.=.. ..t......H..t.U. .`.H....].{.......s...UH.......].UH.......].UH..H....}....E..}..t..E.....
.u.......s....}..~..}..t...E....\...........P.....UH..H......@....@.......R....X.`....@.......>....M.@....@.......*......@....@.
...........Y.`....@............X.@....@..............@.......E......6.E.H.H-....H...@........E..........H.... .H........E...}...
..~....@....@..............@..n....E......N.E.H.H..XH.X.`.....E..E.H.H..XH..X.`...E.....U..E......}...H..s. .H...[....E...}.....
~....`..(.@.....................AWA..AVI..AUI..ATL.%.. .UH.-.. .SL).1.H...H....}...H..t.........L..L..D..A...H...H9.u.H...[]A\A]
A^A_.ff.............H...H...............ad%p....Memory contents of constants page:.......Constants.be.in in page at address %p..
Memory contents of variables page:.......Variables.be.in in page at address %p.....;L...................h.......................
....b... .......@...@....................zR..x......................*....................zR..x..........$...........p......F..J.
.w...?.;*3$"........D...5........A....C..F..........d... ........A....C..F..................O....A....C...J.............:...k...
.A....C...f.....D...........e....B....E....E. ..E.(..H.0..H.8..M.@l.8A.0A.(B. B..B..B...........................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................ .@.......@.......................................@...............@...............`.............................
..`........................o......@...............@...............@.............Z...............................................
..`.............................................@.@...............@.............0..........................o......@........o....
...........o......@.............................................................................................................
Constants.be.in in page at address 0x400000
Memory contents of variables page:
(.`.......m.w.....L.w.......w...0...w....T..w....-..w...F.@.........w...................be........K.w...........................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
................................................................................................................................
Variables.be.in in page at address 0x601000
Doing some math based on where we get segfaults at, our knowledge of page sizes, and the offset we used, we can come up with the addresses of 0x400000 and 0x601000 for the read-only section and the read-write section respectively. You can do some quick Googling on those addresses and find that they are in fact common defaults for their respective data sections. You might also notice that the functions we declared are in the same page as the read-only constants that we declared. Since we can execute those functions and we know that virtual memory permissions are set on a page based level, that must mean that we could also execute carefully crafted constants...
Executing Numerical Constants as Code
The program below will declare some dummy constants that we'll update later, and a function that takes an integer and adds 8 to it. This example is also horribly non-portable, but it is at least interesting to show what it does on my machine. In this example, we are relying on the main method to be directly after the 'func1' method, and fortunately that is the case. After we run this program, we will see some output that gives us values of the bytecode for the implementation of the func1 function.
#include <stdio.h>
const unsigned int a = 0x12345678; /* Just placeholders for now */
const unsigned int b = 0x90123456;
const unsigned int c = 0x78901234;
const unsigned int d = 0x56789012;
unsigned int func1(unsigned int i){
return i + 8;
}
int main(void){
unsigned int * i;
unsigned int num;
/* Print out the bytecode for 'func1' */
for(i = (unsigned int*)func1; i < (unsigned int*)main; i++){
printf("%p: 0x%08X\n", (void *)i, *i);
}
num = func1(29);
printf("%u\n", num); /* Prints 37*/
}
The output for me is:
0x40052d: 0xE5894855
0x400531: 0x8BFC7D89
0x400535: 0xC083FC45
0x400539: 0x55C35D08
37
We can simply copy these values into the integer constants, and then (very unportably) rely on them being ordered one after another in memory. Since they are in the same page as executable data, we can cast a pointer to 'a' to a function pointer, and call it like a normal function:
#include <stdio.h>
const unsigned int a = 0xE5894855; /* Update these based on */
const unsigned int b = 0x8BFC7D89; /* what was output in the */
const unsigned int c = 0xC083FC45; /* last run of this program. */
const unsigned int d = 0x55C35D08;
unsigned int func1(unsigned int i){
return i + 8;
}
int main(void){
unsigned int * i;
unsigned int num;
/* Print out the bytecode for 'func1' */
for(i = (unsigned int*)func1; i < (unsigned int*)main; i++){
printf("%p: 0x%08X\n", (void *)i, *i);
}
/* Cast the address of 'a' to a function pointer and call it */
num = ((unsigned int (*)(unsigned int))&a)(29);
printf("%u\n", num); /* Prints 37*/
}
I also tried the above where func1 attempted to do other things like call other functions and access globals, but I wasn't able to get it to work well. I'm not very familiar with Intel assembly, but I believe the reason for this is due to the generated code containing relative offsets, so you can't just move the instructions around without updating the relative offsets in them. I would expect that you could overcome this problem by crafting functions that did not rely directly on global state, but were instead passed parameters (function pointers) in order to do what they needed:
const unsigned int a = 0xE5894855;
const unsigned int b = 0x10EC8348;
const unsigned int c = 0xF87D8948;
const unsigned int d = 0xF0758948;
const unsigned int e = 0xF04D8B48;
const unsigned int f = 0xF8458B48;
const unsigned int g = 0xF0558B48;
const unsigned int h = 0x48CE8948;
const unsigned int i = 0x00B8C789;
const unsigned int j = 0xFF000000;
const unsigned int k = 0x55C3C9D2;
void func1(const char * c, int (*f)(const char *, ...)){
f(c, (void*)f);
}
int main(void){
unsigned int * i;
unsigned int num;
/* Print out the bytecode for 'func1' */
for(i = (unsigned int*)func1; i < (unsigned int*)main; i++){
printf("%p: 0x%08X\n", (void *)i, *i);
}
((void (*)(const char * c, int (*f)(const char *, ...)))&a)("Fcn is %p\n", printf);
}
It looks like this technique works, as I was able to get the above example to print out 'Fcn is 0x400420'. This would probably generalize to things that are even more complicated, just make sure you brush up on your function pointers first.
Conclusion
As you can see, virtual memory is about more than just being able to swap things to disk. It brings a variety of advantages that can be enjoyed on a computer that doesn't even have a hard drive. These benefits are: process isolation; enforcing code segment privileges; solving issues caused by external fragmentation; providing lightweight copy-on-write for process forking; re-use of read-only data in shared libraries; and of course, paging things onto disk.
If you find any errors or have feedback on this page, you can let me know at info@robertelder.org.
Why Is It so Hard to Detect Keyup Event on Linux?
Published 2019-01-10 |
$1.00 CAD |
How To Force The 'true' Command To Return 'false'
Published 2023-07-09 |
The Regular Expression Visualizer, Simulator & Cross-Compiler Tool
Published 2020-07-09 |
A Surprisingly Common Mistake Involving Wildcards & The Find Command
Published 2020-01-21 |
A Guide to Recording 660FPS Video On A $6 Raspberry Pi Camera
Published 2019-08-01 |
The Most Confusing Grep Mistakes I've Ever Made
Published 2020-11-02 |
Use The 'tail' Command To Monitor Everything
Published 2021-04-08 |
Join My Mailing List Privacy Policy |
Why Bother Subscribing?
|