Category Archives: Uncategorized

AArch64 Bare Metal Boot Code

The basic bare metal code required to boot an AArch64 system is not terribly complicated; however, basic code will not do much. The code below handles the basic setup for a Raspberry Pi 3 or 4 and has a few advanced features not found in other boot code examples. The code assumes RPi 3 or 4 hardware, as elements of the BCM283x or BCM2711 peripherals are initialized in the code. It is commented, and the BCM-specific code is generally abstracted out, so hopefully it is transparent enough to be adapted to different AArch64 platforms and different use cases. Features include:

  • Ability to handle entry in EL2 or EL3
  • Auto-detects Raspberry Pi version
  • Sets up the RPi Physical Timer
  • Sets up the General Interrupt Controller (GIC) for the RPi4
  • Sets up the Stacks for EL1 and EL0 Exception Processing
  • Initializes Environment for C Runtime
  • Initializes Environment for C++ Runtime

There are a few limitations at this point:

  • Single Core Only
  • No Virtual Memory Management
  • Semi-specific to RPi 3 & 4, though compatibles *should* work

This is a lengthy post but splitting it into multiple separate posts would probably be distracting. At the 100,000ft level, the idea is that the processor enters the top of this code in either EL3 or EL2, performs the initialization listed above and then exits in EL1 to the C++ code which performs the ‘kernel initialization’ and runs the kernel itself. All the code discussed in this post for the Raspberry Pi Bare Metal OS project can be found in my Github repository. Not all code in the kernel boot sequence is contained below, particularly the handful of subroutines which initialize different parts of the RPi hardware. Consulting Github for this code will be helpful.

RPi Boot Process Overview

Raspberry Pis have a somewhat unique boot process which works well to prevent bricking the device. When powered on, it is actually the GPU in the BCM peripheral chip’s video core which starts running boot code from an internal ROM and eventually starts the ARM processor. The ‘config.txt‘ and ‘cmdline.txt‘ files are loaded by the video core and parsed, and a variety of internal attributes are configured.

Once the two files are parsed and the video core is configured, the GPU loads the ‘armstub‘ file into the right spot in physical memory for the ARM processor to start executing it. On entry to the armstub the ARM core will be running in EL3. It is the armstub file which will eventually jump to the start of the kernel code.

The Raspberry Pi OS ships with ‘armstub8.bin‘ which is the default, but the armstub loaded by the video core can be changed in the ‘config.txt‘ file (consult Github for an example). The default armstub does some initialization before shifting the Exception Level down to EL2 prior to jumping into the kernel.

This project includes a custom ‘armstub’, named ‘armstub_minimal.bin‘. This minimal armstub does nothing more than jump into the kernel code – still at the EL3 Exception Level. This permits the startup code to handle any EL3 initialization that might be required for different use cases. There are a number of elements of the HW that need to be configured in EL3; those appear in the boot code below. The boot code below can be used with the default armstub file shipped with the Raspberry Pi OS, as it will detect on entry whether the core is running in EL3 or EL2 – and will skip all the EL3 initialization if the core is already running in EL2.

Exception Levels

There are four Exception Levels built into ARMv8 cores. I suspect the term ‘Exception Level’ comes from ARMv8 interrupt processing (interrupts are a subset of the more general ‘exceptions’), where different hardware or software ‘exceptions’ (not to be confused with C++ or Java exceptions) are tied to different Exception Levels. Additionally, there are sets of instructions that are restricted to specific Exception Levels.

  • EL3 – Highest Exception Level and the only level in which the processor can switch from ‘secure mode’ to ‘insecure mode’. Code running at EL3 is typically called a ‘Secure Monitor’. EL3 is optional in ARM processors.
  • EL2 – Hypervisor Exception Level, virtualization code will run at this level and page fault exceptions generated by the memory manager when using 2 Stage address translation will be handled in this level. EL2 is also optional in ARM processors.
  • EL1 – What used to be called ‘Ring 0’ in OS development. This is the level the kernel and most interrupt handlers should execute within.
  • EL0 – What used to be called ‘Ring 3’ in OS development. This is the level within which application code will execute.

Exception Levels can change as a result of either (1) an exception which is handled at a specific (usually higher) exception level -or- (2) execution of the ‘eret‘ (exception return) call which permits the core (PE or ‘Processing Element’ in ARM documentation) to potentially drop to a lower exception level. Exceptions can leave the PE at the same EL or move to a higher level and conversely the exception return can leave the PE at the same EL or move it down.

The boot code in this post only supports execution in EL1 and EL0. Maybe in the future I will dabble in a lightweight hypervisor which would pull in EL2 but I doubt I will write a Secure Monitor for EL3. It should be noted that either EL3 or EL2 may be used – but not together. If running in Trusted or Secure Mode, EL2 is disabled.

Boot Code

The code below is a fairly complete Raspberry Pi AArch64 boot example which has been tested on the RPi 3 and 4 but should be generalizable to any AArch64 system with EL3. Processors without EL3 would require more modifications to initialize subsystems correctly in EL1.

Linker Directives

Below the #defines but just before the assembly code, there are 3 linker directives. The first:
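The original snippets live in the repo; a hedged reconstruction of the first two directives (the third is omitted here) might look like:

```
.section ".text.boot"   // place this file's code in the .text.boot section
.global  _start         // expose the _start entry symbol to the linker
```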

instructs the linker to place the code in the file into the ‘text.boot‘ section of the memory map. The ‘text‘ section is referenced in the linker script described in the previous post. The next directive simply tells the linker to make the _start symbol global.

Determining Current Exception Level

AArch64 has a dedicated register for holding the current exception level, unsurprisingly named ‘CurrentEL‘. Bits 2 and 3 of this register hold the exception level – binary 0 through 3 for exception levels 0 through 3 respectively.
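The check at the top of the boot code can be sketched as follows (the label name is my own):

```
mrs x0, CurrentEL       // read the current exception level
lsr x0, x0, #2          // bits [3:2] hold the EL value
cmp x0, #3
b.ne entered_in_el2     // skip the EL3-only setup if we entered in EL2
```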

Configuration in EL3

There are a number of probably non-obvious initialization steps in the boot code. I found these in the RPi Armstub, though I believe they are also described in the ARM Documentation.

First, the L2 Cache for EL1 is configured with a latency of 3 cycles. This register needs to be configured early in the boot process, before memory access occurs. Next, the floating point and SIMD instruction sets are enabled.

The Secure Configuration Register for EL3 (SCR_EL3) is initialized next. The bits set in the SCR_EL3 register appear in the code, and their meaning can be found in the ARM documentation. After the SCR, the Auxiliary Configuration Register for EL3 (ACTLR_EL3) is initialized. The ACTLR_EL3 register contains implementation defined features – so the reference for them will be with the actual processor documentation. After the ACTLR_EL3 register, the CPU Extended Control Register for EL1 (CPUECTLR_EL1) is initialized. Again, documentation is the best place for more detail.

Identifying the RPi Type and Setting Up the Physical Timer

Next, the boot code jumps to a subroutine which identifies the Raspberry Pi Board Type, currently RPi3 or RPi4. This needs to be done to correctly configure the Physical Timer and to determine how to configure the interrupt controller.

The Physical Timer must be configured in EL3 and configuration differs between the BCM 283x and BCM 2711 peripherals. The code for identifying the board type and setting up the timer can be found in my Github repo. After initializing the Physical Timer, the code initializes the Generic Interrupt Controller (GIC 400 in this case) if the board is an RPi4; the GIC400 initialization is skipped for the RPi3 family, as the RPi3 does not contain a GIC.

Further down, IdentifyBoardType is called again in the boot code, which may seem odd. This is a bit inefficient, but fortunately there is not a lot of code in the identification subroutine. The second call to IdentifyBoardType is needed as it occurs in EL1, and the result is stored in a global variable which is then accessible to the kernel code. The value cannot be stored after the first call to IdentifyBoardType, as that call is made in EL3, and EL3 has a separate memory space which is not shared with EL1.

Switching to EL2

Just after configuring the Physical Timer and conditionally configuring the GIC, the boot code initializes the System Control Register for EL2 (SCTLR_EL2). This register and the initialization values are in the ARM Documentation. After SCTLR_EL2 is initialized, we jump to the EL2 Exception Level – if we entered in EL3.
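The drop to EL2 can be sketched roughly as follows (the interrupt mask bits in SPSR_EL3 are omitted for brevity; consult the repo for the exact value):

```
mov x0, #0b1001         // SPSR_EL3 M[3:0] = 9: return to EL2h
msr SPSR_EL3, x0
adr x0, running_in_el2  // address the eret will 'return' to
msr ELR_EL3, x0
eret                    // drop to EL2, resuming at running_in_el2
```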

In the code snippet above, the Saved Program Status Register for EL3 (SPSR_EL3) is initialized and the address of the ‘running_in_el2‘ symbol is loaded into the Exception Level Return Register for EL3 (ELR_EL3). The meaning of the bits set in the SPSR_EL3 register are in the ARM documentation, but the one worth noting here is the last 4 bits which are initialized with the value 9 which tells the PE to shift to EL2H mode on return from the exception routine. When the ‘eret‘ instruction is executed, then the Exception Level is changed to EL2 and the program counter picks up at the address in ELR_EL3, which is the ‘running_in_el2‘ symbol.

Single-Core Execution

At present, the boot code exits running in single-core mode. PE 0 is used for execution and PEs 1, 2 and 3 are parked in an infinite loop. This is temporary and will be relaxed for SMP execution later – right now, single-threaded execution is all that is needed. There are a number of other examples of booting with multiple cores available.
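The core check and parking logic can be sketched as follows (label names are my own):

```
mrs x0, MPIDR_EL1
and x0, x0, #0b11       // bottom two bits: core number on a quad-core RPi
cbz x0, primary_core    // PE 0 continues with the boot
park_loop:
    wfe                 // non-primary PEs wait here in low power
    b park_loop
primary_core:
```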

In the code above, the bottom 2 bits of the Multiprocessor Affinity Register (MPIDR_EL1) are checked to see if they hold the value of 0. The MPIDR_EL1 register holds information on the multi-processing state of the hardware and the different PEs. The value of zero in the bottom 2 bits of the register indicates PE 0 is running for an RPi quad PE CPU. There is not much documentation on this register, so for other CPU configurations – you will likely have to do some research to find the magic values to check.

If the current PE is not PE 0, then the PE is simply put into an infinite loop with the ‘wfe‘ instruction to let the CPU know the PE can go into a low power state.

Setting the Stack Pointer for EL1

In the code snippet that follows, the stack pointer for EL1 is set to values associated with symbols defined in the linker script. The stack grows down from the indicated location toward the heap which grows up from the end of the program code. The EL1 stack pointer can only be set in EL2 or EL3.
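A sketch of the EL1 stack setup (the symbol name here is a stand-in for the one defined in the linker script):

```
ldr x0, =__el1_stack_top    // top-of-stack symbol from the linker script
msr SP_EL1, x0              // SP_EL1 is writable only from EL2 or EL3
```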

Processor Configuration

After setting the stack pointer for EL1, there are a handful of configurations for the counter/timer register, disabling EL2 traps for a variety of architectural features, enabling AArch64 in EL1 and finally configuring the CPU for execution in EL1 and EL0. A number of these settings are rather cryptic, particularly for CPTR_EL2, HSTR_EL2 and CPACR_EL1, so consult the ARM documentation before modifying values. In general, all traps from EL1 or EL0 to EL2 are disabled (as we are not implementing a hypervisor – at least yet) and traps from EL0 to EL1 for various instructions are also disabled. If you change the architectural features enabled, you should double-check the instruction traps.
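A representative sketch of this configuration follows; the exact values belong in the repo and the ARM documentation, so treat these as illustrative:

```
msr CNTVOFF_EL2, xzr    // no virtual counter offset for EL1
mov x0, #(3 << 20)      // CPACR_EL1.FPEN = 0b11: no FP/SIMD traps at EL1/EL0
msr CPACR_EL1, x0
msr CPTR_EL2, xzr       // do not trap FP/SIMD accesses to EL2
msr HSTR_EL2, xzr       // do not trap system register accesses to EL2
mov x0, #(1 << 31)      // HCR_EL2.RW = 1: EL1 executes in AArch64
msr HCR_EL2, x0
```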

I am no expert on these settings; they appear to be ‘standard’ for RPi bare-metal code. For other AArch64 implementations with special execution requirements (like Streaming SVE Mode) the configuration will be different.

Switching to EL1

Much like moving from EL3 to EL2, to move from EL2 to EL1 it is necessary to set up the ‘EL return address’ register and execute the ‘eret‘ instruction.
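Sketched below; the SPSR_EL2 value 5 selects EL1h, and the label name is my own:

```
mov x0, #0b0101         // SPSR_EL2 M[3:0] = 5: return to EL1h (masks omitted)
msr SPSR_EL2, x0
adr x0, running_in_el1  // label name assumed
msr ELR_EL2, x0
eret                    // drop to EL1
```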

After switching to EL1, the code gets the board identity again and stores it in a global variable for use from kernel code.

Setting Up Exception Vectors

AArch64 exception vector tables are set up in memory and are used to identify the correct handler for different exceptions. Recall, exceptions are a superset of interrupts. The code required for setting up the vectors can be found in my Github repository in the isr_kernel_entry.S file. The key elements in that file are the kernel entry and exit code, which saves the registers on entry and restores them on exit, and the exception table itself.
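Installing the table is a single register write; a sketch (the symbol name for the table is my own, and the table must be suitably aligned):

```
ldr x0, =exception_vectors  // table defined in isr_kernel_entry.S (name assumed)
msr VBAR_EL1, x0            // point EL1 exception handling at the table
```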

Setting the Stack Pointer for EL0 Exceptions

Code required to set up the stack pointer for EL0 exceptions appears below. The comments in the snippet describe the interaction of exception processing and stack pointers. In short, if SPSel == 1, then the ‘h’-suffixed vectors are used and each exception level will have its own stack pointer. The CPU could be configured to share a stack pointer between EL1 and EL0, and that could be fine for bare metal code executing only in EL1, but for an OS with processes running in EL0 we should have different stacks.

For clarity, each process in EL0 will have a different stack of its own. The EL0 stack here is for exception processing where the exception code is run in EL0. Looking through the documentation, it appears as if SPSel settings are partly driven by support for the Linux exception processing model.

As is the case for EL1, the symbol used for the EL0 top of stack is found in the linker script.
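A sketch of the EL0 exception stack setup (the symbol name is a stand-in for the linker script symbol):

```
msr SPSel, #0               // sp now aliases SP_EL0
ldr x0, =__el0_stack_top    // top-of-stack symbol from the linker script
mov sp, x0
msr SPSel, #1               // back to SP_EL1 for the 'h' vectors
```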

Clearing the BSS Segment for C Code

The C language model prescribes that the BSS segment, which holds uninitialized data, be zeroed before execution jumps to the ‘main()‘ function. The code below relies on symbols from the linker script to zero out the BSS. This is done in 8-byte chunks, so the alignment needs to be correct.
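The zeroing loop can be sketched as follows (symbol and label names are stand-ins for those in the linker script and repo):

```
ldr x0, =__bss_start        // first byte of the BSS, 8-byte aligned
ldr x1, =__bss_end          // first byte past the BSS
bss_zero_loop:
    cmp x0, x1
    b.hs bss_done           // stop when the whole segment is cleared
    str xzr, [x0], #8       // zero eight bytes and advance
    b bss_zero_loop
bss_done:
```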

There is some discussion of the bss segment in my post on Linker Scripts.

Initializing C++ Static Globals

In the C++ Language Model, global static variables must be initialized prior to execution of the ‘main()‘ function. For static class instances, this will require invoking the class constructor and passing the correct memory location for the class instance.

Fortunately, C++ compilers do the heavy lifting for us. The compiler generates an array of void functions which can be called just prior to jumping to the ‘main()‘ function which will initialize each static variable. All the initialization code must do is walk the array and call the functions. The code below does just that.
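The idea can be demonstrated in plain C++. In the real boot code the array is bracketed by linker-provided symbols (commonly __init_array_start and __init_array_end); the array below is a hypothetical stand-in so the walk itself is visible:

```cpp
// The compiler emits an array of void() 'constructor thunks'; the boot
// code walks it and calls each entry.  Here the array is faked so the
// walking logic can run standalone.
typedef void (*init_func)();

static int ctors_run = 0;
static void ctor_a() { ++ctors_run; }
static void ctor_b() { ++ctors_run; }
static init_func fake_init_array[] = { ctor_a, ctor_b };

// Walk the [start, end) range and invoke each constructor in order.
static void run_static_constructors(init_func* start, init_func* end) {
    for (init_func* f = start; f != end; ++f) {
        (*f)();
    }
}
```

In the boot code, the start and end pointers come from the linker script rather than a local array.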

This is the last step performed in the boot code before jumping to the kernel main.

Jumping to Kernel Main

Finally, we have the branch to kernel_main(). One detail here is that I chose to use the symbol name kernel_main() instead of main() specifically to avoid any risk of ‘special handling of main()’ applied by the compiler or linker.

If execution returns from kernel_main(), the PE is just parked.
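The final branch and the fallback park can be sketched as (label name is my own):

```
bl kernel_main          // enter the C++ kernel
park_forever:
    wfe                 // kernel_main should not return; park the PE if it does
    b park_forever
```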

Conclusion

The post above provides *mostly complete* bare metal boot code for RPi3 or 4 platforms running in AArch64. Code referenced above can be found in the associated Github Repository.

Building a Raspberry Pi 64 Bit Operating System with C++

I have undertaken many different projects through the years, but one area I have not really explored is Operating System development. When I started developing software on 8 bit computers, the closest you came to an OS was a ‘monitor’ or perhaps a ‘Master Control Program’ for those old enough to remember Tron.

I have started tinkering with a 64 bit operating system for Raspberry Pi based computers. Given how powerful those small single board computers have become, they make a great platform for OS experimentation.

My goals for this project are four-fold:

  1. Get back to ‘bare metal basics’ for a while
  2. Provide a platform for experimentation with different approaches to OS architecture
  3. Explore the advantages and disadvantages of C++ for OS development
  4. Provide a collection of tutorials and working code for others to learn from

C++ for OS Development

There is a definite bias against C++ for bare metal programming, though increasingly there are bare metal projects utilizing C++. In the Raspberry Pi ecosystem, the Circle – C++ Bare Metal Environment for Raspberry Pi is perhaps the best and most useful example. It is a remarkable system.

Prior to C++ 11, I probably would not have considered this but now at C++ 20 and beyond the language system is both rich enough and flexible enough to span from bare metal up to the highest application layer development. At the time of writing, this project is built with C++ 20.

Part of my goal is to create a single image which runs across multiple RPi versions and makes obvious the points at which board-specific code is required. My approach is to create interfaces using classes and abstract virtual functions which, yes, adds a bit of overhead, but these days it is minimal. The optimizations available in modern compilers and the increased clarity of C++ code may help close the performance gap between C and C++.

I am not particularly concerned about size at present. On systems with gigabytes of RAM, the difference between a 64k kernel and a 128k kernel is negligible. Honestly the kernel size is going to be much more tightly correlated to the OS architecture than the implementation language or optimizations. A monolithic kernel containing lots of services will be big whereas a microkernel with most services running in user space will be much smaller. These days though, I tend to favor speed over size.

Struct Timespec Utilities

There are a number of different representations for time in C and C++. ‘struct timespec’ comes from POSIX and was standardized in C11 to provide a representation of times with more range and precision than a simple integer. A ‘struct timespec’ contains two fields, one for seconds and another for nanoseconds. Unlike the std::chrono classes, there are no literal operators or other supporting functions for timespec. C++17 added std::timespec, but that still lacks literals and other operators.

Examples

The utilities can be found in my github repository:

https://github.com/stephanfr/TimespecUtilities

Including the ‘TimespecUtilities.hpp’ header file is all that is required. Literal suffix operators ‘_s’ for seconds and ‘_ms’ for milliseconds are defined as well as addition, subtraction and scalar multiplication operators. Examples follow:

const struct timespec five_seconds = 5_s;
const struct timespec one_and_one_half_seconds = 1.5_s;
const struct timespec five_hundred_milliseconds = 500_ms;

const struct timespec six_and_one_half_seconds = five_seconds + one_and_one_half_seconds;
const struct timespec three_and_one_half_seconds = five_seconds - one_and_one_half_seconds;

const struct timespec ten_seconds = five_seconds * 2;
const struct timespec fifty_milliseconds = five_hundred_milliseconds * 0.1;
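For illustration, here is a hedged sketch of how the ‘_s’ and ‘_ms’ suffixes and the addition operator might be implemented; the authoritative definitions are in TimespecUtilities.hpp, and subtraction and scalar multiplication follow the same pattern:

```cpp
#include <ctime>

constexpr long NSEC_PER_SEC = 1000000000L;

// Integral seconds: 5_s
constexpr timespec operator""_s(unsigned long long secs) {
    return timespec{ static_cast<time_t>(secs), 0 };
}

// Fractional seconds: 1.5_s
constexpr timespec operator""_s(long double secs) {
    auto whole = static_cast<time_t>(secs);
    return timespec{ whole, static_cast<long>((secs - whole) * NSEC_PER_SEC) };
}

// Milliseconds: 500_ms
constexpr timespec operator""_ms(unsigned long long ms) {
    return timespec{ static_cast<time_t>(ms / 1000),
                     static_cast<long>((ms % 1000) * 1000000L) };
}

// Addition carries overflowing nanoseconds into the seconds field.
constexpr timespec operator+(const timespec& lhs, const timespec& rhs) {
    timespec result{ lhs.tv_sec + rhs.tv_sec, lhs.tv_nsec + rhs.tv_nsec };
    if (result.tv_nsec >= NSEC_PER_SEC) {
        ++result.tv_sec;
        result.tv_nsec -= NSEC_PER_SEC;
    }
    return result;
}
```

Keeping tv_nsec normalized to [0, 1e9) after every operation is what makes the operators composable.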

Cork – A High Performance Library for Geometric Boolean/CSG Operations

Gilbert Bernstein is currently a Ph.D. student at Stanford and has published some remarkable papers on computational geometry.  I was first drawn to his work by his 2009 paper on Fast, Exact, Linear Booleans, as my interest in 3D printing led me to create some tooling of my own.  The various libraries I found online for performing Constructive Solid Geometry (CSG) operations were certainly good but overall very slow.  CGAL is one library I had worked with, and I found that the time required for operations on even moderately complex meshes was quite long.  CGAL’s numeric precision and stability are impeccable; the 3D CSG operations in CGAL are based on 3D Nef Polyhedra, but I found myself waiting quite a while for results.

I exchanged a couple emails with Gilbert and he pointed me to a new library he had published, Cork.  One challenge with the models he used in his Fast, Exact paper is that the internal representation of 3D meshes was not all that compatible with other toolsets.  Though the boolean operations were fast, using those algorithms imposed a conversion overhead and eliminated the ability to take other algorithms developed on standard 3D mesh representations and use them directly on the internal data structures.  Cork is fast but uses a ‘standard’ internal representation of 3D triangulated meshes, a win-win proposition.

I’ve always been one to tinker with code, so I forked Gilbert’s code to play with it.  I spent a fair amount of time working with the code and I don’t believe I found any defects, but I did find a few ways to tune it and bump up the performance.  I also took a stab at parallelizing sections of the code to further reduce the wall clock time required for an operation, though with limited success.  I believe the main problem I ran into is related to cache invalidation within the x86 CPU.  I managed to split several of the most computationally intensive sections into multiple threads of execution – but the performance almost always dropped as a result.  I am not completely finished working on threading the library; I may write a post later on what I believe I have seen and how to approach parallelizing algorithms like Cork on current generation CPUs.

Building the Library

My fork of Cork can be found here: https://github.com/stephanfr/Cork.  At present, it only builds on MS Windows with MS Visual Studio, I use VS 2013 Community Edition.  There are dependencies on the Boost C++ libraries, the MPIR library, and Intel’s Threading Building Blocks library.  There are multiple build targets, both for Win32 and x64 as well as for Float, Double and SSE arithmetic based builds.  In general, the Win32 Float-SSE builds will be the quickest but will occasionally fail due to numeric over or underflow.  The Double x64 builds are 10 to 20% slower but seem solid as a rock numerically.  An ‘Environment.props’ file exists at the top level of the project and contains a set of macros pointing to dependencies.

The library builds as a DLL.  The external interface is straightforward, the only header to include is ‘cork.h’, it will include a few other files.  In a later post I will discuss using the library in a bit more detail but a good example of how to use it may be found in the ‘RegressionTest.cpp’ file in the ‘RegressionTest’ project.  At present, the library can only read ‘OFF’ file formats.

There is no reason why the library should not build on Linux platforms, the dependencies are all cross platform and the code itself is pretty vanilla C++.  There may be some issues compiling the SSE code in gcc, but the non-SSE builds should be syntactically fine.

Sample Meshes

I have a collection of sample OFF file meshes in the Github repository:  https://github.com/stephanfr/SolidModelRepository.  The regression test program loads a directory and performs all four boolean operations between each pair of meshes in the directory – writing the result mesh to a separate directory.

These sample meshes range from small and simple to large and very complex.  For the 32bit builds, the library is limited to one million points and some of the samples when meshed together will exceed that limit.  Also, for Float precision builds, there will be numeric over or underflows whereas for x64 Double precision builds, all operations across all meshes should complete successfully.

When Errors (Inevitably) Occur

I have tried to catch error conditions and return them in result objects with some descriptive text; the library should not crash.  The code is very sensitive to non-manifold meshes.  The algorithms assume both meshes are two-manifold.  Given the way the optimizations work, a mesh may be self-intersecting, but if the self-intersection is in a part of the model that does not intersect with the second model, the operation may run to completion perfectly.  A different mesh may intersect spatially with the self-intersection and trigger an error.

Meshes randomly chosen from the internet are (in my experience) typically not two-manifold.  I spent a fair amount of time cleaning up the meshes in the sample repository.  If you have a couple meshes and they do not want to union, use a separate program like MeshLab to look over the meshes and double check that they are both in fact two-manifold.

Conclusion

If you are interested in CSG and need a fast boolean library, give Cork a shot.  If you run into crashes or errors, let me know – the code looks stable for my datasets but they are far from exhaustive.

Building GCC Plugins – Part 2: Introduction to GCC Internals

Once the basic scaffolding is in place for a GCC Plugin, the next step is to analyze and perhaps modify the Abstract Syntax Tree (AST) created by GCC as a result of parsing the source code.  GCC is truly a marvel of software engineering: it is the de-facto compiler for *nix environments and supports a variety of front ends for different languages (even Ada…).  That said, the GCC AST is complex to navigate for a number of reasons.  First, parsing and representing a variety of languages in a common syntax tree is a complex problem, so the solution is going to be complex.  Second, history – looking at the GCC internals is a bit like walking down memory lane; this is the way we wrote high-performance software when systems had limited memory (think 64KB) and CPUs had low throughput (think 16 MHz clocks).  Prior to GCC 4.8.0, GCC was compiled with the C compiler, so don’t bother looking for C++ constructs in the source code.

The AST Tree

The primary element in the GCC AST is the ‘tree’ structure.  An introduction to the tree structure appears in the GCC Internals Documentation.  Figure 1 is extracted from the tree.h header file and provides a good starting place for a discussion of the GCC tree and how to approach programming with it.

[sourcecode language="c"]

union GTY ((ptr_alias (union lang_tree_node),
desc ("tree_node_structure (&%h)"), variable_size)) tree_node {
struct tree_base GTY ((tag ("TS_BASE"))) base;
struct tree_typed GTY ((tag ("TS_TYPED"))) typed;
struct tree_common GTY ((tag ("TS_COMMON"))) common;
struct tree_int_cst GTY ((tag ("TS_INT_CST"))) int_cst;
struct tree_real_cst GTY ((tag ("TS_REAL_CST"))) real_cst;
struct tree_fixed_cst GTY ((tag ("TS_FIXED_CST"))) fixed_cst;
struct tree_vector GTY ((tag ("TS_VECTOR"))) vector;
struct tree_string GTY ((tag ("TS_STRING"))) string;
struct tree_complex GTY ((tag ("TS_COMPLEX"))) complex;
struct tree_identifier GTY ((tag ("TS_IDENTIFIER"))) identifier;
struct tree_decl_minimal GTY((tag ("TS_DECL_MINIMAL"))) decl_minimal;
struct tree_decl_common GTY ((tag ("TS_DECL_COMMON"))) decl_common;
struct tree_decl_with_rtl GTY ((tag ("TS_DECL_WRTL"))) decl_with_rtl;
struct tree_decl_non_common GTY ((tag ("TS_DECL_NON_COMMON"))) decl_non_common;
struct tree_parm_decl GTY ((tag ("TS_PARM_DECL"))) parm_decl;
struct tree_decl_with_vis GTY ((tag ("TS_DECL_WITH_VIS"))) decl_with_vis;
struct tree_var_decl GTY ((tag ("TS_VAR_DECL"))) var_decl;
struct tree_field_decl GTY ((tag ("TS_FIELD_DECL"))) field_decl;
struct tree_label_decl GTY ((tag ("TS_LABEL_DECL"))) label_decl;
struct tree_result_decl GTY ((tag ("TS_RESULT_DECL"))) result_decl;
struct tree_const_decl GTY ((tag ("TS_CONST_DECL"))) const_decl;
struct tree_type_decl GTY ((tag ("TS_TYPE_DECL"))) type_decl;
struct tree_function_decl GTY ((tag ("TS_FUNCTION_DECL"))) function_decl;
struct tree_translation_unit_decl GTY ((tag ("TS_TRANSLATION_UNIT_DECL")))
translation_unit_decl;
struct tree_type_common GTY ((tag ("TS_TYPE_COMMON"))) type_common;
struct tree_type_with_lang_specific GTY ((tag ("TS_TYPE_WITH_LANG_SPECIFIC")))
type_with_lang_specific;
struct tree_type_non_common GTY ((tag ("TS_TYPE_NON_COMMON")))
type_non_common;
struct tree_list GTY ((tag ("TS_LIST"))) list;
struct tree_vec GTY ((tag ("TS_VEC"))) vec;
struct tree_exp GTY ((tag ("TS_EXP"))) exp;
struct tree_ssa_name GTY ((tag ("TS_SSA_NAME"))) ssa_name;
struct tree_block GTY ((tag ("TS_BLOCK"))) block;
struct tree_binfo GTY ((tag ("TS_BINFO"))) binfo;
struct tree_statement_list GTY ((tag ("TS_STATEMENT_LIST"))) stmt_list;
struct tree_constructor GTY ((tag ("TS_CONSTRUCTOR"))) constructor;
struct tree_omp_clause GTY ((tag ("TS_OMP_CLAUSE"))) omp_clause;
struct tree_optimization_option GTY ((tag ("TS_OPTIMIZATION"))) optimization;
struct tree_target_option GTY ((tag ("TS_TARGET_OPTION"))) target_option;
};

[/sourcecode]

Figure 1: The tree_node structure extracted from the GCC code base.

Fundamentally, a tree_node is a big union of structs.  The union contains a handful of common or descriptive members, but the majority of union members are specific types of tree nodes.  The first tree union member: tree_base is common to all tree nodes and provides the basic descriptive information about the node to permit one to determine the precise kind of node being examined or manipulated.  There is a bit of an inheritance model introduced with tree_base being the foundation and tree_typed and tree_common adding another layer of customization for specific categories of tree nodes to inherit but from there on out the remainder of the union members are specific types of tree nodes.  For example, tree_int_cst is an integer constant node whereas tree_field_decl is a field declaration.

Tree nodes are typed, but not in the C language sense of ‘typed’.  One way to think about it is that the tree_node structure is a memory-efficient way to model a class in C prior to C++.  Instead of member functions or methods, there is a large library of macros which act on tree nodes.  In general, macros fall into two categories: predicate macros, which usually have a ‘_P’ suffix and return a value which can be compared to zero to indicate a false result, and transformation macros, which take a tree node and usually return another tree node.  Despite the temptation to dip directly into the public tree_node structure and access or modify the data members directly – don’t do it.  Treat tree nodes like C++ classes in which all the data members are private, and rely on the tree macros to query or manipulate tree nodes.

Relying on the macros to work with the tree_node structure is the correct approach per the GCC documentation, but it will also simply make your life easier.  GCC tree_node structures are ‘strongly typed’ in the sense that they are distinct in the GCC tree type-system and many of the macros expect a specific tree_node type.  For example, the INT_CST_LT(A, B) macro expects two tree_int_cst nodes passed as arguments – even though the C++ compiler cannot enforce the typing at compile time.  If you pass in the wrong tree_node type, you will typically get a segmentation violation.  An alternative approach is to compile GCC with the --enable-checking flag set, which will enforce runtime checking of node types.

In terms of history, this type of modelling was common back in the day when machines were limited in memory and compute cycles.  This approach is very efficient in terms of memory, as the union overlays all the types and there are no virtual tables or other C++ class overhead to consume memory or compute cycles.  The price paid is that it is 100% incumbent on the developer to keep the type-system front-of-mind and ensure that they are invoking the right macros with the right arguments.  The strategy of relying on the compiler to advise one about type mismatches does not work in this kind of code.

Basics of AST Programming

There are 5 key macros that can be invoked safely on any tree structure: TREE_CODE, TREE_TYPE, TREE_CHAIN, TYPE_P and DECL_P.  In general, after obtaining a ‘generic’ tree node, the first step is to use the TREE_CODE macro to determine the ‘type’ (in the GCC type-system) of the node.  The TREE_TYPE macro returns the source code ‘type’ associated with the node.  For example, the result type of a method declaration returning an integer value will have a TREE_TYPE with a TREE_CODE equal to INTEGER_TYPE.  The code for that statement would look like:

[sourcecode language=”c” wraplines=”false”]

TREE_CODE( TREE_TYPE( DECL_RESULT( methodNode ))) == INTEGER_TYPE

[/sourcecode]

Within the AST structure, lists are generally represented as singly-linked lists with the link to the next list member returned by the TREE_CHAIN macro.  For example, the DECL_ARGUMENTS macro will return a pointer to the first parameter for a function or method.  If this value is NULL_TREE, then there are no parameters, otherwise the tree node for the first parameter is returned.  Using TREE_CHAIN on that node will return NULL_TREE if it is the only parameter or will return a tree instance for the next parameter.  There also exists a vector data structure within GCC and it is accessed using a different set of macros.

The TYPE_P and DECL_P macros are predicates which will return non-zero values if the tree passed as an argument is a type specification or a code declaration.  Knowing this distinction is important as it quickly partitions the macros which can be used with the node.  Many macros will have a prefix of ‘TYPE_’ for type nodes and ‘DECL_’ for declaration nodes.  Frequently there will be two parallel macros: for instance, TYPE_UID will return the GCC generated, internal numeric unique identifier for a type node whereas DECL_UID is needed for a declaration node.  In general, I have found that calling a TYPE_ macro on a declaration or a DECL_ macro on a type specification will result in a segmentation violation.

Other frequently used macros include DECL_NAME and TYPE_NAME, which return a tree node containing the source code name for a given element.  IDENTIFIER_POINTER can then be used on that tree to return the char* for the name.  DECL_SOURCE_FILE, DECL_SOURCE_LINE and DECL_SOURCE_LOCATION are available to map an AST declaration back to its source code location.  As mentioned above, DECL_UID and TYPE_UID return numeric unique identifiers for elements in the source code.

In addition to the above, for C++ source code fed to g++, the compiler will inject methods and fields not explicitly declared in the C++ source code.  These elements can be identified with the DECL_IS_BUILTIN and DECL_ARTIFICIAL macros.  If, as you traverse the AST, you trip across oddly named elements, check them with those macros to determine if the nodes were created by the compiler.

Beyond this simple introduction, sifting through the AST will require a lot of time reviewing tree.h and other header files to look for macros that you will find useful for your application.  Fortunately, the naming is very consistent and quite good, which eases the hunt for the right macro.  Once you think you have the right macro for a given task, try it in your plugin and see if you get the desired result.  Be prepared for a lot of trial-and-error investigation in the debugger.  Also, though there are some GDB scripts to pretty-print AST tree instances, looking at these structures in the debugger will also require some experience, as again the debugger isn’t able to infer much about GCC’s internal type system.

Making the AST Easier to Navigate and Manipulate

I have started a handful of C++ libraries which bridge the gap between the implicit type system in the GCC tree_node structure and explicit C++ classes modelling distinct tree_node types.  For example, a snippet from my TypeTree class appears below in Figure 2.

[sourcecode language=”c” wraplines=”false”]

class TypeTree : public DeclOrTypeBaseTree
{
public :

    TypeTree( const tree& typeTree )
        : DeclOrTypeBaseTree( typeTree )
    {
        assert( TYPE_P( typeTree ) );
    }

    TypeTree& operator= ( const tree& typeTree )
    {
        assert( TYPE_P( typeTree ) );

        (tree&)m_tree = typeTree;

        return( *this );
    }

    const CPPModel::UID UID() const
    {
        return( CPPModel::UID( TYPE_UID( TYPE_MAIN_VARIANT( m_tree ) ), CPPModel::UID::UIDType::TYPE ) );
    }

    const std::string Namespace() const;

    std::unique_ptr<const CPPModel::Type> type( const CPPModel::ASTDictionary& dictionary ) const;

    CPPModel::TypeInfo::Specifier typeSpecifier() const;

    CPPModel::ConstListPtr<CPPModel::Attribute> attributes();
};
[/sourcecode]

Figure 2: TypeTree wrapper class for GCC tree_node.

Within this library I make extensive use of the STL, Boost libraries and a number of C++ 11 features.  For example, ConstListPtr<> is a template alias for a std::unique_ptr to a boost::ptr_list class.

[sourcecode language=”c” wraplines=”false”]

template <class T> using ListPtr = std::unique_ptr<boost::ptr_list<T>>;
template <class T> using ConstListPtr = std::unique_ptr<const boost::ptr_list<T>>;

template <class T> using ListRef = const boost::ptr_list<T>&;

template <class T> ConstListPtr<T> MakeConst( ListPtr<T>& nonConstList ) { return( ConstListPtr<T>( std::move( nonConstList ) ) ); }

[/sourcecode]

Figure 3: Template aliases for lists.

At present the library is capable of walking through the GCC AST and creating a dictionary of all the types in the code being compiled.  Within this dictionary, the library is also able to provide detailed information on classes, structs, unions, functions and global variables.  It will scrape out C++ 11 generalized attributes on many source code elements (not all of them yet, though) and return proper declarations with parameters and return types for functions and methods.  The ASTDictionary and the specific language model classes have no dependency on GCC Internals themselves.

The approach I followed for developing the library thus far was to get enough simple code running using the GCC macros that I could then start to refactor into C++ classes.  Along the way, I used Boost strong typedefs to start making sense of the GCC type system at compile time.  Once the puzzle pieces started falling into place and the programming patterns took shape, developing a plugin on top of the libraries became fairly straightforward.  That said, there is a long and painful learning curve associated with GCC internals and the AST itself.

Getting the Code and Disclaimers

The library code is available on Github: ‘stephanfr/GCCPlugin’.  All of the code is under GPL V3.0, which is absolutely required as it runs within GCC itself.  I do not claim that the library is complete, stable, usable or rational – but hopefully some will find it useful, if for nothing more than providing some insight into the GCC AST.  For the record, this is not my day job, nor is it my job to enrich or bug-fix the library so you can get your compiler theory class project done in time.  That said, if you pick up the code and either enrich it or fix some bugs – please return the code to me and I will merge what makes sense.

The code should ‘just run’ if you have a GCC Plugin build environment configured per my prior posts.  One detail is that the ‘GCCPlugin Debug.launch’ file will need to be moved to the ‘.launches’ directory of Eclipse’s ‘org.eclipse.debug.core’ plugin directory.  If the ‘.launches’ directory does not exist, then create it.

Debugging GCC in Eclipse CDT 4.2

My favored C++ development environment on Linux is Eclipse CDT.  There are a number of different IDEs available for Linux but I use the Eclipse IDE for Java development and find it easier to stick to that tool for C++.  Eclipse 4.2 CDT is mature and many of the rough edges found in prior releases have been filed off in Juno.

As I am working on a GCC plugin, I needed to create a debug build of GCC and then figure out how to debug it in the IDE.  The procedure to build a debug version of GCC 4.7.2 in Ubuntu 12.04 can be found in my post here.  Once you have a debug GCC built, adding Eclipse CDT and configuring a project for GCC debugging is a straightforward process – but there are a few details that can be added to the environment that make GCC development in Eclipse much more tractable.

Step 1:  Install Java 1.7

Eclipse is supported with the Oracle, IBM and OpenJDK packages.  In the past I’ve typically relied on the Sun JDKs and feel most comfortable using those JDKs for Eclipse.  I use the Web Upd8 PPA repository for Ubuntu to install the JDK.

[sourcecode language=”text”]
$ sudo add-apt-repository -y ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install -y oracle-java7-installer
$ sudo update-java-alternatives -s java-7-oracle
[/sourcecode]

You can check that the JDK installed correctly by asking for the Java version:

[sourcecode language=”text”]
$ java -version
[/sourcecode]

Step 2: Install Eclipse 4.2

Eclipse 4.2 is not yet in the official Ubuntu repository, so it must be installed manually.  I haven’t been able to find a way to use wget to pull the Eclipse archive directly, so I use the version of Firefox bundled with Ubuntu 12.04 to download the archive.  The Eclipse archives can be found at: http://www.eclipse.org/downloads/?osType=linux.  For Eclipse 4.2 CDT you want to download: eclipse-cpp-juno-linux-gtk-x86_64.tar.gz, assuming you are using 64 bit Ubuntu.

[sourcecode language=”text”]
$ mkdir ~/eclipse
$ cd ~/eclipse
$ mkdir 4.2
$ cd 4.2

#
# Download Eclipse 4.2 from: http://www.eclipse.org/downloads/?osType=linux
# The archive you want should be: eclipse-cpp-juno-linux-gtk-x86_64.tar.gz
#

$ tar -xzvf eclipse-cpp-juno-linux-gtk-x86_64.tar.gz
$ sudo cp -r eclipse /usr/lib/
$ sudo cp eclipse/icon.xpm /usr/share/pixmaps/eclipse.xpm
[/sourcecode]

If this is a first install of Eclipse, you will probably want to create a script in /usr/bin that simply launches Eclipse from /usr/lib/eclipse.

[sourcecode language=”text”]
$ sudo sh -c "echo '/usr/lib/eclipse/eclipse' >> /usr/bin/eclipse"
$ sudo chmod +x /usr/bin/eclipse
[/sourcecode]

If you want to create a menu entry for the IDE, the best bet is to use the menu management tool: Applications > System Tools > Preferences > Main Menu.  It should automatically pick up the icon from the pixmaps directory.

Step 3: Create two Simple C++ Projects to Debug GCC

From here on, the example assumes that GCC was built with --prefix=/usr/gcc-4.7.2 and --program-suffix=-4.7.2.  If your debug version of GCC was built with different values, then substitute accordingly.

Inside Eclipse, create two new ‘Hello World’ C++ Projects.  Use the ‘Executable’ project – not a ‘GNU Autotools’ project.  For this example I labelled the first project ‘GCC Debug Test’ and the second ‘Project To Build’.  ‘GCC Debug Test’ will be configured to build ‘Project To Build’ using the debug version of GCC.  For clarity, ‘GCC Debug Test’ isn’t even built; it exists only to hold the debug settings for GCC.

First, create a custom .gdbinit file in the ‘GCC Debug Test’ working directory.  Do this by copying the .gdbinit file from the gcc build directory into the project working directory and then fix up the paths in the file to point back to the build directory.  The .gdbinit file created during the gcc build process provides a collection of scripts that can be used to print gcc data structures while debugging – this will prove invaluable.  Finally, add ‘set schedule-multiple’ as the first line of the file – this option causes gdb to track multiple processes in a single debugging session.  The .gdbinit file should look something like this:

[sourcecode language=”text”]
set schedule-multiple

dir ~/gcc_build/4.7.2/build/gcc
dir ~/gcc_build/4.7.2/gcc
dir ~/gcc_build/4.7.2/gcc/cp
dir ~/gcc_build/4.7.2/gcc/lto
source ~/gcc_build/4.7.2/build/gcc/gdbinit.in
[/sourcecode]

In the ‘GCC Debug Test’ project, go to the ‘Run > Debug Configurations’ dialog and create a new ‘C/C++ Application’ debug configuration.  On the ‘Main’ tab, enter the path to the ‘gcc-4.7’ debug executable in the ‘/usr/gcc-4.7.2/bin’ directory in the ‘C/C++ Application:’ field in the dialog and click ‘Apply’.

After completing the ‘Main’ tab dialog, click on the ‘Arguments’ tab and enter the path to the test file to be compiled and click ‘Apply’.

Next, click on the ‘Environment’ tab and create two variables: LD_LIBRARY_PATH and PATH.  For LD_LIBRARY_PATH, add the paths to the ‘lib’, ‘lib64’ and ‘lib64/debug’ directories for the GCC build to the front of the environment variable.  For PATH, add the path to the ‘bin’ directory to the front of the path as well.  For both environment variables, add the original paths to the end of the variable.  Make sure the ‘Replace native environment with specified environment’ radio button is selected.

Finally, click on the ‘Debugger’ tab.  In that dialog, ensure the ‘Stop on startup at: main’ checkbox is checked.  Enter the path to the .gdbinit file created above into the ‘GDB command file’ field.  Finally, check the ‘Automatically debug forked processes’ checkbox.  Since the gcc-4.7.2 application is just a driver for the actual compiler, unless this checkbox is selected the debugger will not debug into the actual compiler when that process is forked by gcc.  Click ‘Apply’ and the configuration is complete.


With the debug configuration finished, click ‘Debug’ and gdb should launch and stop at the ‘main’ function for gcc-4.7.

Step 4 : Making Debug GCC the version of GCC to use for Builds

This step is not *strictly* necessary, though building a plugin or modifying the gcc suite and compiling those modifications with a different version of gcc is ill advised for all sorts of good reasons.  Changing compilers in Eclipse is straightforward, though a multi-step process.

First, navigate to the Project > Properties > Settings dialog and select ‘GCC C++ Compiler’.  Set the ‘Command’ field to the debug version of g++.  I also set the CXX0X experimental symbol and the -std=c++0x option.

Set the ‘Command’ field for the ‘GCC C++ Linker’ as well.

After pressing ‘OK’, the newly built compiler will now be used by Eclipse for compiling this project.  There doesn’t appear to be a way to set these options globally, so the same changes will have to be made for each project you wish to compile with the debug GCC suite.