Running advanced C++ software on MCUs





There are many advantages to developing user applications in C++, so it is not surprising that the language is becoming more and more popular everywhere, including on MCU-based systems. The ‘mbed’ project, for example, is centered on this language. Many RTOSes provide a C++ compatibility layer, but unlike “big” systems (those with an MMU), most RTOSes impose some restrictions. In this article, we look at the internals of C++ and find out the reasons for these limitations.

There are two main restrictions for C++ on MCUs: the ability to relaunch applications, and support for the multithreading features of the standard C++ library.

Most of the examples in this article are based on the Embox RTOS, which can run such complex C++ projects as OpenCV on MCUs. OpenCV requires thread support in the standard C++ library. In addition, unlike other RTOSes for MCUs, Embox allows C++ applications to be relaunched. We use the STM32F769i board with external SDRAM to demonstrate OpenCV, since this framework requires hundreds of kilobytes of RAM. However, a few kilobytes of RAM are enough to run simple C++ applications.

Basic syntax

The syntax of the C++ language is implemented by the compiler. However, some language features need support at run time, and GCC provides it in the language support library named ‘libsupc++’. In addition, some parts must be handled by the system while the application is running; for example, global constructors and destructors must be called.

Global constructors and destructors

Let’s take a look at how a C++ application handles global constructors and destructors. All global C++ objects are created before the program calls main(). Special sections are used for this purpose: ‘.init_array’, ‘.init’, ‘.preinit_array’ and ‘.ctors’. Apart from ‘.init’, which contains code, these sections hold arrays of pointers to functions. Each array is traversed from beginning to end, and every element is called in turn.

The Embox code that calls the constructors of global objects is as follows:

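void cxx_invoke_constructors(void) {
   /* Linker-script symbols marking the bounds of the constructor table */
   extern const char _ctors_start, _ctors_end;
   typedef void (*ctor_func_t)(void);
   ctor_func_t *func = (ctor_func_t *) &_ctors_start;
   /* ... */
   /* Call every constructor, from the lowest address to the highest */
   for (; func != (ctor_func_t *) &_ctors_end; func++) {
       (*func)();
   }
}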
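The loop above can be tried out on a desktop machine with a small host-side sketch. Here a plain array, ctor_table (a hypothetical stand-in for the region between the linker symbols _ctors_start and _ctors_end), holds three made-up initializer functions, and the same forward walk calls them in table order.

```cpp
#include <cassert>

namespace ctor_demo {

// Records the order in which the "constructors" run.
int order[3];
int count = 0;

void record(int id) {
  assert(count < 3);  // the table holds exactly three entries
  order[count++] = id;
}

// Three hypothetical initializers, one per global object.
void init_first_object() { record(1); }
void init_second_object() { record(2); }
void init_third_object() { record(3); }

using ctor_func_t = void (*)();
ctor_func_t ctor_table[] = {init_first_object, init_second_object, init_third_object};

// Same traversal as cxx_invoke_constructors(): walk [begin, end) from the
// lowest address to the highest and call every entry.
void invoke_constructors(ctor_func_t *begin, ctor_func_t *end) {
  for (ctor_func_t *func = begin; func != end; ++func) {
    (*func)();
  }
}

// Runs the table once and encodes the call order as a number, e.g. 123.
int run_demo() {
  count = 0;
  invoke_constructors(ctor_table, ctor_table + 3);
  return order[0] * 100 + order[1] * 10 + order[2];
}

}  // namespace ctor_demo
```

run_demo() returns 123, showing that the entries are called from the lowest address to the highest.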

Now let’s see how a C++ application terminates, namely, how the destructors of global objects are called. There are two ways to do this.

The first way, which compilers use most often, is the __cxa_atexit() function from the C++ application binary interface (ABI). It is an analogue of the POSIX atexit() function: it registers special handlers that are called when the program terminates. When the global constructors are called at application start-up, as described above, compiler-generated code also registers the corresponding destructor with __cxa_atexit().
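The registration pattern can be sketched as follows. This is an illustration under stated assumptions, not the code GCC actually emits: register_dtor() and run_registered_dtors() are hypothetical stand-ins for __cxa_atexit() and __cxa_finalize(0), so the sketch does not interfere with the real runtime, and placement new into static storage plays the role of constructing a global object.

```cpp
#include <cassert>
#include <new>

namespace atexit_demo {

using dtor_t = void (*)(void *);
struct Entry {
  dtor_t func;
  void *obj;
};

Entry table[8];  // illustrative fixed-size registration table
int table_count = 0;

// Hypothetical stand-in for __cxa_atexit(): remember (destructor, object).
int register_dtor(dtor_t func, void *obj) {
  if (table_count >= 8) return -1;  // table overflow
  table[table_count++] = {func, obj};
  return 0;
}

// Hypothetical stand-in for __cxa_finalize(0): last registered, first run.
void run_registered_dtors() {
  while (table_count > 0) {
    Entry &e = table[--table_count];
    e.func(e.obj);
  }
}

int destroyed[2];
int destroyed_count = 0;

struct Global {
  int id;
  explicit Global(int i) : id(i) {}
  ~Global() { destroyed[destroyed_count++] = id; }
};

// Static storage for two "global" objects.
alignas(Global) unsigned char storage_a[sizeof(Global)];
alignas(Global) unsigned char storage_b[sizeof(Global)];

// Callback with the void(*)(void *) signature that __cxa_atexit expects.
void destroy_global(void *p) { static_cast<Global *>(p)->~Global(); }

// Roughly what the start-up code does for each global: construct, then register.
void init_globals() {
  register_dtor(destroy_global, new (storage_a) Global(1));
  register_dtor(destroy_global, new (storage_b) Global(2));
}

// Returns the destruction order encoded as a two-digit number.
int run_demo() {
  destroyed_count = 0;
  init_globals();
  run_registered_dtors();
  return destroyed[0] * 10 + destroyed[1];
}

}  // namespace atexit_demo
```

run_demo() returns 21: the object registered last is destroyed first, so global objects are torn down in reverse order of construction.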

The second way is to store pointers to destructors in the special sections ‘.fini_array’ and ‘.fini’. GCC uses this approach when the ‘-fno-use-cxa-atexit’ flag is specified. In this case, the destructors must be called in reverse order (from high address to low) when the application terminates. This method is less common but can be useful on microcontrollers, because the number of handlers is known at compile time.
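A host-side sketch of the second approach, assuming a hypothetical dtor_table that stands in for the ‘.fini_array’ region and is laid out in construction order. Because the table is an ordinary array, its size is a compile-time constant, which is the property that makes this method attractive on microcontrollers.

```cpp
#include <cassert>
#include <cstddef>

namespace fini_demo {

// Records the order in which the "destructors" run.
int order[3];
int count = 0;

void record(int id) {
  assert(count < 3);  // the table holds exactly three entries
  order[count++] = id;
}

// Hypothetical destructor functions, listed in the order the objects were built.
void dtor_first_object() { record(1); }
void dtor_second_object() { record(2); }
void dtor_third_object() { record(3); }

using dtor_func_t = void (*)();
dtor_func_t dtor_table[] = {dtor_first_object, dtor_second_object, dtor_third_object};

// The number of handlers is a compile-time constant: nothing has to be
// registered or allocated at run time.
constexpr std::size_t kDtorCount = sizeof(dtor_table) / sizeof(dtor_table[0]);
static_assert(kDtorCount == 3, "table size is known when compiling");

// Walk from the last entry (highest address) down to the first.
void invoke_destructors() {
  for (std::size_t i = kDtorCount; i-- > 0;) {
    dtor_table[i]();
  }
}

// Runs the table once and encodes the call order as a number.
int run_demo() {
  count = 0;
  invoke_destructors();
  return order[0] * 100 + order[1] * 10 + order[2];
}

}  // namespace fini_demo
```

run_demo() returns 321, that is, the destructors run from the highest address to the lowest.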

The Embox code that calls the destructors of global objects is as follows:

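#include <stdio.h>

/* Registration table used below. The original listing omits these
 * declarations; the size and the struct name here are illustrative. */
#define TABLE_SIZE 32
struct atexit_func_entry {
   void (*destructor_func)(void *);
   void *obj_ptr;
   void *dso_handle;
};
static struct atexit_func_entry atexit_funcs[TABLE_SIZE];
static int atexit_func_count;

/* Register a destructor to be called when the application terminates. */
int __cxa_atexit(void (*f)(void *), void *objptr, void *dso) {
   if (atexit_func_count >= TABLE_SIZE) {
       printf("__cxa_atexit: static destruction table overflow.\n");
       return -1;
   }
   atexit_funcs[atexit_func_count].destructor_func = f;
   atexit_funcs[atexit_func_count].obj_ptr = objptr;
   atexit_funcs[atexit_func_count].dso_handle = dso;
   atexit_func_count++;
   return 0;
}
/* Call registered destructors in reverse order of registration.
 * With a NULL argument all of them are called; otherwise, per the
 * Itanium C++ ABI, only those registered with that DSO handle. */
void __cxa_finalize(void *dso) {
   int i = atexit_func_count;
   if (!dso) {
       while (i--) {
          if (atexit_funcs[i].destructor_func) {
             (*atexit_funcs[i].destructor_func)(atexit_funcs[i].obj_ptr);
             atexit_funcs[i].destructor_func = 0;
          }
       }
       atexit_func_count = 0;
   } else {
       /* while (i--) visits indices count-1 .. 0, staying inside the table */
       while (i--) {
          if (atexit_funcs[i].destructor_func &&
                atexit_funcs[i].dso_handle == dso) {
             (*atexit_funcs[i].destructor_func)(atexit_funcs[i].obj_ptr);
             atexit_funcs[i].destructor_func = 0;
          }
       }
   }
}
void cxx_invoke_destructors(void) {
   extern const char _dtors_start, _dtors_end;
   typedef void (*dtor_func_t)(void);
   dtor_func_t *func = ((dtor_func_t *) &_dtors_end) - 1;
   /* There are two possible ways for destructors to be called:
    * 1. Through callbacks registered with __cxa_atexit.
    * 2. From the .fini_array section.  */
   /* Handle callbacks registered with __cxa_atexit first, if any. */
   __cxa_finalize(0);
   /* Handle .fini_array, if any. Functions are executed in the reverse order. */
   for (; func >= (dtor_func_t *) &_dtors_start; func--) {
       (*func)();
   }
}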

Global destructors are required in order to restart C++ applications. Most RTOSes for microcontrollers run a single application that never restarts, so in such RTOSes global destructor handling is left empty: it is not supposed to be used.
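Why relaunching needs destructors can be illustrated with a small host-side sketch (not Embox code). A hypothetical GlobalLogger acquires a resource in its constructor and releases it in its destructor; each call to launch() replays the constructor, as an RTOS would when restarting the application, and the destructor runs only if the system implements it.

```cpp
#include <cassert>
#include <new>

namespace relaunch_demo {

// Stands in for files, heap blocks or peripherals held by the application.
int open_resources = 0;

// Hypothetical global object of the application.
struct GlobalLogger {
  GlobalLogger() { ++open_resources; }   // acquires a resource
  ~GlobalLogger() { --open_resources; }  // releases it
};

// Static storage for the "global" object, so it can be constructed again
// on every launch, just as the constructor table is replayed.
alignas(GlobalLogger) unsigned char storage[sizeof(GlobalLogger)];

// One run of the application: global constructor, main(), and optionally
// the global destructor.
void launch(bool run_destructors) {
  GlobalLogger *obj = new (storage) GlobalLogger();
  // ... the application's main() would run here ...
  if (run_destructors) {
    obj->~GlobalLogger();
  }
}

// Launches the application `times` times and returns the resources still held.
int resources_after(int times, bool run_destructors) {
  open_resources = 0;
  for (int i = 0; i < times; ++i) {
    launch(run_destructors);
  }
  return open_resources;
}

}  // namespace relaunch_demo
```

With destructors, resources_after(3, true) returns 0; with empty destructor handling, resources_after(3, false) returns 3, because every relaunch leaks what the previous run acquired.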

The Zephyr RTOS code for handling global destructors is as follows:

/**
 * @brief Register destructor for a global object
 *
 * @param destructor the global object destructor function
 * @param objptr global object pointer
 * @param dso Dynamic Shared Object handle for shared libraries
 *
 * Function does nothing at the moment, assuming the global objects
 * do not need to be deleted
 *
 * @return N/A
 */
int __cxa_atexit(void (*destructor)(void *), void *objptr, void *dso)
{
   ARG_UNUSED(destructor);
   ARG_UNUSED(objptr);
   ARG_UNUSED(dso);
   return 0;
}

If you need applications that can be rerun, you have to use an operating system that implements global destructor handling. Embox provides this C++ feature on various platforms, including MCUs.

The video below demonstrates running different C++ applications multiple times on an STM32F769i-discovery board.

new/delete operators

In GCC, the new/delete operators are implemented in the libsupc++ library and declared in the <new> header. You can use the libsupc++ implementations, but in their basic form these operators are quite simple, so you can also implement them yourself, for example, on top of the standard malloc()/free() or their analogues.

Here is the Embox code that implements the new/delete operators (basic C++ only):

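#include <cstdlib>   // std::malloc, std::free, std::size_t, NULL

/* Basic variant: throw() promises that operator new never throws, so it
 * returns NULL when allocation fails. alloc_failure_handler is an
 * optional failure hook defined elsewhere (not shown). */
void* operator new(std::size_t size) throw() {
   void *ptr = NULL;
   if ((ptr = std::malloc(size)) == 0) {
       if (alloc_failure_handler) {
          alloc_failure_handler();
       }
   }
   return ptr;
}
void operator delete(void* ptr) throw() {
   std::free(ptr);
}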
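For comparison, here is a minimal sketch (not Embox code) of global new/delete replacements built on std::malloc()/std::free() that follow the standard contract instead: on failure, operator new calls the installed new_handler and retries, and throws std::bad_alloc if there is none. The live_allocations counter, the sink variable and allocations_during_demo() are hypothetical additions that only make the replacement observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counter that makes the replacement observable.
static std::size_t live_allocations = 0;

void *operator new(std::size_t size) {
  if (size == 0) {
    size = 1;  // a zero-size request must still return a unique pointer
  }
  for (;;) {
    if (void *ptr = std::malloc(size)) {
      ++live_allocations;
      return ptr;
    }
    // Standard behavior: let the new_handler free memory, or give up.
    std::new_handler handler = std::get_new_handler();
    if (!handler) {
      throw std::bad_alloc();
    }
    handler();
  }
}

void operator delete(void *ptr) noexcept {
  if (ptr) {
    --live_allocations;
    std::free(ptr);
  }
}

// C++14 sized deallocation forwards to the unsized version.
void operator delete(void *ptr, std::size_t) noexcept {
  ::operator delete(ptr);
}

// Storing the pointer in a volatile variable keeps the optimizer from
// eliding the new/delete pair, which C++14 permits for new-expressions.
int *volatile sink = nullptr;

// Returns how many allocations a single new-expression performs.
std::size_t allocations_during_demo() {
  std::size_t before = live_allocations;
  int *value = new int(42);
  sink = value;
  std::size_t during = live_allocations - before;
  sink = nullptr;
  delete value;
  return during;
}
```

allocations_during_demo() returns 1, because the new-expression goes through the replaced operator new exactly once. Unlike the Embox version above, this variant relies on exception support, since it reports failure with std::bad_alloc.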

RTTI & exceptions

If your application is simple, basic C++ without exceptions and RunTime Type Information (RTTI) may be enough. In this case, these features can be disabled with the compiler flags ‘-fno-exceptions’ and ‘-fno-rtti’.
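Code that has to build both with and without these flags can test the predefined macros instead. The sketch below assumes GCC or Clang: both define __cpp_exceptions only when exceptions are enabled, and __GXX_RTTI only when RTTI is enabled. parse_digit() is a hypothetical helper that reports errors by throwing when it can and by calling std::abort() otherwise.

```cpp
#include <cassert>
#include <cstdlib>
#if defined(__cpp_exceptions)
#include <stdexcept>
#endif

// True when the translation unit is compiled without -fno-exceptions.
constexpr bool kExceptionsEnabled =
#if defined(__cpp_exceptions)
    true;
#else
    false;
#endif

// True when the translation unit is compiled without -fno-rtti.
constexpr bool kRttiEnabled =
#if defined(__GXX_RTTI)
    true;
#else
    false;
#endif

// Hypothetical helper: converts a decimal digit character to its value.
// Errors are thrown when exceptions exist; otherwise the program stops.
int parse_digit(char c) {
  if (c < '0' || c > '9') {
#if defined(__cpp_exceptions)
    throw std::invalid_argument("not a decimal digit");
#else
    std::abort();
#endif
  }
  return c - '0';
}
```

The same source then compiles in both configurations; only the error path changes.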

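If these C++ features are required, however, they must be supported at run time, which is much more difficult than implementing new/delete. Moreover, in this case new/delete need more complex implementations; for example, a standard operator new reports allocation failure by throwing std::bad_alloc.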

The good news is that the implementations of these features are OS-independent and are already provided by the libsupc++ library of your cross-compiler. Accordingly, the easiest way to add support for them is to use libsupc++. The prototypes are placed in the and header files.

Using the C++ support library from the cross-compiler together with your own C++ runtime has a few small requirements. The linker script must contain a special ‘.eh_frame’ section, and before the runtime is used, the support library must be initialized with the address of the beginning of this section. This relies on the ‘libunwind’ library, which defines a portable and efficient C application programming interface (API) for determining the call chain of a program.

Here’s the code in Embox for the ‘libunwind’ initialization:

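/* Registers an .eh_frame table with the unwinder. */
extern void __register_frame(void *begin);

void register_eh_frame(void) {
   /* Linker-script symbol marking the start of the .eh_frame section */
   extern const char _eh_frame_begin;
   __register_frame((void *)&_eh_frame_begin);
}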
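The information registered above is what lets the runtime unwind the stack. The following sketch, with hypothetical function names, shows the effect: when an exception propagates out of three nested calls, the unwinder walks back through each frame and runs the destructors of their local objects. On a typical desktop system the runtime locates the unwind tables on its own, so no explicit registration call is needed there.

```cpp
#include <cassert>
#include <stdexcept>

namespace unwind_demo {

int cleanups = 0;

// A local object whose destructor runs during stack unwinding.
struct Guard {
  ~Guard() { ++cleanups; }
};

void level3() {
  Guard g;
  throw std::runtime_error("failure deep in the call chain");
}

void level2() {
  Guard g;
  level3();
}

void level1() {
  Guard g;
  level2();
}

// Returns how many local destructors ran while the exception propagated.
int run_demo() {
  cleanups = 0;
  try {
    level1();
  } catch (const std::runtime_error &) {
    // All three frames have been unwound by the time control gets here.
  }
  return cleanups;
}

}  // namespace unwind_demo
```

run_demo() returns 3, one cleanup per unwound frame.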

The ARM architecture uses other sections with their own structure: ‘.ARM.exidx’ and ‘.ARM.extab’. Their format is defined by the “Exception Handling ABI for the ARM Architecture” (EHABI) standard. ‘.ARM.exidx’ is an index table, and ‘.ARM.extab’ is a table of the actual entries required to handle an exception. To use these sections for exception handling, you need to include them in the linker script:

   .ARM.exidx : {
       __exidx_start = .;
       KEEP(*(.ARM.exidx*));
       __exidx_end = .;
   } SECTION_REGION(text)
   .ARM.extab : {
       KEEP(*(.ARM.extab*));
   } SECTION_REGION(text)

To enable exception handling, the start and end symbols of the ‘.ARM.exidx’ section, ‘__exidx_start’ and ‘__exidx_end’, must be defined.
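The structure of the index table can be illustrated with a host-side sketch. Per the EHABI, each ‘.ARM.exidx’ entry is two 32-bit words: a prel31 (31-bit, self-relative, sign-extended) offset to the start of a function, and either EXIDX_CANTUNWIND (0x1), inline unwind data with bit 31 set, or a prel31 offset into ‘.ARM.extab’. Because the entries are sorted by function address, an unwinder can binary-search for the entry covering the program counter. Target addresses are modeled as plain integers here, table_addr plays the role of __exidx_start, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace exidx_demo {

constexpr std::uint32_t kExidxCantUnwind = 0x1;  // EXIDX_CANTUNWIND

struct ExidxEntry {
  std::uint32_t fn_offset;  // prel31 offset to the function start
  std::uint32_t data;       // CANTUNWIND, inline data, or prel31 to .ARM.extab
};

// Encode target as a prel31 value stored at address place.
std::uint32_t encode_prel31(std::uint32_t place, std::uint32_t target) {
  return (target - place) & 0x7FFFFFFFu;
}

// Decode: sign-extend the low 31 bits and add the address of the word.
std::uint32_t decode_prel31(std::uint32_t place, std::uint32_t word) {
  std::int32_t offset = static_cast<std::int32_t>(word << 1) >> 1;
  return place + static_cast<std::uint32_t>(offset);
}

// Binary search for the last entry whose function starts at or below pc.
const ExidxEntry *find_entry(const ExidxEntry *table, std::size_t count,
                             std::uint32_t table_addr, std::uint32_t pc) {
  const ExidxEntry *found = nullptr;
  std::size_t lo = 0, hi = count;
  while (lo < hi) {
    std::size_t mid = lo + (hi - lo) / 2;
    std::uint32_t entry_addr =
        table_addr + static_cast<std::uint32_t>(mid * sizeof(ExidxEntry));
    if (decode_prel31(entry_addr, table[mid].fn_offset) <= pc) {
      found = &table[mid];
      lo = mid + 1;
    } else {
      hi = mid;
    }
  }
  return found;
}

// Builds a three-entry table at a pretend address and returns the index of
// the entry covering pc, or -1 if pc precedes every function.
int demo_lookup(std::uint32_t pc) {
  const std::uint32_t table_addr = 0x1000;
  const std::uint32_t fn_starts[3] = {0x0800, 0x0900, 0x0A00};
  ExidxEntry table[3];
  for (std::size_t i = 0; i < 3; ++i) {
    std::uint32_t entry_addr =
        table_addr + static_cast<std::uint32_t>(i * sizeof(ExidxEntry));
    table[i] = {encode_prel31(entry_addr, fn_starts[i]), kExidxCantUnwind};
  }
  const ExidxEntry *e = find_entry(table, 3, table_addr, pc);
  return e ? static_cast<int>(e - table) : -1;
}

}  // namespace exidx_demo
```

For example, demo_lookup(0x0904) returns 1, because 0x0904 falls inside the function starting at 0x0900, and demo_lookup(0x07FC) returns -1, because no function starts at or below that address.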

More details about stack organization can be found in the article “How stack trace on ARM works”.

Standard language library (libstdc++)

Self-supporting implementation of libstdc++

C++ support includes not only the language syntax but also the libstdc++ standard library. Like the syntax, the library can be divided into different levels. There are basic parts, such as wrappers around ‘libc’ for working with strings or the C++ version of setjmp(), which are easily implemented on top of the standard C library. And there are more complex parts, such as the Standard Template Library (STL).

Libstdc++ from a cross-compiler

The basic parts described above are implemented in Embox itself. If they are enough, you do not need to include an external C++ standard library. But if you need the STL, for example, the easiest way is to use the library and header files from the cross-compiler.

There are, however, some important details in how the cross-compiler was built. Let’s take a look at the standard arm-none-eabi-gcc:

$ arm-none-eabi-gcc -v
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/home/alexander/apt/gcc-arm-none-eabi-9-2020-q2-update/bin/../lib/gcc/arm-none-eabi/9.3.1/lto-wrapper
Target: arm-none-eabi
Configured with: ***    --with-gnu-as --with-gnu-ld --with-newlib   ***
Thread model: single
gcc version 9.3.1 20200408 (release) (GNU Arm Embedded Toolchain 9-2020-q2-update)

It was built with ‘--with-newlib’, which means that it expects ‘newlib’ as the C standard library.

To minimize overhead, Embox uses its own implementation of the C standard library. This means, however, that to support the cross-compiler's C++ runtime, we need to implement a compatibility layer with ‘newlib’.

The following Embox code implements one of the necessary, but not obvious, parts of the standard library support:

#include <stdio.h>

/* Mirrors the leading fields of newlib's struct _reent. Code compiled
 * against newlib headers (such as the toolchain's libstdc++) reaches the
 * standard streams through _impure_ptr, so the field order must match. */
struct _reent {
   int _errno;         /* local copy of errno */
   /* In newlib these point into the member __sf of the full structure;
      here they point to Embox's own streams (see reent_init below). */
   FILE *_stdin, *_stdout, *_stderr;
};
struct _reent global_newlib_reent;
void *_impure_ptr = &global_newlib_reent;
/* Point the newlib-style stream fields at Embox's stdin/stdout/stderr. */
static int reent_init(void) {
   global_newlib_reent._stdin = stdin;
   global_newlib_reent._stdout = stdout;
   global_newlib_reent._stderr = stderr;
   return 0;
}

All ‘newlib’ parts of the compatibility layer needed to use libstdc++ from the cross-compiler can be found in the ‘third-party/lib/toolchain/newlib_compat/’ folder of Embox.

Advanced standard library support: std::thread and std::mutex

If you try to compile the following code:

#include <thread>    // std::thread
#include <mutex>     // std::mutex
std::mutex mtx;

In your ‘mbed’ project you get the error:

namespace "std" has no member "mutex"

This happens because of one more important attribute of the cross-compiler. Let’s take another look at the output:

$ arm-none-eabi-gcc -v
***
Thread model: single
gcc version 9.3.1 20200408 (release) (GNU Arm Embedded Toolchain 9-2020-q2-update)

When GCC is configured with the single thread model (reported as “Thread model: single”), thread support is disabled in the standard library. This means that, for example, std::thread and std::mutex are not available, so complex C++ applications such as OpenCV cannot be built: this version of the library is not sufficient for applications that require such features.

The solution we use in Embox is to build the GCC cross-compiler and its standard library with the posix thread model. In this case, ‘std::thread’ and ‘std::mutex’ are implemented on top of the standard ‘pthread_*’ and ‘pthread_mutex_*’ functions.
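A minimal sketch of code that needs this thread model: several std::thread workers increment a shared counter protected by a std::mutex. The preprocessor check relies on a libstdc++ implementation detail, the _GLIBCXX_HAS_GTHREADS macro, which libstdc++ defines when it was built with thread support; it turns the confusing “no member mutex” error into an explicit message. count_with_threads() is a hypothetical name.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// libstdc++ built with the single thread model does not provide
// std::thread or std::mutex; fail early with a clear message.
#if defined(__GLIBCXX__) && !defined(_GLIBCXX_HAS_GTHREADS)
#error "libstdc++ was built with the single thread model; std::thread and std::mutex are unavailable"
#endif

// Starts `workers` threads that each increment a shared counter
// `increments_per_worker` times, and returns the final count.
int count_with_threads(int workers, int increments_per_worker) {
  std::mutex mtx;
  int counter = 0;
  std::vector<std::thread> threads;
  for (int w = 0; w < workers; ++w) {
    threads.emplace_back([&] {
      for (int i = 0; i < increments_per_worker; ++i) {
        std::lock_guard<std::mutex> lock(mtx);  // serializes access to counter
        ++counter;
      }
    });
  }
  for (std::thread &t : threads) {
    t.join();
  }
  return counter;
}
```

count_with_threads(4, 1000) returns 4000 on a toolchain with the posix thread model; without the mutex, concurrent increments could be lost.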

Configure Embox

Rebuilding the compiler is the most reliable approach and provides the most complete and compatible solution. However, it takes a lot of time and may require additional resources, which are scarce on MCUs. Therefore, this method is not advisable everywhere.

To let users choose the most suitable C++ support for their application, several abstract classes (interfaces) with different implementations have been added to Embox:

  • ‘embox.lib.libsupcxx’ – determines which method to use to support the basic syntax of the language.
  • ‘embox.lib.libstdcxx’ – determines which implementation of the standard library to use.

There are three choices for ‘libsupcxx’:

  • ‘embox.lib.cxx.libsupcxx_standalone’ – own implementation in Embox (only basic features).
  • ‘third_party.lib.libsupcxx_toolchain’ – use libsupc++ from the host cross-compiler
  • ‘third_party.gcc.tlibsupcxx’ – build libsupc++ from the sources

The minimal option can work even without the C++ standard library: Embox has its own implementation based on the simplest functions of the C standard library. If that is not enough, you can choose one of three variants of ‘libstdcxx’:

  • STLport.libstlportg – a standard library, including the STL, based on the STLport project. It does not require building GCC, but the project has not been maintained since 2008.
  • lib.libstdcxx_toolchain – the standard C++ library from the host cross-compiler.
  • gcc.libstdcxx – a full build of libstdc++ from the sources.

In this way, Embox successfully runs complex C++ applications such as Qt and even OpenCV.

Conclusion

Using C++ is very convenient, including on MCUs. Most C++ features are supported by the cross-compiler through its libraries (libsupc++ and libstdc++), but C++ also needs support from the operating system. Most OSes for MCUs do not expect applications to be relaunched, so they do not implement calling global destructors. Another restriction is multithreading support in libstdc++: to lift it, you need to build the cross-compiler with a thread model other than ‘single’. Embox RTOS solves both problems, making it possible to run OpenCV applications on MCUs multiple times.


Anton Bondarev is the founder of Embox RTOS. Anton graduated from Saint Petersburg Electrotechnical University (LETI) in 2003 with a master's degree in electrical engineering and attended postgraduate courses at Saint-Petersburg State University, specializing in software engineering. He has over 20 years of experience in embedded and system programming.
Alexander Kalmuk is the cofounder of Embox RTOS. Alexander graduated from Saint-Petersburg State University in 2014 with a master's degree in mathematics and software engineering and attended postgraduate courses at Saint-Petersburg State University, specializing in control theory. He has over 10 years of experience in embedded systems programming.


The post Running advanced C++ software on MCUs appeared first on Embedded.com.





Original article: Running advanced C++ software on MCUs
Author: Anton Bondarev and Alexander Kalmuk