How do you balance efficiency and performance with embedded code in production?

Published on 8 February 2024

It’s a question we’re regularly asked by the development or product teams we talk to, particularly if the company has real ambitions in terms of the environment and energy.

And often the answer is encouraging: it comes down to software eco-design and frugal code. This was the starting point for our collaboration with Bouygues Telecom on a project to develop a range of their TV boxes.

The global communications operator is strongly committed to responsible digital for all. In practical terms, this means that its teams are rethinking how they design future solutions and re-examining the products and services already in use, where any improvement has a significant and tangible impact because of the volume already deployed.

In this case, the aim was to investigate the performance potential of a C++ software application running on a TV box. The application filters network frames according to several protocols in order to transmit video data over an Internet connection. Our task was to diagnose the software in order to assess its optimization potential.

How can we address this issue?

What makes our expertise unique is that we examine the source code to understand its organization and data structure in relation to the hardware target on which it will be executed. By making this connection, we are able to determine whether the target’s potential is being fully exploited in relation to its objectives and, in conjunction with a range of optimization techniques, identify what could be optimized.

It’s this diagnostic phase that allows us to quantify the potential gains, select the areas to optimize, and then execute the optimization.

 

Diagnostic implementation

We worked on a box built around a quad-core ARM Cortex-A9. Interestingly, this ARMv7 processor provides the Neon SIMD extension, with 128-bit vector registers, which makes it well suited to vectorization.

Diagnostics focused on:

  • Identifying for and while loops
  • Characterizing their vectorization potential
  • Evaluating the number of calls to each function and the number of instructions used in each function
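To make "vectorization potential" concrete, here is a minimal sketch contrasting two loops. The function names and the byte-masking operations are hypothetical and chosen only for illustration; they are not taken from the application that was diagnosed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical example: every iteration is independent of the others,
// so several bytes can be processed by a single SIMD instruction.
// This kind of loop has high vectorization potential.
void xor_mask(std::vector<std::uint8_t>& data, std::uint8_t mask) {
    for (std::size_t i = 0; i < data.size(); ++i) {
        data[i] = static_cast<std::uint8_t>(data[i] ^ mask);
    }
}

// Hypothetical example: each iteration reads the result of the previous
// one (a loop-carried dependency), so the loop is much harder to
// vectorize as written.
void running_xor(std::vector<std::uint8_t>& data) {
    assert(!data.empty());
    for (std::size_t i = 1; i < data.size(); ++i) {
        data[i] = static_cast<std::uint8_t>(data[i] ^ data[i - 1]);
    }
}
```

A diagnostic along these lines would rank the first loop above the second, then weight each by how often its enclosing function is called.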

Diagnostic results

Several opportunities for optimization were identified. The recommended strategy was to focus on the 44 vectorizable functions, and in particular on the 15 with high potential.

Optimization: 39% gains

At the end of the optimization phase, we had achieved an average reduction of 39% in the number of instructions and 36.3% in the number of cycles.
The optimization techniques used were as follows:

  • Vectorization
  • Conversion of for loops to memset/memcpy calls
  • Loop replacement
  • More efficient initialization
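As an illustration of the memset/memcpy and initialization items, the sketch below shows a before/after pair. The helper names and the Frame structure are assumptions made for this example, not code from the project. Whether the rewrite pays off depends on the compiler and its flags, since some compilers already recognize such loops on their own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Before (hypothetical): byte-by-byte loops.
void clear_loop(std::uint8_t* buf, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) buf[i] = 0;
}
void copy_loop(std::uint8_t* dst, const std::uint8_t* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// After: the same work delegated to the C library routines, which are
// typically tuned for the target (wide loads and stores, alignment).
void clear_fast(std::uint8_t* buf, std::size_t n) {
    assert(buf != nullptr || n == 0);
    if (n != 0) std::memset(buf, 0, n);
}
void copy_fast(std::uint8_t* dst, const std::uint8_t* src, std::size_t n) {
    assert((dst != nullptr && src != nullptr) || n == 0);
    // memcpy requires that the source and destination do not overlap.
    if (n != 0) std::memcpy(dst, src, n);
}

// More efficient initialization (hypothetical structure): empty braces
// zero every member at once instead of assigning fields one by one.
struct Frame {
    std::uint8_t payload[64];
    std::uint32_t length;
};
Frame make_empty_frame() {
    return Frame{};
}
```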

What technical lessons can be learned from this?

This collaboration highlighted several points in terms of technical lessons learned:

It is possible to make gains in software applications that primarily perform control operations. Until now, we’ve mainly worked on code that performs computations. In its diagnostic phase, our future beLow solution will identify and classify code operations according to their nature (control, storage, computation).

Exploiting the vectorization potential of ARMv7 is a source of significant gains, as SIMD allows a single instruction to operate on several data elements packed into one register (useful for repetitive and intensive computations). This often requires rewriting the code, because the compiler cannot always vectorize it automatically.
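A minimal sketch of what such a rewrite can look like, using Neon intrinsics from arm_neon.h. The xor_bytes function is a hypothetical example, not code from the project. It assumes the build enables Neon (for example with -mfpu=neon on a 32-bit ARM GCC toolchain) and falls back to a plain scalar loop everywhere else.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical helper: dst[i] = a[i] ^ b[i] for n bytes.
void xor_bytes(std::uint8_t* dst, const std::uint8_t* a,
               const std::uint8_t* b, std::size_t n) {
    assert(dst != nullptr && a != nullptr && b != nullptr);
    std::size_t i = 0;
#if defined(__ARM_NEON)
    // One 128-bit register holds 16 bytes, so each pass through this
    // loop handles 16 elements with a single XOR instruction.
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(dst + i, veorq_u8(va, vb));
    }
#endif
    // Scalar tail: remaining bytes, or the whole buffer without Neon.
    for (; i < n; ++i) {
        dst[i] = static_cast<std::uint8_t>(a[i] ^ b[i]);
    }
}
```

The scalar tail keeps the function correct for any length, which matters for network frames whose size is rarely a multiple of 16.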

Working with existing deployed equipment is an exciting approach to maximizing the impact of volume, reducing energy consumption and even extending equipment life.

This was our first collaboration with a large corporation. Our thanks go to the Bouygues Telecom teams we worked with (Herminio de Faria, Vincent Paillet). It is proof that a startup can work with a major corporation and its operational teams when those teams are motivated and have a strong culture of innovation, combined with the desire to take concrete action in favor of a more responsible and frugal digital world.