Move fast and break things? Not so fast in embedded.

Written by Shawn Prestridge, Industry profile and Senior Field Application Engineer/Team Leader for US FAEs at IAR Systems

This article is part of the IAR Systems article series focused on the importance of code quality.

“Move fast and break things” is the approach that Mark Zuckerberg said he inculcates into the development culture at Facebook. While it sounds wonderful for getting new features up and running quickly (even if they are not perfect), it loses its luster if you try to apply that same approach to embedded development. The reason is that the domains are completely different: Facebook is based around web- and database-centric development with many function points that probably don’t suffer too greatly if the hot new feature doesn’t work correctly. Embedded systems are – by their nature – resource-constrained and generally intended to only do one function, or perhaps a few functions. Therefore, the philosophy of “move fast and break things” when applied to an embedded system can potentially render the entire system useless. Depending on what function that embedded system is providing, the results can be embarrassing at best and disastrous at worst.
Does this mean you can’t use Rapid Application Development (RAD) methods in embedded systems? You can use RAD, but you need to be very careful about how you do it.

“Rushing makes us neither faster, nor more productive”

Moving fast implies that code quality takes a back seat to getting the code delivered quickly, a mindset sometimes called the WISCY (pronounced “whiskey”) syndrome: Why Isn’t Somebody Coding Yet? It also implies that testing of said code is either not done at all or practiced with a high degree of informality. Both of these can get your project into trouble, so you should adhere to best practices in Software Quality Assurance.

When developers are rushed to add functionality (or even to fix bugs), they tend to skip any sort of integration testing with the rest of the system and only do desk-checking of their code by running a handful of tests that only target the code they have just created. The reason is a culture of rushing to get the code out the door. As Lemi Ergin says, “Rushing makes us neither faster, nor more productive; it increases stress and distracts focus. We need creativity, effectiveness, and focus.” [1] The ability to be fast at deploying software starts with code quality.

Embedded development needs scalability

When developers craft code, they need to focus on development processes that promote scalability and minimize system complexity, and they should always look for ways to reduce “technical debt”. By keeping the system simple, it is easier for developers to add functionality or scale up the system when necessary. However, designing the software properly takes time.

The temptation in a RAD situation is to go for a “quick and dirty” solution that solves the current problem facing the development team, and this is exactly why the term “technical debt” was introduced. By not planning out an extensible solution, developers essentially doom themselves to rewriting applications for follow-on projects and also make fixing bugs or adding features a nightmare because of the tight coupling of the software. Coupling refers to how dependent the software modules are on one another; in a tightly coupled system, changing a single line of code (or in embedded systems, even the timing of the code) can break the whole system. But this level of planning can seem like an extravagance when you’re staring at a deadline, and skipping it can lead to what Kelsie Anderson calls “cowboy coding”. This behavior cuts corners on software quality for the sake of expediency and over time increases the system complexity through tight coupling and higher technical debt. Therefore, it actually slows down development and makes projects more expensive. [2] As Ergin states, “Without having a quality codebase, you cannot be agile.” [1]
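To make the idea of loose coupling concrete, here is a minimal sketch in C. It assumes a hypothetical temperature-monitoring application; the names temp_sensor_t, over_temperature, and fake_read are invented for illustration and do not come from any particular product or library. The application logic depends only on a small interface, so the driver behind it can be swapped without touching the logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interface: the application sees only this struct,
   never the driver that sits behind it. */
typedef struct {
    int32_t (*read_millidegrees)(void);
} temp_sensor_t;

/* Application logic depends on the interface, not on a specific chip. */
int over_temperature(const temp_sensor_t *sensor, int32_t limit_millidegrees)
{
    assert(sensor != NULL && sensor->read_millidegrees != NULL);
    return sensor->read_millidegrees() > limit_millidegrees;
}

/* Stand-in driver for running on a host PC; a target build would supply
   a function that reads the real hardware instead. */
int32_t fake_reading = 25000;

int32_t fake_read(void)
{
    return fake_reading;
}

const temp_sensor_t fake_sensor = { fake_read };
```

Calling over_temperature(&fake_sensor, 85000) exercises the logic with no hardware attached. Replacing the sensor chip in a follow-on project would then mean writing one new read function rather than editing every caller.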

Everything starts with code quality – it’s just that simple

How can you improve your code quality when you barely have time to even slap code together to meet the schedule? Fortunately, there are coding standards such as MISRA, CWE, and CERT C that can help you do that. We have covered these standards in a previous paper, but the crux of these standards is that they promote safe and reliable coding practices by avoiding both risky coding behavior and holes in the C and C++ language standards.

By using code analysis tools that can automatically scan your code looking for deviations from these standards, you can quickly find and fix these problems while you are still desk-checking your code. In other words, you get instantaneous feedback on your code while you are still “in the zone”. With this instantaneous feedback, developers are more than 50% more likely to fix the bugs than they would be if the feedback came later from a build server. [3] Following these standards helps you to instantly up your code quality game, which in turn helps you reduce the technical debt in your solution. Consequently, you inject fewer bugs into the code, and the bugs that do exist can be both found and fixed faster. This structured approach to coding also makes it easier to add functionality, which means you can do it much faster. But that is only half of the equation because even with a quality codebase, you still have to test properly.
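As an illustration of the kind of defect such tools are meant to catch, consider the sketch below. It assumes a hypothetical motor controller; motor_mode and duty_for_mode are invented names, and no specific rule numbers are claimed here. The risky version, excluded from the build with #if 0, leaves a variable unassigned on one path and has no default case. Coding standards of this kind generally discourage both patterns.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { MODE_IDLE, MODE_RUN, MODE_FAULT } motor_mode;

#if 0
/* Risky version, excluded from the build: 'duty' is never assigned when
   mode is MODE_FAULT, so the function returns an indeterminate value. */
uint8_t duty_for_mode_risky(motor_mode mode)
{
    uint8_t duty;
    switch (mode) {
    case MODE_IDLE: duty = 0U;  break;
    case MODE_RUN:  duty = 80U; break;
    }
    return duty;
}
#endif

/* Reworked version: 'duty' is initialized, every path assigns it, and
   unexpected values fall through to a safe default. */
uint8_t duty_for_mode(motor_mode mode)
{
    uint8_t duty = 0U;

    switch (mode) {
    case MODE_RUN:
        duty = 80U;
        break;
    case MODE_IDLE:
    case MODE_FAULT:
    default:
        duty = 0U;
        break;
    }

    assert(duty <= 100U); /* duty cycle is a percentage */
    return duty;
}
```

The risky version may appear to work in testing when the uninitialized variable happens to hold zero, which is why a static check at the developer's desk is more dependable here than waiting for the defect to show up in the field.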

A combination of code quality and testing is essential

In his seminal work Code Complete, Steve McConnell ruminates on the relationship between testing and code quality:

“Testing by itself does not improve software quality. Test results are an indicator of quality, but in and of themselves they don’t improve it. Trying to improve software quality by increasing the amount of testing is like trying to lose weight by weighing yourself more often. What you eat before you step on to the scale determines how much you will weigh, and the software-development techniques you use determine how many errors testing will find. If you want to lose weight, don’t buy a new scale; change your diet. If you want to improve your software, don’t just test more; develop better.” [4]

Traditionally, testing has been a very manual process but there are new tools available to development teams that make it possible to do automated testing. While the efficacy of these tools varies from vendor to vendor, they can do amazing things such as: requirements tracing (does this code actually map back to a requirement in the specification, or is it just “gold-plating”), unit test (does this particular module do all of its functions well), integration test (do all the units play nicely with each other), and much more. The allure of automated testing can be a bit of a siren song for developers; the US Department of Defense cautions that development teams need to be realistic about their approach to automated testing in terms of what such testing can really do and what successfully passing automated tests tells you about your code’s readiness for release. [5]

Two of the most common problems in embedded software involve race conditions (where you get different outputs depending on which operation finishes first) and initialization problems. Both of these can lead to defects that manifest themselves once in a blue moon, but are very difficult to replicate and therefore difficult to fix and test. Automated testing is generally not good at finding these types of problems, and oftentimes it takes creativity to find ways to make these problems manifest themselves; one classic way is to do a simultaneous multi-button push on the device. Do you get different results if the buttons are pushed at exactly the same time (within one machine cycle)? What about if one button is pushed and then another is pushed before the Interrupt Service Routine (ISR) completes for the first button push? Normally these checks are done with manual testing, and although it may take many attempts to get the inputs just right, people are incredibly good at doing it.
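The sketch below shows one common form of race condition, a lost update on a counter shared between the main loop and an ISR. It is a host-side simulation under stated assumptions: the "interrupt" is an ordinary function call injected through a hypothetical interrupt_hook parameter, and all names are invented for illustration. On real hardware the interruption happens at an unpredictable instant, which is what makes the defect so hard to reproduce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shared between the "main loop" and the "ISR". */
volatile uint32_t button_events = 0U;

/* Stand-in for a button interrupt service routine. */
void button_isr(void)
{
    button_events++;
}

/* Hypothetical hook that lets a host test fire the ISR at a chosen point. */
typedef void (*interrupt_hook)(void);

/* Unprotected read-modify-write: an interrupt that lands between the read
   and the write has its update overwritten. */
uint32_t count_event_unprotected(interrupt_hook fire_between)
{
    uint32_t snapshot = button_events;   /* read */
    if (fire_between != NULL) {
        fire_between();                  /* ISR preempts here */
    }
    button_events = snapshot + 1U;       /* write discards the ISR's update */
    return button_events;
}

/* Protected version: on target, interrupts would be disabled around the
   read-modify-write (the mechanism is device-specific), so a pending
   interrupt can only run after the write has completed. */
uint32_t count_event_protected(interrupt_hook fire_after)
{
    uint32_t snapshot = button_events;
    button_events = snapshot + 1U;
    if (fire_after != NULL) {
        fire_after();                    /* deferred ISR runs here */
    }
    return button_events;
}

/* Self-check: two events occur in each scenario. */
void check_race_demo(void)
{
    button_events = 0U;
    assert(count_event_unprotected(button_isr) == 1U); /* one event lost */
    button_events = 0U;
    assert(count_event_protected(button_isr) == 2U);   /* both counted */
}
```

Forcing the interleaving like this makes the defect deterministic on a host PC. It complements rather than replaces the manual multi-button testing described above, because the sketch only covers interleavings that someone thought to inject.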

One mantra of Quality Assurance is to “test early, test often” to measure how your development processes are holding up to scrutiny. This enables you to take corrective action early on in the development, which can save you quite a bit of time and money. IBM estimates that if a bug costs $100 to fix during the Requirements Gathering phase of a project, it would be about $1,500 to fix during a normal test-and-fix cycle and $10,000 to fix once in production, but you can scale those numbers to your organization. [6] You quickly realize that you want to find as many bugs as possible, as early as possible. This has led to the rise of Test-Driven Development (TDD), which creates test cases out of the requirements specification and usually starts at the same time as design begins. The idea is that code is written and improved until it passes these requirements-based tests. This leads to a series of very short development cycles: write, test, and repeat until that module works, then move on to the next module. However, you still continue to test the first module and its integration with other modules. The idea is that by initially slowing down development, you rapidly speed it up at the end.
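A minimal sketch of one such cycle in C follows. The requirement is hypothetical and invented for illustration: a battery gauge must report 0% at or below 3000 mV, 100% at or above 4200 mV, and scale linearly in between. The tests are written from that requirement first, and the implementation is then the simplest code that makes them pass.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical requirement: report 0% at or below 3000 mV, 100% at or
   above 4200 mV, and scale linearly in between. */
uint8_t battery_percent(uint16_t millivolts);

/* Step 1: tests derived from the requirement, written before the code. */
void test_battery_percent(void)
{
    assert(battery_percent(2500U) == 0U);    /* below the empty threshold */
    assert(battery_percent(3000U) == 0U);    /* at the empty threshold */
    assert(battery_percent(3600U) == 50U);   /* midpoint of the range */
    assert(battery_percent(4200U) == 100U);  /* at the full threshold */
    assert(battery_percent(5000U) == 100U);  /* above the full threshold */
}

/* Step 2: the simplest implementation that makes the tests pass. */
uint8_t battery_percent(uint16_t millivolts)
{
    const uint32_t empty_mv = 3000U;
    const uint32_t full_mv = 4200U;

    if (millivolts <= empty_mv) {
        return 0U;
    }
    if (millivolts >= full_mv) {
        return 100U;
    }
    /* Widen to 32 bits so the multiplication cannot overflow. */
    return (uint8_t)((((uint32_t)millivolts - empty_mv) * 100U) /
                     (full_mv - empty_mv));
}
```

Because the tests encode the boundary values from the requirement, they keep running unchanged as the module is refactored or integrated with others, which is where the early slowdown starts to pay for itself.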

Move fast by starting with a focus on code quality

In summary, if you really want to move fast in development, you first have to improve your code quality through coding standards. Fortunately, there are tools that can help you with this so that you can quickly find and eradicate bugs that result from sloppy or error-prone coding practices. Secondly, you need to design your code to be scalable with loose coupling so that it’s easily changed, maintained, and extended. Finally, you need to be realistic about what automated testing can do for your organization and be aware that you can never fully automate all testing. Doing these things takes more time up front, but it pays dividends in the long run.

[1] https://www.infoq.com/articles/slow-down-go-faster/

[2] https://www.targetprocess.com/articles/speed-in-software-development/

[3] https://cacm.acm.org/magazines/2018/4/226371-lessons-from-building-static-analysis-tools-at-google/fulltext

[4] Steve McConnell, Code Complete, Second Edition (Microsoft Press, 2009), 501.

[5] https://www.afit.edu/stat/statcoe_files/0214simp%202%20AST%20IG%20for%20Managers%20and%20Practitioners.pdf

[6] https://crossbrowsertesting.com/blog/development/software-bug-cost/
