
The 3 types of Static Analysis


TL;DR (with a link to the official Ted Lasso biscuit recipe at the end)

Full disclosure up-front: I am employed as a Code Scanning Architect at GitHub at the time of publishing this post.


It has been two decades since Fortify released what is widely considered one of the first Static Analysis tools focused on security. Much has changed in the world of software development since then, and yet much has remained the same when it comes to statically analyzing software for vulnerabilities.

In the course of the last twenty years we have seen at least fifteen companies offer such broadly-named “Static Analysis” tools on the market, but in reality these tools can be boiled down to just three methods - with the vast majority falling into the first category. If you’re looking to leverage a static analyzer in 2022, your technology options boil down to one of the following:

  1. Static Code Analysis (the OG “SCA”, before supply chain was the new hotness)
  2. Binary Static Analysis (i.e. BSA)
  3. Comprehensive Static Analysis (my term, but for now - CSA)
  4. How they’re different: Local vs. Remote Sources

The real difference between these three methods for analysis comes from how the technologies interpret the source code / application being scanned. Let’s start with the first - and arguably most (re)used - method of scanning: Static Code Analysis (SCA).

Static Code Analysis (sometimes referred to as Semantic Code Analysis) does something I like to think of as “reading the recipe”. While newer SCA tools have supposedly enhanced their solutions by either translating source code to simplified representations before scanning, or introducing “machine learning” and/or “artificial intelligence” to somehow improve their scans - these solutions generally lack an understanding of how a compiler or interpreter optimizes code, and therefore cannot accurately understand how data flows through an application.

You can test this yourself by dropping a vulnerable function into your code that takes input but isn’t reachable, called, or executed - and never passes results to other functions of your application. You might even try introducing local sources, such as values read from a properties file. Is this vulnerability remotely exploitable? No, but you might be surprised by the scanning results.
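Here is a minimal sketch of what such a test case could look like. It is written in Python purely for illustration (the technique is not language-specific), and every name in it (`lookup_user_unsafe`, `load_settings`, `count_rows`, the `table` setting) is hypothetical. The first function concatenates input into SQL but is never called; the second query is built from a developer-controlled setting, standing in for a value read from a properties file.

```python
import sqlite3


def lookup_user_unsafe(conn, username):
    """Classic SQL injection pattern, but nothing in this program calls it."""
    query = "SELECT id FROM users WHERE name = '" + username + "'"
    return conn.execute(query).fetchall()


def load_settings(text):
    """Parse key=value lines, standing in for a properties file (a local source)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def count_rows(conn, settings):
    """Builds SQL from a developer-controlled setting, not from user input."""
    table = settings["table"]
    return conn.execute("SELECT COUNT(*) FROM " + table).fetchone()[0]


def main():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES (?)", ("ted",))
    # Hypothetical properties content; a real test would read it from disk.
    settings = load_settings("# hypothetical app.properties\ntable = users\n")
    return count_rows(conn, settings)


if __name__ == "__main__":
    print(main())
```

Neither query can be influenced by a remote user in this program, so any finding a scanner raises here says more about how the scanner reads the code than about the code's exposure.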

On the opposite end of the static analysis spectrum there is Binary Static Analysis (BSA). The simplest explanation for how BSA operates is that it “eats the biscuits” - no recipe required. During my brief time working at Veracode as a Code Security Engineer, I learned how this technology was developed by DilDog while at L0pht circa 1999, and was later purchased by @Stake in 2002. Apparently Symantec’s acquisition of @Stake led to the technology being shelved for a couple of years before it was spun out by Chris Wysopal, Christien Rioux and others to become the static analysis company known as Veracode in 2006.


How to support this content: If you find this post useful, enjoyable, or influential (and have the coin to spare) - you can support the content I create via Patreon. Thank you to those who already support this blog! 😊 And now, back to that content you were enjoying! 🎉


Anyway, the key differentiator between Static Code Analysis (SCA) and Binary Static Analysis (BSA) is that the former requires source code (and doesn’t care about compilation / interpretation) while the latter wants the executable file (and doesn’t necessarily care about the source code). Unlike SCA, Binary Static Analysis will map real data flows through an application - but lacks complete visibility during this process, as it doesn’t see the build steps. Unfortunately, mapping an executable has to be declared “done” at some point in very large applications, lest the scanning engine run for very long periods of time before assessing vulnerabilities.
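To picture why declaring the mapping “done” matters, here is a toy model of exploration under a budget. This is an assumption-laden sketch, not a description of how any real binary analysis engine decides when to stop: the graph, the node names, and the `max_nodes` cut-off are all hypothetical.

```python
from collections import deque

# Hypothetical control-flow graph recovered from an executable.
GRAPH = {
    "entry": ["a", "b"],
    "a": ["c"],
    "b": ["sink"],
    "c": [],
    "sink": [],
}


def explore_with_budget(graph, entry, max_nodes):
    """Breadth-first walk that stops once max_nodes nodes have been visited.

    Returns (visited, complete); complete is False when the budget ran out
    while unexplored nodes were still queued.
    """
    visited = []
    seen = {entry}
    queue = deque([entry])
    while queue:
        if len(visited) >= max_nodes:
            return visited, False
        node = queue.popleft()
        visited.append(node)
        for successor in graph.get(node, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return visited, True


if __name__ == "__main__":
    print(explore_with_budget(GRAPH, "entry", 3))
    print(explore_with_budget(GRAPH, "entry", 10))
```

With a budget of 3 the walk ends before `sink` is ever mapped, so a flow that passes through `b` into `sink` would go unreported; with a budget of 10 the whole graph is covered. The trade-off is scan time against completeness.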

Finally, we have the newest technology to enter the scene - Comprehensive Static Analysis (a term I’ve coined - i.e. CSA). Unlike Static Code Analysis (“reading the recipe”) and Binary Static Analysis (“eating the biscuits”), Comprehensive Static Analysis is akin to a “baker” following the process start-to-finish. What differentiates CSA from the other static analysis technologies is that it needs access to both the source code and the build process in order to function properly; as for interpreted languages, CSA sort of performs a “compilation” process in order to map data flows and build links between variables, functions, expressions, classes, etc. The important takeaway here is that CSA builds a comprehensive map of the application before scanning, thus understanding how data flows, the difference between remote sources controlled by user inputs and local sources controlled by application developers / the underlying system, and whether sections of code are optimized out of the application and are therefore benign. Of course this requires accurate modeling of languages, frameworks, and libraries to be effective when scanning for security vulnerabilities.
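The distinctions in that last paragraph (remote vs. local sources, live vs. dead code) can be sketched with a toy model. To be clear, this is a hand-written illustration under my own assumptions and not how any real engine represents a program: the call graph, the function names, and the two “findings” functions are all hypothetical, and the pattern-only variant is a deliberate caricature.

```python
from collections import deque

# Hypothetical program model: function name -> functions it calls.
CALLS = {
    "main": ["handle_request", "count_rows"],
    "handle_request": ["run_query"],
    "count_rows": [],
    "run_query": [],
    "lookup_user_unsafe": ["run_query"],  # defined, but nothing calls it
}

# Hypothetical data flows into a SQL sink: (function, source kind).
FLOWS = [
    ("handle_request", "remote"),      # user-controlled input reaches the query
    ("count_rows", "local"),           # value comes from a properties file
    ("lookup_user_unsafe", "remote"),  # tainted, but in unreachable code
]


def reachable_from(entry, calls):
    """Collect every function reachable from the entry point."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        current = queue.popleft()
        for callee in calls.get(current, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


def pattern_only_findings(flows):
    """Caricature of 'reading the recipe': flag every flow into the sink."""
    return [function for function, _kind in flows]


def map_aware_findings(flows, calls, entry="main"):
    """Flag only remote-sourced flows in code reachable from the entry point."""
    live = reachable_from(entry, calls)
    return [f for f, kind in flows if kind == "remote" and f in live]


if __name__ == "__main__":
    print(pattern_only_findings(FLOWS))
    print(map_aware_findings(FLOWS, CALLS))
```

The pattern-only pass reports all three functions, while the map-aware pass reports only `handle_request`: the one place where user-controlled input reaches the sink in code that actually runs.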

Regardless of the technology you choose, it’s safe to say that each method for static analysis was born out of the times in which it was built. In the early 2000s, companies were unwilling to share their source code with external companies - and so static analysis tools had to be hosted in company data centers alongside source code management solutions in order to function. Toward the mid-2000s companies were still reluctant to share their source code, and so companies like Veracode simply asked for executables to scan (albeit with some compilation requirements from their customers before doing so). In the twenty-teens we have witnessed a massive shift toward implementing static analysis directly in source code management platforms, where security works alongside software engineering in the development process. This latest evolution has led to a shift in static analysis methods that more accurately map applications for vulnerabilities.

TL;DR / Summary

There are ultimately just 3 kinds of static analysis technologies - static code analysis, binary static analysis, and comprehensive static analysis (the last being a term I’ve coined, anyway). The first reads the recipe, the second eats the biscuits, and the third acts as a baker reading the recipe, baking the biscuits, and then eating them. They’ll all try to tell you what’s wrong with your recipe, but comprehensive static analysis is the only one that accurately maps data flows in your code when it is built / interpreted. These technologies are largely a product of the time when they were built, and each has its respective benefits and drawbacks - which will become clearer in part 2 of this series. In the meantime, have a biscuit and remember to git commit && stay classy!

Cheers,

Keith // securingdev



This post is licensed under CC BY 4.0 by the author.