
Binary Static Analysis: Accuracy, Speed, or Completeness


TL;DR / Summary at the end of the post.

Full Disclosure up-front: I am employed as a Code Scanning Architect at GitHub at the time I published this article.


As in my previous post on Static Code Analysis, I will reference points made in Part 1 of this series, where I discussed the technological differences between Static Code Analysis (SCA), Binary Static Analysis (BSA), and Comprehensive Static Analysis (CSA). In this post I will dive further into why Binary Static Analysis produces different results from other scanning methods, what implementing BSA looks like, and what that means for your AppSec / DevSecOps program.

The idea behind Binary Static Analysis (BSA) was a good one when the technology was built - and while much has changed in the software development process over the last two decades, there are still a few benefits worth mentioning. First, for companies unwilling to expose their source code to external security tools, BSA offers a way to perform static analysis without handing that source over. Second, because the scan is performed on the compiled executable, it assesses the data flows that actually exist in the final application - as opposed to the shallow data flow mapping performed by Static Code Analysis. That said, things get complicated depending on how large and/or complex your application is.

For large and/or complex applications, Binary Static Analysis needs to declare data flow mapping “done” at some point in the analysis process. This means entire sections of an application might never get mapped and scanned - giving security and development teams a false sense of how safe the application really is. Moreover, large and/or complex applications tend to take a long time to scan - making BSA difficult to implement as part of the Continuous Integration / Continuous Deployment (CI/CD) process.
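To make the “done” cutoff concrete, here is a minimal sketch of a budgeted data flow walk. It is not how any particular BSA product works - the flow graph, the node names, and the max_steps budget are all assumptions I made up for illustration. The point is only that once the budget runs out, whatever is still waiting in the queue never gets mapped.

```python
from collections import deque

# Hypothetical flow graph: each node lists the nodes its data flows into.
FLOW_GRAPH = {
    "http_param": ["parse_request"],
    "parse_request": ["build_query", "log_request"],
    "build_query": ["sql_execute"],
    "log_request": [],
    "sql_execute": [],
}


def map_data_flows(graph, sources, max_steps):
    """Breadth-first walk that stops after expanding max_steps nodes.

    Returns (visited, frontier): the nodes that were mapped, and the nodes
    that were discovered but never expanded because the budget ran out.
    """
    visited = set()
    queue = deque(sources)
    steps = 0
    while queue and steps < max_steps:
        node = queue.popleft()
        if node in visited:
            continue
        visited.add(node)
        steps += 1
        for successor in graph.get(node, []):
            if successor not in visited:
                queue.append(successor)
    # Anything still queued was found but never analyzed.
    frontier = {node for node in queue if node not in visited}
    return visited, frontier


# With a budget of 3, the walk is declared "done" before reaching the SQL sink.
visited, frontier = map_data_flows(FLOW_GRAPH, ["http_param"], max_steps=3)
```

In this toy graph, a budget of 3 leaves sql_execute in the frontier, so the one flow a security team would care about most is never reported. Raising the budget fixes that, at the cost of a longer scan.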


How to support this content: If you find this post useful, enjoyable, or influential (and have the coin to spare) - you can support the content I create via Patreon. Thank you to those who already support this blog! 😊 And now, back to that content you were enjoying! 🎉


On the other hand, when it comes to small / simple applications, Binary Static Analysis can quickly and completely map data flows. This allows for rapid analysis of things like microservices, which is a good use case for the technology. However, BSA suffers from the same flaw as nearly all other forms of static analysis (Comprehensive Static Analysis being the exception): it treats all sources - including local sources such as properties files, environment variables, and operating system arguments - as tainted inputs. That said, you can’t really blame BSA for failing to distinguish local from remote sources - it doesn’t have access to the source code. Either way, since BSA can’t make that distinction, you will uncover a large volume of false positive findings - which in turn leads to increased friction between software development and security teams.
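As a rough illustration of the local vs. remote distinction, here is a small sketch that splits findings by the kind of source feeding the sink. The Finding shape, the source labels, and the rule names are hypothetical - real tools use their own schemas, and a BSA tool may not expose the source kind at all. A local source is also not automatically safe; this only shows how the triage queue shrinks when the distinction is available.

```python
from dataclasses import dataclass

# Hypothetical labels for sources that live on the host, not on the network.
LOCAL_SOURCE_KINDS = {"properties_file", "environment_variable", "os_argument"}


@dataclass(frozen=True)
class Finding:
    rule: str          # illustrative rule name
    source_kind: str   # where the tainted data entered the application
    sink: str          # where the tainted data ended up


def partition_findings(findings):
    """Split findings into (remote, local) based on their source kind."""
    remote, local = [], []
    for finding in findings:
        if finding.source_kind in LOCAL_SOURCE_KINDS:
            local.append(finding)
        else:
            remote.append(finding)
    return remote, local


# Made-up findings: one remote source and two local sources.
FINDINGS = [
    Finding("sql-injection", "http_parameter", "sql_execute"),
    Finding("path-traversal", "environment_variable", "open_file"),
    Finding("command-injection", "properties_file", "exec"),
]

remote, local = partition_findings(FINDINGS)
```

Here two of the three made-up findings originate from local sources. A scanner that treats every source as tainted hands all three to the development team with equal urgency.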

When it comes to implementing BSA as part of your software development process, there have been some improvements over the last several years. For example, BSA tools now export to the Static Analysis Results Interchange Format (SARIF), which allows results to be ingested by other tools and platforms; but unless you are consolidating your application security findings, you are likely going to be stuck with reporting vulnerabilities through PDFs and proprietary web interfaces. These standard reporting methods often mean software engineers and project managers need to navigate away from their usual tools in order to understand how a vulnerability was found in their code. That, in turn, means security teams interrupt the development team’s flow by forcing a context switch just to understand why the software can’t move to production.
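For a sense of what ingesting SARIF can look like, here is a minimal sketch that flattens a SARIF 2.1.0 log into rows you could push into whatever tool your developers already use. The field names (runs, results, ruleId, level, message.text, physicalLocation) come from the SARIF format; the scanner name, rule ID, and file path in the sample are invented, and a real export will carry far more detail than this reads.

```python
import json


def summarize_sarif(sarif_text):
    """Flatten a SARIF log into one dictionary per result."""
    log = json.loads(sarif_text)
    rows = []
    for run in log.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
        for result in run.get("results", []):
            # Only the first reported location is used in this sketch.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            rows.append({
                "tool": tool,
                "rule": result.get("ruleId", "unknown"),
                # Assumption: treat a missing level as "warning".
                "level": result.get("level", "warning"),
                "message": result.get("message", {}).get("text", ""),
                "file": physical.get("artifactLocation", {}).get("uri", ""),
                "line": physical.get("region", {}).get("startLine"),
            })
    return rows


# Invented sample log; the tool name, rule ID, and path are placeholders.
SAMPLE_SARIF = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "example-bsa-scanner"}},
        "results": [{
            "ruleId": "EXAMPLE-001",
            "level": "error",
            "message": {"text": "Tainted data reaches a SQL sink."},
            "locations": [{"physicalLocation": {
                "artifactLocation": {"uri": "src/orders.py"},
                "region": {"startLine": 42},
            }}],
        }],
    }],
})

rows = summarize_sarif(SAMPLE_SARIF)
```

Each row carries enough (rule, file, line, message) to open a ticket or annotate a pull request, which is the kind of consolidation that keeps developers out of PDFs and proprietary web interfaces.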

So how does Binary Static Analysis (BSA) stack up against the competition? As previously stated, that largely depends on the size and/or complexity of the application you’re scanning. In my experience, the performance for BSA breaks down as follows:

  • Accuracy:
    • For Large / Complex Applications: Low [ – ]
    • For Small / Simple Applications: Medium [ = ]
  • Speed:
    • For Large / Complex Applications: Low [ – ]
    • For Small / Simple Applications: High [ + ]
  • Completeness:
    • For Large / Complex Applications: Low [ – ]
    • For Small / Simple Applications: Medium [ = ]
  • Impl. Complexity: Medium [ = ]
  • Developer Friction: High [ – ]

TL;DR / Summary

While Binary Static Analysis was a clever technology two decades ago, it hasn’t kept up with changes to the software development process during that time. Because mapping data flows gets harder as a binary grows in size and complexity, scan speed, accuracy, and completeness will vary with the application. Moreover, due to the nature of how Binary Static Analysis functions, you will find that both local and remote sources are treated as tainted inputs - even if they cannot be reached by user-controlled input. The best use case for this technology is scanning microservices, but the standard methods for getting results back to your developers will disrupt their flow and introduce context switching - unless you’ve paid for a solution to aggregate your findings where the developers work.

And with that, thank you for your time and stay tuned for my next Accuracy / Speed / Completeness review on Comprehensive Static Analysis! In the interim, remember to git commit && stay classy!

Cheers,

Keith // securingdev



This post is licensed under CC BY 4.0 by the author.