Our Static Analysis Metrics and Features
If you've seen our post about the score distributions in OSX, Linux, and Windows 10 base installs, your first question is probably what goes into computing those scores. This post gives a high-level overview of the factors we consider for those static analysis scores.
The main question we're trying to answer is, "How difficult is it for an attacker to find a new exploit for this software?" Attackers have limited resources, and just like anyone else, they don't like to waste their time doing something the hard way if an easier path is available. They have tricks and heuristics they use to assess the relative difficulty of exploiting software, so that they can focus on the easy targets and low-hanging fruit. Mudge has had a long career of doing just that (legally, and for research purposes), so he's developed his own personal toolkit of measurements to take when assessing software risk. By consulting with other luminaries of the security field who have extensive exploit development experience, we've been able to build up the list of static analysis metrics and features which we currently assess.
There are three main categories of static analysis features that we look at for a software binary:
- Complexity
- Application Armoring
- Developer Hygiene
For each of these, I'm going to list some of the main features and why these are important to have when estimating software risk.
Complexity
First up, why does complexity matter? Because more complex code is harder to review and maintain, and is more likely to contain bugs. This is why NASA/JPL put limits on things like function size for code that's going into critical systems.
We look at more complex measurements like cyclomatic complexity, but my favorite features are the simpler ones:
- Code Size
- Number of Conditional Branches
- Size & Number of Stack Adjusts
- Function Size
Measurements like code size, number of branches and stack adjusts have a roughly lognormal distribution. In general, we expect the density of branches to be constant. That is, as code size increases, the counts for branches or stack adjusts should increase correspondingly. If the number of branches is highly disproportionate compared to the code size, giving the code a high branch density, that indicates a high level of complexity (and thus a higher level of risk).
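To make the branch density idea concrete, here is a minimal sketch in Python. It is not our scoring code: the function names, the use of a z-score on the log of the density (chosen because the raw measurements are roughly lognormal), and the threshold of 2.0 are all assumptions made for illustration.

```python
import math

def branch_density(code_size_bytes, branch_count):
    """Conditional branches per byte of code."""
    if code_size_bytes <= 0:
        raise ValueError("code size must be positive")
    return branch_count / code_size_bytes

def flag_outliers(binaries, threshold=2.0):
    """Return names whose log branch density sits more than `threshold`
    standard deviations above the mean of the sample.

    `binaries` maps a name to a (code_size_bytes, branch_count) pair.
    The z-score test and the default threshold are illustrative assumptions.
    """
    # Work on the log scale; entries with no branches are skipped because
    # log(0) is undefined.
    logs = {
        name: math.log(branch_density(size, branches))
        for name, (size, branches) in binaries.items()
        if branches > 0
    }
    if not logs:
        return []
    mean = sum(logs.values()) / len(logs)
    variance = sum((v - mean) ** 2 for v in logs.values()) / len(logs)
    std = math.sqrt(variance)
    if std == 0:
        return []
    return [name for name, v in logs.items() if (v - mean) / std > threshold]

# Hypothetical sample: nine binaries at 0.01 branches per byte, one at 0.2.
sample = {f"bin{k}": (1000 * k, 10 * k) for k in range(1, 10)}
sample["dense"] = (1000, 200)
print(flag_outliers(sample))  # prints ['dense']
```

The nine ordinary binaries grow in size while keeping the same density, so only the binary whose branch count is out of proportion to its size gets flagged.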
Application Armoring
Modern compilers, linkers, and loaders come with lots of nifty safety features, but they won't do you any good if the software doesn't have them enabled. These features are to software what airbags and seatbelts are to cars: things that are known and proven to improve safety, and whose use should be established by now as industry-standard. If your car doesn't have airbags, you're entitled to know that before you buy it.
We've broken these features out by which part of the software lifecycle is responsible for enabling them.
- Compiler
  - Stack Guards
  - Function Fortification
  - Control Flow Integrity (CFI/CPI)
  - ...
- Linker
  - Address Space Layout Randomization (ASLR)
  - Segment and Section ordering
  - ...
- Loader
  - Section and Segment execution characteristics
  - Allocations and access
  - Code signing and/or verification
  - ...
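As a rough illustration of how two of the compiler features can show up in a finished binary, the sketch below looks at a list of imported symbol names. It assumes the GCC/glibc-style naming convention, where stack guards pull in `__stack_chk_fail` and fortified calls appear as `__<name>_chk` (for example `__strcpy_chk`). The function name `armoring_hints` is hypothetical, the symbol list is assumed to have been extracted already by some other tool, and a name-based check like this is only a heuristic, not how our analysis works.

```python
def armoring_hints(imported_symbols):
    """Guess at stack guard and fortification use from imported symbol names.

    Assumes GCC/glibc-style names; leading underscores are stripped before
    comparison because some platforms add an extra one.
    """
    stack_guard = False
    fortified = []
    for symbol in imported_symbols:
        bare = symbol.lstrip("_")
        if bare == "stack_chk_fail":
            # The stack protector's failure handler is referenced.
            stack_guard = True
        elif symbol.startswith("__") and bare.endswith("_chk"):
            # Looks like a fortified wrapper such as __strcpy_chk.
            fortified.append(symbol)
    return {"stack_guard": stack_guard, "fortified": sorted(fortified)}

# Hypothetical import list for demonstration.
print(armoring_hints(["__stack_chk_fail", "__strcpy_chk", "printf"]))
```

The absence of these symbols is weaker evidence than their presence: a binary with no stack buffers or no fortifiable calls would not reference them either.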
Developer Hygiene
There are things we can learn about a developer's security skill and knowledge based on which functions are used in the code they write. Some functions, like strcpy and strcat, are hard to use without introducing vulnerabilities. Less risky functions like strncpy and strncat are better, but still easy to use incorrectly. On the other hand, there are functions like strlcpy and strlcat, which were written with security considerations in mind, and which are difficult to use incorrectly. Finally, there are a few functions, such as system or gets, that should never be used in commercial code, because they simply pose too great a security risk.
We call these four categories of functions "bad", "risky", "good", and "ick". We have about 500 POSIX and ANSI functions that fall into these categories, and by looking at the frequency, count, and consistency of the function categories used, we can learn a lot about the developer practices of a particular software vendor.
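A tiny sketch of the counting step might look like the following. The table holds only the eight functions named above, not the full list of roughly 500, and it assigns the four labels in the order the groups were introduced. The name `hygiene_counts` and the idea of feeding it a flat list of called function names are assumptions for the example.

```python
from collections import Counter

# Only the functions mentioned in this post; the real table is far larger.
CATEGORIES = {
    "strcpy": "bad", "strcat": "bad",
    "strncpy": "risky", "strncat": "risky",
    "strlcpy": "good", "strlcat": "good",
    "system": "ick", "gets": "ick",
}

def hygiene_counts(called_functions):
    """Count how many calls fall into each hygiene category.

    Functions that are not in the table are ignored.
    """
    counts = Counter()
    for name in called_functions:
        category = CATEGORIES.get(name)
        if category is not None:
            counts[category] += 1
    return counts

# Hypothetical call list for demonstration.
calls = ["strcpy", "strcpy", "strlcpy", "gets", "printf"]
print(dict(hygiene_counts(calls)))  # prints {'bad': 2, 'good': 1, 'ick': 1}
```

Raw counts are only a starting point. Frequency relative to code size and consistency across a vendor's binaries need more data than a single call list provides.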