A place to cache linked articles (think custom and personal wayback machine)
您最多选择25个主题 主题必须以字母或数字开头,可以包含连字符 (-),并且长度不得超过35个字符

title: Thou shalt not depend on me: analysing the use of outdated JavaScript libraries on the web url: https://blog.acolyer.org/2017/03/07/thou-shalt-not-depend-on-me-analysing-the-use-of-outdated-javascript-libraries-on-the-web/ hash_url: 666ebda4e2

Thou shalt not depend on me: analysing the use of outdated JavaScript libraries on the web Lauinger et al., NDSS 2017

Just based on the paper title alone, if you had to guess what the situation is with outdated JavaScript libraries on the web, you’d probably guess it was pretty bad. It turns out it’s very bad indeed, and we’ve created a huge mess with nowhere near enough attention being paid to the issue. The first step towards better solutions is recognising that we have a problem, and Lauinger et al., do a tremendous job in that regard.

In this paper, we conduct the first comprehensive study of client-side JavaScript library usage and the resulting security implications across the Web. Using data from over 133K websites, we show that 37% of them include at least one library with a known vulnerability; the time lag behind the newest release of a library is measured in the order of years.

For example, 36.7% of jQuery includes, 40.1% of Angular, and an astonishing 86.6% of Handlebars includes use a vulnerable version. Those are headline grabbing numbers, but when you go deeper it turns out there’s no quick fix in sight because the root causes are systemic in the JavaScript ecosystem:

Perhaps our most sobering finding is practical evidence that the JavaScript library ecosystem is complex, unorganised, and quite “ad hoc” with respect to security. There are no reliable vulnerability databases, no security mailing lists maintained by library vendors, few or no details on security issues in release notes, and often, it is difficult to determine which versions of a library are affected by a specific reported vulnerability.

Let’s briefly look at how the authors collected their data before diving deeper into what the results themselves tell us.

Data gathering methodology

The team crawled the Alexa Top 75K websites (ALEXA) and also a random sample of 75K websites drawn from the .com domain (COM). This enables a comparison of JavaScript usage across popular and unpopular websites.

Figuring out which JavaScript libraries were actually in use, and their versions, took quite a bit of work. As did figuring out which versions contain vulnerabilities. Details of the tools and techniques used for this can be found in the paper, in brief it involved:

  • Manually constructing a catalogue of all releases versions of the 72 most popular open source libraries (using popularity statistics from Bower and Wappalyzer).
  • Using static and dynamic analysis techniques to cope with the fact that developers often reformat, restructure, or append code making it difficult to detect library usage in the wild
  • Implementing an in-browser causality tracker to understand why specific libraries are loaded by a given site.

Vulnerabilities

The last step towards building our catalogue is aggregating vulnerability information for our 72 JavaScript libraries. Unfortunately, there is no centralised database of vulnerabilities in JavaScript libraries; instead, we manually compile vulnerability information from the Open Source Vulnerability Database (OSVDB), the National Vulnerability Database (NVD), public bug trackers, GitHub comments, blog posts, and the vulnerabilities detected by Retire.js. Overall, we are able to obtain systematically documented details of vulnerabilities for 11 of the JavaScript libraries in our catalogue.

Causality trees

To figure out why a certain library is being loaded, the authors develop a causality tree chrome extension. Nodes in the tree are snapshots of elements in the DOM at a specific point in time, and edges denote “created by” relationships.

For example:

and

The median causality tree in ALEXA contains 133 nodes, and the median depth is 4 inclusions.

JavaScript library market share

jQuery remains by far the most popular library, found on 84.5% of ALEXA sites. Note also SWFObject (Adobe Flash) still used on 10.7% of ALEXA sites despite being discontinued in 2013.

When externally loaded, scripts are mostly loaded from CDNs (note also the domain parking sites popping up in the long tail of the COM sites):

Overall though, there seems to be a pretty even split between internally hosted and CDN-delivered script libraries:

Distribution of vulnerable libraries

37.8% of ALEXA sites use at least one library version known to the authors to be vulnerable.

Highly-ranked websites tend to be less likely to include vulnerable libraries, but they are also less likely to include any detected library at all. Towards the lower ranks, both curves increase at a similar pace until they stabilise. While only 21 % of the Top 100 websites use a known vulnerable library, this percentage increases to 32.2 % in the Top 1 k before it stabilises in the Top 5 k and remains around the overall average of 37.8 % for all 75 k websites.

37.4% of the COM sites use at least one vulnerable library. Within the ALEXA grouping, financial and government sites are the worst, with 52% and 50% of sites containing vulnerable libraries respectively.

The following table shows the percentage of vulnerable copies in the wild for jQuery, jQ-UI, Angular, Handlebars, and YUI 3.

(Click for larger view)

In ALEXA, 36.7% of jQuery inclusions are known vulnerable, when at most one inclusion of a specific library version is counted per site. Angular has 40.1% vulnerable inclusions, Handlebars has 86.6%, and YUI 3 has 87.3% (it is not maintained any more). These numbers illustrate that inclusions of known vulnerable versions can make up even a majority of all inclusions of a library.

Many libraries it turns out are not directly included by the site, but are pulled in by other libraries that are. “Library inclusions by ad, widget, or tracker code appear to be more vulnerable than unrelated inclusions.

Another interesting analysis is the age of the included libraries – the data clearly shows that the majority of web sites use library versions released a long time ago, suggesting that developers rarely update their library dependencies once they have deployed a site. 61.7% of ALEXA sites are at least one patch version behind on one of their included libraries, and the median ALEXA site uses a version released 1,177 days before the newest release of the library. Literally years out of date.

Duplicate inclusions

If you like a little non-determinism in your web app (I find it always make debugging much more exciting 😉 ), then another interesting find is that many sites include the same libraries (and multiple versions thereof) many times over!

We discuss some examples using jQuery as a case study. About 20.7 % of the websites including jQuery in ALEXA (17.2 % in COM) do so two or more times. While it may be necessary to include a library multiple times within different documents from different origins, 4.2 % of websites using jQuery in ALEXA include the same version of the library two or more times into the same document (5.1 % in COM), and 10.9 % (5.7 %) include two or more different versions of jQuery into the same document. Since jQuery registers itself as a window-global variable, unless special steps are taken only the last loaded and executed instance can be used by client code. For asynchronously included instances, it may even be difficult to predict which version will prevail in the end.

What can be done?

So where does all this leave us?

From a remediation perspective, the picture painted by our data is bleak. We observe that only very small fraction of potentially vulnerable sites (2.8 % in ALEXA, 1.6 % in COM) could become free of vulnerabilities by applying patch-level updates, i.e., an update of the least significant version component, such as from 1.2.3 to 1.2.4, which would generally be expected to be backwards compatible. The vast majority of sites would need to install at least one library with a more recent major or minor version, which might necessitate additional code changes due to incompatibilities.

Version aliasing could potentially help (specifying only a library prefix, and allowing the CDN to return the latest version), but only a tiny percentage of sites use it (would you trust the developers of those libraries not to break your site, completely outside of your control?). Note that:

Google recently discontinued this service, citing caching issues and “lack of compatibility between even minor versions.”

We need proper dependency management which makes it clear which versions of libraries are being used, coupled with knowledge within the supply chain of vulnerabilities. “This functionality would ideally be integrated into the dependency management system of the platform so that a warning can be shown each time a developer includes a known vulnerable component from the central repository.

Of course, that can only work if we have some way of figuring out which libraries are vulnerable in the first place. The state of the practice here is pretty damning :

Unfortunately, security does not appear to be a priority in the JavaScript library ecosystem. Popular vulnerability databases contain nearly no entries regarding JavaScript libraries. During this entire work, we did not encounter a single popular library that had a dedicated mailing list for security announcements (in fact, most libraries we investigated did not have a mailing list for announcements at all). Furthermore, only a few JavaScript library developers provide a dedicated email address where users can submit vulnerability reports….

Consider jQuery, one of the most widely used libraries:

Although jQuery is an immensely popular library, the fact that searching for “security” or “vulnerability” in the official learning centre returns “Apologies, but nothing matched your search criteria” is an excellent summary of the state of JavaScript library security on the Internet, circa August 2016

Since we also know that many libraries are only indirectly loaded by web sites, and are brought in through third-party components such as advertising, tracking, and social media code, even web developers trying to stay on top of the situation may be unaware that they are indirectly introducing vulnerable code into their websites.