Regex & Minified Files

I came across an interesting bug today that could show up in any environment where source content is minified. The core of the issue was a regular expression (aka regex) that behaved correctly on content containing whitespace and newlines, but not on the same content after it had been minified (whitespace and newlines removed, names shortened, etc.).

Scenario

I was testing my site (this site) for Microsoft Edge compatibility using the static code analysis tool available on modern.ie, and began working on the first issue it reported: the jQuery version it found was outdated and not compatible. OK, easy fix, right? However, there was something odd about the details in the report (check the version numbers):

weird version numbers

Well, that’s weird. jQuery isn’t up to version 3 yet; in fact, the most recent version is 2.1.4. What’s going on here?

Open Source

Luckily for me, the tester is open source and available on GitHub! So I cloned away and got to debugging the issue. I traced it down to the following regular expression in lib\checks\check-libs.js:284:

var regex = /(?:jquery[,\)].*=")(\d+\.\d+)(\..*?)"/gi;

regex101.com

regex result

regex101.com is an awesome tool that will help you understand not only what the regex is doing but how it’s doing it. As you can see, the above regex is incorrectly extracting jQuery and a version number from this bootstrap.min.js file (which has been minified). However, it works as expected (as in, it does not match) on the non-minified version, bootstrap.js.

What went wrong?

The regex starts off by looking for either jquery, or jquery) (that’s what jquery[,\)] does, case-insensitively thanks to the i flag). Then it skips ahead with .* until it finds =", and from there it tries to match and capture the version number. The problem? It’s that .*, which by definition “matches any character (except newline)”.

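Except newline, eh? Well, that could be a problem when there are no newlines because the file has been minified! With everything on one line, .* is free to run from a jQuery reference all the way to an unrelated version string much later in the file.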
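To make the failure concrete, here is a minimal sketch that runs the same pattern against a small plugin-style snippet in both forms. The snippet, the Plugin name, and the extractVersion helper are made up for illustration (this is not the real Bootstrap source or the tester's code), and the "minifier" is just a regex that strips newlines and indentation.

```javascript
// The pattern quoted above, wrapped in a helper (the helper name is hypothetical).
function extractVersion(source) {
  // A regex literal yields a fresh object on every call, so the g flag's
  // lastIndex state never carries over between calls.
  const regex = /(?:jquery[,\)].*=")(\d+\.\d+)(\..*?)"/gi;
  const match = regex.exec(source);
  return match ? match[1] + match[2] : null;
}

// Made-up plugin source: a jQuery reference, then an unrelated
// version string on its own line.
const pretty = [
  '+function ($) {',
  '  "use strict";',
  '}(jQuery);',
  'Plugin.VERSION="3.3.4";'
].join('\n');

// Crude stand-in for a minifier: strip newlines and indentation only.
const minified = pretty.replace(/\n\s*/g, '');

console.log(extractVersion(pretty));   // no match: .* cannot cross the line break
console.log(extractVersion(minified)); // matches the plugin's own version string
```

The only difference between the two inputs is the line breaks, which is enough to turn "no jQuery found" into "jQuery 3.3.4 found".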

Beware the .*

.* is a powerful and often overused expression in regex and can cause these sorts of headaches. You are essentially saying “match anything to infinity”, how can that possibly go wrong right? :)

Limit your range with a quantifier (other than *)

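Instead of using .*, you can use a bounded quantifier like .{0,100}, which says “match any character (except newline) between 0 and 100 times”. The regex can then no longer wander arbitrarily far from the jquery reference before it finds a version string.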
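As a rough sketch of that idea, the pattern below swaps .* for .{0,100}. This is my own variation for illustration, not necessarily the fix the tester adopted, and the limit of 100 is arbitrary. Both input strings and the extractBounded helper are made up.

```javascript
// Hypothetical variation on the pattern above: .* swapped for .{0,100}.
// The limit of 100 is an arbitrary choice for illustration.
const bounded = /(?:jquery[,\)].{0,100}=")(\d+\.\d+)(\..*?)"/i;

function extractBounded(source) {
  const match = bounded.exec(source);
  return match ? match[1] + match[2] : null;
}

// Made-up inputs: a version string right next to the jquery reference...
const near = 'e.fn.jquery,e.version="1.11.3";';
// ...and one more than 100 characters away on the same (minified) line.
const far = '}(jQuery);' + 'x'.repeat(200) + ';Plugin.VERSION="3.3.4";';

console.log(extractBounded(near)); // "1.11.3"
console.log(extractBounded(far));  // null
```

A bound like this narrows the window rather than eliminating false positives: an unrelated version string that happens to sit within 100 characters of a jquery reference would still match.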

More on quantifiers in regular expressions here.

Obligatory XKCD

Perl Problems
(link to comic)

@marc_gagne
