Performance: Compiled vs. Interpreted Regular Expressions

August 6, 2009 at 10:33 PMBen

When a regular expression in .NET will be used multiple times, it’s common to create that Regex with the Compiled flag, e.g. RegexOptions.Compiled.  Compiled regexp’s take a bit more time to create initially, but will run faster than a regexp created without the Compiled flag.  At least that’s what the documentation states!

Without the Compiled flag, your regexp will be interpreted.  There’s even "precompiled” regular expressions.  You need to compile these regular expressions into an assembly before runtime.  This might be a good option if you have constant regexps that don’t change.  If your regexps are subject to change, pre-compiled is not a good option.  These three types of regexp’s (interpreted, compiled and pre-compiled) are explained with a few more technical details in this somewhat dated MS blog article.

Theory is great, but real benchmarks are more meaningful.  I’ve assembled some code that benchmarks the difference in speed it takes to create and run 5,000 regular expressions.  There’s actually a big difference in the time taken to run a compiled regular expression the first time, versus subsequent times.  So the results shown here will include the first run time as well as the subsequent run times.

Here’s some code to get us started:

    private static List<Regex> _expressions;
    private static object _SyncRoot = new object();

    private static List<Regex> GetExpressions()
    {
        if (_expressions != null)
            return _expressions;

        lock (_SyncRoot)
        {
            if (_expressions == null)
            {
                DateTime startTime = DateTime.Now;

                List<Regex> tempExpressions = new List<Regex>();
                string regExPattern =
                    @"^[a-zA-Z0-9]+[a-zA-Z0-9._%-]*@{0}$";

                for (int i = 0; i < 5000; i++)
                {
                    tempExpressions.Add(new Regex(
                        string.Format(regExPattern,
                        Regex.Escape("domain" + i.ToString() + "." +
                        (i % 3 == 0 ? ".com" : ".net"))),
                        RegexOptions.IgnoreCase | RegexOptions.Compiled));
                }

                _expressions = new List<Regex>(tempExpressions);
                DateTime endTime = DateTime.Now;
                double msTaken = endTime.Subtract(startTime).TotalMilliseconds;
            }
        }

        return _expressions;
    }

We’re storing 5,000 regular expressions in a static list.  Notice the RegexOptions.Compiled flag is being used.  The regexp’s are just looking for email addresses with specific domain names – domain1.net, domain2.net, domain3.com, etc.  Not very useful, but I just wanted the regexp’s to vary.  You can see we’re also recording the number of milliseconds taken to create the regular expressions.  Now here’s the code that calls GetExpressions() and actually invokes the IsMatch function on each regexp.

    private static void CheckForMatches(string text)
    {
        List<Regex> expressions = GetExpressions();
        DateTime startTime = DateTime.Now;

        foreach (Regex e in expressions)
        {
            bool isMatch = e.IsMatch(text);
        }

        DateTime endTime = DateTime.Now;
        double msTaken = endTime.Subtract(startTime).TotalMilliseconds;
    }

And we call CheckForMatches like so:

    CheckForMatches("some random text with email address, address@domain200.com");

How much time does it take to create and run these 5,000 compiled expressions?  Here’s what I get:

Compiled Regular Expressions
CREATION TIME: 1662 ms
FIRST RUN TIME: 25137 ms
SUBSEQUENT RUN TIMES: 41 ms

Subsequent runs of all 5,000 expressions is very fast.  However, look how much time it takes the first time these 5,000 expressions are run in CheckForMatches() – 25 seconds!!!

Let’s make ONE change.  Remove the RegexOptions.Compiled flag.  By doing this, our regular expressions will be interpreted.  Here’s what we get:

Interpreted Regular Expressions
CREATION TIME:
493 ms
FIRST RUN TIME: 22 ms
SUBSEQUENT RUN TIMES: 20 ms

Interpreted regexp’s beat compiled in every category!  Running these tests several times produces similar results.  The BIG difference here is obviously the First Run Time.  25 seconds versus .022 seconds.

I’ve seen some benchmarks showing static regexp’s performing a little slower than instance regexp’s.  I ran the same tests without the static modifier on the fields and methods above.  Same results – using the Compiled flag takes around 25 seconds for the regular expressions to run the first time.  Without the Compiled flag, they run in hundredths of a second.

Clearly, interpreted regexps are the winner.  Granted, if you’re only dealing with a small number of regular expressions, and you use the compiled flag, the first run time isn’t going to be anywhere near what I’ve shown here with 5,000 regexps.  However, even with just a few regular expressions, in .NET, you’ll see me sticking with interpreted regular expressions!

Comments are closed