New Multiple Blog Feature - Site Aggregation

April 15, 2012 at 12:10 AMBen

A new feature named Site Aggregation is available in the BlogEngine.NET developer builds, and will of course be available in the next public release.  Site Aggregation is a feature related to Multiple Blog instances and this has been asked about by several people.

What is Site Aggregation?

You can designate up to 1 blog instance as being the Site Aggregation instance.  Typically it would be the primary blog instance.  When a blog instance becomes a Site Aggregation instance, the data displayed on the homepage, RSS feed and widgets contain blog posts from all the blog instances (see below for a complete list).  So the data is an aggregate of the blog posts across all instances.

This is useful for site visitors who will be able to visit the main blog, and see blog posts across all instances.  And if the RSS feed is subscribed to, that too aggregates blog posts across all instances.

What's Included

The main homepage, Posts by Tag page, and Posts by Date Range pages display the site aggregate data.  All the main widgets including the Tag Cloud, Recent Comments, Month List, Recent Posts, Calendar also take into account the site aggregate posts.  The Search results page, and RSS feed too.

For each post listed, clicking on the post title will take you to that post within that post's blog instance.  Each post usually also lists Tags and Categories tied to that post.  Clicking on a tag or category link will take you to the tag/category listing page for that post's blog instance (i.e. you will leave the site aggregate blog).

What's Not Included

Category based pages/widgets do not currently aggregate posts across all sites.  In particular, this includes the Category List widget and the Archive page.  These items still can be used on the site aggregate blog instance, but they will only display Posts for the site aggregate blog instance -- not for any of the non-site aggregate blogs.  The code needed to make these Category pages work with the site aggregated data is partially implemented in the Category core class (in particular, Category.AllBlogCategories and Category.ApplicableCategories), but some more work is needed for this.  It may not be until the following release where these category widgets/pages are site-aggregate capable.

The other item which is not included are Pages.  I think in most cases, it is preferable that Pages are not aggregated across all sites.

Configuring the Site Aggregate Blog Instance

It's very easy to designate a blog instance as being the site aggregate blog instance.  On the Blogs management page, there is a new "Is For Site Aggregation" property/checkbox you can check -- as shown below.

 

A Look at the Site Aggregate Blog Instance

In my test system, I have a primary blog instance, which is also the site aggregate blog instance, and 2 sub-blogs.  One sub-blog uses a virtual path, and the other sub-blog uses a different hostname.  Each blog instance has a post in it.  Here's what the homepage of the site aggregate blog instance looks like:

 

Similarly, here's a view of the RSS feed:

 

Site Aggregation Site - Includes "Local" Posts

The Site Aggregation site includes posts from other blog instances, as well as posts from its own instance (local posts).  This means you can create posts in the site aggregation instance as well, which is probably a handy feature for some.

Relative & Absolute Links

One issue that is accounted for is if you are using different hostnames for your blog instances.  For example, if your site aggregation instance is running at www.example.com and you have a sub-blog running at sub1.example.com, a lot of the links in BlogEngine.NET use relative paths, via RelativeLink and RelativeWebRoot.  A typical example is a hyperlink that you would find in PostView.ascx.

<a href="<%=Post.RelativeLink %>" class="taggedlink"><%=Server.HtmlEncode(Post.Title) %></a>

This URL in the HREF would end up being an invalid URL when the site aggregation site is displaying a post for sub1.example.com.  It would need to be an absolute link to "escape" out of the current hostname and get over to the other hostname (sub1.example.com).  The simple solution is to convert all these RelativeLink to AbsoluteLink.  A new property has been created off of IPublishable named RelativeOrAbsoluteLink.  This will return a relative link if it works, and if not, it will return an absolute link.  While an absolute link in theory always works, there are some advantages to using relative paths whenever possible.  So when using this new RelativeOrAbsoluteLink property, if the sub-blog's hostname matches the hostname of the site aggregation blog instance, you'll get a relative link, otherwise an absolute link.  Because the minority of BlogEngine.NET users won't be this situation where they are (a) using multiple blogs, (b) using the Site Aggregation feature, and (c) using different hostnames, RelativeOrAbsoluteLink will basically preserve backwards compatibility for everyone else not in this situation.

On this note, if you are using a custom theme, you may need to change RelativeLink in the PostView.ascx page to RelativeOrAbsoluteLink -- if you are in this situation.

A similar new property has been created in the Utils class, named RelativeOrAbsoluteWebRoot.  Just like RelativeOrAbsoluteLink, RelativeOrAbsoluteWebRoot will return the RelativeWebRoot if it works, otherwise it will return AbsoluteWebRoot.

AllBlogPosts and ApplicableBlogPosts

These are 2 new properties in the Post class, which are used to support this site aggregation feature.  AllBlogPosts is a list of posts across all blog instances.  It's very similar to Post.Posts.  Post.Posts is just for the current blog instance, where Post.AllBlogPosts is posts across all instances.  The only real challenge here is the blog posts for a particular blog instance are not loaded into memory until the first time that blog instance is hit.  If the site aggregation blog is hit before some of the sub-blogs have been hit, the data (Posts) for those sub-blogs will not yet be loaded in memory.  The AllBlogPosts property takes care of this by loading the posts of these sub-blogs into memory that have not yet had their first hit.

ApplicableBlogPosts is a shortcut property that either returns AllBlogPosts or Posts depending on which one is needed.

The Category class has similar properties that have been added to it -- AllBlogCategories and ApplicableBlogCategories.  These properties will be useful when the Archive page and Category based widgets such as the Category widget become site aggregate data capable (as mentioned above, these items are not yet site aggregate capable).

Blog and BlogId added to BusinessBase

The last notable change for this feature is the BusinessBase class now has Blog and BlogId properties.  This means objects such as Post, Category, BlogRollItem  (and anything else inheriting from BusinessBase) are now aware of which blog instance they belong to.  This has a lot of potential uses.  The main purpose for this at this point is properties such as RelativeLink (as in Post.RelativeLink) now take into account its blog instance's virtual path and hostname.  So when working with Post.AllBlogPosts which contains blog posts across all the blog instances, calling Post.RelativeLink or Post.AbsoluteLink on any of the posts in this AllBlogPosts collection will return the correct path.  As noted above, it's not safe to use Post.RelativeLink especially when working with Post.AllBlogPosts data.  Instead, Post.RelativeOrAbsoluteLink or just Post.AbsoluteLink are the safer choices.

Give it a Try

If you think this new Site Aggregation feature may be of use to you, I encourage you to try it out, either now or once the beta for the next version of BlogEngine.NET comes out.  The best place to bring up any issues or suggestions regarding this feature is in the CodePlex discussions group.

BlogEngine.NET Theme Contest

October 19, 2011 at 5:17 AMBen

If you haven't heard yet about the BE.NET theme contest, it started a couple of weeks ago, and continues on for about 1 more month -- until November 15th.

The 1st BlogEngine.NET Theme Contest!
http://www.dotnetblogengine.net/post/The-1st-BlogEngineNET-Theme-Contest!.aspx

If you know your way a bit around HTML/CSS and BE.NET themes, submit a theme, and you'll be contributing to the growing collection of BE.NET themes, and a chance to win one of the prizes.

So far there hasn't been too many submissions, so winning a prize is easier odds than a lot of other contests out there.

Posted in: General

Tags: , ,

Introducing Multiple Blogs in Single Instance for BlogEngine.NET

June 19, 2011 at 1:00 PMBen

The much requested and highly anticipated "multiple blogs" feature has made it into BlogEngine.NET 2.5.  BE.NET 2.5 will be released at the end of June.  This post goes over how it works and some information on making themes, widgets, extensions and your code be multiple-blog compatible.

Blogs Management Page

Each installation of BE.NET will have a "primary" blog instance.  By default this will be the normal blog you are accustomed to.  The primary blog has a new "Blogs" admin page.  That page looks like:

Here the blog instances (the primary blog instance and child blogs) can be managed.  A blog instance can be made Inactive.  An inactive blog instance will appear as if it does not exist.  Other properties that can be configured for each blog instance can be seen in the following "Add New Blog" dialog window:

Let's look at the properties for each blog instance.

Name

Any name you'd like to give the blog instance.  The name does not appear anywhere outside this Blogs management page.

Storage Container Name

This is the name of the folder where the blog data will live.  Basically a copy of the files and folders for each blog instance will be made -- i.e. each blog instance will have its own separate folder of data.  This makes the blog instances portable and keeps the data separated.  Databases can also be used with multiple blogs.  Even when using a database, there is still some data (blog post images/files, and a few other small files) that is stored in the file system.  So a separate folder will be created for each blog instance even when using the DbBlogProvider.

As in the past, the primary blog instance data is stored in App_Data.  There's now a new Blogs folder under App_Data.  Within App_Data\Blogs, a new folder matching the "Storage Container Name" will be created when a new blog instance is created.  The data that is initially put into this child blog storage folder is explained next.

Existing Blog to Create New Blog from

This is partially related to the above Storage Container Name.  When a new blog instance is created, the "Existing Blog to Create New Blog From" dropdown list will contain all of the existing blog instances.  The files & folders from the existing blog instance you choose here will be copied into the storage folder for the new blog instance as "initial data".  BE.NET 2.5 also includes a "Template" blog instance (which is inactive), and can be used as the existing blog instance to create new blog instances from.  The Template blog instance has the same data and settings as the primary blog.  You can optionally change the Template blog data, or even create multiple template blog instances to choose from when creating new blog instances.

If using a database, in addition to the storage container folder being copied for the new blog instance, the DB data from the existing blog instance will be copied in the DB for the new blog instance.  So for both XML and DB, the new blog instance created will start off using the exact data, files, etc that the blog instance you copied from has.

Note, there is no Template blog included with the database.  You can still create your own template blog for the DB by creating a new blog instance of your Primary blog, naming it "Template" (or any name) and customizing it to your liking.

Host Name & Virtual Path

These two properties along with the "Accept Any Text before Host Name" checkbox are what is used for BE.NET to determine which blog instance a request to the application is being made for.  Virtual Path refers to the path following the root.  Host name is the domain name (plus any subdomain).  You can mix and match Hostname and Virtual Path to create many URL combinations.  For example, the following URLs can be used:

http://www.example.com
http://www.example.com/blog1
http://blog1.example.com
http://blog2.domain.com
http://blog.domain.com/blog2
... etc ...

The 2nd and 5th examples above would be given a Virtual Path of ~/blog1 and ~/blog2 respectively.  Prefixing the virtual path with ~/ is required, and the ~/ will automatically be inserted for you in the Virtual Path field.  In the 3rd example above, the Host Name would be "blog1.example.com", in the 4th example the host name would be "blog2.domain.com" and in the 5th example, the host name would be "blog.domain.com".

If you are only using a single hostname for your blog, you can omit the hostname.  Entering it is really only required if you will be using multiple hostnames.

With the Virtual Paths above of blog1 and blog2 (examples 2 and 5), you do not need to create physical directories named blog1 and blog2.  These are virtual directories that BE.NET will look for in the URL to treat that request to the web server as being on behalf of those blog instances.  At the same time, you will want to make sure that physical directories matching the virtual paths (blog1, blog2, etc) do not already exist.

The "Accept Any Text before Host Name" checkbox is another option to allow any text (subdomains) to appear before the Host Name you enter.  If you leave this box unchecked, then the Host Name you enter will need to match exactly what is in the browser address bar.

Multiple-Blog Capable Themes, Widgets and Extensions

If you will be running a single instance blog, all of the themes, extensions and widgets that run under BE.NET 2.0 will still work perfectly fine for BE.NET 2.5.

If you will be running multiple blog instances, existing themes, extensions and widgets may need some adjustments to work with this new multiple blogs feature.  The following types of issues need to be addressed.

Caching - HttpRuntime.Cache

If an existing theme, widget or extension is storing data in the HttpRuntime.Cache, a likely needed change is to store that data separately for each blog instance.  With multiple blog instances, the same set of (physical) themes, extensions and widgets are being used across all the blog instances.  So if you have code in a widget that looks like:

HttpRuntime.Cache["recent-posts"] = recentPostList;

This same data you are storing will be shared across all blog instances that are using this widget.  This is probably not what was intended.  You can either change that code so it uses a dynamic cache key based on the "current blog instance", or as a convenient shortcut, you can use a new Caching provider built into BE.NET 2.5.  That line of code above using the new caching provider would look like:

Blog.CurrentInstance.Cache["recent-posts"] = recentPostList;

The cache provider will use your "recent-posts" cache key, prefixing it with the ID of the current blog instance (a GUID).  You can retrieve the current blog instance and/or the ID of the current blog instance via:

Blog currentInstance = Blog.CurrentInstance;
Guid currentInstanceId = Blog.CurrentInstance.Id;

Caching - Static

If your theme, widget or extension is storing data in static fields or properties, the cache provider described above cannot be used, however one approach is to convert your static data into a generic dictionary, with Guid as the key and your data type as the value.  The Guid key will represent the current blog instance ID.  You need to manage this dictionary by adding in each blog instance in the dictionary, ideally as the data needs to be stored the first time.  That code might look like:

private static Dictionary<Guid, string> _recentPostsMarkup =
    new Dictionary<Guid, string>();

private static string RecentPostsMarkup
{
	get
	{
		if (!_recentPostsMarkup.ContainsKey(Blog.CurrentInstance.Id))
		{
			_recentPostsMarkup[Blog.CurrentInstance.Id] = GetRecentPostsMarkup();
		}

		return _recentPostsMarkup[Blog.CurrentInstance.Id];
	}
}

URL Paths

This is important and a problem you may notice, especially if you are using Virtual Paths for your blog instances.  Let's look at a simple line of code in BE.NET 2.0 that worked fine there:

Response.Redirect("~/Account/login.aspx");

Simple enough.  However, if you're using Virtual Paths and you have a blog instance with a Virtual Path of "blog1", the homepage for that blog instance will be something like:

http://www.example.com/blog1

If you have code that uses that redirect shown above, you will end up at:

http://www.example.com/Account/login.aspx

When you probably wanted to end up at:

http://www.example.com/blog1/Account/login.aspx

So instead of redirecting to the login page for "blog1", you've incorrectly redirected the person to the login page for the primary blog instance.  That redirect should now look like the following to work well with multiple-blogs:

Response.Redirect(Utils.RelativeWebRoot + "Account/login.aspx");

In the past, Utils.RelativeWebRoot would resolve to the root of the blog.  Starting with BE.NET 2.5, Utils.RelativeWebRoot will resolve to the homepage/root of the current blog instance.  So when you are in "blog1", Utils.RelativeWebRoot will resolve to:

/blog1/

Similarly, Utils.AbsoluteWebRoot also takes the relative path of the current blog instance into consideration.

If you need the "real" web root of the application, you could use ~/ or you can use a new Utils.ApplicationRelativeWebRoot property.

While on the topic, a couple of other similar new properties are Blog.CurrentInstance.StorageLocation which will return a storage location beginning with ~/ to the physical location of the storage container for the current blog instance.  So Blog.CurrentInstance.StorageLocation may return a value looking like:

~/App_Data/Blogs/blog1/

BlogConfig.StorageLocation will return the same type of path, but to the direct App_Data folder (which equates to the storage location of the primary blog instance), so BlogConfig.StorageLocation will generally return:

~/App_Data/

Extensions

With extensions, the primary blog instance has the "master" switch over whether extensions are enabled.  If the primary blog instance disables an extension, that extension is no longer available to any of the child blogs.  If the primary blog instance enables an extension, by default this extension will be enabled for all the child blogs.  A child blog can disable an extension.

As long as the primary blog instance has an extension enabled, that extension will be loaded into memory and if it wires up any of the normal event handlers (Post.Serving, Commnet.Serving, etc), that extension is going to fire even if the event is occurring in a child blog that has explicilty disabled the extension.  In this scenario, the extension is handling an event for the child blog, but it should not do anything since the child blog disabled the extension.  In other words, we want the extension to "pretend" it was never called, or at least not take any action.  This is where a change is required in extensions so they check to see if the blog instance the extension is firing for has the extension enabled or disabled.  If it is disabled, then the extension should not run any of it's normal code.  The extensions included with BE.NET 2.5 have a new line of code added to them for this.  Taking the ResolveLinks extensions as an example, it has this new line of code (technically 2 lines of code) added to the beginning of PostCommentServing:

private static void PostCommentServing(object sender, ServingEventArgs e)
{
	if (!ExtensionManager.ExtensionEnabled("ResolveLinks"))
		return;

	if (string.IsNullOrEmpty(e.Body))
		return;

	e.Body = LinkRegex.Replace(e.Body, new MatchEvaluator(Evaluator));
}

The new lines of code above are lines 3 and 4.  It makes a call into the Extension Manager, passing to it the name of the extension (in this case, ResolveLinks).  ExtensionEnabled() will return false if the extension is disabled for the current blog instance.

In Summary

As mentioned earlier, if you will be running a single instance of BE.NET, these changes described above will not need to be made -- although they can be made.  Overall, I think we maintained a lot of backwards compatibility as the changes you will need to make to have a multiple-blog capable BE.NET application are very minimal.  The themes, widgets and extensions included with BE.NET 2.5 already have these changes  made to them.  Enjoy the new multiple blog capability and if you run into any issues or have any suggestions, the best place to pass along your thoughts is in the CodePlex discussion group.

BlogEngine.NET 2.0 RC - Available now!

November 24, 2010 at 12:36 AMBen

On Tuesday, a release candidate for BlogEngine.NET 2.0 was published.  Many nice features were added, with lots of great contributions from the community.  Switching to Mercurial for source code management made it very easy for people to create their own forks and for us to merge their code in.

Here's the official RC announcement that contains some additional details, including a short 5 minute video demo'ing the redesigned control panel.

If you're already using BlogEngine, I think you'll really like the improvements.  These upgrade instructions will come in handy.  If you're not using BlogEngine, what are you waiting for?!  Stop copying your friends, throw away WordPress and get BlogEngine! :)

I expect we will get some good feedback on issues people run into while using the RC.  During my own testing, it's actually working very good even now.  Depending on how well the RC goes, the final RTW version may be out in December.

If you download it and find any issues, the best place to report those are in the Issue Tracker.

Give it a try and have fun with it!

Download BlogEngine.NET 2.0 Release Candidate

Posted in: General

Tags: ,

BlogEngine 1.6 – Released!

February 1, 2010 at 7:50 PMBen

Today BlogEngine.NET 1.6 is available to everyone as announced here.

The two big features are a new Comment Management area to view, edit and delete comments across all posts.  This also includes a new automated comment moderation system.  Just like how you can build custom Extensions, custom filters can be built and plugged into the event system when a new comment is posted.  BE.NET 1.6 ships with two custom filters – a filter for Akismet filter and one for StopForumSpam.com.  It’s quite slick.

The other notable feature is Multiple Widget Zones.  This has actually been in the code base for several months now.  I blogged about Multiple Widget Zones back in April.

If you’re upgrading from BE.NET 1.5, there are a couple of changes to be aware of, including one change you must make related to the ExtensionManager sub-folder in the App_Code folder.  Simple upgrade instructions are available here in the documentation.

A more complete list of features and changes in BE.NET 1.6 can be found here.  It’s definitely a worthwhile upgrade I recommend going to.

Posted in: General

Tags: ,

CSS & JavaScript Injection Bookmarklets

January 30, 2010 at 11:34 PMBen

I like using bookmarklets to inject scripts into a page I’m on.  One of the popular ones for this is the jQueryify bookmarklet to inject jQuery into a page.

After getting the jQuerify bookmarklet, I created a separate bookmarklet to inject my own JavaScript file into a document.  This is handy when I’m debugging a website I either don’t have the files for, or just want to test changes out within the confines of my own browser without touching any of the live files.  To do this, I simply Edit the bookmark in my browser, and change the URL embedded in the bookmarklet to point to the URL of the JavaScript file I want to inject into the page.  The URL to the script can even be a script on your own computer accessible via http://localhost, if you’re too lazy to upload the script to a public website :)

Recently, I needed to inject a CSS stylesheet into the page I’m on.  The bookmarklet for this is very similar to the JavaScript injection bookmarklet.  The bookmarklet I created for this is at the bottom of this post.

Dynamic URL Bookmarklets

Even though it’s not a big hassle to Edit the CSS/JS injection bookmarks to change the URL to the JS or CSS file embedded within the bookmarklet, I realized a very convenient bookmarklet would be one that would prompt me for the URL to the CSS/JS file, and then inject that URL.  By doing this, I don’t need to edit the bookmark, and can easily inject any CSS or JS file.  It’s also easy to inject multiple URLs by running the bookmark more than once, and entering a different URL each time.

Bookmarklets – For your Browser

For convenience’s sake, I have these 4 bookmarklets down below.  Feel free to use them.  Two of the bookmarklets are for JS injections and two are for CSS injections.  Within each pair, one bookmarklet has the URL already embedded within it, and the other one will prompt you for the URL.

Just drag these links into your Bookmarks toolbar or menu area.  For reference, the bookmarklet code is under each link.

> Inject JS file <
javascript:(function(){var%20s=document.createElement('script');s.setAttribute('src','http://ajax.googleapis.com/ajax/libs/jquery/1.4.1/jquery.min.js');document.getElementsByTagName('body')[0].appendChild(s);alert('Script%20injected!');})();

> Inject JS file (get prompted) <
javascript:(function(){var%20sUrl=prompt('Enter%20URL%20to%20JavaScript%20file');if(sUrl){var%20s=document.createElement('script');s.setAttribute('src',sUrl);document.getElementsByTagName('body')[0].appendChild(s);alert('Script%20injected!');}})();

> Inject CSS file <
javascript:(function(){var%20s=document.createElement('link');s.setAttribute('href','http://l.yimg.com/a/lib/arc/core_1.0.5.css');s.setAttribute('rel','stylesheet');s.setAttribute('type','text/css');document.getElementsByTagName('head')[0].appendChild(s);alert('Stylesheet%20injected!');})();

> Inject CSS file (get prompted) <
javascript:(function(){var%20sUrl=prompt('Enter%20URL%20to%20Stylesheet');if(sUrl){var%20s=document.createElement('link');s.setAttribute('href',sUrl);s.setAttribute('rel','stylesheet');s.setAttribute('type','text/css');document.getElementsByTagName('head')[0].appendChild(s);alert('Stylesheet%20injected!');}})();

Posted in: Development

Tags: , ,

Giving Comment Spammers Less Incentive to Spam You

December 8, 2009 at 10:01 PMBen

The latest check-in of BE.NET (1.5.1.36) has a small, but important change.  The three themes included with BE.NET now include rel=”nofollow” on the links of commenter’s websites.

This is a theme specific change.  So if you’re using a custom theme, and even if you upgrade to the latest build of BE.NET, there’s a good chance you might not have the NOFOLLOW instruction on these links.  It can simply be added in the CommentView.ascx file in your theme’s blog folder.

As I’m using a custom theme myself, I just added NOFOLLOW to this blog too.  Wikipedia has a good writeup on NOFOLLOW, in case you aren’t familiar with its purpose.  I’m a little surprised it’s taken this long to get NOFOLLOW into the themes that are included with BE.NET.  Better late than never!

I get a lot of comment spam on this blog.  As I’m moderating comments, it ends up never showing up since I don’t approve any of it (TIP to spammers, stop wasting your time!).

Comment spammers are a lot more likely to leave comments on blogs that do not include NOFOLLOW.  Yes, I’m sure a lot of the spammers actually look at these types of details when scoping out blogs to attack.

Incidentally, the ResolveLinks extension that comes with BE.NET already includes the NOFOLLOW instructions.  This is the extension that will convert URLs in comments into hyperlinks.  If the extension finds a URL like www.google.com in the comment content, it will convert that into:

<a href="http://www.google.com" rel="nofollow">www.google.com</a>

This conversion is done as the comment is being served.

I’m anxious to see what type of impact adding NOFOLLOW will have on my level of comment spam.
Fingers crossed ...

App_offline.htm – Page Not Found

September 5, 2009 at 6:11 PMBen

For a few years, .NET has had the built-in capability to easily take your entire application offline when you need to make an update or perform some maintenance on your site.

By simply putting a file named app_offline.htm in the root directory of your site, ASP.NET will serve the app_offline.htm file, instead of the requested page.

I recently employed this feature for probably the first time.  I put the app_offline.htm file on the site, and pulled up my site in Firefox.  The contents of app_offline.htm displayed as expected.

However, if I were to pull my site in Chrome or IE, I would get a Page Not Found error that appeared as though my entire site did not exist.

App_offline.htm result in IE8:

App_offline.htm in IE

App_offline.htm result in Chrome: 

App_offline in Chrome

As mentioned above, in Firefox, the contents of App_offline.htm would display as expected.

The problem is that when ASP.NET serves the App_offline.htm file, the HTTP Response code it passes out is 404.  Chrome will display the page shown above for 404 errors.  In IE, you can actually avoid that generic error page shown above if you turn off HTTP Friendly errors.

But I obviously cannot expect IE visitors to my site to have HTTP Friendly errors turned off.

The way ASP.NET has implemented app_offline.htm by passing out a 404 HTTP status code is not well designed, in my opinion.  A much better implementation would be for ASP.NET to return a normal 200 HTTP status code.

To accomplish this, for this site, I created a simple HTTP Module that processed the beginning of each request.  It checks an “offline” appSetting in web.config to see if the application should be offline.  If the offline setting is turned on, the module will do a server transfer to my own app offline HTML file.

One thing I found on an IIS7 server is requests for items such as JPG, GIF, CSS files, etc. will also go through this HTTP module.  This is normally a great benefit of IIS7’s integrated mode pipeline.  However, if the application offline HTML file includes an IMG tag for an image on the same site, or a link to a CSS file on the same site, the HTTP module is also going to do a server transfer for these other files (JPG, CSS, etc).  This will result in the image not displaying on the application offline page, or the CSS file not loading in the browser, etc.

A simple filter in the HTTP module to only do a server transfer for actual pages is all that is required.  The fairly simple HTTP Module I ended up creating is below.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Configuration;
using System.IO;

public class AppOffline : IHttpModule
{
    public void Dispose()
    {
    }

    public void Init(HttpApplication context)
    {
        context.BeginRequest += new EventHandler(context_BeginRequest);
    }

    void context_BeginRequest(object sender, EventArgs e)
    {
        HttpContext context = ((HttpApplication)sender).Context;

        if (ConfigurationManager.AppSettings["offline"] == "true")
        {
            string extension = Path.GetExtension(context.Request.Path);

            // Don't server transfer for extensions like .JPG, .CSS, etc.
            string targetedExtensions = ".aspx.ashx.asmx";
            if (targetedExtensions.IndexOf(extension, StringComparison.OrdinalIgnoreCase) == -1)
                return;
            
            context.Server.Transfer("~/application_offline.html");
        }
    }
}

It’s a pretty simple, but effective HTTP module.  When the server transfer is done to my own application offline HTML file, the HTTP status code returned to the client is 200.  No more Page Not Found problems with browsers like IE and Chrome.

Posted in: Development

Tags: ,

Performance: Compiled vs. Interpreted Regular Expressions

August 6, 2009 at 10:33 PMBen

When a regular expression in .NET will be used multiple times, it’s common to create that Regex with the Compiled flag, e.g. RegexOptions.Compiled.  Compiled regexp’s take a bit more time to create initially, but will run faster than a regexp created without the Compiled flag.  At least that’s what the documentation states!

Without the Compiled flag, your regexp will be interpreted.  There’s even "precompiled” regular expressions.  You need to compile these regular expressions into an assembly before runtime.  This might be a good option if you have constant regexps that don’t change.  If your regexps are subject to change, pre-compiled is not a good option.  These three types of regexp’s (interpreted, compiled and pre-compiled) are explained with a few more technical details in this somewhat dated MS blog article.

Theory is great, but real benchmarks are more meaningful.  I’ve assembled some code that benchmarks the difference in speed it takes to create and run 5,000 regular expressions.  There’s actually a big difference in the time taken to run a compiled regular expression the first time, versus subsequent times.  So the results shown here will include the first run time as well as the subsequent run times.

Here’s some code to get us started:

    private static List<Regex> _expressions;
    private static object _SyncRoot = new object();

    private static List<Regex> GetExpressions()
    {
        if (_expressions != null)
            return _expressions;

        lock (_SyncRoot)
        {
            if (_expressions == null)
            {
                DateTime startTime = DateTime.Now;

                List<Regex> tempExpressions = new List<Regex>();
                string regExPattern =
                    @"^[a-zA-Z0-9]+[a-zA-Z0-9._%-]*@{0}$";

                for (int i = 0; i < 5000; i++)
                {
                    tempExpressions.Add(new Regex(
                        string.Format(regExPattern,
                        Regex.Escape("domain" + i.ToString() + "." +
                        (i % 3 == 0 ? ".com" : ".net"))),
                        RegexOptions.IgnoreCase | RegexOptions.Compiled));
                }

                _expressions = new List<Regex>(tempExpressions);
                DateTime endTime = DateTime.Now;
                double msTaken = endTime.Subtract(startTime).TotalMilliseconds;
            }
        }

        return _expressions;
    }

We’re storing 5,000 regular expressions in a static list.  Notice the RegexOptions.Compiled flag is being used.  The regexp’s are just looking for email addresses with specific domain names – domain1.net, domain2.net, domain3.com, etc.  Not very useful, but I just wanted the regexp’s to vary.  You can see we’re also recording the number of milliseconds taken to create the regular expressions.  Now here’s the code that calls GetExpressions() and actually invokes the IsMatch function on each regexp.

    private static void CheckForMatches(string text)
    {
        List<Regex> expressions = GetExpressions();
        DateTime startTime = DateTime.Now;

        foreach (Regex e in expressions)
        {
            bool isMatch = e.IsMatch(text);
        }

        DateTime endTime = DateTime.Now;
        double msTaken = endTime.Subtract(startTime).TotalMilliseconds;
    }

And we call CheckForMatches like so:

    CheckForMatches("some random text with email address, [email protected]");

How much time does it take to create and run these 5,000 compiled expressions?  Here’s what I get:

Compiled Regular Expressions
CREATION TIME: 1662 ms
FIRST RUN TIME: 25137 ms
SUBSEQUENT RUN TIMES: 41 ms

Subsequent runs of all 5,000 expressions is very fast.  However, look how much time it takes the first time these 5,000 expressions are run in CheckForMatches() – 25 seconds!!!

Let’s make ONE change.  Remove the RegexOptions.Compiled flag.  By doing this, our regular expressions will be interpreted.  Here’s what we get:

Interpreted Regular Expressions
CREATION TIME:
493 ms
FIRST RUN TIME: 22 ms
SUBSEQUENT RUN TIMES: 20 ms

Interpreted regexp’s beat compiled in every category!  Running these tests several times produces similar results.  The BIG difference here is obviously the First Run Time.  25 seconds versus .022 seconds.

I’ve seen some benchmarks showing static regexp’s performing a little slower than instance regexp’s.  I ran the same tests without the static modifier on the fields and methods above.  Same results – using the Compiled flag takes around 25 seconds for the regular expressions to run the first time.  Without the Compiled flag, they run in hundredths of a second.

Clearly, interpreted regexps are the winner.  Granted, if you’re only dealing with a small number of regular expressions, and you use the compiled flag, the first run time isn’t going to be anywhere near what I’ve shown here with 5,000 regexps.  However, even with just a few regular expressions, in .NET, you’ll see me sticking with interpreted regular expressions!

Show the MAX length Requirement Too!

July 12, 2009 at 2:08 AMBen

When creating an account online and you need to enter your desired username and password, it's common for there to be a note regarding the minimum number of characters required for a username/password.  But the maximum character limit is often omitted in this note.  Particularly for passwords, when allowed, I try to make 36 character passwords.  My passwords are just random characters of letters, numbers and special characters.  Every time I spend 30 seconds creating one of these passwords, I always tell myself I need to get one of those random password generators -- but end up never getting one!

Back to my point -- all sites should be making use of the "maxlength" attribute on a text input element.  Not just for usernames/passwords, but for every piece of data that is accepted through a text input.  At the very very least, indicating the maximum length in a note next to the input field would be appreciated.

On numerous sites I've registered at (including large sites), there is no note, and there is no maxlength on the input field.  I spend my 30 seconds typing in a 36 character password to find out when I click the Submit button that the password is too long.  In a couple of cases, the site doesn't even tell you the password is too long.  They just report "Invalid Password".  I'll try shortening it in chunks until they take it.

This may just be a small user experience point, but adding a maxlength attribute takes no time at all and provides immediate knowledge you've reached the max character limit.