Keepin’ me regular…

rexeg = leet!

Well, I finally got it nailed… I think. I have been working on a [[wp:wiki|Wiki]] for our own CMS, partly as an effort for client “R”. The core of this system is the ability to translate specially formatted tags into links as the data is pulled from the database. I use a variation of this format here on this weblog.

For instance, the link to the term “Wiki” above was actually formatted like this:

[ [wp:wiki|Wiki] ]  <- take note that the ]] aren’t next to each other 🙂

That means I want to pass the term ‘wiki’ in the namespace ‘wp’, and the linked word will actually be ‘Wiki’. The ‘wp’ (Wikipedia) namespace triggers a driver routine that knows how to handle that type of link and, whammo, a link to [[wp:Wikipedia]].
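
To make the namespace idea concrete, here is a minimal sketch of how a driver lookup might work. Everything in it is an assumption for illustration: the LinkDrivers class, the BuildLink method, and the Wikipedia URL pattern are hypothetical stand-ins, and the real system talks to a database instead of a hard-coded dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Net;

static class LinkDrivers
{
    // Hypothetical registry: namespace -> routine that turns a term into a URL.
    static readonly Dictionary<string, Func<string, string>> Drivers =
        new Dictionary<string, Func<string, string>>(StringComparer.OrdinalIgnoreCase)
        {
            // Assumed URL pattern for the 'wp' (Wikipedia) namespace.
            { "wp", term => "https://en.wikipedia.org/wiki/" + Uri.EscapeDataString(term) },
        };

    public static string BuildLink(string ns, string term, string alias)
    {
        Func<string, string> driver;
        if (!Drivers.TryGetValue(ns, out driver))
            return WebUtility.HtmlEncode(alias); // unknown namespace: plain text

        return "<a href=\"" + driver(term) + "\">" + WebUtility.HtmlEncode(alias) + "</a>";
    }

    static void Main()
    {
        // prints: <a href="https://en.wikipedia.org/wiki/wiki">Wiki</a>
        Console.WriteLine(BuildLink("wp", "wiki", "Wiki"));
    }
}
```

Adding a new kind of link then only means registering one more entry in the dictionary; the parsing side never has to change.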

Anyway, I have obviously wound up working on the right way to handle this type of thing, and in my mind it was clearly a job for [[wp:regular_expression|regular expressions]]. I went ahead and did some research and found out that .NET has some really cool and powerful regex classes. For my purposes, in addition to handling the parsing, I can insert my own code to handle the matches, and thus my issue of interfacing to a database is solved.

I spent a lot of time on and off over the last months learning how regexes work, using them in projects and so on. For all that time, an invaluable addition to my life has been Expresso, an interactive regex workbench that lets me see what’s happening every step of the way. This thing rocks.

If you’re interested, here is the expression I am using. If you can show me a better way, please [[let me know]]:

\[\[((?<Namespace>.+):)*(?<Term>.+?)(\|(?<Alias>.+))*\]\]

And it correctly parses these test cases. Mind the extra space between the [ [ to avoid tripping my parser here; take that out if you’re gonna try these at home:

[ [_name_ _name_:_term_ _term_|_alias_ _alias_] ]
[ [_term_ _term_|_alias_ _alias_] ]
[ [_name_ _name_:_term_ _term_] ]
[ [_term_ _term_] ]
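
If you want to try those at home, a small console sketch like the one below would do it. It assumes the literal brackets and the pipe in the expression are backslash-escaped (they have to be in order to match literally), and the TestCases class name is just something made up for the demo.

```csharp
using System;
using System.Text.RegularExpressions;

class TestCases
{
    static void Main()
    {
        var regex = new Regex(
            @"\[\[((?<Namespace>.+):)*(?<Term>.+?)(\|(?<Alias>.+))*\]\]",
            RegexOptions.Multiline | RegexOptions.ExplicitCapture);

        // The four test cases, with the extra space between the brackets removed.
        string[] inputs =
        {
            "[[_name_ _name_:_term_ _term_|_alias_ _alias_]]",
            "[[_term_ _term_|_alias_ _alias_]]",
            "[[_name_ _name_:_term_ _term_]]",
            "[[_term_ _term_]]",
        };

        foreach (string input in inputs)
        {
            Match m = regex.Match(input);
            // A group that did not take part in the match has an empty Value.
            Console.WriteLine("ns='{0}' term='{1}' alias='{2}'",
                m.Groups["Namespace"].Value,
                m.Groups["Term"].Value,
                m.Groups["Alias"].Value);
        }
        // Output:
        // ns='_name_ _name_' term='_term_ _term_' alias='_alias_ _alias_'
        // ns='' term='_term_ _term_' alias='_alias_ _alias_'
        // ns='_name_ _name_' term='_term_ _term_' alias=''
        // ns='' term='_term_ _term_' alias=''
    }
}
```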

The only annoyance is I wind up with two extra capture groupings that aren’t named. I am sure there is a way to avoid that in the regex itself, but for now I just turn on the “Explicit Capture” option and all is good. The regex runs, I get a call to my MatchEvaluator for each hit, and the named groups give me access to what was found. A quick trip to the database to handle some stuff and I can pass back a nicely formatted <A> link that is just what I want.
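
Here is a rough sketch of that MatchEvaluator flow, assuming the escaped form of the expression. The WikiLinkDemo class, the ToAnchor method, and the “/go/…” URL scheme are hypothetical; the URL building merely stands in for the database trip.

```csharp
using System;
using System.Text.RegularExpressions;

class WikiLinkDemo
{
    static readonly Regex LinkRegex = new Regex(
        @"\[\[((?<Namespace>.+):)*(?<Term>.+?)(\|(?<Alias>.+))*\]\]",
        RegexOptions.Multiline | RegexOptions.ExplicitCapture);

    // Called once per match; the return value replaces the matched tag.
    static string ToAnchor(Match m)
    {
        string ns = m.Groups["Namespace"].Value;
        string term = m.Groups["Term"].Value;

        // No alias means the term doubles as the link text.
        string text = m.Groups["Alias"].Success ? m.Groups["Alias"].Value : term;

        // Hypothetical stand-in for the database lookup.
        string href = "/go/" + (ns.Length > 0 ? ns + "/" : "") + Uri.EscapeDataString(term);

        return "<a href=\"" + href + "\">" + text + "</a>";
    }

    static void Main()
    {
        string input = "A [[wp:wiki|Wiki]] for our CMS.";

        // prints: A <a href="/go/wp/wiki">Wiki</a> for our CMS.
        Console.WriteLine(LinkRegex.Replace(input, new MatchEvaluator(ToAnchor)));
    }
}
```

Two things worth knowing. The unnamed groups can also be written as non-capturing groups, (?:…), which avoids the extra captures without the Explicit Capture option. And because the .+ pieces are greedy, two tags on the same line get swallowed as one match; swapping them for negated character classes such as [^\]|:]+ is one possible way to tighten that up.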

A snippet of the actual C# code for this deal is below:

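using System.Text.RegularExpressions;

// The literal brackets and the pipe must be backslash-escaped.
// ExplicitCapture keeps only the named groups (Namespace, Term, Alias).
Regex regex = new Regex(
    @"\[\[((?<Namespace>.+):)*(?<Term>.+?)(\|(?<Alias>.+))*\]\]",
    RegexOptions.Multiline | RegexOptions.ExplicitCapture
);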

Features I want to add? So far, the only one I am looking for is a way to escape the [[ pairs. But to be honest I think \[[ would do it … I just want it “officially” in the regex.
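
One possible way to get that into the regex itself is a negative lookbehind in front of the opening brackets. This is only a sketch, and it assumes a backslash is the escape character; the EscapeDemo class name is made up. Stripping the backslash from the output would still need a separate step.

```csharp
using System;
using System.Text.RegularExpressions;

class EscapeDemo
{
    // (?<!\\) is a negative lookbehind: a match may not start right after a backslash.
    static readonly Regex LinkRegex = new Regex(
        @"(?<!\\)\[\[((?<Namespace>.+):)*(?<Term>.+?)(\|(?<Alias>.+))*\]\]",
        RegexOptions.Multiline | RegexOptions.ExplicitCapture);

    static void Main()
    {
        Console.WriteLine(LinkRegex.IsMatch(@"see [[wp:wiki|Wiki]]"));   // True
        Console.WriteLine(LinkRegex.IsMatch(@"type \[[wp:wiki|Wiki]]")); // False
    }
}
```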