<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>John Sheehan : Blog &#187; Code Review</title>
	<atom:link href="http://john-sheehan.com/blog/index.php/category/code-review/feed/" rel="self" type="application/rss+xml" />
	<link>http://john-sheehan.com/blog</link>
	<description></description>
	<lastBuildDate>Mon, 30 Aug 2010 22:13:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
<atom:link rel="hub" href="http://pubsubhubbub.appspot.com"/><atom:link rel="hub" href="http://superfeedr.com/hubbub"/>		<item>
		<title>Code Review: A simple markup processor</title>
		<link>http://john-sheehan.com/blog/code-review-a-simple-markup-processor/</link>
		<comments>http://john-sheehan.com/blog/code-review-a-simple-markup-processor/#comments</comments>
		<pubDate>Mon, 29 Dec 2008 06:43:54 +0000</pubDate>
		<dc:creator>John</dc:creator>
				<category><![CDATA[.NET]]></category>
		<category><![CDATA[C#]]></category>
		<category><![CDATA[Code Review]]></category>

		<guid isPermaLink="false">http://john-sheehan.com/blog/?p=103</guid>
		<description><![CDATA[The purpose of this post is to find a better way to write the code shown below. The examples below are not meant to be best practices or examples I would expect to be used in any system. I came up with this and am posting it to get a code review from anyone who [...]]]></description>
			<content:encoded><![CDATA[<p>The purpose of this post is to find a better way to write the code shown below. The examples below are not meant to be best practices or examples I would expect to be used in any system. I came up with this and am posting it to get a code review from anyone who stumbles upon it. I would love to find a better way to accomplish this, so if you have ideas or suggestions, please post them in the comments.</p>
<p>I have a side project I’m working on that requires users to be able to enter comments. I wanted to support only a very limited set of formatting, so things like TinyMCE, FCKEditor, Markdown and the like are out. I have some basic requirements:</p>
<ol>
<li>Convert *italics* to &lt;em&gt;italics&lt;/em&gt; and likewise _bold_ to &lt;strong&gt;bold&lt;/strong&gt;, but not in code blocks.</li>
<li>For code blocks, convert lines starting with at least four spaces to be wrapped in &lt;pre&gt;&lt;/pre&gt; tags. Consecutive lines should be grouped together. </li>
<li>Encode any HTML. </li>
<li>Convert line breaks to HTML line breaks outside of code blocks. </li>
</ol>
<p>I’ve created a <a target="_blank" href="http://john-sheehan.com/blog/wp-content/uploads/sampleparserinput.txt">text file you can look at</a> that has some sample input.</p>
<p>For the first requirement, I’ve created a method to take a string input, a delimiter and an HTML tag to replace the delimiters with. I’ve also created a list of special characters used in regular expressions so I can assemble a proper regex if the delimiter is one of those characters.</p>
<div style="padding-bottom: 5px; padding-left: 5px; padding-right: 5px; font-family: consolas; background: black; color: white; font-size: 9pt; font-weight: bold; padding-top: 5px">
<p style="margin: 0px"><span style="color: #cc7832">private</span> <span style="color: #cc7832">static</span> <span style="color: #ffc66d">List</span><span style="font-weight: normal">&lt;</span><span style="color: #cc7832">string</span><span style="font-weight: normal">&gt; SpecialRegexChars =</span></p>
<p style="margin: 0px">&#160;&#160;&#160; <span style="color: #cc7832">new</span> <span style="color: #ffc66d">List</span><span style="font-weight: normal">&lt;</span><span style="color: #cc7832">string</span><span style="font-weight: normal">&gt; { </span><span style="color: #a5c25c">&quot;$&quot;</span>, <span style="color: #a5c25c">&quot;^&quot;</span>, <span style="color: #a5c25c">&quot;{&quot;</span>, <span style="color: #a5c25c">&quot;[&quot;</span>, <span style="color: #a5c25c">&quot;(&quot;</span>, <span style="color: #a5c25c">&quot;|&quot;</span>, <span style="color: #a5c25c">&quot;)&quot;</span>, <span style="color: #a5c25c">&quot;]&quot;</span>, <span style="color: #a5c25c">&quot;}&quot;</span>, <span style="color: #a5c25c">&quot;*&quot;</span>, <span style="color: #a5c25c">&quot;+&quot;</span>, <span style="color: #a5c25c">&quot;?&quot;</span>, <span style="color: #a31515; font-weight: normal">@&quot;\&quot;</span> };</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px"><span style="color: #cc7832">private</span> <span style="color: #cc7832">static</span> <span style="color: #cc7832">string</span> ReplaceWithHtml(<span style="color: #cc7832">string</span> input, <span style="color: #cc7832">string</span> delimiter, <span style="color: #cc7832">string</span> tag)</p>
<p style="margin: 0px">{</p>
<p style="margin: 0px">&#160;&#160;&#160; <span style="color: #cc7832">if</span> (SpecialRegexChars.Contains(delimiter))</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; delimiter = <span style="color: #a31515; font-weight: normal">@&quot;\&quot;</span> + delimiter;</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160; <span style="color: #cc7832">string</span> regex = delimiter + <span style="color: #a5c25c">&quot;(.+)&quot;</span> + delimiter;</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160; <span style="color: #cc7832">string</span> output = input;</p>
<p style="margin: 0px">&#160;&#160;&#160; <span style="color: #ffc66d">Regex</span> r = <span style="color: #cc7832">new</span> <span style="color: #ffc66d">Regex</span>(regex);</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160; <span style="color: #cc7832">foreach</span> (<span style="color: #ffc66d">Match</span> match <span style="color: #cc7832">in</span> r.Matches(input))</p>
<p style="margin: 0px">&#160;&#160;&#160; {</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">if</span> (match.Groups.Count &gt; <span style="color: #6897bb">1</span>)</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; {</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; output = output.Replace(match.Groups[<span style="color: #6897bb">0</span>].Value, <span style="color: #cc7832">string</span><span style="font-weight: normal">.Format(</span><span style="color: #a5c25c">&quot;&lt;{1}&gt;{0}&lt;/{1}&gt;&quot;</span>, match.Groups[<span style="color: #6897bb">1</span>].Value, tag));</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; }</p>
<p style="margin: 0px">&#160;&#160;&#160; }</p>
<p style="margin: 0px">&#160;&#160;&#160; <span style="color: #cc7832">return</span> output;</p>
<p style="margin: 0px">}</p>
</div>
<p>Originally I ran this method using the entire sample input and it worked great, except that if you had text surrounded by * or _ in a code block, it was replaced with a tag. So I set this aside and moved on to detecting code blocks so I could know where to avoid performing the replacement.</p>
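<p>As an aside on ReplaceWithHtml above, here is a minimal sketch of one possible variation, not a definitive fix. It assumes two things: that Regex.Escape can stand in for the hand-maintained SpecialRegexChars list, and that a lazy (.+?) group is wanted so that two delimited spans on the same line become two tags rather than one long one. The names MarkupSketch and ReplaceWithHtmlSketch are made up for illustration.</p>
<div style="padding-bottom: 5px; padding-left: 5px; padding-right: 5px; font-family: consolas; background: black; color: white; font-size: 9pt; font-weight: bold; padding-top: 5px">
<p style="margin: 0px">using System.Text.RegularExpressions;</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">public static class MarkupSketch</p>
<p style="margin: 0px">{</p>
<p style="margin: 0px">&#160;&#160;&#160; // Hypothetical variant of ReplaceWithHtml; assumes input is already HTML-encoded.</p>
<p style="margin: 0px">&#160;&#160;&#160; public static string ReplaceWithHtmlSketch(string input, string delimiter, string tag)</p>
<p style="margin: 0px">&#160;&#160;&#160; {</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; // Regex.Escape backslash-escapes any regex metacharacters in the delimiter.</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; string d = Regex.Escape(delimiter);</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; // Lazy quantifier: the group stops at the nearest closing delimiter.</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; string pattern = d + &quot;(.+?)&quot; + d;</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; // $1 is the text captured between the two delimiters.</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; string replacement = &quot;&lt;&quot; + tag + &quot;&gt;$1&lt;/&quot; + tag + &quot;&gt;&quot;;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; return Regex.Replace(input, pattern, replacement);</p>
<p style="margin: 0px">&#160;&#160;&#160; }</p>
<p style="margin: 0px">}</p>
</div>
<p>With this sketch, ReplaceWithHtmlSketch(&quot;*a* and *b*&quot;, &quot;*&quot;, &quot;em&quot;) returns &lt;em&gt;a&lt;/em&gt; and &lt;em&gt;b&lt;/em&gt;, whereas a greedy (.+) group would capture everything between the first and last asterisk on the line.</p>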
<p>A code block for this scenario is any single line or group of consecutive lines of text that start with four or more spaces. Each block needs a &lt;pre&gt; tag prepended and a closing &lt;/pre&gt; tag appended. I considered using regex to handle this, but I couldn’t (and honestly didn’t want to) get my head around it. If there is a simple, easily-readable regex (there’s an oxymoron) that accomplishes what I’m about to demonstrate, please share it in the comments.</p>
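<p>For readers wondering what such a regex might look like, the following is one candidate, offered as a sketch under assumptions rather than a recommendation. It only wraps runs of four-space-indented lines in &lt;pre&gt; tags; it does no HTML encoding, emphasis handling or line-break conversion. Unlike a version that trims each line first, it would also treat a line made up of nothing but four or more spaces as code. The names CodeBlockSketch and WrapCodeBlocks are hypothetical, and whether the pattern counts as easily readable is debatable.</p>
<div style="padding-bottom: 5px; padding-left: 5px; padding-right: 5px; font-family: consolas; background: black; color: white; font-size: 9pt; font-weight: bold; padding-top: 5px">
<p style="margin: 0px">using System.Text.RegularExpressions;</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">public static class CodeBlockSketch</p>
<p style="margin: 0px">{</p>
<p style="margin: 0px">&#160;&#160;&#160; // One line starting with four spaces, followed by any number of</p>
<p style="margin: 0px">&#160;&#160;&#160; // directly adjacent lines that also start with four spaces.</p>
<p style="margin: 0px">&#160;&#160;&#160; // RegexOptions.Multiline lets ^ match at the start of every line.</p>
<p style="margin: 0px">&#160;&#160;&#160; private static readonly Regex CodeBlock = new Regex(</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; @&quot;^ {4}[^\r\n]*(?:\r?\n {4}[^\r\n]*)*&quot;,</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; RegexOptions.Multiline);</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160; public static string WrapCodeBlocks(string input)</p>
<p style="margin: 0px">&#160;&#160;&#160; {</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; // Each match is one whole run of code lines; wrap it as a unit.</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; return CodeBlock.Replace(input, m =&gt; &quot;&lt;pre&gt;\n&quot; + m.Value + &quot;\n&lt;/pre&gt;&quot;);</p>
<p style="margin: 0px">&#160;&#160;&#160; }</p>
<p style="margin: 0px">}</p>
</div>
<p>The pattern avoids the dot on purpose: in .NET the dot matches a carriage return, so [^\r\n] keeps a trailing \r out of the matched line when the input uses Windows line endings.</p>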
<p>Short of an elegant regex, the next option I decided to try was splitting the input into an array of lines, looping through them, and determining for each line whether it is part of a code block by looking at the lines before and after it. The vast majority of input will be relatively short, so while a loop isn’t perfect, it actually works pretty well in this case. It also allows me to process the tag replacement on a per-line basis depending on whether or not that line is code.</p>
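<p>Stripped down to just the grouping step, the line-by-line idea can be sketched as follows. This is an illustration under assumptions, not the Codify method itself: it swaps the look at neighbouring lines for a single flag that remembers whether the previous line was code, and it leaves out HTML encoding, emphasis replacement and line breaks entirely. The names GroupingSketch and GroupCodeLines are hypothetical.</p>
<div style="padding-bottom: 5px; padding-left: 5px; padding-right: 5px; font-family: consolas; background: black; color: white; font-size: 9pt; font-weight: bold; padding-top: 5px">
<p style="margin: 0px">using System.Collections.Generic;</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">public static class GroupingSketch</p>
<p style="margin: 0px">{</p>
<p style="margin: 0px">&#160;&#160;&#160; // Returns the input lines with &lt;pre&gt; and &lt;/pre&gt; lines added around each run of code lines.</p>
<p style="margin: 0px">&#160;&#160;&#160; public static List&lt;string&gt; GroupCodeLines(string[] lines)</p>
<p style="margin: 0px">&#160;&#160;&#160; {</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; var output = new List&lt;string&gt;();</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; bool inCode = false; // was the previous line part of a code block?</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; foreach (string raw in lines)</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; {</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; string line = raw.TrimEnd();</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; bool isCode = line.StartsWith(&quot;&#160;&#160;&#160; &quot;);</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; if (isCode &amp;&amp; !inCode) output.Add(&quot;&lt;pre&gt;&quot;); // a run of code lines starts here</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; if (!isCode &amp;&amp; inCode) output.Add(&quot;&lt;/pre&gt;&quot;); // the previous run just ended</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; output.Add(line);</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; inCode = isCode;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; }</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; if (inCode) output.Add(&quot;&lt;/pre&gt;&quot;); // the input ended inside a run</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; return output;</p>
<p style="margin: 0px">&#160;&#160;&#160; }</p>
<p style="margin: 0px">}</p>
</div>
<p>Given the four lines a, a four-space-indented b, a four-space-indented c, and d, the sketch returns six lines: a, &lt;pre&gt;, the two indented lines unchanged, &lt;/pre&gt;, and d.</p>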
<p>Not too long after starting on my Codify method, I had a morass of complicated and nearly unreadable ‘if’ statements. But it worked! I was getting exactly the desired output. Still, I knew I could never come back and read the code later if the requirements changed. I worked at cleaning it up as much as possible, focusing on improving readability, and this is the result:</p>
<div style="padding-bottom: 5px; padding-left: 5px; padding-right: 5px; font-family: consolas; background: black; color: white; font-size: 9pt; font-weight: bold; padding-top: 5px">
<p style="margin: 0px"><span style="color: #cc7832">private</span> <span style="color: #cc7832">static</span> <span style="color: #cc7832">string</span> Codify(<span style="color: #cc7832">string</span> input)</p>
<p style="margin: 0px">{</p>
<p style="margin: 0px">&#160;&#160;&#160; <span style="color: #cc7832">bool</span> isInCodeBlock = <span style="color: #cc7832">false</span>;</p>
<p style="margin: 0px">&#160;&#160;&#160; <span style="color: #cc7832">string</span>[] lines = input.Split(<span style="color: #cc7832">new</span>[] { <span style="color: #ffc66d">Environment</span><span style="font-weight: normal">.NewLine }, </span><span style="color: #2b91af; font-weight: normal">StringSplitOptions</span><span style="font-weight: normal">.None);</span></p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160; <span style="color: #cc7832">for</span> (<span style="color: #cc7832">int</span> i = <span style="color: #6897bb">0</span>; i &lt; lines.Length; i++)</p>
<p style="margin: 0px">&#160;&#160;&#160; {</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">bool</span> isFirst = i == <span style="color: #6897bb">0</span>;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">bool</span> isLast = i == lines.Length - <span style="color: #6897bb">1</span>;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">bool</span> isNotFirst = i &gt; <span style="color: #6897bb">0</span>;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">bool</span> isNotLast = i &lt; lines.Length - <span style="color: #6897bb">1</span>;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">string</span> nextLine = (isNotLast ? lines[i + <span style="color: #6897bb">1</span>] : <span style="color: #a5c25c">&quot;&quot;</span>).TrimEnd();</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">string</span> prevLine = (isNotFirst ? lines[i - <span style="color: #6897bb">1</span>] : <span style="color: #a5c25c">&quot;&quot;</span>).TrimEnd();</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">bool</span> nextLineIsCode = isLast ? <span style="color: #cc7832">false</span> : nextLine.StartsWith(<span style="color: #a5c25c">&quot;&#160;&#160;&#160; &quot;</span>);</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">bool</span> prevLineIsCode = prevLine.StartsWith(<span style="color: #a5c25c">&quot;&#160;&#160;&#160; &quot;</span>);</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">string</span> prefix = <span style="color: #a5c25c">&quot;&quot;</span>;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">string</span> suffix = <span style="color: #a5c25c">&quot;&quot;</span>;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">string</span> contents = <span style="color: #ffc66d">HttpUtility</span><span style="font-weight: normal">.HtmlEncode(lines[i].TrimEnd());</span></p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">bool</span> thisLineIsCode = contents.StartsWith(<span style="color: #a5c25c">&quot;&#160;&#160;&#160; &quot;</span>);</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">if</span> (((isFirst) || (isNotFirst &amp;&amp; !isInCodeBlock)) &amp;&amp; thisLineIsCode)</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; {</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; prefix = <span style="color: #a5c25c">&quot;&lt;pre&gt;&quot;</span> + <span style="color: #ffc66d">Environment</span><span style="font-weight: normal">.NewLine;</span></p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; isInCodeBlock = <span style="color: #cc7832">true</span>;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; }</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">if</span> (!isInCodeBlock)</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; {</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; contents = ReplaceWithHtml(contents, <span style="color: #a5c25c">&quot;*&quot;</span>, <span style="color: #a5c25c">&quot;em&quot;</span>);</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; contents = ReplaceWithHtml(contents, <span style="color: #a5c25c">&quot;_&quot;</span>, <span style="color: #a5c25c">&quot;strong&quot;</span>);</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; }</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">if</span> (isInCodeBlock &amp;&amp; !nextLineIsCode)</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; {</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; suffix = <span style="color: #ffc66d">Environment</span><span style="font-weight: normal">.NewLine + </span><span style="color: #a5c25c">&quot;&lt;/pre&gt;&quot;</span>;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; isInCodeBlock = <span style="color: #cc7832">false</span>;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; }</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; <span style="color: #cc7832">if</span> (!thisLineIsCode &amp;&amp; !nextLineIsCode &amp;&amp; !prevLineIsCode)</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160;&#160; suffix = <span style="color: #a5c25c">&quot;&lt;br /&gt;&quot;</span>;</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160;&#160;&#160;&#160;&#160; lines[i] = <span style="color: #cc7832">string</span><span style="font-weight: normal">.Concat(prefix, contents, suffix);</span></p>
<p style="margin: 0px">&#160;&#160;&#160; }</p>
<p style="margin: 0px">&#160;</p>
<p style="margin: 0px">&#160;&#160;&#160; <span style="color: #cc7832">return</span> <span style="color: #cc7832">string</span><span style="font-weight: normal">.Join(</span><span style="color: #ffc66d">Environment</span><span style="font-weight: normal">.NewLine, lines);</span></p>
<p style="margin: 0px">}</p>
</div>
<p>While this isn’t the prettiest 45 lines of code I&#8217;ve ever written, it <em>does</em> work.</p>
<p>If you see something, anything, wrong with this code, tell me about it in the comments! Don’t hold back; I want to know everything you think is wrong with this code. I’m trying to work on not being attached to code I write, so this will be good practice.</p>
]]></content:encoded>
			<wfw:commentRss>http://john-sheehan.com/blog/code-review-a-simple-markup-processor/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>
