CsQuery to the Rescue

Each time I need to extract or modify a piece of HTML, I'm amazed by the speed and simplicity with which CsQuery lets me do it.

It's a breath of fresh air compared with the other two options I've used in the past: regular expressions (which grow ever more complex and keep reminding me of this age-old joke) and, later, the XPath-based HTML Agility Pack which, while a lot more powerful and easier to understand, has never really felt natural or easy to use.

CsQuery is essentially a .NET port of jQuery that brings the power of CSS selectors and the DOM to the .NET world. It is amazingly handy for, among other things, validating the output of automated browser integration tests, scraping information from websites and, as I'm going to show you below, extending Markdown to add rel="nofollow" and target="_blank" to external links (inspired by this post).

I'm in the process of creating a blog for CompareVino and am using Markdown as the language of choice for writing posts. Specifically, I'm using MarkdownDeep to transform Markdown into HTML. Markdown has no provision for opening links in new tabs short of falling back to the HTML way of doing things (i.e. writing out the full HTML anchor tag).

To solve this, I've wrapped the MarkdownDeep library in my own class, which inserts target="_blank" and rel="nofollow" into each external link found in the post after it has been transformed into HTML.

CsQuery lets us easily select all of the anchor links in the resulting HTML, iterate over each of them to check whether it's an external link and, if so, add the two additional attributes. Here, a link counts as external when its href starts with neither "/" nor "#". We can then render the DOM back to a string for the calling method and hey presto, we're done!

For your reading pleasure, the class and its xUnit test class are included below:

using CsQuery;
using MarkdownDeep;

namespace Wine.Core.Business
{
    public class MarkdownTransformer
    {
        private readonly Markdown markdownTransformer;

        public MarkdownTransformer()
        {
            markdownTransformer = new Markdown();
        }

        public string Transform(string markdown)
        {
            var initialHtml = markdownTransformer.Transform(markdown);

            var dom = CQ.Create(initialHtml);
            var links = dom.Select("a");

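            foreach (var link in links)
            {
                // Read href once; it is null when an anchor has no href attribute.
                var href = link.GetAttribute("href");

                // Treat the link as external unless it is site-relative ("/...")
                // or an in-page anchor ("#..."). Anchors without an href are skipped.
                if (!string.IsNullOrEmpty(href) && !href.StartsWith("/") && !href.StartsWith("#"))
                {
                    link.SetAttribute("rel", "nofollow");
                    link.SetAttribute("target", "_blank");
                }
            }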

            return dom.Render();
        }
    }
}

and the tests:

using CsQuery;
using Wine.Core.Business;
using Xunit;

namespace Wine.Core.Test.Business
{
    public class MarkdownTransformerTests
    {
        private readonly MarkdownTransformer transformer;
        
        public MarkdownTransformerTests()
        {
            transformer = new MarkdownTransformer();
        }

        [Fact]
        public void NoFollowTargetBlankExternalLinks()
        {
            var markdown = "[This is a link](http://www.google.com)";

            var csquery = CQ.Create(transformer.Transform(markdown));

            Assert.Equal("<a href=\"http://www.google.com\" rel=\"nofollow\" target=\"_blank\">...</a>", csquery["a"].ToString());
        }

        [Fact]
        public void DontModifyInternalLinks()
        {
            var markdown = "[This is a link](/some-internal-link)";

            var csquery = CQ.Create(transformer.Transform(markdown));

            Assert.Equal("<a href=\"/some-internal-link\">...</a>", csquery["a"].ToString());
        }

        [Fact]
        public void DontModifyAnchorLinks()
        {
            var markdown = "[See this section](#this-section)";

            var csquery = CQ.Create(transformer.Transform(markdown));

            Assert.Equal("<a href=\"#this-section\">...</a>", csquery["a"].ToString());
        }
    }
}
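
The prefix check in Transform is deliberately simple, so it is worth knowing where it bends: a protocol-relative link such as //cdn.example.org/x starts with "/" and is therefore treated as internal, while mailto: links and absolute links back to your own site are treated as external. If that matters for your posts, a stricter check might look like the sketch below. It is only an illustration: LinkClassifier, IsExternal and the siteHost parameter are hypothetical names of mine, not part of CsQuery or MarkdownDeep, and the sketch assumes that only http and https links to another host should count as external.

```csharp
using System;

// Hypothetical helper (not part of CsQuery or MarkdownDeep).
public static class LinkClassifier
{
    // siteHost is an assumed parameter: the host name of your own site,
    // e.g. "www.example.com".
    public static bool IsExternal(string href, string siteHost)
    {
        // Anchors without an href and in-page anchors are never external.
        if (string.IsNullOrWhiteSpace(href) || href.StartsWith("#"))
        {
            return false;
        }

        // Protocol-relative URLs ("//host/path") start with "/" but can
        // point off-site, so give them a scheme before parsing.
        if (href.StartsWith("//"))
        {
            href = "http:" + href;
        }

        Uri uri;
        if (!Uri.TryCreate(href, UriKind.Absolute, out uri))
        {
            // Relative paths such as "posts/1" stay internal.
            return false;
        }

        // Only web links count. This also covers platforms where a path
        // like "/about" parses as an absolute file URI.
        bool isWeb = uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps;

        return isWeb && !string.Equals(uri.Host, siteHost, StringComparison.OrdinalIgnoreCase);
    }
}
```

With something like this in place, the condition inside the foreach loop could become LinkClassifier.IsExternal(link.GetAttribute("href"), "www.example.com"), with the host name swapped for your own. The existing tests should still pass as they are, since none of them link to the site's own host.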