Sunday, June 5, 2016

Zip code data in Sitecore


The source code and documentation for this solution are available on GitHub. Download


Good old postal Zip Codes. Not a very exciting subject, but it seems like every year or two, I run into a solution that requires access to a zip code database.

There are several commercial services that provide extensive zip code data, but there is at least one free database available (https://boutell.com/zipcodes/) that includes a decent amount of data, including coordinates, city and state, and time zone.

Latitude and longitude can be useful if, for example, you need to put an "approximate" pin on a map.

We recently ran into a situation where we needed to know the visitor's time zone. Sitecore GeoIP data includes zip codes, but not time zones. All I need is to wire that up to a zip code database, and I'm all set. (And before you quibble over the accuracy, don't. I know it's not perfect; I think of this as a "good enough" solution).

So I set up a solution to provide Sitecore with an API for looking up zip codes. I started with a few goals:

  • I want to be able to use this in any Sitecore solution.
  • I don't want to use SQL. It's always an administrative and deployment hassle to use custom SQL tables.
  • I don't want to impose a schema on every application that uses this. Sure, that free zip database is fine for what I need now, but others may have more detailed data they'd like to use.
  • I want to use a swappable provider so other applications can change how the data is imported, where it is stored, and/or how it is queried.
  • For my default provider, I want it to "just work". I drop the file into a folder, and my Sitecore app has access to the data (well, I did end up also needing to add a mongo connection string to the connectionstrings.config file).

I decided to use MongoDB to store the data. Once I have a connection string, I can create collections and add data with any schema I want, without bugging the SQL admins. I also decided to add a caching layer. I'm probably going to access this data from rules and such, and I want it to be zippy-quick.

The data flow looks like this:


The idea is that the operational data is stored in MongoDB and accessed through a caching layer at runtime. At application start (and via an administrative interface), the data file date is compared to the last time the data was imported, and if the file is newer, it is re-imported.
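The "is the file newer?" check boils down to a single date comparison. Here is a minimal sketch of that idea, not the module's actual code: the names ImportCheck and NeedsImport are made up for illustration, and it assumes the time of the last import is recorded somewhere the caller can read it.

```csharp
using System;

public static class ImportCheck
{
    // Returns true when no import has been recorded yet,
    // or when the data file was written after the last import.
    public static bool NeedsImport(DateTime fileLastWriteUtc, DateTime? lastImportUtc)
    {
        return !lastImportUtc.HasValue || fileLastWriteUtc > lastImportUtc.Value;
    }
}
```

A caller could pass in System.IO.File.GetLastWriteTimeUtc(path) for the first argument. Comparing in UTC avoids surprises around daylight saving changes.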

Installing the module

Install the update package in your Sitecore application. When Sitecore starts, it will populate a MongoDB database with data from a provided zip code document located in App_Data. This file is sourced from https://boutell.com/zipcodes/.

Whenever this file is updated, it will be reloaded the next time Sitecore starts.

Add a connection string to your ConnectionStrings.config file with the name you'd like to use for your Mongo database. For example:

<add name="zipinfo" connectionString="mongodb://localhost:27017/zipinfo" />

If you want to use a separate Mongo database for each Sitecore instance sharing a common Mongo server, change the connectionString, e.g.
connectionString="mongodb://localhost:27017/myapp_zipinfo"

The update package will place a copy of the data file in your App_Data folder. You can relocate this to the Sitecore data folder if you desire.

Using the module

The module exposes a "manager" static class (ZipInfo.ZipInfoManager) with static methods like Get(int zipCode) to access the data. I won't get into all the methods here (see the documentation), but there are methods for both retrieve/update and cache management operations.

The update package will also install a utility at /sitecore/admin/zipinfo.aspx that'll allow you to query the database, manage the cache, and re-import the data.

The module exposes a provider class that can be swapped out with your own provider. If you have more detailed data in a csv file, you can simply inherit from the default provider, create your own POCO class, and override the LoadRecord method that maps fields on the csv line to the POCO. If you need a different method for loading the data rather than reading it from a csv file, override the Load method. If you don't want to use Mongo, you can replace the entire provider by creating a class that implements IZipInfoProvider. More information about the provider is available in the docs.
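To give a feel for the kind of mapping a LoadRecord override performs, here is a standalone sketch that turns one csv line into a POCO. Everything in it is an assumption for illustration: the ZipRecord and ZipCsvMapper names are hypothetical, the column order (zip, city, state, latitude, longitude, timezone) is my reading of the free data file, and the real LoadRecord signature is in the module's docs, not here.

```csharp
using System.Globalization;

// Hypothetical POCO for illustration; the module's own record class may differ.
public class ZipRecord
{
    public int Zip { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public double Latitude { get; set; }
    public double Longitude { get; set; }
    public int UtcOffsetHours { get; set; }
}

public static class ZipCsvMapper
{
    // Maps one csv line to a ZipRecord; returns null for header, blank, or malformed rows.
    // Assumed column order: zip, city, state, latitude, longitude, timezone.
    // Naive split: it does not handle commas inside quoted values.
    public static ZipRecord MapLine(string line)
    {
        if (string.IsNullOrWhiteSpace(line)) return null;

        string[] fields = line.Split(',');
        if (fields.Length < 6) return null;

        // Strip surrounding whitespace and double quotes from each field.
        for (int i = 0; i < fields.Length; i++)
            fields[i] = fields[i].Trim().Trim('"');

        CultureInfo inv = CultureInfo.InvariantCulture;
        int zip, offset;
        double lat, lon;

        // A header row fails the first parse and is skipped.
        if (!int.TryParse(fields[0], NumberStyles.Integer, inv, out zip)) return null;
        if (!double.TryParse(fields[3], NumberStyles.Float, inv, out lat)) return null;
        if (!double.TryParse(fields[4], NumberStyles.Float, inv, out lon)) return null;
        if (!int.TryParse(fields[5], NumberStyles.Integer, inv, out offset)) return null;

        return new ZipRecord
        {
            Zip = zip,
            City = fields[1],
            State = fields[2],
            Latitude = lat,
            Longitude = lon,
            UtcOffsetHours = offset
        };
    }
}
```

If your own data has more columns, the same pattern applies: add properties to the POCO and parse the extra fields in the mapping method.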







Wednesday, June 1, 2016

Content Indexing vs Site Search

I've had this conversation so many times, I thought I'd capture it here once and for all.

There is a vast difference between content indexing and site search. The following discusses these differences. This is not exhaustive; there are finer nuances that I’ll skip over in order to keep focused on the key concepts.

Content Indexing

Content indexing is the act of storing selected fields of Sitecore content items into a separate index, so that content items can be retrieved rapidly by code. Examples of this are the search box Sitecore uses for item buckets, or a custom rendering that “facets” content e.g. outputs links to every item where “Georgia” is selected in a “Home state” field.

  • Indexes are created by copying raw item data into the index, typically when the item is saved or published.
  • Content indexing is a “data-oriented” operation e.g. a lookup in an index finds a match of content in a field. 
  • A content index has no concept of pages, and does not have any ability to rank on such things as link frequency. 
  • Content Indexing is absolutely required for Sitecore to function. 
  • Sitecore implements content indexing “out of the box”, using Lucene by default, with configurable support for Solr in scaled enterprise environments. 

Site Search

Site search is the act of indexing the content of entire viewable pages, so that whole pages can be found using “free text” search. An example of this is a site visitor entering a few words in a search box and getting back a page of ranked results, akin to a Google search.

  • Indexes are created by “crawling” the site e.g. code uses http requests to pull every page of the site, storing the content in its index, and examining the links in the page to find more pages to crawl.
  • Site search is a “free text” operation, e.g. a lookup considers all of the visible content of a page.
  • A good site search tool ranks results based on things like semantics e.g. content in <h1> tags will rank higher than body text, or linking e.g. pages with more inbound links will rank higher.
  • A site search solution is only necessary if you want visitors to be able to “free text” search the site e.g. the site has a “search box”. 
  • Sitecore does not implement free-text page search “out of the box”.

Why the distinction is important

Any given page of a Sitecore site may have visible page content derived from many content items. Therefore, out-of-the-box content indexing is not an appropriate solution for site search.

Moreover, a good “free text” search experience requires that the results be well ranked. Consider what happens when you do a Google search. Google isn’t simply returning a flat list of every page that contains your search terms; instead, it is using highly sophisticated ranking algorithms to present the results you are most likely to want first. If you’re familiar with SEO principles, you know that there are many factors that influence rank far beyond the simple content of the page.

Of course there is some overlap. A good site search tool can also include "hard data" in the form of metadata, so that search results can be "faceted". This allows the visitor to "filter" results based on date, geography, product line, or any other "field oriented" data that you include in the page metadata.

We've already deployed Solr. Why can't we use that for site search?

In theory, there is a way to leverage a Solr index to do free text search. This is not a simple matter of “configuration”, but rather, requires extensive coding. The general idea is you build a scheduled processor that programmatically loads every page of the site (via an http request) so it can get the entirety of the content on a given page. It puts that content into a “computed field” of a Solr index. Then, custom “search box” code can search that “computed field” for occurrences of that content. There are drawbacks to this approach:

  • It is not implemented out of the box.
  • The ranking of search is either non-existent, or at least far short of the ranking quality of a true crawler.
[edited to correct my error about Coveo]

There are “off the shelf” tools that combine the concepts of content indexing and site search.

  • Coveo is an excellent commercial product that uses a proprietary indexing mechanism, with conventional "content indexing" and also crawling. It can index both entire pages and content items. It comes with value-added tools for rapid deployment of faceted search features, and also adds some ranking capabilities, including the ability to manually tweak search ranking. It comes in on-premises, cloud, and a hobbled “free” version. It is arguably the “least effort” solution to implement, since it is very "Sitecore aware" out of the box.
  • There are lots of free and commercial solutions. For example, Arke’s SDK includes a “computed search” module that uses configured field and template types to inject page content into a Solr index. 

There are other “off the shelf” solutions that provide excellent free text search experiences that do not rely on Solr. Most of these have evolved to cloud-hosted rather than on-premises solutions. Google site search and Amazon cloud search are leaders in this space, and Coveo has a cloud edition, but there are many services available. Using one of these services would still require coding, but it would be pure “integration” coding, not an attempt to build a full blown crawler.

In the absence of an “off the shelf” solution, you could build a home-grown Solr-based crawler. It’d require significant time and effort, only to yield a pretty poor user experience due to the lack of any sophisticated ranking.


Thursday, April 14, 2016

Using ARR to enable FXM


Ever used Sitecore's Federated Experience Manager (FXM)? Effectively, it lets you use Sitecore to content manage, track, personalize and test external sites which are not hosted in Sitecore.

The motivations I often hear for using FXM are...

  • We've bought a license and plan to migrate to Sitecore later, but we want to start personalizing and gathering analytics on our site now.
  • We're moving our main site to Sitecore, but we have some related sites we just don't have time and budget to move now.
  • We want to do a demo or POC using content from a non-Sitecore site, but don’t want to re-create the content in Sitecore.

With FXM you can do that. All you need is to place a small bit of script on the external sites. Sadly, that's often not possible. Sometimes the old site is literally on a server that nobody knows how to access. Sometimes you're just doing a POC and nobody wants to edit the old site for that.

IIS's Application Request Routing (ARR) to the rescue.


IIS has features called ARR and the URL Rewrite module that amount to a reverse proxy, giving you a “man in the middle” that can manipulate the HTML before it is returned to the browser.

We set up an IIS instance with ARR and a public-facing URL (in this example, “demo.mysite.com”), and configure ARR to do the following:

  1. Take the path from the inbound request, and form a URL using the Sitecore server’s host name.
  2. Fetch the HTML from the Sitecore host.
  3. Inject the FXM beacon script into the HTML.
  4. Change the URLs within the HTML for such things as images, scripts, CSS, iframes, etc, so that they will be requested from the ARR and not the Sitecore site.
  5. Strip out the “X-Frame-Options” header (if it exists), which can interfere with FXM Experience Editor.

This results in a topology like this:



The URL structure in this example would give us a demo/POC website (“demo.mysite.com”) where we can show how a site can be tracked and manipulated with FXM. This could in theory be used for a live site by changing DNS to point www.mysite.com to the ARR, and changing the hostname of the Sitecore server to something like Sitecore.mysite.com.

Setting up the Reverse Proxy

From an infrastructure perspective, setting up the proxy server is pretty simple. Install the ARR and URL Rewrite extensions, and create a new site in IIS. Set the binding up so it answers requests from the desired host (in the example above, “demo.mysite.com”). The site folder doesn’t need much; a default.htm page, and an empty web config.

The magic all happens in web.config. The URL Rewrite module is governed by rules. There are two sets, one just called “rules” which are used to route requests to the Sitecore server, and another called “outbound rules” which are used to manipulate the responses from Sitecore before they are returned to the browser. Outbound rules also allow you to define “preconditions” that allow you to restrict when an outbound rule will apply.

The IIS management console provides an interface for building up all the XML in the config file for all of this. I find that when I’m working with it, I flip between IIS and Notepad++ until I get everything just right.

The referenced articles provide good guidance for how to use the URL Rewrite module and set up rules. This example web.config could be used to implement our example.


 <?xml version="1.0" encoding="utf-8"?>  
 <configuration>  
  <system.web>  
  </system.web>  
  <system.webServer>  
   <rewrite>  
    <rules>  
     <!--  
     This rule routes every request to the external site.  
     The use of "HTTP_ACCEPT_ENCODING" ensures that external servers   
     will send responses in the clear (not zipped or otherwise encoded)  
     -->  
     <rule name="Route to external site" stopProcessing="true">  
      <match url="(.*)" />  
      <action type="Rewrite" url="http://www.mysite.com/{R:1}" />  
      <serverVariables>  
       <set name="HTTP_ACCEPT_ENCODING" value="" />  
      </serverVariables>  
     </rule>  
    </rules>  
    <outboundRules>  
     <!--  
     This rule converts proxied pages' urls to relative urls (so they'll be requested through the ARR server and avoid cross-domain issues)  
     -->  
     <rule name="Rewrite External Absolute Paths" preCondition="Request is for html">  
      <match filterByTags="A, Area, Base, Form, Frame, Head, IFrame, Img, Input, Link, Script" pattern="^http(s)?://www.mysite.com/(.*)" />  
      <action type="Rewrite" value="/{R:2}" />  
     </rule>  
     <!--  
     This rule removes the X-Frame-Options header, which can prevent the Experience Editor from working.  
     -->  
     <rule name="Strip x-frame-options" preCondition="Request is for html" patternSyntax="ECMAScript">  
      <match serverVariable="RESPONSE_X_Frame_Options" pattern="(.+)" />  
      <action type="Rewrite" value="" />  
     </rule>  
     <!--  
     This rule adds "(via proxy)" to the Server header, to aid troubleshooting.  
     -->  
     <rule name="Change Server Header">  
      <match serverVariable="RESPONSE_Server" pattern="(.+)" />  
      <action type="Rewrite" value="{R:0} (via proxy)" />  
     </rule>  
     <!--  
     This rule injects the FXM script into the HTML from the external site.  
     -->  
     <rule name="Add FXM script to tb" preCondition="Request is for html" patternSyntax="ExactMatch">  
      <match filterByTags="None" pattern="&lt;/head>" />  
      <action type="Rewrite" value="&lt;script src=&quot;//sitecore.mysite.com/bundle/beacon&quot;>&lt;/script>&lt;/head>" />  
     </rule>  
     <preConditions>  
      <!--  
      This precondition allows the outbound rules to only act on html responses.  
      -->  
      <preCondition name="Request is for html">  
       <add input="{RESPONSE_CONTENT_TYPE}" pattern="text/html" />  
      </preCondition>  
     </preConditions>  
    </outboundRules>  
   </rewrite>  
  </system.webServer>  
 </configuration>  

Tuesday, February 2, 2016

Dear John...

We all go West
No I’m not writing to say I’ve left you for another CMS. But John West’s announcement today leaves me wanting to take a short detour down Memory Lane. If you came here looking for technical tidbits, I’ll be hangin’ a right back down Architecture Avenue shortly.

I had the great good fortune to work directly with John on my very first Sitecore project. It was one of the first projects to be done at scale in North America, and it was my first foray into a true enterprise-level .net CMS. When Lars Nielsen flew out to conduct our first training, John was there, both to learn and to advise. He remained tightly connected throughout the project, providing strategic advice and technical leadership (and answers to my incessant questions). John’s enthusiasm for Sitecore was infectious. His spirit of adventure set the tone for that project, and indeed for my entire Sitecore career.

John’s thought leadership has been at the bedrock of Sitecore’s growth. His quiet, unassuming tone underlies a deep passion for Sitecore. Owing to John’s example, today’s Sitecore ecosystem is infused with a sense of excitement, wonder, and a craving to learn, create and explore. His blog is a hallmark of his motivational style. John provides the signposts leading to the new and evolving capabilities of the product, while never asserting his knowledge is definitive, never assuming his observations are comprehensive, and never insisting his conclusions are absolute. Being the good teacher he is, he leaves application as an exercise for the student. And exercise we do! Many talented Sitecore professionals share valuable learnings from their Sitecore journeys. But those journeys began with John’s unspoken challenge to “Go West, young man!” (Yes, I went there.)

Over the years, as John has gone from teacher to mentor to friend, I’ve felt immense pride to be part of this dynamic community that John was so instrumental in creating. Though we have gone from speaking almost every day to interacting only sporadically, every time we see each other it seems we are picking up in mid-sentence. There has never been a “goodbye” with him, and there is not one now. Talk to you soon, friend!

(And goodbye forever, XSLT!)

Tuesday, August 4, 2015

Organizing the Language Menu's Tower of Babel

If you have a site with lots of languages, you've no doubt run into some challenges. A small but annoying one is the ordering of the language menus. This is particularly problematic when the site has a large number of configured languages, but any given content item may only have versions in one or two. It can be a chore slogging through the menu looking for ones that have versions.

This little tidbit allows you to re-order the language menu so that languages that actually have versions will float up to the top of that Tower of Babel.

Finding the pressure point

When I'm making changes to Sitecore's "internals" I try to be as minimally invasive as possible. Unfortunately, this solution is more like a tourniquet than a pressure point. I want to replace the code-beside for one of Sitecore's XML controls (the control that Sitecore uses for drop-down language menus in the content editor). The only way I know to do that is to replace the XML file and change the <CodeBeside> element. Anyone know a good trick to change the code-beside without replacing this file?

Worse, the decompiled code-beside doesn't seem to lend itself to a surgical strike that just changes one little part. Often, you'll find that Sitecore anticipates your need by doing things like having an overridable class fetch an object that does most of the work. You can just inherit from their class and change that one method to suit your needs. Not the case here. We're going to have to implement the whole class just to change one line of code. Such is life.

The control for this menu is located at
/Sitecore/Shell/Applications/Content Manager/Galleries/Languages/Gallery Languages.xml
What we really want to do is make a small change to the associated code-beside. First, we need to change this XML file to use our code-beside instead of Sitecore's:

    <!--<CodeBeside Type="Sitecore.Shell.Applications.ContentManager.Galleries.Languages.GalleryLanguagesForm,Sitecore.Client"/>-->  
    <CodeBeside Type="MySolution.Shell.Applications.ContentManager.Galleries.GalleryLanguagesForm,sb1"/> 

For the code-beside file, we need to copy the entire decompiled Sitecore class, although we're really just changing one line of code. So use your favorite decompiler to snag the Sitecore class (or copy the code from the end of this article), and clean up the references.

I'm going to change that one pesky line to call a new method, just to isolate the change and make it more tweakable in the future.

To fetch the list of languages to put in the menu, Sitecore just iterates over currentItem.Languages.


That works, but I want to take the languages for which there are actually versions, and float them to the top.


So I change their code to call my method instead of GetLanguages:

 foreach (Language language in GetLanguages(currentItem))  //currentItem.Languages)  

... and then I add the GetLanguages() method that lets me be a bit more finessed about ordering:

 protected IEnumerable<Language> GetLanguages(Item currentItem) {  
  return currentItem.Languages.Where(l => ItemManager.GetVersions(currentItem, l).Count > 0)  
   .Union(currentItem.Languages.Where(l => ItemManager.GetVersions(currentItem, l).Count == 0));  
 }  

And now the languages that are actually used for this item will float to the top. This might be a handy place to do other manipulation you might need for your solution, depending on the business rules for language management.
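If you'd rather evaluate the "has versions" test only once per language, a stable sort can do the same partitioning. This is a sketch of that alternative, not what the module does; MenuOrdering and FloatToTop are hypothetical names, and the example is written against plain values so it stands on its own without Sitecore references.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MenuOrdering
{
    // Moves the items for which isUsed returns true ahead of the rest.
    // LINQ's OrderBy is a stable sort, so items keep their original
    // relative order within each of the two groups.
    public static IEnumerable<T> FloatToTop<T>(IEnumerable<T> items, Func<T, bool> isUsed)
    {
        return items.OrderBy(item => isUsed(item) ? 0 : 1);
    }
}
```

Inside GetLanguages, the predicate would be the same ItemManager.GetVersions(currentItem, l).Count > 0 check used above, applied to currentItem.Languages.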

Here's the full code for the code-beside, followed by the XML control:

 using System;  
 using System.Collections.Generic;  
 using System.Globalization;  
 using System.Linq;  
 using Sitecore;  
 using Sitecore.Configuration;  
 using Sitecore.Data;  
 using Sitecore.Data.Items;  
 using Sitecore.Data.Managers;  
 using Sitecore.Diagnostics;  
 using Sitecore.Globalization;  
 using Sitecore.Shell;  
 using Sitecore.Web;  
 using Sitecore.Web.UI.HtmlControls;  
 using Sitecore.Web.UI.Sheer;  
 using Sitecore.Web.UI.XmlControls;  
 using Control = System.Web.UI.Control;  
 namespace MySolution.Shell.Applications.ContentManager.Galleries  
 {  
   public class GalleryLanguagesForm : Sitecore.Shell.Applications.ContentManager.Galleries.GalleryForm  
   {  
     protected GalleryMenu Options;  
     protected Scrollbox Languages;  
     public GalleryLanguagesForm() : base()  
     {  
     }  
     public override void HandleMessage(Message message)  
     {  
       Assert.ArgumentNotNull((object)message, "message");  
       if (message.Name == "event:click")  
         return;  
       this.Invoke(message, true);  
     }  
     protected override void OnLoad(EventArgs e)  
     {  
       Assert.ArgumentNotNull((object)e, "e");  
       base.OnLoad(e);  
       if (Context.ClientPage.IsEvent)  
         return;  
       Item currentItem = GetCurrentItem();  
       if (currentItem == null)  
         return;  
       using (new ThreadCultureSwitcher(Context.Language.CultureInfo))  
       {  
         foreach (Language language in GetLanguages(currentItem))  //currentItem.Languages)  
         {  
           ID languageItemId = LanguageManager.GetLanguageItemId(language, currentItem.Database);  
           if (!ItemUtil.IsNull(languageItemId))  
           {  
             Item obj = currentItem.Database.GetItem(languageItemId);  
             if (obj == null || !obj.Access.CanRead() || obj.Appearance.Hidden && !UserOptions.View.ShowHiddenItems)  
               continue;  
           }  
           XmlControl xmlControl = ControlFactory.GetControl("Gallery.Languages.Option") as XmlControl;  
           Assert.IsNotNull((object)xmlControl, typeof(XmlControl));  
           Context.ClientPage.AddControl((Control)this.Languages, (Control)xmlControl);  
           Item obj1 = currentItem.Database.GetItem(currentItem.ID, language);  
           if (obj1 != null)  
           {  
             int length = obj1.Versions.GetVersionNumbers(false).Length;  
             string str1;  
             if (length != 1)  
               str1 = Translate.Text("{0} versions.", (object)length.ToString());  
             else  
               str1 = Translate.Text("1 version.");  
             string str2 = str1;  
             CultureInfo cultureInfo = language.CultureInfo;  
             xmlControl["Header"] = (object)(cultureInfo.DisplayName + " : " + cultureInfo.NativeName);  
             xmlControl["Description"] = (object)str2;  
             xmlControl["Click"] = (object)string.Format("item:load(id={0},language={1},version=0)", (object)currentItem.ID, (object)language);  
             xmlControl["ClassName"] = !language.Name.Equals(WebUtil.GetQueryString("la"), StringComparison.OrdinalIgnoreCase) ? (object)"scMenuPanelItem" : (object)"scMenuPanelItemSelected";  
           }  
         }  
       }  
       Item obj2 = Sitecore.Client.CoreDatabase.GetItem("/sitecore/content/Applications/Content Editor/Menues/Languages");  
       if (obj2 == null)  
         return;  
       this.Options.AddFromDataSource(obj2, string.Empty);  
     }  
     protected IEnumerable<Language> GetLanguages(Item currentItem)  
     {  
       return currentItem.Languages.Where(l => ItemManager.GetVersions(currentItem, l).Count > 0)  
         .Union(currentItem.Languages.Where(l => ItemManager.GetVersions(currentItem, l).Count == 0));  
     }  
     private static Item GetCurrentItem()  
     {  
       string queryString1 = WebUtil.GetQueryString("db");  
       string queryString2 = WebUtil.GetQueryString("id");  
       Language language = Language.Parse(WebUtil.GetQueryString("la"));  
       Sitecore.Data.Version version = Sitecore.Data.Version.Parse(WebUtil.GetQueryString("vs"));  
       Database database = Factory.GetDatabase(queryString1);  
       Assert.IsNotNull((object)database, queryString1);  
       return database.GetItem(queryString2, language, version);  
     }  
   }  
 }  


 <?xml version="1.0" encoding="utf-8" ?>  
 <control xmlns:def="Definition" xmlns="http://schemas.sitecore.net/Visual-Studio-Intellisense" xmlns:shell="http://www.sitecore.net/shell">  
  <Gallery.Languages>  
   <Gallery>  
    <!--<CodeBeside Type="Sitecore.Shell.Applications.ContentManager.Galleries.Languages.GalleryLanguagesForm,Sitecore.Client"/>-->  
    <CodeBeside Type="MySolution.Shell.Applications.ContentManager.Galleries.GalleryLanguagesForm,sb1"/>  
    <Script>  
     window.onload = function() {  
     var activeLanguage = document.querySelector('.scMenuPanelItemSelected');  
     activeLanguage.scrollIntoView(false);  
     }  
    </Script>  
    <Stylesheet Key="GalleryLanguages">  
     .scMenuPanelItem, .scMenuPanelItem_Hover, .scMenuPanelItemSelected_Hover, .scMenuPanelItemSelected {  
     padding-left: 0;  
     padding-right: 0;  
     padding-top: 8px;  
     padding-bottom: 8px;  
     }  
     .scGalleryGrip {  
     position: absolute;  
     bottom: 1px;  
     left: 1px;  
     right: 1px;  
     height: 10px;  
     }  
     .scLanguagesGalleryMenu {  
     overflow: hidden;  
     vertical-align: top;  
     border-bottom: 12px solid transparent;  
     -moz-box-sizing: border-box;  
     box-sizing: border-box;  
     width: 100%;  
     height: 100%;  
     border-collapse: separate;  
     }  
     div#Languages img {  
     display: none;  
     }  
    </Stylesheet>  
    <Border Width="100%" Height="100%">  
     <GalleryMenu ID="Options" Class="scLanguagesGalleryMenu">  
      <MenuPanel Height="100%">  
       <Scrollbox ID="Languages" Class="scScrollbox scFixSize scFixWidthInsideGallery" style="padding-top:0 !important;" Height="100%" Width="100%" />  
      </MenuPanel>  
     </GalleryMenu>  
     <Gallery.Grip />  
    </Border>  
   </Gallery>  
  </Gallery.Languages>  
 </control>  


Wednesday, May 28, 2014

Size Doesn’t Matter: Suppressing Size Attributes in Image Tags



On a recent, massively-responsive project, our front-end developer asked us to (well, he actually threatened to hold his breath until he turned blue unless we would) remove the height and width attributes from the image tags in our Sitecore site. He’s one of the best front-end guys I've ever worked with, so instead of just dismissing this as “front-end guys will be front-end guys”, I decided to see if we could indulge him.

It makes sense, actually. Client-side code can deal with manipulating height and width easily enough when rendering on different devices. But if you want the thrill of waving around the resize handle and watching all that responsiveness a’responding, then it’d be better if the height and width attributes weren't there in the first place.

By and large, there are two ways an image tag finds its way into a page generated by Sitecore. It may come from an image field, or from an embedded image in an HTML field. So we should be able to tackle this in the renderField pipeline, assuming we've been good boys and girls and used field renderers everywhere (and if you haven’t, then go to your room and think about what you've done).

The two cases (image fields and html fields) pose different challenges, so let’s look at them separately. Or if you don’t want to dig in, then you can stop reading now, download the module package or source code (zip file, GitHub, or Sitecore Marketplace) and have at it.

Image Fields

We've got images in image fields, and we’re using field renderers to generate image tags at runtime. Sitecore uses Sitecore.Pipelines.RenderField.GetImageFieldValue to do this. When we take a look under the hood, we see it in turn is using a Sitecore.Xml.Xsl.ImageRenderer. Luckily, GetImageFieldValue uses a virtual method CreateRenderer, so we can shim in our own class that inherits from GetImageFieldValue and overrides CreateRenderer to substitute our own handy-dandy ImageRenderer class.

Now, Sitecore's ImageRenderer class is pretty bulky, and it has more stuff that deals with dimensions than a cartographer’s workshop. I’d like to just inherit from theirs and find a good pressure point to slap down those size attributes. Taking a good look at the Render method in Sitecore’s ImageRenderer, it looks like someone anticipated our need. The last thing it does to determine the size is to call a virtual method called AdjustImageSize. All we need to do is override that and set the height and width properties to zero. The existing Sitecore code is already set up to suppress the height and width attributes if these properties are zero.

So we need two pretty lightweight classes and an override in a config file. First, the config file. We need to tell Sitecore to replace its GetImageFieldValue processor with ours.

 <configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">  
  <sitecore>  
   <pipelines>  
    <renderField>  
     <processor   
      type="DimensionlessImages.Pipelines.RenderField.GetImageFieldValue,   
         DimensionlessImages"  
      patch:instead="*[@type='Sitecore.Pipelines.RenderField.GetImageFieldValue,  
         Sitecore.Kernel']"  
      />  
    </renderField>  
   </pipelines>  
  </sitecore>  
 </configuration>  

Then, there’s our own GetImageFieldValue processor, which inherits from Sitecore’s and overrides the CreateRenderer method.

 namespace DimensionlessImages.Pipelines.RenderField  
 {  
  using Sitecore.Xml.Xsl;  
  public class GetImageFieldValue : Sitecore.Pipelines.RenderField.GetImageFieldValue  
  {  
   protected override ImageRenderer CreateRenderer()  
   {  
    return new DimensionlessImages.ImageRenderer();  
   }   
  }  
 }  

And lastly, there’s our own ImageRenderer, which overrides the AdjustImageSize method.

 namespace DimensionlessImages  
 {  
  using Sitecore.Data.Fields;  
  public class ImageRenderer : Sitecore.Xml.Xsl.ImageRenderer  
  {  
   protected override void AdjustImageSize(ImageField imageField, float imageScale, int imageMaxWidth, int imageMaxHeight, ref int w, ref int h)  
   {  
    w = 0;  
    h = 0;  
   }  
  }  
 }  

Piece of cake, that.

HTML Fields

HTML fields are a different animal. We've got a hunk of existing HTML, not just some data we’ll use to form HTML at runtime.

When a media library image is inserted in the rich text editor, Sitecore adds the height and width attributes to the img tag. So between that, and the possibility that content people might edit HTML manually (bless their little programmer-wannabe hearts), we’re going to have lots of height and width attributes in our HTML fields.

We can use the HtmlAgilityPack to strip these attributes off the img tags in the renderField pipeline. Although the HtmlAgilityPack is wicked fast, we could start to see a performance hit if there are lots of HTML fields on complex pages. It’d be better to do it in a save handler or a publishItem processor, but that could present problems with the page editor or if there is already a lot of existing content. I’m seeing times of less than 0.1ms per HTML field to strip the attributes at runtime, so to make this code more bullet-proof (well, ok, to let me be lazy) I’m going to do it in renderField. If your solution permits, by all means move this processing to a save handler.

Like before, we’ll start with the config changes…

 <configuration xmlns:patch="http://www.sitecore.net/xmlconfig/">  
  <sitecore>  
   <pipelines>  
    <renderField>  
     <processor type="DimensionlessImages.Pipelines.RenderField.GetFieldValue, DimensionlessImages"  
       patch:instead="*[@type='Sitecore.Pipelines.RenderField.GetFieldValue, Sitecore.Kernel']"  
     />  
    </renderField>  
   </pipelines>  
  </sitecore>  
 </configuration>  

We’ll use a custom GetFieldValue processor. It is actually simpler than the image field case, because all we have to do is catch “rich text” fields and strip the height and width attributes.

 namespace DimensionlessImages.Pipelines.RenderField  
 {  
  using Sitecore.Pipelines.RenderField;  
  public class GetFieldValue : Sitecore.Pipelines.RenderField.GetFieldValue  
  {  
   public new void Process(RenderFieldArgs args)  
   {  
    base.Process(args);  
    if (args.FieldTypeKey == "rich text")  
    {  
     Sitecore.Diagnostics.Profiler.StartOperation("Stripping image tags from field: " + args.FieldName);  
     args.Result.FirstPart = HtmlUtil.StripDimensions(args.Result.FirstPart);  
     Sitecore.Diagnostics.Profiler.EndOperation();  
    }  
   }  
  }  
 }  

Finally, we need a helper method that uses the HtmlAgilityPack to strip the dimension attributes…

 namespace DimensionlessImages  
 {  
  using System;  
  using HtmlAgilityPack;  
  public class HtmlUtil  
  {  
   public static string StripDimensions(string text)  
   {  
    if (string.IsNullOrWhiteSpace(text))  
    {  
     return text;  
    }  
    string outText = text;  
    try  
    {  
     var doc = new HtmlDocument();  
     doc.LoadHtml(outText);  
     StripAttribute(doc, "width");  
     StripAttribute(doc, "height");  
     outText = doc.DocumentNode.WriteContentTo();  
    }  
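     catch (Exception)  
     {  
      // Deliberately swallowed: if parsing fails, fall through and return the original markup untouched.  
     }  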
    return outText;  
   }  
   private static void StripAttribute(HtmlDocument doc, string attribute)  
   {  
     // For reasons surpassing all understanding, HtmlAgilityPack returns null instead of an empty collection  
     // when the query finds no results.  
    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(String.Format("//img[@{0}]", attribute));  
    if (nodes == null || nodes.Count.Equals(0))  
    {  
     return;  
    }  
    foreach (HtmlNode node in nodes)  
    {  
     node.Attributes[attribute].Remove();  
    }  
   }  
  }  
 }  

- - -

This code addresses the most likely cases that emit image tags to the browser. A given solution may have other cases, like markup being generated by custom code or copied from static files. For cases like that, the static HtmlUtil.StripDimensions method is public, so custom code can call it to “play ball” with the rest of this code.
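As a rough sketch of what that might look like, here is a small helper that cleans markup pulled from a static file before it is written to the page. The StaticSnippet class, its namespace and its method names are hypothetical and exist only for illustration; the only piece taken from the code above is HtmlUtil.StripDimensions.

```csharp
namespace DimensionlessImages.Samples
{
    using System.IO;

    // Hypothetical helper, for illustration only. It is not part of the module.
    public static class StaticSnippet
    {
        // Runs any chunk of HTML through the same attribute stripping
        // that the renderField processor applies to rich text fields.
        public static string Clean(string html)
        {
            return HtmlUtil.StripDimensions(html);
        }

        // Reads a static HTML file from disk and returns it with the
        // height and width attributes removed from its img tags.
        public static string Load(string path)
        {
            return Clean(File.ReadAllText(path));
        }
    }
}
```

Because StripDimensions hands back the original text when it is given an empty string or when parsing fails, a helper like this should be safe to call on anything you are about to emit.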

Download: Module (package only) or source code (zip file, GitHub, or Sitecore Marketplace).






Friday, November 29, 2013

Sitecore Profiling and Tracing

It's been a while since I've posted. I have a couple of bigger subjects on the spike, but for now here's a quick word about Sitecore profiling and tracing.

Have you ever had to debug a problem with a Sitecore page and you wind up slogging through the log, trying to find the related error? Or have you ever wondered which of your renderings or sublayouts might be causing performance issues? Or maybe you have custom logic in a pipeline processor, and you wish there was an easy way to emit debugging or performance information?

There are a couple of underutilized features in Sitecore that can be real time savers in situations like this: Sitecore.Diagnostics.Tracer and Sitecore.Diagnostics.Profiler. These classes are sitting right next to the much-used Sitecore.Diagnostics.Log class, but are often overlooked. Both of these classes allow you to emit information directly to the page when in debug mode. This is a far more convenient and useful way to do debugging than writing to the Sitecore log.

The Profiler class is useful for tracking the progress of the page rendering cycle, and for gathering performance data about the components and code being executed. It has the usual Info, Warning and Error static methods you are probably familiar with from the Log class, but the real gems here are the StartOperation and EndOperation methods. With them, you can gather valuable performance data about parts of the rendering process, including both large chunks of a process and nested inner pieces of the process. For example, the Arke MetaTag Manager module emits quite a bit of information about its internal workings to the debugger.


The information provided not only helps us understand the performance impact of this process, but also helps us troubleshoot when we suspect that a tag is not being gathered properly.

To output this information, it's a simple matter of adding two lines of code: one call to StartOperation and one to EndOperation. For example, this is the Process method used for the CustomTag pipeline processor:


public void Process(InjectMetaTagsPipelineArgs args)
{

  Sitecore.Diagnostics.Profiler.StartOperation(
    "Adding custom tag '" + GetSignatureName() + "'"
    );

  try
  {
    Assert.ArgumentNotNullOrEmpty(
      TypeSignature, 
      "Class not supplied"
      );
    args.MetaTags.Add(ReflectionUtil.GetCustomTag(TypeSignature));
  }
  catch (Exception ex)
  {
    Sitecore.Diagnostics.Tracer.Error(
      "CustomTag failed.", 
      ex
      );
    Sitecore.Diagnostics.Log.Error(
      "CustomTag failed.", 
      ex, 
      "CustomTag"
      );
  }
  finally
  {
    Sitecore.Diagnostics.Profiler.EndOperation();
  }
}


Sitecore will nest sets of profiling blocks, which makes it easy to trace through a process to see where a problem or performance hit may be.
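To make the nesting a little more concrete, here is a minimal sketch of a wrapper that pairs each StartOperation with an EndOperation, followed by an outer block that contains two inner ones. The ProfiledWork and NavigationSample classes and the operation names are made up for this example; the only Sitecore calls used are the Profiler.StartOperation and Profiler.EndOperation methods shown above.

```csharp
namespace ProfilingSample
{
    using System;
    using Sitecore.Diagnostics;

    // Hypothetical helper, for illustration only.
    public static class ProfiledWork
    {
        // Wraps a piece of work in a profiling block. The finally clause
        // makes sure the block is closed even if the work throws.
        public static void Run(string operationName, Action work)
        {
            Profiler.StartOperation(operationName);
            try
            {
                work();
            }
            finally
            {
                Profiler.EndOperation();
            }
        }
    }

    // Hypothetical caller: the two inner blocks start and end inside the outer one.
    public static class NavigationSample
    {
        public static void Build(Action loadMenuItems, Action renderMenu)
        {
            ProfiledWork.Run("Building navigation", () =>
            {
                ProfiledWork.Run("Loading menu items", loadMenuItems);
                ProfiledWork.Run("Rendering menu", renderMenu);
            });
        }
    }
}
```

Keeping the start and end calls together in one place is just a convenience; it makes it harder to leave a block open by accident, which would throw off the nesting for everything that follows.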

The other handy class for "in-page" debugging is the Tracer class. Again, this class has the familiar Info, Warning and Error static methods, but the output from these methods is written to the "Trace" section of the page debugger. I use these methods for writing out error or info strings much the way I would to the Sitecore Log, but they're written to the page itself, where troubleshooting is much easier. It's always nice to just be able to turn on the debugger to see this info instead of having to search through the log file.

Again, looking at the sample code above, note that the "catch" block includes a line of code that writes the error to the tracer, so the failure shows up in the "Trace" section of the page debugger.

That beats searching for the error in the log file ... or worse, throwing a "yellow screen".

Happy tracing!