NOTE: This post repeats one that I wrote last June. That post resides on another blog server that is no longer public, so, in response to a recent question about transforming InfoPath rich text into Word, I have decided to re-post it. The content is still valid, even though Microsoft has since released its XSLT Inference Tool.

Many of our business solutions require an XSLT process for transforming InfoPath form data into Microsoft Word 2003 documents. Overall, development of this type of transformation is pretty straightforward. First, the Word document template is built and then saved as an XML file, giving developers access to the underlying WordprocessingML (WordML) content. The WordML header information—which consists of definitions for styles, fonts, lists, and custom document properties—is then incorporated into the XSL stylesheet that is used to process the InfoPath XML content.
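As a rough illustration of that layout, here is a minimal stylesheet skeleton. It is only a sketch: the my namespace URI and the my:myFields/my:comments path are hypothetical stand-ins for a real form's schema, and the single style definition stands in for the much larger header that Word writes when the template is saved as XML.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:my="urn:example:infopath-form"
    exclude-result-prefixes="xhtml my">
    <!-- The my namespace URI above is a placeholder; a real form has its own. -->

    <xsl:output method="xml" encoding="UTF-8"/>

    <xsl:template match="/">
        <!-- Marks the result as a Word 2003 XML document. -->
        <xsl:processing-instruction name="mso-application">progid="Word.Document"</xsl:processing-instruction>
        <w:wordDocument>
            <!-- Header content copied from the saved Word template
                 (fonts, lists, styles, document properties) goes here. -->
            <w:styles>
                <w:style w:type="paragraph" w:styleId="Paragraph">
                    <w:name w:val="Paragraph"/>
                </w:style>
            </w:styles>
            <w:body>
                <!-- Hypothetical rich-text field; its XHTML children are
                     handled by separate paragraph and inline templates. -->
                <xsl:apply-templates select="my:myFields/my:comments/xhtml:*"/>
            </w:body>
        </w:wordDocument>
    </xsl:template>
</xsl:stylesheet>
```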

The tricky part of the stylesheet is accommodating character formatting that appears in the XML input. Many of the InfoPath forms that we develop contain rich-text fields, which allow users to add XHTML content, as in the following example:

<div xmlns="http://www.w3.org/1999/xhtml">The <strong>brown</strong> fox jumped over the fence.</div>

Any inline formatting, like the strong element above, creates a challenge in processing paragraph-level elements such as div and p. In WordML, the concept of “runs” is used within paragraph elements. A run defines the formatting properties for a particular string of text. Every time the character formatting within a paragraph changes, a new run is created. So, for the XHTML sample shown above, three separate runs (represented as sibling w:r elements) would appear in the WordML:

<w:p>
    <w:pPr>
        <w:pStyle w:val="Paragraph" />
    </w:pPr>
    <w:r>
        <w:t>The </w:t>
    </w:r>
    <w:r>
        <w:rPr>
            <w:b />
        </w:rPr>
        <w:t>brown</w:t>
    </w:r>
    <w:r>
        <w:t> fox jumped over the fence.</w:t>
    </w:r>
</w:p>

To create the separate runs, each character-formatting template in the XSL file emits unmatched w:t and w:r start and end tags. These tags close the run for the text that precedes the formatted span, create a run for the formatted text, and then open a new run for the text that follows. Because a stylesheet must itself be well-formed XML, it cannot contain a start tag without a matching end tag (or vice versa), so the delimiter characters of these tags are escaped and written out with disable-output-escaping. The following template shows how inline bold text is processed:

<xsl:template match="xhtml:b | xhtml:strong">
    <!-- Close the run that holds the text before the bold span. -->
    <xsl:text disable-output-escaping="yes">&lt;/w:t&gt;&lt;/w:r&gt;</xsl:text>
    <!-- Emit a complete run for the bold text itself. -->
    <w:r>
        <w:rPr>
            <w:b/>
            <xsl:call-template name="output-character-formatting"/>
        </w:rPr>
        <w:t><xsl:apply-templates/></w:t>
    </w:r>
    <!-- Reopen a run for the text that follows the bold span. -->
    <xsl:text disable-output-escaping="yes">&lt;w:r&gt;</xsl:text>
    <w:rPr>
        <xsl:call-template name="output-character-formatting"/>
    </w:rPr>
    <xsl:text disable-output-escaping="yes">&lt;w:t&gt;</xsl:text>
</xsl:template>
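The template above relies on a run and its w:t already being open when an inline element is reached. The paragraph-level template is not shown in this post, so the following is only a guess at its shape: a minimal sketch that assumes the "Paragraph" style from the WordML sample and the same xhtml and w prefixes.

```xml
<!-- Sketch only: assumes the xhtml and w prefixes are declared on
     xsl:stylesheet, as they are for the inline templates. -->
<xsl:template match="xhtml:div | xhtml:p">
    <w:p>
        <w:pPr>
            <w:pStyle w:val="Paragraph"/>
        </w:pPr>
        <!-- Open the first run; inline templates close and reopen it
             around each formatted span. -->
        <w:r>
            <w:t><xsl:apply-templates/></w:t>
        </w:r>
    </w:p>
</xsl:template>
```

In the stylesheet every start tag still has its end tag. The runs only appear as siblings once the escaped tags from the inline templates are written into the serialized output.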

The output-character-formatting named template handles combined character-formatting properties (e.g., bold-italic text). It checks the ancestors of the current node and emits the corresponding WordML formatting properties.

<xsl:template name="output-character-formatting">
    <xsl:if test="ancestor::xhtml:i or ancestor::xhtml:em">
        <w:i/>
    </xsl:if>
    <xsl:if test="ancestor::xhtml:b or ancestor::xhtml:strong">
        <w:b/>
    </xsl:if>
    <xsl:if test="ancestor::xhtml:u">
        <w:u w:val="single"/>
    </xsl:if>
    <xsl:if test="ancestor::xhtml:strike">
        <w:strike/>
    </xsl:if>
    <xsl:if test="ancestor::xhtml:sup">
        <w:vertAlign w:val="superscript"/>
    </xsl:if>
    <xsl:if test="ancestor::xhtml:sub">
        <w:vertAlign w:val="subscript"/>
    </xsl:if>
</xsl:template>
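For comparison, here is a variant that needs no unmatched tags. It is not the approach described in this post, only a sketch of an alternative that may be useful because XSLT 1.0 processors are not required to support disable-output-escaping, and the attribute only takes effect when the result is serialized. Each text node becomes one complete run, with its properties derived from the same kind of ancestor tests. The sketch assumes block elements are not nested and shows only bold and italic.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml"
    exclude-result-prefixes="xhtml">

    <!-- One w:p per block element (assumes blocks are not nested). -->
    <xsl:template match="xhtml:div | xhtml:p">
        <w:p>
            <w:pPr>
                <w:pStyle w:val="Paragraph"/>
            </w:pPr>
            <!-- Inline elements have no template of their own, so the
                 built-in rule simply descends to their children. -->
            <xsl:apply-templates/>
        </w:p>
    </xsl:template>

    <!-- One complete w:r per text node inside a block element. -->
    <xsl:template match="xhtml:div//text() | xhtml:p//text()">
        <w:r>
            <w:rPr>
                <xsl:if test="ancestor::xhtml:b or ancestor::xhtml:strong">
                    <w:b/>
                </xsl:if>
                <xsl:if test="ancestor::xhtml:i or ancestor::xhtml:em">
                    <w:i/>
                </xsl:if>
                <!-- Other properties would follow the same ancestor test. -->
            </w:rPr>
            <w:t><xsl:value-of select="."/></w:t>
        </w:r>
    </xsl:template>
</xsl:stylesheet>
```

For the sample div shown earlier, this produces three sibling runs, one per text node, with w:b set only on the run for "brown".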