OOXML Hacking: Document Repair

You have a crucial thesis or presentation that’s due in the morning, but when you try to open it, you get a message saying the file has an error. It may seem like the end of the road, but with a little XML hacking, you can repair your file in just a few minutes and be back to work. Document repair is something you can do yourself.

First, let’s look at different causes of file corruption. The number one cause is working on files while they are on temporary or removable media. A USB or flash drive is a convenient way to carry data. The common alternative is to keep your information in the Cloud. But both of these are hazardous if you’re editing files. Accidentally ejecting a USB stick or losing your Internet connection while a file is open in Office is a near-guarantee of corruption. This type of corruption is also disastrous, because the file contents are so thoroughly scrambled, there is no way to recover the data.

But there are also files that get scrambled by software and usually these are recoverable. We’ll use the same techniques covered in previous posts. Windows users should review XML Hacking: An Introduction, while OS X hackers need to follow these instructions: XML Hacking: Editing in OS X


Is the File Recoverable?

When opened in Office, unrecoverable files may give you errors like these:

Recover Text
Parts Missing

The first step is to rename a copy of the file with a .zip ending and expand it. An unrecoverable file (one scrambled by a USB or Cloud drive) will almost always raise an Zip error. Cut your losses, you’re not going to be able to fix this. As a second-best alternative, try opening the original damaged file in NotePad (on Windows) or Text Edit (OS X) to recover whatever text you can. You also might be able to extract some contents by opening in a different word processor, like Pages on a Mac.

By contrast, if you see the following messages, document repair is possible:

Illegal Character
XML Corruption Warning

You can see that the first 2 messages are generic, while the second 2 give a specific location for the error. This means the file is at least partially readable by the program.


Error-finding

There are quite a few document repair articles on the web that are worth reading for the variety of tools that people are using. I prefer a combination of a good text editor (NotePad++ on Windows, BBEdit on OS X), plus a modern browser like FireFox or Chrome. The text editor is where you do the editing, while the browser parses the XML and finds any errors.

You’ve already unzipped the document or presentation, now look for the XML portion that contains the error. Most of the time, with a Word file, document.xml will be the culprit. Open document.xml in the text editor and Prettify (NotePad++) or Tidy (BBEdit) it to make it readable. A raw document.xml file only has 2 lines, which is why the XML errors are invariably reported as being on line 2. Making the text readable also adds useful line numbers to error reports, making the errors much quicker to find. Now the file should look like this:

document.xml

Save document.xml, then open it in your choice of browser. This is how FireFox and Chrome show where the first error is:

Browsers View XML

As you can see, the report is a little more informative in FireFox. The error is a mismatched tag: a tag was opened but not closed. It expected to see the closing tag </mc:Fallback> and it tells you exactly where it thought that tag should be. The arrow points to the first character that is in error. The correct way to interpret this is that the expected end tag should be inserted immediately before the tag pointed to.


Document Repair Technique

Here’s what the error location looks like in the text editor:

Document Repair Error Location

Then here is what it looks like after inserting the closing tag (you can copy and paste directly from the browser window):

Step 1 Document Repair

Save document.xml in the text editor, then refresh the browser. The next error is shown:

Step 1 Result

Repeat the steps. Some files have only a couple of errors, others may have dozens. You’ll know when you’re done, because refreshing the browser will give you a different screen, displaying the XML instead of an error message:

Final Document Repair Result


Rebuild the File

Close the text editor and browser, then re-zip the folders and [Content_Types].xml, giving the zip file a new name and a file ending that matches the original. Open it to ensure it works. Office does not tolerate XML errors well and doesn’t give you clear error messages, so if the file doesn’t open, you missed something. In addition, Mac users have to use Terminal to zip and view files, as noted on the XML Hacking: Editing in OS X page.

Lots of people ask “How can I prevent this?”, but there isn’t a really good answer. If a file can be repaired, it’s almost always due to a program bug that writes malformed XML. In the Word file used for example, this is often when a placed graphic has no fallback information, which is supposed to help with graphic depiction in older file formats. It appears that the program omits the closing fallback tags when saving and you get the error. It’s not your fault, but Microsoft has not been able to find and eliminate this bug since the 2007 version.

OOXML Hacking: Linked Excel Charts

Time marches on and this post is obsolete. Please check out XML Hacking: Fix Broken PowerPoint Links for more current information on this topic.

OS X versions of Microsoft Office have always been the poor step-children in the Microsoft family. Always missing important features found only in the Windows alternatives. One of these obvious disparities has been in the area of linked Excel charts. In Windows, Microsoft uses their OLE technology to allow, for instance, an Excel workbook to be linked to a PowerPoint presentation.

The Excel workbook can still be edited independently. The charts can be revised based on new data, and when the presentation is opened, the updated information will be displayed. This is a powerful tool in many situations where information is changing rapidly and the presentation must stay current. This approach also leverages the inheritance of data. This allows users to have only one data source that drives updates in many different places.

Of course, OLE being a proprietary Microsoft technology, it has almost no support on other operating systems. The only way it appears in OS X is if an individual software vendor creates an instance that works with their code. Office for Mac has had its own tiny version of OLE that allows some, but not all the features found in Windows. You could only insert Office objects (forget about PDFs) and you couldn’t link, only embed.

Until now. With the release of Office 2016 for Mac, the tiniest crack of linkability has finally opened. Try these steps: Open Excel 2016 for Mac and create a chart. Select that chart and copy it. Open a presentation in PowerPoint and click on the down-pointing arrowhead beside the Paste button. Now your options include all of the following:

  • Use Destination Theme & Embed Workbook
  • Keep Source Formatting & Embed Workbook
  • Use Destination Theme & Link Data
  • Keep Source Formatting & Link Data
  • Paste as Picture

Options 1, 2 and 5 have always been available. The news is with 3 and 4, where linked data for charts becomes a new possibility. But along with this fresh opportunity comes a problem that hasn’t been addressed by Microsoft. It’s very nice to link charts, but the Microsoft default is always to hard code the link path. This means that moving the presentation and Excel source to a different computer destroys the links. The charts are no longer editable, because the link path has changed.

Remember the poor step-child analogy? Here it is again: Windows versions of PowerPoint allow you to edit the links in the program so you can fix the path problem. But no such facility exists on the Mac. To update those linked Excel charts, you need to … hack the XML!

If you’re new to XML hacking, please read my introduction to the subject. Since this topic is specific to OS X, it’s also vital to read XML Hacking: Editing in OS X as well. I assume that you have figured out the correct path to the Excel file on the computer where the presentation has been moved.

Updating linked Excel Charts with XML Editing

After unzipping the presentation, you’re going to look inside the folders for ppt/charts/rels. Office XML files are full of rels folders that contain the relationships between the components of the document. Each chart in the presentation consists of a file i.e. chart1.xml with a corresponding chart1.xml.rels inside the rels folder. The number in the chart name increments for each additional chart linked.

The contents of chart1.xml.rels looks like this:

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://schemas.microsoft.com/office/2011/relationships/chartStyle" Target="style1.xml" />
  <Relationship Id="rId2" Type="http://schemas.microsoft.com/office/2011/relationships/chartColorStyle" Target="colors1.xml" />
  <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/oleObject" Target="file://localhost/Users/server/Documents/Dockets/Test/Excel/LinkTest.xlsx" TargetMode="External" />
</Relationships>

The lines of code are long, please scroll to see where I’ve bolded the path and file name, this is the section you have to modify to update the linked Excel chart.

Just as a comparison, here’s the analogous information from a PowerPoint 2010 file. In this case, there is not a chart folder containing chart.xml files. Instead, the charts are part of the slide files and are found in slide1.xml. The rels file is slide1.xml.rels and it looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
	<Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/oleObject" Target="file:///I:\Dockets\Test\Excel\ExcelLink2010.xlsx!Sheet1!%5bExcelLink2010.xlsx%5dSheet1%20Chart%201" TargetMode="External"/>
	<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/slideLayout" Target="../slideLayouts/slideLayout7.xml"/>
	<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/vmlDrawing" Target="../drawings/vmlDrawing1.vml"/>
	<Relationship Id="rId4" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="../media/image1.emf"/>
</Relationships>

A close examination shows that much of the same information in a Mac file is also here, but the file and path is Windows-style. Using this information, you’re ready to update those linked Excel charts with the best of them!

OOXML Hacking: Editing in macOS

Note: I’ve included the original article text to describe the background issues about XML editing in macOS, but to retain your sanity, be sure to follow the May 2016 and July 2018 updates at the end and use a text editor that doesn’t require unzipping and rezipping the files

When you’re hand-editing Office files in Windows, it’s pretty straight-forward: unzip file > edit > rezip, you’re done. Editing in macOS requires a couple of extra precautions. This is because the graphical user interface adds Mac attributes to files and plants hidden files in folders. Office will not tolerate either of these:

Editing in macOS - The Open XML file cannot be opened because there are problems with the contents. Details The file is corrupt and cannot be opened.

XML error message in 2008


Editing in macOS - The Open XML file cannot be opened because there are problems with the contents or the file name might contain invalid characters (for example, \/). Details The file is corrupt and cannot be opened.

XML error message in 2011


Editing in macOS - The Open XML file cannot be opened because there are problems with the contents or the file name might contain invalid characters (for example, \/). Details The file is corrupt and cannot be opened.

XML error message in 2016

If you use macOS’s Archive Utility to unzip or zip the files, Word will refuse to open the resulting file. On top of that, if you look in any of the folders using the Finder, a hidden .DS_Store file will be created in the folder. When re-zipped, Word will not accept the extra file and again report an XML error. The solution to these issues is to use the command line, like the Unix warrior you want to be! Remember to run each Terminal command by pressing the Return key after typing the command.

A valuable utility for this is OpenTerminalHere. Open any Finder window, click on OpenTerminalHere and a terminal window opens pointed to the Finder window. So download and install it, then follow these steps to open, edit and re-zip Office files:

  1. Move a copy of the Office document (let’s call it TestDoc.docx) to a separate folder and open that folder in the Finder.
  2. Click on OpenTerminalHere to open a copy of Terminal aimed at the folder.
  3. In the Terminal, type
    unzip TestDoc.docx

    then press Return. The file is unzipped into several folders plus a file called [Content_Types].xml.

  4. Do not look in any of the folders using the Finder, or you’ll have to start over. To examine a folder’s contents, use the Terminal to change the folder, then list the contents:
    cd word

    ls -l
  5. To go back up to the previous folder, type:
    cd ..
  6. To edit the files, open your text editor, then navigate using the File>Open dialog to find the file. Edit the file, then save and close.
  7. When you’re all done, double-check that terminal is pointing at the original folder holding the documents and the expanded folders. If you’re unsure, close terminal, then click on OpenTerminalHere to reopen in the right spot.
  8. In Terminal, re-zip the files with this style of command:
    zip -r RevisedDoc.dotx [Content_Types].xml _rels docProps word

    This example is for Word, but the correct syntax after zip -r is to type the name of the final document, followed by the file and folders, each separated by a space. The file is reassembled into an Office file.

  9. Test that you can open it. If you get an XML error notice, re-read the above steps and try again.

Please note: these editing techniques are required when editing in macOS with Word, PowerPoint and Excel documents and templates, plus Office Theme files (the kind exported from PowerPoint that combine all Theme elements.

If, on the other hand, you are editing a Font Theme or a Color Theme, those are simple XML files. They don’t need to be unzipped or re-zipped and Office doesn’t seem to care about macOS attributes attached to them. These plain XML files don’t need to be handled through the terminal, just use the Finder.

Next time, we’ll be looking at managing Word styles in macOS. Finally, a way to get rid of the zombie styles automatically created by Word! Happy hacking!


March 2016 Update

An alternative to working entirely in Terminal is to work on a network or USB disk where creation od .DS_Store files has been turned off. On a network disk, open Terminal in your choice of folder and run the command:

defaults write com.apple.desktopservices DSDontWriteNetworkStores true

To use a USB disk, run this command instead:

defaults write com.apple.desktopservices DSDontWriteUSBStores true

While this will prevent future generation of the .DS_Store files in that folder and any subfolders, it’s very likely you already have such files, since they’re created almost as soon as you view a folder’s contents in the Finder. In addition, some important XML parts are hidden and need to be revealed. So while Terminal is open, run:

defaults write com.apple.finder AppleShowAllFiles YES

followed by:

killall Finder

The second line restarts the finder to force a refresh of the view. Now you can see any .DS_Store files and delete them before re-zipping the files into an Office document. You’ll have still have to do the zipping in Terminal. Also, no .DS_Store files means OpenTerminalHere doesn’t work, so you’ll have to navigate manually via Terminal commands. Now you know why this is a lame alternative.

If you try this technique, you can always restore the clean file view by running:

defaults write com.apple.finder AppleShowAllFiles NO
killall Finder

Once you’ve created this OOXML editing drive, you can use the command-line zip utility to unzip the files. But there’s also a very useful GUI utility that works better than Archive Utility with Office files. Visit the App Store and get The Unarchiver. Then use it to unzip and expand the Office file.


Editing in macOS – May 2016 Update

BBEdit 11 and better has the ability to open and edit Office files directly, avoiding all of the above hassle when editing in macOS. BBEdit has a 30-day free trial with all features included. While older versions of BBEdit used Tidy to format text, that utility has been retired. The BBEdit programmers have written a script to format XML in human-readable form. You can download it from here, please be sure to read the installation instructions first: Click to download XML Tidy Script for BBEdit

Here’s your working procedure:

  1. Open your Office file in BBEdit 11 or later. In the left-hand pane, you’ll see a folder tree of the files contained within, so no unzipping is required
  2. Select the file you want to edit. The file opens in the main BBEdit window, displaying two lines. The first is the XML header, the second is the actual content.
  3. Click at the left end of the second line.
  4. Choose Text>Apply Text Filter>run_tidy.
  5. Make your edits and save. It’s not necessary to linearize the XML. The Office program will do that anyway the first time you save it. However, if you like to leave things exactly the way you found them, click in from of the first line of content (after the header line), choose Markup>Utilities>Format…, change the Mode to Compact and click on the Format button. Save the file and test your editing in macOS.


Editing in macOS – July 2018 Update

Technology marches on! If you use the Chrome browser, there is a free XML editing alternative that avoids unzipping and rezipping files. Open this link in Chrome: OOXML Tools and download the free plugin. After installation, click on the OOXML icon to the right of the browser address bar. Drag your Office files onto the browser window to begin editing. When you’re finished, click on the Save button, then the Download button in the upper left corner and give the new file an appropriate name. Chrome will place the new file in your Downloads folder and leave the original file untouched. OOXML’s EMF/WMF bug has been fixed, so download the most recent version. Thanks to Bram Alkema of the Netherlands for informing us about OOXML Tools.

Please note, for any OOXML Hacking that requires adding new XML parts (Ribbon mods, creating SuperThemes), BBEdit and OOXML Tools will not work. You’ll have to use the March 2016 update solution and create a network or USB disk set up for XML editing.

We’re experts in XML hacking, so you don’t have to be. Contact me at production@brandwares.com with the details of what you need hacked.

OOXML Hacking: Font Themes

Font themes are one of the simpler theme elements in Open Office XML, but for some baffling reason, Mac Office users can’t create one. It’s odd enough that the only Mac program that can create a color theme is PowerPoint, but even it can’t provide an escape from Calibri and Arial! So I’m going to show you how to do it on your own.

Let’s start with a dead-simple font theme. Here’s the minimal file that Office will read:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<a:fontScheme xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" name="Test">
  <a:majorFont>
    <a:latin typeface="Arial"/>
    <a:ea typeface=""/>
    <a:cs typeface=""/>
  </a:majorFont>
  <a:minorFont>
    <a:latin typeface="Arial"/>
    <a:ea typeface=""/>
    <a:cs typeface=""/>
  </a:minorFont>
</a:fontScheme>

Important Note: If you copy and paste this sample, you must change the non-breaking space characters to ordinary spaces. I need to use non-breaking spaces to format an HTML page, but Office will refuse to display your font theme if you don’t search and replace them with regular spaces.

You can create this in any text editor, including TextEdit in plain text mode (don’t try this with an rtf file). However, by default TextEdit will change the necessary straight quotes to smart quotes, producing a file that Office will not recognize. If you’re using TextEdit, make sure you visit both TextEdit>Preferences and Edit>Substitutions and turn off Smart Quotes in both locations. A better alternative is the free version of BBEdit. When you visit this link, click on the Download link to get the free version. If you do any significant amount of XML editing, the paid version of BBEdit is well worth the $50 price tag.

The most common font theme problem is using smart quotes (Hex 201C + 201D, Decimal 8220 + 8221) other than plain straight quotes (Hex 22, Decimal 34). But you can also ruin a font theme by using non-breaking spaces (Hex A0, Decimal 160) instead of regular spaces (Hex 20, Decimal 32). Even though a font theme is encoded in UTF-8, you should only use plain ASCII characters for the text. XML has a low tolerance for non-standard characters.

Now that you’re set up to edit, copy and paste the font theme file. The <a:latin> tag is the standard font for your theme. <a:majorFont> is for headings and <a:minorFont> for text. Fill in <a:ea> with a font that supports Chinese or Japanese (ea stands for East Asian), if you want to support those languages. The <a:cs> tag stands for complex scripts: Arabic, Thai, Hebrew and many more. For more detail on non-European language support in font themes, please see my article XML Hacking: Font Themes Complete. Or you can just leave those tags blank if you have a predictable user base that won’t require them.

A common mistake is to get too specific with the font name in font themes. The name is only the base font name as displayed in Powerpoint’s font menu. “Open Sans” will work, but “Open Sans Extrabold” will cause Word 2011 to display a blank space where the font theme should be, while Word 2016 will simply ignore the entire file.

Save the file as a text file with a .xml ending and give it the name you want to appear in the user interface. “Brandwares.xml” will appear in the Font Theme menu as Brandwares.

For Office 2016, 2019 and 2021 save this file to Users/YourUserName/Library/Group Containers/UBF8T346G9.Office/User Content/Themes/Theme Fonts. For Office 2011, save it to Users/YourUserName/Library/Application Support/Microsoft/Office/User Templates/My Themes/Theme Fonts. In current versions of OS X, the user Library is hidden by default. To open it, hold down the Option key, while clicking on the Go menu and choosing Library.

Once it’s correctly installed, it will show in PowerPoint’s Slide Master view under the Fonts dropdown. A new Custom group will appear at the top of the list, with your font theme in it. Once you apply it and a color theme to a presentation, you can save as a theme file and distribute that to your users, it will contain the font theme you just created. Happy hacking!


Font Themes – An Alternate Method

March 2017 edit: If you have any problems creating a font theme from scratch, here’s a workaround. Open an existing font theme that come with Office and edit the font names to the ones you want to use. These files are the verbose style discussed in this article: XML Hacking: Font Themes Complete. For most uses, you only need to set the a:latin font in the a:majorfont and a:minorfont sections. Here’s where you can find the Microsoft Font Themes:

Office 2011 for Mac – Open Applications/Microsoft Office 2011/Office/Media/Office Themes/Theme Fonts and copy any of the XML files.

Office 2016 or 2019 for Mac – Open Applications, then right-click on Microsoft PowerPoint and choose Show Package Contents. Open Contents/Resources/Office Themes/Theme Colors and copy any of the XML files in there.

Here are the locations for 32-bit versions of Windows. If you’re using a 64-bit version of Windows, check the same path inside C:\Program Files (x86).

Office 2007 for Windows – Open C:\Program Files\Microsoft Office\Document Themes 12\Theme Fonts.

Office 2010 for Windows – Open C:\Program Files\Microsoft Office\Document Themes 14\Theme Fonts.

Office 2013 for Windows – Open C:\Program Files\Microsoft Office\root\Document Themes 15\Theme Fonts.

Office 2016 or 2019 for Windows – Open C:\Program Files\Microsoft Office\root\Document Themes 16\Theme Fonts.

Too complicated? We can help! Brandwares is a full service template creation service for all Office programs. Contact me at production@brandwares.com

OOXML Hacking: Table Styles Complete

Custom Table Styles are probably one of the more detailed hacks you’ll have to write. See the constructions details in my previous post. Besides the basic table format, there are 6 optional format layers you need to at least consider. In a minimal table style, you’ll need to include at least the Header Row, First Column and Banded Rows. Most users will expect to see these options. Total Rows, Last Columns and Banded Columns are less requested, you only need to include them if a design or client specifically requires them.

As mentioned in part 1, if you haven’t hacked XML before, please read XML Hacking: An Introduction. If you’re using a Mac, you should also read XML Hacking: Editing in OS X. In addition, an essential companion to this pair of articles is the post on setting Default Table Text, which is set in a different XML component..

Let’s take a look at how our work appears in the PowerPoint interface. First, we’ll insert a plain vanilla table. By default this takes on colors and fonts from the current PowerPoint theme:

Default Table Style

Next, we choose the Table Tools>Design tab, open the Table Styles gallery. Up at the top a new Custom section has appeared with our new custom table style:

Select Custom Table Style

Select the custom table style and the default table changes to match our design. This screen shot has all formatting options turned off, so effectively we are seeing the Whole Table formatting only.

All Table Style Options Off

Table Style Options: Banded Rows and Header

Using the options panel in the upper left corner, we can add some of optional formatting layers we created in XML. First, let’s turn on banded rows. If you remember, we only formatted odd-numbered rows, so the banding only changes rows 1 and 3 in our example:

Table Style Banded Rows

Next, we’ll leave banded rows on and also add the Header row. This row doesn’t count as part of the table body, so the banding moves down 1 row:

Banded Rows and Header

Table Style Options: First and Last Columns

Next, we’ll turn off banded rows, leave the Header as is and add the first column:

Table Style Header Row and First Column

Here’s the table with First and Last Columns checked:

Table Style First and Last Columns

Table Style Options: Header and Total

And finally, Header and Total Rows:

Table Style Header and Total Rows

As you can see, with some pre-planning, one table style can cover quite a few related table looks. The layer options for different features make the table useful for many different purposes and the options panel makes it fast and easy for users to try different combinations. This feature is a major advance over tables in PowerPoint 2003 and earlier, which were quite crude by comparison.Table styles a similar way in Word, PowerPoint and Excel. While Word and Excel include table style editors in their interface, PowerPoint needs to be hacked to create them. Sadly, the table style specs for the different parograms are dissferent, so it’s not possible to trasfer tables style OOXML among Word, Excel and PowerPoint.

Table Styles: What You Can’t Do

You cannot set vertical cell/row alignment or cell margins in default taxt table text or a table style. It would have been possible given the OOXML spec, but Microsoft just didn’t bother.

Of course, if the process is too complex, we’re here to help. The current price on a custom table style is US$120. Just email me production@brandwares.com