Monday, 17 September 2007 10:48

Linux and Windows interoperability with OpenXML

In the past, I have been less than flattering about Microsoft’s OpenXML document format. Make no mistake, an open file format is definitely a must-have, but is OpenXML the right one? Nevertheless, OpenXML exists, so let’s be pragmatic about it: OpenXML can bridge the gap between Windows and Linux. In fact, it can open up whole new opportunities for Linux coders to produce documents in a Microsoft-friendly format.
Previously, I questioned the need for a new standard given there already is a standardised open document format. I also wondered how significant Ecma ratification was anyway, given Ecma's own web site says it is not a standards body but a vendor-supported body that handles the red tape of pushing documents through the ISO processes.

Before we begin, let's be clear about two things: firstly, my views do not necessarily reflect those of iTWire or its editors. Secondly, this story is not making any value judgment on OpenXML vs ODF. Instead, I'm saying, "ok, OpenXML exists. Here's a bunch of stuff you can do with it." Perhaps some might say you could achieve the same things if only Microsoft adopted the existing ISO standard instead, and that's cool; just because I say "here's something you can do in OpenXML" you shouldn't interpret that as meaning "OpenXML is the greatest thing in the world."

Now, credit where credit is due, this story didn’t come to me from my own keen thought processes. Rather, Microsoft expressed disappointment at my views and said “a better story” would have been the positive benefits OpenXML can bring to interoperability between different operating systems, namely Windows and Linux.

That's a fair point; Microsoft have published the specifications for the XML-based file formats used throughout their latest Office suite applications. Never before have alternative applications had the same opportunity to offer 100% compatibility. In the past, rival suites which purported to open such documents could not guarantee complete success: this is no longer the case and major vendors are taking it up. The Novell edition of OpenOffice supports OpenXML. The Gnumeric open source spreadsheet supports OpenXML. And Corel have announced their support of OpenXML.

What’s more, an XML-based specification has a massive advantage over a binary file format: by virtue of using XML, a program’s support for OpenXML does not have to be full-blown. It’s now possible to produce simple utilities which operate over Microsoft Office documents, performing their own work but safely ignoring everything else at no risk of damaging the document. Such a utility might be a command-line global search-and-replace, for example, which swiftly processes a batch of documents in one sweep. Or perhaps a utility which reduces the colour depth (and thus dramatically shrinks the file size) of all embedded images within a set of specified documents. In these hypothetical cases the developer would not have to understand or implement the entire OpenXML specification. (Although to find the relevant information, they would still have to wade through the 6,000-page description!)
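
To make the search-and-replace idea concrete, here is a minimal sketch in Python using only the standard library. It is an illustration, not code from any of the projects mentioned in this story: the function name `replace_in_docx` is made up for the example, and it assumes the document text lives in `word/document.xml`, is UTF-8 encoded, and uses the conventional `w:` namespace prefix. Every other part of the package is copied through untouched.

```python
import re
import zipfile
from xml.sax.saxutils import escape

# Matches a WordprocessingML text element such as <w:t>...</w:t>.
# Assumption: the document uses the conventional "w" namespace prefix.
_TEXT_ELEMENT = re.compile(r"(<w:t(?:\s[^>]*)?>)([^<]*)(</w:t>)")


def replace_in_docx(src, dst, old, new):
    """Copy the package src to dst, replacing old with new in the body text.

    src and dst may be paths or file-like objects and must be different.
    Returns the number of replacements made.
    """
    # Text inside the XML is escaped, so escape both strings the same way.
    old_xml, new_xml = escape(old), escape(new)
    count = 0

    def swap(match):
        nonlocal count
        opening, text, closing = match.groups()
        count += text.count(old_xml)
        return opening + text.replace(old_xml, new_xml) + closing

    with zipfile.ZipFile(src) as zin, \
            zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                # Only the main document part is touched.
                data = _TEXT_ELEMENT.sub(swap, data.decode("utf-8")).encode("utf-8")
            # Images, styles and everything else pass through unchanged.
            zout.writestr(item, data)
    return count
```

Called as `replace_in_docx('in.docx', 'out.docx', 'Windows', 'Linux')` (both file names are placeholders), it returns the number of replacements made. One caveat: a word processor may split a phrase across several `w:t` elements, in which case a simple sketch like this will miss it.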

This is the approach crafty developers worldwide have been taking, and a repository of these utilities is being built with RSS subscription available. Here are some of the more interesting ones.


One developer demonstrates how to dynamically create invoices in Microsoft Excel format within a PHP-powered web application.

This code sample is compelling; firstly, the originating server can be a Linux box or indeed any other machine that is capable of hosting PHP. There need not be any proprietary libraries or run-times installed. In fact, previously, generating Excel documents on-the-fly was a tedious process even for Windows developers; a .NET web site had to rely on “primary interop assemblies” (PIAs) to communicate with non-.NET Office APIs.

Secondly, it does have to be said, if you’re sending a spreadsheet then Excel format is the safest pragmatic bet: the recipient is more likely to have Excel or an Excel-compatible spreadsheet application than software for any other format. True, at this time Office 2007 doesn’t have a major foothold but this will undoubtedly grow.

Now, perhaps dynamic creation of invoices isn’t really your cup of tea. What makes this bit of code really terrific, though, is that it serves only to demonstrate a larger piece of work by the author: a reusable open source library of Excel 2007 reading/writing routines called PHPExcel.

The implemented features thus far are quite impressive; an in-memory spreadsheet can be represented, with worksheets, data and formulas. Protection is enforced, as is formatting, including the expected font changes and more complex items like gradient fills. Images may be added and various styles set, along with printing options and saving to several file formats.

Having a library like this means that all the complex work involved in reading and writing the OpenXML format is taken care of. The main app becomes merely a dead simple sequence of calls like so, allowing the developer to focus on the problem at hand:

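<?php
// Load the PHPExcel library and its Excel 2007 (OpenXML) writer.
include 'PHPExcel.php';
include 'PHPExcel/Writer/Excel2007.php';
// Build the workbook in memory: a title cell and a formula cell.
$objPHPExcel = new PHPExcel();
$objPHPExcel->getActiveSheet()->setCellValue('B1', 'Invoice');
$objPHPExcel->getActiveSheet()->setCellValue('E4', '=C4*D4');
// Write the workbook to disk; 'invoice.xlsx' is only an illustrative file name.
$objWriter = new PHPExcel_Writer_Excel2007($objPHPExcel);
$objWriter->save('invoice.xlsx');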

Producing Excel documents in PHP has never been easier.

Creating Word documents in pure Java

Continuing the trend, another team of developers have devised Java code which generates valid OpenXML word processing documents without any use of the Office client applications, or any Microsoft APIs or libraries, and indeed, without even requiring a Microsoft operating system.

The intention of this code is to assist developers who work in Java on Linux or Macintosh or any other non-Microsoft environment, and also developers building server-side applications that wish to produce Office-compatible documents to present data and reports.

Actually, to be precise, OpenXML covers a set of XML document standards; SpreadsheetML is the subset relating to spreadsheets, which PHPExcel is striving to implement, while this Java code implements the WordprocessingML side of OpenXML.
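
To show how little is needed for a valid WordprocessingML package, here is a minimal sketch in Python (standard library only). It is not the Java sample described above, and the function name `write_docx` is invented for this illustration. The three parts, their content types and their namespaces are what I understand the specification to require for a bare-bones document; treat them as assumptions to check against the spec.

```python
import zipfile
from xml.sax.saxutils import escape

XML_DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of every part in the package.
CONTENT_TYPES = (
    XML_DECL +
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Points a consuming application at the main document part.
ROOT_RELS = (
    XML_DECL +
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)


def write_docx(dst, paragraphs):
    """Write each string in paragraphs as one plain paragraph.

    dst may be a path or a file-like object.
    """
    body = "".join(
        '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(text)
        for text in paragraphs
    )
    document = '%s<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (
        XML_DECL, W_NS, body)
    with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("word/document.xml", document)
```

A call such as `write_docx('report.docx', ['Hello from Linux'])` (the file name is a placeholder) needs nothing beyond a Python interpreter, which is the same point the Java samples make: no Office installation and no Microsoft libraries are involved.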

More coders have jumped in with Java snippets. Another sample shows creating a document, adjusting its properties and thumbnail, adding text and converting to HTML output, and several more can be found.

The pinnacle of them all, however, is OpenXML4J, an open-source library for Java developers that provides classes for OpenXML development. It’s in pure Java, meaning it’s usable anywhere you have a standard Java compiler, library and runtime. Just like PHPExcel, this library can be used by developers to manage all the mechanics of OpenXML document construction and manipulation, making working with Word, Excel and PowerPoint documents a breeze on any platform.

ODF to OpenXML and back again

Another project gaining traction is the ODF to OpenXML translator package. This title is potentially misleading; the project doesn’t just convert ODF (Open Document Format) documents to OpenXML but also allows conversion the other way. This is actually among the top 25 projects on SourceForge.

The development goals for this team are to make plugins that provide interoperability between applications based on ODF and OpenXML. A core deliverable is the development of add-ins for Microsoft Office which permit both opening and saving of ODF files. Unfortunately, no such add-ins appear to be underway or planned for OpenOffice but a secondary deliverable is a series of command-line translator utilities to perform batch conversions in either direction. These utilities can also be run on servers, invoked by server-side applications.

The conversion process is essentially based on performing XSL transformations between the two distinct XML formats, along with necessary pre- and post-processing to manage the zip file packaging and some other housekeeping.
This project is open source, but is being developed by several commercial providers including an international software company who have in the past produced an OpenOffice converter for Word 2003.

The applications can be used to allow Microsoft Office to work with ODF documents created by and intended for use by ODF-compliant applications on Linux or other platforms. However, disappointingly, the applications themselves are compiled for 32-bit Windows environments only, and hence only run on that platform.

Real-world scenarios

With technology like that described here, handling OpenXML within Linux is a snap. We’ve mentioned possibilities for small utilities, and we’ve presented code to produce invoices on the fly. For something more substantial, consider some real-world possibilities.

As a banking example, a commercial bank’s website could let customers check their current balance and then, with a simple click, download and open a spreadsheet generated on the fly by the server. This spreadsheet may include all the user’s account data. They may now work with this data to simulate loans or other operations, sum the interest paid during a financial year, and so on.

Similarly, an energy company might let customers check their electricity consumption and download a dynamically-generated spreadsheet with formulas and customer data, which can be merged with data from other sources for ad-hoc analysis.
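
For a feel of what the server side of such a download involves, here is a minimal sketch in Python (standard library only) that writes a single-sheet SpreadsheetML workbook with text, numbers and a formula. It is an illustration only: the names `write_xlsx`, `sheet_xml`, `cell_xml` and `column_name` are invented here, and the part names, content types and namespaces are what I understand the specification to require. Strings are stored inline to avoid a shared-strings part, and formula cells carry no cached result on the assumption that the spreadsheet application calculates them on opening.

```python
import zipfile
from xml.sax.saxutils import escape, quoteattr

XML_DECL = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
DOC_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"
SHEET_TYPE = "application/vnd.openxmlformats-officedocument.spreadsheetml"


def column_name(index):
    """Convert a zero-based column index to letters: 0 -> A, 26 -> AA."""
    name, index = "", index + 1
    while index:
        index, rem = divmod(index - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def cell_xml(ref, value):
    """Render one cell: '=...' strings are formulas, numbers are values, the rest text."""
    if isinstance(value, str) and value.startswith("="):
        return '<c r="%s"><f>%s</f></c>' % (ref, escape(value[1:]))
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return '<c r="%s"><v>%s</v></c>' % (ref, value)
    return '<c r="%s" t="inlineStr"><is><t>%s</t></is></c>' % (ref, escape(str(value)))


def sheet_xml(rows):
    """Render a list of rows (each a list of cell values) as a worksheet part."""
    body = ""
    for r, row in enumerate(rows, start=1):
        cells = "".join(
            cell_xml("%s%d" % (column_name(c), r), v) for c, v in enumerate(row)
        )
        body += '<row r="%d">%s</row>' % (r, cells)
    return '%s<worksheet xmlns="%s"><sheetData>%s</sheetData></worksheet>' % (
        XML_DECL, MAIN_NS, body)


def write_xlsx(dst, rows, sheet_name="Sheet1"):
    """Write a single-sheet workbook to a path or file-like object."""
    parts = {
        "[Content_Types].xml": (
            '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
            '<Default Extension="rels" '
            'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
            '<Default Extension="xml" ContentType="application/xml"/>'
            '<Override PartName="/xl/workbook.xml" ContentType="%s.sheet.main+xml"/>'
            '<Override PartName="/xl/worksheets/sheet1.xml" '
            'ContentType="%s.worksheet+xml"/></Types>' % (SHEET_TYPE, SHEET_TYPE)),
        "_rels/.rels": (
            '<Relationships xmlns="%s"><Relationship Id="rId1" '
            'Type="%s/officeDocument" Target="xl/workbook.xml"/></Relationships>'
            % (PKG_REL, DOC_REL)),
        "xl/workbook.xml": (
            '<workbook xmlns="%s" xmlns:r="%s"><sheets>'
            '<sheet name=%s sheetId="1" r:id="rId1"/></sheets></workbook>'
            % (MAIN_NS, DOC_REL, quoteattr(sheet_name))),
        "xl/_rels/workbook.xml.rels": (
            '<Relationships xmlns="%s"><Relationship Id="rId1" '
            'Type="%s/worksheet" Target="worksheets/sheet1.xml"/></Relationships>'
            % (PKG_REL, DOC_REL)),
    }
    with zipfile.ZipFile(dst, "w", zipfile.ZIP_DEFLATED) as package:
        for name, xml in parts.items():
            package.writestr(name, XML_DECL + xml)
        package.writestr("xl/worksheets/sheet1.xml", sheet_xml(rows))
```

With made-up figures, `write_xlsx('usage.xlsx', [['Month', 'kWh', 'Rate', 'Cost'], ['July', 412, 0.15, '=B2*C2']])` produces a workbook in which the customer can change the rate and watch the cost update. A web application would write to an in-memory buffer instead of a file and stream the bytes back to the browser.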

For knowledge workers, an OpenXML app might generate presentations on demand from several slide decks stored on a web server. Presentations can be quickly compiled, adding or removing or shuffling slides as required.

The OpenXML4J project presents other scenarios that can be imagined.

Now, sure, Microsoft ASP.NET developers can do all these things with the .NET framework and Microsoft Office already. And Linux developers could do all this with other document formats. But, pragmatically, OpenXML opens up the popular and widespread Microsoft Office application to developers and applications worldwide, without imposing any constraints on the server or desktop technology used. A verdant world of interoperability between diverse operating systems is opened up, giving the user a rich user interface and experience.

Further reading

The Ecma TC45 committee have produced copious amounts of paperwork, so you probably wouldn’t want to start your explorations into OpenXML with them. Instead, OpenXML Explained, the first OpenXML book and an easily-digested 128-page publication, can be purchased or, better still, downloaded free as a PDF.

For any person who is keen to produce their own code that works with OpenXML documents – whether consuming or constructing – this is an excellent resource and reference and tutorial all in one.

(Now, did we say everyone should adopt OpenXML? No, we said "if you want to work with OpenXML documents, here's some code snippets." - have fun :)





David M Williams

David has been computing since 1984 where he instantly gravitated to the family Commodore 64. He completed a Bachelor of Computer Science degree from 1990 to 1992, commencing full-time employment as a systems analyst at the end of that year. David subsequently worked as a UNIX Systems Manager, Asia-Pacific technical specialist for an international software company, Business Analyst, IT Manager, and other roles. David has been the Chief Information Officer for national public companies since 2007, delivering IT knowledge and business acumen, seeking to transform the industries within which he works. David is also involved in the user group community, the Australian Computer Society technical advisory boards, and education.

