Tuesday, September 9, 2008

Open XML SDK V2.0 - How to read an Excel 2007 using Open XML Format SDK V2.0

Here's a quickie on navigating to a cell in an Excel 2007 XML document, using the Open XML Format SDK verison 2.0 released recently...

SpreadsheetDocument ssdocCellStyleTest = SpreadsheetDocument.Open("C:\\XML2\\BorderTest.xlsx", false);
Workbook workbook = ssdocCellStyleTest.WorkbookPart.Workbook;
WorksheetPart sheetpart;
foreach (WorksheetPart worksheetpart in workbook.WorkbookPart.WorksheetParts)
{
Worksheet worksheet = worksheetpart.Worksheet;
foreach (SheetData sheetData in worksheet.Elements())
{
foreach (Row row in sheetData.Elements())
{
MessageBox.Show("row index is " + row.RowIndex);
foreach (Cell cell in row.Elements())
{
MessageBox.Show("cell index " + cell.CellReference);
}

}
}
Stylesheet styleSheet = workbook.WorkbookPart.WorkbookStylesPart.Stylesheet;
styleSheet.Fonts.Elements();
foreach (CellFormat cellFormat in styleSheet.CellFormats.Elements())
{
MessageBox.Show("Cell Style ID " + cellFormat.FormatId);
MessageBox.Show("Cell has border " + cellFormat.ApplyBorder);
MessageBox.Show("Cell has fill " + cellFormat.FillId);
DocumentFormat.OpenXml.Spreadsheet.Fonts oFonts = styleSheet.Fonts;
DocumentFormat.OpenXml.Spreadsheet.Font oFont = (DocumentFormat.OpenXml.Spreadsheet.Font)oFonts.ChildElements[int.Parse(cellFormat.FontId.ToString())];
foreach (FontName oFontName in oFont.Elements())
{
foreach (OpenXmlAttribute attrib in oFontName.GetAttributes())
{
MessageBox.Show("Font attrib " + attrib.LocalName + " value " + attrib.Value);
}
}
}

So ... as you see every element of the SpreadsheetML language... is now an OpenXmlElement class, or either an OpenXmlLeafElement class... you are familiar with WorkbookPart, ThemePart etc.. now each part comes with an object like workbookPart has Workbook object, ThemePart has Theme object, StyleSheetPart has a stylesheet object.. which is a wrapper of the data in the xml stream of its corresponding part.
ITs stilll not that robust... as you can see... given the Font id and the stylesheet.Fonts collection object retrieving the font attributes is still difficult, because even b, u, i, name elements have classes Bold, Italic, Underline, FontName etc. (NOTE: Dont confuse this with System.Drawing objects, they are part of the DocumentFormat.OpenXml.Spreadsheet namespace)

However the theme object is a good wrapper. For example theme.themeElements()[0].ColorScheme.AccentColor.RGBColorInHex gives the rgb value of the theme color instantly.. we dont have navigate as we did in Fonts....

So its an inconsistant new SDK... there is no uniformity.... because DocumentFormat.OpenXml.Spreadsheet.Font is different from DocumentFormat.OpenXML.WordProcessing.Font....

Have to explore more on VML drawing extraction usig this new SDK...

Happy excelling!

Regards,
Kalpana

4 comments:

Anonymous said...

hi Kalpana..
Is there any specific reason for posting "Free Trojan condoms Ad" with this article..

Kalpana Anand Natraj said...

Hi Anonymous.. I don't decide what ads come up with the content. This is purely technical info.
Google places the ad automatically. I have no control over it.

I am sorry if it had offended you.

Thanks for visiting my page. Please visit agian. Let me know what kind of content you will like to see here.

Thanks,
Kalpana

Anonymous said...

I thought the post made some good points on extracting data, I use python for simple extracting data,data extraction can be a time consuming process but for larger projects like the web, files, or documents i tried http://www.extractingdata.com which worked great, they build quick custom screen scrapers, extracting data, and data parsing programs

Alex said...

I was at the Inet when I unexpectedly lost my excel files. To my good fortune a friend rapidly suggested me a resolution. He was right, reason of it solved out my troubles for a minute and absolutely easily. What is more the utility may help in this trouble also - Microsoft Excel repair tool.