C++ ExcelFormat Library.

2013. 10. 18. 16:02C++

http://www.codeproject.com/Articles/42504/ExcelFormat-Library

Introduction

There has been the C++ project BasicExcel for quite some years now. It is very useful to read and write Excel sheets in.xls format. However, it contains only very basic functionality. There are missing features like text formatting using fonts, and display formats and formulas are not yet supported. This article covers this additional functionality. The new, derived project ExcelFormat is based on the old BasicExcel code, and just adds what I was missing for my own needs. If you want to see where I am using the new library, you can look at the Export functions of Service Managerand Uninstall Manager.

XLS Format

This library processes Excel files in the BIFF8 XLS file format. For a detailed documentation of the format and all its internals, there are two documents available:

Compatibility

You can use the code on MS Windows with the MSVC compilers of VC++ 6.0 SP3 or later. Using GCC on MacOSX or Linux is also supported, but with some limitations. May be it even works in other environments like Sun Solaris, but this is not yet tested. Here you can see a table of environments with their respective status:

small XLS fileslarger XLS files
WIN32, using Windows API IStorageOKOK
WIN64, using Windows API IStorageOKOK
MacOS X / Linux 32 bit, using BasicExcel CompoundFile implementationOKissues
MacOS X / Linux 64 bit, using BasicExcel CompoundFile implementationOKissues

The issues listed above are because of the incomplete CompoundFile implementation of the underlying BasicExcellibrary. I already fixed some part of the BasicExcel code regarding numbers in RKValue format and 64 bit compatibility, but there are still issues that show up when reading or writing larger XLS files which use XBAT or SBAT entries.

Using the Code

To use the new formatting functionality, first create an XLSFormatManager object, like in the example1() function, and attach it to an existing BasicExcel object:

void example1(const char* path)
{
    BasicExcel xls;
 
    // create sheet 1 and get the associated BasicExcelWorksheet pointer
    xls.New(1);
    BasicExcelWorksheet* sheet = xls.GetWorksheet(0);
 
    XLSFormatManager fmt_mgr(xls);

You can find all the examples of this article in the source code file Examples.cpp.

To define a custom font, create an ExcelFont object and set any needed properties, for example, the font weight for a bold font:

ExcelFont font_bold;
font_bold._weight = FW_BOLD;	// 700=bold, 400=normal

The format of an Excel cell can be defined by a CellFormat object, which holds the chosen font and some more properties:

CellFormat fmt_bold(fmt_mgr);
fmt_bold.set_font(font_bold);

After you have prepared the CellFormat, you can choose the font and display settings of Excel cells by callingSetFormat():

// Create a table containing a header row in bold and four rows below.
int col, row = 0;
 
for(col=0; col<10; ++col) {
    BasicExcelCell* cell = sheet->Cell(row, col);
 
    cell->Set("TITLE");
    cell->SetFormat(fmt_bold);
}
 
while(++row < 4) {
    for(int col=0; col<10; ++col)
        sheet->Cell(row, col)->Set("text");
}
 
++row;

Text color is specified by setting color indices in ExcelFont, for example:

ExcelFont font_red_bold;
font_red_bold._weight = FW_BOLD;
font_red_bold._color_index = EGA_RED;
 
CellFormat fmt_red_bold(fmt_mgr, font_red_bold);
fmt_red_bold.set_color1(COLOR1_PAT_SOLID); 		// solid background
fmt_red_bold.set_color2(MAKE_COLOR2(EGA_BLUE,0));	// blue background

 
CellFormat fmt_green(fmt_mgr, ExcelFont().set_color_index(EGA_GREEN));
 
for(col=0; col<10; ++col) {
    BasicExcelCell* cell = sheet->Cell(row, col);
 
    cell->Set("xxx");
    cell->SetFormat(fmt_red_bold);
 
    cell = sheet->Cell(row, ++col);
    cell->Set("yyy");
    cell->SetFormat(fmt_green);
}

ExcelFormat.h contains constants to define basic palette colors in the enumeration EXCEL_COLORS, you can use in calls to ExcelFont()::set_color_index(). The macro MAKE_COLOR2 accepts two color indices to specify the pattern line and pattern background colors. As a shortcut to calling CellFormat::set_color1() andCellFormat::set_color2(), you can also use CellFormat::set_background() to define cells with solid background colors or colorize patterns.

After creating and formatting the Excel cells in memory, all you have to do is to save the new Excel sheet as a file:

    xls.SaveAs(path);
}

This should be enough for a quick overview on how to use the new ExcelFormat objects. In the source code fileExcelFormat.cpp, you can find some more examples.

ExcelFont

There are nine font properties available:

struct ExcelFont
{
    ExcelFont()
 
    ...
 
    wstring _name;
    short   _height;
    short   _weight;
    short   _options;
    short   _color_index;
    short   _escapement_type;
    char    _underline_type;
    char    _family;
    char    _character_set;
 
    ...
 
};

For detailed information, please look into the XLS file format documentation.

CellFormat

These CellFormat properties are available in addition to the chosen ExcelFont and the display format:

struct CellFormat
{
    CellFormat(XLSFormatManager& mgr)
 
    ...
 
    char    _alignment;
    char    _rotation;
    char    _text_props;
    int     _borderlines;
    int     _color1;
    short   _color2;
 
    ...
 
};

In example2(), you can see how to change the font face and the font size:

ExcelFont font_header;
font_header.set_weight(FW_BOLD);
font_header.set_underline_type(EXCEL_UNDERLINE_SINGLE);
font_header.set_font_name(L"Times New Roman");  // font face "Times New Roman"

 
font_header.set_color_index(EGA_BLUE);
font_header._options = EXCEL_FONT_STRUCK_OUT;
 
CellFormat fmt_header(fmt_mgr, font_header);
fmt_header.set_rotation(30); // rotate the header cell text 30° to the left
 

int row = 0;
 
for(int col=0; col<10; ++col) {
    BasicExcelCell* cell = sheet->Cell(row, col);
 
    cell->Set("TITLE");
    cell->SetFormat(fmt_header);
}

The example3() function shows how to use CellFormat::set_format_string() andBasicExcelCell::SetFormat() to define text, number, and date format strings. There are some predefined constants for format strings:

#define XLS_FORMAT_GENERAL      L"General"
#define XLS_FORMAT_TEXT         L"@"
#define XLS_FORMAT_INTEGER      L"0"
 
#define XLS_FORMAT_DECIMAL      L"0.00"
#define XLS_FORMAT_PERCENT      L"0%"
 
 
#define XLS_FORMAT_DATE         L"M/D/YY"
 
#define XLS_FORMAT_TIME         L"h:mm:ss"
#define XLS_FORMAT_DATETIME     L"M/D/YY h:mm"

However, you can use any valid Excel format string to define custom display formats.

example4() shows how to use many different fonts and colors in one sheet:

example4.png

Formulas

Formulas are now preserved when reading and writing an Excel sheet. You can even store a formula into an Excel cell by calling BasicExcelCell::SetFormula(). However, currently, you either have to copy aWorksheet::CellTable::RowBlock::CellBlock::Formula object from an already existing cell, or you have to define it yourself, which is quite tricky because Excel uses a pre-parsed formula containing RPN tokens.

Points of Interest

For those who are interested in a bit of background information, I want to provide a description of what has been changed since the predecessor library BasicExcel.

Conditional Compilation

There are two different BasicExcel implementations, one for VC++ version 6.0, and one for newer compilers.ExcelFormat now merges these two code bases by using conditional compilation to handle the differences. Most of this is accomplished in the header file by this code snippet (and modifications using the newly defined macros):

#if _MSC_VER<=1200    // VC++ 6.0

#pragma warning(disable: 4786)
 
#define LONGINT __int64
#define LONGINT_CONST(x) x
#define COMPOUNDFILE
 
#else    // newer Microsoft compilers

#define LONGINT long long
#define LONGINT_CONST(x) x##LL
#define COMPOUNDFILE CompoundFile::
 
#ifdef _DEBUG
#define _ITERATOR_DEBUG_LEVEL 0	// speedup iterator operations while debugging
#endif
 
#endif

To distinguish between MS Windows environments using MSVC and others using GCC, the macro _MSC_VER is tested for existence. This switches between using the Windows API to access the compound binary file format and the BasicExcel implementation of CompoundFile.

The #pragma warning statement disables compiler warnings of VC++ 6.0 because of long compiler internal names in object files, which occur for non-trivial usage of STL classes.

Immediately after these defines, there are these preprocessor statements for VS 2005 and above:

#if _MSC_VER>=1400    		// VS 2005
#define _CRT_SECURE_NO_WARNINGS    	//MF
#define _SCL_SECURE_NO_WARNINGS    	//MF
 
#endif

They disable the VC++ security library warnings, because the BasicExcel code is not yet prepared for this new runtime library additions. By the way, the MF comments mark additions and fixes by me to the old BasicExcel code.

Additionally, I eliminated some compiler warnings by fixing data types. Typically, this was a change from integral types like int to the C runtime library type size_t.

To switch between 32 bit and 64 bit mode using GCC, use the compiler option -m32 respective -m64 like in the Makefile example in the source download. In the VC++ environment you can choose the target environment in the project settings.

New Features

In addition to the new API described above, I had to add code in BasicExcel in order to implement these new features:

  • Read and write the Formula structure from/to XLS files
  • Define, read, and write the FORMAT structure from/to XLS files
  • Get/set the XF index value BasicExcelCell objects for all the various cell types

Formatting Storage Structures

The information about the format of an Excel cell in the BIFF8 file format is stored using a so called XF index. This refers to an XF (extended format) record, which consists of the following members:

XF {
    short       fontRecordIndex     // FORMAT index
    short       formatRecordIndex   // FONT index

    short       protectionType
    char        alignment
    char        rotation
    char        textProperties
    char        usedAttributes
    int         borderLines
    int         colour1
    short       colour2
}

Besides a few directly associated attributes specifying alignment, rotation, etc. of the Excel cell, there are two more index values: fontRecordIndex and formatRecordIndex. These are used to define font and display format descriptions. In the whole picture, this two-level index based formatting architecture enables cell formatting with small file sizes and low memory usage, because there are typically only a few different fonts and display formats used in an Excel worksheet. The font index refers to a FONT record with the following attributes:

FONT {
    short       height
    short       options
    short       colourIndex
    short       weight
    short       escapementType
    char        underlineType
    char        family
    char        characterSet
    char        unused
    SmallString name
}

The third index is a special one. This format index is associated with a record containing only the index itself and a text representation of the display format:

FORMAT {
    short       index
    LargeString fmtstring
}

XLSFormatManager manages these three formatting substructures to format Excel cells using the C++ structuresCellFormat and ExcelFont:

struct CellFormat  ->  XF {FORMAT index, FONT index, XF attributes}
 
struct ExcelFont   ->  FONT {FONT attributes}

On calling CellFormat::set_font(const ExcelFont& font), the manager class searches for an already registered matching font description. If there is none yet, a new FONT record is created to be stored in the Excel sheet. On calling CellFormat::set_format_string(const wstring& fmt_str), the manager class searches for an already registered matching display format string. If there is none yet, a new FORMAT record stores the display formatstring. The same strategy is used when applying a CellFormat to a cell object by callingBasicExcelCell::SetFormat(const CellFormat& fmt): the manager class searches for an already registered matching XF description with identical font and format indices as well as matching XF attributes. If there is none yet, a new XF record is created to be stored in the Excel sheet. In the end, this results in a list of FORMATFONT, and XFrecords, which are stored in the header of the Excel workbook file. Each cell is formatted by storing a single related XF index, which determines all the cell formatting attributes, the font, and the display format.

For further implementation details, please have a look into ExcelFormat.hExcelFormat.cppBasicExcel.hpp andBasicExcel.cpp of the source code.

Memory Usage

To limit memory usage, a reference counting SmartPtr is used to manage the heap structures needed for formula storage. Every cell contains only an optionally filled pointer to the struct Formula. For text or number cells, formula information is not needed. So the smart pointer stays empty, and just contains the value NULL.

In the following section, you can see the implementation of the struct RefCnt as the base for struct Formula and the template struct SmartPtr to hold the reference counted heap objects:

// reference counter for SmartPtr managed objects
struct RefCnt
{
    // On construction the reference counter
    // is initialized with an usage count of 0.
    RefCnt()
     :    _ref_cnt(0)
    {
    }
 
    int    _ref_cnt;
};
 
 
// reference counting smart pointer
template<typename T> struct SmartPtr
{
    // default constructor
    SmartPtr()
     :  _ptr(NULL)
    {
    }
 
    // The initialized SmartPtr constructor increments the reference counter
 
          // in struct RefCnt.
    SmartPtr(T* p)
     :  _ptr(p)
    {
        if (p)
            ++_ptr->_ref_cnt;
    }
 
    // The copy constructor increments the reference counter.
    SmartPtr(const SmartPtr& other)
     :    _ptr(other._ptr)
    {
        if (_ptr)
            ++_ptr->_ref_cnt;
    }
 
    // The destructor decreases the reference counter and
 
    // frees the managed memory as the counter reaches zero.
    ~SmartPtr()
    {
        if (_ptr) {
            if (!--_ptr->_ref_cnt)
                delete _ptr;
        }
    }
 
    // The assignment operator increments the reference counter.
 
    SmartPtr& operator=(T* p)
    {
        if (_ptr) {
            if (!--_ptr->_ref_cnt)
                delete _ptr;
 
            _ptr = NULL;
        }
 
        if (p) {
            _ptr = p;
 
            ++_ptr->_ref_cnt;
        }
 
        return *this;
    }
 
     // operator bool() to check for non-empty smart pointers
 
    operator bool() const {return _ptr != NULL;}
 
     // operator!() to check for empty smart pointers
    bool operator!() const {return !_ptr;}
 
     // operator->() to access the managed objects
 
    T* operator->() {return _ptr;}
    const T* operator->() const {return _ptr;}
 
     // Dereference pointed memory
 
    T& operator*() {return *_ptr;}
    const T& operator*() const {return *_ptr;}
 
 
private:
    T* _ptr;
};

String Conversion

There are some new string conversion functions: stringFromSmallString()stringFromLargeString(),wstringFromSmallString()wstringFromLargeString() using narrow_string()/widen_string() are used to convert from internal Excel string structures to STL string classes and vice versa. You can use them to access the internal data storage of BasicExcel.

History

  • 20.09.2009 - Version 1.0 of ExcelFormat (BasicExcel Version 2.0)
  • 28.09.2009 - Version 2.0
    • Added new section Formatting storage structures
  • 04.10.2009 - Version 2.1
    • Updated source code with macros and constants for cell and font properties
  • 07.11.2009 - Version 2.2
    • Fixed VS2008 problem when reading sheets with formula fields
    • Added BasicExcel::Close()CellFormat::get/set_text_props() andget/set_borderlines()
  • 12.01.2010 - Version 2.3: (by Ami Castonguay and Martin Fuchs)
    • Fixed reference counting of Formula data structs
    • Support for shared formulas
    • Support for merged cells
    • Save formatting even if cell is empty
    • Flush fstream instead of closing it followed by open to prevent races in conjunction with virus scanners
    • Enable reading of XLS files exported by MacOS Numbers.app
  • 15.11.2010 - Version 2.4
    • Add second set_borderlines() overload
    • Add ExcelFont::set_italic()CellFormat::set_wrapping()
    • Handle COLINFO
    • Miscellaneous fixes
  • 01.01.2011 - Version 2.5
    • Dynamically allocate memory for unexpected high row/column values while loading XLS files
    • Unicode overloads for Load() and SaveAs()
    • Adjust to RKValues written by OpenOffice Calc
  • 03.02.2011 - Version 3.0
    • access compound document files using the Windows API
    • reduced memory consumption and increased speed
    • 64 bit portability
    • return current value string from formula cells