Intro to Structured Storage

By: Kenn Scribner for Visual C++ Developer

Background: Kate (the VCD editor) sent a note to a group of writers asking for articles regarding structured storage. It sounded like fun. :) I'd worked with structured storage quite a bit previously, so I tossed this article together. It never was printed (VCD stopped publication), but hopefully the work isn't wasted. Let me know what you think!

An Introduction to Structured Storage

If you haven't had the opportunity to interact with Microsoft's structured storage and would like to, this is your chance! If you've never heard of structured storage, read along for a minute and see if the technology fits into your work. The odds are you'll find structured storage a useful and handy technology. If you've used MFC to store your CDocument data, you may have already been using structured storage and not known it!

Before You Get Started

When you work with structured storage in Win32, you're working with COM objects. When I work with COM objects, I use ATL to encapsulate the COM object pointers, and you'll see this in some of the example code I'll provide. Don't be too concerned if you're not a COM wizard, however. You don't need to know COM that well to use structured storage, and I won't perform any fancy ATL tricks along the way. We'll be using Win32 API calls to obtain the initial structured storage pointers, so the COM objects themselves look very much like any other C++ object. The usual COM rules still apply, with one interesting exception: you don't need to initialize the COM runtime.

Structured Storage--The Big Picture

Let's begin by looking at what structured storage is, rather than how you work with it. I'm sure you're already well versed in working with files and the Windows file system (FAT or NTFS, it doesn't matter). Files are stored in directories, and directories can also contain other directories. The result is a tree of directories and files, which you can see in the left-hand pane of Figure 1.

The reason I'm stating the obvious is that structured storage imposes a similar tree structure within a single file. A structured storage file, commonly referred to as a compound file, has concepts similar to directories and files: the storage plays the role of a directory, and the stream plays the role of a file. The compound file itself is a binary file, but the structured storage implementation writes its own bookkeeping information into that file to record which storages and streams it contains and where their data lives. The resulting structure is very similar to that of your file system. In fact, the metaphor is so close that I used similar bitmaps to denote storages and streams in the demo application I wrote for this article, an image of which you see in Figure 2.

The application, FileWriter, is showing you the contents of a compound file I created. The tree nodes in Figure 2 with the directory bitmap are storages, which contain other storages or streams. The nodes with the text-file bitmap are streams. FileWriter simply loaded a compound file and displayed the file's (structured) contents. I'll discuss FileWriter in detail a bit later in the article.
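If it helps to see the metaphor in code, here is a conceptual sketch only, not the Win32 API. The hypothetical `Storage` and `Stream` types below imitate how a compound file nests storages (directories) and streams (files) inside one file. The real objects are the IStorage and IStream COM interfaces described later; the method names here merely echo theirs.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>

// Hypothetical in-memory stand-in for a stream: it just holds bytes,
// much like a file in the file system.
struct Stream {
    std::string data;
};

// Hypothetical in-memory stand-in for a storage: it holds named child
// storages and streams, much like a directory.
struct Storage {
    std::map<std::string, std::unique_ptr<Storage>> storages;
    std::map<std::string, Stream> streams;

    // Return the named child storage, creating it if necessary.
    Storage& CreateStorage(const std::string& name) {
        auto& child = storages[name];
        if (!child) child = std::make_unique<Storage>();
        return *child;
    }

    // Return the named child stream, creating it if necessary.
    Stream& CreateStream(const std::string& name) {
        return streams[name];
    }
};

// Count every stream in the tree with a depth-first walk.
std::size_t CountStreams(const Storage& stg) {
    std::size_t count = stg.streams.size();
    for (const auto& entry : stg.storages) {
        count += CountStreams(*entry.second);
    }
    return count;
}
```

The root `Storage` plays the part of the compound file itself, and a depth-first walk like `CountStreams()` has the same recursive shape FileWriter uses when it loads a file.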

To be technically accurate, structured storage is an abstract concept, while compound files are Microsoft's implementation of that abstraction. People tend to use the terms interchangeably, probably because Microsoft invented both the abstraction and the implementation; if any other vendor has implemented structured storage, I'm not personally aware of it. In fact, Microsoft at one time gave away its compound file implementation source code to facilitate porting it to other platforms.

Some of the benefits the Microsoft documentation mentions are file system independence, browsability, and access to certain internal data. Because a compound file is a self-contained binary blob, Macintosh recipients can open and use the data just as Windows users can; hence the file's independence from any given file system. Programs that want to browse the compound file are free to do so, even if they can't interpret the contents of the streams (which may hold data in a proprietary format). The DFView program I provided with FileWriter is an example of such a browser. And since you use a standard set of API calls to work with structured storage, access to special information, such as version data, is simplified. You don't need to perform special magic tricks to retrieve the file's version information; just use the provided API calls.

The benefits Microsoft touts are fine, but they haven't been of much practical concern in my own work. In my case, I work with compound files because they give me a robust alternative to designing my own custom binary file formats. If I want to store complex application data in a file, I can do so easily using a compound file. The task is much more difficult if I have to design my own file format and then write the code to interact with it. I also believe compound files help me organize my data storage requirements and lay out a reasonable format for persisting data. If the data I need to store has a natural structure, I can usually map it onto storages and streams rather easily.

Another feature of compound files I like is that you can open one for transacted use. That is, you don't need MTS or COM+ to get transacted reads and writes to your compound file. I won't specifically show how that is done in this article, but it isn't difficult: include the STGM_TRANSACTED flag when opening the compound file, and make sure you commit your changes. In some circumstances this can be a tremendous benefit.
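A toy model may make the direct-versus-transacted distinction clearer. The hypothetical `ToyStream` class below is not how the compound file implementation works internally; it only mimics the visible behavior: in transacted mode, writes stay pending until you commit them, and a revert throws them away. In the real API you request this behavior with STGM_TRANSACTED and finish with IStorage::Commit() or IStorage::Revert().

```cpp
#include <cassert>
#include <string>

// Toy model of direct vs. transacted writes (not the Win32 implementation).
class ToyStream {
public:
    explicit ToyStream(bool transacted) : transacted_(transacted) {}

    void Write(const std::string& text) {
        if (transacted_) {
            pending_ += text;    // buffered until Commit()
        } else {
            committed_ += text;  // direct mode: lands immediately
        }
    }

    // Make pending writes permanent.
    void Commit() {
        committed_ += pending_;
        pending_.clear();
    }

    // Discard pending writes.
    void Revert() { pending_.clear(); }

    // What another reader of the underlying file would see.
    const std::string& OnDisk() const { return committed_; }

private:
    bool transacted_;
    std::string committed_;
    std::string pending_;
};
```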

Now that you've read about the basic concepts and have an idea what benefits structured storage might provide to you, let's look at how Win32 supports structured storage.

Win32 Structured Storage Support

As with most areas of the Win32 API, there are many more structured storage API calls available to you than you will ever need. Most of the time you'll use only a handful, and I've listed those in Table 1. The calls I haven't listed are for low-level work, for specialized tasks you don't normally need to perform, or are merely wrappers for COM interface methods (and you'll find it's easier to use the COM methods in most cases).

Table 1--A Sampling of Win32 Structured Storage API calls

API Call	Purpose
StgIsStorageFile	Quickly determines whether a given file is in fact a compound file.
StgOpenStorage	Opens an existing top-level storage (the compound file itself). StgOpenStorageEx, the extended version, is preferred on Windows 2000 and later to take advantage of that system's enhancements.
StgCreateDocfile	Creates a new top-level storage (compound file). On Windows 2000 and later, Microsoft recommends StgCreateStorageEx instead, but StgCreateDocfile remains available for compatibility with older code and platforms.
WriteClassStg	Stores the document's class identifier (CLSID, a COM GUID) in the file for later retrieval and identification. Used for file versioning.
ReadClassStg	Retrieves the CLSID value previously written by WriteClassStg.

In general, you follow this outline when working with compound files:

If you are opening an existing file, call StgIsStorageFile() to be sure it is actually a compound file. It returns S_OK for a compound file and S_FALSE otherwise. (There is no need to do this if you're writing to a newly created compound file.)

Call StgOpenStorage() to open an existing compound file, or StgCreateDocfile() to create a new one.

(Optionally) Call ReadClassStg() (existing file) to check the file's version information, or WriteClassStg() (new file) to record it. If you're not interested in version information, omit this step.

Use the IStorage pointer returned by StgOpenStorage() or StgCreateDocfile() to work with the underlying (child) storages and streams.

When you release the last COM pointer associated with the compound file, the file is closed. You don't need to explicitly call an (imaginary) equivalent of CloseHandle() to close the file, as you would with traditional Win32 files.
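The last point, that the file closes when the final interface pointer is released, follows from COM reference counting. The sketch below is a hypothetical, portable imitation (the `ToyCompoundFile` class and its `closed` flag are illustrative only): each AddRef() bumps a count, each Release() drops it, and the "file" closes when the count reaches zero. In real code, CComPtr makes the Release() call for you when it goes out of scope.

```cpp
#include <cassert>

// Hypothetical imitation of a COM object's reference counting.
class ToyCompoundFile {
public:
    explicit ToyCompoundFile(bool* closedFlag) : closed_(closedFlag) {}

    unsigned long AddRef() { return ++refs_; }

    unsigned long Release() {
        unsigned long remaining = --refs_;
        if (remaining == 0) {
            *closed_ = true;  // a real compound file would be closed here
            delete this;      // the object destroys itself on the last release
        }
        return remaining;
    }

private:
    ~ToyCompoundFile() = default;  // destroyed only through Release()
    unsigned long refs_ = 1;       // the creator holds the first reference
    bool* closed_;
};
```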

IStorage and IStream

IStorage and IStream are the two primary COM objects you'll work with when dealing with compound files. Each has several methods, but some methods are used more often than others (as with the Win32 API calls). Given an IStorage pointer, which you obtain from StgOpenStorage() or StgCreateDocfile(), you can open or create other storages and streams.

Table 2 provides a few of the IStorage methods you'll find most useful, while Table 3 does the same for IStream.

Table 2--A Sampling of IStorage Methods

Method	Purpose
CreateStorage	Creates a new (child) storage and returns its pointer to you.
CreateStream	Creates a new (child) stream and returns its pointer to you.
EnumElements	Enumerates child storages and streams, allowing you to discover what this particular storage contains.
OpenStorage	Opens an existing (child) storage and returns its pointer to you.
OpenStream	Opens an existing (child) stream and returns its pointer to you.
Stat	Returns information regarding this storage, such as its name, class identifier (CLSID), and timestamps.

Table 3--A Sampling of IStream Methods

Method	Purpose
Read	Returns contents of stream to you.
Write	Records provided data into stream.
Seek	Shifts location where data will be read from or written to within the stream.
CopyTo	Copies the stream contents to another stream.
Stat	Returns information regarding this stream, such as its name and size in bytes.

FileWriter's source code uses all of these methods except IStream::CopyTo(). I've found that particular method handy in other situations, so I listed it in the table anyway.

There is a lot more to IStorage and IStream than I've mentioned here, but this is enough information to begin using these interfaces in your compound file work. Let's now turn to a real-world application that uses compound files to store its persisted data.

The FileWriter Application

To demonstrate compound files in action, I wrote the FileWriter application. To be honest, it does little more than read and write compound files, but that helps minimize extraneous code. The concept behind FileWriter is that the data to be stored (or retrieved) is hierarchical by nature; that is, it takes the form of a tree. FileWriter therefore displays the data in a tree control, to which you can add or delete storages and streams at will. You're free to save the data to a file, and as you would expect, you can easily read the data back into the tree control.

FileWriter itself is an MFC dialog-based application, most of which I won't cover here (if you're interested in tree control work, there is plenty inside FileWriter). Instead, I'll concentrate on the logic for dealing with the compound file itself. There are two major tasks to accomplish: reading an existing file and writing a new file.

I'm always amazed at how often I return to the roots of my engineering training for contemporary implementation details. In this case, both reading and writing involve recursive tree traversal routines, which you undoubtedly learned in Algorithms 101. But even if you didn't write depth-first recursive tree traversal programs in school, I'll show you what you'll need to get the job done. Let's start with the code that loads an existing file from the disk, the first piece of which you see in Listing 1.

Listing 1. FileWriter's OnLoad() Method

// Reference GUID -- {C3727EAF-D541-11d1-B1AA-00A045BFDB8F}
static const GUID GUID_VCDFileFormat10 =
{ 0xc3727eaf, 0xd541, 0x11d1, { 0xb1, 0xaa, 0x0, 0xa0, 0x45, 0xbf, 0xdb, 0x8f } };

// File filter
static TCHAR BASED_CODE szFilter[] = _T("VCD Data Files (*.vcd)|*.vcd|All Files (*.*)|*.*||");

void CFileWriterDlg::OnLoad()
{
   // Check for dirty
   if ( IsDirty() ) return;

   // Get a filename...
   CFileDialog dlg(TRUE,
                   _T("vcd"),
                   NULL,
                   OFN_HIDEREADONLY | OFN_EXPLORER,
                   szFilter,
                   this);
   dlg.m_ofn.lpstrTitle = _T("Read from a VCD Compound Data File");

   if ( dlg.DoModal() == IDOK ) {
      HRESULT hr = S_OK;
      try {
         // Make sure tree control is blank
         m_CStgTree.DeleteAllItems();

         // We should now have a filename...
         USES_CONVERSION;
         TCHAR strPath[MAX_PATH+1] = {0};
         _tcscpy(strPath,dlg.GetPathName());

         // S_FALSE (a success code) means "not a compound file",
         // so test for S_OK rather than using FAILED()
         hr = StgIsStorageFile(T2W(strPath));
         if ( hr != S_OK ) {
            // Not a valid storage file!
            AfxMessageBox(IDS_E_INVALIDFILE,MB_OK | MB_ICONINFORMATION);
            return;
         } // if

         CComPtr<IStorage> pIStorage;
         hr = StgOpenStorage(T2W(strPath),
                             NULL,
                             STGM_DIRECT |
                             STGM_READ |
                             STGM_SHARE_EXCLUSIVE,
                             NULL,
                             0,
                             &pIStorage);
         if ( FAILED(hr) ) throw hr;

         // Opened no problem...now run through the file
         // and add node information to the tree control.
         //
         // First use OLE service to determine whether this compound
         // file is one that is handled by us.
         GUID guid;
         hr = ReadClassStg(pIStorage, &guid);
         if ( FAILED(hr) ) throw hr;

         // If this test fails, the file is not handled by us...
         if ( !::InlineIsEqualGUID(guid,GUID_VCDFileFormat10) ) throw STG_E_OLDFORMAT;

         // Recurse through enumerated contents and store in
         // tree control
         hr = ReadStorage(pIStorage,TVI_ROOT);
         if ( FAILED(hr) ) throw hr;

         // With clean copy loaded, clear the dirty bit
         m_bDirty = FALSE;

         // We now have nothing to write to disk...
         CWnd* pBtn = GetDlgItem(IDC_WRITE);
         pBtn->EnableWindow(FALSE);
      } // try
      catch(HRESULT hrErr) {
         // Display error dialog
         LPVOID lpMsgBuf = NULL;
         ::FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
                         FORMAT_MESSAGE_FROM_SYSTEM |
                         FORMAT_MESSAGE_IGNORE_INSERTS,
                         NULL,
                         hrErr,
                         MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
                         (LPTSTR)&lpMsgBuf,
                         0,
                         NULL);

         // Display the string.
         if ( lpMsgBuf ) {
            AfxMessageBox((LPTSTR)lpMsgBuf,MB_OK | MB_ICONINFORMATION);

            // Free the buffer.
            LocalFree(lpMsgBuf);
         } // if

         // They couldn't load the file, so clear the tree
         // control for manual node insertion
         ClearTreeCtrl();
      } // catch
   } // if
}

After the perfunctory check for unsaved data and a request for the file to load, we need to make sure the file the user selected is at least a compound file:

// We should now have a filename...
CComPtr<IStorage> pIStorage;
USES_CONVERSION;
TCHAR strPath[MAX_PATH+1] = {0};
_tcscpy(strPath,dlg.GetPathName());

// S_FALSE (a success code) means "not a compound file",
// so test for S_OK rather than using FAILED()
hr = StgIsStorageFile(T2W(strPath));
if ( hr != S_OK ) {
   // Not a valid storage file!
   AfxMessageBox(IDS_E_INVALIDFILE,MB_OK | MB_ICONINFORMATION);
   return;
} // if

Note that StgIsStorageFile() requires a wide (Unicode) string representing the file's path. The code you see here works whether or not you're building a Unicode version of the application: in an ANSI build, T2W() converts the path to a wide string, and in a Unicode build it simply passes the already-wide string through. Otherwise, you would need to sprinkle several #ifdef UNICODE preprocessor directives into your code, which I find clutters things up. I use the other structured storage API calls that require pathnames in a similar manner. Be careful how you test the result, too: StgIsStorageFile() returns S_OK for a compound file and S_FALSE, which is a success code, for a file that exists but isn't one, so compare against S_OK rather than relying on FAILED(hr).

Assuming the file is a compound file, we open it for reading:

hr = StgOpenStorage(T2W(strPath),
                    NULL,
                    STGM_DIRECT |
                    STGM_READ |
                    STGM_SHARE_EXCLUSIVE,
                    NULL,
                    0,
                    &pIStorage);
if ( FAILED(hr) ) throw hr;

In this case we open the file in direct mode (versus transacted mode) for reading, and we want an exclusive share lock. The share lock is important, because if you don't exclude others to some degree, the system has to make a copy of the file for you to use. That is, even though you are reading the file, you (probably) want to deny other write operations while you have the file open. If you do, you can then work with the original file. If you allow other write operations (you don't specify STGM_SHARE_EXCLUSIVE or STGM_SHARE_DENY_WRITE), the system will create a snapshot copy of the file for you to read. This then allows others to modify the contents of the original file. The snapshot you're reading will remain unchanged. The problem with this is the copy operation itself can be time consuming. If you want to avoid the potentially long wait while the copy is made, at a minimum deny others write access.
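The grfMode argument packs several independent groups of STGM flags into one DWORD, which is why STGM_DIRECT | STGM_READ | STGM_SHARE_EXCLUSIVE reads almost like a sentence. The portable sketch below copies the flag values from the Windows SDK header objbase.h (redefined here so it compiles anywhere) and decodes a mode value. The sharing flags are not independent bits: STGM_SHARE_DENY_READ (0x30) overlaps both STGM_SHARE_EXCLUSIVE (0x10) and STGM_SHARE_DENY_WRITE (0x20), so the sharing group has to be masked and compared rather than bit-tested. The mask constants and helper names are hypothetical, not SDK names.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from the Windows SDK (objbase.h); redefined so this compiles anywhere.
constexpr std::uint32_t STGM_DIRECT           = 0x00000000;
constexpr std::uint32_t STGM_TRANSACTED       = 0x00010000;
constexpr std::uint32_t STGM_READ             = 0x00000000;
constexpr std::uint32_t STGM_WRITE            = 0x00000001;
constexpr std::uint32_t STGM_READWRITE        = 0x00000002;
constexpr std::uint32_t STGM_SHARE_EXCLUSIVE  = 0x00000010;
constexpr std::uint32_t STGM_SHARE_DENY_WRITE = 0x00000020;
constexpr std::uint32_t STGM_SHARE_DENY_READ  = 0x00000030;
constexpr std::uint32_t STGM_SHARE_DENY_NONE  = 0x00000040;
constexpr std::uint32_t STGM_CREATE           = 0x00001000;

// Hypothetical group masks (not SDK names).
constexpr std::uint32_t kAccessMask = 0x00000003;
constexpr std::uint32_t kShareMask  = 0x00000070;

bool IsTransacted(std::uint32_t mode) { return (mode & STGM_TRANSACTED) != 0; }

bool CanWrite(std::uint32_t mode) {
    std::uint32_t access = mode & kAccessMask;
    return access == STGM_WRITE || access == STGM_READWRITE;
}

// True when the mode locks out other writers: per the discussion above,
// this is the case in which no snapshot copy of the file is needed.
bool DeniesOtherWriters(std::uint32_t mode) {
    std::uint32_t share = mode & kShareMask;  // compare the whole group
    return share == STGM_SHARE_EXCLUSIVE || share == STGM_SHARE_DENY_WRITE;
}
```

With this decoding, FileWriter's load mode is a direct, read-only open that locks out other writers, while a mode using STGM_SHARE_DENY_READ does not, even though a naive `mode & STGM_SHARE_EXCLUSIVE` test would say otherwise.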

If the file opened, we have a valid IStorage pointer that represents the root storage of the file. (As it happens, the name of the root storage is always the file's name.) But because we only want to read files we wrote ourselves (as you'll see shortly), we also need to check the file's version. We do that using ReadClassStg():

// Opened no problem...now run through the file
// and add node information to the tree control.
//
// First use OLE service to determine whether this compound
// file is one that is handled by us.
GUID guid;
hr = ReadClassStg(pIStorage, &guid);
if ( FAILED(hr) ) throw hr;

// If this test fails, we do not handle the file...
if ( !::InlineIsEqualGUID(guid,GUID_VCDFileFormat10) ) throw STG_E_OLDFORMAT;

ReadClassStg() extracts from the file the GUID we stored (or will store when we write the file) that represents the version of the file. We test the equivalence of the stored GUID to the GUID we wrote, or at least the GUID we want to see, and if the test fails, we discontinue reading the file.
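A GUID is just a 16-byte structure, so the version test boils down to a field-by-field comparison. The portable sketch below mirrors the layout of the Windows GUID structure with a hypothetical `Guid` type and checks a value read from a file against the GUID_VCDFileFormat10 initializer from Listing 1. It approximates what InlineIsEqualGUID() checks; it is not the SDK implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable mirror of the Windows GUID layout.
struct Guid {
    std::uint32_t Data1;
    std::uint16_t Data2;
    std::uint16_t Data3;
    std::uint8_t  Data4[8];
};

// Same initializer as GUID_VCDFileFormat10 in Listing 1.
constexpr Guid kVcdFileFormat10 =
    { 0xc3727eaf, 0xd541, 0x11d1, { 0xb1, 0xaa, 0x00, 0xa0, 0x45, 0xbf, 0xdb, 0x8f } };

// Two GUIDs are equal only if every field matches.
bool IsEqualGuid(const Guid& a, const Guid& b) {
    if (a.Data1 != b.Data1 || a.Data2 != b.Data2 || a.Data3 != b.Data3) return false;
    for (int i = 0; i < 8; ++i) {
        if (a.Data4[i] != b.Data4[i]) return false;
    }
    return true;
}

// The load-time policy: accept only files stamped with our format GUID.
bool IsOurFormat(const Guid& stored) { return IsEqualGuid(stored, kVcdFileFormat10); }
```

A single differing bit anywhere in the 16 bytes is enough to reject the file, which is exactly what you want from a version stamp.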

Assuming the version test passed, we extract the contents of the file:

// Recurse through enumerated contents and store in the
// tree control
hr = ReadStorage(pIStorage,TVI_ROOT);
if ( FAILED(hr) ) throw hr;

ReadStorage() is a routine I wrote that recursively (depth first) extracts the data, as you see in Listing 2.

Listing 2. FileWriter's Recursive ReadStorage() Method

HRESULT CFileWriterDlg::ReadStorage(IStorage* pIStorage, HTREEITEM hParent)
{
   HRESULT hr = S_OK;
   STATSTG statstg = {0}; // zeroed so the catch block can test pwcsName safely

   try {
      // Determine this storage's name
      hr = pIStorage->Stat(&statstg,STATFLAG_DEFAULT);
      if ( FAILED(hr) ) throw hr;

      // Create a new node for ourselves (the root storage's name
      // is the full path, so don't overrun the label buffer)
      TCHAR strLabel[MAX_TEXT+1] = {0};
      USES_CONVERSION;
      _tcsncpy(strLabel,W2T(statstg.pwcsName),MAX_TEXT);

      // Release name (and forget it, so the catch block can't free it twice)
      CoTaskMemFree(statstg.pwcsName);
      statstg.pwcsName = NULL;

      // Do the insert...
      HTREEITEM hThis = m_CStgTree.InsertItem(TVIF_IMAGE |
                                              TVIF_SELECTEDIMAGE |
                                              TVIF_PARAM |
                                              TVIF_TEXT,
                                              strLabel,
                                              IMAG_STORAGE, // image
                                              IMAG_STORAGE, // selected image
                                              0,            // no state data
                                              0,            // no state mask
                                              MODE_STORAGE, // lParam
                                              hParent,      // parent node
                                              TVI_LAST);

      // Iterate over storage to gather information about
      // any sub-storages or sub-streams...
      CComPtr<IEnumSTATSTG> pIEnumStatStg;
      hr = pIStorage->EnumElements(0,NULL,0,&pIEnumStatStg);
      if ( FAILED(hr) ) throw hr;

      // Now that we have our iterator, we'll run through and determine
      // what we have stored in this puppy.
      hr = pIEnumStatStg->Next(1,&statstg,NULL);
      if ( FAILED(hr) ) throw hr;

      // Recurse/iterate...
      while (SUCCEEDED(hr) && (hr != S_FALSE)) {
         switch ( statstg.type ) {
            case STGTY_STORAGE:
               { // scope
                  CComPtr<IStorage> pIChildStorage;
                  hr = pIStorage->OpenStorage(statstg.pwcsName,
                                              NULL,  // no priority storage
                                              STGM_READ |
                                              STGM_SHARE_EXCLUSIVE,
                                              NULL,  // no exclusions
                                              0,     // reserved
                                              &pIChildStorage);
                  if ( FAILED(hr) ) throw hr;

                  // Storages contain either more storages
                  // or independent streams. So to load all
                  // of the data, we recurse when we find
                  // a storage
                  hr = ReadStorage(pIChildStorage,hThis);
                  if ( FAILED(hr) ) throw hr;
               } // scope
               break;

            case STGTY_STREAM:
               { // scope
                  CComPtr<IStream> pIStream;
                  hr = pIStorage->OpenStream(statstg.pwcsName,
                                             NULL,  // reserved
                                             STGM_READ |
                                             STGM_SHARE_EXCLUSIVE,
                                             0,     // reserved
                                             &pIStream);
                  if ( FAILED(hr) ) throw hr;

                  // Stuff it into the tree! Read() counts bytes, not
                  // characters, so size the request in bytes.
                  TCHAR strData[MAX_TEXT+1] = {0};
                  DWORD dwRead = 0;
                  hr = pIStream->Read(strData,MAX_TEXT*sizeof(TCHAR),&dwRead);
                  if ( FAILED(hr) ) throw hr;

                  // Do the insert...(only if we actually read something)
                  if ( dwRead > 0 ) {
                     m_CStgTree.InsertItem(TVIF_IMAGE |
                                           TVIF_SELECTEDIMAGE |
                                           TVIF_PARAM |
                                           TVIF_TEXT,
                                           strData,
                                           IMAG_STREAM, // image
                                           IMAG_STREAM, // selected image
                                           0,           // no state data
                                           0,           // no state mask
                                           MODE_STREAM, // lParam
                                           hThis,       // parent node
                                           TVI_LAST);
                  } // if
               } // scope
               break;

            default:
               // Lock bytes and property sets aren't used here
               break;
         } // switch

         // Release name's memory
         CoTaskMemFree(statstg.pwcsName);
         statstg.pwcsName = NULL;

         // Snag next puppy. Note it'll return S_FALSE
         // if there is no more to iterate.
         hr = pIEnumStatStg->Next(1, &statstg, NULL);
      } // while
   } // try
   catch(HRESULT hrErr) {
      // Release name
      if ( statstg.pwcsName ) CoTaskMemFree(statstg.pwcsName);

      // Return the HRESULT...note the calling function
      // OnLoad() will eventually display the error
      // in a message box. We'll simply return the fact
      // there was an error at this level.
      hr = hrErr;
   } // catch

   return hr;
}

ReadStorage() is provided an IStorage interface pointer and a handle to the parent tree control's node. The first thing to do is record this storage's information in the tree control. Then, we'll dig into this storage for child node information. In the tree control, "storage information" is simply the name of the storage, which we obtain using IStorage::Stat():

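// Determine this storage's name
hr = pIStorage->Stat(&statstg,STATFLAG_DEFAULT);
if ( FAILED(hr) ) throw hr;

// Create a new node for ourselves (the root storage's name
// is the full path, so don't overrun the label buffer)
TCHAR strLabel[MAX_TEXT+1] = {0};
USES_CONVERSION;
_tcsncpy(strLabel,W2T(statstg.pwcsName),MAX_TEXT);

// Release name (and forget it, so the catch block can't free it twice)
CoTaskMemFree(statstg.pwcsName);
statstg.pwcsName = NULL;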

Here you see one of the COM rules I mentioned we would have to follow. Because we passed STATFLAG_DEFAULT (rather than STATFLAG_NONAME), Stat() allocates memory for the storage name. It is our responsibility to release that memory, which we do using CoTaskMemFree().
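One way to make sure every name returned by Stat() or Next() is freed exactly once, even on the error paths, is to hand it to an owning object. The sketch below is a portable illustration with hypothetical names: `FakeTaskMemFree()` stands in for CoTaskMemFree() (it wraps `std::free` and counts calls), and `AllocName()` stands in for the allocation Stat() performs. In Win32 code you would pass CoTaskMemFree itself as the deleter.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cwchar>
#include <memory>

// Counts releases so the behavior can be checked; stands in for CoTaskMemFree().
int g_freeCount = 0;
void FakeTaskMemFree(void* p) {
    if (p) ++g_freeCount;
    std::free(p);
}

// Hypothetical stand-in for the callee-allocated name Stat() returns.
wchar_t* AllocName(const wchar_t* text) {
    std::size_t bytes = (std::wcslen(text) + 1) * sizeof(wchar_t);
    wchar_t* p = static_cast<wchar_t*>(std::malloc(bytes));
    if (p) std::memcpy(p, text, bytes);
    return p;
}

// Owns the name and releases it exactly once when it goes out of scope.
using OwnedName = std::unique_ptr<wchar_t, void (*)(void*)>;

OwnedName TakeName(wchar_t* raw) { return OwnedName(raw, &FakeTaskMemFree); }
```

Because the owner's destructor runs during stack unwinding, a thrown HRESULT no longer needs a hand-written cleanup in the catch block, and there is no stale pointer left around to free twice.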

The storage's name is then placed into the tree control, and the handle of the new tree item becomes the parent handle for any child elements:

// Do the insert...
HTREEITEM hThis = m_CStgTree.InsertItem(TVIF_IMAGE |
                                        TVIF_SELECTEDIMAGE |
                                        TVIF_PARAM |
                                        TVIF_TEXT,
                                        strLabel,
                                        IMAG_STORAGE, // image
                                        IMAG_STORAGE, // selected image
                                        0,            // no state data
                                        0,            // no state mask
                                        MODE_STORAGE, // lParam
                                        hParent,      // parent node
                                        TVI_LAST);

Now we determine if this storage contains other storages or streams. We do this using IStorage::EnumElements():

// Iterate over storage to gather information about
// any sub-storages or sub-streams...
CComPtr<IEnumSTATSTG> pIEnumStatStg;
hr = pIStorage->EnumElements(0,NULL,0,&pIEnumStatStg);
if ( FAILED(hr) ) throw hr;

If EnumElements() succeeds, it returns an IEnumSTATSTG interface pointer, which we use to iterate over the storages and streams the current storage may contain. There are actually four types of elements that can be stored within a storage (STGTY_STORAGE, STGTY_STREAM, STGTY_LOCKBYTES, and STGTY_PROPERTY), but we're only interested in storages and streams. I'll defer discussion of the other two to another article, as they're for slightly more specialized use. Let's stick to the basics for now.

In any case, we need to begin iterating, which we do using IEnumSTATSTG::Next():

// Now that we have our iterator, we'll run through and determine
// what we have stored in this puppy.
hr = pIEnumStatStg->Next(1,&statstg,NULL);
if ( FAILED(hr) ) throw hr;

// Recurse/iterate...
while (SUCCEEDED(hr) && (hr != S_FALSE)) {
   switch ( statstg.type ) {
      // Do something with the storage or stream...
   } // switch

   // Release name's memory
   CoTaskMemFree(statstg.pwcsName);
   statstg.pwcsName = NULL;

   // Snag next puppy. Note it'll return S_FALSE
   // if there is no more to iterate.
   hr = pIEnumStatStg->Next(1, &statstg, NULL);
} // while

Essentially, we simply call Next() until there are no more elements left to examine, at which point Next() returns S_FALSE. The switch statement handles the storages and streams. Storages trigger a recursive call, while streams end the recursion along that branch.
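The loop condition checks for S_FALSE explicitly because S_FALSE is a success code: SUCCEEDED(S_FALSE) is true, so testing SUCCEEDED() alone would never end the loop. (The same trap applies to StgIsStorageFile(), which reports "not a compound file" with S_FALSE.) The portable sketch below reproduces the loop against a hypothetical in-memory enumerator; the HRESULT and STGTY values match the Windows SDK, but the `FakeEnum` class, `Element` struct, and `Succeeded()` helper are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// HRESULT basics, matching the Windows SDK definitions.
using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT S_FALSE = 1;
constexpr bool Succeeded(HRESULT hr) { return hr >= 0; }  // like SUCCEEDED()

// STGTY values from the SDK; only storages and streams matter here.
enum StgType { STGTY_STORAGE = 1, STGTY_STREAM = 2, STGTY_LOCKBYTES = 3, STGTY_PROPERTY = 4 };

struct Element {
    std::string name;
    StgType type;
};

// Hypothetical enumerator with IEnumSTATSTG::Next()-like results:
// S_OK while an element was returned, S_FALSE once the list is exhausted.
class FakeEnum {
public:
    explicit FakeEnum(std::vector<Element> items) : items_(std::move(items)) {}
    HRESULT Next(Element* out) {
        if (pos_ >= items_.size()) return S_FALSE;
        *out = items_[pos_++];
        return S_OK;
    }
private:
    std::vector<Element> items_;
    std::size_t pos_ = 0;
};

// Same loop shape as ReadStorage(): count storages and streams.
void CountChildren(FakeEnum& e, int& storages, int& streams) {
    Element el{};
    HRESULT hr = e.Next(&el);
    while (Succeeded(hr) && hr != S_FALSE) {
        switch (el.type) {
        case STGTY_STORAGE: ++storages; break;
        case STGTY_STREAM:  ++streams;  break;
        default: break;  // lock bytes and property sets are skipped
        }
        hr = e.Next(&el);
    }
}
```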

When we encounter storages, we simply open the child storage and recurse:

case STGTY_STORAGE:
   { // scope
      CComPtr<IStorage> pIChildStorage;
      hr = pIStorage->OpenStorage(statstg.pwcsName,
                                  NULL,  // no priority storage
                                  STGM_READ |
                                  STGM_SHARE_EXCLUSIVE,
                                  NULL,  // no exclusions
                                  0,     // reserved
                                  &pIChildStorage);
      if ( FAILED(hr) ) throw hr;

      // Storages contain either more storages
      // or independent streams. So to load all
      // of the data, we recurse when we find
      // a storage
      hr = ReadStorage(pIChildStorage,hThis);
      if ( FAILED(hr) ) throw hr;
   } // scope
   break;

Streams, however, cause us to read the data the stream contains and place that into the tree control:

case STGTY_STREAM:
   { // scope
      CComPtr<IStream> pIStream;
      hr = pIStorage->OpenStream(statstg.pwcsName,
                                 NULL,  // reserved
                                 STGM_READ |
                                 STGM_SHARE_EXCLUSIVE,
                                 0,     // reserved
                                 &pIStream);
      if ( FAILED(hr) ) throw hr;

      // Stuff it into the tree! Read() counts bytes, not
      // characters, so size the request in bytes.
      TCHAR strData[MAX_TEXT+1] = {0};
      DWORD dwRead = 0;
      hr = pIStream->Read(strData,MAX_TEXT*sizeof(TCHAR),&dwRead);
      if ( FAILED(hr) ) throw hr;

      // Do the insert...(only if we actually read something)
      if ( dwRead > 0 ) {
         m_CStgTree.InsertItem(TVIF_IMAGE |
                               TVIF_SELECTEDIMAGE |
                               TVIF_PARAM |
                               TVIF_TEXT,
                               strData,
                               IMAG_STREAM, // image
                               IMAG_STREAM, // selected image
                               0,           // no state data
                               0,           // no state mask
                               MODE_STREAM, // lParam
                               hThis,       // parent node
                               TVI_LAST);
      } // if
   } // scope
   break;

We begin our stream work by opening it for reading:

CComPtr<IStream> pIStream;
hr = pIStorage->OpenStream(statstg.pwcsName,
                           NULL,  // reserved
                           STGM_READ |
                           STGM_SHARE_EXCLUSIVE,
                           0,     // reserved
                           &pIStream);
if ( FAILED(hr) ) throw hr;

Once the stream is open, we read the data using IStream::Read(). If we could read the data, we then place it in the tree control:

// Stuff it into the tree! Read() counts bytes, not
// characters, so size the request in bytes.
TCHAR strData[MAX_TEXT+1] = {0};
DWORD dwRead = 0;
hr = pIStream->Read(strData,MAX_TEXT*sizeof(TCHAR),&dwRead);
if ( FAILED(hr) ) throw hr;

// Do the insert...(only if we actually read something)
if ( dwRead > 0 ) {
   m_CStgTree.InsertItem(TVIF_IMAGE |
                         TVIF_SELECTEDIMAGE |
                         TVIF_PARAM |
                         TVIF_TEXT,
                         strData,
                         IMAG_STREAM, // image
                         IMAG_STREAM, // selected image
                         0,           // no state data
                         0,           // no state mask
                         MODE_STREAM, // lParam
                         hThis,       // parent node (this storage)
                         TVI_LAST);
} // if
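IStream::Read() counts in bytes, not characters, which matters for a TCHAR buffer in a Unicode build: a wchar_t is 2 bytes on Windows, so asking for MAX_TEXT bytes fills only half of the buffer. The portable sketch below models Read()'s (buffer, byte count, bytes read) contract with a hypothetical `FakeStream` class and uses char16_t to stand in for the Windows 2-byte wchar_t. The MAX_TEXT value of 8 is an assumption for illustration; the article doesn't give one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

constexpr std::size_t MAX_TEXT = 8;  // assumed value, for illustration only

// char16_t stands in for the 2-byte wchar_t of a Windows Unicode build.
using UnicodeTCHAR = char16_t;

// Hypothetical stream with IStream::Read()-like semantics: cb is a BYTE count.
class FakeStream {
public:
    explicit FakeStream(std::vector<std::uint8_t> bytes) : bytes_(std::move(bytes)) {}
    void Read(void* pv, std::uint32_t cb, std::uint32_t* pcbRead) {
        std::size_t n = std::min<std::size_t>(cb, bytes_.size() - pos_);
        std::memcpy(pv, bytes_.data() + pos_, n);
        pos_ += n;
        if (pcbRead) *pcbRead = static_cast<std::uint32_t>(n);
    }
private:
    std::vector<std::uint8_t> bytes_;
    std::size_t pos_ = 0;
};

// Build a stream holding `count` characters of UTF-16 text.
FakeStream MakeTextStream(const char16_t* text, std::size_t count) {
    const auto* p = reinterpret_cast<const std::uint8_t*>(text);
    return FakeStream(std::vector<std::uint8_t>(p, p + count * sizeof(char16_t)));
}
```

Passing MAX_TEXT as the byte count delivers only MAX_TEXT / 2 characters; multiplying by `sizeof(UnicodeTCHAR)` (in real code, `sizeof(TCHAR)`) delivers the whole buffer while the extra zeroed element keeps the string terminated.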

There is a detail here I should call to your attention, as you'll see it again when we write the stream data. Notice that for storages we inserted the name of the storage into the tree control, yet for streams we read the contents of the stream and inserted that. We ignored the name of the stream entirely. This is by design, as it made the user interface a bit easier to use. If we inserted streams by name, then we'd have to have another mechanism for inserting stream data. Since we care about storage names and stream data, but not stream names, I decided to algorithmically create stream names when writing the data to the file. You don't have to do things in this manner. In this case it made things a bit easier.
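FileWriter's actual naming scheme isn't shown at this point, so the following is only a hypothetical sketch of "algorithmically created" stream names: number the streams within each storage. Whatever scheme you choose has to respect the compound file naming rules: an element name can be at most 31 characters (not counting the terminator), the characters '\', '/', ':', and '!' are not allowed, and first characters below 0x20 are reserved. `MakeStreamName()` and `IsValidElementName()` are illustrative names, not API calls.

```cpp
#include <cassert>
#include <string>

// Compound file element names: 1 to 31 characters, none of '\\', '/', ':' or '!',
// and a first character below 0x20 is reserved (or invalid).
bool IsValidElementName(const std::wstring& name) {
    if (name.empty() || name.size() > 31) return false;
    if (name[0] < 0x20) return false;
    return name.find_first_of(L"\\/:!") == std::wstring::npos;
}

// Hypothetical naming scheme: "Stream" plus an index zero-padded to 4 digits.
std::wstring MakeStreamName(unsigned index) {
    std::wstring digits = std::to_wstring(index);
    if (digits.size() < 4) digits.insert(0, 4 - digits.size(), L'0');
    return L"Stream" + digits;
}
```

A counter-based name like this is guaranteed unique within its parent storage, which is all the compound file requires; the user never sees it.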

At this point you've seen how to read the compound file data into the FileWriter tree control. With the data in the tree control, it's all user interface and tree control work to add or remove storages and streams, or edit their contents. Since compound files don't play a part in that process, I won't discuss it further. Definitely check out the code, however, if you'd like to see how to manipulate data using MFC's CTreeCtrl wrapper class. You've actually seen a bit of this already when we inserted items into the tree control from the compound file.

Now it's time to write the (tree control) data to a compound file. For the most part this is the reverse of the code you've seen so far, although a few details change. Writing the data involves recursively traversing the tree control and writing its contents to the file. We have to know whether a given tree node is a storage or a stream, so there is some logic involved in that decision. It isn't enough to see that a tree node has no children and assume that branch is a stream, because an empty storage has no children either!
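Here is a portable sketch of that decision, assuming each node carries the kind it was created with, much as FileWriter tags its tree items with MODE_STORAGE or MODE_STREAM in the item's lParam. The `Node` and `Kind` types and the string-based "writer" are hypothetical stand-ins for the tree control and the compound file; the point is that an empty storage is still written as a storage.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Kind { Storage, Stream };

// Hypothetical tree node: the kind is recorded when the node is created.
struct Node {
    Kind kind;
    std::string text;            // storage name, or stream contents
    std::vector<Node> children;  // only meaningful for storages
};

// Depth-first "write" that records what would be created, one line per element.
void WriteNode(const Node& node, int depth, std::vector<std::string>& out) {
    std::string indent(static_cast<std::size_t>(depth) * 2, ' ');
    if (node.kind == Kind::Storage) {
        out.push_back(indent + "storage: " + node.text);
        for (const Node& child : node.children) {
            WriteNode(child, depth + 1, out);  // recurse into child elements
        }
    } else {
        out.push_back(indent + "stream: " + node.text);
    }
}
```

Deciding by the recorded kind, rather than by whether the node has children, keeps a childless storage from being mistaken for a stream.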

Recording the tree data to a compound file begins when the user clicks the Write button, at which time FileWriter executes CFileWriterDlg::OnWrite(). OnWrite() is shown in Listing 3.

Listing 3. FileWriter's OnWrite() Method

void CFileWriterDlg::OnWrite()

   // Get a filename...

   CFileDialog dlg(FALSE,

                   _T("vcd"),

                   NULL,

                   OFN_HIDEREADONLY |

                   OFN_OVERWRITEPROMPT |

                   OFN_CREATEPROMPT |

                   OFN_EXPLORER,

                   szFilter,

                   this);

   dlg.m_ofn.lpstrTitle = _T("Write to a VCD Compound Data File");

   if ( dlg.DoModal() == IDOK ) {

      HRESULT hr = S_OK;

      try {

         // Do the storage thing...note that Win2K implementations
         // should use StgCreateStorageEx()...
         CComPtr<IStorage> pIStorage;
         USES_CONVERSION;
         TCHAR strPath[MAX_PATH+1] = {0};
         _tcscpy(strPath,dlg.GetPathName());
         hr = StgCreateDocfile(T2W(strPath),
                               STGM_CREATE |
                               STGM_DIRECT |
                               STGM_READWRITE |
                               STGM_SHARE_EXCLUSIVE,
                               0,
                               &pIStorage);
         if ( FAILED(hr) ) throw hr;

         // Doc opened...now run through tree control and add
         // node information.
         //
         // First use OLE service to mark this compound file as one we handle
         WriteClassStg(pIStorage, GUID_VCDFileFormat10);

         // Loop thru nodes and write...
         hr = WriteStorage(pIStorage,m_CStgTree.GetRootItem());
         if ( FAILED(hr) ) throw hr;

         // If we got this far, make sure the root node
         // reflects the filename. This clears the "untitled"
         // text from a clean tree, or modifies the name
         // if you save the data to another file...
         m_CStgTree.SetItemText(m_CStgTree.GetRootItem(),dlg.GetPathName());

         // With clean copy loaded, clear the dirty bit
         m_bDirty = FALSE;

         // We now have nothing to write to disk...
         CWnd* pBtn = GetDlgItem(IDC_WRITE);
         pBtn->EnableWindow(FALSE);
      } // try

      catch(HRESULT hrErr) {
         // Error...
         LPVOID lpMsgBuf = NULL;
         ::FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
                         FORMAT_MESSAGE_FROM_SYSTEM |
                         FORMAT_MESSAGE_IGNORE_INSERTS,
                         NULL,
                         hrErr,
                         MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
                         (LPTSTR)&lpMsgBuf,
                         0,
                         NULL);

         // Display the string.
         if ( lpMsgBuf ) {
            AfxMessageBox((LPTSTR)lpMsgBuf,MB_OK | MB_ICONINFORMATION);

            // Free the buffer.
            LocalFree(lpMsgBuf);
         } // if
      } // catch
   } // if
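The catch block above receives a bare HRESULT. An HRESULT is a 32-bit value that packs a severity bit, a facility, and a code, which is why the FAILED() macro only needs to check the sign. The portable sketch below decodes those fields for STG_E_CANTSAVE, the error FileWriter uses for a failed stream write. The constants are mirrored from winerror.h so the example compiles without the Windows SDK; failed(), facilityOf(), and codeOf() are hypothetical stand-ins for the FAILED(), HRESULT_FACILITY(), and HRESULT_CODE() macros.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors of winerror.h values so this sketch builds without the SDK.
using HRESULT_T = std::int32_t;                  // HRESULT is a signed 32-bit value
constexpr HRESULT_T kSOk = 0;                    // S_OK
constexpr HRESULT_T kStgECantSave =
    static_cast<HRESULT_T>(0x80030103u);         // STG_E_CANTSAVE
constexpr unsigned kFacilityStorage = 3;         // FACILITY_STORAGE

// Like FAILED(): the severity bit is the sign bit.
constexpr bool failed(HRESULT_T hr) { return hr < 0; }

// Like HRESULT_FACILITY(): bits 16 through 28.
constexpr unsigned facilityOf(HRESULT_T hr) {
    return (static_cast<std::uint32_t>(hr) >> 16) & 0x1FFFu;
}

// Like HRESULT_CODE(): the low 16 bits.
constexpr unsigned codeOf(HRESULT_T hr) {
    return static_cast<std::uint32_t>(hr) & 0xFFFFu;
}
```

Every STG_E_ error shares the storage facility, so facilityOf() is a quick way to tell a structured storage failure from, say, a Win32 file error.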

After obtaining a filename, OnWrite() creates the compound file:

// Do the storage thing...note that Win2K implementations
// should use StgCreateStorageEx()...
CComPtr<IStorage> pIStorage;
USES_CONVERSION;
TCHAR strPath[MAX_PATH+1] = {0};
_tcscpy(strPath,dlg.GetPathName());
hr = StgCreateDocfile(T2W(strPath),
                      STGM_CREATE |
                      STGM_DIRECT |
                      STGM_READWRITE |
                      STGM_SHARE_EXCLUSIVE,
                      0,
                      &pIStorage);
if ( FAILED(hr) ) throw hr;
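The grfMode argument is a bit mask built from separate groups (access, sharing, creation, transaction), and it helps to see what the combination actually contains. The sketch below mirrors the documented STGM_* values from objbase.h so it compiles without Windows headers, and decodes the StgCreateDocfile() mode from the full OnWrite() listing, which uses STGM_READWRITE. Note that STGM_DIRECT is defined as zero: direct mode is the default, so the flag documents intent rather than setting a bit. The decoding helpers are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors of the STGM_* constants as documented in objbase.h.
constexpr std::uint32_t kStgmDirect         = 0x00000000; // STGM_DIRECT (the default)
constexpr std::uint32_t kStgmWrite          = 0x00000001; // STGM_WRITE
constexpr std::uint32_t kStgmReadWrite      = 0x00000002; // STGM_READWRITE
constexpr std::uint32_t kStgmShareExclusive = 0x00000010; // STGM_SHARE_EXCLUSIVE
constexpr std::uint32_t kStgmCreate         = 0x00001000; // STGM_CREATE
constexpr std::uint32_t kStgmTransacted     = 0x00010000; // STGM_TRANSACTED

// Hypothetical helpers that pull the individual groups out of a mode.
constexpr std::uint32_t accessOf(std::uint32_t mode)  { return mode & 0x3u; }
constexpr std::uint32_t sharingOf(std::uint32_t mode) { return mode & 0x70u; }
constexpr bool createsNew(std::uint32_t mode)   { return (mode & kStgmCreate) != 0; }
constexpr bool isTransacted(std::uint32_t mode) { return (mode & kStgmTransacted) != 0; }

// The combination passed to StgCreateDocfile() in the OnWrite() listing.
constexpr std::uint32_t kOnWriteMode =
    kStgmCreate | kStgmDirect | kStgmReadWrite | kStgmShareExclusive;
```

Decoding kOnWriteMode yields read/write access, exclusive sharing, and "create or overwrite", with no transaction bit set.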

Note that in this case we'll create a new compound file no matter the situation. That is, if there is an existing file, we'll overwrite it. If there is no file, we'll create it. The dialog box used to obtain the filename warned the user if the file already existed on the disk. With a newly created compound file ready to accept our data, we first record the class GUID (version information):

// Doc opened...now run through tree control and add
// node information.
//
// First use OLE service to mark this compound file as one we handle
WriteClassStg(pIStorage, GUID_VCDFileFormat10);

WriteClassStg() is a system API call that simply calls IStorage::SetClass() on the root storage interface pointer you pass it. What you're really doing is associating a GUID, which is unique in time and space, with the storage. In effect, you're marking it with a version. And as you've seen, you can read the class GUID back and decide whether you want to handle a particular file. In our case, if the class GUID wasn't GUID_VCDFileFormat10, we didn't attempt to load the file.
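The version check this enables can be sketched without any COM at all: compare the class GUID read back from the file (for example, via ReadClassStg()) with the one you expect. The Guid struct below is a hypothetical stand-in for the Windows GUID type with the same field layout, acceptsFile() is a hypothetical helper, and the GUID value is made up for illustration; the real GUID_VCDFileFormat10 value isn't shown here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the Windows GUID structure (same field layout).
struct Guid {
    std::uint32_t data1;
    std::uint16_t data2;
    std::uint16_t data3;
    std::array<std::uint8_t, 8> data4;

    bool operator==(const Guid& other) const {
        return data1 == other.data1 && data2 == other.data2 &&
               data3 == other.data3 && data4 == other.data4;
    }
};

// Made-up value for illustration only; NOT the article's real GUID.
const Guid kFileFormat10 = {0x12345678, 0x9ABC, 0xDEF0,
                            {0x01, 0x23, 0x45, 0x67, 0x89, 0xAB, 0xCD, 0xEF}};

// Accept a compound file only if its class GUID matches the format we write.
bool acceptsFile(const Guid& classIdFromFile) {
    return classIdFromFile == kFileFormat10;
}
```

Bumping the GUID when the file layout changes lets older builds refuse newer files cleanly instead of misreading them.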

As with OnLoad(), we next begin the recursive write of the tree control's node information:

// Loop thru nodes and write...
hr = WriteStorage(pIStorage,m_CStgTree.GetRootItem());
if ( FAILED(hr) ) throw hr;

There is something interesting to note here. Within a single parent storage, every element name must be unique. Because we create storages with STGM_CREATE, asking for a storage whose name matches an existing sibling doesn't add a second storage; it replaces the first one, along with anything it contained. This is important because as you play with FileWriter, you may wonder why storages disappear when you write and then reload a file. If sibling storages in the tree control share the same name (for example, because you never renamed newly added storages), only the last one written survives in the disk-based file. This isn't a bug in the compound file implementation; it's simply how element names work. Give each sibling storage a unique name and they'll all be saved.
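A toy model makes the naming rule easy to see. The underlying rule is that element names must be unique within their parent storage, and creating an element with STGM_CREATE whose name already exists replaces the old one. The sketch below represents a storage's children as a std::map keyed by name; ToyStorage and its create() method are hypothetical and model only the naming behavior, not the compound file format.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical toy model: a storage owns its child storages by name.
struct ToyStorage {
    std::map<std::string, std::unique_ptr<ToyStorage>> children;

    // Mimics CreateStorage() with STGM_CREATE: an existing element with the
    // same name (and everything beneath it) is discarded and replaced.
    ToyStorage& create(const std::string& name) {
        auto fresh = std::make_unique<ToyStorage>();
        ToyStorage& ref = *fresh;
        children[name] = std::move(fresh);
        return ref;
    }
};
```

Creating "New Storage" twice under the same parent leaves exactly one child, and it is the second, empty one; whatever had been placed under the first is gone.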

Assuming the recursive write operation takes place without error, the file has been written. We now have a filename, so we can replace the <untitled> text of the tree control's root node:

// If we got this far, make sure the root node
// reflects the filename. This clears the "untitled"
// text from a clean tree, or modifies the name
// if you save the data to another file...
m_CStgTree.SetItemText(m_CStgTree.GetRootItem(),dlg.GetPathName());

Finally, we take care of a couple of minor user interface tasks: clearing the dirty bit and disabling the write to file button.

The meat of the write operation takes place in WriteStorage(), which you see in Listing 4.

Listing 4. FileWriter's Recursive WriteStorage() Method

HRESULT CFileWriterDlg::WriteStorage(IStorage* pIStorage, HTREEITEM hParent)
{
   static int iStreamNum = 0;
   HRESULT hr = S_OK;

   try {
      // Check this node type and write accordingly...
      // if it's a storage, we'll have to recurse.
      USES_CONVERSION;
      HTREEITEM hThis = m_CStgTree.GetChildItem(hParent);
      while ( hThis != NULL ) {
         // Iterate over siblings
         switch ( m_CStgTree.GetItemData(hThis) ) {
            case MODE_STORAGE | MODE_STGROOT:
            case MODE_STORAGE:
               { // scope
                  // Write this item
                  TCHAR strItem[MAX_TEXT+1] = {0};
                  _tcscpy(strItem,m_CStgTree.GetItemText(hThis));
                  CComPtr<IStorage> pIChildStorage;
                  hr = pIStorage->CreateStorage(T2W(strItem),
                                                STGM_CREATE |
                                                STGM_WRITE |
                                                STGM_DIRECT |
                                                STGM_SHARE_EXCLUSIVE,
                                                0,0,
                                                &pIChildStorage);
                  if ( FAILED(hr) ) throw hr;

                  // Recurse
                  hr = WriteStorage(pIChildStorage,hThis);
                  if ( FAILED(hr) ) throw hr;
               } // scope
               break;

            case MODE_STREAM:
               { // scope
                  // Write this item
                  CComPtr<IStream> pIChildStream;
                  TCHAR strStreamName[MAX_TEXT+1] = {0};
                  wsprintf(strStreamName,_T("Stream%d"),iStreamNum++);
                  hr = pIStorage->CreateStream(T2W(strStreamName),
                                               STGM_CREATE |
                                               STGM_WRITE |
                                               STGM_DIRECT |
                                               STGM_SHARE_EXCLUSIVE,
                                               0,0,
                                               &pIChildStream);
                  if ( FAILED(hr) ) throw hr;

                  // Write the data to the stream. IStream::Write() takes
                  // a byte count, so scale the character count by
                  // sizeof(TCHAR) (this matters in UNICODE builds).
                  CString strText = m_CStgTree.GetItemText(hThis);
                  ULONG cbText = (ULONG)(strText.GetLength() * sizeof(TCHAR));
                  ULONG cbWritten = 0;
                  hr = pIChildStream->Write((LPCTSTR)strText,
                                            cbText,
                                            &cbWritten);

                  // Some form of write error
                  if ( FAILED(hr) || (cbText != cbWritten) ) {
                     // Some error...
                     throw STG_E_CANTSAVE;
                  } // if
               } // scope
               break;
         } // switch

         // Snag next item
         hThis = m_CStgTree.GetNextSiblingItem(hThis);
      } // while
   } // try

   catch(HRESULT hrErr) {
      // Return the HRESULT...note the calling function
      // OnWrite() will eventually display the error
      // in a message box. We'll simply return the fact
      // there was an error at this level.
      hr = hrErr;
   } // catch

   return hr;
}

In many ways, WriteStorage() is a mirror image of ReadStorage(). We write the current storage and recurse into any child storages. If we encounter a stream, we write it to the file at the current tree level. There are a few differences beyond the obvious, though, as I'll point out in the code.

The first thing to do is examine the tree node we're given as the parent. The traversal starts with the parent's first child and then works across that child's siblings. The overall flow looks like this:

HTREEITEM hThis = m_CStgTree.GetChildItem(hParent);
while ( hThis != NULL ) {
   // Iterate over siblings
...
   // Snag next item
   hThis = m_CStgTree.GetNextSiblingItem(hThis);
} // while
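Stripped of the MFC tree control, this traversal is a plain first-child/next-sibling walk. The sketch below models tree nodes with hypothetical firstChild and nextSibling pointers, standing in for GetChildItem() and GetNextSiblingItem(), and records the path of each element it would write. For simplicity it uses the node text as the name of both kinds of element (FileWriter itself generates stream names). As in WriteStorage(), storages recurse and streams end the recursion.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical node type standing in for an HTREEITEM and its item data.
struct Node {
    std::string text;
    bool isStorage;
    Node* firstChild = nullptr;   // like GetChildItem()
    Node* nextSibling = nullptr;  // like GetNextSiblingItem()
};

// Visit the children of 'parent' in order; recurse into storages and
// treat streams as leaves. 'prefix' is the path of the parent storage.
void walk(const Node* parent, const std::string& prefix,
          std::vector<std::string>& written) {
    for (const Node* n = parent->firstChild; n != nullptr; n = n->nextSibling) {
        std::string path = prefix + "/" + n->text;
        written.push_back(path);
        if (n->isStorage) {
            walk(n, path, written);  // a storage may contain more elements
        }
        // a stream is terminal: nothing lives beneath it
    }
}
```

Each storage is recorded before its contents, which matches the order WriteStorage() needs: a child storage must exist before anything can be created inside it.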

Inside the while() loop we examine individual tree nodes to see whether we write a storage or a stream. When the tree nodes are created, the lParam value is filled with either MODE_STORAGE or MODE_STREAM. The root tree node also has an additional bit applied, MODE_STGROOT. This isn't for writing as much as it is for editing. If the user tries to edit the root node, we disallow the edit to take place. We'll instead force the user to save the data to a new file using the write file button.

Therefore, we retrieve the lParam value and decide what to do from there:

switch ( m_CStgTree.GetItemData(hThis) ) {
   case MODE_STORAGE | MODE_STGROOT:
   case MODE_STORAGE:
      // Process storages...
      break;

   case MODE_STREAM:
      // Process streams
      break;
} // switch
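Because the root node carries MODE_STORAGE | MODE_STGROOT, a switch on the exact value needs both case labels. An alternative is to test individual bits with a mask. The numeric flag values below are hypothetical, since the real definitions aren't shown here; the sketch only assumes each flag occupies its own bit, which the MODE_STORAGE | MODE_STGROOT combination implies. classify() and isEditable() are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical values; only the fact that each is a distinct bit matters.
constexpr std::uint32_t MODE_STORAGE = 0x1;
constexpr std::uint32_t MODE_STREAM  = 0x2;
constexpr std::uint32_t MODE_STGROOT = 0x4;

enum class Kind { Storage, Stream, Unknown };

// Classify a node's item data by testing bits rather than exact values,
// so MODE_STORAGE | MODE_STGROOT needs no extra case label.
Kind classify(std::uint32_t itemData) {
    if (itemData & MODE_STORAGE) return Kind::Storage;
    if (itemData & MODE_STREAM)  return Kind::Stream;
    return Kind::Unknown;
}

// The root bit is what blocks in-place editing of the root node.
bool isEditable(std::uint32_t itemData) {
    return (itemData & MODE_STGROOT) == 0;
}
```

The exact-value switch in FileWriter is fine for three values; the mask approach simply scales better if more marker bits are added later.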

Let's look first at storages. If the tree node is a storage, we first create a storage named after the tree node's text using IStorage::CreateStorage():

// Write this item
TCHAR strItem[MAX_TEXT+1] = {0};
_tcscpy(strItem,m_CStgTree.GetItemText(hThis));
CComPtr<IStorage> pIChildStorage;
hr = pIStorage->CreateStorage(T2W(strItem),
                              STGM_CREATE |
                              STGM_WRITE |
                              STGM_DIRECT |
                              STGM_SHARE_EXCLUSIVE,
                              0,0,
                              &pIChildStorage);
if ( FAILED(hr) ) throw hr;

Assuming the storage was created without error, we then recurse to write any child storages and streams it contains:

// Recurse
hr = WriteStorage(pIChildStorage,hThis);
if ( FAILED(hr) ) throw hr;
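Because storage names come straight from tree node text the user can edit, it can pay to check a name before calling CreateStorage(). The documented compound file naming conventions limit element names to 31 characters, disallow the characters \, /, :, and !, and reserve names whose first character is below 0x20 for system use. The sketch below is a conservative pre-check based on those rules; isValidElementName() is hypothetical, rejecting empty names is our own assumption, and CreateStorage() remains the final authority (it reports STG_E_INVALIDNAME for names it won't accept).

```cpp
#include <cassert>
#include <string>

// Conservative pre-check of a compound file element name, based on the
// documented naming conventions. Hypothetical helper; CreateStorage() and
// CreateStream() remain the final authority.
bool isValidElementName(const std::wstring& name) {
    if (name.empty() || name.size() > 31) {
        return false;  // empty (our assumption) or longer than 31 characters
    }
    if (name[0] < 0x20) {
        return false;  // leading control characters are reserved for system use
    }
    for (wchar_t ch : name) {
        if (ch == L'\\' || ch == L'/' || ch == L':' || ch == L'!') {
            return false;  // characters the conventions disallow
        }
    }
    return true;
}
```

Running this when the user finishes editing a tree label lets you reject a bad name immediately instead of discovering it halfway through a save.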

Streams are a bit more difficult, but only because we have to record the stream data along with the stream name. For storages, it is enough to create the storage and give it a name. Streams are more like binary files: logically, we need to create a new file, name it, and then put information into it, and streams follow this same approach. Note that writing a stream is a terminal condition for our recursion algorithm. Once we hit a tree node that is a stream, we're done with that node and have no need to recurse deeper into the tree; the loop simply moves on to the next sibling. This makes sense because streams don't contain other storages or streams. They simply contain data.

The first thing to do is to create the stream itself:

// Write this item
CComPtr<IStream> pIChildStream;
TCHAR strStreamName[MAX_TEXT+1] = {0};
wsprintf(strStreamName,_T("Stream%d"),iStreamNum++);
hr = pIStorage->CreateStream(T2W(strStreamName),
                             STGM_CREATE |
                             STGM_WRITE |
                             STGM_DIRECT |
                             STGM_SHARE_EXCLUSIVE,
                             0,0,
                             &pIChildStream);
if ( FAILED(hr) ) throw hr;

Here is where we encounter a detail I mentioned earlier. A storage is named using the tree node's text, but a stream stores the tree node's text as its data. The stream name must come from somewhere. In our case, we create a dummy name of the form StreamN, where N is a static counter value we increment each time we create a stream (Stream0, Stream1, and so on). It doesn't matter what the stream name turns out to be, because when the file is opened we display the contents of the stream rather than its name. All that matters is that the stream is uniquely named within its parent storage.
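The naming scheme is easy to reproduce in portable code. The sketch below uses std::snprintf() where FileWriter uses wsprintf(), and a function-local static counter as WriteStorage() does; nextStreamName() is a hypothetical helper. Because that counter is shared by every call and never reset, names are unique across the whole process lifetime, not merely within one parent storage, though saving the same tree twice will produce different stream names.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper mirroring FileWriter's "Stream%d" naming scheme.
// The static counter is shared by every call, so each name is unique for
// the lifetime of the process.
std::string nextStreamName() {
    static int streamNum = 0;
    char buf[32];
    std::snprintf(buf, sizeof buf, "Stream%d", streamNum++);
    return buf;
}
```

A per-storage counter would also satisfy the uniqueness rule and would make repeated saves of the same tree produce identical names.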

With the stream created we turn to saving the tree node's text data to the stream:

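// Write the data to the stream. IStream::Write() takes a
// byte count, so scale the character count by sizeof(TCHAR)
// (this matters in UNICODE builds).
CString strText = m_CStgTree.GetItemText(hThis);
ULONG cbText = (ULONG)(strText.GetLength() * sizeof(TCHAR));
ULONG cbWritten = 0;
hr = pIChildStream->Write((LPCTSTR)strText,
                          cbText,
                          &cbWritten);

// Some form of write error
if ( FAILED(hr) || (cbText != cbWritten) ) {
   // Some error...
   throw STG_E_CANTSAVE;
} // if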
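One detail worth checking in this call: IStream::Write() takes its length in bytes, while _tcslen() and CString::GetLength() count characters. In an ANSI build the two agree, but in a UNICODE build each TCHAR is a wchar_t, so the byte count is the character count times sizeof(TCHAR). The portable sketch below shows the arithmetic for both character widths; byteCount() is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Bytes occupied by a NUL-terminated string's characters (terminator
// excluded), for narrow and wide strings respectively.
std::size_t byteCount(const char* s) {
    return std::strlen(s) * sizeof(char);
}

std::size_t byteCount(const wchar_t* s) {
    return std::wcslen(s) * sizeof(wchar_t);
}
```

Passing a character count where a byte count is expected silently truncates wide strings, and the cbWritten comparison would not catch it because Write() happily writes the shorter amount it was asked for.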

Now that the stream data is safely tucked away in the disk file, we're done with the write operation. When OnWrite()'s try block exits, normally or through an exception, the CComPtr holding the root storage goes out of scope and releases its interface pointer. At that time the file is closed and our exclusive share lock is released, so others can then access the file.
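The release-on-scope-exit behavior is ordinary C++ RAII. The toy below is not ATL: LockedFile is a hypothetical COM-like object that holds the file's lock until its reference count reaches zero, and ScopedComPtr is a hypothetical, much-reduced analogue of CComPtr that adopts an existing reference (as when filled through an out parameter) and calls Release() in its destructor.

```cpp
#include <cassert>

// Hypothetical COM-like object: the "file" stays locked while referenced.
struct LockedFile {
    int refs = 1;     // the creator holds the first reference
    bool* lockHeld;   // lets the caller observe the lock state

    explicit LockedFile(bool* flag) : lockHeld(flag) { *lockHeld = true; }
    void AddRef() { ++refs; }
    void Release() {
        if (--refs == 0) {
            *lockHeld = false;  // last reference gone: close file, drop lock
            delete this;
        }
    }
};

// Much-reduced analogue of CComPtr: releases its interface on scope exit.
template <typename T>
class ScopedComPtr {
public:
    explicit ScopedComPtr(T* p) : p_(p) {}  // adopts an existing reference
    ~ScopedComPtr() { if (p_) p_->Release(); }
    ScopedComPtr(const ScopedComPtr&) = delete;
    ScopedComPtr& operator=(const ScopedComPtr&) = delete;
    T* operator->() const { return p_; }
private:
    T* p_;
};
```

Because the destructor runs during stack unwinding too, a thrown HRESULT still closes the file and drops the lock.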

That also completes our introduction to structured storage. With the FileWriter download I've included an application called DFView that is quite handy when working with compound files (note that DFView comes with the Microsoft Platform SDK and can also be found there). As you create files with FileWriter you can examine them using DFView. DFView will show you the file as it exists rather than as I've filtered it for the tree control in FileWriter. DFView is also good for examining other applications' compound files, such as those created by Microsoft Word. You may be amazed to see what Word stores in its files.

There is also a good Microsoft article at their MSDN Web site if you're interested in a slightly deeper look at compound files and their creation and makeup. You can find it at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/stg/stg/compound_files.asp.

Hopefully you'll find many uses for this creative and useful technology. Structured storage has helped many developers overcome proprietary storage solutions and saved countless hours in development time. With any luck, you'll find yourself in that same category. We could all use a little time savings these days!

Comments? Questions? Find a bug? Please send me a note!
