The eXtensible Markup Language--A C++ Developer's Primer

By: Kenn Scribner for Visual C++ Developer


Download the source for Part III

Background: I started looking into XML and XSL in 1998 when the initial betas of IE 5 were coming out (in fact, Microsoft released to us a beta of MSXML even before our IE 5 beta). We were developing a client-server application with a thick client that would send processed data to the server for analysis. (In fact, it was our own version of SOAP, though we didn't call it that at the time.) When Kate (Kate Gregory, the editor of VCD) contacted me and asked for an article on XML, I invited myself in for a series of 4! At that time I was completing the SOAP book and wanted to lead the VCD readers through an introduction to XML right to SOAP and Biztalk. That was the goal, anyway...give these articles a read and see if I hit the mark.

Part I, XML: A C++ Developer's Primer
(Part II, The DOM and XSL)
Part III, SOAP
(Part IV, Biztalk)

Had I written this article a month earlier than I did, this installment on SOAP would be very different than it is today. Why? There's a new SOAP specification, version 1.1, that changes (or enhances, depending upon your point of view) the technology. Instead of saying "SOAP is this," I now have to say "SOAP was originally designed for (this use), but you can also do (other things) with it as well." Clearly, this is a good thing! But to make SOAP concepts a bit clearer, I'll often fall back to SOAP's original intent. I'll leave it to you to extrapolate to what other uses you can put the protocol! (That's half of the fun, after all.)

So let's start with that—SOAP's original intent. Have you ever tried to get a couple thousand DCOM clients talking to a single server? Was your architecture/implementation successful? In all likelihood, if you scaled an architecture that involved very many DCOM clients and servers on the same network, you'd find that it was great for a relatively low number of systems but would fail as you increased the number of nodes. The reason for this ultimately resides with DCOM itself. It was designed to provide nearly seamless remote COM object access without changing the way you write COM components. It wasn't designed to scale, and, in fact, it does not. That doesn't mean that DCOM is in any way bad—it's a good and useful tool to keep in your arsenal.

SOAP's intent was to provide a highly scalable distributed protocol that was at the same time vendor agnostic. That is, SOAP would allow you to send and receive data with a huge number of consumers while not tying you down to a particular vendor's operating system or hardware platform. You can use SOAP from mainframes just as easily as from palmtop computers. Ultimately, SOAP was more narrowly scoped to be an RPC protocol. RPC stands for Remote Procedure Call, and essentially SOAP identifies a way to encode a given method's arguments to be transmitted to a remote server for processing. In the strictest sense, SOAP specifies a wire protocol.

SOAP was not designed to be a distributed object platform, like DCOM or CORBA. Those systems use their wire protocols as but a part of the overall architecture. They provide much more capability than what the wire protocol affords, such as remote object activation, garbage collection (orphaned objects from broken connections), and security, to name a few. SOAP merely specifies how two objects talk over the wire. How they're activated, how you find remote objects, and how you secure your remote communications is completely out of SOAP's scope. SOAP was meant to serve as a lower-level layer; these facilities would need to be provided by higher-level layers.

So when you unravel the hype and really look at SOAP, what you see is a protocol specification, much like the specification for TCP/IP or HTTP. The SOAP specification tells you how to format the data within a packet and tells you how to send the packet, or at least suggests the most common method.

The encoding is, not surprisingly, in XML. The XML is shipped to a remote system commonly using HTTP, though in SOAP 1.1, other protocols are certainly within the realm of possibility. (This was the biggest single change between the 1.0 and 1.1 specifications, as HTTP was mandated in the 1.0 specification.) The beauty of this is that there are XML parsers for just about every operating environment available—even cellular phones—and the only thing more commonly used is HTTP. Neither technology is under the governance of a single vendor. In fact, both specifications are publicly debated and updated to ensure no single vendor dominates the given technology. Another benefit is that both technologies are essentially text-based, which means nearly anything that can process text can process SOAP information.

Before I dive into the nuts and bolts of SOAP, let me reiterate that SOAP is no longer tied to HTTP, at least as of the draft 1.1 specification. Aspects of the new SOAP specification make it easier to incorporate SOAP into a messaging protocol, say SMTP, or to do even more exotic things. For the purposes of this article, I'm going to stick to good old HTTP and assume the data we're interested in sending will be remoted using HTTP, which implies a request/response model. This is most like an RPC scenario anyway: you request the information from a remote object, which in turn returns its response to you.

Using SOAP
There are any number of ways you can use SOAP, but the two most commonly found today are by delegation and by interception. When you delegate a SOAP call, you know you're using SOAP and you orchestrate the construction of the remote call. Perhaps you use an object designed to make the SOAP call for you, one that's most likely given some generic structure that contains the information the server requires to perform its task, or just as likely you create the SOAP packet yourself from within your object. The converse to this is interception, where some SOAP-enabled infrastructure steals the information from your method's call stack and, unbeknownst to you, ships it off to a remote server on your behalf. The remote server chews on the information and returns it to the infrastructure, which then, in turn, returns it to you without informing you that the method's call result(s) were computed off-system (this is the model DCOM takes).

I mention this because I've run into people who confuse SOAP, the architecture, with objects that implement the SOAP specification. How you communicate using SOAP is an implementation detail, whether it's by delegation or by interception. The fact that the data is encoded and transmitted in accordance with the SOAP specification is what's important here.

The SOAP packet
So let's get into SOAP itself. At the top level, SOAP describes a way to take information and put it into an XML document, which is then rolled into an HTTP packet to be shipped to a remote system. If you imagine SOAP as an onion, or perhaps some kind of Venn diagram, it would look very much like what you see in Figure 1.

The HTTP packet contains the SOAP Envelope, which itself is composed of a header and a body. The SOAP Header is optional—the SOAP Body isn't, at least with the standard SOAP encoding rules in place (you can change those, if you desire).

In textual form, a typical SOAP packet would look like this:

POST /SomeAction HTTP/1.1
 HOST: www.my-url.com
 Content-Type: text/xml
 Content-Length: nnnn
 SOAPAction: SomeAction
 
 <SOAP-ENV:Envelope
   xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
   SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
     <SOAP-ENV:Header>
        <!-- The SOAP header element is optional... -->
     </SOAP-ENV:Header>
 
     <SOAP-ENV:Body>
         <!-- Serialized object information... -->
     </SOAP-ENV:Body>
 
     <!-- Optional sub-elements... -->
 </SOAP-ENV:Envelope>
 

The HTTP header is relatively unchanged from a typical HTTP header, with the exception of the SOAPAction header. SOAPAction allows firewalls and filters access to the SOAP information contained within the envelope without having to parse the envelope. A more realistic SOAPAction header might look like this:

SOAPAction: "{00000000-0000-0000-C000-000000000046}#AddRef"
 

This might tell a COM-aware SOAP server that the AddRef() method of IUnknown was being invoked. The server can do with that information what it will, but usually you'll make sure the action being requested is one you truly want to grant. You can do more interesting things with the HTTP headers, such as using M-POST (after first trying POST) from the HTTP Extension Framework. There isn't much more to the HTTP side of SOAP, so let's concentrate on the more interesting aspect from an XML perspective—the HTTP packet's payload.
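
To make the HTTP side concrete, here is a minimal sketch of how a client might wrap an already-serialized envelope in a POST request. The function name BuildSoapRequest and its parameters are hypothetical, not part of any SOAP library, and the sketch assumes the caller supplies the host, path, and action strings. It only builds the request text; actually sending it over a socket is left out.

```cpp
#include <string>

// Hypothetical helper: wraps an already-serialized SOAP envelope in an
// HTTP POST request. Host, path, and action are supplied by the caller.
std::string BuildSoapRequest(const std::string& host,
                             const std::string& path,
                             const std::string& soapAction,
                             const std::string& envelope)
{
    std::string request;
    request += "POST " + path + " HTTP/1.1\r\n";
    request += "Host: " + host + "\r\n";
    request += "Content-Type: text/xml\r\n";
    // Content-Length counts only the bytes of the payload (the envelope).
    request += "Content-Length: " + std::to_string(envelope.size()) + "\r\n";
    // The SOAPAction value is sent as a quoted string.
    request += "SOAPAction: \"" + soapAction + "\"\r\n";
    request += "\r\n";   // a blank line ends the HTTP headers
    request += envelope; // the SOAP Envelope is the HTTP payload
    return request;
}
```

Calling BuildSoapRequest("www.my-url.com", "/SomeAction", "SomeAction", xml) would produce text shaped like the packet shown earlier, with Content-Length filled in from the envelope's size.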

SOAP and remote method invocation
SOAP is all about interpreting a method's call stack and passing the method argument information you find there to a remote server so that the remote server can simulate the method on its side of the network. Many would argue, and correctly so, that SOAP is good for more than that. But when all is said and done, that's what SOAP was originally designed to do. So what is a call stack, and how do you get one?

To answer the second question first, call stacks are generated for you when you compile your source code. In fact, that's one of the major goals of the compiler—to turn an arbitrary logical representation of an algorithm (your source code) into machine code that the computer can actually execute. When you make a method call, whether as part of an object in C++, in Visual Basic, or even as a function in C or assembly language, the computer's processor will execute some sort of subroutine execution instruction after first placing a return address on its stack. Compiled languages that allow you to pass arguments will place those arguments on the stack, in some predetermined order, prior to issuing the subroutine execution instruction. The stack itself is simply a convenient place to store temporary information (if you're unfamiliar with the concept of a stack, I'd recommend reviewing any good data structures text). Once constructed, and just after the processor's subroutine execution instruction places the return address on the stack, the stack looks like Figure 2.

This particular stack is for C++ method calls; other languages might use similar stack arrangements, or they might not. The "implicit object pointer" you see is the C++ this pointer you get for free when you use C++ classes! The reason you don't need to specify the this pointer when you call object methods is that the compiler pushes it onto the subroutine call stack for you. Following the implicit object pointer, you'll find any arguments particular to the method you've invoked.

Normally what will happen, at least for COM and C++, is that the called subroutine (method) will pull the arguments from the stack and act upon them. Since the compiler knew what those arguments were (integers, floats, pointers, and so on), it inserts code to remove the argument(s) from the stack as they're consumed. The argument values are then used within the method as appropriate.

This is how SOAP interception works. Somewhere there's a call stack sitting in memory, and the SOAP infrastructure hauls off and interprets the method arguments. But many people don't realize that this is also the same model you use when you call a remote SOAP method by delegation. It's just that in the case of delegation, you interpret and pass the method arguments instead of letting the compiler or the SOAP infrastructure do it for you. The process is logically the same in both cases. It's more a matter of who does the argument interpretation rather than how the interpretation is performed.

By now, some of you might be wondering just where I'm going with this. What about XML? What's all of this call stack stuff? The answer lies with the SOAP Body. Let's first describe it by example. Imagine you have a method called Foo() that comes from a CBar object:

bool CBar::Foo(int iFooArg);
 …
 CBar bar;
 bool b = bar.Foo(27);
 

Ignoring the optional SOAP and HTTP Headers for the moment, which I'll often do from now on in example code, the Foo() method would be encoded as follows:

<SOAP-ENV:Envelope
   xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
   xmlns:m="http://www.myurl.com/cbar/schema"
   SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
     <SOAP-ENV:Body>
         <m:Foo>
             <iFooArg>27</iFooArg>
         </m:Foo>
     </SOAP-ENV:Body>
 </SOAP-ENV:Envelope>
 

This XML document would be shipped off to a remote server somewhere, processed, and the result would be returned:

<SOAP-ENV:Envelope
   xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
   xmlns:m="http://www.myurl.com/cbar/schema"
   SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
     <SOAP-ENV:Body>
         <m:FooResponse>
             <return>1</return>
         </m:FooResponse>
     </SOAP-ENV:Body>
 </SOAP-ENV:Envelope>
 

In this case, the value given the Foo() method (27) is rolled into a SOAP Envelope (a process known as serialization) and shipped to the remote server for processing. Note that you've identified the method name and the argument name. The actual object in question must either be implicit (both you and the server agree all Foo() calls will be handled by a specific object on the server), or explicit, in which case you'll have to pass along some additional information necessary to identify the object you want to process the Foo() call (this is one purpose for the SOAP Header). The return address, of course, is the return URL you provided the server when you shipped off the HTTP packet. So all of the information you'd find in a local method's call stack is present, or can be present, in a SOAP packet destined for a remote site. This is important because the remote site needs to have enough information to reconstruct the method's call stack in some manner. The remote server will take the SOAP information you provide and activate a local object on your behalf to process the method argument information you provided.
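
Here is a minimal sketch of what the delegation approach might look like in C++ for the Foo() call. The helper names (WrapInEnvelope, SerializeFooCall, ParseFooResponse) are my own inventions for illustration, and the response handling uses plain string searching only to keep the example short; real code would hand the response to an XML parser.

```cpp
#include <string>

// Wraps serialized Body content in a SOAP 1.1 envelope that declares the
// standard encoding style. The "m" namespace URI is the article's
// example URI, not a real schema location.
std::string WrapInEnvelope(const std::string& bodyContent)
{
    return
        "<SOAP-ENV:Envelope"
        " xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\""
        " xmlns:m=\"http://www.myurl.com/cbar/schema\""
        " SOAP-ENV:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\">"
        "<SOAP-ENV:Body>" + bodyContent + "</SOAP-ENV:Body>"
        "</SOAP-ENV:Envelope>";
}

// Delegation-style serialization of CBar::Foo(int iFooArg): the method
// name becomes the method element, the argument an embedded element.
std::string SerializeFooCall(int iFooArg)
{
    return WrapInEnvelope(
        "<m:Foo><iFooArg>" + std::to_string(iFooArg) + "</iFooArg></m:Foo>");
}

// Naive extraction of the <return> value from a FooResponse.
// Returns false if no <return> element is found.
bool ParseFooResponse(const std::string& response, bool& result)
{
    const std::string open = "<return>";
    const std::string close = "</return>";
    std::string::size_type begin = response.find(open);
    if (begin == std::string::npos) return false;
    begin += open.size();
    const std::string::size_type end = response.find(close, begin);
    if (end == std::string::npos) return false;
    const std::string value = response.substr(begin, end - begin);
    result = (value != "0" && value != "false");
    return true;
}
```

SerializeFooCall(27) yields the request envelope shown above (minus the whitespace), and ParseFooResponse() recovers the bool that the local bar.Foo(27) call would have returned.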

SOAP XML encoding
All of that probably flew by rather quickly, so let's step back and look at SOAP from an XML standpoint. The SOAP specification (http://www.msdn.microsoft.com/xml/general/soapspec.asp) specifies three main XML tags:

The Envelope forms the XML document and contains the Header and Body, as well as other elements you insert (all of which must be namespace qualified).

The SOAP Header is a free-for-all element into which you can stick information relative to your method call, such as an object identifier of some kind (like the CLSID or ProgID if a COM object). Other things you'll commonly find in the Header include a transaction identifier and security information, if any.

The Header is self-contained. That is, you could reach into the Envelope and pull out the Header, and no element within the Header should refer back to the Envelope in any way. This allows for more robust automated header parsing.

The SOAP specification outlines a few attributes you can use nearly anywhere, but they're primarily directed toward Header elements. These attributes include the following:

Use SOAP-ENC:root (this one lives in the SOAP encoding namespace rather than the envelope namespace) to declare which XML node among sibling nodes is the starting point when nothing else indicates which one comes first or matters most. If you have a chain of header information, use this attribute to identify the first node in the chain. (Similarly, you could use this attribute to identify the head node of a serialized linked list in the Body.)

SOAP-ENV:mustUnderstand is an important attribute, as it tells the server it must understand and be capable of processing the information within the element or it's required to return a fault to the caller.

SOAP-ENV:actor identifies information that's important for intermediate message processing when the SOAP Envelope passes from message recipient to message recipient, and ultimately to its final destination. Intermediaries are supposed to interpret the Header element and remove it. They may reinsert a new Header element. This isn't used with HTTP, but it's important when using SOAP in a message-based scenario. What information goes into a Header element anointed with this attribute is up to you.

SOAP-ENV:encodingStyle identifies a URL that holds your serialization rules. It's optional, and if you leave it off, you're not telling the recipient how you encoded the contents. You can identify this style on the Header alone, but it's more common to identify a style for the entire envelope instead. You'll see an example of this style in use shortly.

The layout of the Header is arbitrary unless you've provided a schema that says otherwise. That is, there are no particular Header formatting requirements in the SOAP specification. If you want to constrain the Header a bit by providing a schema, feel free to do so. The one rule you must follow is that the Header, if you insert one, must be the first element to follow the Envelope's opening tag.

The SOAP Body does have certain constraints placed upon it by the SOAP specification. The SOAP Body must be the first element to follow the Envelope opening tag, unless there's a Header element, in which case the Body follows the Header. Any user-defined elements that become sibling nodes to the Header and Body must follow the Body element.
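
The ordering rules are easy to enforce in code. The following is a minimal sketch, assuming a hypothetical BuildEnvelope() helper that receives the Header entries, the Body entries, and any user-defined trailing elements as already-serialized XML strings; it doesn't validate that the pieces are namespace qualified.

```cpp
#include <string>

// Hypothetical helper that assembles an Envelope in the order the
// specification requires: Header (only if present), then Body, then
// any user-defined sibling elements.
std::string BuildEnvelope(const std::string& headerEntries,
                          const std::string& bodyEntries,
                          const std::string& trailingElements = "")
{
    std::string xml =
        "<SOAP-ENV:Envelope"
        " xmlns:SOAP-ENV=\"http://schemas.xmlsoap.org/soap/envelope/\""
        " SOAP-ENV:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\">";
    if (!headerEntries.empty())   // the Header is optional
        xml += "<SOAP-ENV:Header>" + headerEntries + "</SOAP-ENV:Header>";
    xml += "<SOAP-ENV:Body>" + bodyEntries + "</SOAP-ENV:Body>";
    xml += trailingElements;      // extra siblings must follow the Body
    xml += "</SOAP-ENV:Envelope>";
    return xml;
}
```

A Header entry passed to this function might look like <t:Transaction xmlns:t="urn:example" SOAP-ENV:mustUnderstand="1">5</t:Transaction>, where the t namespace and the element itself are made up for the example.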

XML elements that are immediate children of the SOAP Body element are called independent elements in the SOAP specification. Children of independent elements are called embedded elements. The first independent element to follow the Body element's opening XML tag is the method element itself. The embedded elements you find in the method element represent the method arguments in precisely the same order you find them in the method's signature (essentially, the order and type of arguments as laid out in an object's method definition within a C++ header file, for example).

Some arguments are passed by value, such as iFooArg in the previous example. The value 27 was passed directly to the method. In SOAP terms, iFooArg is known as a single-reference accessor. It's accessed only once, at least as far as argument references go. But imagine this scenario:

bool CBar::Foo2(int* piFooArg);
 …
 CBar bar;
 int x = 41;
 bool b = bar.Foo2(&x);
 

In this case, the value passed to Foo2() is actually a pointer to an integer that happens to contain the value of 41. The value 41 isn't passed—a pointer to the integer variable is passed instead, and that pointer is later used to access the integer to pull out the value of 41.

In SOAP terms, this is called a multi-reference accessor, and it's encoded (serialized) slightly differently than the serialized value of 27 you saw previously:

<SOAP-ENV:Envelope
   xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
   xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
   xmlns:m="http://www.myurl.com/cbar/schema"
   SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
     <SOAP-ENV:Body>
         <m:Foo2>
             <x href="#arg"/>
         </m:Foo2>
         <m:piFooArg id="arg" xsi:type="int">
             41
         </m:piFooArg>
     </SOAP-ENV:Body>
 </SOAP-ENV:Envelope>
 

The href/id pair acts as a pointer. The variable x is referenced from within the method element, where it indicates the data is serialized in the XML element containing the corresponding id attribute. In this case, this particular element follows the method element (<m:piFooArg/>), but the only requirement is that it must be contained within the SOAP Body element and be a sibling of the method element.

The target (independent) element m:piFooArg not only contains the id attribute, but it also contains an attribute that indicates the type of the argument data, which in this case is "int." You'll need to specify the data type contained within multi-reference elements unless your schema defines their data type for you.

Note that the embedded element has no namespace qualifier, yet the independent elements are both qualified by a namespace. This differs in interpretation slightly from standard XML. In SOAP's case, embedded elements are assumed to refer to the namespace of the enclosing independent element, so no namespace qualification is required.

You might also (correctly) argue that the variable x is used only once, even though it's passed to the method by reference. This is true, and you're free to optimize this particular case if you desire to serialize the value 41 as a single reference value, just as the value of 27 was encoded previously. One slight difference is that the element tag name would be x, though, instead of the argument name assigned in the method signature, piFooArg:

<SOAP-ENV:Body>
     <m:Foo2>
         <x>41</x>
     </m:Foo2>
 </SOAP-ENV:Body>
 

The serialization code would need to be aware of this optimization and take the appropriate measures to recreate the by-reference value instead of assuming the argument data was passed by-value.
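
To contrast the two forms side by side, here is a minimal sketch that serializes the Foo2() argument both ways. The function names are hypothetical, the id value "arg" is fixed for simplicity (real code would generate unique ids), and only the Body is produced; the Envelope wrapper is assumed to be added elsewhere.

```cpp
#include <string>

// Multi-reference form: the method element holds an href, and the value
// lives in a sibling independent element carrying the matching id.
std::string SerializeFoo2MultiRef(const int* piFooArg)
{
    const std::string value = std::to_string(*piFooArg);
    return
        "<SOAP-ENV:Body>"
        "<m:Foo2><x href=\"#arg\"/></m:Foo2>"
        "<m:piFooArg id=\"arg\" xsi:type=\"int\">" + value + "</m:piFooArg>"
        "</SOAP-ENV:Body>";
}

// Optimized single-reference form: the pointed-to value is written
// directly inside the method element, with no href/id linkage.
std::string SerializeFoo2Inline(const int* piFooArg)
{
    return "<SOAP-ENV:Body><m:Foo2><x>" + std::to_string(*piFooArg) +
           "</x></m:Foo2></SOAP-ENV:Body>";
}
```

Either function would be called as SerializeFoo2Inline(&x). The receiving side has to know that Foo2() takes a pointer in both cases, so it can allocate an integer and pass its address.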

Once you understand the difference between SOAP's single-reference and multi-reference serialization formats, understanding the serialization of specific data becomes a bit easier. SOAP defines serialization formats for all sorts of data types, from integers and floats, to dates, strings, enumerations, structures, and arrays. Let's now turn to these serialization formats.

Serializing simple data types
To SOAP, a simple data type consists of a value without named parts, such as an array or structure would have. Strings are also considered a simple type, even though we think of them as arrays of characters, but I'll describe their encoding scheme in the next section. You've already seen two examples where an integer value was passed by value and by reference. Practically speaking, the simple types are strings, integers of various sorts (short, long, and so on), floating point values (floats and doubles), and date/time values. The more precise answer is that the SOAP simple types are identical to the built-in data types contained within the XML Schema Specification (see Part 2 of this series).

Serializing simple types usually involves a text conversion operation, such as itoa(). Date and time conversion is a bit more complex, as you'll need to follow the guidelines contained within the XML Schema Specification. However, to show an example, consider the date May 1, 2000, at 9:30 in the morning. This would be serialized as 2000-05-01T09:30:00-06:00. "May 1, 2000" was turned into "2000-05-01". "9:30 (am)" was turned into "09:30:00-06:00", which says the local time is 09:30:00, which differs from Greenwich Mean Time by six hours (-06:00). The T merely separates the date and time values.
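
As a rough illustration of the date/time format, here is a minimal sketch that builds the string from its parts. FormatDateTime and its utcOffsetMinutes parameter are hypothetical names of my own; the function does no range checking and assumes the caller already knows the local offset from GMT.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Formats a local date/time plus its offset from GMT (in minutes) as
// YYYY-MM-DDThh:mm:ss+hh:mm or YYYY-MM-DDThh:mm:ss-hh:mm.
std::string FormatDateTime(int year, int month, int day,
                           int hour, int minute, int second,
                           int utcOffsetMinutes)
{
    const char sign = (utcOffsetMinutes < 0) ? '-' : '+';
    const int absOffset = std::abs(utcOffsetMinutes);
    char buffer[64];
    std::snprintf(buffer, sizeof(buffer),
                  "%04d-%02d-%02dT%02d:%02d:%02d%c%02d:%02d",
                  year, month, day, hour, minute, second,
                  sign, absOffset / 60, absOffset % 60);
    return buffer;
}
```

With these assumptions, FormatDateTime(2000, 5, 1, 9, 30, 0, -360) produces 2000-05-01T09:30:00-06:00, the value from the example above.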

Serializing strings
Strings, at least in C++, are usually considered arrays of characters, which means they're not an intrinsic C++ data type. However, to SOAP, they are intrinsic. Their encoding usually takes the form of a multi-reference accessor, though you may optimize the serialization as I mentioned previously.

For example, consider this code:

int Len(char* s);
 char szString[] = {"Hello, World!"};
 …
 int iLength = Len(szString);
 

Presumably, the Len() method will return the length of the string, in characters (yes, you'd normally use strlen() or a derivative, but this is to demonstrate a point). The SOAP Body serialization for the Len() method call would look like this:

<SOAP-ENV:Body>
     <m:Len>
         <s href="#str"/>
     </m:Len>
     <m:s id="str" xsi:type="string">
         Hello, World!
     </m:s>
 </SOAP-ENV:Body>
 

You can optimize this in this particular case by serializing the string data within the method element:

<SOAP-ENV:Body>
     <m:Len>
         <s>
             Hello, World!
         </s>
     </m:Len>
 </SOAP-ENV:Body>
 

And to reiterate, the reason you can make this optimization is that the string is referenced only once in the method signature, making the additional linking overhead (href/id) unnecessary. As with any optimization, however, you're not required to serialize the string in this fashion, so if you write code that takes care of the general case, the remote servers should be capable of managing the string's deserialization.
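
Here is a minimal sketch of the optimized (inline) string form in C++. One detail the listings above don't show is that a string placed in XML element content needs its markup characters escaped, so the sketch includes a small EscapeXmlText() helper; both function names are hypothetical. The sketch also writes the string with no surrounding whitespace, on the assumption that the receiver shouldn't have to trim the value.

```cpp
#include <string>

// Replaces the characters that have special meaning in XML element
// content with their predefined entities.
std::string EscapeXmlText(const std::string& text)
{
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}

// Single-reference serialization of Len(char* s): the string data is
// written directly inside the method element. s must not be null.
std::string SerializeLenCall(const char* s)
{
    return "<SOAP-ENV:Body><m:Len><s>" + EscapeXmlText(s) +
           "</s></m:Len></SOAP-ENV:Body>";
}
```

The general multi-reference case would write the same escaped text into a sibling element with an id attribute and leave only an href in the method element.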

Serializing enumerations
When you compile code, enumerated values are converted into numerical values in the compiled code. That is, when you declare an enumeration and use one of the enumerated values, it's the same as using a number instead. For example, consider this enumeration:

enum Days { Sunday = 0, Monday, Tuesday, Wednesday, 
             Thursday, Friday, Saturday };
 

In this case, Sunday represents the value zero, Monday represents the value one, and so on. If a method accepts an enumerated value, it should accept either the enumeration or a numerical representation. For example, these two calls are equivalent:

void SetDay(Days day);
 SetDay(Wednesday);
 SetDay((Days)3); // note the cast
 

To SOAP, however, there is a difference, and you should always serialize the enumerated value. That is, for both of the SetDay() examples I just gave, this is the proper SOAP encoding:

<SOAP-ENV:Body>
     <m:SetDay>
         <day>
             Wednesday
         </day>
     </m:SetDay>
 </SOAP-ENV:Body>
 

As you can see, serializing the simple data types isn't too bad. Things get more interesting, however, when you deal with compound data types.
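
Because the compiled code only sees the number, the serializer needs a way to get back to the name. Here is a minimal sketch using a lookup table; DayName() and SerializeSetDay() are hypothetical helpers, and an out-of-range value simply serializes as an empty element in this sketch.

```cpp
#include <string>

enum Days { Sunday = 0, Monday, Tuesday, Wednesday,
            Thursday, Friday, Saturday };

// Maps the numeric enumeration value back to its symbolic name.
// Returns an empty string for values outside the enumeration.
const char* DayName(Days day)
{
    static const char* const names[] = {
        "Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday" };
    return (day >= Sunday && day <= Saturday) ? names[day] : "";
}

// Serializes SetDay(Days day) with the enumerated name, not the number.
std::string SerializeSetDay(Days day)
{
    return std::string("<SOAP-ENV:Body><m:SetDay><day>") + DayName(day) +
           "</day></m:SetDay></SOAP-ENV:Body>";
}
```

Both SerializeSetDay(Wednesday) and SerializeSetDay((Days)3) produce the same Body, with Wednesday as the element content.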

Serializing compound data types
SOAP considers any data type that has "named parts" to be a compound data type. Examples of compound data types include structures and arrays. If the data within the compound data type is accessed by its ordinal value, then the information is an array. If, on the other hand, the data is accessed by diving right to a given named value (and no other values have the same name), then that data type as a whole is considered a struct.

Serializing structs
Let's start with structures, since they're a bit easier to serialize. A structure is really a conglomeration of other single-reference or multi-reference data types, and you serialize it as such. For example, consider this structure and method:

typedef struct tagPartInfo {
     int iPartNum;
     const char* szPartDesc; // const: initialized from a string literal
 } PARTINFO;
 
 bool SetPartInfo(const PARTINFO& pPartInfo);
 …
 PARTINFO p = {15,"Nutrino Destabilization Clarifier"};
 bool bReceived = SetPartInfo(p);
 

If you serialize this using SOAP, this is what you should wind up with:

<SOAP-ENV:Body>
     <m:SetPartInfo>
         <p href="#struct"/>
     </m:SetPartInfo>
     <m:pPartInfo id="struct">
         <iPartNum>15</iPartNum>
         <szPartDesc href="#desc" />
     </m:pPartInfo>
     <m:szPartDesc id="desc" xsi:type="string">
         Nutrino Destabilization Clarifier
     </m:szPartDesc>
 </SOAP-ENV:Body>
 

I'm sure by now you're seeing the pattern. You first serialize the method argument, and then you serialize any multi-reference data, all of which is contained within the SOAP Body. And as with the previous encoding examples, you can optimize the serialization to look like this, since the structure is referenced only once:

<SOAP-ENV:Body>
     <m:SetPartInfo>
         <p>
             <iPartNum>15</iPartNum>
             <szPartDesc>
                 Nutrino Destabilization Clarifier
             </szPartDesc>
         </p>
     </m:SetPartInfo>
 </SOAP-ENV:Body>
 

Essentially, serializing structures combines the serialization techniques for both single and multi-reference accessors. The data itself can get much more complicated, but the serialization appears pretty much as you see it in the examples. Arrays get a little more complicated, primarily because there are special cases and exceptions. Let's look at array serialization next.
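
A minimal C++ sketch of both struct encodings follows. The function names are hypothetical, the id values are fixed for brevity, the description is assumed to contain no characters that need XML escaping, and each function returns only the content that goes inside the SOAP Body.

```cpp
#include <string>

typedef struct tagPartInfo {
    int iPartNum;
    const char* szPartDesc;
} PARTINFO;

// Multi-reference form: the struct and its string member each become
// an independent element reached through an href/id pair.
std::string SerializePartInfoMultiRef(const PARTINFO& part)
{
    return std::string(
        "<m:SetPartInfo><p href=\"#struct\"/></m:SetPartInfo>"
        "<m:pPartInfo id=\"struct\">"
        "<iPartNum>") + std::to_string(part.iPartNum) + "</iPartNum>"
        "<szPartDesc href=\"#desc\"/>"
        "</m:pPartInfo>"
        "<m:szPartDesc id=\"desc\" xsi:type=\"string\">" +
        part.szPartDesc + "</m:szPartDesc>";
}

// Optimized form: every member is embedded directly in the argument
// element, in the order the members are declared.
std::string SerializePartInfoInline(const PARTINFO& part)
{
    return "<m:SetPartInfo><p><iPartNum>" + std::to_string(part.iPartNum) +
           "</iPartNum><szPartDesc>" + part.szPartDesc +
           "</szPartDesc></p></m:SetPartInfo>";
}
```

The inline form is shorter, but the multi-reference form is the one to reach for if the same structure or string could be referenced from more than one place in the call.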

Serializing arrays
Before I discuss array serialization, though, consider these arrays—they force the special cases I'll mention later:

int arr1[] = {0,1,2,3}; // complete array
 …
 int arr2[4]; 
 arr2[2] = 19; // partially filled array
 …
 int arr3[1000000]; // large ("sparse") array…
 arr3[2031] = 104; // …that is partially filled
 arr3[9988] = 7;
 

The first array, arr1, has four elements, all of which contain data. Let's provide this array to an imaginary function Add():

int Add(int* pAddends);
 …
 int iSum = Add(arr1);
 

The SOAP encoding for this would be:

<SOAP-ENV:Body>
     <m:Add>
         <arr1 href="#array"/>
     </m:Add>
     <SOAP-ENC:Array id="array"
       SOAP-ENC:arrayType="xsd:int[4]">
         <xsi:int>0</xsi:int>
         <xsi:int>1</xsi:int>
         <xsi:int>2</xsi:int>
         <xsi:int>3</xsi:int>
     </SOAP-ENC:Array>
 </SOAP-ENV:Body>
 

The array data is contained within a multi-reference accessor denoted by the SOAP-ENC:Array tag name (you can use pAddends if you use a schema and indicate that the element derives from SOAP-ENC:Array, but it's just as easy to simply use SOAP-ENC:Array as the tag name). Note that the array is sized to be four (integer) elements, and all four elements were serialized individually.

The elements don't have tag names in the usual sense—they use their data type identifiers instead. This underscores the concept that array data is accessed by ordinal value rather than by a named member—these members don't have unique names to access.
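
In code, serializing a complete array is a simple loop. Here is a minimal sketch; SerializeIntArray() is a hypothetical helper that takes the element count explicitly (a bare int* carries no length), and it mirrors the element names and fixed id used in the listing above. It returns only the independent array element, not the method element that references it.

```cpp
#include <cstddef>
#include <string>

// Serializes every element of an int array as a SOAP-ENC:Array whose
// arrayType records the element type and the array size.
std::string SerializeIntArray(const int* values, std::size_t count)
{
    std::string xml =
        "<SOAP-ENC:Array id=\"array\" SOAP-ENC:arrayType=\"xsd:int[" +
        std::to_string(count) + "]\">";
    // Members are identified by position, so each one uses the same
    // type-based tag name rather than a unique member name.
    for (std::size_t i = 0; i < count; ++i)
        xml += "<xsi:int>" + std::to_string(values[i]) + "</xsi:int>";
    xml += "</SOAP-ENC:Array>";
    return xml;
}
```

For arr1, the call would be SerializeIntArray(arr1, 4).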

Now consider the case where you want to add the elements you find in arr2, ignoring for the moment that some elements weren't initialized (if you really added the elements in arr2, you should get a random number based upon what trash was left in the memory used by arr2):

iSum = Add(arr2);
 

In this case, only one element of arr2 was initialized (the third element). Instead of sending the other array elements to the server (they're garbage anyway), you should serialize the (good) elements using the SOAP-ENC:offset attribute:

<SOAP-ENV:Body>
     <m:Add>
         <arr2 href="#array"/>
     </m:Add>
     <SOAP-ENC:Array id="array"
       SOAP-ENC:arrayType="xsd:int[4]"
       SOAP-ENC:offset="[2]">
         <xsi:int>19</xsi:int>
     </SOAP-ENC:Array>
 </SOAP-ENV:Body>
 

The offset is zero-based and indicates at what element the serialized values begin. It's assumed that if you have multiple values serialized, they follow consecutively from the first array element (the one placed into position by the offset attribute). And finally, the values that aren't serialized are assumed to be some default value, which you can safely assume is zero or NULL (other default values can certainly be used, but these are the two most common default values).
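
Here is a minimal sketch of the partially transmitted case. SerializePartialIntArray() is a hypothetical helper; the caller tells it the declared array size, the zero-based offset of the first element worth sending, and how many consecutive elements to send from there.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Serializes `count` consecutive elements starting at `offset`; the
// arrayType still reports the full declared size of the array.
std::string SerializePartialIntArray(const int* values,
                                     std::size_t arraySize,
                                     std::size_t offset,
                                     std::size_t count)
{
    assert(offset + count <= arraySize); // the run must fit in the array
    std::string xml =
        "<SOAP-ENC:Array id=\"array\" SOAP-ENC:arrayType=\"xsd:int[" +
        std::to_string(arraySize) + "]\" SOAP-ENC:offset=\"[" +
        std::to_string(offset) + "]\">";
    for (std::size_t i = offset; i < offset + count; ++i)
        xml += "<xsi:int>" + std::to_string(values[i]) + "</xsi:int>";
    xml += "</SOAP-ENC:Array>";
    return xml;
}
```

For arr2, SerializePartialIntArray(arr2, 4, 2, 1) sends only the value 19 and leaves the receiver to fill the other three slots with its default value.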

The final array example is that of a sparse array. Sparse arrays are large arrays that contain relatively few elements. How large is large and how few is few depends upon the application, but in any case, the array is encoded into a SOAP packet in the same fashion. To see the serialization, imagine you're adding the array elements given in arr3:

iSum = Add(arr3);
 

Here, only two elements out of 1000000 were initialized, so, as you might expect, only those two values need to be sent to the server. Unfortunately, they're not contiguous so you can't use SOAP-ENC:offset. To serialize them, use the SOAP-ENC:position attribute:

<SOAP-ENV:Body>
     <m:Add>
         <arr3 href="#array"/>
     </m:Add>
     <SOAP-ENC:Array id="array"
       SOAP-ENC:arrayType="xsd:int[1000000]">
         <xsi:int SOAP-ENC:position="[2031]">104</xsi:int>
         <xsi:int SOAP-ENC:position="[9988]">7</xsi:int>
     </SOAP-ENC:Array>
 </SOAP-ENV:Body>
 

As with partially transmitted arrays, you can safely assume the unspecified array element values are some predefined default value.
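
A sparse array is awkward to hold as a plain C array, so this minimal sketch assumes the populated elements are kept in a std::map keyed by index. That container choice and the SerializeSparseIntArray() name are mine, not something the SOAP specification prescribes.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Serializes only the populated elements of a sparse array, tagging
// each one with its zero-based position in the full array.
std::string SerializeSparseIntArray(const std::map<std::size_t, int>& elements,
                                    std::size_t arraySize)
{
    std::string xml =
        "<SOAP-ENC:Array id=\"array\" SOAP-ENC:arrayType=\"xsd:int[" +
        std::to_string(arraySize) + "]\">";
    // std::map iterates in ascending key order, so positions come out sorted.
    for (const auto& entry : elements) {
        xml += "<xsi:int SOAP-ENC:position=\"[" +
               std::to_string(entry.first) + "]\">" +
               std::to_string(entry.second) + "</xsi:int>";
    }
    xml += "</SOAP-ENC:Array>";
    return xml;
}
```

Populating the map with {2031, 104} and {9988, 7} and passing 1000000 as the size reproduces the array element from the listing above.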

Of course, you can get much more complicated than this. You can have multidimensional arrays, arrays of structures, structures of structures that contain arrays, and so on. And depending on the situation, you might be able to optimize the encoding (you've seen several examples so far). I recommend a quick review of the SOAP specification for a few other serialization examples you might find helpful when dealing with more complicated scenarios.

Other SOAP serialization issues
There are a number of other issues you should be familiar with in SOAP serialization; they're detailed in the following sections.

SOAP byte arrays
The first issue I'd like to mention is how to encode byte arrays. A byte array is nothing more than a memory buffer you want to serialize. It doesn't matter what kind of data is in the buffer—all that matters is that you want to send a chunk of memory from here to there, and you need a way to encode it. SOAP specifies the use of Base 64 encoding in this case.

It turns out that many firewalls, HTTP proxies, and protocol handlers treat certain characters as special, like carriage returns, line feeds, and periods. If you simply take a chunk of memory, stuff it into a string, and then serialize the string, you have to worry about other systems altering your data along the way (they do that, and it can be irritating). Base 64 encoding transforms your buffer data into a format that's safe for transmission over the Internet. In fact, you probably use it anytime you send e-mail notes with attachments: it's the standard MIME encoding for binary data sent via SMTP.

I'll leave the details of the encoding algorithm to RFC 2045, Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies, which you can easily find on the Web (it does a particularly good job of describing the algorithm, by the way). Essentially, though, you take the buffer three bytes at a time, split those 24 bits into four 6-bit values, and substitute a printable character for each value. I've included a sample program I wrote to perform Base 64 encoding in the accompanying Download file, so feel free to try it out. (If you decode the encoded string you find in the SOAP specification, you'll find that the specification's authors encoded it incorrectly!)

In any case, imagine you wanted to encode this memory buffer as a SOAP byte array (it represents the string "Hello, World!"):

0000: 48 65 6C 6C 6F 2C 20 57
0008: 6F 72 6C 64 21 00
 

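The array would be serialized in the SOAP Body as an independent element (note that the encoded text shown here covers the 13 characters of the string and leaves out the terminating NUL byte at offset 000D):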

<m:tagName id="someid" xsi:type="SOAP-ENC:base64">
      SGVsbG8sIFdvcmxkIQ==
 </m:tagName>
 

Note that the xsi:type attribute's value is SOAP-ENC:base64, and that the data contained within the tag is Base 64 encoded data.
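
For the curious, here's a compact sketch of the encoding step itself. To be clear, this isn't the sample program from the Download file, just a minimal illustration of the RFC 2045 alphabet and padding rules. The name Base64Encode is my own, and the sketch doesn't insert the line breaks that MIME requires for mail bodies:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Minimal Base 64 encoder (RFC 2045 alphabet, '=' padding, no line breaks).
std::string Base64Encode(const unsigned char* data, std::size_t length)
{
    assert(data != nullptr || length == 0);

    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::string out;
    out.reserve(((length + 2) / 3) * 4);

    for (std::size_t i = 0; i < length; i += 3) {
        const std::size_t remaining = length - i;

        // Pack up to three bytes into one 24-bit group.
        unsigned long group = static_cast<unsigned long>(data[i]) << 16;
        if (remaining > 1) group |= static_cast<unsigned long>(data[i + 1]) << 8;
        if (remaining > 2) group |= data[i + 2];

        // Emit four 6-bit values, padding with '=' where input ran out.
        out += kAlphabet[(group >> 18) & 0x3F];
        out += kAlphabet[(group >> 12) & 0x3F];
        out += (remaining > 1) ? kAlphabet[(group >> 6) & 0x3F] : '=';
        out += (remaining > 2) ? kAlphabet[group & 0x3F] : '=';
    }
    return out;
}
```

Feeding it the 13 characters of "Hello, World!" gives SGVsbG8sIFdvcmxkIQ==, the value shown in the element above. If you include the terminating NUL as a fourteenth byte, the result ends in IQA= instead.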

Default values
I mentioned this when I discussed arrays, but SOAP itself can't know what default values to apply to a given method's argument; those clearly depend upon the method itself. SOAP can, however, indicate that a default value is to be applied. It does so in a very efficient manner: the argument is simply omitted, and nothing is serialized in its place. Therefore, the lack of a serialized argument isn't an error but rather an indication that the server-side code should apply the default value for that argument.
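
On the server side, that rule might look something like the following sketch. I'm assuming (purely for illustration) that the request's arguments have already been pulled out of the XML into a map keyed by accessor name. SoapArgs and GetLongArg are hypothetical names of mine, not part of any SOAP implementation:

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical container: deserialized arguments keyed by accessor name.
typedef std::map<std::string, std::string> SoapArgs;

// Returns the named argument as a long, or the method's own default
// when the argument wasn't serialized in the request.
long GetLongArg(const SoapArgs& args, const std::string& name, long defaultValue)
{
    assert(!name.empty());  // an accessor always has a name

    SoapArgs::const_iterator it = args.find(name);
    if (it == args.end())
        return defaultValue;  // absent isn't an error: apply the default
    return std::strtol(it->second.c_str(), nullptr, 10);
}
```

The point is that the default lives with the method's implementation. The packet only tells you, by omission, that the default is wanted.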

Polymorphic accessors
This is a fancy way of saying that variant data is encoded in SOAP in a special manner. Actually, it isn't that cosmic! If you apply an xsi:type to a given embedded element, that element (as an accessor) is considered polymorphic. In plain English, what that means is this: Programming languages sometimes have both type variant and type invariant data. For example, in C++, the long integer is type invariant:

long iSomeIntegerValue = 0;
 

In C++, a long is a long is a long. However, if we turn this around and use a VARIANT (from our COM programming toolbox), we know the variant data type is really a union that can contain a wide range of data types. Its discriminator indicates the particular data type it contains.

VARIANT var1;
VariantInit(&var1); // start from a known-empty state (VT_EMPTY)
var1.vt = VT_I4;    // "vt" is the discriminator
var1.lVal = 548;
 

You can easily convert variant data to another data type by coercion:

// Coerce in place; a failed HRESULT means the conversion wasn't possible
HRESULT hr = VariantChangeType(&var1, &var1, 0, VT_R8);
 

This will convert the long data in var1 into its equivalent double representation, which will be placed back in var1. The fact that the same data can be represented in a number of different ways is where the polymorphism term comes from.

The same variant data would be encoded in SOAP in this fashion:

<m:tagName xsi:type="xsd:int">
    548
</m:tagName>
 

The key is the inclusion of the xsi:type attribute, which wouldn't normally be necessary (especially if this particular element were described in a schema). The fact that this attribute is serialized with the data indicates that the data is, or can be, accessed polymorphically, and that this particular value was sent as an integer.
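
To tie the discriminator idea to the wire format, here's a small sketch that picks the xsi:type value from the discriminator. So that it compiles anywhere, I've used a cut-down stand-in for VARIANT rather than the real COM type. The names Value, ValueType, and SerializePolymorphic are hypothetical, and the type names carry the xsd prefix the way the SOAP 1.1 specification's examples do:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// A portable stand-in for COM's VARIANT, reduced to two cases.
enum ValueType { kLong, kDouble };

struct Value {
    ValueType type;  // the discriminator, playing the role of VARIANT's vt
    union {
        long   lVal;    // VT_I4 holds a 32-bit long on Windows
        double dblVal;  // VT_R8
    };
};

// Writes the value as a polymorphic accessor: the xsi:type attribute
// is chosen from the discriminator at serialization time.
std::string SerializePolymorphic(const std::string& tagName, const Value& v)
{
    assert(!tagName.empty());

    std::ostringstream xml;
    xml << "<" << tagName << " xsi:type=\"";
    switch (v.type) {
    case kLong:   xml << "xsd:int\">"    << v.lVal;   break;
    case kDouble: xml << "xsd:double\">" << v.dblVal; break;
    }
    xml << "</" << tagName << ">";
    return xml.str();
}
```

The same accessor name can therefore carry an integer in one call and a double in the next. The receiver reads xsi:type to find out which one it got.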

Wrapping up
This wraps up the SOAP serialization process. There are more details, but this should give you a start. Part IV of this series will deal with SOAP's response to the method call, including what happens when there's an error of some kind. I'll also take a look at BizTalk in that installment.

Comments? Questions? Find a bug? Please send me a note!

