Signing an XML document using XMLDSIG (Part 1)
This page demonstrates how to create a digital signature in XML. This is a simple [sic] example of an enveloping signature where we sign a straightforward text string inside an XML document.
2012-05-09: For an example of an enveloped signature, see Part 2.
To make a digital signature, you need a private key. Our example uses the 1024-bit RSA private key for Alice from RFC 4134 [SMIME-EX]. We use our CryptoSys PKI Toolkit to carry out the necessary computations. We treat an XML document as a simple text file and avoid using any of those frightful, unwieldy XML "DOM" packages.
We give full details of the exact data to be processed at each stage in order to produce the final signed XML document. We hope this is in sufficient detail to help you implement your own version.
For advanced users:
If this is too simple for you, see our page on XML-Dsig and the Chile SII, where we look in detail at creating digital signatures in XML documents using the standards for electronic invoices set by the Servicio de Impuestos Internos (SII) of Chile. There are some useful hints and generic functions in VB6 to create <SignedInfo> elements for XML-Dsig.
See How to create a SAT Cancelacion document, an enveloped XML-DSIG document with the namespace http://cancelacfd.sat.gob.mx issued by the Servicio de Administración Tributaria (SAT) in Mexico.
See also Accented characters and UTF-8 in XML-DSIG signatures, where we look at a simple example to create an XML-DSIG signature of an XML document containing accented characters like áéíóúñ.
>>I have some questions related to XML-Dsig:
>
>Argghh!! Run away!
A near-universal reaction.
In this example we create the digital signature for the text
some text
  with spaces and CR-LF.
That is, the 35 bytes beginning with
's', 'o', 'm',... and ending with
...,'L', 'F', '.'.
There is exactly one CR-LF newline (the two-byte sequence
(0x)0D 0A) in the text,
between the two lines. There are two spaces before the word "with".
There is no newline at the end.
In hexadecimal format, the text is
73 6F 6D 65 20 74 65 78 74 0D 0A 20 20 77 69 74 68 20 73 70 61 63 65 73 20 61 6E 64 20 43 52 2D 4C 46 2E
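As a quick sanity check, the input can be written as a Python bytes literal and compared with the hex dump above. This is only an illustrative sketch; the names T and EXPECTED_HEX are ours, not part of any toolkit.

```python
# The text-to-be-signed, T, as an exact byte string:
# one CR-LF pair, two spaces before "with", no trailing newline.
T = b"some text\r\n  with spaces and CR-LF."

# Hex dump copied from the text above.
EXPECTED_HEX = (
    "73 6F 6D 65 20 74 65 78 74 0D 0A 20 20 77 69 74 68 20 "
    "73 70 61 63 65 73 20 61 6E 64 20 43 52 2D 4C 46 2E"
)

print(len(T))                              # number of bytes in T
print(T.hex(" ").upper() == EXPECTED_HEX)  # does T match the hex dump?
```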
Output XML file (1 kB).
<?xml version="1.0" encoding="UTF-8"?>
<Signature xmlns="http://www.w3.org/2000/09/xmldsig#">
<SignedInfo>
  <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315" />
  <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1" />
  <Reference URI="#object">
    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1" />
    <DigestValue>OPnpF/ZNLDxJ/I+1F3iHhlmSwgo=</DigestValue>
  </Reference>
</SignedInfo>
<SignatureValue>nihUFQg4mDhLgecvhIcKb9Gz8VRTOlw+adiZOBBXgK4JodEe5aFfCqm8WcRIT8GL
LXSk8PsUP4//SsKqUBQkpotcAqQAhtz2v9kCWdoUDnAOtFZkd/CnsZ1sge0ndha4
0wWDV+nOWyJxkYgicvB8POYtSmldLLepPGMz+J7/Uws=</SignatureValue>
<KeyInfo>
  <KeyValue>
    <RSAKeyValue>
      <Modulus>4IlzOY3Y9fXoh3Y5f06wBbtTg94Pt6vcfcd1KQ0FLm0S36aGJtTSb6pYKfyX7PqC
UQ8wgL6xUJ5GRPEsu9gyz8ZobwfZsGCsvu40CWoT9fcFBZPfXro1Vtlh/xl/yYHm
+Gzqh0Bw76xtLHSfLfpVOrmZdwKmSFKMTvNXOFd0V18=</Modulus>
      <Exponent>AQAB</Exponent>
    </RSAKeyValue>
  </KeyValue>
</KeyInfo>
<Object Id="object">some text
  with spaces and CR-LF.</Object>
</Signature>
Note that the whitespace inside the <SignedInfo> and <Object> elements is important and should not be changed.
Test with the XML Security Library Online XML Digital Signature Verifier.
Algorithm: XMLDSIG of simple text string.
INPUT:
T, text-to-be-signed, a byte string;
Ks, RSA private key;
OUTPUT: XML file, xml
- Canonicalize* the text-to-be-signed, C = C14n(T).
- Compute the message digest of the canonicalized text, m = Hash(C).
- Encapsulate the message digest in an XML <SignedInfo> element, SI, in canonicalized form.
- Compute the RSA signatureValue of the canonicalized <SignedInfo> element, SV = RsaSign(Ks, SI).
- Compose the final XML document including the signatureValue, this time in non-canonicalized form.
* Strictly, what we are doing here is encapsulating the text string T inside an
<Object> element, then canonicalizing that element.
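The outline above can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the CryptoSys PKI method: the names sign_text, signed_info and rsa_sign are hypothetical, the RSA signing step is supplied by the caller as a function, the optional <KeyInfo> element is omitted, and the <SignedInfo> element is laid out with no leading whitespace, so the signature value would differ from the one in the example document.

```python
import base64
import hashlib
from typing import Callable
from xml.sax.saxutils import escape

DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"
C14N_ALG = "http://www.w3.org/TR/2001/REC-xml-c14n-20010315"


def signed_info(digest_b64: str, with_ns: bool) -> str:
    """Compose <SignedInfo>; with_ns=True gives the canonicalized form."""
    ns = f' xmlns="{DSIG_NS}"' if with_ns else ""
    return "\n".join([
        f"<SignedInfo{ns}>",
        f'<CanonicalizationMethod Algorithm="{C14N_ALG}"></CanonicalizationMethod>',
        f'<SignatureMethod Algorithm="{DSIG_NS}rsa-sha1"></SignatureMethod>',
        '<Reference URI="#object">',
        f'<DigestMethod Algorithm="{DSIG_NS}sha1"></DigestMethod>',
        f"<DigestValue>{digest_b64}</DigestValue>",
        "</Reference>",
        "</SignedInfo>",
    ])


def sign_text(text: bytes, rsa_sign: Callable[[bytes], bytes]) -> str:
    """rsa_sign is a caller-supplied (hypothetical) RSA-SHA1 signing function."""
    # Canonicalize: normalize line breaks to LF, escape &, < and >, and wrap
    # the text in an <Object> element carrying the inherited namespace.
    content = escape(text.decode("utf-8").replace("\r\n", "\n").replace("\r", "\n"))
    c = f'<Object xmlns="{DSIG_NS}" Id="object">{content}</Object>'.encode("utf-8")
    # Message digest of the canonicalized <Object> element.
    digest_b64 = base64.b64encode(hashlib.sha1(c).digest()).decode("ascii")
    # Canonicalized <SignedInfo>, then its RSA signature value.
    si = signed_info(digest_b64, with_ns=True).encode("utf-8")
    sv = base64.b64encode(rsa_sign(si)).decode("ascii")
    # Final document, this time without the propagated xmlns attributes.
    return "\n".join([
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<Signature xmlns="{DSIG_NS}">',
        signed_info(digest_b64, with_ns=False),
        f"<SignatureValue>{sv}</SignatureValue>",
        f'<Object Id="object">{content}</Object>',
        "</Signature>",
    ])
```

The whitespace inside the <SignedInfo> and <Object> elements is generated by the same code in both the canonicalized and the final form, which is what keeps the two consistent.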
There are two message digests to compute. The input to these two computations has to be exactly correct or you will get the wrong result. We use the SHA-1 message digest function, which outputs a hash value 20 bytes long.
Digest of the input text string
Form the canonicalized <Object> element with all CR-LF pairs ((0x)0D 0A) in the text converted to single LF characters ((0x)0A). In this case there is no newline after the text, so the closing tag comes directly after the '.' character in the text string.
Note we have added the xmlns attribute, which exists here but not in the original or final document. This attribute is propagated from the parent <Signature> element.
<Object xmlns="http://www.w3.org/2000/09/xmldsig#" Id="object">some text
  with spaces and CR-LF.</Object>
and compute the message digest of the byte string beginning
'<', 'O', 'b',... and ending
...,'e','c', 't', '>'
000000  3c 4f 62 6a 65 63 74 20 78 6d 6c 6e 73 3d 22 68  <Object xmlns="h
000010  74 74 70 3a 2f 2f 77 77 77 2e 77 33 2e 6f 72 67  ttp://www.w3.org
000020  2f 32 30 30 30 2f 30 39 2f 78 6d 6c 64 73 69 67  /2000/09/xmldsig
000030  23 22 20 49 64 3d 22 6f 62 6a 65 63 74 22 3e 73  #" Id="object">s
000040  6f 6d 65 20 74 65 78 74 0a 20 20 77 69 74 68 20  ome text.  with 
000050  73 70 61 63 65 73 20 61 6e 64 20 43 52 2d 4c 46  spaces and CR-LF
000060  2e 3c 2f 4f 62 6a 65 63 74 3e                    .</Object>
The exact byte string in this case to be digested is
DATA=
3C4F626A65637420786D6C6E733D22687474703A2F2F7777
772E77332E6F72672F323030302F30392F786D6C64736967
23222049643D226F626A656374223E736F6D652074657874
0A2020776974682073706163657320616E642043522D4C46
2E3C2F4F626A6563743E
Hash(DATA)=38F9E917F64D2C3C49FC8FB5177887865992C20A
Base64(Hash(DATA))=OPnpF/ZNLDxJ/I+1F3iHhlmSwgo=
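The computation of this first digest can be reproduced with Python's standard library. The sketch below simply hardcodes the canonicalized <Object> element as described above; the variable names are ours. Its output can be compared with the DATA, Hash(DATA) and Base64(Hash(DATA)) values listed.

```python
import base64
import hashlib

T = b"some text\r\n  with spaces and CR-LF."

# Canonical form: each CR-LF becomes a single LF, and the default namespace
# inherited from the omitted parent <Signature> element is written explicitly.
canonical_object = (
    b'<Object xmlns="http://www.w3.org/2000/09/xmldsig#" Id="object">'
    + T.replace(b"\r\n", b"\n")
    + b"</Object>"
)

digest = hashlib.sha1(canonical_object).digest()

print(canonical_object.hex().upper())     # compare with DATA
print(digest.hex().upper())               # compare with Hash(DATA)
print(base64.b64encode(digest).decode())  # compare with Base64(Hash(DATA))
```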
Digest of the SignedInfo
Form the canonicalized <SignedInfo> element.
Note the xmlns attribute which we include here, but not in the final document.
This is propagated down from the parent <Signature> element.
<SignedInfo xmlns="http://www.w3.org/2000/09/xmldsig#">
  <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315"></CanonicalizationMethod>
  <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#rsa-sha1"></SignatureMethod>
  <Reference URI="#object">
    <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"></DigestMethod>
    <DigestValue>OPnpF/ZNLDxJ/I+1F3iHhlmSwgo=</DigestValue>
  </Reference>
</SignedInfo>
In hex format, the byte string is
3C5369676E6564496E666F20786D6C6E733D22687474703A2F2F7777772E7733
2E6F72672F323030302F30392F786D6C6473696723223E0A20203C43616E6F6E
6963616C697A6174696F6E4D6574686F6420416C676F726974686D3D22687474
703A2F2F7777772E77332E6F72672F54522F323030312F5245432D786D6C2D63
31346E2D3230303130333135223E3C2F43616E6F6E6963616C697A6174696F6E
4D6574686F643E0A20203C5369676E61747572654D6574686F6420416C676F72
6974686D3D22687474703A2F2F7777772E77332E6F72672F323030302F30392F
786D6C64736967237273612D73686131223E3C2F5369676E61747572654D6574
686F643E0A20203C5265666572656E6365205552493D22236F626A656374223E
0A202020203C4469676573744D6574686F6420416C676F726974686D3D226874
74703A2F2F7777772E77332E6F72672F323030302F30392F786D6C6473696723
73686131223E3C2F4469676573744D6574686F643E0A202020203C4469676573
7456616C75653E4F506E70462F5A4E4C44784A2F492B3146336948686C6D5377
676F3D3C2F44696765737456616C75653E0A20203C2F5265666572656E63653E
0A3C2F5369676E6564496E666F3E
The message digest of this is
Actually, this digest value is not output directly. It is computed and then encrypted as part of the signature value calculation. But to verify the signature you need to be able to re-create it. (Thanks to Marcos Paulo Pereira Brito Garcia for pointing out an error in an early version of this.)
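To re-create this second digest yourself, the canonicalized <SignedInfo> element can be rebuilt byte for byte and hashed. The sketch below assumes the two-space indentation and LF line endings shown in the hex dump above; the variable names are ours. It prints the length, the hex form (to compare with the dump) and the SHA-1 digest.

```python
import hashlib

NS = "http://www.w3.org/2000/09/xmldsig#"

# Canonicalized <SignedInfo>: LF line endings, the indentation used in the
# example, and start-end tag pairs instead of empty-element tags.
signed_info = (
    f'<SignedInfo xmlns="{NS}">\n'
    '  <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2001/REC-xml-c14n-20010315">'
    "</CanonicalizationMethod>\n"
    f'  <SignatureMethod Algorithm="{NS}rsa-sha1"></SignatureMethod>\n'
    '  <Reference URI="#object">\n'
    f'    <DigestMethod Algorithm="{NS}sha1"></DigestMethod>\n'
    "    <DigestValue>OPnpF/ZNLDxJ/I+1F3iHhlmSwgo=</DigestValue>\n"
    "  </Reference>\n"
    "</SignedInfo>"
).encode("ascii")

print(len(signed_info))                            # length in bytes
print(signed_info.hex().upper())                   # compare with the hex dump
print(hashlib.sha1(signed_info).hexdigest().upper())  # digest of SignedInfo
```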
The byte string of the <SignedInfo> element is input to the sha1WithRSAEncryption signature algorithm and signed with Alice's private RSA key to produce the 1024-bit RSA signatureValue in hex format
9E285415083898384B81E72F84870A6FD1B3F154533A5C3E69D89938105780AE
09A1D11EE5A15F0AA9BC59C4484FC18B2D74A4F0FB143F8FFF4AC2AA501424A6
8B5C02A40086DCF6BFD90259DA140E700EB4566477F0A7B19D6C81ED277616B8
D3058357E9CE5B227191882272F07C3CE62D4A695D2CB7A93C6333F89EFF530B
In base64 this is
nihUFQg4mDhLgecvhIcKb9Gz8VRTOlw+adiZOBBXgK4JodEe5aFfCqm8WcRIT8GL
LXSk8PsUP4//SsKqUBQkpotcAqQAhtz2v9kCWdoUDnAOtFZkd/CnsZ1sge0ndha4
0wWDV+nOWyJxkYgicvB8POYtSmldLLepPGMz+J7/Uws=
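Signing needs Alice's private key, but checking the result needs only the public key published in the <KeyInfo> element. The following is a rough sketch of that check in plain Python, assuming the signature uses PKCS#1 v1.5 padding with a SHA-1 DigestInfo (which is what rsa-sha1 implies). The function name rsa_sha1_verify is ours and this is not how CryptoSys PKI does it; for production use a vetted cryptographic library.

```python
import base64
import hashlib

# Values copied from the document above; embedded whitespace is ignored.
MODULUS_B64 = """
4IlzOY3Y9fXoh3Y5f06wBbtTg94Pt6vcfcd1KQ0FLm0S36aGJtTSb6pYKfyX7PqC
UQ8wgL6xUJ5GRPEsu9gyz8ZobwfZsGCsvu40CWoT9fcFBZPfXro1Vtlh/xl/yYHm
+Gzqh0Bw76xtLHSfLfpVOrmZdwKmSFKMTvNXOFd0V18=
"""
EXPONENT_B64 = "AQAB"
SIGNATURE_B64 = """
nihUFQg4mDhLgecvhIcKb9Gz8VRTOlw+adiZOBBXgK4JodEe5aFfCqm8WcRIT8GL
LXSk8PsUP4//SsKqUBQkpotcAqQAhtz2v9kCWdoUDnAOtFZkd/CnsZ1sge0ndha4
0wWDV+nOWyJxkYgicvB8POYtSmldLLepPGMz+J7/Uws=
"""
SIGNED_INFO_HEX = """
3C5369676E6564496E666F20786D6C6E733D22687474703A2F2F7777772E7733
2E6F72672F323030302F30392F786D6C6473696723223E0A20203C43616E6F6E
6963616C697A6174696F6E4D6574686F6420416C676F726974686D3D22687474
703A2F2F7777772E77332E6F72672F54522F323030312F5245432D786D6C2D63
31346E2D3230303130333135223E3C2F43616E6F6E6963616C697A6174696F6E
4D6574686F643E0A20203C5369676E61747572654D6574686F6420416C676F72
6974686D3D22687474703A2F2F7777772E77332E6F72672F323030302F30392F
786D6C64736967237273612D73686131223E3C2F5369676E61747572654D6574
686F643E0A20203C5265666572656E6365205552493D22236F626A656374223E
0A202020203C4469676573744D6574686F6420416C676F726974686D3D226874
74703A2F2F7777772E77332E6F72672F323030302F30392F786D6C6473696723
73686131223E3C2F4469676573744D6574686F643E0A202020203C4469676573
7456616C75653E4F506E70462F5A4E4C44784A2F492B3146336948686C6D5377
676F3D3C2F44696765737456616C75653E0A20203C2F5265666572656E63653E
0A3C2F5369676E6564496E666F3E
"""

# DER-encoded DigestInfo header for SHA-1 (PKCS#1 v1.5).
SHA1_DIGESTINFO_PREFIX = bytes.fromhex("3021300906052b0e03021a05000414")


def rsa_sha1_verify(message: bytes, signature: bytes, n: int, e: int) -> bool:
    """Check an RSA PKCS#1 v1.5 SHA-1 signature using only the public key."""
    k = (n.bit_length() + 7) // 8          # modulus length in bytes
    if len(signature) != k:
        return False
    # Recover the encoded message block: s^e mod n.
    em = pow(int.from_bytes(signature, "big"), e, n).to_bytes(k, "big")
    t = SHA1_DIGESTINFO_PREFIX + hashlib.sha1(message).digest()
    expected = b"\x00\x01" + b"\xff" * (k - len(t) - 3) + b"\x00" + t
    return em == expected


n = int.from_bytes(base64.b64decode(MODULUS_B64), "big")
e = int.from_bytes(base64.b64decode(EXPONENT_B64), "big")
signature = base64.b64decode(SIGNATURE_B64)
signed_info = bytes.fromhex(SIGNED_INFO_HEX)

print(rsa_sha1_verify(signed_info, signature, n, e))
```

Changing even one byte of the <SignedInfo> input (for example, replacing a space with a tab) should make the check fail.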
Comment on SignedInfo
In the composition of the <SignedInfo> element above, we added some space characters before the lines, to add to readability. These space characters must be preserved in both the canonicalized version and the final XML document. It gets even messier if you use tab characters (0x09) because, if they get changed later into space characters, you will fail to get the correct signature value.
It is better practice to form the <SignedInfo> element with no whitespace before the elements and just a single newline after each line, as follows:
<SignedInfo xmlns="http://www.w3.org/2000/09/xmldsig#">
<CanonicalizationMethod Algorithm="..."></CanonicalizationMethod>
<SignatureMethod Algorithm="..."></SignatureMethod>
<Reference URI="...">
<DigestMethod Algorithm="..."></DigestMethod>
<DigestValue>...</DigestValue>
</Reference>
</SignedInfo>
Note, though, that this will give a different signature value than our example above. If, at this stage, you are thinking, "But isn't that a rather stupid procedure if it can be messed up so easily?", you would not be wrong...
Canonicalization is a method for generating a physical representation, the canonical form, of an XML document that accounts for permissible syntactic changes.
In other words, no matter what (legal) changes you could make to a given XML document, the canonical form will always be identical, byte-for-byte.
The cute abbreviation for canonicalization is c14n denoting that there are 14 characters between the "c" and the "n" in a word that is obviously too long to begin with.
Note that the canonicalized data does not appear in the original or final XML document. It is composed in memory and a message digest or RSA signature value is computed from it.
This is the official (2001) outline of the procedure for c14n, taken from [XML-C14N]:
1. The document is encoded in UTF-8
2. Line breaks normalized to #xA on input, before parsing
3. Attribute values are normalized, as if by a validating processor
4. Character and parsed entity references are replaced
5. CDATA sections are replaced with their character content
6. The XML declaration and document type declaration (DTD) are removed
7. Empty elements are converted to start-end tag pairs
8. Whitespace outside of the document element and within start and end tags is normalized
9. All whitespace in character content is retained (excluding characters removed during line feed normalization)
10. Attribute value delimiters are set to quotation marks (double quotes)
11. Special characters in attribute values and character content are replaced by character references
12. Superfluous namespace declarations are removed from each element
13. Default attributes are added to each element
14. Lexicographic order is imposed on the namespace declarations and attributes of each element
Simple, eh? You may search in vain for the exact meanings of the word normalized in some of the above statements, and good luck with superfluous namespace declarations and default attributes. Get just one thing wrong here and your signature validation will fail.
To make it even worse, the rules above are for a complete XML document. When you are canonicalizing a subset of a document, as we are doing here, you have to propagate the XML namespaces from the parent elements that have been omitted (unless you are using Exclusive XML Canonicalization (xml-exc-c14n), which we are not!). The merged xmlns attributes then have to be sorted in a certain order.
In this example, the <SignedInfo> and <Object> elements inherit the xmlns attribute from their omitted parent <Signature> element.
In our example here, it was sufficient just to replace any CR-LF line break with a single LF (0x0A) character (point 2 above). All other issues were dealt with by simply hardcoding the necessary XML tags and attributes in our variable strings.
Other c14n issues
Given a simple text string input, and the fact that we are composing our own XML document instead of dealing with an existing one, the two other issues that we are most likely to have to deal with are UTF-8 encoding (point 1 above) and entity references (point 4):
- UTF-8 encoding
- If our text-to-be-signed string, T, contains any non-ASCII characters, make sure these are converted to UTF-8 encoding. For example, the character á (small letter a with acute accent) is encoded in the ISO-8859-1 character set (Latin-1) as the single byte value 225 (0xE1). This is not an ASCII character, as it has a value greater than 127. Such characters need to be converted to UTF-8 encoding. In this case, the byte 0xE1 must be represented as the two-byte UTF-8 sequence (0x)C3 A1. In CryptoSys PKI, use the CNV_UTF8BytesFromLatin1 function to convert a string containing Latin-1 characters to proper UTF-8.
- Entity references
- There are five predefined entities in XML:
- the ampersand (&),
- the less than symbol (<),
- the greater than symbol (>),
- the quotation mark or double quote ("), and
- the apostrophe or single quote (')
These are represented by the entity references &amp; &lt; &gt; &quot; &apos; respectively. This only applies to characters inside an element's content, not the tags themselves.
So, for example, the 8-byte string <x>&</x> ((0x)3C783E263C2F783E) is transformed to the 12-byte string <x>&amp;</x> ((0x)3C783E26616D703B3C2F783E).
These two issues should cover almost all instances for a simple text string. However, we do not doubt that the complexity of the C14N procedure conceals other traps.
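Both conversions are easy to try out in Python using only the standard library. This is an illustrative sketch, not a substitute for a proper canonicalizer; note that xml.sax.saxutils.escape replaces only &, < and > unless you pass it extra entities.

```python
from xml.sax.saxutils import escape

# UTF-8 encoding: the single Latin-1 byte 0xE1 (á) becomes two UTF-8 bytes.
latin1_bytes = b"\xe1"
utf8_bytes = latin1_bytes.decode("latin-1").encode("utf-8")
print(utf8_bytes.hex(" ").upper())

# Entity references: escape the element's content only, never the tags.
content = "&"
element = "<x>" + escape(content) + "</x>"
print(element, len(element.encode("utf-8")))
```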
- [XML-C14N] RFC 3076 Canonical XML Version 1.0, March 2001, <http://www.ietf.org/rfc/rfc3076.txt>.
- [XML-DSIG] RFC 3275 XML-Signature Syntax and Processing, March 2002, <http://www.ietf.org/rfc/rfc3275.txt>.
- [SMIME-EX] RFC 4134 Examples of S/MIME Messages, July 2005, <http://www.ietf.org/rfc/rfc4134.txt>.
- XML Signature WG
- XML-Signature Syntax and Processing <http://www.w3.org/TR/xmldsig-core/>
- Canonical XML Version 1.0, <http://www.w3.org/TR/2001/REC-xml-c14n-20010315/>
- Exclusive XML Canonicalization Version 1.0, <http://www.w3.org/TR/2002/REC-xml-exc-c14n-20020718/>
- [GUTM04] Peter Gutmann Why XML Security is Broken, October 2004, <http://www.cs.auckland.ac.nz/~pgut001/pubs/xmlsec.txt>.
This page last updated 6 October 2012