
Why is XmlNamespaceManager necessary?

I've come up kinda dry as to why -- at least in the .NET Framework -- it is necessary to use an XmlNamespaceManager in order to handle namespaces (or the rather clunky and verbose [local-name()=... XPath predicate/function/whatever) when performing XPath queries. I do understand why namespaces are necessary or at least beneficial, but why is it so complex?

In order to query a simple XML Document (no namespaces)...

<?xml version="1.0" encoding="ISO-8859-1"?>
<rootNode>
   <nodeName>Some Text Here</nodeName>
</rootNode>

...one can use something like doc.SelectSingleNode("//nodeName") (which would match <nodeName>Some Text Here</nodeName>)

Mystery #1: My first annoyance -- If I understand correctly -- is that merely adding a namespace reference to the parent/root tag (whether used as part of a child node tag or not) like so:

<?xml version="1.0" encoding="ISO-8859-1"?>
<rootNode xmlns="http://example.com/xmlns/foo">
   <nodeName>Some Text Here</nodeName>
</rootNode>

...requires several extra lines of code to get the same result:

Dim nsmgr As New XmlNamespaceManager(doc.NameTable)
nsmgr.AddNamespace("ab", "http://example.com/xmlns/foo")
Dim desiredNode As XmlNode = doc.SelectSingleNode("//ab:nodeName", nsmgr)

...essentially dreaming up a non-existent prefix ("ab") to find a node that doesn't even use a prefix. How does this make sense? What is wrong (conceptually) with doc.SelectSingleNode("//nodeName")?

Mystery #2: So, say you've got an XML document that uses prefixes:

<?xml version="1.0" encoding="ISO-8859-1"?>
<rootNode xmlns:cde="http://example.com/xmlns/foo" xmlns:feg="http://example.com/xmlns/bar">
   <cde:nodeName>Some Text Here</cde:nodeName>
   <feg:nodeName>Some Other Value</feg:nodeName>
   <feg:otherName>Yet Another Value</feg:otherName>
</rootNode>

... If I understand correctly, you would have to add both namespaces to the XmlNamespaceManager, in order to make a query for a single node...

Dim nsmgr As New XmlNamespaceManager(doc.NameTable)
nsmgr.AddNamespace("cde", "http://example.com/xmlns/foo")
nsmgr.AddNamespace("feg", "http://example.com/xmlns/bar")
Dim desiredNode As XmlNode = doc.SelectSingleNode("//feg:nodeName", nsmgr)

... Why, in this case, do I need (conceptually) a namespace manager?

******REDACTED into comments below****

Edit Added: My revised and refined question is based upon the apparent redundancy of the XmlNamespaceManager in what I believe to be the majority of cases and the use of the namespace manager to specify a mapping of prefix to URI:

When the direct mapping of the namespace prefix ("cde") to the namespace URI ("http://example.com/xmlns/foo") is explicitly stated in the source document:

...<rootNode xmlns:cde="http://example.com/xmlns/foo"...

what is the conceptual need for a programmer to recreate that mapping before making a query?

As a quick addendum, I concede there are probably situations where something like an XmlNamespaceManager would make things easier, but I believe in the above situations it makes things CONSIDERABLY harder than they have to be.
My primary source of confusion is why the relationship of prefix to namespace needs to be specified BOTH in the XML document and the code that implements the XPath Query. If the root node contains the mapping already, why do I have to essentially hard code information that is already parsed when the document is loaded? Also, if a third namespace is added to the document in the future, would I not have to alter and recompile my code to declare that third relationship?
REDACTED from above: What is wrong with simply putting the namespace prefix in the XPath query --doc.SelectSingleNode("//feg:nodeName") -- and being done with it? To the human brain, can there be any doubt as to what is meant by that code fragment? [PARAGRAPH] Stated differently, what is really added to the understanding of the situation by the extra lines of code and the instantiation of an XmlNamespaceManager that is not clearly derivable from the source XML document and/or the XPath Query?
REDACTED from above, continued: Surely, for a majority of XML documents and situations using XML and XPath, it is at least conceivable, if not quite practical, to simply get the namespace information from the document and query, rather than requiring foreknowledge of the namespaces, or manual parsing of the document to determine the arguments for AddNamespace()? I can't help but think I must be missing something obvious, and if I am, please enlighten me!
+1 for this question. I'm having this exact same thought right now. My root node has a bunch of xmlns:abc="..." xmlns:def="..." attributes; why on earth can't the XPathNodeIterator figure out what namespace is associated with a child node like <abc:SomeNode/> without an XmlNamespaceManager?

Community

The basic point (as Kev points out in his answer) is that the namespace URI, not the namespace prefix, is the important part of a namespace; the prefix is an "arbitrary convenience".

As for why you need a namespace manager, rather than there being some magic that works it out using the document, I can think of two reasons.

Reason 1

If it were permitted to only add namespace declarations to the documentElement, as in your examples, it would indeed be trivial for selectSingleNode to just use whatever is defined.

However, you can declare namespace prefixes on any element in a document, and a given prefix is not bound to the same namespace throughout the document. Consider the following example:

<w xmlns:a="mynamespace">
  <a:x>
    <y xmlns:a="myOthernamespace">
      <z xmlns="mynamespace" />
      <b:z xmlns:b="mynamespace" />
      <z xmlns="myOthernamespace" />
      <b:z xmlns:b="myOthernamespace" />
    </y>
  </a:x>
</w>

In this example, what would you want //z, //a:z and //b:z to return? How, without some kind of external namespace manager, would you express that?
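
With a namespace manager the question does have a definite answer, because the query names the namespace URI it means. The following is a minimal sketch, assuming the sample above with its empty elements self-closed so that it parses; the prefixes "p" and "q" are made up for the query and appear nowhere in the document.

```csharp
using System;
using System.Xml;

// The Reason 1 sample, with the empty elements self-closed so that it parses.
var doc = new XmlDocument();
doc.LoadXml(@"
<w xmlns:a=""mynamespace"">
  <a:x>
    <y xmlns:a=""myOthernamespace"">
      <z xmlns=""mynamespace"" />
      <b:z xmlns:b=""mynamespace"" />
      <z xmlns=""myOthernamespace"" />
      <b:z xmlns:b=""myOthernamespace"" />
    </y>
  </a:x>
</w>");

// "p" and "q" are hypothetical prefixes that exist only for these queries.
var nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("p", "mynamespace");
nsmgr.AddNamespace("q", "myOthernamespace");

// Elements are matched by (namespace URI, local name), not by the prefix
// that happens to be written in the document.
int inFirst = doc.SelectNodes("//p:z", nsmgr).Count;
int inSecond = doc.SelectNodes("//q:z", nsmgr).Count;

// An unprefixed name test only matches elements in no namespace at all.
int unqualified = doc.SelectNodes("//z").Count;

Console.WriteLine($"{inFirst} {inSecond} {unqualified}"); // prints "2 2 0"
```

Each prefixed query picks up both the default-namespace z and the b:z that share its URI, which is exactly the distinction the document's own prefixes cannot express.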

Reason 2

It allows you to reuse the same XPath expression for any equivalent document, without needing to know anything about the namespace prefixes in use.

myXPathExpression = "//z:y"
doc1.selectSingleNode(myXPathExpression);
doc2.selectSingleNode(myXPathExpression);

doc1:

<x>
  <z:y xmlns:z="mynamespace" />
</x>

doc2:

<x xmlns="mynamespace">
  <y />
</x>

In order to achieve this latter goal without a namespace manager, you would have to inspect each document, building a custom XPath expression for each one.
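
For comparison, here is a sketch of the same idea with a namespace manager in place: one XPath string, reused unchanged against both documents. The prefix "p" and the helper name Find are assumptions made for illustration; doc2 is taken to be the well-formed version of the sample above.

```csharp
using System;
using System.Xml;

// doc1 uses an explicit prefix; doc2 puts the same namespace on the default.
var doc1 = new XmlDocument();
doc1.LoadXml(@"<x><z:y xmlns:z=""mynamespace"" /></x>");

var doc2 = new XmlDocument();
doc2.LoadXml(@"<x xmlns=""mynamespace""><y /></x>");

// One expression for both documents; "p" is a prefix invented for the query.
const string xpath = "//p:y";

// Hypothetical helper: binds "p" to the namespace URI and runs the query.
XmlNode Find(XmlDocument doc)
{
    var nsmgr = new XmlNamespaceManager(doc.NameTable);
    nsmgr.AddNamespace("p", "mynamespace");
    return doc.SelectSingleNode(xpath, nsmgr);
}

XmlNode hit1 = Find(doc1);
XmlNode hit2 = Find(doc2);

// Name shows the name as written in each document.
Console.WriteLine($"{hit1.Name} / {hit2.Name}"); // prints "z:y / y"
```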


Though the sample under Reason 1 seems valid AFAIK, I have to wonder how many cases exist like that in the real world, as it seems insanely complex. Certainly, using single-letter namespace and nodenames limits the number of possibilities somewhat, even though I have seen a few real-life examples of 2-, 3- and 4-letter abbreviations used as namespace prefixes, I have yet to see 1-letter prefixes outside of theory and examples. Basically, I find myself really searching for how to express any of that WITH a namespace manager.
To answer your question for Reason 1: it depends upon what I wanted to find or filter out of the data - something very hard to do with such complex AND AT THE SAME TIME nonsensical node names and relationships. However, your Reason 1 provides the most insight and clearest answer so far... As for Reason 2, I'm not sure that the code provided would execute, as your source uses namespaces, but you do not provide a namespace manager - am I wrong in this?
In both of my examples, I'm asking you to consider life without a namespace manager. As far as I can see, the question I pose in Reason 1 is impossible to answer without recourse to a namespace manager. I'm not asking how to extract any particular nodes, I'm asking which nodes you would expect those expressions to return.
You are right - the code in Reason 2 will need a namespace manager. I deliberately leave out the namespace manager because the point of your question (as I understand it) is that you believe we can live without it - this demonstrates a situation in which we can't.
Final answer for your question in Reason 1: //z should match <z xmlns="mynamespace"> and <z xmlns="myOthernamespace">, //a:z would return an empty set, and //b:z would match <b:z xmlns:b="mynamespace"> and <b:z xmlns:b="myOthernamespace"> -- the logic behind this is that there was no namespace manager specified, and there was no "attempt to get info from the document itself" command, thus namespaces are treated like any other attributes, and : becomes another valid character like - in my mind, if you know your data, or don't care, querying a node shouldn't be so painful
Adrian Zanescu

The reason is simple. There is no required connection between the prefixes you use in your XPath query and the prefixes declared in the XML document. For example, the following two XML documents are semantically equivalent:

<aaa:root xmlns:aaa="http://someplace.org">
 <aaa:element>text</aaa:element>
</aaa:root>

vs

  <bbb:root xmlns:bbb="http://someplace.org">
     <bbb:element>text</bbb:element>
  </bbb:root>

The "ccc:root/ccc:element" query will match both instances, provided the namespace manager maps the ccc prefix to that namespace URI:

nsmgr.AddNamespace("ccc", "http://someplace.org")

The .NET implementation does not care about the literal prefixes used in the XML. It only requires that every prefix used in the query is defined in the namespace manager and that the namespace URI it maps to matches the one in the document. This is what lets a query expression stay constant even when prefixes vary between the documents you consume, and it is the correct implementation for the general case.


Tom Hunter

As far as I can tell, there is no good reason that you should need to manually define an XmlNamespaceManager to get at abc-prefixed nodes if you have a document like this:

<itemContainer xmlns:abc="http://abc.com" xmlns:def="http://def.com">
    <abc:nodeA>...</abc:nodeA>
    <def:nodeB>...</def:nodeB>
    <abc:nodeC>...</abc:nodeC>
</itemContainer>

Microsoft simply couldn't be bothered to write something to detect that xmlns:abc had already been specified in a parent node. I could be wrong, and if so, I'd welcome comments on this answer so I can update it.

However, this blog post seems to confirm my suspicion. It basically says that you need to manually define an XmlNamespaceManager and manually iterate through the xmlns: attributes, adding each one to the namespace manager. Dunno why Microsoft couldn't do this automatically.

Here's a method I created based on that blog post to automatically generate an XmlNamespaceManager based on the xmlns: attributes of a source XmlDocument:

/// <summary>
/// Creates an XmlNamespaceManager based on a source XmlDocument's name table, and prepopulates its namespaces with any 'xmlns:' attributes of the root node.
/// </summary>
/// <param name="sourceDocument">The source XML document to create the XmlNamespaceManager for.</param>
/// <returns>The created XmlNamespaceManager.</returns>
private XmlNamespaceManager createNsMgrForDocument(XmlDocument sourceDocument)
{
    XmlNamespaceManager nsMgr = new XmlNamespaceManager(sourceDocument.NameTable);

    foreach (XmlAttribute attr in sourceDocument.SelectSingleNode("/*").Attributes)
    {
        if (attr.Prefix == "xmlns")
        {
            nsMgr.AddNamespace(attr.LocalName, attr.Value);
        }
    }

    return nsMgr;
}

And I use it like so:

XPathNavigator xNav = xmlDoc.CreateNavigator();
XPathNodeIterator xIter = xNav.Select("//abc:nodeC", createNsMgrForDocument(xmlDoc));
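
As a possible alternative to walking the xmlns: attributes by hand, XPathNavigator exposes GetNamespacesInScope, which returns the prefix-to-URI pairs visible at a node. The sketch below is one way to use it, not the approach from the blog post; it only sees declarations in scope at the root element, and it skips a default namespace because XPath 1.0 never applies one to unprefixed names.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;
using System.Xml.XPath;

var doc = new XmlDocument();
doc.LoadXml(@"
<itemContainer xmlns:abc=""http://abc.com"" xmlns:def=""http://def.com"">
    <abc:nodeA>...</abc:nodeA>
    <def:nodeB>...</def:nodeB>
    <abc:nodeC>...</abc:nodeC>
</itemContainer>");

var nsMgr = new XmlNamespaceManager(doc.NameTable);

// Ask the root element which prefixes are in scope (the built-in "xml" prefix is excluded).
XPathNavigator rootNav = doc.DocumentElement.CreateNavigator();
foreach (KeyValuePair<string, string> ns in rootNav.GetNamespacesInScope(XmlNamespaceScope.ExcludeXml))
{
    // An empty key is the default namespace; it would still need an invented prefix.
    if (ns.Key.Length > 0)
        nsMgr.AddNamespace(ns.Key, ns.Value);
}

// The document's own prefixes can now be used in the query.
XmlNode nodeC = doc.SelectSingleNode("//abc:nodeC", nsMgr);
Console.WriteLine(nodeC.Name); // prints "abc:nodeC"
```

This ties the query to whatever prefixes the document author chose, which is the trade-off the other answers warn about.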

welp, coming back to this after all this time - it's not just Microsoft - I believe it is in the XML or XPATH spec - and it happens in a similar way in other non-MS languages I've used - not sure if there's one that DOES extract namespaces for you, but then how does one specify which scope (because namespaces can be specified at any scope)... idunno - I would love a literal mode, where : becomes a literal character similar to a number, letter or - and thus prfx:NodeName is treated just like prfxNodeName or prfx-NodeName - a simple identifier... not standard-compliant though... sigh
Kev

To answer point 1:

Setting a default namespace for an XML document still means that the nodes, even without a namespace prefix, i.e.:

<rootNode xmlns="http://someplace.org">
   <nodeName>Some Text Here</nodeName>
</rootNode>

are no longer in the "empty" namespace. You still need some way to reference these nodes using XPath, so you create a prefix to reference them, even if it is "made up".
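
A short sketch of what that means in practice, using the document above. The prefix "ab" is made up, as in the question, and the local-name() query is the verbose workaround the question mentions.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(@"
<rootNode xmlns=""http://someplace.org"">
   <nodeName>Some Text Here</nodeName>
</rootNode>");

// An unprefixed name test means "nodeName in NO namespace", so nothing matches.
XmlNode unprefixed = doc.SelectSingleNode("//nodeName");

// Bind a made-up prefix to the document's default namespace URI.
var nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("ab", "http://someplace.org");
XmlNode prefixed = doc.SelectSingleNode("//ab:nodeName", nsmgr);

// The namespace-agnostic alternative: match on the local name only.
XmlNode byLocalName = doc.SelectSingleNode("//*[local-name()='nodeName']");

Console.WriteLine(unprefixed == null);    // prints "True"
Console.WriteLine(prefixed.InnerText);    // prints "Some Text Here"
Console.WriteLine(byLocalName.InnerText); // prints "Some Text Here"
```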

To answer point 2:

<rootNode xmlns:cde="http://someplace.org" xmlns:feg="http://otherplace.net">
   <cde:nodeName>Some Text Here</cde:nodeName>
   <feg:nodeName>Some Other Value</feg:nodeName>
   <feg:otherName>Yet Another Value</feg:otherName>
</rootNode>

Internally, the nodes of the instance document that reside in a namespace are stored with their node name and their full namespace name. In W3C parlance, this pair is called an expanded name.

For example <cde:nodeName> is essentially stored as <http://someplace.org:nodeName>. A namespace prefix is an arbitrary convenience for humans so that when we type out XML or have to read it we don't have to do this:

<rootNode>
   <http://someplace.org:nodeName>Some Text Here</http://someplace.org:nodeName>
   <http://otherplace.net:nodeName>Some Other Value</http://otherplace.net:nodeName>
   <http://otherplace.net:otherName>Yet Another Value</http://otherplace.net:otherName>
</rootNode>

When an XML document is searched, it's not searched by the friendly prefix; the search is done by namespace URI, so you have to tell XPath about your namespaces via a namespace table passed in using XmlNamespaceManager.
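
The expanded name can be inspected directly on a loaded document. The sketch below prints each child of the sample above in a {namespace URI}localName notation; that brace notation is only an illustrative convention, not how .NET stores the name.

```csharp
using System;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml(@"
<rootNode xmlns:cde=""http://someplace.org"" xmlns:feg=""http://otherplace.net"">
   <cde:nodeName>Some Text Here</cde:nodeName>
   <feg:nodeName>Some Other Value</feg:nodeName>
   <feg:otherName>Yet Another Value</feg:otherName>
</rootNode>");

// Every element carries its namespace URI and local name separately;
// the prefix is kept only so the document can be written back out.
foreach (XmlNode child in doc.DocumentElement.ChildNodes)
{
    if (child is XmlElement element)
        Console.WriteLine($"{{{element.NamespaceURI}}}{element.LocalName} (prefix: {element.Prefix})");
}

// Elements can also be looked up by (local name, namespace URI) with no prefix involved.
XmlElement cde = doc.DocumentElement["nodeName", "http://someplace.org"];
XmlElement feg = doc.DocumentElement["nodeName", "http://otherplace.net"];
Console.WriteLine($"{cde.InnerText} | {feg.InnerText}"); // prints "Some Text Here | Some Other Value"
```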


Though I see no conceptual reason to require someone to acknowledge a non-"empty" namespace when there is still only one namespace used, why would it be necessary to make a function require more than a flag such as doc.SelectSingleNode("//nodeName", NamespaceFlags.UseDocumentNamespace)
-- That is, why require a programmer to instantiate a separate object, have foreknowledge of (or code to parse and determine) the namespace used in the document, then specify a completely random and artificial namespace prefix to insert into the XPath query?? Please forgive my tone - I am simply bemused.
@code - It's because more complex documents (RSS feeds, for example) often have more than one namespace in play. Having a special flag just to handle that particular condition (the document being only in the default namespace, as in your example) is a poor design choice and adds extra complexity to the framework code. So why not cover all bases and ask the consumer of the code to pass an XmlNamespaceManager instead?
Your provided example (RSS), I believe, relates to my Mystery #2 in the original question (more than one namespace). the XPath Query and the RSS document themselves contain all information needed to query a node. The only situation I can imagine needing an XmlNamespaceManager is one where there are multiple namespaces ("someplace.org" AND "otherplace.net") using the same prefixes (both using xmlns:place or similar, but at different scopes in the document). Otherwise, the document and query provide all the information necessary to produce the desired result.
I appreciate your patience, but this still does not seem to answer my question. Why is it necessary to use more than //feg:nodeName to find a particular node? It should be relatively trivial to internally convert feg... to http://otherplace.net WITHOUT me having to explicitly state that relationship - it's right there in the root node! (xmlns:feg="http://otherplace.net"). At minimum, I think there should be a helper function like XmlNamespaceManager.GetNSFromDocument(xdoc)... If the answer is simply that they did not (yet) do this work for you, then OK! Is this the case?
Christian Schwarz

You need to register the URI/prefix pairs with the XmlNamespaceManager instance to let SelectSingleNode() know which particular "nodeName" node you're referring to - the one from "http://someplace.org" or the one from "http://otherplace.net".

Please note that the concrete prefix name doesn't matter when you're doing the XPath query. I believe this works too:

Dim nsmgr As New XmlNamespaceManager(doc.NameTable)
nsmgr.AddNamespace("any", "http://someplace.org")
nsmgr.AddNamespace("thing", "http://otherplace.net")
Dim desiredNode As XmlNode = doc.SelectSingleNode("//thing:nodeName", nsmgr)

SelectSingleNode() just needs a connection between the prefix from your XPath expression and the namespace URI.


Community

This thread has helped me understand the issue of namespaces much more clearly. Thanks. When I saw Jez's code, I tried it because it looked like a better solution than I had programmed. I discovered some shortcomings with it, though. As written, it looks only in the root node (but namespaces can be listed anywhere.), and it doesn't handle default namespaces. I tried to address these issues by modifying his code, but to no avail.

Here is my version of that function. It uses regular expressions to find the namespace mappings throughout the file; works with default namespaces, giving them the arbitrary prefix 'ns'; and handles multiple occurrences of the same namespace.

private XmlNamespaceManager CreateNamespaceManagerForDocument(XmlDocument document)
{
    var nsMgr = new XmlNamespaceManager(document.NameTable);

    // Find and remember each xmlns attribute, assigning the 'ns' prefix to default namespaces.
    var nameSpaces = new Dictionary<string, string>();
    foreach (Match match in new Regex(@"xmlns:?(.*?)=([\x22\x27])(.+?)\2").Matches(document.OuterXml))
        nameSpaces[match.Groups[1].Value + ":" + match.Groups[3].Value] = match.Groups[1].Value == "" ? "ns" : match.Groups[1].Value;

    // Go through the dictionary, and number non-unique prefixes before adding them to the namespace manager.
    var prefixCounts = new Dictionary<string, int>();
    foreach (var namespaceItem in nameSpaces)
    {
        var prefix = namespaceItem.Value;
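        // Split on the first ':' only; the URI itself usually contains ':' (e.g. "http://...").
        var namespaceURI = namespaceItem.Key.Split(new[] { ':' }, 2)[1];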
        if (prefixCounts.ContainsKey(prefix)) 
            prefixCounts[prefix]++; 
        else 
            prefixCounts[prefix] = 0;
        nsMgr.AddNamespace(prefix + prefixCounts[prefix].ToString("#;;"), namespaceURI);
    }
    return nsMgr;
}
