What is an XML parser and why should I use one?

An XML parser is the piece of software that reads XML documents and makes their contents available to applications and programming languages, usually through a standard interface such as the DOM. The parser is responsible for checking whether a document is well-formed. A validating parser that is given a DTD or XML Schema will also check whether the document is valid (i.e., whether it follows the rules that the DTD or schema defines). In practice, a parser is a programming library that gives programmers structured access to the contents of an XML document.

There are two main categories of XML parsers: Document Object Model (DOM) parsers and Simple API for XML (SAX) parsers. DOM parsers build an in-memory tree representing the entire XML document, which the developer can then "walk" to find the desired information. SAX parsers take a different approach and do not store the entire document in memory. Instead, the parser reads the document sequentially and generates an event (such as the start of an element, a run of character data, or the end of an element) each time it encounters one of these parts. The programmer's code receives each event and does whatever is appropriate at that point in the document.
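As a rough illustration of the two styles, the sketch below uses the JAXP APIs that ship with the JDK (javax.xml.parsers) to pull the same titles out of a small document, once with a DOM parser and once with a SAX parser. The sample document and the names DomVsSax, titlesWithDom, and titlesWithSax are hypothetical, chosen only for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class DomVsSax {
    // Hypothetical sample document used only for illustration.
    static final String XML =
        "<books><book>XML Basics</book><book>Parsing 101</book></books>";

    // DOM: parse the whole document into an in-memory tree, then walk it.
    static List<String> titlesWithDom() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        NodeList books = doc.getElementsByTagName("book");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < books.getLength(); i++) {
            titles.add(books.item(i).getTextContent());
        }
        return titles;
    }

    // SAX: the parser calls back into this handler as it reads; no tree is kept.
    static List<String> titlesWithSax() throws Exception {
        List<String> titles = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <book> element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals("book")) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Character data may arrive in several chunks, so append rather than assign.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("book")) {
                    titles.add(text.toString());
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)), handler);
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("DOM: " + titlesWithDom());
        System.out.println("SAX: " + titlesWithSax());
    }
}
```

Both methods should return the same two titles. The DOM version holds the whole tree in memory, while the SAX version keeps only the text of the element it is currently inside.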

Other tools, such as XSLT transformation engines and XML data-binding toolkits (which map XML to and from programming-language objects), are built on top of XML parsers and add higher-level functionality for processing XML.
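For example, the XSLT engine bundled with the JDK (javax.xml.transform) runs both the stylesheet and the input document through an XML parser before applying the transformation. A minimal sketch, assuming a hypothetical stylesheet and class name, that lists the book titles from a small document:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {
    // Hypothetical stylesheet: lists the text of each <book>, separated by ", ".
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:for-each select='books/book'>"
        + "<xsl:if test='position() != 1'>, </xsl:if>"
        + "<xsl:value-of select='.'/>"
        + "</xsl:for-each>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String listBooks(String xml) throws Exception {
        // Compile the stylesheet; the engine parses it as XML first.
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        // The input document is likewise parsed before the templates are applied.
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(listBooks("<books><book>XML Basics</book><book>Parsing 101</book></books>"));
    }
}
```

Run as written, it should print "XML Basics, Parsing 101". The application code never touches the parser directly; the transformation engine handles that.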

All conforming XML parsers report an error when they encounter XML that is not well-formed. Many parsers can also validate the XML instance against a DTD or schema, if one is available. (Often this requires setting an option on the parser before parsing takes place.)
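The sketch below, again assuming the JDK's JAXP DOM API, shows both behaviors. A document that is not well-formed is always rejected, while a well-formed document that breaks its DTD is rejected only after validation is switched on with setValidating(true). The class name ParseCheck and the sample documents are hypothetical. A validating parser reports validity problems through ErrorHandler.error(), which does not stop parsing unless the handler throws, so the sketch installs a handler that does.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ParseCheck {
    // Treat validity errors and well-formedness (fatal) errors alike as failures.
    private static final ErrorHandler STRICT = new ErrorHandler() {
        @Override public void warning(SAXParseException e) { }
        @Override public void error(SAXParseException e) throws SAXException { throw e; }
        @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
    };

    // Returns true if the document parses without errors; validateDtd enables DTD validation.
    static boolean accepts(String xml, boolean validateDtd) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validateDtd); // off by default: only well-formedness is checked
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(STRICT);
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            System.out.println("rejected: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String broken = "<note><to>x</note>"; // mismatched end tag
        String dtd = "<!DOCTYPE note [<!ELEMENT note (to)>"
            + "<!ELEMENT to (#PCDATA)><!ELEMENT from (#PCDATA)>]>";
        String invalid = dtd + "<note><from>x</from></note>"; // well-formed, but breaks the DTD
        System.out.println(accepts(broken, false));
        System.out.println(accepts(invalid, false));
        System.out.println(accepts(invalid, true));
    }
}
```

The three calls in main should report false, true, and false respectively: the first document is not well-formed, the second is well-formed and not validated, and the third fails DTD validation.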

Most development platforms and programming languages have at least one XML parser available. XML parsing capabilities are built into the Java and Microsoft .NET platforms. In Java, the programmer can use the default parser that ships with the Java 2 Standard Edition (J2SE) environment; alternatively, the programmer can "plug in" any JAXP-compliant parser, such as the Xerces parser from the Apache Software Foundation.
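In Java, the plug-in mechanism is JAXP's factory lookup: DocumentBuilderFactory.newInstance() returns the JDK's built-in implementation unless another one is configured, for example through the javax.xml.parsers.DocumentBuilderFactory system property. A small sketch (the class name WhichParser is hypothetical) that reports which implementation is in use:

```java
import javax.xml.parsers.DocumentBuilderFactory;

public class WhichParser {
    // Ask JAXP for a factory and report which concrete class it picked.
    static String implementationName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // Optional override: pass a factory class name such as
        // org.apache.xerces.jaxp.DocumentBuilderFactoryImpl (needs the Xerces jar on the classpath).
        if (args.length > 0) {
            System.setProperty("javax.xml.parsers.DocumentBuilderFactory", args[0]);
        }
        System.out.println(implementationName());
    }
}
```

Run without arguments, it should print the class name of the JDK's default factory. Passing the Xerces factory class as an argument should select Apache Xerces instead, assuming its jar is on the classpath; if the named class cannot be found, newInstance() fails with a FactoryConfigurationError. Application code that only uses the javax.xml.parsers interfaces does not need to change either way.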

The alternative to using a parser is to process an XML document "by hand" (i.e., by reading the raw XML text and picking out the tags yourself). Although this may seem easy, correctly handling everything the XML specification allows (character and entity references, comments, CDATA sections, DTDs, character encodings) is complicated, even before schema validation is considered. Absent a very good reason, developers should use an off-the-shelf XML parser.
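To make the difficulty concrete, the sketch below compares a tempting hand-rolled approach (stripping anything that looks like a tag with a regular expression) against the JDK's DOM parser on a short, perfectly legal document containing an entity reference, a comment, and a CDATA section. The class and method names are hypothetical, and the regular expression is a deliberately naive stand-in, not a recommended technique.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class HandRolledVsParser {
    // Small but legal XML: an entity reference, a comment, and a CDATA section.
    static final String XML = "<a>Tom &amp; Jerry<!-- note --><![CDATA[<b>]]></a>";

    // Naive "by hand" approach: delete anything between < and >.
    static String naiveText(String xml) {
        return xml.replaceAll("<[^>]+>", "");
    }

    // The parser expands &amp;, skips the comment, and unwraps the CDATA section.
    static String parserText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("naive:  " + naiveText(XML));  // naive:  Tom &amp; Jerry]]>
        System.out.println("parser: " + parserText(XML)); // parser: Tom & Jerry<b>
    }
}
```

The naive version leaves the entity reference unexpanded and mangles the CDATA section, because the "<b>" inside it looks like part of a tag. Fixing those cases one by one is exactly the work an off-the-shelf parser has already done.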