Thursday, October 11th, 2012

Content Smuggling

A few years ago, I discovered a peculiar design decision described in the PDF specification. This design flaw allows an attacker to conduct XSS attacks against some websites that would not normally have XSS vulnerabilities. I reported this issue to Adobe in late 2009. Apparently, there are some challenging back-compat issues which make changing this design difficult. Given it’s been nearly three years since I first reported the issue to Adobe and a fix from Adobe doesn’t seem likely (Chrome has already fixed their internal PDF reader), I figured I should let the web application community know about the exposure. I don’t expect “APT” or other 31337 $country “cyber liberation armies” to use this anytime soon, but it is interesting behavior and I hope web application security folks find it interesting. Hopefully some researcher who’s smarter than me can take this to the next level. Oh, and I apologize for the ugly PoCs in advance!

If we take a look at section 3.4.1, “File Header” of Appendix H in the PDF specification, we see the following:

13. Acrobat viewers require only that the header appear somewhere within
the first 1024 bytes of the file.

Anyone who has read the PDF specification probably knows about this behavior; in fact, Julia Wolf mentioned it in her epic OMG WTF PDF talk at CCC in 2010. This peculiar design allows for the creation of a hybrid file that is both some arbitrary file type (such as GIF, PNG, DOC, etc.) and a PDF. We do this by cramming a PDF header after another file header. An example of this is shown in the screenshot below:


Hopefully, by now we’ve already realized that hosting user controlled PDFs and serving those PDFs from a sensitive domain is dangerous from a web security perspective. However, with this quirky file header behavior, we’ll have the ability to smuggle PDFs onto a website that only accepts “benign” file types. As an example, I’ve uploaded such a file to an appspot web application I created. The PoC shows that we can load a single file as both a GIF (or any other file type we want) and a PDF. Adobe PDF reader needs to be set as your default PDF handler for the PoC to work.
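To make the point concrete, here is a minimal Python sketch of why a magic-byte check does not help. The `looks_like_gif` filter and the byte layout of `hybrid` are assumptions made up for illustration (the hybrid is not a complete, renderable GIF or PDF, and this is not the PoC file). The only detail taken from the specification is the 1024-byte window for the `%PDF-` header.

```python
# Hypothetical upload filter: accepts anything that starts with a GIF signature.
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")
PDF_MARKER = b"%PDF-"
HEADER_WINDOW = 1024  # the spec note says the header may sit anywhere in here


def looks_like_gif(data: bytes) -> bool:
    """Naive magic-byte check of the kind an upload handler might perform."""
    return data.startswith(GIF_SIGNATURES)


def has_pdf_header_in_window(data: bytes) -> bool:
    """True if a PDF header appears anywhere in the first 1024 bytes."""
    return PDF_MARKER in data[:HEADER_WINDOW]


# Illustrative hybrid: a GIF signature and some filler, then a PDF header.
hybrid = b"GIF89a" + b"\x00" * 32 + b"%PDF-1.4\n" + b"% PDF body would follow\n"

print(looks_like_gif(hybrid))            # the "benign file" filter is satisfied
print(has_pdf_header_in_window(hybrid))  # and a PDF header is still in range
```

Both checks come back true for the same bytes, which is the whole trick: the filter and the PDF reader are each looking at a different part of the file.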

The only difference in the two displays in the PoC above is the way we reference the file. In the case of the image, we simply use an IMG tag. When we want to force the browser to hand the file to the default PDF reader, we make use of the OBJECT tag and explicitly specify a content type to force the content to be handled by the default PDF reader. Of course, this technique can be generalized for other plugins.

<img src="" height=10 width=10></img>


<object data="" type="application/pdf" width="500" height="500"></object>

By design, a PDF does not have access to the DOM of the domain from which it is served. How then can we use a PDF to achieve XSS? Here is where the feature-rich Adobe PDF Reader comes into play. Once the PDF is loaded, we have a couple of different options to achieve XSS. First, we can redirect the PDF to a javascript URL. These redirections navigate the browser (not the PDF document) and result in true browser-based XSS on the victim domain. Luckily, Adobe considers redirection from a PDF to javascript URLs a bug and has eliminated the most obvious methods for achieving this. There is, however, another method which achieves essentially the same impact. We can use a built-in API to make network requests to and from the victim domain. These network requests will carry any cookies associated with the victim domain, giving the attacker access to authenticated resources.

The following link demonstrates how this issue would be used against a website. The attacker’s domain loads a smuggled GIF/PDF from the victim domain. Once the PDF is loaded, we make use of the built-in XML APIs to retrieve the file /secret.txt from the victim domain.

IE users will see a warning in the PDF reader. This is because IE actually downloads the PDF and opens a local copy :) You can verify the IE behavior by browsing to this PoC with Internet Explorer (Adobe PDF Reader must be set as the default reader).

Lastly, you can inject a PDF into a website if you already have XSS. This might be helpful in bypassing XSS filters or application filtering. This is accomplished by injecting a PDF into the vulnerable site using the OBJECT tag.

<object data="http://vulnerable-domain/xss.asp?vulnerable-param=<injected PDF HERE>" type="application/pdf" width="500" height="500"></object>

An example of how this could be done is given below (this PoC is best viewed in Firefox with Adobe PDF Reader, but the technique is possible for all browsers).

What’s the impact? Well, I suspect there are plenty of Internet-facing websites that are vulnerable to this bug. Any web application that accepts uploads of “benign” file types and then serves those files back to the user could be affected. This also affects websites which rely on content-type headers to prevent XSS (btw, this strategy doesn’t work). See Phil Purviance’s blog for tips on spotting (and exploiting) websites that use content-type to protect against XSS. This bug can also be used to exploit applications that use content-disposition headers to prevent XSS bugs. The most common attack surface here will likely be internal content portals. Pretty much every internal content portal used in the enterprise is vulnerable to this issue (think SharePoint).

You can test for this issue by trying to upload this file to a vulnerable web application. If you see the PDF header in the uploaded file AND the file is served from a sensitive domain (ex. it has auth cookies), then the application is vulnerable.
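If you want to script that check, a rough Python sketch follows. It assumes you have already uploaded a test file and fetched the served copy back as bytes. The names `pdf_header_offset` and `is_vulnerable`, and the `sensitive_domain` flag (which you decide by hand, e.g. "does this host get auth cookies?"), are all made up for illustration.

```python
from typing import Optional

PDF_MARKER = b"%PDF-"
HEADER_WINDOW = 1024  # Acrobat only needs the header within the first 1024 bytes


def pdf_header_offset(served_body: bytes) -> Optional[int]:
    """Offset of the PDF header inside the 1024-byte window, or None."""
    idx = served_body[:HEADER_WINDOW].find(PDF_MARKER)
    return idx if idx != -1 else None


def is_vulnerable(served_body: bytes, sensitive_domain: bool) -> bool:
    """Both conditions from the text: header survived AND domain is sensitive."""
    return sensitive_domain and pdf_header_offset(served_body) is not None


# Illustrative bytes only: what we uploaded vs. what a server that
# re-encodes images might hand back (header stripped).
uploaded = b"GIF89a" + b"\x00" * 16 + b"%PDF-1.4\n"
reencoded = b"GIF89a" + b"\x00" * 16

print(pdf_header_offset(uploaded))       # 22
print(is_vulnerable(uploaded, True))     # served as-is from a sensitive domain
print(is_vulnerable(reencoded, True))    # header did not survive the round trip
print(is_vulnerable(uploaded, False))    # survived, but on a cookie-less domain
```

Only the second call reports a problem. The last two cases fail one condition each, which lines up with the two ways an application can get out of trouble: mangle the upload, or serve it from somewhere that doesn't matter.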

The proper defense for this is the usage of alternate domains for user-supplied content (aka sandboxed domains). Sandboxed domains can be tricky to implement. Some of the most popular web applications on the web already make extensive use of sandboxed domains, but the vast majority of web applications do not. Once again, internal content portals are in a hard spot, as it’s more difficult to implement a sandboxed domain on an internal network. Sandboxed domains are a subject that many “web application security specialists” understand poorly, and one that probably deserves its own blog post. How to properly implement a sandboxed domain is a great interview question for senior web application security roles because it tests design and implementation skills. It also requires a really solid understanding of browser/plugin same origin policy. I haven’t seen much written about sandboxed domains, but this blog post does a nice job of summing up some of the challenges of content hosting.
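One small piece of the sandboxed domain puzzle can be sketched in Python: checking whether the host that serves user content would still receive the application's cookies. This is only a sketch under assumptions. The hostnames are hypothetical, every entry in `cookie_domains` is treated as a cookie `Domain` attribute (host-only cookies and IP addresses are ignored), and the suffix rule is a simplified version of the domain matching described in RFC 6265. It says nothing about the other things a real sandboxed domain has to get right.

```python
from typing import Iterable


def domain_matches(host: str, cookie_domain: str) -> bool:
    """Simplified cookie domain match: exact host or a dot-separated suffix."""
    host = host.lower().rstrip(".")
    cookie_domain = cookie_domain.lower().strip(".")
    return host == cookie_domain or host.endswith("." + cookie_domain)


def receives_app_cookies(content_host: str, cookie_domains: Iterable[str]) -> bool:
    """True if any of the app's cookie domains covers the content host."""
    return any(domain_matches(content_host, d) for d in cookie_domains)


# Hypothetical setup: the app sets its session cookie with Domain=example.com.
app_cookie_domains = ["example.com"]

print(receives_app_cookies("uploads.example.com", app_cookie_domains))
print(receives_app_cookies("example-usercontent.net", app_cookie_domains))
print(receives_app_cookies("notexample.com", app_cookie_domains))
```

Under these assumptions the first host still gets the session cookie, so a smuggled PDF served from it can make authenticated requests. The other two hosts do not, and the separate `example-usercontent.net` style of host is the sandboxed setup the paragraph above is talking about.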

Happy hunting!


Posted by xssniper | Filed in Security, Web Application Security
