Pinpoint Tool Released
I often come across a drive-by download, especially a malvertisement, and it takes me a while to figure out which file on the compromised website is infected. I wrote Pinpoint to help me find the malicious objects faster so I can pass that information to webmasters for clean-up. My hope is that this tool will be helpful to you as well.
Pinpoint works like wget/curl in that it just fetches a webpage without rendering any script. Pinpoint will then try to determine which links are used to make up the webpage such as Javascript, CSS, frames, and iframes and downloads those files too (some Javascript content will produce incorrect links). The list of links it finds shows up in the document tree on the main window.
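To make the link-discovery step concrete, here is a minimal Python sketch of the general idea. It is not Pinpoint's actual code; the class name LinkExtractor, the function extract_links, and the example URLs are made up for illustration. It parses the HTML without executing anything and collects the external scripts, stylesheets, frames, and iframes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect URLs of external scripts, stylesheets, frames, and iframes.

    Hypothetical helper for illustration; Pinpoint's own parser may differ.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []  # list of (kind, absolute_url) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "frame", "iframe") and attrs.get("src"):
            # Resolve relative links against the page that contained them.
            self.links.append((tag, urljoin(self.base_url, attrs["src"])))
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.links.append(("css", urljoin(self.base_url, attrs["href"])))


def extract_links(html, base_url):
    """Return the (kind, url) pairs found in the static HTML."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Because nothing is executed, links that a script builds at runtime are either missed or come out wrong, which matches the caveat above about Javascript content.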
At the same time, Pinpoint creates a log file that lists each link and the file in which it was found. It also downloads each file and calculates its "entropy"; the higher the value, the more rubbish-looking (random) characters it found, which may help identify obfuscated Javascript.
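I haven't spelled out the exact formula here, so treat the following Python as one plausible way to get a percentage like the ones in the log, not as Pinpoint's implementation. It assumes Shannon entropy over byte values, scaled so that 8 bits per byte equals 100%. The function name entropy_percent is hypothetical.

```python
import math
from collections import Counter


def entropy_percent(data: bytes) -> float:
    """Shannon entropy of the byte distribution, as a percentage of 8 bits.

    Assumption: entropy is normalized against the 8-bit maximum.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    bits = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 100.0 * bits / 8.0
```

With this scaling, a file made of one repeated character scores 0% and a file that uses all 256 byte values equally often scores 100%. Plain HTML and readable script land somewhere in between, and packed or encoded content pushes the number up.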
You can of course spoof the user-agent string and referer values to elicit a malicious response from the website. There's also a function to clear your cookies (see the Options menu item), since many exploit packs check for the presence of cookies on repeated visits. Use Tor to get another IP address, since your address usually gets banned after the first visit.
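For readers who want to reproduce the spoofing by hand, here is a small Python sketch using only the standard library. The helper names build_request and make_opener, the user-agent string, and the URLs are placeholders I made up; Tor routing is not shown.

```python
import urllib.request
from http.cookiejar import CookieJar


def build_request(url, user_agent, referer=None):
    """Build a GET request with spoofed User-Agent and optional Referer."""
    headers = {"User-Agent": user_agent}
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def make_opener(cookie_jar):
    """Return an opener that stores and replays cookies from cookie_jar."""
    handler = urllib.request.HTTPCookieProcessor(cookie_jar)
    return urllib.request.build_opener(handler)


if __name__ == "__main__":
    jar = CookieJar()
    opener = make_opener(jar)
    req = build_request(
        "http://site.example/",            # placeholder target
        "Mozilla/5.0 (Windows NT 6.1)",    # placeholder browser string
        referer="http://www.google.com/",
    )
    # html = opener.open(req, timeout=10).read()  # performs the real fetch
    jar.clear()  # forget all cookies before the next visit
```

Calling jar.clear() before a repeat visit plays the same role as the clear-cookies function described above.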
Here are a few examples to help you better understand what this program does.
Example #1
Visiting the website in the screenshot below with the appropriate user-agent and referer values reveals a suspicious-looking iframe.
Opening log.txt shows the files that make up the webpage. The second URL from the top looks suspicious and has an entropy value of 51%, which is not really high but may indicate the presence of some obfuscated Javascript.
By default, two other files are created. The first, capture.txt, contains the HTML source code of all of the pages Pinpoint fetched. You can see the main webpage, and somewhere in there is the malicious iframe. Under that is the HTML source code of the iframed page.
While your computer is safe since none of the scripts are executed, your anti-virus may still detect malicious scripts and trigger an alert (and blacklist my Pinpoint program in the process). Ideally you should be doing all of this in a virtual machine without AV anyway.
The second file that gets created is clean.txt, a cleaned-up version of the HTML source code that contains only the Javascript, frames, iframes, calls to external scripts, and so on. This should help you locate the malicious content faster. Unfortunately, you may still have to figure out where in the source code the malicious content is.
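The idea behind the clean file can be sketched in a few lines of Python. This is only an approximation under my own assumptions (keep script, frame, and iframe tags plus inline script bodies, and strip indentation); the names CleanExtractor and clean_html are hypothetical and the real clean.txt may keep more than this.

```python
from html.parser import HTMLParser


class CleanExtractor(HTMLParser):
    """Keep only script/frame/iframe tags and inline script contents."""

    KEEP = {"script", "frame", "iframe"}

    def __init__(self):
        super().__init__()
        self.lines = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag in self.KEEP:
            # Record the tag exactly as it appeared in the source.
            self.lines.append(self.get_starttag_text().strip())
            if tag == "script":
                self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            # Left-justify every non-empty line of inline script.
            self.lines.extend(
                line.strip() for line in data.splitlines() if line.strip()
            )


def clean_html(html):
    """Return the interesting parts of the page, one item per line."""
    parser = CleanExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Ordinary markup such as paragraphs and tables is dropped, so what is left is a short list of the places where malicious content usually hides.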
Example #2
Another infected website is visited and it shows two suspicious links -- one in the external Javascript section and the second in the iframe section.
Here's the log file, which shows two things. The first is the external Javascript file about halfway down. The second is the last file, which has a weird URL. The entropy value is very high for this one, which means it is likely heavily obfuscated (or packed/compressed).
The request chain reveals that the main page called up the external Javascript file and that file called up the iframe.
Here's the capture file which shows that iframe page with the entropy value of 86.48%.
And here's the clean file showing the external Javascript file. By the way, I made everything left-justified in this file to make it quickly readable.
Example #3
Here's the last example. This one doesn't show anything in the document tree because there are no links on the page.
The log file shows content with a relatively high amount of entropy.
The capture file shows obfuscated Javascript appended to legitimate webpage contents.
Finally, the clean file shows the important bits.
Most of the options don't need any explanation but here's a brief description of those that do:
Disable Compression - sends the HTTP request without the encoding option
Enable Entropy - performs the entropy check
Ignore Safe Sites - ignores common sites that host frameworks, ads, and other legitimate content so their files don't get downloaded
Ignore CSS - ignores external CSS files so that they don't get downloaded
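The two "ignore" options boil down to a filter applied before each download. The Python below is a rough sketch of that idea, not the tool's code. The SAFE_HOSTS entries are examples I picked and are not Pinpoint's actual list, and should_fetch is a hypothetical name.

```python
from urllib.parse import urlparse

# Illustrative only: not the list that ships with Pinpoint.
SAFE_HOSTS = {"ajax.googleapis.com", "code.jquery.com"}


def should_fetch(url, kind, ignore_safe_sites=True, ignore_css=True):
    """Decide whether a discovered link should be downloaded.

    kind is a label such as "script", "css", "frame", or "iframe".
    """
    if ignore_css and kind == "css":
        return False
    host = (urlparse(url).hostname or "").lower()
    if ignore_safe_sites and host in SAFE_HOSTS:
        return False
    return True
```

Skipping known-good hosts and stylesheets cuts down the number of requests and keeps the log focused on the files most likely to be malicious.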
When visiting a large website full of links, AJAX calls, and embedded content, Pinpoint may choke on it. I'll explore other methods but for now this seems to work fine most of the time. You can find the tool here.