The Science of Google Dorking

In this post I propose some new and improved Google dorks for hackers, pentesters, and anyone else who likes finding web-based targets by the vulnerabilities they expose. The dorks I discuss here find servers exhibiting:

Local file inclusion / remote file inclusion (LFI/RFI) vulnerabilities
SQL injection
Error based injection

What's so important about detailing targets so intensely?

"The devil's in the details"

Hacking is a problem-solving discipline, and like any problem-solving discipline (physics, computer science, mathematics, etc.) it's very important that you practice, but what's even more important is how you practice!

Google dorking lets hackers specify training environments down to every detail so that they can fine-tune their skills. For instance, if you want to master MySQL error-based injection, you need to know that it depends on:

The scripting language used
The OS type hosting the web server
The Web server software
The Web server software version
The plugin versions
The DBMS version
The domain --- different countries have different colleges and DBMS/web scripting courses that teach in different ways, so developers from different countries come to understand their environments differently. Hackers from different countries also follow different attack trends, so defences are applied differently. The country a server resides in can therefore make a huge difference: location, location, location!!

and possibly more! Mastering a complex discipline like hacking is all about breaking it down into little pieces and mastering it from the bottom up, making sure you control as much of an attack as possible and understand how an environment will respond (bruteforcing and fuzzing are a last resort, not a first!).

The Directives:

There are a couple of directives you must become familiar with before I can talk about hunting targets. These directives are used to specify:

attributes of web pages and URLs
places to look for web pages and URLs

You use directives by entering them in the Google search bar as follows:

directive-alias:term

Directives are declared using the ":", so if you want something to be interpreted as a directive, put a ":" between the alias and the term, with no spaces on either side.
You may combine or repeat directives using binary operators like NOT/OR/AND and include wildcards; I explain this further on in the article. Some directives have a variant that allows multiple terms; these are usually prefixed with "all" in the alias, e.g. inurl and allinurl.

directive-alias --- the name given to the directive, read on for some examples of directives
term --- the argument the directive is to use for specification
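
As a rough illustration of the alias:term syntax, here is a minimal Python sketch that formats a single directive. The helper name `directive` is my own invention for this example; it only builds the string you would type into the search bar, and the quoting rule for multi-word terms is an assumption based on the quoted examples used throughout this post.

```python
def directive(alias: str, term: str) -> str:
    """Format one search directive as alias:term, with no space around the colon."""
    alias = alias.strip().lower()
    term = term.strip()
    already_quoted = term.startswith('"') and term.endswith('"')
    # Quote multi-word terms so the whole phrase stays attached to the directive.
    if any(ch.isspace() for ch in term) and not already_quoted:
        term = f'"{term}"'
    return f"{alias}:{term}"


print(directive("inurl", "wp-content"))          # inurl:wp-content
print(directive("intext", "Hacking Tutorials"))  # intext:"Hacking Tutorials"
```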

I've only included the directives that pertain to finding targets. A few more exist; you can look them up if you're interested ;)

Let's jump right in!

inurl --- restricts results to URLs with the specified text in them

e.g. inurl:wp-content --- finds WordPress URLs, try it!

allinurl --- same as above, but used to specify multiple terms, all of which must appear in the URL
intext --- restricts results to URLs referring to web pages with the specified term in the printable text of the web page

e.g intext:"Hacking Tutorials" --- finds pages that are likely discussing hacking tutorials

allintext --- same as above, allowing multiple terms
intitle --- restricts results to URLs referring to web pages with the specified term in their title
allintitle --- same as intitle, allowing multiple terms
site --- one of my favourites, this restricts results to URLs on the host(s) specified in the term

e.g. site:www.google.com will return the pages Google has indexed from www.google.com

link --- restricts results to pages with links to the URL specified in the term
inanchor --- restricts results to pages that are linked to with anchor text containing the term (the clickable text of the link, not its href= argument)

e.g. inanchor:www.exploit-db.com --- will return pages that other pages link to using anchor text mentioning www.exploit-db.com

cache --- this one is interesting, it searches Google's cache for a version of the site specified in the term

e.g. cache:k3170makan.blogspot.com --- returns the last cached version of this site

filetype --- restricts results to files with the specified file extension

e.g. filetype:pdf --- returns URLs to PDFs

Binary Operators

"to dork || !(to dork)"

You can also combine them, or other directives (not all of them), for a stricter specification of the results by using the binary operators and the wildcard operator:

(Read up on boolean algebra and do a couple of problems. As a computer science and physics student I've had to do two boolean algebra courses during my degree, and I can tell you that being able to apply my algebra skills in a Google search is incredibly powerful!!)

"*" -- the wildcard operator
"|" -- the OR operator
"+" -- the AND operator
"-" -- the NOT operator

Here are a few examples:

inurl:*.edu --- returns URLs containing .edu; an interesting thing to note is that this returns millions more URLs than inurl:.edu??
inurl:*.ac.za -inurl:*.uj.ac.za --- returns URLs for South African academic institutions, except the University of Johannesburg (which is specified using *.uj.ac.za)
inurl:"*.gov.*" | inurl:"*.edu" --- specifies URLs registered to government or academic institutions

I'm sure you can figure out how the other operators are used!
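
If it helps to see the operators as plain string operations, here is a small Python sketch. The function names `all_of`, `any_of` and `exclude` are hypothetical helpers made up for this example, and the grouping parentheses around OR follow the style of the combined dorks later in this post rather than any documented requirement.

```python
def all_of(*terms: str) -> str:
    """Join terms with AND; a space between terms is enough."""
    return " ".join(terms)


def any_of(*terms: str) -> str:
    """Join terms with the OR operator, grouped in parentheses."""
    return "(" + " | ".join(terms) + ")"


def exclude(term: str) -> str:
    """Negate a term with the NOT operator: a leading minus, no space."""
    return "-" + term


south_african_academia = all_of("inurl:*.ac.za", exclude("inurl:*.uj.ac.za"))
gov_or_edu = any_of('inurl:"*.gov.*"', 'inurl:"*.edu"')

print(south_african_academia)  # inurl:*.ac.za -inurl:*.uj.ac.za
print(gov_or_edu)              # (inurl:"*.gov.*" | inurl:"*.edu")
```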

My scouter indicates that your power level has increased! You are ready to learn a new technique...

LFI/RFI Detection:

Before we can ask Google to look for things for us, we need to make sure we know what we are looking for. How do we know when a script is vulnerable to LFI/RFI attacks?

If the CMS, CMS version and plugins being used are known to exhibit LFI/RFI vulnerabilities
If unclean data from GET/POST requests is included in the arguments to these functions:

include
include_once
readfile
fopen (on the off chance)

That's about all that gives away an LFI/RFI vulnerability, but then what identifies scripts using these functions? All we can go on is:

The arguments supplied to the scripts --- these arguments are popularly identified by telltale keywords like 'page'/'link'/'url' etc.
The errors that they generate --- PHP notoriously declares script names and the input that causes the error

This detail is often enough to find vulnerable servers.

Specifying script types:

We can use directives to locate the right script types by having Google only include results for scripts with the right extension (we don't have to include them, but for people who want to know exactly what environment they are attacking, this specification is critical). The script type is often reflected in the extension of the file holding the scripting instructions (this is not a perfect way of determining script type, but it proves effective enough because of conventions applied in the web development world). These extensions are included in the URL of the script, so naturally we would whip out the inurl and allinurl directives to look for scripts like so:

inurl:".php" --- will return URLs with .php in them, most likely URLs referring to PHP scripts
inurl:".php" | inurl:".aspx" | inurl:".py" --- will return URLs with ".php", ".aspx" or ".py", which will most probably be PHP, ASP.NET or Python scripts (allinurl:".php .aspx .py" would instead require all three to appear in the same URL)

The most easily exploitable servers can be located by certain query strings that may hint at LFI vulnerabilities. These query strings include arguments that most likely specify scripts or other files as arguments themselves!

e.g

www.vulnerable1.com/?page=contact.php
www.vulnerable2.com/index.php?pages=about.php
www.vulnerable3.com/download.php?file=details.pdf
www.vulnerable4.com/forcedownload.php?file=something.txt
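
To make "arguments that look like files" concrete, here is a hedged Python sketch that inspects a URL you already have and reports which query parameters carry file-like values. The function name `file_like_params` and the extension list are assumptions chosen for illustration; the URLs are the made-up examples from above.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumed list of extensions that suggest a file is being passed as an argument.
FILE_EXTENSIONS = (".php", ".pdf", ".txt", ".html", ".inc")


def file_like_params(url: str) -> dict[str, str]:
    """Return the query parameters whose values look like file names."""
    if "://" not in url:
        # Add a scheme so urlsplit treats the host as a host, not as part of the path.
        url = "http://" + url
    query = urlsplit(url).query
    return {
        name: value
        for name, value in parse_qsl(query)
        if value.lower().endswith(FILE_EXTENSIONS)
    }


print(file_like_params("www.vulnerable1.com/?page=contact.php"))
print(file_like_params("www.vulnerable3.com/download.php?file=details.pdf"))
```

A URL such as index.php?id=5 yields an empty result, which is the kind of query string the dorks in this section are meant to skip.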

So we need to do some URL-based restricting again. We can find examples like the ones above by dorking in the following way:

inurl:".php?page="
inurl:"?forcedownload.php?=" --- this is actually an example from exploit-db.com
inurl:".php?=*.pdf" --- using the almighty wildcard!!
inurl:".php?*=*.php" --- the zinger!! This is one of the dorks I'm most proud of!! It targets PHP scripts that take PHP file names as arguments

We're halfway there. Now we need to find out which of the specified scripts actually call the right functions.

Fingerprinting script function calls:

As discussed earlier in this section, we identify function calls by looking for error dumps. These error dumps appear in the body/content/visible text of the page, so naturally we need to use the intext and allintext directives. Here are some examples:

intext:"Warning: include"
intext:"Warning: readfile"
intext:"Warning: include_once"
intext:"Failed opening stream"

You could combine them to rack up more results, like so:

(intext:"Warning: include(") | (intext:"Warning: readfile")
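
The same fingerprints can be checked against page text you already have on hand. Below is a minimal Python sketch; the regular expression is my own rendering of the intext: signatures above, and the "failed to open stream" wording is an assumption about how PHP typically phrases the message, added alongside the "Failed opening stream" form used in this post.

```python
import re

# Signatures taken from the intext: dorks above; the regex form is an assumption.
ERROR_SIGNATURE = re.compile(
    r"Warning:\s*(?:include_once|include|readfile)\b"
    r"|failed (?:to open|opening) stream",
    re.IGNORECASE,
)


def include_errors(page_text: str) -> list[str]:
    """Return every error-dump fragment found in the visible text of a page."""
    return [match.group(0) for match in ERROR_SIGNATURE.finditer(page_text)]


sample = "Warning: include(about.php): failed to open stream: No such file or directory"
print(include_errors(sample))  # ['Warning: include', 'failed to open stream']
```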

Putting what we have so far together:

As an example based on what I've written in this section, we can construct the following dork:

inurl:*.php?*=*.php (intext:"Warning: include") | (intext:"Warning: include_once") | (intext:"Warning: readfile") | (intext:"Failed opening stream")

If you've read my article on LFI/RFI or just generally know about LFI exploitation, then you know about the dreaded open_basedir!! Well you can imagine that some of these scripts in the result listing we get will include ones with open_basedir restrictions. We need to get rid of those! Removing them from the results requires the use of the NOT operator, like so:

-intext:"open_basedir restriction"

So now you have a pretty effective google dork if you combine all this:

inurl:*.php?*=*.php (intext:"Warning: include") | (intext:"Warning: include_once") | (intext:"Warning: readfile") | (intext:"Failed opening stream") -intext:"open_basedir restriction"

Improvements

"are you getting that this is actually a scientific discipline yet? solve,improve,solve,improve...ad infinitum!"

This dork is pretty effective, but improvements can still be made:

By using more binary expressions, we can make sure we only get scripts with script arguments that are reporting errors

At the moment the dork above does in fact still return scripts reporting errors without scripts as data in the query string, e.g. www.host.com with a "Warning: include" error dump and no *.php?*=*.php type query string in the URL

The results list returns sites with .php script arguments in the query string AND error dumps in the page, BUT when looking at the URL we can tell immediately that it is very unlikely we have an exploitable target. Can you guess why??

The result list will also include URLs to forums/forum-like sites discussing PHP script errors, like:

stackoverflow.com
php.net
etc.

I fear this is the fact that crushed my dreams of creating the perfect LFI dork, since there is probably no way to exclude all possible sites of this kind :( but I shan't give up!!
Pushing on! We need to remove these results from the list using the NOT operator again.

Including all this information in the construction of a new dork, I came up with this lil baby:

inurl:"*.php?*=*.php" intext:"Warning: include" -inurl:.html -site:"php.net" -site:"stackoverflow.com" -inurl:"forums.*"

You'll notice I negated .html pages; this is a good thing to exclude most of the time!

And that's it folks! At least, that's it for LFI/RFI vulnerabilities.

Improving even further, you can wildcard the forum negation a bit more, turning the dork into:

inurl:"*.php?*=*.php" intext:"Warning: include" -inurl:.html -site:"php.net" -site:"stackoverflow.com" -inurl:"*forums*"
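
Since the solve/improve loop mostly means adding and removing terms, it can be handy to keep the parts of a dork in separate lists and render the string at the end. This is only a sketch of that idea in Python; the `Dork` class and its field names are hypothetical, and the terms are the ones from the dork above.

```python
from dataclasses import dataclass, field


@dataclass
class Dork:
    """Parts of a dork: required terms plus terms to exclude."""

    required: list[str] = field(default_factory=list)
    excluded: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Required terms are space-separated (AND); excluded terms get a leading minus (NOT).
        return " ".join(self.required + ["-" + term for term in self.excluded])


lfi_dork = Dork(
    required=['inurl:"*.php?*=*.php"', 'intext:"Warning: include"'],
    excluded=['inurl:.html', 'site:"php.net"', 'site:"stackoverflow.com"', 'inurl:"*forums*"'],
)
print(lfi_dork.render())
```

Adding another noisy site to the exclusions is then a one-line change to the `excluded` list rather than an edit in the middle of a long string.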

As crazy as it sounds, there are yet more improvements that can be made; I'm working on that at the moment!!

I'll be updating this with the MySQLi and MySQL EBi dorks very soon! I was hoping you guys would enjoy the LFI/RFI part for now, coz I need a break, phew!

Comments

Keith anti-newb Makan, 22 January 2012 at 15:20
If you guys need to know a lil more drop me a comment, and ill help out as best I can!!
terry jackson, 12 March 2012 at 01:32
Really good information and well layed out unlike alot of security blogs. I will be a regular vistor now!

Thanks

T.J
Unknown, 9 July 2012 at 18:12
Hey Keith, very well done. Actually I'm still familiarizing how the whole thing goes, I'm a bit new to web so it runs a bit fuzzy this time. But I'll keep myself updated with your progress. Thanks!

______________

WordPress Development
Deck Helmet, 28 March 2013 at 04:16
I have found here much useful information for myself. Many thanks to the editors for the info.

Deck Helmet
Unknown, 29 April 2013 at 04:49
Hey!
What a commendable work you have done, with simplest of language. I like the precious suggestions you shared with us in your expertly written blog post. I want to thank you for this.

Vachel
PHP Developer Chicago
cmscentral.net
