The Science of Google Dorking

In this post I'm proposing some new and improved Google dorks for hackers, pentesters, and anyone else who likes finding web-based targets by the vulnerabilities they expose. The dorks I discuss here cover servers exhibiting:

  • Local file inclusion / Remote File inclusion vulnerabilities
  • SQL injection
  • Error based injection
What's so important about detailing targets so intensely?
"The devil's in the details"
Hacking is a problem-solving discipline, and as in any problem-solving discipline (physics, computer science, mathematics, etc.) it's very important that you practice, but what's even more important is how you practice!

Google dorking gives hackers the ability to specify training environments down to every detail so that they can fine-tune their skills. For instance, if you want to master MySQL error-based injection, you need to know that it depends on:
  • The scripting language used
  • The OS type hosting the web server
  • The Web server software
  • The Web server software version
  • The plugin versions
  • The DBMS version
  • The domain (different countries have different colleges and DBMS/web scripting courses that teach in different ways, so developers from different countries come to understand their environments differently. Hackers from different countries also follow different attack trends, so defences get applied differently. The country a server resides in can therefore make a huge difference: location, location, location!!)
and possibly more! Mastering a complex discipline like hacking is all about breaking it down into little pieces and mastering it from the bottom up, making sure you can control as much of an attack as possible and that you understand how an environment will respond (bruteforcing and fuzzing are a last resort, not a first!)
The Directives:
There are a couple of directives you must become familiar with before I can talk about hunting targets. These directives are used to specify:
  • attributes of web pages and URLs
  • places to look for web pages and URLs
You use directives by entering them in the Google search bar as follows:
directive-alias:term
Directives are declared using the ":", so if you want something to be interpreted as a directive, join the alias and the term with a ":" and no spaces between the alias, the ":" and the term.
You may combine or repeat directives using operators like NOT/OR/AND and include wildcards; I explain this further on in the article. Some directives have a variant that allows multiple terms. These are usually prefixed with "all" in the alias, e.g. inurl and allinurl.



  1. directive-alias --- the name given to the directive, read on for some examples of directives
  2. term --- the argument that tells the directive what to match
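
The alias:term syntax can be sketched in a few lines of Python. This is only an illustration of the rule above (alias, ":" and term joined with no spaces); the function name `directive` is my own and not part of any Google tooling, and quoting terms that contain spaces is an assumption based on the quoted examples used in this post.

```python
def directive(alias: str, term: str) -> str:
    """Build 'alias:term' with no spaces around the colon.

    Hypothetical helper for illustration only. Terms containing spaces
    are wrapped in double quotes so they stay attached to the directive.
    """
    term = term.strip()
    already_quoted = term.startswith('"') and term.endswith('"')
    if " " in term and not already_quoted:
        term = f'"{term}"'
    return f"{alias}:{term}"


print(directive("inurl", "wp-content"))
print(directive("intext", "Hacking Tutorials"))
```

The two calls print inurl:wp-content and intext:"Hacking Tutorials", which can be pasted into the search bar as they are.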

I've only included the directives that pertain to finding targets. A few more exist; you can look them up if you're interested ;)
Let's jump right in!
  • inurl --- restricts results to URLs with the specified text in them
    • e.g. inurl:wp-content --- finds WordPress URLs, try it!
  • allinurl --- same as above, but used to specify multiple terms, all of which must appear in the URL
  • intext --- restricts results to web pages with the specified term in the printable text of the page
    • e.g. intext:"Hacking Tutorials" --- finds pages that are likely discussing hacking tutorials
  • allintext --- same as above, allowing multiple terms
  • intitle --- restricts results to web pages with the specified term in their title
  • allintitle --- you guessed it, same as above, allowing multiple terms
  • site --- one of my favourites, this restricts results to URLs on the host(s) specified in the term
    • e.g. site:www.google.com will return all the indexed pages hosted on www.google.com
  • link --- restricts results to pages with links to the URL specified in the term
  • inanchor --- restricts results to pages that are linked to with anchor text containing the term
    • e.g. inanchor:www.exploit-db.com --- returns pages that other pages link to using anchor text that mentions www.exploit-db.com
  • cache --- this one is interesting, it searches Google's cache for a version of the site specified in the term
    • e.g. cache:k3170makan.blogspot.com --- returns the last cached version of this site
  • filetype --- restricts results to files with the specified file extension
    • e.g. filetype:pdf --- returns ALL URLs to PDFs
Binary Operators
"to dork || !(to dork)"
You can also combine directives (though not all of them) for a stricter specification of the results, using the binary operators and the wildcard operator:

(Read up on boolean algebra and do a couple of problems. Being a computer science and physics student, I've had to do two boolean algebra courses during my degree, and I can tell you that being able to apply my algebra skills in a Google search is incredibly powerful!!)
  • "*" -- the wildcard operator
  • "|" -- the OR operator
  • "+" -- the AND operator
  • "-" -- the NOT operator
Here are a few examples:
  • inurl:*.edu --- returns all URLs ending in .edu; an interesting thing to note is that this returns millions more URLs than inurl:.edu??
  • inurl:*.ac.za -inurl:*.uj.ac.za --- returns all South African academic institutes except for the University of Johannesburg (which is specified using *.uj.ac.za)
  • inurl:"*.gov.*" | inurl:"*.edu" --- specifies all URLs registered to academic institutions or government institutions
I'm sure you can figure out how the other operators are used!
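
As a rough sketch, the OR and NOT examples above can be rebuilt with two tiny Python helpers. The names `any_of` and `exclude` are hypothetical and exist only for this illustration; all they do is glue strings together with the "|" and "-" operators listed above.

```python
def any_of(*queries: str) -> str:
    """Join sub-queries with the OR operator ("|")."""
    return " | ".join(queries)


def exclude(query: str) -> str:
    """Prefix a sub-query with the NOT operator ("-")."""
    return f"-{query}"


# South African academic sites, minus the University of Johannesburg
print(" ".join(["inurl:*.ac.za", exclude("inurl:*.uj.ac.za")]))

# Government OR academic institutions
print(any_of('inurl:"*.gov.*"', 'inurl:"*.edu"'))
```

Keeping the operators in small functions like this makes it harder to forget that "-" must sit directly against the directive it negates.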

My scouter indicates that your power level has increased! You are ready to learn a new technique...
LFI/RFI Detection:

Before we can ask Google to look for things for us, we need to make sure we know what we are looking for. How do we know when a script is vulnerable to LFI/RFI attacks?
  • If the CMS, CMS version and plugins being used are known to exhibit LFI/RFI vulnerabilities
  • If unsanitized data from GET/POST requests is included in the arguments to these functions
    • include
    • include_once
    • readfile
    • fopen (on the off chance)
That's about all that gives away an LFI/RFI vulnerability, but then what identifies scripts using these functions? All we can go on is:
  • The arguments supplied to the scripts --- these arguments are commonly identified by telltale keywords like 'page'/'link'/'url' etc.
  • The errors that they generate --- PHP notoriously reports script names and the input that caused the error
This detail is often enough to find vulnerable servers.



Specifying script types:

We can use directives to locate the right script types by having Google only include results for scripts with the right extension (we don't have to include them, but for people who want to know exactly what environment they are attacking this specification is critical). The script type is often reflected in the extension of the file containing the script (this is not a perfect way of determining script type, but it proves effective enough because of conventions applied in the web development world). These extensions are included in the URL of the script, so naturally we whip out the inurl and allinurl directives to look for scripts like so:
  • inurl:".php" --- will return all URLs with a .php in them, most likely URLs referring to PHP scripts
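  • inurl:".php" | inurl:".aspx" | inurl:".py" --- will return URLs with ".php", ".aspx" or ".py" in them, which will most probably be PHP, ASP.NET or Python scripts (allinurl:".php .aspx .py" would instead require all three terms in the same URL)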
The most easily exploitable servers can be located by certain query strings that may hint at LFI vulnerabilities. These query strings include arguments that are most likely naming scripts or other files themselves!
e.g
  • www.vulnerable1.com/?page=contact.php
  • www.vulnerable2.com/index.php?pages=about.php
  • www.vulnerable3.com/download.php?file=details.pdf
  • www.vulnerable4.com/forcedownload.php?file=something.txt
So we need to do some URL-based restricting again. We can find examples like the ones above by dorking in the following way:
  • inurl:".php?page="
  • inurl:"?forcedownload.php?=" --- this is actually an example from exploit-db.com
  • inurl:".php?=*.pdf" --- using the almighty wildcard!!
  • inurl:".php?*=*.php" --- the zinger!! This is one of the dorks I'm most proud of!! It finds literally ALL PHP scripts with PHP arguments
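
To try the same pattern with several telltale parameter names at once, a small generator helps. This is a minimal sketch: the helper name `param_dork` is made up for this post, and the parameter list simply reuses the keywords mentioned earlier ('page', 'link', 'url') plus 'file' from the example URLs.

```python
def param_dork(param: str, script_ext: str = ".php") -> str:
    """Build an inurl dork for a script extension followed by one parameter.

    Hypothetical helper: param_dork("page") gives inurl:".php?page="
    """
    return f'inurl:"{script_ext}?{param}="'


# Telltale parameter names taken from the examples in this post
for name in ["page", "link", "url", "file"]:
    print(param_dork(name))
```

Each printed line is a separate dork in the same shape as inurl:".php?page=" above.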
We're halfway there. Now we need to find out which of the specified scripts actually call the right functions.


Fingerprinting script function calls:

As discussed earlier in this section, we identify function calls by looking for error dumps. These error dumps appear in the visible text of the page, so naturally we need the intext and allintext directives. Here are some examples:
  • intext:"Warning: include"
  • intext:"Warning: readfile"
  • intext:"Warning: include_once"
  • intext:"failed to open stream"
You could combine them to rack up more results, like so:
  • (intext:"Warning: include(") | (intext:"Warning: readfile")
Putting what we have so far together:
As an example based on what I've written in this section, we can construct the following dork:
  • inurl:*.php?*=*.php (intext:"Warning: include") | (intext:"Warning: include_once") | (intext:"Warning: readfile") | (intext:"failed to open stream")
If you've read my article on LFI/RFI or just generally know about LFI exploitation, then you know about the dreaded open_basedir!! As you can imagine, some of the scripts in the result listing will have open_basedir restrictions. We need to get rid of those! Removing them from the results requires the NOT operator, like so:
  • -intext:"open_basedir restriction"
So now you have a pretty effective Google dork if you combine all this:
  • inurl:*.php?*=*.php (intext:"Warning: include") | (intext:"Warning: include_once") | (intext:"Warning: readfile") | (intext:"failed to open stream") -intext:"open_basedir restriction"
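
The way the pieces stack up (URL pattern, OR'd error strings, negated text) can be sketched as a small builder. The function `build_lfi_dork` and its parameters are hypothetical names for this illustration; it only concatenates strings in the same shape as the dork above, shortened here to two error strings.

```python
def build_lfi_dork(url_pattern, error_strings, excluded_text=()):
    """Compose a dork: inurl pattern, OR'd intext errors, negated intext terms.

    Hypothetical helper for illustration; it does no searching itself.
    """
    errors = " | ".join(f'(intext:"{s}")' for s in error_strings)
    parts = [f"inurl:{url_pattern}", errors]
    parts += [f'-intext:"{s}"' for s in excluded_text]
    return " ".join(parts)


dork = build_lfi_dork(
    "*.php?*=*.php",
    ["Warning: include", "Warning: readfile"],
    ["open_basedir restriction"],
)
print(dork)
```

This prints inurl:*.php?*=*.php (intext:"Warning: include") | (intext:"Warning: readfile") -intext:"open_basedir restriction". Adding or dropping an error string is then a one-line change to the list.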


Improvements 

"are you getting that this is actually a scientific discipline yet? solve,improve,solve,improve...ad infinitum!"

This dork is pretty effective, but improvements can still be made:
  • By using more binary expressions we can make sure we only get scripts with script arguments that are reporting errors
    • At the moment the dork above does in fact still return pages reporting errors without scripts as data in the query string, e.g. www.host.com with a "Warning: include" error dump and no *.php?*=*.php type query string in the URL
  • The results list returns sites with .php script arguments in the query string AND error dumps in the page, BUT when looking at the URL we can tell immediately that it is very unlikely we have an exploitable target. Can you guess why??
    • The result list will also include URLs to forums and forum-like sites discussing PHP script errors, like:
      • stackoverflow.com
      • php.net
      • etc.
    • I fear this is the fact that crushed my dreams of creating the perfect LFI dork, since there is probably no way to exclude all possible sites of this kind :( but I shan't give up!!
    • But pushing on! We need to remove these results from the list using the NOT operator again
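
Since no dork can exclude every discussion site, one option is to tidy up a result list you already have, offline. The sketch below is my own assumption of how that could look, not something the dork itself does: `is_candidate` is a hypothetical helper that drops URLs on the excluded hosts and keeps only those whose query string carries a value ending in the script extension. It works on plain URL strings and makes no requests.

```python
from urllib.parse import urlsplit, parse_qsl

# Discussion sites named in this post; extend the tuple as needed
EXCLUDED_HOSTS = ("stackoverflow.com", "php.net")


def is_candidate(url, ext=".php", excluded_hosts=EXCLUDED_HOSTS):
    """Return True if the URL has a query value ending in ext
    and is not hosted on one of the excluded hosts."""
    if "://" not in url:
        url = "http://" + url  # urlsplit needs a scheme to find the host
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if any(host == h or host.endswith("." + h) for h in excluded_hosts):
        return False
    return any(value.lower().endswith(ext)
               for _, value in parse_qsl(parts.query))


results = [
    "www.vulnerable2.com/index.php?pages=about.php",
    "www.host.com/",
    "http://stackoverflow.com/questions/1?file=a.php",
]
print([u for u in results if is_candidate(u)])
```

Only the first URL survives: the second has no script argument in its query string and the third sits on an excluded host.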
Including all this information in the construction of a new dork, I came up with this lil baby:
  • inurl:"*.php?*=*.php" intext:"Warning: include" -inurl:.html -site:"php.net" -site:"stackoverflow.com" -inurl:"forums.*"
You'll notice I negated .html pages; this is a good thing to exclude most of the time!
And that's it folks! That's it for LFI/RFI vulnerabilities, anyway.

Improving even further, you can wildcard the forum negation a bit more, turning the dork into:
  • inurl:"*.php?*=*.php" intext:"Warning: include" -inurl:.html -site:"php.net" -site:"stackoverflow.com" -inurl:"*forums*"
As crazy as it sounds, there are still improvements that can be made, and I'm working on them at the moment!!
I'll be updating this with the MySQLi and MySQL EBi dorks very soon! I was hoping you guys would enjoy the LFI/RFI part for now, coz I need a break, phew!