Saturday, May 4, 2013

gatherproxy.com proxy gatherer

While browsing tbn I found a thread asking for a gatherproxy.com bot: http://thebotnet.com/bot-requests/203930-req-gather-proxy-bot/

I'd never heard of this site before, but when I took a look at it all I saw was an interface that slowly shows new proxies.  You can't grab them with an HTML parser as you normally would because the HTML is generated dynamically through some JavaScript functions on the page.  I figured this would be better to do in jQuery/JavaScript because I'd need to use a headless browser to make it in VB, Java or even Python.

Getting started, I needed a plugin for Chrome (because I prefer it) that would let me inject my custom JS into the gatherproxy site (just like the Scratchpad that you open with Shift+F4 in Firefox).  I picked this one and it works perfectly: https://chrome.google.com/webstore/detail/javascript-injector-nicho/abdogfafejmdomllalkdegagoehgbdbk

Now that I had that installed, I just wrote some simple jQuery that would grab the proxy ip/port and dump it to the console so I could copy and paste it into a text file.  Here is the code I came up with:
$('table tbody tr').each(function() {
    // Each proxy row carries its ip:port in a custom "prx" attribute.
    var proxy = $(this).attr('prx');
    // Skip rows without the attribute (.attr() returns undefined for them).
    if (proxy !== undefined && proxy !== null) {
        console.log(proxy);
    }
});


.each() method: http://api.jquery.com/each/
.attr() method: http://api.jquery.com/attr/

While inspecting the dynamically made proxy tables I found that each proxy sits in its own tr element.  Then I found that each proxy tr element has a special attribute, prx, that holds the ip:port that I need (What luck!).   Since there are other tr elements that don't have this prx attribute, I wrote an if statement to filter them out.  Lastly, I dumped the proxy to the console for manual copying and pasting into a text file.
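
If you'd rather copy everything in one go instead of line by line, here is a rough sketch of the same idea that collects the prx values into an array first.  The filterProxies helper and its ip:port regex are my own additions for illustration (not something the site provides), and the browser part assumes jQuery is available on the page as jQuery:

```javascript
// Hypothetical helper: keep only values that look like ip:port, dropping duplicates.
// The pattern is an assumption about what the prx attribute contains.
function filterProxies(values) {
    var pattern = /^\d{1,3}(\.\d{1,3}){3}:\d{1,5}$/;
    var seen = {};
    var result = [];
    values.forEach(function (value) {
        if (typeof value === 'string' && pattern.test(value) && !seen[value]) {
            seen[value] = true;
            result.push(value);
        }
    });
    return result;
}

// Browser-only part: runs only when jQuery is present on the page.
if (typeof jQuery === 'function') {
    // .map() drops rows where the callback returns undefined (no prx attribute).
    var values = jQuery('table tbody tr').map(function () {
        return jQuery(this).attr('prx');
    }).get();
    // One console entry holding every proxy, one per line.
    console.log(filterProxies(values).join('\n'));
}
```

Since the site keeps adding rows over time, re-running this later should pick up the new ones, and the duplicate check keeps repeated rows within a single run out of the output.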

The only downside to this is the lack of a GUI, but there is really no need for one.  This way, it grabs proxies very fast and avoids the use of those slow headless browsers.  I assumed this is what AceJunior wanted because that is all that is going on with this site.  I looked into creating a text file and writing the proxies to it through JS, but I had a tough time with it even after reading through this article: http://www.html5rocks.com/en/tutorials/file/filesystem/#toc-file-appending
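
One possible way around the FileSystem API, as a sketch only: build the text in memory and hand it to the browser as a download through a Blob and a temporary link.  This assumes a browser that supports Blob, URL.createObjectURL and the download attribute on links, and that jQuery is on the page; proxiesToText, downloadText and the proxies.txt filename are names I made up for the example:

```javascript
// Hypothetical helper: one proxy per line, with a trailing newline.
function proxiesToText(proxies) {
    return proxies.join('\n') + '\n';
}

// Hypothetical helper (browser only): offer the text as a file download.
function downloadText(filename, text) {
    var blob = new Blob([text], { type: 'text/plain' });
    var url = URL.createObjectURL(blob);
    var link = document.createElement('a');
    link.href = url;
    link.download = filename;
    document.body.appendChild(link);
    link.click();
    document.body.removeChild(link);
    // Release the object URL a little later so the download can start first.
    setTimeout(function () { URL.revokeObjectURL(url); }, 1000);
}

// Browser-only part: gather the prx values and save them.
if (typeof document !== 'undefined' && typeof jQuery === 'function') {
    var found = jQuery('table tbody tr').map(function () {
        return jQuery(this).attr('prx');
    }).get();
    downloadText('proxies.txt', proxiesToText(found));
}
```

This doesn't append to an existing file the way the FileSystem API article describes; each run just produces a fresh download.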

I'm going to make a video demonstrating how to use this so I don't get a million questions on it.  I'll edit this post with it + the url for my post on tbn once it's done.
