Roundabout – A high-performance, distributed crawler
When I started development on the Arachni high-performance grid, my focus was on the audit part, i.e. finding a way to distribute the audit of batches of individual elements across multiple nodes while avoiding duplication of effort amongst them.
It was a bit tricky to get right, but it turned out to be quite doable and worthwhile.
The crawl, however, was done the old-fashioned way: the master instance would crawl the targeted website and, once finished, analyze all the pages it found and spread the workload.
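To make the contrast concrete, here's a minimal sketch of that crawl-then-distribute flow. The names are hypothetical and it's Python purely for illustration (Arachni itself is Ruby), but the shape is the same: crawl everything first, then deal the elements out so no two nodes audit the same one.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    url: str
    elements: list = field(default_factory=list)  # links, forms, cookies, ...

def distribute_audit(pages, nodes):
    # Flatten every crawled page into its individual auditable elements,
    # then deal them out with a stride so no two nodes get the same element.
    elements = [e for page in pages for e in page.elements]
    return {node: elements[i::len(nodes)] for i, node in enumerate(nodes)}

# e.g. distribute_audit(crawled_pages, ["node-1", "node-2"])
# -> {"node-1": [even-indexed elements], "node-2": [odd-indexed elements]}
```

The catch, of course, is that nothing can be distributed until the single-node crawl has finished.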
I always intended to try my hand at something similar for the crawling process, but it wasn't a high priority.
But, as you can see from my last post, I did sort of figure it out, although I hadn’t had a chance to implement it until now.
This is tricky to do because there's no way of knowing the workload beforehand; a website is basically a freaking labyrinth, and precious information (new paths) can be hidden behind walls and walls of crap.
On the other hand, when running Arachni in HPG mode you already have a few nodes up and running in the first place, so why not utilize them a bit more, even if it turns out to be only slightly faster than a single crawler?
With that in mind, yesterday I started implementing that sort of crawler, and here it is.
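I won't walk through the internals here, but to give a flavour of the general shape of such a thing, here's a minimal sketch, again in Python with made-up names, and with threads standing in for grid nodes. It assumes a master-held frontier and dedup set; Roundabout's actual design may well differ.

```python
import queue
import threading

class DistributedCrawl:
    """Minimal sketch: a master-held frontier plus dedup set, with workers
    (threads standing in for grid nodes) pulling one URL at a time, since
    the total workload can't be known before the crawl starts."""

    def __init__(self, seed, fetch):
        self.frontier = queue.Queue()  # URLs waiting to be crawled
        self.seen = {seed}             # master-side dedup of discovered paths
        self.lock = threading.Lock()
        self.fetch = fetch             # hypothetical: fetch(url) -> URLs found on that page
        self.frontier.put(seed)

    def worker(self):
        while True:
            try:
                url = self.frontier.get(timeout=1)
            except queue.Empty:
                # Naive shutdown: an empty frontier for 1s is taken to mean
                # the crawl is over; a real system would track in-flight work.
                return
            for found in self.fetch(url):
                with self.lock:
                    if found not in self.seen:
                        self.seen.add(found)
                        self.frontier.put(found)

    def run(self, workers=4):
        threads = [threading.Thread(target=self.worker) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return self.seen
```

Run against a fetch function that returns the links on a page, this drains the frontier across all workers while the shared seen set prevents duplicated effort; the hard part, as noted above, is that the frontier can sit nearly empty for long stretches and then suddenly explode.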
Roundabout exists solely as a toy, a fun experiment, and not as a stable system. I may put more effort into it in the future, but my main reason for doing this is to explore the idea and eventually port it over to Arachni.
If you find this interesting, want to help out with the research, have any sort of feedback, or just want to get in touch, don't hesitate to do so.