Trains

© CBS

My brain has become kind of an amorphous blob of mush recently and I thought I needed to change things up a bit. Apparently I’m not one of those people who can productively code for 10+ hours a day. (What did you think I did after work?) I wanted to get back to my roots as a network admin and do something different, so I decided to pick up a few inexpensive MikroTik routers and learn some advanced routing.

While waiting for the gear to arrive I watched this excellent training video and then decided I needed to design a “network” to route. The network I designed loosely resembles a real datacenter-style network, but much smaller. In fact, small enough to fit on my desk. Like a miniature train set.

Anyway, here’s the plan. I’m going to configure the routers first and then I’m going to try and get them to distribute their routes via RIP, then OSPF, then BGP. I’ll try configuring some HA stuff and testing what happens in various situations and maybe later on even play with VRRP. Here’s a diagram of the network topology I’m going to use first.

You might think it odd that I decided to handle all of my “peering” on a switch instead of with routers, but really this is pretty common. Since all those routers are “ISP” routers, they can be trusted to some degree so the additional flexibility and performance of using a switch should be nice. For example, to upgrade the TRANSIT router or to increase capacity, another one can simply be added to the switch and the full bandwidth becomes available to the mesh. Also, it allows me to plug one cable into my computer and run WinBox to manage all these things via MAC ;-)

Oh, and the 192.168.x.x addresses are “customer” addresses. That’s just so I’ll have something to ping and traceroute to.

 

DNSWash Is Alive

Well, the brochure-ware part anyway. I wrote up some initial policies last night, sketched the UI for the community portion and have started coding up the database. Anyway, further updates and lots of policy information can be found over on the new website:

http://www.dnswasher.org

The Project – DNSWash

I saw the news: OpenDNS is, rightfully, going to start charging businesses for their service. Unfortunately, this puts a lot of people in a bit of a pickle because their pricing isn’t very transparent and most people haven’t budgeted for the change. We also just started a new year, so budgets don’t get redone for a while.

Well, I got to thinking about it and decided “It can’t be that hard to implement DNS filtering, surely I could write a little filtering DNS server and hook it up to a blacklist and that should get people through for a while.” And sure enough, in about 20 lines of Ruby and about 15 minutes I had a fully functional recursive resolver with caching and filtering built right in.
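To give a feel for the filtering and caching half of that idea, here is a minimal sketch. It is not the 20 lines mentioned above: the blacklist entries, the `blocked?` helper and the `FilteringCache` class are all made up for illustration, the upstream lookup is passed in as a block instead of talking real DNS, and TTLs are ignored.

```ruby
require "set"

# Hypothetical blacklist; a real one would be loaded from a maintained list.
BLOCKED = Set.new(%w[ads.example.com malware.example.net])

# True when the name itself or any parent domain is on the blacklist.
def blocked?(name, blocklist = BLOCKED)
  labels = name.downcase.chomp(".").split(".")
  (0...labels.length).any? { |i| blocklist.include?(labels[i..-1].join(".")) }
end

# Filter first, then answer from the cache, then ask upstream (the block).
class FilteringCache
  def initialize(blocklist = BLOCKED)
    @blocklist = blocklist
    @cache = {}
  end

  def resolve(name)
    return :nxdomain if blocked?(name, @blocklist)
    key = name.downcase.chomp(".")
    @cache[key] = yield(key) unless @cache.key?(key)
    @cache[key]
  end
end
```

A caller would wrap its real upstream lookup in the block, for example `cache.resolve("www.example.org") { |n| upstream_lookup(n) }`, where `upstream_lookup` stands in for whatever resolver library is in use.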

My project for the weekend is going to be to write a web app to manage various filter lists and expose a basic HTTP API so that I can release the whole thing as a working proof of concept and get some feedback. I don’t expect the performance of this first implementation to be that good, but it should be totally usable by the end of the week. The reason I chose this option is that I can throw a basic web app up on Heroku for free, which means the whole project can operate indefinitely at no ongoing cost to me.

But what I really want isn’t a set of programs that work together to handle this simple task. Ultimately, I’d like to have a great database of community-maintained and community-owned domain categorizations with real process and policy behind its maintenance. Once the database is mature enough for production usage, I think the best way to share it would be using BIND RPZs per category and probably also DNSBL-style.
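As a rough illustration of the per-category RPZ idea, the sketch below turns a list of domains into RPZ records. It relies on the RPZ convention that `CNAME .` means "answer NXDOMAIN", and it emits a wildcard record so subdomains are covered too. The `rpz_records` helper and the sample domains are hypothetical, and the zone's SOA and NS header is left out.

```ruby
# Render one category's domains as RPZ records (owner names are relative
# to the policy zone). "CNAME ." is the RPZ action for NXDOMAIN.
def rpz_records(domains)
  cleaned = domains.map { |d| d.downcase.chomp(".") }.uniq.sort
  cleaned.flat_map { |d| ["#{d} CNAME .", "*.#{d} CNAME ."] }
end

# Hypothetical category contents, printed one record per line.
puts rpz_records(%w[Malware.example.net ads.example.com])
```

Each category would then be written to its own zone file so that operators can subscribe only to the categories they want to block.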

Anyway, I’m going to put together a website for the project and start thinking about policies before I write the actual database app.

Stop SOPA / PIPA

You may have noticed many websites on the Internet are speaking out against SOPA and PIPA (Stop Online Piracy Act and Protect IP Act, respectively). While piracy online is a huge problem and not something to be taken lightly, these pieces of legislation are damaging to the way the free Internet operates. The proposed legislation offers a shoot-first, ask-questions-later approach to shutting down unfavorable sites. Unfortunately, those with their fingers on the trigger (movie studios, etc.) have a long and checkered history of abusing their existing powers, and giving them an even blunter tool to work with would be most unwise.

For a general summary, you can watch the rap below, or check out this great infographic.

 

Ready to act? Visit http://americancensorship.org/

Ruby Concurrency

Over the last few months I’ve been experimenting with and testing various methods for using concurrency within a Rails application to improve performance, specifically when handling large amounts of data. The app I’ve been testing with is, for all intents and purposes, simply a web scraper. I went through several designs of the underlying system and settled on a final design that I’m really happy with. The code doesn’t entirely work (it’s a database problem), but the theory is solid and the performance is simply amazing.

My initial inspiration was delayed_job, since I was familiar with it, knew it would work and knew how I would implement that style. But I ran into scaling problems even on my iMac: just a few threads would hammer the database hard enough with reads and writes that things would get crazy within a few minutes. I then decided it might work to use an array / queue to cache things needing to be written to the database and, well, don’t do that. The real problem was that the individual scrapers relied on a database both for getting a job and for noting it was finished. This also led to a lot of duplicated logic, so the scraper and controller mechanisms had to be tightly coupled, and they kept causing deadlocks and collisions.

I didn’t like that, not at all. I decided anything that constantly polled a database would never work at scale, and decided I’d fix the problem for real – with messaging. In my current implementation, I have a Padrino app that handles the web interface and could provide an API out to something else if you really wanted it to. On the backend I have two rake tasks, only one of which connects to the database.

When a job is created in the system, it sends a message to the scraper queue to process the initial URL. It also does some things like read robots.txt so that we don’t poll too hard or hit urls we shouldn’t. I mean, just because this is a toy app doesn’t mean we should be sloppy ;-).

The scraper queue is listened to by this rake task. The rake task manages a thread pool of 20 threads (in production that would probably be a lot higher) and simply processes requests as they come through using MetaInspector. Once it has fetched the data (which will almost always be more waiting on the remote server than anything) it packages it all up into a nice neat JSON blob (I’ll probably switch to MessagePack in the future) and queues it to hit the manager process.
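The shape of that worker can be sketched with nothing but Ruby's built-in `Queue` and `Thread`. This is an illustration under assumptions, not the actual rake task: the in-process queues stand in for the real message queue, the `run_pool` helper is invented here, and the fetch step is passed in as a block where the real task would call MetaInspector.

```ruby
require "json"

POOL_SIZE = 20 # same pool size as described above

# Start `size` threads that pop URLs from `jobs`, run the fetch block on
# each one, and push a JSON blob onto `results`. Returns when `jobs` has
# been closed and fully drained.
def run_pool(jobs, results, size = POOL_SIZE, &fetch)
  workers = Array.new(size) do
    Thread.new do
      # Queue#pop returns nil once the queue is closed and empty.
      while (url = jobs.pop)
        page = fetch.call(url)
        results << JSON.generate("url" => url, "page" => page)
      end
    end
  end
  workers.each(&:join)
end
```

Because the threads spend most of their time waiting on remote servers, plain Ruby threads are enough here: the waiting happens in I/O, so many fetches can be in flight at once.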

The manager process is where the app’s intelligence lies. It listens to the write queue for completed scrapes, writes them to the database, figures out the next page to scrape and queues it. As you may have noticed, there is very little code in either task, and even the models are pretty skinny. By relying on a message queue I’ve managed to greatly simplify the architecture of the app while still honoring things such as priority and scrape rates. I wasn’t able to benchmark this mechanism accurately due to system problems I didn’t have time to troubleshoot, but it held up (using an SQLite database), and writes were fast enough that the database logs spinning past my screen were making me dizzy.
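The manager loop can be sketched the same way. Everything named here is an assumption made for illustration: `run_manager` is a made-up helper, a plain Hash stands in for the database, the queues are in-process `Queue` objects, each scraped page is assumed to carry a `"links"` array, and priority and scrape-rate handling are left out.

```ruby
require "json"
require "set"

# Drain completed scrapes from `write_queue`, store each one, and queue
# any link that has not been seen yet onto `scrape_queue`.
def run_manager(write_queue, scrape_queue, store)
  seen = Set.new(store.keys)
  # Queue#pop returns nil once the queue is closed and empty.
  while (blob = write_queue.pop)
    scrape = JSON.parse(blob)
    store[scrape["url"]] = scrape["page"] # stand-in for the database write
    seen << scrape["url"]
    Array(scrape["page"]["links"]).each do |link|
      # Set#add? returns nil when the link was already seen.
      scrape_queue << link if seen.add?(link)
    end
  end
end
```

Only this loop touches the store, which is the point of the design: the scrapers never read or write the database, so they cannot contend for it.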

If you want to try it out, check out the code on my BitBucket. Also, feel free to share your thoughts, comments and suggestions either in the comments or directly.