Get Yer Feeds Here

Posted July 6th, 2007 by rybolov

Becoming slightly annoyed with the problems getting feeds from Yahoo Pipes, I set up a simple cron job to snarf the RSS off the Yahoo servers every 5 minutes using wget. Then I changed the hrefs to point at my own server.

While testing wget, I found out why the pipes were bombing out: the Pipes server doesn’t issue a response until it has computed the feed, then it sends it all at once. It can take up to 10 seconds before the RSS reader gets any kind of response back, which puts it into timeout territory some of the time. Trusty ol’ wget worked every time, though–I swear it’s one of the most reliable programs I’ve ever used at feeding it glop and getting back pure water.
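The fix boils down to "fetch with a patient timeout, and only replace the local copy when something usable comes back." Here is a minimal Python sketch of that idea. It is not the script used here (that was plain wget under cron), and the names `fetch_url` and `refresh_feed`, the 60-second timeout, and the file permissions are all assumptions for illustration.

```python
import os
import tempfile
import urllib.request


def fetch_url(url, timeout=60):
    """Fetch a URL, waiting up to `timeout` seconds for a slow upstream."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def refresh_feed(fetch, dest_path):
    """Call fetch() and replace dest_path only if a non-empty body came back.

    Returns True if the local copy was updated, False if the old copy was kept.
    """
    try:
        body = fetch()
    except OSError:
        # urllib's timeouts and connection errors are OSError subclasses.
        return False
    if not body or not body.strip():
        return False
    # Write to a temp file first, then swap it in, so readers never see
    # a half-written feed.
    directory = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(body)
    os.chmod(tmp_path, 0o644)  # assumed: the web server needs read access
    os.replace(tmp_path, dest_path)
    return True
```

Run from cron every 5 minutes, this would be called as something like `refresh_feed(lambda: fetch_url(feed_url), local_path)`, where `feed_url` and `local_path` are whatever your own setup uses. A blank or timed-out response leaves the previous copy in place, so subscribers get a slightly stale feed instead of an empty one.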

So here you go. If you were having problems with getting blank feeds, your reader should be happy now. These are off the chateaublogsville server.

Posted in Odds-n-Sods, Technical | No Comments »

A Day in the Life of the Feedmaster

Posted July 5th, 2007 by rybolov

My customers, they come to me looking for nourishment, a late-night snack, or maybe some light reading. They want to be fed and they want it now, and I wake from my slumber to give it to them. They walk away satisfied.

My name is Mike. I am a feedmaster. This is my story.

Late last night I took Chateau Blogsville live and I’ve been adding to the filters throughout today in order to tune the output. Suspiciously, this is what life is like for the analysts working in our SOC. =)

Lessons from tuning feeds periodically during the day:

I have a sizeable set of explicit blocks for quite a few terms coming from the search feeds. Even though I could build the search feeds with “NOT” values, I still had a bunch of trash that was more effectively deleted by a global junk screen.
I developed an “allow” filter based on keywords in the content. This is what I call the “relevancy filter”. In Chardonnay, it’s used for the dirty gray and gray feeds. In Eiswein, it’s used for everything.
I’ve done more blacklisting on URLs than on keywords for the search feeds (the dirty gray feed) so far, making broad slashes through aol.com and myspace.com. Time will tell if this will be a fool’s game, since the spam blogs can come on pretty strong, and the only way to be sure is to nuke them from orbit.
I think I’ve pushed Pipes beyond what it can do. About every third time, I get a null result set (i.e., it times out). If you’re using a smart feed reader (I just make the feed a live bookmark in Firefox), it keeps the last version and you don’t really know or care that your feed is outdated, as long as it catches up sometime.
“Privacy” is the hardest thing to explicitly allow thanks to real estate, vacations, and dating. “Risk Management” comes in a strong second, thanks to banks, loans, and project management. Surprisingly, nobody but security people talks about BS7799.
I’ve roped in some really, really surprising content through the blog searches on Technorati and Google. What this means is that I’ll find sites like The Technology Liberation Front, which I’m now a fan of. As much of a hassle as it is to filter the junk out of the search feeds, I think they definitely add something that a closed or by-invitation-only blog feed is missing. I’ll most likely add more feeds like this as I think them up.
Some of you will notice that at no point have I blacklisted the C-word (c*mpliance) but notice how it chokes itself to death nicely when you deny all but allow “risk management” and “penetration testing”?
There are a couple of terms that I deliberately did not add to the relevancy filter. Dollar for the person who names one, and the C-word doesn’t count.
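The lessons above stack three checks: a URL blacklist, a global junk screen, and a default-deny relevancy ("allow") filter. The real filters live in Yahoo Pipes, so the following is only a rough Python sketch of the same logic. The domain names and the two allow terms come from this post; the junk terms and every function name are placeholders I made up for illustration.

```python
from urllib.parse import urlparse

# Placeholder lists, not the real Chateau Blogsville filters.
BLOCKED_DOMAINS = {"aol.com", "myspace.com"}
JUNK_TERMS = {"real estate", "dating"}
RELEVANT_TERMS = {"risk management", "penetration testing"}


def domain_blocked(link):
    """True if the link's host is a blocked domain or a subdomain of one."""
    host = (urlparse(link).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def contains_any(text, terms):
    """Case-insensitive substring match against a set of terms."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def keep_item(item):
    """item is a dict with 'title', 'description', and 'link' keys."""
    text = item["title"] + " " + item["description"]
    if domain_blocked(item["link"]):
        return False
    if contains_any(text, JUNK_TERMS):
        return False
    # Default deny: anything that matches no allow term is dropped.
    return contains_any(text, RELEVANT_TERMS)
```

Because the last step is default-deny, a post that only talks about the C-word never has to be blacklisted. It simply fails to match an allow term and drops out on its own.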

Chateau Blogsville is now officially open. I will replace the RSS icons with something better once my graphic designer gets them done.

Posted in Odds-n-Sods, Technical | 4 Comments »

And Now for Something Completely Different…

Posted July 4th, 2007 by rybolov

I’m doing some more Yahoo Pipes work–aggregating and filtering blog feeds. I’ve created a combination of a whitelist and a highly filtered set of search results known as Chardonnay, and I’ll eventually make a less-filtered “2-Buck Chuck” and a highly-filtered Eiswein version.

My basic rule-of-thumb for the Chardonnay feed is that if the signal-to-noise ratio of a blog is less than 3:1 or so, I bump it into Tier 2. Not that those blogs don’t have any good content, but I’m trying to keep my feed at an 8:2 signal-to-noise ratio or better.

For the Eiswein feed, I’m aiming for a 9:1 signal-to-noise ratio. In order to do that, I have to filter everything, including myself. =)
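To put those ratios on one scale: 3:1 means 75% of posts are on-topic, 8:2 means 80%, and 9:1 means 90%. A tiny Python sketch of the arithmetic, with `signal_share` and `meets_target` being names invented here and not part of any tool I actually use:

```python
from fractions import Fraction


def signal_share(signal, noise):
    """Turn a signal:noise ratio into the fraction of posts that are signal."""
    return Fraction(signal, signal + noise)


def meets_target(good_posts, junk_posts, target):
    """True if the observed share of good posts is at or above the target."""
    return signal_share(good_posts, junk_posts) >= target


BLOG_CUTOFF = signal_share(3, 1)        # 3/4, the per-blog rule of thumb
CHARDONNAY_TARGET = signal_share(8, 2)  # 4/5, the whole-feed goal
EISWEIN_TARGET = signal_share(9, 1)     # 9/10
```

Note that a blog sitting right at the 3:1 cutoff is under the 80% feed goal on its own, so the feed only hits 8:2 if the other sources run cleaner than that.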

As for 2-Buck Chuck, well, let’s say it’s so unfiltered that it has chunks^wpieces of sediment in it. It’s also hard to build something like this and then intentionally disable the quality controls you’ve built.

“Why the wine motif?” you ask. Well, I was looking for something that has a price and quality range, so wine fit right in there. I bought www.chateaublogsville.com which will be the entry site for the 3 security blog feeds. It might take me a couple of weeks to get up a simple site but in the meantime you’re free to subscribe to any of the feeds.

Here is one thing that I’m finding out about blog feeds. For the Chardonnay, I had to look at a couple of approaches to feed aggregation. I started out with a linked-to list of people and a desire to have a Google and Technorati catch-all search to find some relevant information from little-known feeds. After a couple of days of data munging, I noticed that the source feeds fit into the following groups:

Tier 1: Feeds that I want to let through pretty much unfiltered (Mine, Matasano, Curphey, ISM-Community, Bejtlich, etc)
Tier 2: Feeds that need to be filtered for relevancy (Security Bloggers Network members, news site aggregators that I haven’t whitelisted above)
Tier 3: Feeds that need to be filtered for spam and then filtered for relevancy whilst wearing lead gloves (Technorati and Google searches)
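The three tiers translate into "how many checks does an item have to survive." A minimal Python sketch of that routing follows. It is an illustration and not the actual Pipes layout, and `filter_by_tier`, `build_feed`, and the two predicate arguments are hypothetical names.

```python
def filter_by_tier(item, tier, is_spam, is_relevant):
    """Apply the checks each tier calls for.

    Tier 1: pass through. Tier 2: relevancy only. Tier 3: spam, then relevancy.
    """
    if tier == 1:
        return True
    if tier == 2:
        return is_relevant(item)
    if tier == 3:
        return (not is_spam(item)) and is_relevant(item)
    raise ValueError(f"unknown tier: {tier}")


def build_feed(sources, is_spam, is_relevant):
    """sources is a list of (tier, items) pairs; returns the kept items in order."""
    kept = []
    for tier, items in sources:
        kept.extend(
            item for item in items
            if filter_by_tier(item, tier, is_spam, is_relevant)
        )
    return kept
```

Passing the spam and relevancy checks in as functions keeps the tier logic separate from the keyword lists, which are the part that changes every day.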

Now that I write it all down, it sounds exactly like writing email filters or SIEM tuning or any one of a bazillion uses that you could have for filtering, so I’ve once again recreated ideas that already exist. Of course, I probably could have saved some time by approaching the problem from this angle, but really I had to move the ideas around a dozen different ways until it fit in a way that made sense.

The funny thing is that I had the hardest time filtering on privacy. I was getting too much junk off the blog search feeds (privacy of timeshares, that kind of thing), so what I’m playing with is killing privacy from the main filter and then filtering the search feeds on privacy and a second keyword.
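In other words, "privacy" alone no longer qualifies an item from the search feeds; it has to show up alongside a second term. A small Python sketch of that rule, where the companion terms are made-up examples (I haven't listed the real ones here) and `privacy_relevant` is a hypothetical name:

```python
# Hypothetical companion terms, for illustration only.
PRIVACY_COMPANIONS = ("security", "encryption", "breach")


def privacy_relevant(text, companions=PRIVACY_COMPANIONS):
    """True only if 'privacy' appears together with at least one companion term."""
    lowered = text.lower()
    return "privacy" in lowered and any(term in lowered for term in companions)
```

A timeshare listing that mentions its privacy policy fails the second test and drops out, while a post about a data breach and privacy gets through.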

The usual disclaimers work here: I’m playing with content provided by other people, so I don’t even remotely pretend to have any control over it. There are a couple pieces of junk that will slip through the filters. Because the source of the filters is open for the world to see, you can cheat them by including the right words.

Posted in Odds-n-Sods, Technical | 7 Comments »

Being a PR Wonk is Hard

Posted July 3rd, 2007 by rybolov

So I’ve gotten ISM-Community a wee little bit of press over the past week.

Dark Reading

IT Backbones Security

About the best advice I’ve gotten on PR stuff was from Paul Graham’s essay The Submarine:

“A good flatterer doesn’t lie, but tells his victim selective truths (what a nice color your eyes are). Good PR firms use the same strategy: they give reporters stories that are true, but whose truth favors their clients.”

In other words, I wrote our press release so that it was easy to cut and paste the sections that a reporter would want. Instead of giving them a list of facts, I gave them a modular story with some good quotes that could be cut off whenever they wanted.

Anyway, you’re hearing about me being a PR wonk because that’s about all I have time for right now. I can’t talk work-related stuff because for the next couple of weeks it’s all stuff that nobody needs to know about–covert missions and whatnot. =)

The Guerilla CISO
