Building A Feed Reader #2: Understanding The Problem(s)

Welcome back to this series where I share my experiences in building a feed reader.

To keep you interested, and to prove that I’m not just talking, I’ll first show you where I’ve got to so far and then move on to the meat and potatoes.

Progress As Of Today

Grouping

Feed Cruncher can now group articles by day and by month, which you toggle with buttons on the menu bar.

[Image: grouping]

Grouping By Day

This groups all items by day, so you can see at a glance all articles from a particular day. This is good for blogs that have lots of daily posts. And believe me there are many!

[Image: GroupingDay]

Grouping By Month

This groups all items by month, so you can see at a glance all articles for a particular month. This is good for blogs that post less frequently.

[Image: GroupingMonth]
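The two grouping modes boil down to choosing a different key for each article. Here is a minimal sketch of the idea in Python (not the .NET code Feed Cruncher actually uses); the sample articles and the `group_articles` helper are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample data: (title, published) pairs standing in for feed items.
ARTICLES = [
    ("Post A", datetime(2008, 9, 1, 9, 30)),
    ("Post B", datetime(2008, 9, 1, 17, 5)),
    ("Post C", datetime(2008, 9, 14, 8, 0)),
    ("Post D", datetime(2008, 10, 2, 12, 0)),
]


def group_articles(articles, by="day"):
    """Group (title, published) pairs under a day key or a month key."""
    if by == "day":
        fmt = "%Y-%m-%d"
    elif by == "month":
        fmt = "%Y-%m"
    else:
        raise ValueError("by must be 'day' or 'month'")
    groups = defaultdict(list)
    for title, published in articles:
        # The formatted date is the group key; titles keep their input order.
        groups[published.strftime(fmt)].append(title)
    return dict(groups)


by_day = group_articles(ARTICLES, by="day")
by_month = group_articles(ARTICLES, by="month")
```

Toggling between the two views then only means calling the same function with a different `by` value and redrawing the list.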

Settings

The application can now persist settings.

[Image: Settings1]

[Image: Settings2]

Lesson For Today: Understanding The Problem

So, as I mentioned yesterday, I found myself bemoaning the inability to find a good reader. So I decided to write my own.

Deciding to write your own does not necessarily mean starting from scratch with File -> New Project. There must be some readers with source available. As it turns out there are two that I know of: RSS Bandit and RSSOwl.

RSSOwl I dismissed as a non-starter because it is written in Java. My last brush with Java was many years ago, and describing myself as rusty would be flattering myself.

Which leaves RSS Bandit. I used to use it a while back, until the horrifying feature creep that Jeff Atwood speaks so eloquently about set in. Newsgator? NNTP? It also uses Infragistics for its UI, which is yet another huge library: a bit much if all you need is a listview and a treeview. And every time I ran it, it seemed to fool itself into thinking it was SQL Server 2008 and began taking all the memory it could get its hands on!

OK, OK, those are lame excuses. RSS Bandit is excellent software, but it does not do some of the things that I consider crucial. I also don’t entirely agree with some of the design decisions that were taken. Eventually I decided to start off with a clean slate, make all the choices and design decisions myself and build the feed reader!

So, that is the problem.

Build A Feed Reader

From experience and education, and lots of reading, I have come round to the school of thought that all big problems (including Wars In Iraq, Inflation, Reality TV, etc) are actually composed of many smaller problems.

Solve these small problems and the big ones automatically get solved.

So what are the small problems in building a feed reader?

Problem Definition

  1. Knowing where feeds can be found, or discovering new feeds to subscribe to
  2. Storing these subscriptions, and allowing user to manage them
  3. Connecting to remote websites with subscribed RSS feeds
  4. Downloading RSS feeds from the Internet
  5. Interpreting these feeds
  6. Storing these feeds
  7. Displaying these feeds to the user
  8. Allowing user to mark as read, tag and otherwise manipulate these items
  9. Storing these changes
  10. Updating feeds
  11. Allowing user to search all these
  12. Other features e.g. search, export, etc

Looking closely at the problem definition, we can tentatively abstract the following as the core functionality:

  1. Download Feeds (Connectivity & Interpretation)
  2. Store them (Storage)
  3. Display to the user (Interface)

Download Feeds (Connectivity & Interpretation)

Here we will need to connect to a website, download its RSS feed and then interpret it. Why interpret? Because there are several formats:

  • Atom
  • RSS
  • OpenSearch
  • RSD

At first I considered using a WebClient to download the data, plus a lot of XML and regular expressions to identify and parse the feeds. Then I stumbled upon the Argotic Syndication Framework. Brilliant does not begin to describe it. It does exactly what I wanted and then some.

In the course of using it I have found some quirks and some bugs but the guys get 11 points out of 10 for the excellent job. A third of my battle is won!
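To make the "why interpret?" point concrete, here is a rough sketch of the kind of work a syndication library takes off your hands: looking at the root element to tell RSS from Atom, then pulling the same fields out of two different layouts. It is written in Python with the standard library, not with Argotic, and it covers only RSS 2.0 and Atom 1.0. The sample documents and the `parse_feed` helper are invented for illustration, and the download step is left out so the example runs offline.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical, heavily trimmed sample documents.
RSS_SAMPLE = """<rss version="2.0"><channel><title>Example RSS</title>
<item><title>First</title><link>http://example.com/1</link></item>
</channel></rss>"""

ATOM_SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"><title>Example Atom</title>
<entry><title>First</title><link href="http://example.com/1"/></entry>
</feed>"""


def parse_feed(xml_text):
    """Return the format, feed title and items of an RSS 2.0 or Atom 1.0 document."""
    root = ET.fromstring(xml_text)

    if root.tag == "rss":
        # RSS: items live under <channel>, and the link is element text.
        channel = root.find("channel")
        if channel is None:
            raise ValueError("RSS document has no <channel> element")
        items = [
            {"title": item.findtext("title"), "link": item.findtext("link")}
            for item in channel.findall("item")
        ]
        return {"format": "rss", "title": channel.findtext("title"), "items": items}

    if root.tag == ATOM_NS + "feed":
        # Atom: namespaced elements, and the link is an href attribute.
        items = []
        for entry in root.findall(ATOM_NS + "entry"):
            link = entry.find(ATOM_NS + "link")
            items.append({
                "title": entry.findtext(ATOM_NS + "title"),
                "link": link.get("href") if link is not None else None,
            })
        return {"format": "atom", "title": root.findtext(ATOM_NS + "title"), "items": items}

    raise ValueError("unrecognised feed root element: " + root.tag)


rss = parse_feed(RSS_SAMPLE)
atom = parse_feed(ATOM_SAMPLE)
```

Both calls return the same shape of dictionary, which is the whole point: the rest of the reader should not care which format a feed arrived in. Real feeds add dates, authors, content encodings and plenty of malformed XML, which is where a proper library earns its keep.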

Store Them (Storage)

The next problem was storage. Basically there are three options:

  1. Store the actual XML on disk
  2. Store the actual XML in a database
  3. Interpret the feed and split it into its constituent parts and store those in the database

Storage on disk is the least attractive option for a couple of reasons:

  1. There will eventually be a performance hit after getting a lot of files on disk to process one after another
  2. Search will be very difficult, if not impossible
  3. It is bound to be a wasteful strategy because if an entry is added you download the old entries all over again
  4. Some other reason I can’t remember at this point

Which leaves storage in a database. Which opens another can of worms … which database to use?

To be lightweight, clearly we need something … well, light. That knocks things like SQL Server Express out of the running. What is lighter than that? Microsoft Jet, which leaves me very uncomfortable: I’ve never fully trusted Jet after getting my fingers burnt several times. Is there something even lighter that can run in process?

Turns out there are several: SQL Server Compact Edition, VistaDB and CodeGear Blackfish. Since I am fiscally constrained, VistaDB and Blackfish need not apply, so SQL Server Compact Edition it is. I would have preferred either of the other two because they support a richer SQL dialect, triggers, stored procedures and views. There’s also SQLite, but this is as good an exercise as any to try out SQL CE before I deploy it for some real projects that are coming up.

Reading some of the white papers about SQL Server Compact Edition fills me with admiration. Some people in Microsoft are truly masters of spin. It is only a brave man who will attempt to explain how a lack of triggers, views and procedures is a good thing! Maybe it’s just me, but I don’t find the arguments for leaving these things out convincing in the least.

But I digress. Having settled on that and done some research, I found that, in addition to what’s missing, it actually stores XML as NText, which knocks out option 2.

Which means I must interpret the RSS feeds, extract the content and then store it, which on reflection sounds like a pretty solid solution. Why? I can search quickly using normal database expressions. I can sort on various fields (date created, author, etc.). And it won’t be too hard to generate XML should I need to.
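Here is a small sketch of what option 3 could look like. It uses Python's built-in `sqlite3` module purely as a stand-in, since SQL Server Compact Edition is not available from Python's standard library; the table layout, the column names and the helper functions are all assumptions made for illustration, not the schema Feed Cruncher uses.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store with one row per feed item."""
    conn = sqlite3.connect(path)
    # published is kept as an ISO 8601 string so text order matches date order.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS items (
            id        INTEGER PRIMARY KEY,
            feed_url  TEXT NOT NULL,
            guid      TEXT NOT NULL,
            title     TEXT,
            author    TEXT,
            published TEXT,
            content   TEXT,
            UNIQUE (feed_url, guid)
        )""")
    return conn


def save_items(conn, feed_url, items):
    """Insert items, skipping ones already stored; return how many were new."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO items "
        "(feed_url, guid, title, author, published, content) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        [(feed_url, i["guid"], i["title"], i["author"], i["published"], i["content"])
         for i in items],
    )
    conn.commit()
    return conn.total_changes - before


def search(conn, text):
    """Return titles of items whose title or content contains text, newest first."""
    pattern = "%" + text + "%"
    rows = conn.execute(
        "SELECT title FROM items WHERE title LIKE ? OR content LIKE ? "
        "ORDER BY published DESC",
        (pattern, pattern),
    )
    return [title for (title,) in rows]


# Hypothetical sample items, as they might come out of the parsing step.
SAMPLE = [
    {"guid": "1", "title": "Hello feeds", "author": "Ann",
     "published": "2008-09-01T09:30:00", "content": "First post"},
    {"guid": "2", "title": "Storage notes", "author": "Bob",
     "published": "2008-09-14T08:00:00", "content": "Feeds in a database"},
]

conn = open_store()
first_run = save_items(conn, "http://example.com/rss", SAMPLE)
second_run = save_items(conn, "http://example.com/rss", SAMPLE)
matches = search(conn, "feeds")
```

The unique constraint on feed URL plus item identifier is what deals with the "download the old entries all over again" complaint: saving the same download twice adds nothing the second time. Searching and sorting become ordinary queries.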

Display To The User (Interface)

This is the final part … displaying to the user.

Since I want the application to be desktop based, Windows Forms is the obvious choice. I could do WPF, but I’m pressed for time and WPF might be overkill for this. Displaying the feeds ought to be simple enough. All we need is to load the feeds from the database at startup and populate the tree view on the left. Then, when you click on a feed, populate the listview on the right. Then, when you click on an item, display its content in the WebBrowser pane below.
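Stripped of the actual controls, that flow is just three pieces of state that update each other. The following is a GUI-free sketch in Python, not Windows Forms code; the `ReaderModel` class and its sample data are hypothetical, and the attributes merely stand in for the tree view, the listview and the WebBrowser pane.

```python
class ReaderModel:
    """GUI-free model of the three panes: feed tree, item list, content view."""

    def __init__(self, feeds):
        # feeds maps a feed title to a list of (item title, html content) pairs,
        # as they might be loaded from the database at startup.
        self.feeds = feeds
        self.tree = sorted(feeds)    # left pane: one node per feed
        self.selected_feed = None
        self.item_list = []          # right pane: item titles of the chosen feed
        self.content = ""            # bottom pane: content of the chosen item

    def select_feed(self, feed_title):
        """Clicking a feed fills the item list and clears the content pane."""
        self.selected_feed = feed_title
        self.item_list = [title for title, _ in self.feeds[feed_title]]
        self.content = ""

    def select_item(self, index):
        """Clicking an item shows its content."""
        if self.selected_feed is None:
            raise RuntimeError("select a feed before selecting an item")
        self.content = self.feeds[self.selected_feed][index][1]


# Hypothetical sample data.
model = ReaderModel({
    "Blog B": [("Second look", "<p>More thoughts</p>")],
    "Blog A": [("Hello", "<p>Hi there</p>"), ("Again", "<p>Back again</p>")],
})
model.select_feed("Blog A")
model.select_item(1)
```

Keeping this logic apart from the controls means the event handlers in the real UI only have to call `select_feed` and `select_item` and then repaint.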

Here is a mockup:

[Image: mockup]

This, as it turns out, despite sounding simple, is a bit more complicated than I thought.

Stay tuned for Episode 3: Downloading Feeds


One response


  1. I really enjoy this series! At the moment i’m using rss.net (from www.rssdotnet.com) to consume feeds, but after what i’ve seen from Argotic … i will switch. :-)

    I also think about storing the feeds into a database. I thought about MS-SQL Express too, but what worries me is the 4 GB storage limit (not to mention: performance). If you use ADO.net it shouldn’t be difficult to use another database, as long as there’s a .net connector. Have you thought of using MySQL?
