
A User Interface for Wikipedia Search

Paul Nelson
Innovation Lead

Part 4 of 4

Okay, we have acquired the data, cleaned it up, indexed it, and the search results are high quality. All we need now is a sexy interface to package it all up.

While Search Technologies does have some really top-notch end-user interface programmers, it’s not my personal strong suit. So I started with a simple XSLT interface which transforms the results into a better-styled results page. This was nice, but it lacked any sex appeal and had no features to speak of. Then I considered JavaServer Faces, or Velocity templates, or jQuery (reading the results in JSON directly from Amazon CloudSearch). Nothing felt right.

All along, what I really wanted was a Twigkit interface. After all, Twigkit interfaces are super easy to create, and they look fabulous. And they handle all of those annoyingly difficult search features, like navigators and paging.

Going the Extra Mile #14: An Amazon CloudSearch Platform for Twigkit
The only problem was that Twigkit didn’t have a Platform implementation for Amazon CloudSearch – it’s just too new. And so I would have to create one myself.

As it turned out, creating a Twigkit platform was not too difficult – once I figured out how some of the foundational technologies (JSP tag libraries and Google Guice) were put together. Twigkit platforms are built on three primary classes:

  • Platform – Actually executes the search
  • QueryAdapter – Translates Twigkit query objects into Amazon CloudSearch query objects
    • In other words: Query = Twigkit to CloudSearch
  • ResponseAdapter – Translates Amazon CloudSearch results into Twigkit results objects
    • In other words: Results = CloudSearch to Twigkit
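
The division of labor among these three roles can be sketched as follows. This is a minimal illustration of the adapter pattern only, not the actual Twigkit platform code: every type name here (UiQuery, EngineQuery, SearchBackend, SketchPlatform, and so on) is a hypothetical stand-in, and none of them are real Twigkit or CloudSearch classes. It uses Java records, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the UI-side and engine-side types.
// None of these are real Twigkit or CloudSearch classes.
record UiQuery(String text, int page, int pageSize) {}
record EngineQuery(String q, int start, int size) {}
record EngineResponse(int found, List<String> titles) {}
record UiResponse(int total, List<String> resultTitles) {}

// Query = UI toolkit to search engine
interface QueryAdapter {
    EngineQuery adapt(UiQuery query);
}

// Results = search engine to UI toolkit
interface ResponseAdapter {
    UiResponse adapt(EngineResponse response);
}

// Whatever actually talks to the engine (HTTP in a real system, a fake in tests)
interface SearchBackend {
    EngineResponse execute(EngineQuery query);
}

final class PagingQueryAdapter implements QueryAdapter {
    @Override
    public EngineQuery adapt(UiQuery query) {
        // Convert a 1-based page number into a 0-based result offset.
        int start = (query.page() - 1) * query.pageSize();
        return new EngineQuery(query.text(), start, query.pageSize());
    }
}

final class SimpleResponseAdapter implements ResponseAdapter {
    @Override
    public UiResponse adapt(EngineResponse response) {
        // Copy the engine's hit titles into the UI-side result object.
        return new UiResponse(response.found(), new ArrayList<>(response.titles()));
    }
}

// Plays the "Platform" role: translate the query, execute it, translate the results.
final class SketchPlatform {
    private final QueryAdapter queryAdapter;
    private final ResponseAdapter responseAdapter;
    private final SearchBackend backend;

    SketchPlatform(QueryAdapter queryAdapter, ResponseAdapter responseAdapter, SearchBackend backend) {
        this.queryAdapter = queryAdapter;
        this.responseAdapter = responseAdapter;
        this.backend = backend;
    }

    UiResponse search(UiQuery query) {
        EngineQuery engineQuery = queryAdapter.adapt(query);
        EngineResponse raw = backend.execute(engineQuery);
        return responseAdapter.adapt(raw);
    }
}
```

Because the backend is just an interface in this sketch, the two adapters can be exercised with a canned response and no network access at all.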

Now anyone who wants to create a beautiful, easy interface to CloudSearch can do it with Twigkit. Give us a call and we’ll send you a copy of our Twigkit CloudSearch Platform jar.

Going the Extra Mile #15: A Java API for Amazon CloudSearch
Initially, I just converted Twigkit objects directly to/from the CloudSearch RESTful API. But that got to be a bit cumbersome, so eventually I decided to create an actual Java API for CloudSearch. Using this, anyone can easily work with CloudSearch from any Java program, and all of the communication details are taken care of for you.

Creating the API was not difficult. The three primary classes are:

  • CloudSearchQuery – Holds query parameters
    • Search string, paging parameters, filters, facets, fields, sorting, etc.
  • CloudSearchClient – Executes the query (forms the search command, sends it to CloudSearch, receives and parses the results)
  • CloudSearchResult – Holds search results
    • Includes fields, facets, facet values, facet counts, total documents found, etc.
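
To make the query-holding role concrete, here is a minimal sketch of a query object that collects parameters and renders them as a URL query string. It is not the API described above: the class name QuerySketch and its methods are invented for illustration, and the parameter names (q, start, size, return-fields, facet) are assumptions based on the CloudSearch search API of the time, so check them against the API reference before relying on them. Sending the request and parsing the response, which is the client's job, is left out.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical query holder; not the real CloudSearchQuery class.
final class QuerySketch {
    private final String searchString;
    private int start = 0;   // offset of the first hit to return
    private int size = 10;   // number of hits per page
    private final List<String> returnFields = new ArrayList<>();
    private final List<String> facets = new ArrayList<>();

    QuerySketch(String searchString) {
        this.searchString = searchString;
    }

    QuerySketch page(int start, int size) {
        this.start = start;
        this.size = size;
        return this;
    }

    QuerySketch returnField(String name) {
        returnFields.add(name);
        return this;
    }

    QuerySketch facet(String name) {
        facets.add(name);
        return this;
    }

    // Renders the parameters in insertion order, URL-encoding each value.
    // Parameter names are assumptions; verify them against the API reference.
    String toQueryString() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", searchString);
        params.put("start", Integer.toString(start));
        params.put("size", Integer.toString(size));
        if (!returnFields.isEmpty()) {
            params.put("return-fields", String.join(",", returnFields));
        }
        if (!facets.isEmpty()) {
            params.put("facet", String.join(",", facets));
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(e.getKey())
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }
}
```

A caller would chain the setters, for example `new QuerySketch("star wars").page(20, 10).facet("category")`, and hand the resulting string to whatever HTTP client performs the request.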

With the Java API, I was able to simplify the Twigkit Platform implementation. This made a clear separation between CloudSearch and Twigkit, improving the UI structure overall.

Give us a call if you’d like a copy of our Java API for Amazon CloudSearch.

Going the Extra Mile #16: User Interface Tweaking
Once the Java API and the Twigkit platform were ready to go, I plugged everything together and things started working. After that, it was just a matter of tweaking until I got exactly what I wanted. Mostly it was adjusting the CSS using Firebug.

The only part which required any additional thought was facet sorting. Turns out, I had to implement it as part of the CloudSearch Java API in order to get it to work. C’est la vie.
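
One plausible way to sort facets on the client side is shown below. The post does not say how the sorting was actually implemented, so this is only a sketch under assumptions: FacetValue and FacetSorter are hypothetical names, and ordering by descending count with an alphabetical tie-break is just one reasonable choice. It uses a Java record, so it assumes Java 16 or later.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical holder for one facet value and its document count.
record FacetValue(String value, long count) {}

final class FacetSorter {
    private FacetSorter() {}

    // Returns a new list ordered by count (highest first);
    // values with equal counts are ordered alphabetically.
    static List<FacetValue> byCountDescending(List<FacetValue> values) {
        List<FacetValue> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparingLong(FacetValue::count)
                .reversed()
                .thenComparing(FacetValue::value));
        return sorted;
    }

    // Returns a new list ordered alphabetically by facet value.
    static List<FacetValue> alphabetical(List<FacetValue> values) {
        List<FacetValue> sorted = new ArrayList<>(values);
        sorted.sort(Comparator.comparing(FacetValue::value));
        return sorted;
    }
}
```

Both methods copy the input before sorting, so the list parsed from the search response is left untouched and the UI can switch between orderings freely.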

And so, there you have it. A fully featured Wikipedia search application using a brand-new engine based in the cloud.

I was hoping for 26 “extra miles”, but only got to 16 – which I suppose means that this was more of a half-marathon than a marathon. But that’s okay, because there’s lots more we are planning to do with this Wikipedia search Lab.

All told, this probably represents about a month’s worth of work, spread out over three months of calendar time. Of course, a lot of it was adding new features to Aspire (streaming decompression) and creating new APIs for Amazon CloudSearch (the Java API, the Twigkit platform), plus simply researching and learning a brand-new search engine.

And it was all great fun.

-- Paul
