GSA Discontinued: Google, the Google Search Appliance, and Beyond
Key Criteria to Consider when Migrating from the GSA
“We want search to work just like Google.”
I often hear this from executives, either directly or through the enterprise architects who engage us to help with their search applications. Not all of these customers use the Google Search Appliance (GSA). Google's influence on enterprise search extends well beyond the GSA product. Of course, many of our customers did buy the GSA to implement search “just like Google,” and many GSA customers are happy with their systems.
Now GSA customers have been given an end-of-life schedule without much indication of what Google has in mind as a replacement. It’s a good time to reflect on what Google has contributed to enterprise search and to consider GSA alternatives to which current GSA owners can migrate.
The “Big Google” for Internet Search
Saying that search should work “just like Google” is not a meaningful statement of requirements for implementing search. It does, however, capture the expectations we have all developed by using “Big Google.”
Working like Google begins with full-text search of documents with a simple list of query terms.
Working like Google also implies getting back relevant and satisfying results, which is a very high standard in the enterprise world. What makes “Big Google” so compelling is its PageRank system, which orders results in large part by the number and importance of the pages that link to each relevant page. This is enormously effective for searching the World Wide Web, but it is not normally effective in the enterprise, where most documents have few inbound links to count.
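To make the link-counting idea concrete, here is a minimal sketch of the textbook PageRank formulation computed by power iteration. It is not Google's production ranking, which combines many other signals; the function name, the damping factor of 0.85, and the toy link graph are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over {page: [pages it links to]}."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small baseline share, whether or not anyone links to it.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy link graph: "home" is linked from every other page; nothing links to "blog".
web = {
    "home": ["products", "about"],
    "products": ["home"],
    "about": ["home"],
    "blog": ["home", "products"],
}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(f"{page:10s} {score:.3f}")
```

Note that "blog" never rises above the baseline of (1 - damping) / n because nothing links to it. An intranet made up mostly of such pages gives this kind of ranking very little to work with.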
With “Big Google,” you just type an arbitrary term or terms and get back great results. And that is what people think about when search works “just like Google.”
Beyond those two elements, there are a few other features that people use on “Big Google,” though they are not likely to identify them when asked what searching “just like Google” means.
One of “Big Google’s” great features is query completion, also referred to as query type-ahead, autocomplete, and other names. With every letter you type, complete queries are listed in a drop-down menu. What is neat about Google autocomplete is that it may include your own past queries as well as other queries, presumably drawn from some universe of popular queries. We’ve all pasted an error message into the search box and watched Google autocomplete a query that exactly matches it, which presumably means that many others have the same issue. Whether the query returns a solution to your problem is another matter.
Another widely recognized feature is query spell check. Google takes its best guess at the word you are trying to use, returns results for that guess, and offers the option to re-run the query with your original spelling. It works well because it is right most of the time. Occasionally you have to tell it that your original spelling is what you meant, but this is far less frequent than silently being thankful that it knows how to spell something you don’t.
Google also shows links on the results page that lead you to images, videos, shopping, news, and other categories. This feature is similar to, but not exactly like, the facet navigators used in enterprise search.
Finally, Google expands queries: it matches different forms of the same word, and occasionally finds documents that contain only synonyms of the original query terms rather than the terms themselves.
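A minimal sketch of popularity-based query completion, assuming a plain list of previously logged queries as the only input. The class and method names are hypothetical, and personalization (mixing in the user's own queries) is left out.

```python
from collections import Counter


def normalize(text):
    return " ".join(text.lower().split())


class QueryCompleter:
    """Suggest completions for a typed prefix, most frequently logged queries first."""

    def __init__(self, query_log):
        self.counts = Counter(normalize(q) for q in query_log)

    def suggest(self, prefix, limit=5):
        prefix = normalize(prefix)
        # A linear scan keeps the sketch short; a trie or sorted list scales better.
        matches = [q for q in self.counts if q.startswith(prefix)]
        matches.sort(key=lambda q: (-self.counts[q], q))  # popularity, then alphabetical
        return matches[:limit]


log = ["vpn setup", "VPN setup", "vpn error 809", "vacation policy", "vpn setup mac"]
completer = QueryCompleter(log)
print(completer.suggest("vp"))
```

Because queries are normalized before counting, "VPN setup" and "vpn setup" pool their popularity, which is what pushes that query to the top of the list.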
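One common way to implement "did you mean" is to pick the indexed term with the smallest edit distance to the typed word, preferring frequent terms on ties. The sketch below assumes a hypothetical dictionary of term frequencies taken from the index; it illustrates the technique and makes no claim about how Google's spell checker works.

```python
def edit_distance(a, b):
    """Levenshtein distance: the fewest single-letter inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when letters match)
            ))
        prev = curr
    return prev[-1]


def suggest_spelling(word, vocabulary, max_distance=2):
    """Return the closest indexed term (most frequent on ties), or the word unchanged."""
    word = word.lower()
    if word in vocabulary or not vocabulary:
        return word
    distance, _, best = min((edit_distance(word, term), -freq, term)
                            for term, freq in vocabulary.items())
    return best if distance <= max_distance else word


# Hypothetical term frequencies taken from the index.
vocabulary = {"appliance": 120, "application": 300, "migration": 80, "security": 95}
for typed in ["aplliance", "securty", "kubernetes"]:
    print(typed, "->", suggest_spelling(typed, vocabulary))
```

A search UI would run the query with the suggestion and keep the original word available for the "search instead for" link.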
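The sketch below shows the general idea of query expansion: reduce words to a crude stem so different forms match, and let a synonym list add alternatives. The suffix-stripping stemmer and the synonym table are hypothetical stand-ins; production engines use proper stemmers (such as Porter or Snowball) and curated synonym lists.

```python
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common English suffix; a stand-in for a real stemmer such as Porter's."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(query, synonyms):
    """For each query term, the set of stems that should count as a match."""
    expanded = []
    for term in query.lower().split():
        stem = crude_stem(term)
        alternatives = {crude_stem(s) for s in synonyms.get(stem, ())}
        expanded.append({stem} | alternatives)
    return expanded


def matches(document, expanded):
    """True if every query term, in some form or as a synonym, occurs in the document."""
    doc_stems = {crude_stem(w) for w in document.lower().split()}
    return all(stems & doc_stems for stems in expanded)


# Hypothetical synonym list keyed by stem.
synonyms = {"car": ["automobile"]}
query = expand_query("migrating cars", synonyms)
print(query)
print(matches("We migrated the automobile inventory", query))
print(matches("Car rental rates", query))
```

The first document matches even though it contains neither "migrating" nor "cars": "migrated" shares a stem with "migrating", and "automobile" is a listed synonym of "car".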
The Google Search Appliance vs. Google Internet Search (“Big Google”)
One of Google’s brilliant insights was that search is usually not the primary purpose of the applications that need it, so organizations want it to work with as little effort as possible. The most common GSA applications in the enterprise support website search or intranet search. The next most common is replacing the search function in mission-critical enterprise applications whose built-in search simply isn’t doing the job expected of it. This insight led to the search appliance: an integrated hardware and software platform in a single package that the customer could simply ‘plug and play’ with a minimum of configuration and maintenance.
The GSA has many of the same features as “Big Google,” or features similar to them. It began with Google’s well-known single search box for entering arbitrary query terms used to search the full text of documents or web pages. The GSA generally produced (and still produces) very good search results out of the box, without much tweaking of ranking methods. The following GSA features are essential to a good search experience; they are also among the key criteria for evaluating a replacement for the GSA.
Plug and Play
The GSA can also be used as a standalone search application. It ships with a browser-based search front end that offers a simple search box and results that look like “Big Google.” Because this is the GSA's default configuration, it was (and is) entirely possible to install the GSA, do minimal configuration to crawl a website, and end up with a website search feature that has Google’s familiar search box and results that look exactly like Google results, down to the color scheme and the Google logo. In less than an hour.
In general, the GSA produces good results out of the box without much tweaking of relevance controls. Some search applications do benefit from adjusting relevance controls. However, tweaking the GSA’s relevance controls, as with other proprietary search engines, generally does not have a substantial effect on search results, whereas tuning relevancy in open source search engines can often produce much better results.
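As an illustration of what relevance tuning looks like in an open source engine, here is a minimal sketch that builds an Apache Solr request using the eDisMax query parser. The parameters `defType`, `qf`, `pf`, and `bq` are standard eDisMax options; the core name `documents`, the fields `title`, `body`, and `last_modified`, and the boost values are hypothetical and would need tuning against real queries.

```python
from urllib.parse import urlencode


def build_solr_query(user_query, base_url="http://localhost:8983/solr/documents/select"):
    """Build a Solr eDisMax request with explicit relevance controls."""
    params = {
        "q": user_query,
        "defType": "edismax",
        # Title matches weigh three times as much as body matches.
        "qf": "title^3 body",
        # Extra credit when the query terms appear together as a phrase in the title.
        "pf": "title^5",
        # Boost documents modified within the last year.
        "bq": "last_modified:[NOW-1YEAR TO NOW]^2",
        "rows": 10,
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


print(build_solr_query("expense policy"))
```

Each of these knobs is visible and adjustable per request, which is one reason tuning an open source engine tends to have a more noticeable effect.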
User interface (UI)
The look and feel of the default UI and the rendering of the result set are very simple. The GSA natively returns search results in XML, with an option to transform the XML into HTML with an XSLT stylesheet. It includes simple tools for customizing many look-and-feel elements: check boxes for some items and a primitive WYSIWYG tool for altering the layout of the search results page. Of course, you can edit the XSLT directly to change the color scheme and most other elements of the search UI. The ease of creating a custom UI has great appeal for many GSA customers.
Beyond these important elements, however, it took Google time to bring the GSA up to the level of other full-featured search engines. Google developed a strong open source connector framework and supplied connectors for major content management systems, file systems, and databases. The framework has a well-documented API that customers (and third parties) have used to develop their own custom connectors. The GSA also has a feed mechanism for pushing content into the GSA, either with simple scripts or with feeds that originate from the content sources themselves. This is often the best way to integrate locally developed custom content systems.
The GSA later added a “dynamic navigation” feature: easy-to-configure facets based on the metadata available in the content.
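The split between raw XML results and XSLT rendering can be sketched as follows. The request parameters (`q`, `site`, `client`, `output=xml_no_dtd`) follow the GSA search protocol as we understand it, and the sample response is hand-written in the shape of the GSA's GSP format, trimmed to the `R`, `U`, `T`, and `S` elements. The host name and collection names are placeholders, and `render_html` is only a rough stand-in for what a results stylesheet does.

```python
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import urlencode


def gsa_search_url(host, query, collection="default_collection", frontend="default_frontend"):
    """Build a search request that asks the GSA for raw XML instead of XSLT-rendered HTML."""
    params = {"q": query, "site": collection, "client": frontend, "output": "xml_no_dtd"}
    return f"http://{host}/search?{urlencode(params)}"


def parse_results(xml_text):
    """Pull (url, title, snippet) out of each R element in a GSP response."""
    root = ET.fromstring(xml_text)
    return [(r.findtext("U", ""), r.findtext("T", ""), r.findtext("S", ""))
            for r in root.iter("R")]


def render_html(results):
    """Roughly what a results stylesheet does: turn each result into an HTML list item."""
    items = "".join(
        f'<li><a href="{escape(url)}">{escape(title)}</a><p>{escape(snippet)}</p></li>'
        for url, title, snippet in results
    )
    return f"<ol>{items}</ol>"


# Hand-written sample in the shape of a GSP response (trimmed to the elements used here).
SAMPLE = """<GSP VER="3.2"><Q>expense policy</Q><RES SN="1" EN="2"><M>2</M>
<R N="1"><U>http://intranet.example.com/hr/expenses.html</U><T>Expense Policy</T>
<S>How to file an expense report</S></R>
<R N="2"><U>http://intranet.example.com/finance/travel.html</U><T>Travel &amp; Expenses</T>
<S>Booking rules for business travel</S></R></RES></GSP>"""

print(gsa_search_url("gsa.example.com", "expense policy"))
print(render_html(parse_results(SAMPLE)))
```

Real responses carry many more elements, and titles and snippets can contain highlighting markup; this sketch simply escapes everything.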
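For locally developed content systems, a feed is often just a script that exports records as XML. The sketch below builds a simplified content feed modeled on the GSA feed format (`gsafeed`, `header`, `datasource`, `feedtype`, `group`, `record`, `content`). The datasource name and records are hypothetical, and the exact schema, plus the endpoint and form fields used to post the feed, should be checked against Google's Feeds Protocol documentation.

```python
import xml.etree.ElementTree as ET


def build_gsa_feed(datasource, records, feedtype="incremental"):
    """Build a simplified GSA content feed: one record per document, content inline."""
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    # "incremental" adds/updates the listed records; "full" replaces the whole datasource.
    ET.SubElement(header, "feedtype").text = feedtype
    group = ET.SubElement(root, "group")
    for rec in records:
        record = ET.SubElement(group, "record", url=rec["url"], mimetype="text/plain")
        ET.SubElement(record, "content").text = rec["content"]
    body = ET.tostring(root, encoding="unicode")
    doctype = '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">'
    return f'<?xml version="1.0" encoding="UTF-8"?>\n{doctype}\n{body}'


# Hypothetical records exported by a home-grown content system.
records = [
    {"url": "http://wiki.example.com/page/1", "content": "Holiday schedule for 2016"},
    {"url": "http://wiki.example.com/page/2", "content": "Parking & badge policy"},
]
print(build_gsa_feed("hr_wiki", records))
```

Building the XML with ElementTree rather than string concatenation takes care of escaping characters such as "&" in the content.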
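Facet navigation of this kind boils down to counting metadata values across the current result set and filtering when the user clicks one. A minimal sketch, assuming results arrive as dictionaries with hypothetical `department` and `doctype` metadata fields; it does not reproduce the GSA's own implementation.

```python
from collections import Counter


def facet_counts(results, fields):
    """Count metadata values per field across a result set, as a facet navigator displays them."""
    facets = {field: Counter() for field in fields}
    for doc in results:
        for field in fields:
            value = doc.get(field)
            if value is None:
                continue  # documents without this metadata simply don't contribute
            # Multi-valued metadata counts once per value.
            facets[field].update(value if isinstance(value, list) else [value])
    return {field: counts.most_common() for field, counts in facets.items()}


def drill_down(results, field, value):
    """Keep only results whose metadata includes the selected value (a click on a facet)."""
    def has_value(doc):
        current = doc.get(field)
        return value in current if isinstance(current, list) else current == value
    return [doc for doc in results if has_value(doc)]


results = [
    {"title": "Expense policy", "department": "Finance", "doctype": "pdf"},
    {"title": "Travel rules", "department": "Finance", "doctype": "html"},
    {"title": "Onboarding checklist", "department": "HR", "doctype": ["pdf", "docx"]},
]
print(facet_counts(results, ["department", "doctype"]))
print([d["title"] for d in drill_down(results, "doctype", "pdf")])
```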
Perhaps most interesting is a top-notch security manager that supports many of the protocols and methods found in most enterprises, both for user authentication and for filtering results. The GSA’s security features are probably the most advanced of any leading search engine.
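Results filtering usually means document-level security: each indexed document carries the users and groups allowed (or denied) access, and results are trimmed to the current user's principals. The sketch below shows only this "early binding" idea with hypothetical data; a real security manager also authenticates the user, resolves group membership from a directory, and may check some documents against the source system at query time.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    url: str
    allow: set[str] = field(default_factory=set)  # users/groups granted read access
    deny: set[str] = field(default_factory=set)   # explicit denies override any allow


def principals_for(user, group_membership):
    """The user's own ID plus every group the user belongs to."""
    return {user} | group_membership.get(user, set())


def filter_results(results, user, group_membership):
    """Keep documents where some principal is allowed and none is denied."""
    principals = principals_for(user, group_membership)
    return [doc for doc in results
            if principals & doc.allow and not principals & doc.deny]


# Hypothetical group data; in practice this would come from the directory (e.g. LDAP/AD).
groups = {"alice": {"finance", "everyone"}, "bob": {"engineering", "everyone"}}
docs = [
    Document("budget-2016.xlsx", allow={"finance"}),
    Document("intranet/home", allow={"everyone"}),
    Document("salaries.xlsx", allow={"finance"}, deny={"alice"}),
]
print([d.url for d in filter_results(docs, "alice", groups)])
print([d.url for d in filter_results(docs, "bob", groups)])
```

Alice is in the finance group but is explicitly denied the salary file, so a deny on any of her principals wins over the group-level allow.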
Analytics collects and reports on user interactions within an application context. Search is an important source of analytics because search queries often reflect user intent much more clearly than other analytics data. For website search, the value is even greater: it is a truism that most users search a website's content with “Big Google” rather than with the site's own search function. Google Analytics helps compare local site search performance (and user search satisfaction) with searches via Google.
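Even without a dedicated analytics product, a few basic site-search metrics can be computed from a search log. The sketch assumes a hypothetical log of `(query, hit_count, clicked)` events and does not use the Google Analytics API.

```python
from collections import Counter


def summarize_search_log(events):
    """Summarize (query, hit_count, clicked) events into basic site-search metrics."""
    searches, zero_hits = Counter(), Counter()
    clicked_searches = 0
    for query, hit_count, clicked in events:
        q = " ".join(query.lower().split())  # treat "VPN  setup" and "vpn setup" alike
        searches[q] += 1
        if clicked:
            clicked_searches += 1
        if hit_count == 0:
            zero_hits[q] += 1
    total = sum(searches.values())
    return {
        "searches": total,
        # Share of searches followed by a click on some result.
        "click_through_rate": clicked_searches / total if total else 0.0,
        "top_queries": searches.most_common(3),
        # Queries that found nothing: candidates for synonyms or missing content.
        "zero_result_queries": zero_hits.most_common(3),
    }


events = [
    ("vpn setup", 12, True),
    ("VPN  setup", 12, False),
    ("expense form", 4, True),
    ("gsa migration", 0, False),
]
print(summarize_search_log(events))
```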
GSA Discontinued and Key Features of a Successful GSA Migration
Google has announced that it is phasing out the GSA. Because we support over 200 GSA customers, we have been thinking specifically about the best approaches they can take for their future search strategy. Migrating a search application to a new technology is also the ideal time to evaluate how effective the current search is.
In my experience, even the most sophisticated search applications rely on the popular GSA features discussed above, including those that make the GSA easy to use and maintain. So the bare minimum of features required for a successful Google Search Appliance migration includes:
- Security - The new system must honor the security model of each and every content source, producing exactly the same results that the GSA provides. Enterprise security models typically support document-level security, which guarantees that search users see only the documents they are entitled to see.
- Search relevancy - The new search engine must give relevant and satisfying results, on par with or better than the current GSA application, with a minimum of tweaking.
- User search features – These would include query completion and spell check.
- Search results presentation - Many customers develop their search application around the XSLT rendering of search results. So migrations need to support and maintain compatibility with those XSLT stylesheets that customers have modified. The search results page should also include a replacement for the GSA's dynamic navigation.
- Crawling - Websites and other relatively simple content sources should be as easy to configure and crawl as they are with Google’s web crawler.
- Connector integration - The new system should be able to integrate currently available and supported connectors for most off-the-shelf content management solutions.
- Ability to build custom connectors - The new system should have an extensible connector framework for building custom connectors.
- Analytics and reporting - The new system should support Google Analytics, and ideally include search user identification and click tracking through logging.
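One way to meet the XSLT compatibility requirement above is a small adapter that wraps results from the new engine in GSA-style XML, so existing stylesheets can keep working with minimal changes. This is a sketch under the assumption that the stylesheets rely only on a small subset of the GSP format (`Q`, `RES`, `M`, `R`, `U`, `T`, `S`); real stylesheets often reference more elements, and the hit dictionaries are hypothetical.

```python
import xml.etree.ElementTree as ET


def to_gsp_xml(query, hits, total, start=1):
    """Wrap hits from a new engine in a minimal GSP-style envelope for an existing XSLT."""
    gsp = ET.Element("GSP", VER="3.2")
    ET.SubElement(gsp, "Q").text = query
    if hits:  # no RES element when there are no hits (assumed to match GSA behavior)
        end = start + len(hits) - 1
        res = ET.SubElement(gsp, "RES", SN=str(start), EN=str(end))
        ET.SubElement(res, "M").text = str(total)  # estimated total matches
        for number, hit in enumerate(hits, start=start):
            r = ET.SubElement(res, "R", N=str(number))
            ET.SubElement(r, "U").text = hit["url"]
            ET.SubElement(r, "T").text = hit["title"]
            ET.SubElement(r, "S").text = hit.get("snippet", "")
    return ET.tostring(gsp, encoding="unicode")


hits = [{"url": "http://intranet.example.com/it/vpn.html",
         "title": "VPN setup guide", "snippet": "Connecting from home"}]
print(to_gsp_xml("vpn", hits, total=1))
```

The resulting XML can then be passed to the customer's existing stylesheet with any XSLT 1.0 processor; gaps show up quickly as elements the stylesheet expects but the adapter does not yet produce.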
Search Technologies is well-prepared to help companies migrate their GSA to a new search platform. In addition to the 200 GSA customers we’re supporting, we also have extensive expertise in implementing and supporting many search and analytics platforms, including open source (Solr and Elasticsearch) and commercial (Microsoft Azure, Amazon CloudSearch, Oracle Endeca, HPE IDOL, etc.) solutions.
Better Search Beyond the GSA
One thing missing from the GSA, and from many other search engines, is a comprehensive method for processing content to optimize search. My experience has convinced me (and every consultant at Search Technologies) that effective content processing for search is a requirement for a successful enterprise search application. This requirement is magnified where there are multiple content sources and where the knowledge domain of the content varies. Content processing can address the issues that arise in enterprise search by performing one or more of the following data transformations:
- Content fusion – combining multiple data streams or document collections into a single logical index. Fusion is particularly important for applications that aim to present a Google-style “one search box” interface to users despite multiple sources of content.
- Content enhancement – adding information as metadata to improve a document's visibility in a search context.
- Normalization – making sure dates and other similar data points have the same format so that document sorting and filtering work correctly.
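The three transformations above can be sketched as a tiny pipeline step. This is an illustration of the ideas, not Aspire: the field names, the site-to-department lookup used for enhancement, and the accepted date formats (including the assumption that slashed dates are US month/day order) are all hypothetical.

```python
from datetime import datetime

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d %B %Y")  # assumed source formats


def normalize_date(raw):
    """Convert a date from any known source format to ISO 8601 so sorting and filtering agree."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None  # unparseable: leave the field empty rather than index a bad value


def process(source_name, docs, department_by_site):
    """Fuse one source into the shared schema, enhance it, and normalize its dates."""
    for doc in docs:
        yield {
            "id": f"{source_name}:{doc['id']}",  # fusion: IDs stay unique across sources
            "source": source_name,               # fusion: lets users filter by origin
            "title": doc.get("title", "").strip(),
            "modified": normalize_date(doc.get("date", "")),  # normalization
            # enhancement: metadata the source never stored itself
            "department": department_by_site.get(doc.get("site"), "Unknown"),
        }


wiki = [{"id": "42", "title": " VPN setup ", "date": "03/15/2016", "site": "it"}]
cms = [{"id": "42", "title": "Expense policy", "date": "2016-02-01", "site": "finance"}]
departments = {"it": "IT", "finance": "Finance"}
index = list(process("wiki", wiki, departments)) + list(process("cms", cms, departments))
print(index)
```

Both sources use the ID "42", but prefixing with the source name keeps them distinct in the fused index, and both dates end up in the same sortable form.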
Search Technologies has developed Aspire, a content processing framework designed to facilitate these transformations. I will discuss this in some detail in subsequent blog posts.
This last statement signals my next discussion: what to look for when seeking to improve core enterprise search, and how to move from the GSA to other search technologies in the most effective manner (read about how to choose a search solution to replace your GSA). Stay tuned for my next post, and feel free to send us your GSA questions or comments.