DPMS Applicability and Use Cases
Content Providers on the Web
As the internet matures, electronic data has become the entire product inventory and sole value proposition of many companies. These include digital publishers, patent and academic research resources, industry-specific Web portals, online classifieds, directories, e-commerce sites, and many more. Data quality is critical to these companies’ revenues, and they rely heavily on search applications to deliver value to their customers.
Intranet / Enterprise Search Applications
Behind-the-firewall enterprise search applications have a reputation for underachievement. They must handle a full range of user types, a disparate (often huge) range of document and data sources, and constant change in data repositories, security schemas and application requirements. DPMS provides a framework for taming this complexity and helps the data "play well together".
Example Use Case: Publishing
This example concerns a large government publisher and printer. In the past, government documents were available only in print, in daily, weekly, monthly, or annual editions. Individual publications were sometimes over 1,000 pages long.
This customer has a mandate to make all such content available in a digital format to enable wider, easier access by citizens.
In this application each document has its own history, user community and structure. In other words, the data is extremely heterogeneous. Further, a wide range of user requests must be catered for, from simple search requests by citizens about local issues, to very specific queries from research professionals focused on niche subjects. So far, more than 20 disparate collections have been fed into a single, publicly accessible search application. In such an environment, application development can easily grind to a halt under the burden of a seemingly endless stream of user requirements, document types, and schemas.
DPMS provides a framework for solving this problem in a reliable, predictable, and scalable fashion. Its technical architecture allows multiple collections to be incorporated flexibly without losing the richness and history of the originating sources. Users searching over the entire collection benefit from a tuned “single search box” experience. Researchers interested in individual collections benefit from source-specific search navigators and other advanced search functions.
This government search application has won awards for furthering the general availability of government information to US citizens.
Example Use Case: Patent Search
Although all patent documents are publicly available and many are described in a common XML format (WIPO Standard ST.36), creating a search engine for over 60 million documents from a wide variety of countries, languages, and patent offices is a daunting task.
Patent records from different countries vary significantly. Countries have their own numbering systems, status codes and classifications. Some provide only metadata, while others include the full text of the patent. Some patents also include PDF renditions and images of the patent illustrations.
Patents have been around for a long time (the first U.S. patent was issued in 1790), so their format and available metadata have evolved. Classification codes are constantly being revised, and individual patents may or may not be updated to reflect these changes. New patents are issued daily, as are changes to existing patents and metadata (classification changes, changes in ownership, etc.). These must be processed in a timely manner to keep the search experience relevant to the user community.
In such a challenging environment, with such a large database of documents, a well-defined and controlled methodology is needed to bring order and reliability to the process. DPMS enables most documents to be ingested automatically, while documents that do not meet a minimum quality standard are quarantined for later reprocessing. This approach allows the database to be refined iteratively and predictably to achieve the highest quality of search results.
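The quarantine step can be pictured with a minimal Python sketch. The source does not describe how DPMS measures document quality. The required fields, the 0.6 threshold, and the names `quality_score`, `ingest` and `reprocess` are therefore all hypothetical. The sketch only shows the shape of the loop: index what passes, hold back what fails, then fix and retry.

```python
from dataclasses import dataclass, field

# Hypothetical quality rules: the source does not say which checks DPMS applies.
REQUIRED_FIELDS = ("publication_number", "country", "title")
MIN_SCORE = 0.6  # assumed threshold: fraction of required fields present


@dataclass
class Batch:
    indexed: list = field(default_factory=list)
    quarantined: list = field(default_factory=list)


def quality_score(record: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    present = sum(1 for name in REQUIRED_FIELDS if record.get(name))
    return present / len(REQUIRED_FIELDS)


def ingest(records, min_score: float = MIN_SCORE) -> Batch:
    """Index records that pass the quality bar; quarantine the rest."""
    batch = Batch()
    for record in records:
        if quality_score(record) >= min_score:
            batch.indexed.append(record)
        else:
            batch.quarantined.append(record)
    return batch


def reprocess(batch: Batch, fix) -> Batch:
    """Apply a correction to quarantined records and run them through ingest again."""
    retry = ingest(fix(record) for record in batch.quarantined)
    return Batch(batch.indexed + retry.indexed, retry.quarantined)


# Usage with made-up records.
first_pass = ingest([
    {"publication_number": "US0000001", "country": "US", "title": "Widget"},
    {"publication_number": "EP0000002", "country": "EP", "title": ""},
    {"country": "JP"},
])
second_pass = reprocess(
    first_pass,
    lambda r: {**r, "publication_number": "JP0000003", "title": "Recovered"},
)
```

Each reprocessing pass can only move records from the quarantine into the index, never the reverse. This is one way to read "iteratively refined in a predictable fashion".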
Tests comparing this search-based application with competitive patent research products show that it delivers significantly higher user satisfaction.
Example Use Case: TV Listings
Creating a searchable database of television listings involves merging data from a wide variety of sources, including standard listings from TV Guide, IMDb movie listings, social network ratings, video-on-demand listings, and others. This fusion process is complicated by the need to group listings (for example, all showings of the same episode, all episodes of the same TV series, or all movies in the same franchise) so that each group can be browsed and searched as a single item.
DPMS is uniquely qualified to handle these data merging and enrichment tasks. Records can be grouped according to relevancy-ranked matches using standard search engine techniques. Media can be rated using a combination of social ratings and other factors such as box office receipts and Nielsen metrics. These ratings can be incorporated into search results so that users see the highest-rated media for any type of search, whether a person search, a program search, or a genre search.
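The grouping and rating ideas are illustrated by the minimal sketch below. It rests on assumptions and is not the DPMS implementation. Simple Jaccard overlap of title words stands in for search-engine relevancy ranking. The rating weights and factor names are invented for the example: `social`, `box_office` and `nielsen`, each assumed to be normalized to the range 0 to 1.

```python
import re

def tokens(title: str) -> set:
    """Lower-case word tokens; a crude stand-in for search-engine text analysis."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))

def similarity(a: str, b: str) -> float:
    """Jaccard overlap of title tokens (assumption: substitutes for relevancy ranking)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def group_listings(listings, threshold: float = 0.8):
    """Attach each listing to the first group whose first title matches, else start a group."""
    groups = []
    for item in listings:
        for group in groups:
            if similarity(item["title"], group[0]["title"]) >= threshold:
                group.append(item)
                break
        else:  # no existing group matched
            groups.append([item])
    return groups

# Hypothetical weights for combining rating factors into one score.
WEIGHTS = {"social": 0.5, "box_office": 0.3, "nielsen": 0.2}

def combined_rating(factors: dict) -> float:
    """Weighted sum of the available factors; missing factors count as 0."""
    return sum(weight * factors.get(name, 0.0) for name, weight in WEIGHTS.items())

# Usage with made-up listings from different sources.
groups = group_listings([
    {"title": "The Office: Pilot", "source": "listings"},
    {"title": "The Office - Pilot", "source": "video on demand"},
    {"title": "Star Wars: A New Hope", "source": "movies"},
])
score = combined_rating({"social": 0.9, "box_office": 0.5, "nielsen": 0.7})
```

In a production system the match step would more likely query the search engine with each incoming record and accept the top-ranked hit above a score threshold. The greedy loop here only keeps the example self-contained.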
Finally, this search application is used primarily by consumers pressing buttons on their TV remote controls, so millisecond response times are an important part of the end-user experience.
Example Use Case: Government Laws
Many governments, at all levels, have what is known as "The Code" – a written description of all of the laws which govern the jurisdiction. Although this can be considered to be a single "document", it is usually published as many volumes, each of which may contain numerous titles, sub-titles, chapters, sub-chapters, parts, sub-parts, sections, and sub-sections.
Clearly, searching entire volumes as single logical entities will produce poor results, since each volume may be thousands of pages long and cover a wide variety of topics. Furthermore, the nested structure of large documents can be enormously complex and is typically hierarchical. Visibility into this structure is important to the user community. Finally, since these documents are created over long periods of time (hundreds of years in the case of the United States Government) and by multiple agencies within the Government, formats can vary dramatically.
With DPMS, highly productive searching over such collections can be accomplished through a well-defined and predictable process. Volumes are split as necessary into smaller sections for better searching. The hierarchy of the document is preserved, along with metadata at every level. Everything is indexed in the search engine, which can then serve up individual laws or sections as easily as entire chapters, sub-chapters, titles, and so on.
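The splitting step can be sketched as a recursive walk over a volume's hierarchy. This example is hypothetical. The node layout (`level`, `num`, `heading`, `text`, `children`), the slash-separated record IDs, and the helper names are all assumptions, and a real volume would first have to be parsed into this shape. Each emitted record keeps its full ancestor path. A query can therefore return a single section or, through `subtree`, an entire chapter.

```python
from typing import Iterator

def split_volume(node: dict, ancestors: tuple = ()) -> Iterator[dict]:
    """Yield one indexable record per node, carrying its full hierarchy path."""
    here = ancestors + (f"{node['level']} {node['num']}",)
    yield {
        "id": "/".join(here),
        "level": node["level"],
        "heading": node.get("heading", ""),
        "text": node.get("text", ""),
        "path": list(here),                      # metadata kept at every level
        "parent": "/".join(ancestors) or None,   # None for the top-level node
    }
    for child in node.get("children", []):
        yield from split_volume(child, here)

def subtree(records, prefix_id: str):
    """All records at or below prefix_id, e.g. a whole chapter with its sections."""
    return [r for r in records
            if r["id"] == prefix_id or r["id"].startswith(prefix_id + "/")]

# Usage with a made-up volume (placeholder headings and text).
volume = {"level": "Title", "num": "1", "heading": "Example title", "children": [
    {"level": "Chapter", "num": "1", "heading": "Definitions", "children": [
        {"level": "Section", "num": "1", "text": "..."},
        {"level": "Section", "num": "2", "text": "..."},
    ]},
    {"level": "Chapter", "num": "2", "heading": "Procedures", "children": [
        {"level": "Section", "num": "101", "text": "..."},
    ]},
]}
records = list(split_volume(volume))
chapter_one = subtree(records, "Title 1/Chapter 1")
```

Because the trailing "/" is part of the prefix test, "Chapter 1" does not accidentally match "Chapter 10".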
Such a flexible environment provides great search productivity across the user spectrum, from casual browsers to professional researchers.