Filing Cabinets and Metadata, the “M” Word

Like the typewriter, the carbon copy, and the fax machine, the traditional filing cabinet is largely a thing of offices past. Those hulking, cold, grey or gunmetal behemoths no longer take up precious real estate in hallways and closets.

The files they once contained are now stored on our hard drives, or in the cloud, where they can be accessed from anywhere in the world. The modern electronic filing cabinet (a shared hard drive or, for high-value documents, a content management system) is undoubtedly easier to scale or relocate, but it retains some fundamental properties of its physical predecessor.

Almost always, documents are kept in a hierarchy that has been designed by a human. In the old days, an individual would privately decide on a file plan for the documents kept in their personal drawer, which stood with others in the corner of the office. At enterprise scale, an archiving department would be collectively responsible for the creation and maintenance of a hierarchy for corporate use. Often, archive department staff would act as gatekeepers to shared information. Such centralized control of hierarchy design is rare today. Individuals still file their own documents into their own hierarchies on their laptops, and functional or departmental teams create and maintain shared project hierarchies. But gone are the days of the “enterprise hierarchy”.

Or are they? 

Hierarchy is Metadata 
A document's position in a hierarchy is metadata. Without it, carefully applied across the numerous filing cabinets, hanging files and folders that used to exist in your basement, the chances of finding information to service an immediate business need were close to zero.

These days, thanks to enterprise search technology, we still have a reasonable chance of locating documents based on their internal content, regardless of how badly they may have been misfiled. But as data sets continue to grow, even that becomes harder within the enterprise. We are searching for needles in a haystack that doubles in size every 18 months or so. 

Don't Mention the “M” Word
If you are relatively young, metadata may not be a word that sits easily with you. Didn’t we do away with the need for metadata? Surely the advanced technologies offered by Bing, Google and others on the Web prove that metadata is no longer important?

Wrong. 

Furthermore, this popular misconception lies behind the continued under-performance of many enterprise search systems. At Search Technologies, we sometimes refer to metadata as the “M” word. On numerous occasions, members of an audience listening to a presentation about enterprise search have suddenly looked sleepy, or started to fiddle with their Blackberries and iPhones, once the “M” word has been introduced on a PowerPoint slide.

Let’s step back for a moment, and consider just how important metadata is to a modern search application. Generically, the three most useful functions of a search results user interface, in terms of helping to quickly find the desired information from a large data set, are:

  • Relevancy ranking
  • Results sorting by property, and
  • Search navigators

Both on the Web and behind the firewall, relevancy algorithms are substantially influenced by metadata such as document titles. You can’t order results by location, price or date unless that metadata has been appropriately arranged in the search index. Search navigators (aka dynamic navigation or faceted search) are a proven way to deal with very large data sets. From eBay and Amazon to numerous corporate intranets, search navigators help users to quickly cut the result set down to size, and focus on areas of immediate interest. All of the above rely totally on metadata.
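To make that dependence concrete, here is a minimal Python sketch of results sorting and search navigators. It is an illustration only, not the API of any particular search engine: the sample results, the field names (title, department, year) and the three helper functions are all hypothetical. The point is simply that every operation reads a metadata field, never the document body.

```python
from collections import Counter

# Hypothetical search results: each hit carries metadata fields (assumed names).
results = [
    {"title": "Balance sheet", "department": "Accounts", "year": 2012},
    {"title": "Draft specs", "department": "Engineering", "year": 2011},
    {"title": "Audit summary", "department": "Accounts", "year": 2011},
]


def facet_counts(hits, field):
    """Count hits per value of a metadata field (the numbers a navigator shows)."""
    return Counter(hit[field] for hit in hits if field in hit)


def refine(hits, field, value):
    """Keep only hits whose metadata matches the navigator value the user clicked."""
    return [hit for hit in hits if hit.get(field) == value]


def sort_by(hits, field, descending=False):
    """Order hits by a metadata property; hits lacking the field go last."""
    have = [hit for hit in hits if field in hit]
    lack = [hit for hit in hits if field not in hit]
    return sorted(have, key=lambda hit: hit[field], reverse=descending) + lack


if __name__ == "__main__":
    print(facet_counts(results, "department"))
    print(refine(results, "department", "Accounts"))
    print(sort_by(results, "year", descending=True))
```

A document with no department or year value simply cannot appear in the navigator or be placed sensibly in a sorted list, which is the practical cost of missing metadata.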

As an aside, if you are getting into the much-talked-about “big data” analysis applications, then metadata will be equally important to you. All of those infographic pie charts, trend graphs and scattergrams that provide a holistic view of what’s happening in your world are driven by metadata.

Human Help is Available
For all of the above, metadata is usually derived in an automated way using technologies such as entity extraction and categorization. This is cool. But at Search Technologies, we think it is a shame to ignore the human input. Here are two abstract examples of file placement within a hierarchy: 

C:Accounts/audits/XYZ-Corp/2012/balance-sheet.xls 

H:Engineering/projects/PSG-Upgrade/draft-specs.docx 


This is potentially useful, human-originated metadata, but it is encoded in a file path. The document called balance-sheet.xls may only contain numbers, plus a few low-value (for search purposes) words such as “total” and “balance”, which will not help it to be found. But the metadata embedded in the file path provides a strong indication of its value and purpose.
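As a rough sketch of how that path-encoded metadata might be unlocked, the Python below splits a path of the shape shown above into fields that a search index could store. The function name, the output field names and the rule that a four-digit folder name is a year are all assumptions made for illustration; a real deployment would need rules tuned to its own file plans.

```python
import re


def path_metadata(path):
    """Split a 'Drive:folder/folder/file.ext' path into simple metadata fields."""
    path = path.strip()
    drive, sep, rest = path.partition(":")
    if not sep:  # no drive prefix present
        drive, rest = "", path
    parts = [p for p in rest.split("/") if p]
    if not parts:
        raise ValueError(f"no file name in path: {path!r}")
    *folders, filename = parts
    if "." in filename:
        stem, extension = filename.rsplit(".", 1)
    else:
        stem, extension = filename, ""
    # Assumption: a folder named like 1990 or 2012 denotes a year.
    years = [int(f) for f in folders if re.fullmatch(r"(?:19|20)\d{2}", f)]
    # Folder names and the file stem become searchable keywords.
    keywords = [
        word.lower()
        for name in folders + [stem]
        for word in re.split(r"[-_\s]+", name)
        if word
    ]
    return {
        "drive": drive,
        "folders": folders,
        "filename": filename,
        "extension": extension,
        "years": years,
        "keywords": keywords,
    }


if __name__ == "__main__":
    for example in (
        "C:Accounts/audits/XYZ-Corp/2012/balance-sheet.xls",
        "H:Engineering/projects/PSG-Upgrade/draft-specs.docx",
    ):
        print(path_metadata(example))
```

With fields like these in the index, a spreadsheet containing little more than numbers becomes findable by the words in its folder names, such as "audits", or by its year.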

Enterprise Content Browsing
We are working with a growing number of customers who are recreating the “corporate file plan”, but in a decentralized, and perhaps even a democratic way. It seems to be innate human behavior to browse. Some propose taxonomy creation as one of the oldest professions, although the competition for that title is considerable.

Still today, humans innately understand hierarchy and know how to work with it. The modern equivalent of the archiving department is to use technology to stitch together all of the hierarchies in the enterprise. Given the number of people and departments involved, and the variance of their outlooks on the importance of document filing, and indeed the importance of the documents they are filing, this will make for an inconsistent hierarchy. Carl Linnaeus and Aristotle may well turn in their graves. 

Yet it can be extremely useful if combined with a good search experience. Consider the following user journey.

  • Deploy a simple search query and get results
  • Click a search navigator to quickly refine to a specific area, subject or department
  • Click on a search result. It looks useful, but the info seems incomplete
  • Click on a link embedded in the search result which takes you to the part of the “enterprise hierarchy” in which that document resides, and explore other files in that folder, or maybe navigate to related folders
  • Find a related folder that you didn’t know existed, but looks promising in terms of content
  • Move to that folder and deploy a further search, but limited to that folder and its sub-directories

Of course, all of this needs to be done while fully respecting your document-level security regime. So users of this combined search/browse environment will see a filtered view of the corporate hierarchy, depending on their permissions. This mixture of search and browse suits most people, most of the time. That's an excellent foundation for enterprise search success. It is especially useful for high-value knowledge workers: the thinkers who make a difference to long-term corporate prosperity.
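The journey above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any product: the Doc class, the group-based permission model, the naive keyword matching and the sample documents are all hypothetical. It shows the three ingredients working together: a permission filter applied to every result, a search scoped to a folder and its sub-directories, and a browse step that lists neighbouring files.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    path: str  # position in the stitched-together hierarchy
    text: str  # extracted content
    allowed_groups: set = field(default_factory=set)  # groups permitted to see it


def in_folder(doc_path, folder):
    """True if doc_path lies in folder or any of its sub-directories."""
    return doc_path.startswith(folder.rstrip("/") + "/")


def search(docs, query, user_groups, folder=None):
    """Naive keyword match, security-filtered and optionally folder-scoped."""
    terms = query.lower().split()
    hits = []
    for doc in docs:
        if not (doc.allowed_groups & user_groups):
            continue  # the user has no permission to see this document
        if folder is not None and not in_folder(doc.path, folder):
            continue  # outside the folder the user chose to search within
        haystack = (doc.path + " " + doc.text).lower()
        if all(term in haystack for term in terms):
            hits.append(doc)
    return hits


def siblings(docs, doc, user_groups):
    """Other visible documents in the same folder as doc (the browse step)."""
    folder = doc.path.rsplit("/", 1)[0]
    return [
        d for d in docs
        if d is not doc
        and d.path.rsplit("/", 1)[0] == folder
        and d.allowed_groups & user_groups
    ]


# Hypothetical corpus for illustration only.
docs = [
    Doc("Engineering/projects/PSG-Upgrade/draft-specs.docx",
        "upgrade specification draft", {"engineering"}),
    Doc("Engineering/projects/PSG-Upgrade/costs.xls",
        "total balance", {"engineering", "finance"}),
    Doc("Accounts/audits/XYZ-Corp/2012/balance-sheet.xls",
        "total balance", {"finance"}),
]
```

Because the permission check sits inside both search and siblings, a user browsing from a search result never sees a file they could not have found by searching.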

Best Practices
For enterprise search applications specifically, we see a trend to take metadata seriously again. Within most large organizations there are a few folks for whom the importance of metadata never went away, and who, against all odds, have carried the flag through a period during which their message generally fell on deaf ears. We have always kept the faith.

So let’s return to the subject of filing cabinets. Whether at an individual, departmental or project level, it is our experience that the tradition of filing documents into hierarchies lives on. This process creates useful, human-generated metadata. The more important the corpus, the more diligence tends to be applied to the creation of hierarchy and the placement of files within it. These hierarchies will be far from perfect. For sure, they won’t comply with an overall schema.

The enterprise search users who make a significant difference to your corporate prosperity are smart people. They will look pragmatically at the hierarchies they come across during their information retrieval activities. They will contextualize, and they will make the most of what exists. 

According to analysts, unstructured information continues to grow, by some estimates at 80% per year. Whatever the rate is, organizations should use every advantage they can to help ensure employees are productive. We think that the concept of Enterprise Content Browsing, combined with a good enterprise search implementation, represents current best practice.

At Search Technologies, we are working with a growing range of corporate customers to put this vision into place. The melding of search and browse is a killer app for business productivity. It requires a mix of technology, expertise, processes and pragmatism.

All of which are available from Search Technologies.