Improving SharePoint Online Search with Engine Scoring
In his “Improving Search Accuracy with Engine Scoring” white paper, our Innovation Lead, Paul Nelson, accurately laid out a challenge our clients often encounter: the number one complaint about search engines is that they are “not accurate,” often showing irrelevant, old, or even bizarre off-the-wall documents in response to users’ queries. This problem is compounded by the secretive nature of search engines and search engine companies. Relevancy ranking algorithms are often veiled in secrecy (described variously as ‘intellectual property’ or, more quaintly, as the ‘secret sauce’). Even when the algorithms are open to the public (in open source search engines, for example), they are often so complex and convoluted that they defy simple understanding or analysis.
Top-quality search accuracy is not achieved with technology alone or through a one-time “quick fix.” It can only be achieved with a careful, continuous improvement process driven by a varied set of metrics that we refer to as Search Engine Scoring (covered in depth in our white paper). The starting point for such a process is an objective, statistically valid measurement of search accuracy.
The Impact of Poor SharePoint Search Accuracy
For organizations using SharePoint Online as an operational tool, poor search accuracy is more than frustrating: it adversely impacts the bottom line. Below are just a few common examples.
- For corporate wide search or intranet search – Wasted employee time. Missed opportunities for new business. Re-work, mistakes, and “re-inventing the wheel” due to lack of knowledge sharing. Wasted investment in corporate search when minimum user requirements for accuracy are not met.
- For e-commerce search – Lower conversion rates. Higher abandonment rates. Missed opportunities. Lower sales revenue. Loss of mobile revenue.
- For publishing companies – Unsatisfied customers. Less value to the customer. Lower renewal rates. Fewer subscriptions. Loss of subscription revenue.
- For government – Unmet mission objectives. Less public engagement. Lower search and download activity. More difficult to justify one’s mission. Incomplete intelligence analysis. Missed threats from foreign agents. Missed opportunities for mission advancement.
- For recruiting companies – Lower fill rates. Lower margins or spread. Unhappy candidates assigned to inappropriate jobs. Loss of candidate and hiring manager goodwill. Loss of revenue.
To Buy or Not to Buy?
When organizations encounter poor relevancy from their search engines, they usually have one of two reactions:
- Give up and buy a new search engine – this is unproductive, expensive, and wasteful, and there is no guarantee that the problem will be fixed.
- Find a way to tune search accuracy – the search expertise can come from inside or outside the organization.
The clear (but not so simple) answer is that all search engines require tuning and all content requires processing. Search engines are designed and implemented by software programmers to handle standard use cases such as news stories and web pages. They are not delivered out-of-the-box to handle the wide variety and complexity of content and use cases found around the world. The more tuning and the more processing you do, the better your search results will be.
A Step-By-Step Process for Tuning SharePoint Online with Engine Scoring
In this section, I will outline a concrete, real-world approach we have taken to tune search accuracy in SharePoint Online.
The major steps are:
- Gather, audit, and process log files: query logs, click logs, and other indicators of user activities
- Compute engine score
- Compute any additional search metrics / reports to help evaluate results
- Perform Search Quality Analysis to formulate a set of candidate changes
- Modify engine to implement changes
- Re-execute queries to score accuracy improvements
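The replay-and-score loop at the heart of these steps can be sketched in code. The Python below is an illustrative sketch only: the `engine_score` function, its reciprocal-rank style reward, and the log format are assumptions for this example, not the scoring formula from the white paper.

```python
# Illustrative engine-score sketch: replay logged queries against the
# engine (search_fn) and reward results the user actually clicked,
# discounted by rank. The metric here is an assumption for this
# example, not the white paper's formula.

def engine_score(query_log, search_fn, top_k=10):
    """query_log: list of (query, clicked_doc_ids) pairs.
    search_fn: callable returning ranked doc ids for a query."""
    total = 0.0
    for query, clicked in query_log:
        best = 0.0
        for rank, doc_id in enumerate(search_fn(query)[:top_k], start=1):
            if doc_id in clicked:
                best = max(best, 1.0 / rank)  # higher-ranked clicks score more
        total += best
    return total / len(query_log) if query_log else 0.0
```

Re-running a score like this after each configuration change (the final step above) yields comparable before/after numbers.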
SharePoint Online Search Engine Scoring: Sample Implementation
An Intranet Use Case Example
For many of our clients, good intranet search plays a critical part in the business, as intranet portals are used daily by many thousands of employees. Some SharePoint Online customizations that can be implemented to improve the standard search include:
- Freshness boosting
- Custom ‘best bets’ implementation
- Training sessions, videos, and other materials
- Custom refiners
- User profile properties (for example, Department and Country) used to prioritize certain sites for specific users.
- User experience (UX) changes, for example, displaying additional metadata
The first step is to understand what users’ journeys look like and, from there, what changes might improve user perception. To that end, Google Analytics can be used to collect more data about how users are using the search features and functions. This generates a tremendous number of events; making sense of all the data and improving the signal-to-noise ratio can be challenging. This is where engine scoring steps in.
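As a sketch of that signal-to-noise reduction, the Python below collapses raw search and click events into per-query counts and click-through rates, so that high-volume, low-click queries stand out. The event field names are assumed for illustration and are not an actual Google Analytics schema.

```python
from collections import defaultdict

def query_stats(events):
    """events: iterable of dicts with assumed 'type' ('search' or
    'click') and 'query' fields; adapt to your real event schema."""
    searches = defaultdict(int)
    clicks = defaultdict(int)
    for event in events:
        if event["type"] == "search":
            searches[event["query"]] += 1
        elif event["type"] == "click":
            clicks[event["query"]] += 1
    # One row per query: how often it was issued, and how often a
    # result was clicked afterwards.
    return {
        q: {"searches": n, "clicks": clicks[q], "ctr": clicks[q] / n}
        for q, n in searches.items()
    }
```

Queries with many searches but a near-zero click-through rate are natural candidates for closer analysis.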
Engine Scoring Procedure
The graphic below describes the flow of data from someone performing a search all the way to generating an engine score. The color codes indicate the tools we used at each step.
- If you are already using Google Analytics to gather data about how people are using the search pages, it would be relatively easy to extend the implementation to include extra events.
- One limitation of Google Analytics is that it often presents data in aggregate, via merging or sampling. A custom-developed R script can be used to extract the raw data for further processing, ensuring accurate statistics. Our script extracted the data into a series of R data frames for analysis; this data can also be used to generate reports via R reporting and graphing tools.
- If your preferred technology for managing SharePoint Online is PowerShell, the engine scoring script can be implemented in PowerShell. One advantage is its good support for Azure Active Directory authentication compared to some other technologies. Since this component was responsible for calculating the final engine scores, we also extended it to send those scores back to Google Analytics for reporting.
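For illustration, here is a Python sketch of the query-replay half of such a script, whatever language the final implementation uses. The `/_api/search/query` endpoint with its `querytext` and `rowlimit` parameters is the documented SharePoint search REST API; the tenant URL is a placeholder, and Azure AD authentication (a bearer token on the request) is deliberately omitted here.

```python
from urllib.parse import quote

def build_search_url(site_url, query, row_limit=10):
    """Build a SharePoint Online search REST request URL.
    site_url is a placeholder such as
    https://contoso.sharepoint.com/sites/intranet."""
    # Single quotes inside the KQL querytext must be doubled.
    escaped = query.replace("'", "''")
    return (
        f"{site_url}/_api/search/query"
        f"?querytext='{quote(escaped)}'"
        f"&rowlimit={row_limit}"
    )
```

The caller would issue a GET against this URL with an Authorization header and read the ranked results out of the response.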
Engine Scoring Adaptations
In our "Improving Search Accuracy with Engine Scoring" white paper, there are two suggested prerequisites for engine scoring:
- An offline ‘snapshot’ of the index to act as the experimental ‘control’
- An index with no personalized or security trimmed results
The purpose is to ensure that candidate changes to ranking and other configuration, and nothing else, are responsible for changes in the engine score, rather than some other organic change, such as a wide-ranging permission update. It also ensures that when replaying the search sessions, the engine scoring script receives the same results as the original user. This allows an ‘experimental’ approach in which the experimenter controls all factors as much as possible.
This is perfectly fine for organizations that have a UAT (User Acceptance Testing) system and where the search index can be copied and managed independently, for example, for a public website or e-commerce catalog search. But for many organizations, neither prerequisite can be met, because a) it is not practical to create a new SharePoint Online tenant with a copy of all their content just for this exercise, and b) users have access to different sets of content, and it is not feasible to run the engine score calculation independently for each user to accurately ‘see what they saw’ (unless we had all of their passwords!).
To solve these challenges, we performed two adaptations to the procedure: a baseline process and test user accounts.
1. A baseline process
Problem: The engine score of a search engine where updates and deletions are happening will naturally vary as content changes, so how do we know whether a change in engine score is attributable to our optimizations?
To understand how volatile the scores were for a dynamically changing search index, we repeated the engine score calculation with the same query and click logs, then analyzed how much natural ‘jitter’ to expect even when nothing changes apart from content being added, updated, and deleted, and permissions changing during normal daily business. The level of ‘jitter’ will vary with the number of queries and with the volume and volatility of the content and security model, so a similar baseline process is recommended for any client seeking to generate an engine score against a dynamic / production system.
The diagram above illustrates the overall process and highlights the following aspects:
- First, run repeated baseline scores with no configuration optimizations (we did this every day for a fortnight).
- Scores take place against a dynamic index (therefore scores can fluctuate naturally).
- Once the level of ‘jitter’ has been quantified, each configuration optimization package is measured to generate an engine score. Optionally, the score for the configuration package could be an average of several benchmarks of the same package.
- Subsequent configuration packages also generate scores, and these are tracked over time for comparison.
The baseline scores showed that while there was certainly variation in the engine scores for individual users between baseline runs, in aggregate this noise canceled out across all users, and the amount of jitter was low enough for us to proceed.
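A minimal sketch of that jitter check, assuming a band of mean ± 2 standard deviations (the multiplier `k` is a choice for this example, not a prescription from the white paper):

```python
import statistics

def jitter_band(baseline_scores, k=2.0):
    """Band of 'natural' variation from repeated baseline runs."""
    mean = statistics.mean(baseline_scores)
    sd = statistics.stdev(baseline_scores)
    return mean - k * sd, mean + k * sd

def is_significant(score, baseline_scores, k=2.0):
    """True when an optimization's score falls outside the jitter band."""
    low, high = jitter_band(baseline_scores, k)
    return score < low or score > high
```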
2. Test user accounts
Problem: Different users can see different content, so how can you optimize for each user?
The other major adaptation was required to correct for the impact of personalization caused by user profiles and security trimming. Person A working in IT Operations in the UK and Person B working in R&D in the US may see completely different results for the same query. This is because they are ‘allowed’ to see different content through document permissions, and also because the search engine gives a ‘boost’ to some content for IT Operations and to other content for R&D.
To address this, we created five test user profiles corresponding to the largest user group populations. We then filtered and split the query and click logs into five corresponding sets and discarded the rest.
We then generated independent engine scores from the sets, which were combined to get the overall score (they could also be compared to understand how well the search engine was working for each of the major groups within the organization).
In this way, we observed that although there were variations in the individual engine scores, in aggregate across the whole set the differences canceled out and the variation was negligible.
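The split-and-combine logic might look like the Python sketch below; the log format, group names, and weights are illustrative, not from the actual implementation.

```python
def split_by_group(query_log, user_to_group):
    """query_log: (user, query, clicked_doc_ids) tuples. Entries for
    users outside the chosen test groups are discarded."""
    groups = {}
    for user, query, clicked in query_log:
        group = user_to_group.get(user)
        if group is not None:
            groups.setdefault(group, []).append((query, clicked))
    return groups

def combined_score(group_scores, group_weights):
    """Weight each group's engine score by its population size."""
    total_weight = sum(group_weights[g] for g in group_scores)
    return sum(group_scores[g] * group_weights[g]
               for g in group_scores) / total_weight
```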
Prioritizing Search Optimizations
With the above mitigations in place, we were ready to perform the Search Quality Analysis exercise to decide which optimizations to prioritize. We performed this manually, using R-based reports and reviews of the logs and search results. In one client example, the resulting analysis concluded that the following changes should be put in place in a series of SharePoint Online configuration iterations:
- Expand query to include a ‘phrase’ boost
- Use XRANK to boost based on filename and other managed properties
- De-boost content older than 3 years
- Slightly prioritize PDF and PowerPoint files
- Boost content in the same language (based on profile/browser)
- Synonym query expansion
- Comparison of standard SharePoint People search rank profiles
In most cases, these could be accomplished using only minor customizations to the SharePoint Online query transformations. These changes can be made iteratively with an engine score calculated at each step to evaluate the impact. Then, further evaluation and analysis of the impact on engine scoring can be conducted as part of your ongoing performance optimization.
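As an illustration of what such a transformation can look like, the Python below assembles a KQL query template combining a phrase boost with a filename boost. XRANK and the {searchTerms} token are standard in SharePoint query transformations, but the boost constants (the cb values) and the choice of managed property are hypothetical starting points to be tuned against the engine score.

```python
def build_query_template():
    """Assemble a KQL query-transformation template string."""
    base = "{searchTerms}"
    # Boost results matching the exact phrase over loose term matches.
    phrase_boost = f'{base} XRANK(cb=0.4) "{base}"'
    # Additionally boost matches in the filename managed property.
    return f"({phrase_boost}) XRANK(cb=0.2) filename:{base}"
```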
Measure, Test, Validate, Repeat
Performing an engine score against SharePoint Online presents its own challenges caused by the cloud-based subscription model, the fact that the index cannot be copied or moved independently of the content, and the inherent personalization in the system.
There is no “easy fix,” no “silver bullet,” and no substitute for a little elbow grease (aka hard work) when it comes to creating satisfying search results. But with careful planning and observation, selection of the right tools, and some mitigating adaptations to the process, these issues can be worked around, allowing a score to be generated even against a live, dynamic system. In this way, we can start to empirically measure, test, and validate the impact of SharePoint Online search optimizations.
Our approach to improving search accuracy through engine scoring has worked for multiple search engines, including open source and commercial types. For further information on our engine scoring methodology, send us your questions, watch our on-demand webinar, or download our white paper.