Wednesday, October 28, 2009

The state of Search (SharePoint 2010)

I recently attended the SharePoint Conference in Las Vegas, and the big emphasis was on search and "findability". Microsoft has made search a key piece of SharePoint's enterprise content management capabilities. Getting content into SharePoint has been easy, but now customers are demanding better ways to find and leverage the content once it is there. Customers want to be able to browse, filter and make decisions on search results. SharePoint 2007 search lacked any useful way to refine and browse results. All you could do was page through the results, looking at the highlighted properties, title and URL of each document. SharePoint 2010 search has made some progress over 2007. Below I will discuss the changes to search results that enhance “findability”, the improvements in content acquisition for search, and some much-needed bug fixes that should have been made years ago. All of this is based on the July 2009 technical preview and material from the SharePoint Conference 2009.

 

Improvements in standard search center

The standard search center in 2010 has added a “Preferences” link for user preferences when searching. There are two preferences. The first enables “Search Suggestions”. As in other search engines, when you type in a keyword it will auto-populate suggestions from terms other users have searched for. This has to be enabled by the site collection administrator.

The second preference is which languages you want to search in. It lets you select which languages are given higher relevance and how the search engine will interpret your keywords.

 

Improvements in advanced search

The advanced search web part has not changed at all in 2010. You still have the ability to search on keywords and use property restrictions. The improvements come from the plethora of new search results web parts, including “Top Federated Results”, “Related Queries” and “Search Refinements”, that are included on the results.aspx page. The “Top Federated Results” web part allows you to add search results from multiple other sources such as Bing, Google, people, or even an external FAST search server. You can set up your own federated search locations via the “Manage Federated Locations” page under Central Administration\Search Service Application.

 

The “Related Queries” web part seems to be used for connecting your own federated search page, so users can take SharePoint results and view them together with results from a custom source.

Finally, there is the “Search Refinements” web part, which allows users to navigate results by clicking on links that resubmit the search query with extra criteria to narrow the results down. Out of the box the web part includes refinements for “Result Type”, which is the file extension of the document that is returned. “Author” and “Modified Date” are also included. These refinements can be added to and modified via the “Filter Category Definition” builder. The “Filter Category Definition” is another XML document, similar to the XML that you define for the “Property Restrictions” in the Advanced Search web part. Here you can define your own refinement names and use either values or ranges of values. For instance, if you have a date and time managed property you could define your own periods like “First Quarter” and “Second Quarter”. In the XML there is a way to tell the web part to use relative values to compute the ranges. I will explain more about that in a later post. The refinements can also use the new “Managed Metadata Columns”, which take advantage of the new taxonomy service for custom terms.

This web part increases the “findability” of SharePoint 2010 search. The FAST SharePoint Search will also include counts for the refinements, thumbnail previews of the documents and custom relevancy sorting, making SharePoint search even more usable.
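To make the idea of range-based refinements concrete, here is a minimal Python sketch of what a “First Quarter” / “Second Quarter” style refinement does to a result set. It is not the “Filter Category Definition” XML and does not call any SharePoint API; the QUARTERS table, the function names and the sample results are all made up for illustration.

```python
from datetime import date

# Hypothetical refinement ranges: label -> (first month, last month), inclusive.
QUARTERS = {
    "First Quarter": (1, 3),
    "Second Quarter": (4, 6),
    "Third Quarter": (7, 9),
    "Fourth Quarter": (10, 12),
}

def refinement_for(modified):
    """Return the label of the range that contains the date's month."""
    for label, (first, last) in QUARTERS.items():
        if first <= modified.month <= last:
            return label
    return None

def group_by_refinement(results):
    """Group (title, modified date) pairs under their refinement label."""
    groups = {}
    for title, modified in results:
        groups.setdefault(refinement_for(modified), []).append(title)
    return groups

# Made-up search results.
results = [
    ("budget.xlsx", date(2009, 2, 14)),
    ("plan.docx", date(2009, 5, 3)),
    ("notes.docx", date(2009, 6, 30)),
]
print(group_by_refinement(results))
```

Clicking a refinement link in the web part corresponds to keeping just one of these groups, and the size of each group is the kind of count the FAST version displays next to each refinement.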

 

 

Search Optimizations

SharePoint 2010 offers many new search results web parts, and they can all be put on the same web part page. Executing multiple queries against multiple sources could make the page slow to render. Fortunately, SharePoint 2010 has optimized page rendering by making all of these web parts Ajax enabled, so the queries can be executed asynchronously. You will notice that each of the results web parts has an “Ajax Options” section in the tool pane. There you can enable “Asynchronous Update”, which is off by default. You can also have your queries refresh themselves on an interval, or display a manual refresh button to keep the network traffic down.
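The benefit of running the queries asynchronously can be sketched in a few lines of Python. This is only an analogy for what the Ajax-enabled web parts do in the browser, not SharePoint code: run_query is a stand-in that sleeps instead of calling a search service, and the source names and latencies are invented.

```python
import asyncio
import time

async def run_query(source, delay):
    """Stand-in for one web part's query; the sleep simulates network latency."""
    await asyncio.sleep(delay)
    return f"results from {source}"

async def render_page(sources):
    """Start every query at once and wait for all of them together."""
    tasks = [run_query(name, delay) for name, delay in sources]
    return await asyncio.gather(*tasks)

# Hypothetical sources and latencies in seconds.
SOURCES = [("local index", 0.2), ("federated location", 0.3), ("people", 0.1)]

start = time.perf_counter()
page = asyncio.run(render_page(SOURCES))
elapsed = time.perf_counter() - start

print(page)
print(f"all queries finished in about {elapsed:.2f}s")
```

Run one after another, these three queries would take roughly the sum of their latencies. Run together, the page waits only about as long as the slowest source.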

Other optimizations involve crawling. SharePoint 2007 had problems crawling large amounts of data. It could take weeks to do a full crawl of a million documents. SharePoint 2010 includes new optimizations. First, there is “security only” crawling. Periodically SharePoint 2010 will do a security-only crawl, which picks up only ACL (permission) changes without bringing the content down to the index servers. Second, the crawler can crawl any “Business Data Connector” source and will inline cache the data for better search performance. Unfortunately, there is still no ability to inject a document into the index. This would be useful if, for example, a workflow adds a document. As it stands, users still have to wait until the next incremental crawl before they can find it. If you need this capability you will have to purchase FAST SharePoint Search. The ability to use multiple index servers and

Finally, there is a new option to decrease the amount of database space that is used to store managed property values. This option is available within the Managed Property edit page in the Search Service Application. It allows for storing text based managed property values as binary hashes.
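As a rough illustration of why storing a hash instead of the text saves space, and what you give up, here is a small Python sketch. I do not know which hash function SharePoint uses or exactly how it stores the value; SHA-1 and the helper names below are assumptions chosen only to show the trade-off.

```python
import hashlib

def stored_value(text):
    """Fixed-size stand-in for a text value (SHA-1 digest, always 20 bytes)."""
    # Assumption: SHA-1 is used here for illustration only.
    return hashlib.sha1(text.encode("utf-8")).digest()

def equals(stored, candidate):
    """Equality test done entirely on hashes; the original text is not needed."""
    return stored == stored_value(candidate)

original = "Quarterly financial summary for the North American sales region"
stored = stored_value(original)

print(len(original.encode("utf-8")), "bytes of text ->", len(stored), "bytes stored")
print(equals(stored, original))     # exact match is still detectable
print(equals(stored, "Quarterly"))  # partial text is not
```

The stored value is the same size no matter how long the text is, and exact-match comparisons still work. The original text cannot be recovered from the hash, so anything beyond equality would need the value stored in full.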

 

Query Language Improvements

The search box now supports more robust keyword searching. In SharePoint 2007, if you entered two or more keywords into the search text box they were implicitly ORed together. In SharePoint 2010 you can now tell search what you want to do with the keywords. For example, this would find all documents containing both the terms cat and dog:

cat AND dog

This would find all documents containing either cat or dog:

cat OR dog

In the July technical preview the AND/OR operators must be in all caps to work.

SharePoint 2010 also now supports wildcard searching from the search box. So if you wanted to find content that contained either computer or computation, you could enter comp*.
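The behaviour of the three queries above can be mimicked with a toy matcher in Python. This is a sketch of the semantics only, not how SharePoint parses or executes keyword queries: the documents are made up, only the uppercase operators are recognised, and I assume AND binds tighter than OR.

```python
def term_matches(term, words):
    """A trailing * is a prefix wildcard; otherwise the whole word must match."""
    term = term.lower()
    if term.endswith("*"):
        return any(w.startswith(term[:-1]) for w in words)
    return term in words

def matches(query, text):
    """Evaluate a flat keyword query: OR groups made of AND-ed terms."""
    words = set(text.lower().split())
    for group in query.split(" OR "):
        terms = group.split(" AND ")
        if all(term_matches(t.strip(), words) for t in terms):
            return True
    return False

# Made-up documents keyed by name.
docs = {
    "a": "the cat sat with the dog",
    "b": "a lonely cat",
    "c": "computer science notes",
    "d": "computation theory",
}

def search(query):
    """Return the names of all documents that satisfy the query."""
    return sorted(name for name, text in docs.items() if matches(query, text))

print(search("cat AND dog"))  # ['a']
print(search("cat OR dog"))   # ['a', 'b']
print(search("comp*"))        # ['c', 'd']
```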

Bug Fixes

One of the biggest bugs in SharePoint 2007 FullTextSQL queries was the inability to OR together two different managed properties and return correct results.

Custom cross list search pitfalls.

The crux of the bug is that if a document does not have a value for both of the ORed managed properties, it will not come back in the results. So if you have managed properties like col1 and col2 and the document only has a value for one of them, it will not be returned. SharePoint 2010 has fixed this to a certain extent. In the July technical preview it only works with text-based managed properties. If you try to OR a text managed property with a decimal managed property it still will not work. Hopefully this will be fixed in beta 2.
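Here is a small Python model of the symptom, using the col1 and col2 properties from the description above. It reproduces what the results look like, not the internal cause of the bug, which I do not know; the documents and the values being matched are invented.

```python
# None means the document has no value for that managed property.
docs = [
    {"title": "both", "col1": "red", "col2": "large"},
    {"title": "only col1", "col1": "red", "col2": None},
    {"title": "only col2", "col1": None, "col2": "large"},
    {"title": "neither matches", "col1": "blue", "col2": "small"},
]

def expected_or(doc):
    """What the query is meant to do: either property may match."""
    return doc["col1"] == "red" or doc["col2"] == "large"

def buggy_or(doc):
    """Model of the 2007 symptom: a missing value on either side drops the document."""
    if doc["col1"] is None or doc["col2"] is None:
        return False
    return expected_or(doc)

expected = [d["title"] for d in docs if expected_or(d)]
buggy = [d["title"] for d in docs if buggy_or(d)]

print(expected)  # ['both', 'only col1', 'only col2']
print(buggy)     # ['both']
```

The two documents that have a value for only one of the properties are exactly the ones that go missing.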

Another issue was the “Query malformed” error you would receive when a FullTextSQL query ORed together more than 10 values on the same managed property. There is a hard limit of 10 in SharePoint 2007. This is fixed in 2010.

Summary

In summary, the state of search in SharePoint 2010 is great. Things are looking up. With the new search web parts, optimizations, query language enhancements, extended configuration and bug fixes, it looks like SharePoint 2010 will become a key component of an enterprise-caliber ECM solution. I will post soon on how to set up custom search refinements in the “Search Refinements” web part. So stay tuned.