Protiviti / SharePoint Blog

September 30
Search Quick Tip: Configure the SharePoint Search Results Web Part to Avoid Inconsistent Search Results!

We recently had a client that used many large PDF files, each about 900 pages long. The problem was that out of 10 documents, only 2 would come back in the search results.

These documents were tagged with properties such as "Document Type."   Some of the values for Document Type were "Working Policy", "Internal Project" and others.  The names of the documents were Company Policy 2007-2008, Company Policy 2008-2009 and so on through 2012-13. 

So here is the issue: if you searched for "Policy", only one or two of the 10 documents would come back in the search results. Searching for "pdf" gave the same kind of result.

Oddly, if you refined your search to "Working Policy", the missing files would show up in the search results. An exact query for "Company Policy 2011-2012" would also return that document.

My first thought was that individual files were at fault and, as suspected, there were problems. The search log files showed that many of the files were not indexing cleanly. There were errors about the body field holding too many characters, and some items were skipped outright because of illegal content or some other issue that the crawling process did not like.

This led me to analyze the files themselves. Some files had macros in them, and some had hidden characters that were replacing the real title value, so you would get only half a title and some odd Unicode characters in the search results. We corrected this, but the main issue was still present: not all items were returned in our search.

Next I focused on the item sizes. With SharePoint 2013, if you have the August 2013 update, you can increase the maximum indexed size and content properties for your Search Application. This sounded promising, so I updated my farm and reran the crawl.

Now all items were showing up in my crawl index as expected. The crawl log showed every item with a green indicator icon, meaning it was crawled successfully with no issues. Still, I could not get all the items to return in a search.

This issue was eventually escalated and we brought in another resource to assist.  This resource had recently faced this same issue and knew that at times SharePoint can see two items as the same document.  This is regardless of the actual physical file name or the title you give the file in a document library. 

In the Search Results Web Part you can configure your search to group by DocumentSignature. What this does internally is more complex than we can explain here, but the end result is that the near-duplicate files that were being hidden are now visible in the Search Results Web Part.
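For code that queries Search directly instead of going through the Web Part, the Search REST API exposes a `trimduplicates` query parameter that controls the same duplicate removal. The following is a minimal sketch that only builds the request URL; the site URL and the helper name `build_search_url` are hypothetical, and authentication and the actual HTTP call are left out.

```python
from urllib.parse import quote, urlencode


def build_search_url(site_url: str, query_text: str, trim_duplicates: bool = False) -> str:
    """Build a Search REST query URL with duplicate trimming switched on or off."""
    params = {
        # The REST endpoint expects the query text wrapped in single quotes.
        "querytext": "'" + query_text + "'",
        # "false" asks Search to return items it would otherwise collapse as duplicates.
        "trimduplicates": "true" if trim_duplicates else "false",
    }
    return site_url.rstrip("/") + "/_api/search/query?" + urlencode(params, quote_via=quote)


# Hypothetical site URL, used for illustration only.
url = build_search_url("https://sharepoint.example.com/sites/policies", "Company Policy")
print(url)
```

Comparing the result counts for the same query with `trim_duplicates=True` and `trim_duplicates=False` is a quick way to check whether duplicate removal is what is hiding your documents.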

Voila, all documents now return in search. When we searched for "Company Policy", all the items came back in the search results. Search runs some internal checks and removes what it considers duplicates from your results, which would normally be good. In this case it is bad, really bad. What should happen is that the "duplicate removal" function checks the title and filename first and, if they differ, skips the removal entirely.
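To make the difference concrete, here is a toy model in Python. It is not SharePoint's actual algorithm: the `signature` field is just a stand-in for whatever content-based value Search computes, and the `Hit` class and both functions are hypothetical names invented for this illustration. The first function collapses hits on the signature alone, which is the behavior that hid our documents; the second only treats hits as duplicates when the filename and title match as well, which is the behavior suggested above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    filename: str
    title: str
    signature: str  # stand-in for a content-based near-duplicate signature


def trim_by_signature(hits):
    """Keep only the first hit for each signature (toy duplicate removal)."""
    seen = set()
    kept = []
    for hit in hits:
        if hit.signature not in seen:
            seen.add(hit.signature)
            kept.append(hit)
    return kept


def trim_respecting_names(hits):
    """Treat hits as duplicates only if signature, filename, and title all match."""
    seen = set()
    kept = []
    for hit in hits:
        key = (hit.signature, hit.filename.lower(), hit.title.lower())
        if key not in seen:
            seen.add(key)
            kept.append(hit)
    return kept


# Template-based documents: different names, but several share a signature.
hits = [
    Hit("Company Policy 2007-2008.pdf", "Company Policy 2007-2008", "sig-A"),
    Hit("Company Policy 2008-2009.pdf", "Company Policy 2008-2009", "sig-A"),
    Hit("Company Policy 2011-2012.pdf", "Company Policy 2011-2012", "sig-A"),
    Hit("Company Policy 2012-2013.pdf", "Company Policy 2012-2013", "sig-B"),
]

print(len(trim_by_signature(hits)))      # 2
print(len(trim_respecting_names(hits)))  # 4
```

With signature-only trimming, three of the four policies collapse into one result even though each has its own filename and title.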

I understand the need for hashes, near hashes and so on to clean up search. But if your end user says that mydocument1.pdf is different from mydocument2.pdf, then the Search Team really should respect that and not remove search results by flagging them as duplicates. Really, who knows best: the Search Team or the guy that works with the data every day? Hopefully we can get this escalated and processed through our MVPs and we will see a change. It is a minor change, but it is important. Imagine you hear so much about how great SharePoint Search is, decide to use the O365 or On-Premises platform, and then more than half of your documents do not show up because you are using a template to create them and SharePoint Search says "they must be duplicates!"

We had another similar issue where an Excel file that was being uploaded to and downloaded from a library multiple times was seen as the same file. The idea was that every time the file was uploaded, an event receiver should kick off. When SharePoint saw this document as the same item that already existed, it would not kick off our event receiver. We had to disable the ParserEnabled property for the web application to get this solution to work.
