Exchange public folders search fails: Error while crawling LOB contents

Crawling fails for Large Exchange Public Folders

Recently I faced an interesting issue with crawling Exchange public folders with SharePoint Search. I’d like to share a few details and a solution that fixes it.

Our Exchange public folder crawl failed with “Internal Error Occurred” messages. Interestingly, not all folders failed to crawl, only some of them.

exchange://XXXX.XXXX.XXX/owa/? ae=Folder&t=IPF.Note &id=PSF.LgAAAAAaRHOQq &pspid=_1349876110817_788781350  &s_p=https&s_ce=0000000g1q0enrplrak9g67lu00000000

Error while crawling LOB contents. ( An internal server error occurred. Try again later. )


This issue reproduced on FAST Search for SharePoint as well as on SharePoint 2010 Enterprise Search. Later we tested it on SharePoint 2013 Search and still had the same problem. An in-depth examination of the ULS logs revealed the following errors.

03/14/2013 16:00:42.51 mssdmn.exe (0x11EC) 0x1D00 SharePoint Server Search FilterDaemon e4ye High FLTRDMN: Errorinfo is “An internal server error occurred. Try again later.” [fltrsink.cxx:553] d:\office\source\search\native\mssdmn\fltrsink.cxx

03/14/2013 16:00:42.51 mssearch.exe (0x06B0) 0x22DC SharePoint Server Search GatherSvc fus3 Medium Transaction failed in plugin RHTG Url pf://xxxx.xxxx.com/owa/? ae=Folder &t=IPF.Note &id=PSF.AEpHAAAB  &pspid=_1363252609418_378466705 Error 0x8004fd11 [gthrtrx.cxx:1042] d:\office\source\search\native\gather\gthrsvc\gthrtrx.cxx

After various troubleshooting sessions with Microsoft Support Services we identified that only folders with 500+ messages fail to be crawled.

Problem Solution

It turned out that it was Exchange Server throttling that caused such behavior. In order to resolve this issue you need to apply two fixes.

1. Disable Throttling for the Default Access Account of SharePoint on Exchange

  • Open Microsoft Exchange Management Shell on your Exchange Database Server:
    • Start > Microsoft Exchange Server 2010 > Exchange Management Shell
  • Create a new throttling policy named “SharePoint”:
    • New-ThrottlingPolicy SharePoint
  • Remove the limits from that policy:
    • Set-ThrottlingPolicy SharePoint -RCAMaxConcurrency $NULL -RCAPercentTimeInCAS $NULL -RCAPercentTimeInMailboxRPC $NULL -RCAPercentTimeInAD $NULL -EWSMaxConcurrency $NULL -EWSPercentTimeInAD $NULL -EWSPercentTimeInCAS $NULL -EWSPercentTimeInMailboxRPC $NULL -EWSMaxSubscriptions $NULL -EWSFastSearchTimeoutInSeconds $NULL -EWSFindCountLimit $NULL
  • If you have Microsoft Exchange Server 2010 SP1, also execute:
    • Set-ThrottlingPolicy SharePoint -CPAMaxConcurrency $NULL -CPAPercentTimeInCAS $NULL -CPAPercentTimeInMailboxRPC $NULL
  • To validate the settings, execute:
    • Get-ThrottlingPolicy SharePoint
  • The following properties should not have any value:
    • CPAMaxConcurrency (Exchange 2010 SP1)
    • CPAPercentTimeInCAS (Exchange 2010 SP1)
    • CPAPercentTimeInMailboxRPC (Exchange 2010 SP1)
    • EWSMaxConcurrency
    • EWSPercentTimeInAD
    • EWSPercentTimeInCAS
    • EWSPercentTimeInMailboxRPC
    • EWSMaxSubscriptions
    • EWSFastSearchTimeoutInSeconds
    • EWSFindCountLimit
    • RCAMaxConcurrency
    • RCAPercentTimeInAD
    • RCAPercentTimeInCAS
    • RCAPercentTimeInMailboxRPC
  • To apply the throttling policy to the SharePoint account, execute:
    • Set-Mailbox <Domain>\<UserName-DefaultContentAccessAccount> -ThrottlingPolicy SharePoint

2. Increase max concurrent objects for Exchange session on Exchange Servers

  • On your Exchange Database Server, open the following registry key:
    • HKLM\System\CurrentControlSet\Services\MSExchangeIS\ParametersSystem\
  • Create a new key:
    • MaxObjsPerMapiSession
  • In that key, create a new DWORD (32-bit) value:
    • objtFolder, and assign it the value 1000
  • Create another DWORD (32-bit) value:
    • objtMessage, and assign it the value 500
  • Reboot the Exchange servers
  • Run a full crawl of your content source

Enjoy! Your large Exchange folders should now be crawled successfully!


Neural networks in Visual Studio and SharePoint Search

Here’s a wonderful video of Dr. James McCaffrey from Microsoft Research explaining the basics of neural networks and their implementation pitfalls:

Developing Neural Networks Using Visual Studio
http://channel9.msdn.com/Events/Build/2013/2-401

Highly recommended for those who would like to get a better understanding of the science behind SharePoint Search ranking models. It looks like this video is the final missing piece in this SharePoint puzzle.

You might want to check more details in the following blog posts and official MSFT pages

SPCUA 2013 slides – English

At the request of several people whom I met at SPCUA 2013, I am posting the English version of my Enterprise Search session slides here.

Slides from SPCUA 2013 SharePoint Ukraine – Enterprise Search

On May 22, Ivan Podobed and I went to Kyiv to speak at SPCUA, the SharePoint Conference Ukraine 2013. Ivan talked about Shredded Storage in 2013, and I covered what you should and should not do to build a successful search portal that finds things well, as well as how relevance is calculated in SharePoint search and what the hyperbolic tangent has to do with it (oh, the horror).

http://spcua.com/speakers/kozhemyakin

The slides can be downloaded via the link, or on SlideShare.

You can also read the “article”, which for some unknown reason was published in its “draft” version rather than the “final” one.

You can also leave feedback on my talk and on the conference as a whole here: http://spcua.com/news/ostavte-svoj-otzyv-o-spcua-2013

Custom meta tags for web pages in SharePoint Search

I’d like to share with you an amazing feature of SharePoint Search that has existed for a very long time, at least since the 2007 version, but which, according to my observations, very few people are aware of.

First of all, it is possible to increase findability by doing basic SEO of intranet sites: add the well-known (http://en.wikipedia.org/wiki/Meta_element) meta tags to web pages: title, keywords, description. These tags are picked up by the SharePoint web crawler and propagated to crawled properties, and then automatically linked to the appropriate managed properties, which have a very high impact on overall ranking.

Secondly, it is possible to crawl custom meta tags from web pages and leverage them in search (a few examples are at the end of the post).

All you need to do is add them the same way as you do with the well-known tags.

<meta name="XXX" content="YYY">
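Before kicking off a full crawl it can be handy to check which meta tags a page actually exposes. The following is a minimal sketch using only the Python standard library; it is not how the SharePoint crawler works internally, and the helper name collect_meta_tags and the sample tag values are made up for illustration.

```python
from html.parser import HTMLParser
from typing import Dict


class MetaTagCollector(HTMLParser):
    """Collects name/content pairs from <meta> tags in an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        # Skip tags such as <meta charset="..."> that have no name/content pair.
        if name and content is not None:
            self.tags[name] = content


def collect_meta_tags(html: str) -> Dict[str, str]:
    """Hypothetical helper: return {meta name: content} for the given page."""
    collector = MetaTagCollector()
    collector.feed(html)
    return collector.tags


# Sample page with one well-known tag and one custom tag (values are made up).
page = (
    '<html><head><meta charset="utf-8">'
    '<meta name="keywords" content="search, sharepoint">'
    '<meta name="office_city" content="Minsk">'
    "</head><body></body></html>"
)
print(collect_meta_tags(page))
```

A tag missing from this output (for example because of an unclosed quote in the markup) is unlikely to show up as a crawled property either.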


Then run a full crawl and go to Search Schema (SharePoint 2013) or Metadata Properties (SharePoint 2010 or FAST), then Crawled Properties Categories. Note: unfortunately, new properties will not be picked up during an incremental crawl.


Then select the “Web” category, and here is your newly added crawled property. Now, in order to use it, create a new managed property and manually map it to the crawled property. Then do a full crawl again.


That’s it.

Now a few ideas/examples of how you can use it efficiently in your search:

  • Increase findability by using well-known tags
  • Personalize search results leveraging custom tags:
    • In our portal we boost pages if the office_city value matches the city from the user profile. It can be easily done via XRANK and Query Rules in SharePoint 2013, or via custom pre-processing that boosts the query in FAST Search (not an easy option, though).
  • Perform nice and easy integration with intranet portals to enrich search content with structured information.
    • In our portal we crawled an employee recognition intranet site, where a set of pages represents employee rewards and is enriched by custom tags such as “employee name”, “reward description”, “reward category”. Then we created a structured search vertical that triggers when the user query matches the description/category of a reward. Technically it was done using the Federated Web Part and Search Scopes in SharePoint 2010 Search (FAST), and later migrated to Result Sources and Result Blocks in SharePoint 2013 Search.
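To make the city boost from the list above more concrete, here is a rough sketch of how a query could be rewritten with the KQL XRANK operator before it is sent to SharePoint 2013 Search. The function name, the boost value of 100 and the assumption that office_city is mapped to a queryable managed property of the same name are all illustrative; in a real portal this rewriting would more likely live in a Query Rule.

```python
def boost_by_city(user_query: str, user_city: str, boost: int = 100) -> str:
    """Hypothetical helper: boost results whose office_city matches the user's city.

    The left side of XRANK decides which documents match; the right side only
    adds a constant boost (cb) to the rank of matching documents that also
    satisfy office_city:"<city>".
    """
    # Drop double quotes so the city cannot break out of the quoted KQL value.
    safe_city = user_city.replace('"', "")
    return f'({user_query}) XRANK(cb={boost}) office_city:"{safe_city}"'


print(boost_by_city("vacation policy", "Minsk"))
```

Because XRANK changes ranking and not matching, users from other cities still get the same result set, just in a different order.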

How SharePoint 2013 Ranking models work

Introduction

Think of search as a two-phase process: first the search engine identifies documents that match the query, then it performs ranking to predict the relevancy of each document. Typically search engines calculate rank taking into account many different things, including attributes of the document, importance of query terms, user clicks and so on. All of them are called ranking features.

Let’s review SharePoint 2013 ranking models in detail: which features they include, how they work and where they can be configured.

A list of available ranking models can be obtained using the following commands in PowerShell:

(Note that in the 2nd line there should be NO $ near -Level ssa; the MSDN example at http://technet.microsoft.com/en-us/library/ff607990.aspx contains this typo.)

$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "Search Service Application"
$owner = Get-SPEnterpriseSearchOwner -Level ssa
$models = Get-SPEnterpriseSearchRankingModel -SearchApplication $ssa -Owner $owner
$models


Here’s a list of available ranking models:

  • Default Ranking Model
  • Catalog ranking Model
  • Recommender ranking model
  • People Search expertise social distance ranking model
  • People Search name social distance ranking model
  • People Search name ranking model
  • Popularity ranking model
  • People search application ranking model
  • People Search social distance model
  • People Search expertise ranking model
  • Site Suggestion ranking model
  • Search model With Boosted Minspan
  • O14 Default ranking model
  • Search Model Without Minspan

Let’s export the first model to XML in order to examine it more closely.

$models[0].RankingModelXML > "c:\Default Search Model.xml"

A ranking model in SharePoint 2013 can be a combination of several ranking models. The default ranking model is composed of a 1st stage, a linear ranking model (HiddenNodes=1), which produces a limited set of candidates (maxStageWidCount=1000) for the 2nd stage, a neural network (HiddenNodes=6).

“Why does it have two stages?”

The reason for this is performance optimization. The 2nd stage contains several MinSpan features (to boost proximity of query terms), which require a lot of computational power to calculate.
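The two-stage idea can be sketched in a few lines. This is only an illustration of the general technique (a cheap model selects candidates, an expensive model re-ranks them), not SharePoint's actual implementation; the feature names, weights and the stand-in second-stage function are made up, and only the candidate limit mirrors maxStageWidCount.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]  # feature name -> normalized feature value


def linear_score(features: Features, weights: Dict[str, float]) -> float:
    """Stage 1: cheap weighted sum of normalized feature values."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def two_stage_rank(
    docs: Dict[str, Features],
    weights: Dict[str, float],
    expensive_score: Callable[[Features], float],
    max_candidates: int = 1000,
) -> List[str]:
    """Return ids of the re-ranked candidates, best first."""
    # Stage 1: score every matching document with the cheap linear model.
    by_stage1 = sorted(
        docs, key=lambda d: linear_score(docs[d], weights), reverse=True
    )
    # Keep only the best candidates (the role played by maxStageWidCount).
    candidates = by_stage1[:max_candidates]
    # Stage 2: run the expensive model on the candidates only.
    return sorted(candidates, key=lambda d: expensive_score(docs[d]), reverse=True)


# Toy data: "c" has the best proximity value but never reaches stage 2.
docs = {
    "a": {"bm25": 1.0, "minspan": 0.1},
    "b": {"bm25": 0.8, "minspan": 0.9},
    "c": {"bm25": 0.1, "minspan": 1.0},
}
ranked = two_stage_rank(docs, {"bm25": 1.0}, lambda f: f["minspan"], max_candidates=2)
print(ranked)
```

The toy run also shows the trade-off: a document that the first stage scores poorly is never seen by the second stage, however well it would have done there.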


A picture may help you to better understand the process of ranking.

[Figure: ranking flow]

In the case of the linear ranking model, as its name suggests, the final score is calculated as a sum of the normalized values produced by each feature.

In the case of the neural network ranking model it’s not that simple. Values of features (200) are transformed (202) and then normalized (204) to have expected value 0 and standard deviation 1. Then they are all mixed together in the hidden nodes layer (208), and the final score (210) is calculated using a rather scary (at least at first look) formula with tanh (hyperbolic tangent).
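As a rough sketch of that calculation, assume the usual single-hidden-layer form: each hidden node takes tanh of a weighted sum of the normalized inputs plus a threshold, and the final score is a weighted sum of the hidden node outputs. That exact form, the function names and all numbers below are assumptions for illustration; the real transform functions and weights come from the ranking model XML and are not reproduced here.

```python
import math
from typing import Sequence


def normalize(value: float, mean: float, std: float) -> float:
    """Shift and scale a transformed feature to mean 0, standard deviation 1."""
    return (value - mean) / std


def neural_score(
    inputs: Sequence[float],
    hidden_weights: Sequence[Sequence[float]],  # one row of input weights per hidden node
    thresholds: Sequence[float],                # one threshold per hidden node
    output_weights: Sequence[float],            # one output weight per hidden node
) -> float:
    """Assumed form: score = sum_j w2[j] * tanh(sum_i x[i] * w[j][i] + t[j])."""
    score = 0.0
    for row, threshold, w2 in zip(hidden_weights, thresholds, output_weights):
        activation = sum(x * w for x, w in zip(inputs, row)) + threshold
        score += w2 * math.tanh(activation)
    return score


# Two made-up features, two hidden nodes.
x = [normalize(3.0, mean=1.0, std=2.0), normalize(0.5, mean=0.5, std=0.1)]
print(neural_score(x, [[0.4, -0.2], [0.1, 0.3]], [0.0, -0.5], [1.5, 0.7]))
```

Since tanh saturates at -1 and 1, no single hidden node can contribute more than its output weight, which is one reason the impact of an individual feature is harder to read off than in a linear model.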

[Figure: neural network]

“What features does the default model use?”

As can be seen from the Default Ranking Model XML, it uses the following features. (Notice: there is no sign of a feature related to the “freshness” of a document.)

I plan a separate post to describe the meaning of all the features and their configuration in detail.

  • BM25
  • UrlDepth
  • InternalFileType
  • Language
  • ClickDistance
  • QueryLogClicks
  • QueryLogSkips
  • LastClicks
  • EventRate
  • Min Span – Title
  • Min Span – Content

“Why does it use a neural network?”

That’s a tricky question. Some search engines use a linear combination of features to calculate the final score. For example, in FAST Search for SharePoint there were a number of weights in RankProfile to be configured, and the process of score calculation (and its debugging/tuning) was quite understandable; at least it was possible to see the impact of each feature on the final score. According to the inventors of this SharePoint ranking model (see the links to patents at the end of this post), a neural network is used to detect and capture non-linear cross dependencies between features, and as a result the quality of search is increased (by up to 10% according to the metrics provided in the patent).

“What makes you think that the ranking model was implemented in SharePoint as described in the patent?”

As I described in a previous post, there’s a way to get the rank log in SharePoint 2013: powersearching.wordpress.com/2013/01/25/explain-rank-in-sharepoint-2013-search/

I reconstructed the transformation/normalization functions in Excel, then copied values from an existing rank log and the raw ranking model weights, then reconstructed the final score calculation using tanh. The values from the rank log and the values calculated manually from scratch matched!

Frankly speaking, there is one last step missing in this chain: in SharePoint 2013 the resulting rank is then somehow normalized again as an additional step after the neural network, and this is not described in the patent. Hopefully it will be possible to find this missing piece.


“How are ranking models tuned/trained?”

Microsoft describes their approach to evaluation and tuning of search relevancy in this wonderful post: http://msdn.microsoft.com/en-us/library/bb499682(v=office.12).aspx. Although it is targeted at SP2007, it can be applied to any other Enterprise Search solution, including SharePoint 2013. Ranking models are tuned with machine learning, using the gradient descent method and LambdaRank, built on top of query judgments (evaluations of result quality) submitted by query assessors.

The Microsoft.Office.Server.Search.RankerTuning namespace contains a lot of code related to the process of training ranking models. We were able to recreate some of the web pages, for example the EditRankingModel page, which can customize and auto-tune feature weights and run a side-by-side analysis of a given ranking model and the default one. We have not yet managed to make it function correctly, but hopefully it will soon be ready as a ranking management tool. However, it should be mentioned that without a decent amount of query judgments the task of training neural networks does not seem doable.


References

References to the patents described in this post.

SharePoint 2013 explain rank page (ranklog).

Summary

This post covers SharePoint ranking models and score calculation from a bird’s-eye perspective.

Stay tuned: in the next post I will cover a detailed analysis and side-by-side comparison of several ranking models, and explain their configuration and parameters.

Please share your comments on what specifically you’d be interested in.

Enterprise Search @Belarus SharePoint User Group

Yesterday I spoke at the Belarus SharePoint User Group and presented a brief overview of how we built our in-house Global Search solution, as well as a few other relevant topics, such as a summary of the available options to connect external content to SharePoint search, including our EPAM Data Import Framework, which in several aspects outperforms the SharePoint BCS model.

The killer point of yesterday’s discussion was the presentation of my findings regarding relevancy calculation in the new SharePoint 2013 search. As far as I can tell (and as briefly described in the previous Explain Rank post), SharePoint uses a neural network to calculate the final score, and as part of the calculation process it uses the hyperbolic tangent 🙂 This kind of scared everybody, but at least now everybody knows that SharePoint relevancy is not a piece of cake.

You might want to check out slides here:

http://www.slideshare.net/AlexKozhemiakin/sp-user-group

