Tuesday, November 18, 2008

Crawling PDF documents with SharePoint search

I had an issue while crawling our SharePoint sites: I was getting errors for Outlook message files that contained embedded PDF documents.

The errors I was getting in the crawl log:
1. The filtering process could not be initialized. Verify that the file extension is a known type and is correct
2. Error HRESULT E_FAIL has been returned from a call to a COM component.
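
To see which file types are affected, the crawl log entries can be grouped by file extension. The following is only a minimal sketch: it assumes the crawl log has already been exported into (URL, message) pairs, and the function name and the sample URLs are hypothetical, made up for illustration.

```python
import os
from collections import Counter

# The two messages quoted above from the crawl log.
FILTER_ERRORS = (
    "The filtering process could not be initialized",
    "Error HRESULT E_FAIL has been returned from a call to a COM component",
)


def count_filter_errors(rows):
    """Count filter-related crawl errors per file extension.

    rows is an iterable of (url, message) pairs (an assumed export format).
    """
    counts = Counter()
    for url, message in rows:
        if any(err in message for err in FILTER_ERRORS):
            # Take the extension from the end of the URL, e.g. ".pdf" or ".msg".
            ext = os.path.splitext(url)[1].lower() or "(none)"
            counts[ext] += 1
    return counts


# Hypothetical sample rows, not taken from a real crawl log.
sample_rows = [
    ("http://server/docs/report.pdf",
     "The filtering process could not be initialized. "
     "Verify that the file extension is a known type and is correct"),
    ("http://server/mail/note.msg",
     "Error HRESULT E_FAIL has been returned from a call to a COM component."),
    ("http://server/docs/plan.docx", "Crawled successfully"),
]

for ext, n in count_filter_errors(sample_rows).items():
    print(ext, n)
```

If most of the failures are on .pdf and .msg items, a missing PDF filter is a likely cause, as it was in my case.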

I found on the internet that I had to install an extra filter on the server so that search can crawl PDF files. There are two free filters available: one is Adobe PDF IFilter 6.0 and the other is the Foxit PDF IFilter.

I tried both filters, but I was still getting errors for PDF and message files. I knew I was missing some small configuration or installation step.

At last, I found one very good and important link. According to that link, Adobe has not released a separate IFilter for PDF files since Adobe Reader 7.

Instead, they suggest installing Adobe Reader 8 or Reader 9 on the server, because after Reader 7 Adobe packages the IFilter functionality into the Reader software itself as a plug-in.
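
After installing the Reader, one way to check that Windows actually sees a filter for PDF files is to look at the registry. The sketch below assumes the standard Windows IFilter registration chain (the extension's PersistentHandler key, then the handler's PersistentAddinsRegistered entry for the IFilter interface). It only checks the general Windows registration, not any SharePoint-specific settings, and the function names are hypothetical.

```python
# Interface ID of IFilter, used as a key name under PersistentAddinsRegistered.
IFILTER_IID = "{89BCB740-6119-101A-BCB7-00DD010655AF}"


def persistent_handler_key(extension):
    """Return the HKEY_CLASSES_ROOT subkey holding the persistent handler GUID."""
    ext = extension if extension.startswith(".") else "." + extension
    return ext.lower() + "\\PersistentHandler"


def ifilter_key(handler_guid):
    """Return the HKEY_CLASSES_ROOT subkey holding the IFilter CLSID."""
    return ("CLSID\\" + handler_guid +
            "\\PersistentAddinsRegistered\\" + IFILTER_IID)


def find_ifilter_clsid(extension):
    """Return the registered IFilter CLSID for an extension, or None.

    Works only on Windows, because it reads the registry through winreg.
    """
    import winreg  # imported here so the helpers above load on any platform
    try:
        handler = winreg.QueryValue(winreg.HKEY_CLASSES_ROOT,
                                    persistent_handler_key(extension))
        if not handler:
            return None
        clsid = winreg.QueryValue(winreg.HKEY_CLASSES_ROOT,
                                  ifilter_key(handler))
        return clsid or None
    except OSError:
        # One of the keys is missing, so no filter is registered this way.
        return None
```

On the server, calling find_ifilter_clsid(".pdf") should return a CLSID string once a PDF filter is registered, and None otherwise.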

After doing the configuration described in the links below, I succeeded: I am now able to crawl message files and PDF documents from the SharePoint sites.



Good reference links:

http://www.adobe.com/support/downloads/detail.jsp?ftpID=2611&promoid=DNRLI (all about Adobe PDF IFilter v6.0) 

http://www.adobe.com/support/downloads/product.jsp?product=1&platform=Windows (Different products from Adobe)

http://blog.tylerholmes.com/2008/04/walkthrough-installing-adobe-v6-pdf.html (How to install the Adobe filter)

http://downloads.fuxinsoftware.com.cn/pub/foxit/manual/enu/FoxitPDFIFilter10forMOSS_manual.pdf (Manual for the Foxit filter)

1 comment:

  1. Hi sanket,

    That's really wonderful info about crawling PDF files. Your blog has helped me set up search over PDF files in an easy way. Thanks a lot...