Please check this thread:
http://forums.asp.net/p/1828622/5097523.aspx/1?Re+SQL+Server+2000+Full+Text+Search+using+CONTAINS+to...
After a lot of testing and investigation, I found the following on SQL Server 2000 under Windows Server 2003 using Indexing Service:
1. Arabic text in static PDF files (created using Adobe LiveCycle Designer, Static PDF with XDP for data binding) is stored using the Unicode values of the connected (contextual) character shapes, for example 65251, and in reverse order!
2. Arabic text in MS Office files is stored using the Unicode values of the plain, unconnected letters, for example 1605, in normal order. This is what I expected.
I verified this while searching for the "Sick Leave" form requests. By chance, I found the Arabic text for "Sick" inside the "Characterization" result field. It looks normal in the Query Analyzer window, but when I checked the Unicode value of each letter I could see what was going wrong. I used the following code in Query Analyzer to verify:
print unicode(substring(N'ﺔﻴﺿﺮﻣ', 5,1))
print unicode(substring(N'مرضية', 1, 1))
print nchar(65251)
print nchar(1605)
--- result is --->
65251
1605
ﻣ
م
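For anyone who wants to cross-check this outside Query Analyzer, here is a minimal Python sketch of the same comparison. It is only an illustration under assumptions: the code points below are my reading of the two strings above (only 65251 and 1605 were actually verified with the T-SQL code), and the helper names describe and to_logical are made up for this example. It relies on the fact that Unicode NFKC normalization maps the connected-shape code points back to the plain letters.

```python
import unicodedata

# Assumed code points of the string found in the PDF "Characterization" field
# (connected shapes, stored in reverse order). Only the last one, 65251
# (U+FEE3), was verified in the post.
pdf_text = "\uFE94\uFEF4\uFEBF\uFEAE\uFEE3"

# Assumed code points of the same word as typed normally (MS Office case).
# Only the first one, 1605 (U+0645), was verified in the post.
office_text = "\u0645\u0631\u0636\u064A\u0629"


def describe(text):
    """Return (decimal code point, Unicode character name) for each character."""
    return [(ord(ch), unicodedata.name(ch)) for ch in text]


def to_logical(text):
    """Reverse the stored order, then fold connected shapes to plain letters.

    Reversing first keeps ligatures that expand to two letters in the right
    order. This simple reversal does not handle diacritics, digits, or mixed
    Arabic/Latin text.
    """
    return unicodedata.normalize("NFKC", text[::-1])


if __name__ == "__main__":
    for code, name in describe(pdf_text):
        print(code, name)
    for code, name in describe(office_text):
        print(code, name)
    # True if the PDF string is the same word, shaped and reversed.
    print(to_logical(pdf_text) == office_text)
```

If the last line prints True, the PDF text is the same word, just stored shaped and reversed, which would explain why the two spellings behave differently in a search.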
The following two queries return entirely different results:
select *
from openquery(ISRV, 'Select Filename,PATH,rank,url,characterization from SCOPE() where contains(contents, ''"ﺔﻴﺿﺮﻣ"'')')
order by FileName
select *
from openquery(ISRV, 'Select Filename,PATH,rank,url,characterization from SCOPE() where contains(contents, ''"مرضية"'')')
order by FileName
The next question is why the Arabic text inside the static PDF file is stored in this weird format.
I have also stored some Arabic text in a regular TXT file, but after one week it has still not been picked up by the Indexing Service scanning engine. Once it is scanned, I will confirm the result.
I think I need to post this question to Adobe Support.
Can anyone help with this?
Tarek.