Hindi searches



I'm asking around about Hindi searches. Could you help me out, please? --Anatoli 00:42, 31 January 2011 (UTC)

00:42, 31 January 2011

Thanks for your reply.

Here's what I'm asking for.


First of all, pairs of letters with and without a nuqta (a dot underneath) should be searchable together, the same way Roman letters with and without diacritics are.

The letters are not identical, but if a user typed खून, ख़ून should also be listed.
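To make the nuqta request concrete, here is a minimal sketch, not a description of how MediaWiki search actually works. It decomposes the text so that a precomposed letter such as ख़ (U+0959) splits into its base letter plus the nuqta sign (U+093C), then drops the nuqta before comparing. The name `fold_nuqta` is hypothetical. Note that this approach would also fold other nuqta letters such as ऩ, ऱ and ऴ, which may not be desirable for every language written in Devanagari.

```python
import unicodedata

NUQTA = "\u093c"  # DEVANAGARI SIGN NUKTA


def fold_nuqta(text: str) -> str:
    """Return a search key in which nuqta and non-nuqta spellings coincide."""
    # NFD splits precomposed nuqta letters into base letter + U+093C.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the nuqta, then recompose whatever else can be recomposed.
    return unicodedata.normalize("NFC", decomposed.replace(NUQTA, ""))


# Both spellings of the word from the request yield the same key.
print(fold_nuqta("\u0959\u0942\u0928") == fold_nuqta("\u0916\u0942\u0928"))
```

The key would only be used for matching; the page titles themselves stay unchanged.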


  • Different forms of alif: ا, أ‎, إ‎, ﺁ‎ and ٱ‎‎ should be searchable together, e.g. أمس and امس, etc.
  • Words containing any of these diacritics should be searchable as if they did not have them, and the other way around:

ـَ fatHa, ـِ kasra, ـُ Damma, ـْ sukuun, ـّ shadda, ـٰ dagger 'alif.

Is it possible? Anatoli 12:51, 31 January 2011 (UTC)
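As an illustration of the Arabic request above, the following is a sketch under stated assumptions, not the search engine's real behavior. It folds the listed alif variants to a bare alif and removes the listed diacritics. The names `FOLD_TABLE` and `fold_arabic` are hypothetical, and the NFKC step is an assumption added so that presentation forms such as ﺁ (U+FE81) are handled too.

```python
import unicodedata

# Alif variants folded to bare alif (U+0627).
ALIF_VARIANTS = {
    0x0622: 0x0627,  # alif with madda above
    0x0623: 0x0627,  # alif with hamza above
    0x0625: 0x0627,  # alif with hamza below
    0x0671: 0x0627,  # alif wasla
}

# Diacritics named in the request: fatHa, kasra, Damma, sukuun, shadda, dagger alif.
DIACRITICS = {cp: None for cp in (0x064E, 0x0650, 0x064F, 0x0652, 0x0651, 0x0670)}

FOLD_TABLE = {**ALIF_VARIANTS, **DIACRITICS}


def fold_arabic(text: str) -> str:
    """Return a search key with alif variants unified and diacritics removed."""
    # NFKC first turns presentation forms into ordinary letters.
    return unicodedata.normalize("NFKC", text).translate(FOLD_TABLE)


# The two spellings from the request produce the same key.
print(fold_arabic("\u0623\u0645\u0633") == fold_arabic("\u0627\u0645\u0633"))
```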

12:51, 31 January 2011

More diacritics for Arabic:

12:53, 31 January 2011 (UTC)

12:53, 31 January 2011

Tim Starling added a bug for this, and I will add all of the characters you have listed above to it. I told him that we want this behavior in AutoComplete (in the search fields) as well as in the DidYouMean extension, which suggests alternative results when your search doesn't lead to an existing page. Is there anywhere else this should be enabled?

14:37, 31 January 2011


Stephen G. Brown reminded me of Arabic kashida: الكـــــــســـــــــر = الكسر. The symbol is ـ. I also forgot the pairs: ي / ى like in مصري / مصرى. Sorry, I didn't think of these yesterday.

Yes, it's what you're asking. Alternative letters should be treated the same way as Roman words containing any of the characters in Appendix:Variations_of_"a" or similar, so that when you type etre, you see être in the search box. Anatoli 19:21, 31 January 2011 (UTC)
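The kashida and ي / ى points could be sketched like this. It is only an illustration: the function name `fold_kashida_and_yeh` is hypothetical, and folding every ى to ي is an assumption that may be too aggressive for some words.

```python
TATWEEL = "\u0640"       # ARABIC TATWEEL (kashida)
ALIF_MAQSURA = "\u0649"  # ى
YEH = "\u064a"           # ي


def fold_kashida_and_yeh(text: str) -> str:
    """Return a search key without kashida and with ى treated as ي."""
    return text.replace(TATWEEL, "").replace(ALIF_MAQSURA, YEH)


# A stretched spelling matches the plain one.
stretched = "\u0627\u0644\u0643\u0640\u0640\u0640\u0633\u0640\u0640\u0631"
plain = "\u0627\u0644\u0643\u0633\u0631"
print(fold_kashida_and_yeh(stretched) == plain)
```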

19:21, 31 January 2011

Thank you. I tested various Arabic alifs and they seem to work; no luck with the Hindi variants, though. Anatoli 21:41, 31 January 2011 (UTC)

21:41, 31 January 2011

I am pretty sure no modifications have been made yet. The bug was only just submitted, and I understand that the guy who works on search functions for MediaWiki is pretty busy at the moment.

21:43, 31 January 2011

Then it means that the various versions of alif had already been addressed. They could be used as a template, perhaps? Please let me know if you don't understand any of the requirements. I'm not sure I expressed myself well: I wrote between jobs and then late at night. Anatoli 22:09, 31 January 2011 (UTC)

22:09, 31 January 2011

Persian often uses a zero-width nonjoiner (&#x200C;) as in ویکی‌پدیا. People who don’t know how to access it tend to substitute a space: ویکی پدیا. It’s a misspelling, but lots of people can’t help it.

In languages like Khmer and Thai that do not use word spaces, there is often a zero-width space (&#x200B;) as in តើអ្នកនិយាយ​ភាសាអង់គ្លេស​ទេ. More often than not, it is simply left out (តើអ្នកនិយាយភាសាអង់គ្លេសទេ). Both spellings are correct.

I think Anatoli neglected to mention the word-final Arabic pair ه/ة. The final letter ة may be typed as ه.
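The word-final ه/ة pair could be handled along these lines. This is a sketch only: the name `fold_final_teh_marbuta` is hypothetical, and treating "word-final" as "not followed by another letter or digit" is an assumption.

```python
import re

TEH_MARBUTA = "\u0629"  # ة
HEH = "\u0647"          # ه

# A teh marbuta counts as word-final here when no word character follows it.
_FINAL_TEH_MARBUTA = re.compile(TEH_MARBUTA + r"(?!\w)")


def fold_final_teh_marbuta(text: str) -> str:
    """Return a search key in which word-final ة is treated as ه."""
    return _FINAL_TEH_MARBUTA.sub(HEH, text)


# A word typed with ه at the end matches the spelling with ة.
print(fold_final_teh_marbuta("\u0645\u062f\u0631\u0633\u0629") == "\u0645\u062f\u0631\u0633\u0647")
```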

23:05, 31 January 2011

Do you know how MediaWiki currently handles this? We obviously don't want all spaces to be normalized to that character, and we don't want that character to be normalized to a space either (or terrible matches would be made).

23:13, 31 January 2011

No, I don’t know how it works. &#x200C; and &#x200B; definitely should not be removed or changed, but &#x200C; should be seen by the software as the equivalent of a space, and &#x200B; should be seen as the equivalent of nothing at all.
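The rule just described, leaving the stored text alone while comparing on a derived key, might look like the sketch below. The name `search_key` is hypothetical, and collapsing runs of whitespace is an added assumption.

```python
ZWNJ = "\u200c"  # zero-width non-joiner
ZWSP = "\u200b"  # zero-width space


def search_key(text: str) -> str:
    """Build a comparison key; the original text is never modified."""
    # ZWSP counts as nothing at all, ZWNJ counts as a space.
    key = text.replace(ZWSP, "").replace(ZWNJ, " ")
    # Collapse runs of whitespace so the two spellings compare equal.
    return " ".join(key.split())


# The Persian spelling with ZWNJ matches the one typed with a space.
with_zwnj = "\u0648\u06cc\u06a9\u06cc\u200c\u067e\u062f\u06cc\u0627"
with_space = "\u0648\u06cc\u06a9\u06cc \u067e\u062f\u06cc\u0627"
print(search_key(with_zwnj) == search_key(with_space))
```

Because only the key is normalized, ordinary spaces are never turned into ZWNJ and page titles keep their original characters.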

00:07, 1 February 2011


Is there any update? Anatoli 03:56, 4 February 2011 (UTC)

03:56, 4 February 2011

Sorry, yeah, the bug is here if you want to check it out or add to it. There have been no updates from developers on it at this point. It will probably take a while; they are busy people.

03:18, 7 February 2011

Thank you! --Anatoli 22:34, 7 February 2011 (UTC)

22:34, 7 February 2011