Persian often uses a zero-width non-joiner (U+200C, &#x200C;), as in ویکی‌پدیا. People who don’t know how to type it tend to substitute a space: ویکی پدیا. It’s a misspelling, but lots of people can’t help it.
In languages like Khmer and Thai that do not use word spaces, there is often a zero-width space (U+200B, &#x200B;) between words, as in តើ​អ្នក​និយាយ​ភាសា​អង់គ្លេស​ទេ. More often than not, it is simply left out (តើអ្នកនិយាយភាសាអង់គ្លេសទេ). Both spellings are correct.
Do you know how MediaWiki currently handles this? We obviously don't want all spaces to be normalized to either of these characters, and we don't want these characters to be normalized to a space either (or terrible matches would be made).
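To make the question concrete, here is a rough Python sketch of one possible comparison rule. It is only an illustration under my own assumptions, not a description of what MediaWiki actually does: `search_key` and `invisible_marks` are hypothetical names, and the choice to drop U+200B while leaving U+200C and ordinary spaces untouched is just one option that respects the constraints above.

```python
# Hypothetical sketch, not MediaWiki's actual behaviour.
ZWNJ = "\u200c"  # zero-width non-joiner (Persian)
ZWSP = "\u200b"  # zero-width space (Khmer, Thai)


def search_key(text: str) -> str:
    """Build a comparison key by dropping ZWSP only.

    Ordinary spaces and ZWNJ are left alone, so neither is
    ever turned into the other.
    """
    return text.replace(ZWSP, "")


def invisible_marks(text: str) -> list[str]:
    """List the code points of spaces, ZWNJ and ZWSP found in text."""
    return [f"U+{ord(ch):04X}" for ch in text if ch in (" ", ZWNJ, ZWSP)]


if __name__ == "__main__":
    with_zwsp = "\u178f\u17be\u200b\u1791\u17c1"   # two Khmer words joined by ZWSP
    without_zwsp = "\u178f\u17be\u1791\u17c1"      # same words, ZWSP left out
    print(search_key(with_zwsp) == search_key(without_zwsp))
    print(invisible_marks("a\u200cb c"))
```

With this rule the two Khmer spellings compare equal, while the Persian form with a space and the form with U+200C still compare as different strings. Whether those two should also match is exactly the open question.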