User:Tbm/Reports/QA tools to improve the quality, reliability, and consistency of Wiktionary

I received a Rapid Fund grant to work on QA tools to improve the quality, reliability, and consistency of Wiktionary.

Results

1) Created a Python module to interact with and modify entries on the English Wiktionary. It provides a set of building blocks that can be refined further for future work.

2) Created a set of Python functions to extract hyphenation information from Wiktionary and to handle the different hyphenation rules of several languages (such as German and Hungarian). I then created a list of words whose hyphenation does not match the spelling of the word itself, and invited editors working on those languages to fix the identified issues.

3) Created Python code to represent basic information about Yiddish words (such as noun genders and plurals) from the English and Swedish Wiktionaries. This served as the basis for scripts that compare the information in the two projects to find discrepancies.

4) Implemented a number of consistency checks for Yiddish entries on the Swedish Wiktionary, for example to make sure that the headword given in an entry matches the page title, and that gender information expressed in different ways within an entry agrees.

5) Created a Python module to work with data from a Swedish-Yiddish dictionary published by the Swedish Institute for Language and Folklore (ISOF). The dictionary is available in a machine-readable format (JSON) under the CC0 license, which makes it suitable for use on Wiktionary. I also created scripts to compare this information with Yiddish entries on the English and Swedish Wiktionaries.
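Item 1 above does not describe the module's interface. The following is a hedged illustration of one such building block, not the author's code. It pulls a single language section out of a page's wikitext. It relies on the English Wiktionary convention that each language begins with a level-2 heading such as `==Yiddish==`. The name `language_section` is hypothetical. Fetching the wikitext, for example through the MediaWiki API, is left out.

```python
import re
from typing import Optional

# A level-2 heading such as "==Yiddish==" opens a language section on the
# English Wiktionary. Deeper headings ("===Noun===") have a third "=" right
# after the first two, so [^=] rejects them.
LEVEL2_HEADING = re.compile(r"^==[ \t]*([^=].*?)[ \t]*==[ \t]*$", re.MULTILINE)


def language_section(wikitext: str, language: str) -> Optional[str]:
    """Return the body of one language section, or None if the page lacks it."""
    headings = list(LEVEL2_HEADING.finditer(wikitext))
    for i, heading in enumerate(headings):
        if heading.group(1) == language:
            start = heading.end()
            # The section ends at the next level-2 heading or at the end of the page.
            end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
            return wikitext[start:end].strip("\n")
    return None


page = "==English==\n===Noun===\nfoo\n\n==Yiddish==\n===Noun===\n{{head|yi|noun}}\n"
print(language_section(page, "Yiddish"))
```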
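Item 2 above mentions language-specific hyphenation rules but does not spell them out. The following is a minimal sketch under stated assumptions, not the report's implementation. It reads the English Wiktionary `{{hyphenation}}` / `{{hyph}}` template, which takes the language code first and then the segments. It then checks whether the segments rejoin into the page title.

Only one rule is modelled here: the Hungarian treatment of doubled digraphs, where *asszony* is split *asz-szony*. The following are not handled:

- German-specific rules
- the trigraph *dzs*
- alternative hyphenations inside one template

All function names are hypothetical.

```python
import re
from typing import Iterator, List, Set, Tuple

# {{hyphenation|lang|seg1|seg2|...}} and its alias {{hyph|...}}; assumes the
# template call contains no nested templates.
HYPH_TEMPLATE = re.compile(r"\{\{\s*(?:hyphenation|hyph)\s*\|([^{}]*)\}\}")

# Hungarian shortens a doubled digraph in writing ("ssz" for sz+sz) but
# writes both halves in full when the word is split: asszony -> asz-szony.
HU_DIGRAPHS = ("sz", "zs", "cs", "gy", "ly", "ny", "ty")


def extract_hyphenations(wikitext: str) -> Iterator[Tuple[str, List[str]]]:
    """Yield (language code, segments) for every hyphenation template."""
    for match in HYPH_TEMPLATE.finditer(wikitext):
        params = [p.strip() for p in match.group(1).split("|")]
        # Keep positional parameters only; named ones (name=value) are skipped.
        positional = [p for p in params if "=" not in p]
        if len(positional) >= 2:
            yield positional[0], positional[1:]


def candidate_spellings(segments: List[str], lang: str) -> Set[str]:
    """All spellings the segments could stand for under the rules modelled here."""
    candidates = {segments[0]}
    for segment in segments[1:]:
        extended = set()
        for prefix in candidates:
            # Plain concatenation; also covers compounds such as kulcs|csomó.
            extended.add(prefix + segment)
            if lang == "hu":
                for digraph in HU_DIGRAPHS:
                    if prefix.endswith(digraph) and segment.startswith(digraph):
                        # asz + szony -> as + szony = asszony
                        extended.add(prefix[: -len(digraph)] + digraph[0] + segment)
        candidates = extended
    return candidates


def hyphenation_matches(title: str, segments: List[str], lang: str) -> bool:
    # Exact comparison, so a stray full stop or a case difference is reported too.
    return title in candidate_spellings(segments, lang)


print(hyphenation_matches("asszony", ["asz", "szony"], "hu"))
```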
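For item 3, a small immutable record per noun is one plausible representation. With such records, comparing the two Wiktionaries reduces to a field-by-field diff. The sketch below assumes that each wiki has already been parsed into a dict keyed by lemma. The class and function names are hypothetical. In the sample data, the Swedish record carries a deliberately wrong gender so that a discrepancy shows up. *בוך* (book) is neuter, with the plural *ביכער*.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class YiddishNoun:
    """Minimal record of what one wiki says about a Yiddish noun."""
    lemma: str
    genders: FrozenSet[str] = frozenset()  # e.g. {"n"} or {"m", "n"}
    plurals: FrozenSet[str] = frozenset()


def compare_sources(a: Dict[str, YiddishNoun], b: Dict[str, YiddishNoun],
                    a_name: str = "en", b_name: str = "sv") -> List[str]:
    """Report fields on which two sources disagree, for lemmas found in both."""
    problems = []
    for lemma in sorted(a.keys() & b.keys()):
        for attr in ("genders", "plurals"):
            left, right = getattr(a[lemma], attr), getattr(b[lemma], attr)
            # An empty field is a gap to fill rather than a contradiction.
            if left and right and left != right:
                problems.append(f"{lemma}: {attr} {sorted(left)} ({a_name}) "
                                f"vs {sorted(right)} ({b_name})")
    return problems


en = {"בוך": YiddishNoun("בוך", frozenset({"n"}), frozenset({"ביכער"}))}
sv = {"בוך": YiddishNoun("בוך", frozenset({"m"}), frozenset({"ביכער"}))}
print(compare_sources(en, sv))
```

Using sets for genders and plurals makes the comparison independent of the order in which each wiki lists alternatives.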
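The exact Swedish Wiktionary markup behind item 4 is not described. The sketch below therefore takes already-extracted strings instead of parsing templates. It assumes that the gender appears in two places within an entry, called `headword_gender` and `table_gender` here (both names hypothetical). It also assumes that alternatives may be written with different separators such as `m/n` or `m, n`.

The Unicode normalization step is a further assumption of this sketch. Yiddish pointed letters such as אַ can be stored either as one precomposed character or as a letter followed by a point. NFC maps both encodings to the same form.

```python
import re
import unicodedata
from typing import FrozenSet, List


def normalize(text: str) -> str:
    # NFC rewrites precomposed Hebrew presentation forms (e.g. U+FB2E, ALEF
    # WITH PATAH) as base letter + point, so both encodings compare equal.
    return unicodedata.normalize("NFC", text)


def parse_genders(value: str) -> FrozenSet[str]:
    # Treat "m/n", "m, n", "m|n" and "m n" as the same set of alternatives.
    return frozenset(part for part in re.split(r"[\s/,|]+", value.strip()) if part)


def check_entry(title: str, headword: str,
                headword_gender: str, table_gender: str) -> List[str]:
    """Return human-readable problems found in a single entry."""
    problems = []
    if normalize(headword) != normalize(title):
        problems.append(f"headword '{headword}' differs from page title '{title}'")
    first, second = parse_genders(headword_gender), parse_genders(table_gender)
    if first != second:
        problems.append(f"gender mismatch: {sorted(first)} vs {sorted(second)}")
    return problems


print(check_entry("טיש", "טיש", "m", "f"))
```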
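Item 5 states that the ISOF dictionary is distributed as JSON, but it does not give the schema. The sketch below therefore uses placeholder field names (`"yiddish"`, `"swedish"`) that may not match the real file. It indexes the records by headword and lists the headwords that have no page on a given wiki. The list of existing page titles is assumed to come from Wiktionary separately. The function names are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def load_dictionary(raw_json: str, key: str = "yiddish") -> Dict[str, List[dict]]:
    """Group dictionary records by headword.

    Assumption: the file is a JSON array of objects and `key` names the field
    holding the Yiddish headword; the real ISOF field names may differ.
    """
    index: Dict[str, List[dict]] = {}
    for record in json.loads(raw_json):
        # A list per headword keeps homographs instead of overwriting them.
        index.setdefault(record[key], []).append(record)
    return index


def missing_from_wiki(index: Dict[str, List[dict]], titles: Iterable[str]) -> List[str]:
    """Dictionary headwords that have no page on the given wiki."""
    existing = set(titles)
    return sorted(headword for headword in index if headword not in existing)


raw = '[{"yiddish": "בוך", "swedish": "bok"}, {"yiddish": "טיש", "swedish": "bord"}]'
print(missing_from_wiki(load_dictionary(raw), ["בוך"]))
```

The same index could feed a field-level comparison like the one sketched for item 3, once the dictionary's gender and plural fields are known.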

Discussions

Metrics

Hyphenation

Misc hyphenation fixes and cleanups

Use template parameters for hyphenation

Merge hyphenation info

Fix incorrect hyphenation info

Fix more inconsistencies with hyphenation: drop full stop

Fix case mismatch in hyphenation

Yiddish

Add gender to noun

Fix order of transliteration

Standardize separation of gender alternatives
