Wiktionary talk:Frequency lists/Hindi 1900

How was this list compiled? The source link is broken.

Found it! [1] Aryamanarora (talk) 20:56, 14 November 2015 (UTC)
No. That link is also broken. Steipe (talk) 03:01, 8 December 2023 (UTC)

Apparently that list was compiled at IIT Kanpur sometime in the early 2000s. I have not been able to find source information, and the list is problematic as an actual frequency list because some words were excluded (corpus linguists sometimes exclude so-called "stop words"). For example, neither और (aur: and) nor ने (ne: ergative case marker) is in the list. An updated list could be based on the IIT Kanpur work by Verma et al., "Shabd: A psycholinguistic database for Hindi" (2022, Behavior Research Methods, 54:830–844). This is at least a stable reference, and the data can be downloaded from the OSF (Open Science Framework): Shabd: Word Frequencies for Hindi Words. Steipe (talk) 03:01, 8 December 2023 (UTC)
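
The stop-word gap described above can be checked mechanically. Here is a minimal sketch in Python, assuming the list has been loaded as a plain sequence of word strings. The function name `missing_function_words` and the sample words are hypothetical and are not taken from the actual list:

```python
def missing_function_words(listed_words, function_words):
    """Return the function words that do not appear in the frequency list."""
    listed = set(listed_words)
    return [word for word in function_words if word not in listed]


# Illustrative sample only (an assumption), not the real contents of the list.
sample_list = ["है", "में", "की", "घर", "पानी"]

# High-frequency function words one would expect in any unfiltered list.
function_words = ["और", "ने", "है"]

print(missing_function_words(sample_list, function_words))
```

If very common function words such as और or ने come back as missing, the list was probably filtered and should not be read as raw corpus frequencies. For real data it may also be worth normalizing the Unicode form of both inputs before comparing, since Devanagari strings can be encoded in more than one way.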