Wiktionary talk:Statistics

"good" and "bad" entries

Can we come up with different adjectives here? I think "good" and "bad" may be significantly misleading to the uninitiated. I had proposed "interesting" and "uninteresting", but Dangherous didn't like them. —scs 00:08, 30 June 2006 (UTC)

How about describing them as "entries with wikilinks" and "entries without wikilinks".
Also, how about removing the "mostly redirects" comment. It used to be true, but I'm not so sure that it is, now. --Connel MacKenzie T C 05:36, 30 June 2006 (UTC)
It's not about wikilinks... is it? They are mostly redirects I guess, but up to now I haven't found any decent description of what is considered an entry and what is not. Since we do have a huge amount of redirects, I expect them to be the majority. Now, "good" and "bad" are the terms that have always been used. Never mind, though, it's just a detail. Call them "empyreal" and "purgatorial" if you like. — Vildricianus 08:46, 30 June 2006 (UTC)

scrunch up a little

For all namespaces after NS:1=Talk, why not just combine two rows into one, and add a "Talk" column for them? (I'm tempted to suggest that all except the subtotals should have Show/hide type auto-hiding.) --Connel MacKenzie T C 05:38, 30 June 2006 (UTC)

The show/hide crap will make it complicated, but feel free to play around with the table. — Vildricianus 08:46, 30 June 2006 (UTC)

Take after the French

This page would be more interesting/helpful if it contained more information in an easier to use format, such as fr:Wiktionnaire:Statistiques. Jade Knight 19:43, 23 October 2006 (UTC)

Spanish and English statistics

Curiouser and curiouser... we now have more Spanish words than English ones. Beobach972 03:15, 24 October 2006 (UTC)

Right - on this iteration I did not exclude the "form of" templates. --Connel MacKenzie 03:16, 24 October 2006 (UTC)


I'm surprised to have gotten so little feedback on the "Detail" section. Perhaps the explanation is clear enough? Honestly, I expected somebody to ask why the numbers (say, for English) don't add up to get "Total definitions." (The answer is that "real definitions" is exclusive of the others, but something can count as an "inflected form" and as "slang" while actually being only one definition line.) I also kind of expected someone to ask why "total language sections" is so much higher than "real definitions" and so very much lower than "total definitions." I guess that is self-evident? --Connel MacKenzie 20:48, 23 May 2007 (UTC)
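The overlap described above can be illustrated with a small sketch. It is not the script that generates the statistics page; the sample lines and the tag names are made up for illustration, and it only assumes that each definition line may carry zero or more tags, with untagged lines counted as "real".

```python
# Hypothetical sample: each definition line is represented by its set of tags.
# An empty set stands for a plain ("real") definition.
definition_lines = [
    set(),
    set(),
    {"inflected form"},
    {"inflected form", "slang"},  # one line, counted under two categories
    {"slang"},
]


def tally(lines):
    """Return (total number of lines, per-category counts)."""
    counts = {"real": 0}
    for tags in lines:
        if not tags:
            counts["real"] += 1  # "real" is exclusive of every other category
        for tag in tags:
            counts[tag] = counts.get(tag, 0) + 1
    return len(lines), counts


total, counts = tally(definition_lines)
print(total, counts)
```

In this sample the category counts sum to more than the total, because the line tagged both "inflected form" and "slang" is counted once per tag but is still only one definition line.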


What does it refer to exactly? Does translingual mean words which are used in more than one language? DaGizza 23:03, 10 January 2008 (UTC)

  • It refers to two main groups of things. 1) Symbols that don't really belong to any language at all (see %). 2) Taxonomic names (some people call them New Latin) that are used across all languages (that use the Roman script) (see Homininae). SemperBlotto 23:11, 10 January 2008 (UTC)

I also used it on CCC, which is an initialism for Chaos Computer Club and works in both English and German; not sure if that was right. Mutante 23:16, 10 January 2008 (UTC)

Language codes

Could we add language codes to this data? I'm going to do so manually right now but it ought to be added to the script that generates this page too. — hippietrail 05:07, 3 February 2008 (UTC)


I've converted vi:Wiktionary:Thống kê to use {{PAGESINCATEGORY:}} for the language breakdown. It'd be a bit more difficult to do that here; for instance, Category:English language doesn't directly contain all English words, so you'd have to add up all the parts of speech. In any event, it'd be a nice extension to the automatically-updated Special:Statistics page. – Minh Nguyễn (talk, contribs) 22:08, 21 May 2008 (UTC)
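A rough sketch of the "add up all the parts of speech" step follows. The category names are illustrative and the counts are invented; `language_total` is a hypothetical helper, not part of MediaWiki. A page that sits in more than one part-of-speech category would be counted more than once, so the result is an upper bound rather than an exact page count.

```python
def language_total(page_counts, language, parts_of_speech):
    """Sum per-category page counts for one language.

    page_counts maps a category name to its page count (hypothetical data here);
    categories missing from the mapping contribute 0.
    """
    return sum(
        page_counts.get(f"{language} {pos}", 0) for pos in parts_of_speech
    )


# Made-up numbers, for illustration only.
sample_counts = {
    "English nouns": 1000,
    "English verbs": 400,
    "English adjectives": 250,
}

english_total = language_total(
    sample_counts, "English", ["nouns", "verbs", "adjectives", "adverbs"]
)
print(english_total)
```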

Statistics update

Is this supposed to be updated so rarely? The last dump is 50 days old. --Vahagn Petrosyan 20:49, 4 March 2009 (UTC)

I could be wrong, but I think the question is one of responsibility. Connel took care of this page for a long time, but he has been mostly absent of late and has not been doing the updates. Conrad did it a few times, and certainly has access to fresh dumps. I suggest you nag him. -Atelaes λάλει ἐμοί 20:55, 4 March 2009 (UTC)

please tell me...

What are "Form-of" definitions, and why has Mandarin only got 80 of them? Can someone please leave a message for me on my talk page about it? Cheers Tooironic 13:45, 21 November 2009 (UTC)

A "form of" definition consists of an entry that is defined solely as being a "form" of another word. For example, each English noun has a plural "form", and each English verb has a past, past participle, and present participle "form". A Latin verb may have over 100 "forms" (see the links in the inflection table at amō, for example). I suspect Mandarin doesn't have very many "form-of" entries because Mandarin verbs have only a single form, which is the main entry form. "Form-of" entries exist primarily in languages that conjugate their verbs or inflect their nouns and adjectives. --EncycloPetey 16:58, 21 November 2009 (UTC)

Gloss definitions

What is meant by "gloss definitions"? - -sche (discuss) 10:21, 8 February 2013 (UTC)

I think it's a definition that is not a "form-of" definition. Maro 18:46, 15 February 2013 (UTC)
See here: gloss. It'd be good to add this link to the table header: [[gloss#Noun 2|gloss]]

Fix grammar

"requests for definitions, this may divide things incorrectly"

This is a comma splice. Please change the comma to a semicolon or add "and" before "this." 2001:18E8:2:1020:1463:E53C:61CD:5659 15:37, 13 June 2013 (UTC)

English lemmata

In June of 2012, Ruakh counted how many English lemmata Wiktionary covered in three different ways. See here. "Approach 1 gave 298,322; Approach 2 gave 299,516" and approach 3 (which lumped different parts of speech together, rather than considering them separate lemmata) gave 133,470. - -sche (discuss) 04:51, 30 August 2013 (UTC)

How does Latin have more entries and definitions than English?

How does a long-dead foreign language get more stuff here than the current, more widely used, actual language of this wiktionary?- 00:20, 17 June 2014 (UTC)

Latin words have loads of inflected forms. — Ungoliant (falai) 00:21, 17 June 2014 (UTC)
Thanks! :)- 01:30, 17 June 2014 (UTC)
I prefer using the gloss definitions column as a measure of how much content we have in a given language. The entries and definitions columns are heavily biased towards languages with complex inflection. Poor English, with its 4~5 inflected verb forms, stands no chance against Latin, which has over 100. — Ungoliant (falai) 01:36, 17 June 2014 (UTC)
Maybe the gloss definitions column should be first one or should be given prominence in some other way. --Vahag (talk) 08:23, 17 June 2014 (UTC)
I support that idea. If no one objects I’ll change the format for the next dump. — Ungoliant (falai) 13:10, 17 June 2014 (UTC)
No objection, but if "gloss definitions" is moved to come after "definitions", the latter should probably be renamed "total definitions" in the interest of clarity. Actually, as long as things are being changed around, could you also put a 1 or something after gloss definitions, so it can be linked to an explanation like this? Given that even I who edit this dictionary had to ask what the term meant, the number of passersby who know what it means is probably small enough to make it worth a footnote. - -sche (discuss) 15:23, 17 June 2014 (UTC)
While we’re at it, if there is any other layout change anyone wants to propose, speak up. I’m thinking of moving the data of appendix defs/entries to the same columns as the non-appendix data, since most languages have 0 anyway. — Ungoliant (falai) 15:43, 17 June 2014 (UTC)
Now that we have categories for every language called "Foo lemmas" and "Foo non-lemma forms", maybe the number of pages in each of those categories for each language could be added to the table. —Aɴɢʀ (talk) 20:35, 21 December 2014 (UTC)

Translation statistics

I’ll be keeping translation statistics at this page. — Ungoliant (falai) 15:54, 28 July 2015 (UTC)

I'm gonna bookmark that :) —Aryamanarora (मुझसे बात करो) 22:05, 8 December 2015 (UTC)
Good stats, thanks! Russian at #2, after Finnish (60,823 translations). Not bad at all! --Anatoli T. (обсудить/вклад) 23:13, 8 December 2015 (UTC)
Finnish is a surprise to me - and then there's Hindi, somewhere in the 40's. —Aryamanarora (मुझसे बात करो) 21:39, 3 January 2016 (UTC)

Statistics on Sindhi language

The information on Sindhi language is NOT correct even as of 2-12-2015. There were more than 1000 definitions in Sindhi wiktionary on that date. Please fix the error.

Aursani (talk) 09:57, 21 December 2015 (UTC)

This information is about English Wiktionary only. — Ungoliant (falai) 13:50, 21 December 2015 (UTC)

Statistics on lemmas and non-lemmas

I think it would be useful if the statistics included measures of how many lemma and non-lemma entries have been created or removed. Right now there is only a generic "entries" column, but that includes all entries, and I don't know if it distinguishes cases where a new lemma POS section has been added to a page that already has a section for the current language. That is what I would consider an "entry"; a single page can have multiple entries in one language. —CodeCat 21:35, 22 February 2016 (UTC)

Lemmas pie chart

Numbers from subcategories of Category:Lemmas by language, code copied from mw:Extension:Graph/Demo/CategoryPie:

The chart updates automatically. Would it make sense to add this to the page? --Yair rand (talk) 04:52, 24 February 2016 (UTC)

Why is it that, e.g., Spanish has 47,817 lemmas and German has 42,014, but Spanish doesn't show up on the chart? DTLHS (talk) 04:57, 24 February 2016 (UTC)
Hm. Might be an API limitation. It seems to be ignoring all languages past the first 500 in the list. I'll go ask the author of the chart template if there's any way to fix it. --Yair rand (talk) 05:07, 24 February 2016 (UTC)
Apparently it can't find more than 500 subcategories at a time, and it can't automatically just get the largest categories. I've changed it to a manual list of the largest 150. Unfortunately, this won't automatically add in new languages that enter the top 150. --Yair rand (talk) (not logged in) 14:34, 24 February 2016 (UTC)
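
For comparison, here is a minimal sketch of how a script outside the chart template might get around both limits: it keeps requesting batches until no continuation token is returned, then picks the largest categories. This is not the chart template's code. `fetch_page` is a hypothetical callable standing in for whatever paged source is used, and the data below is fabricated so the example runs without network access.

```python
import heapq


def fetch_all(fetch_page):
    """Collect every (name, size) pair from a paged source.

    fetch_page(token) is a hypothetical callable returning (batch, next_token);
    next_token is None once there are no further pages.
    """
    items, token = [], None
    while True:
        batch, token = fetch_page(token)
        items.extend(batch)
        if token is None:
            return items


def largest(items, n):
    """Return the n pairs with the biggest size, largest first."""
    return heapq.nlargest(n, items, key=lambda pair: pair[1])


# Fabricated stand-in for the category list: 1,200 categories, 500 per page.
DATA = [(f"Language {i} lemmas", i) for i in range(1, 1201)]
PAGE_SIZE = 500


def fake_fetch_page(token):
    start = token or 0
    end = start + PAGE_SIZE
    next_token = end if end < len(DATA) else None
    return DATA[start:end], next_token


top = largest(fetch_all(fake_fetch_page), 150)
print(len(top), top[0], top[-1])
```

Run periodically, a script along these lines would pick up new languages entering the top 150, at the cost of no longer updating live on the page.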