I use the word "most" much more often than I use the word "shall", but that's not how they're ranked in the Gutenberg texts. I suspect the language is changing enough that "should" is more common, and I think I use "most" more often than that. Inevitably, there will be conflicts over what the 100 most common words in the English language actually are; I'm not sure that Gutenberg got it completely right.
These categories really need an explanation at the top. I would have had no idea that this was from analyzing Project Gutenberg books if there had not been the previous comment above. Also, why are there 101 words in this cat? --Cromwellt|Talk|Contribs 15:48, 28 September 2006 (UTC)
Oh, and it depends on what the criteria are. If we are referring to the most common in print, that will be different from the most common in conversation, and both will be different from the most common in school books as well. The link at the top of this cat goes somewhere else, and I can't find the correct target. --Cromwellt|Talk|Contribs 16:09, 28 September 2006 (UTC)
Here's an alphabetized version of the LT's list. As you can see, the current contents of this category don't match it very closely...
I a about all an and are as at back be been before big but by call came can come could did do down first for from get go had has have he her here him his if in into is it just like little look made make me more much must my new no not now of off old on one only or other our out over right said see she so some that the their them then there they this to two up want was we well went were what when where which who will with you your
I suggested a feature be added to User:AutoFormat to ensure this and categories like it don't get modified; that'd help... JesseW 09:33, 7 February 2009 (UTC)
This discussion is no longer live and is left here as an archive.
This is outdated, and there are not 100 words in it! Below is a copy of the discussion, from Wiktionary:Requests for cleanup, more than 2 years ago. I'd strongly favor this information in appendixes.
First of all, there are 101 words in there. Secondly, I often see a word that ranks somewhere over a hundred in Gutenberg but is in this category. Third, there are so many of those lists around that I do not know which one to choose. henne 17:09, 11 January 2007 (UTC)
This was a list of words created and designated by THEM, and is not based on what words are most common. It's a "starter vocabulary", and the equivalents of these words are deemed to be a good starting point for a new Wiktionary project. --EncycloPetey 06:22, 14 January 2007 (UTC)
My analysis of Project Gutenberg (as a corpus) has no relation to this person's project. I find them interesting in comparison to each other, as well as to the other Frequency lists we have.
Perhaps if I had actually compared them in earnest, I would have noticed (before now) that it links to a copyrighted site that has a no-commercial-reuse clause. So this should move from WT:RFC to WT:RFDO. :-(
Are we talking about the UK National Literacy Trust list of 100 basic words? Do we think they are out to sue us for breaking copyright? There might be a reason for WMF (or us) to convince them to use one or more of the licenses that would make it automatic for us to keep it. It is safe to assume that they haven't copyright on the category name. Keep category; find or compile a copyright-acceptable list of basic words. Perhaps the vast technical resources of Simple English Wiktionary can help. Or we could borrow/link from them. DCDuringTALK 16:15, 13 November 2009 (UTC)
It could be replaced with the top 100 Gutenberg words by frequency. We should remove copyvios regardless of what we think the other organization will do. If they do allow it in the future, it can always be added back. --Bequw → ¢ • τ 17:31, 14 November 2009 (UTC)
The 100 most common words according to Gutenberg are mostly pronouns and prepositions. That list would not be too useful, and I believe we already have it as an appendix. -- Prince Kassad 18:06, 14 November 2009 (UTC)
Keep unless this is an indubitable copyvio. --Dan Polansky 21:04, 15 November 2009 (UTC)
Keep per Dan Polansky. Razorflame 20:17, 16 November 2009 (UTC)
Keep & RFC These words were added by the long-gone Conan (and his Bot). The original link on the category page was not to the UK site but to here (a cached copy of the now-dead original page). But oddly, the words in this category don't match up to either published list. So there's no copyright violation going on. It leaves us with the fact that we don't know how these words were chosen, which is why I think they should be sent for cleanup, not deletion. --Bequw → ¢ • τ 20:37, 17 November 2009 (UTC)