User talk:Emi-Ireland

Welcome!

Hello, welcome to Wiktionary, and thank you for your contributions so far.

If you are unfamiliar with wiki editing, take a look at Help:How to edit a page. It is a concise list of technical guidelines to the wiki format we use here: how to, for example, make text boldfaced or create hyperlinks. Feel free to practice in the sandbox. If you would like a slower introduction we have a short tutorial.

These links may help you familiarize yourself with Wiktionary:

  • Entry layout explained (ELE) is a detailed policy documenting how Wiktionary pages should be formatted. All entries should conform to this standard. The easiest way to start off is to copy the contents of an existing page for a similar word, and then adapt it to fit the entry you are creating.
  • Our Criteria for inclusion (CFI) define exactly which words can be added to Wiktionary, though it may be a bit technical and long-winded. The most important part is that Wiktionary only accepts words that have been in somewhat widespread use over the course of at least a year, and citations that demonstrate usage can be asked for when there is doubt.
  • If you already have some experience with editing our sister project Wikipedia, then you may find our guide for Wikipedia users useful.
  • The FAQ aims to answer most of your remaining questions, and there are several help pages that you can browse for more information.
  • There is a glossary of our technical jargon, as well as some hints for dealing with the more common communication issues.
  • If you have anything to ask about or suggest, we have several discussion rooms. Feel free to ask any other editors in person if you have any problems or questions, by posting a message on their talk page.

You are encouraged to add a BabelBox to your userpage. This shows which languages you know, so other editors know which languages you'll be working on, and what they can ask you for help with.

I hope you enjoy editing here and being a Wiktionarian! If you have any questions, bring them to the Wiktionary:Information desk, or ask me on my talk page. If you do so, please sign your posts with four tildes: ~~~~ which automatically produces your username and the current date and time.

Again, welcome! Chuck Entz (talk) 01:54, 27 May 2014 (UTC)

Category boilerplate

Would you please put this on the other categories you created too? Keφr 07:50, 14 August 2014 (UTC)

Keφr, thank you. I did as you suggested. Please let me know if you see any other things that need to be corrected. I'm a newbie and want to make sure I'm not making a mistake that later will have to be rectified for hundreds of words. I hope you see this; I am trying to figure out where to answer you. Emi-Ireland (talk) 21:46, 14 August 2014 (UTC)

I think it's commendable that you want to be so cautious. Not everyone is, and that can cause frustration sometimes. For now I think it's best if you look to the existing things (entries, categories) for other languages, and then copy it and modify it to suit your language. For example, if you ever want to create Category:Wauja numerals, then you can look at the contents of Category:English numerals for an example, and copy the code that is there while changing the language code. —CodeCat 21:56, 14 August 2014 (UTC)

Thank you, CodeCat. That was useful advice. I have started a numerals category. Emi-Ireland (talk) 03:41, 15 August 2014 (UTC) By the way, does anyone know why when I mouse over "Wauja" on this page: https://en.wiktionary.org/wiki/Appendix:Cardinal_numbers_0_to_9 , the alt text says "this page does not exist"? Emi-Ireland (talk) 04:01, 15 August 2014 (UTC)

The same reason the link is red: no one has created a dictionary entry for the term Wauja. Chuck Entz (talk) 04:28, 15 August 2014 (UTC)

I did create an entry yesterday: https://en.wiktionary.org/wiki/wauja#Wauja. Maybe I did not code it properly. Or perhaps it needs time to update across wiktionary? Emi-Ireland (talk) 15:18, 15 August 2014 (UTC)

I don't think you should be leaving extra information on category pages. It's unlikely that someone will notice them there. It's better to put the information in the entries themselves, or in an appendix page. —CodeCat 15:22, 15 August 2014 (UTC)
First of all, Wiktionary is case sensitive, so edits to wauja have no effect on links to Wauja. Secondly, the point of the link provided by the template is to refer people to the English-language entry on the language, if there is one, in case they've never heard of the language. Chuck Entz (talk) 17:42, 15 August 2014 (UTC)

Links to categories

Start the link with a colon: Category:Wauja numerals. — Ungoliant (falai) 15:30, 15 August 2014 (UTC)

Thank you for answering my question before I even had time to ask it! Also, thanks very much for advising me about where to put various types of information. I want to make sure I understand your comment. Which extra information are you referring to? Are you referring to the Usage Notes on Wauja Numerals I posted about 1 minute ago on this page: https://en.wiktionary.org/wiki/Category:Wauja_numerals ? The Wauja have only about a dozen cardinal numbers. For each entry, I wanted to include a link back to the general usage notes on numbers. If this is not the correct place for me to ask such questions, please let me know. Emi-Ireland (talk) 15:36, 15 August 2014 (UTC)

If you want to add the same usage note to many pages, it would be best to create a template for them. You can then transclude the template onto many pages easily. —CodeCat 15:38, 15 August 2014 (UTC)
Information like that is usually present in the Appendix namespace, for example: Appendix:Portuguese nouns, Appendix:English verbs. It’s uncommon for categories to have any content other than the catboiler template and the entries (which are listed automatically). — Ungoliant (falai) 15:43, 15 August 2014 (UTC)

Thanks for the helpful advice. I will move the explanatory content from the Category page to an appendix page. Regarding using a template: is it considered bad form to include the same info on multiple pages? Because the explanation of how the Wauja use numbers is a fairly long paragraph. Should I simply put that paragraph on an appendix page and link to it from all the cardinal number lemma pages? Or is it OK to have the same thing appear on every cardinal number lemma page? Thanks very much for your advice. Emi-Ireland (talk) 15:49, 15 August 2014 (UTC)

That’s up to you. Here’s an example of what CodeCat is saying: {{U:en:who and whom}}. It is used in the pages who, whom, whoever and whomever. — Ungoliant (falai) 15:53, 15 August 2014 (UTC)

Thank you for the example, Ungoliant. Yes, I think I will make a template. The Wauja are much less reliant on numbers than are speakers of English, and that needs to be put in context. Their language is very rich and complex, but in their daily life, they traditionally had little need for numbers above five. During the past generation, however, they have been brought into the Brazilian cash economy to some degree, and this has suddenly created a linguistic need for using and understanding numbers in the hundreds and thousands. It will be interesting to see how the young generation of Wauja handles this. Emi-Ireland (talk) 16:47, 15 August 2014 (UTC)

yanumaka

Friendly advice, per WT:WWIN "Wiktionary is not Wikipedia". Specifically, a lot of the content you added in this entry (particularly the culture section) does not belong in a dictionary. It may be of use on Wikipedia though. :) User: PalkiaX50 talk to meh 21:03, 17 August 2014 (UTC)

I understand. I'll remove it. Certainly the history of jaguars in relation to the community should go elsewhere. But it seems there are some gray areas here. I think that to understand the meaning of yanumaka, the Wauja word for jaguar, it is relevant to know that the jaguar is a symbol of inherited chiefly rank. It is not a definition of the word, but a symbolic association that is so important that jaguar ornaments are used to convey a message to all who visit the village that a certain man is the chief. In other words, the Wauja word has strong symbolic associations that the English word does not. These symbolic associations are not literal definitions of the word. Nonetheless, I understand the need to respect the culture of Wiktionary, and will remove all the historical and cultural material. If I put the cultural material in Wiktionary, is there any way to keep it in one grouping by culture, comparable to a grouping by language in Wiktionary? Thanks. Emi-Ireland (talk) 22:05, 17 August 2014 (UTC)

Would a definition like this work for yanumaka? # {{label|wau|figuratively}} chief, leader —CodeCat 22:12, 17 August 2014 (UTC)

Well, that's an interesting question. The problem is, you would never say, "He's a yanumaka." or "He's like a yanumaka." Grammatically, you could construct such a sentence, but it wouldn't make sense to a Wauja. That is simply not a way the word can be used. However, Wauja can, and do, sing verses that are ostensibly about jaguars but that everyone listening knows refer directly to the chief and no one else. I think I need a separate space, outside Wiktionary, where I can elaborate on such things. The problem is, I can put up a site in WordPress, but after I'm gone, it will not persist. I guess a book would make the record permanent, but the Wauja themselves (and indigenous people generally) use the Internet much more than books. Also, the things I am removing from yanumaka are the very things that the Wauja themselves would think were important and want to keep in. I do understand, however, that Wiktionary can't be everything to everyone. Is there a way to use Wikipedia as a place to record ethnographic information about the Wauja in English, Portuguese, and Wauja? The idea is to have the information equally accessible to Wauja and non-Wauja. Academic ethnographies printed in books are not nearly as accessible as web pages that are maintained by a community. Emi-Ireland (talk) 22:46, 17 August 2014 (UTC)

Such poetic allusions happen in every language, even English. Medieval European texts are full of such symbolism. So is Shakespeare for that matter, and many religious texts. But even modern texts are full of references that are assumed understood. Understanding them requires understanding the context, the culture in which it was written. Because we live in the culture they're written in, we don't even realise that they might be incomprehensible to someone else, because it's obvious to us. I don't know if that means it's lexically significant to include it in a dictionary entry, though. —CodeCat 22:38, 17 August 2014 (UTC)

One thing occurs to me: for endangered languages, Wiktionary is an important linguistic and cultural revitalization tool. Could there be a way (for endangered languages, at least) to include cultural and historical information that users could hide if they preferred not to see it? Endangered languages have so little content written in them or about them; it's mostly bibles or inscrutable linguistic treatises by non-native speakers.

My current plan is to get a thousand words in the Eng/Wauja site, and then start setting up Eng/Port, and keep adding to both sites. Next year, I will be visiting the community and training young people to work on their own incubator site, all in Wauja. If historic and cultural information cannot be included in the English/Wauja and Portuguese/Wauja sites, could it perhaps be included in the all-Wauja site? Because that would be of much greater benefit to the Wauja community. Thanks for any ideas you can offer. Emi-Ireland (talk) 22:46, 17 August 2014 (UTC)

You might check out Wikibooks. I hope my edit didn't come as too much of a shock, but I wanted to show what a typical dictionary entry would be like. There's room for some of the stuff I removed to be incorporated into the definition or perhaps a usage note, but the important thing to remember is that dictionaries are about the words, not the things the words refer to. Chuck Entz (talk) 22:47, 17 August 2014 (UTC)
To elaborate: I think you should have the bulk of your information at wikibooks (www.wikibooks.org), and have the Wiktionary entries on the words link to it, just as I put a link to the Wikipedia entry. Chuck Entz (talk) 22:57, 17 August 2014 (UTC)
OK, will do. Thanks for taking the time to explain to me what I need to do. In looking at my other entries, it appears I was filling in this additional information only for animals. I didn't do it for other kinds of terms. I will be mindful of that going forward. I would be grateful if you looked at some of my other entries (non-animal lemmas) and let me know if they are OK. I want to make sure I am on the right track going forward. Emi-Ireland (talk) 23:15, 17 August 2014 (UTC)

Thanks for providing an example. Is it considered bad form to provide external links to more information? Emi-Ireland (talk) 22:55, 17 August 2014 (UTC)

It's ok, within reason. Links to sister projects, including Wikibooks, are routine. As for other sites, subject-matter links should be used sparingly, but are ok as long as they don't dominate the entry. Since we're so limited in our scope, it can be helpful to link to more comprehensive sources. I removed your references mostly because they seemed to be for the purpose of backing up the encyclopedic content rather than as sources for further information. Since Wiktionary is usage-based, rather than authority-based, there's normally no need to include references for definitions, except examples of usage. With less-documented languages such as this, though, there may be no usage in citeable form, so a dictionary will do. It's also routine to link to authoritative dictionaries, lexicons, etc. to give more information on a term. The only external links that are inherently bad form are to commercial and ideological sites that are trying to sell something or to push an agenda. Chuck Entz (talk) 23:44, 17 August 2014 (UTC)

Regarding Shakespeare, you are right, all languages and cultures make use of symbolism. I guess I would say that if the cultural context is radically different, you need more explanatory notes. You need lots of notes to read Chaucer, even more for Beowulf. Without explanatory notes, they don't make much sense. Emi-Ireland (talk) 22:51, 17 August 2014 (UTC)

But "different" is relative, and Wiktionary tries to be agnostic about that. It's aimed at English speakers, but not necessarily people born and raised in Anglo-Saxon-Norman culture. People in India will make very different allusions and references that a speaker in Scotland might not understand, despite it being English. —CodeCat 23:16, 17 August 2014 (UTC)

That's true. Thanks for the feedback. I'm glad you guys pointed out your concerns before I had gone too far down that path. Emi-Ireland (talk) 23:29, 17 August 2014 (UTC)

Some notes

I made some changes to -naun.

  1. I wrapped the definition in the template {{n-g}}, which is short for {{non-gloss definition}}. This template should be used when the definition is not actually a meaning but more of a description of what something is or how something is used.
  2. I also changed the headings; according to WT:ELE (the main policy for entry layout), headings should only have the first word capitalized, and certain headings should be level 4 (with ====).
  3. I added a cat2= parameter to the {{head}} template. This simply tells it to add a second category to the entry. In this case I've added Category:Wauja inflectional suffixes, which seems fitting because this is apparently a plural suffix. There are also categories for other types of suffix depending on what kind of words they form, such as Category:English noun-forming suffixes.

I also have some notes about the long list of related terms.

  1. Words that are actually derived from something use the "Derived terms" heading instead.
  2. For suffixes, you can also use this instead of listing all the terms manually. On a line right below the "Derived terms" header, write: {{suffixsee|wau}}. This will automatically show a list of all the pages in Category:Wauja words suffixed with -naun, which is much easier than having to maintain the list manually.
  3. More importantly though, if this is simply an inflectional suffix, then it is probably better not to use a category for it at all. We don't put all the English plural noun forms into Category:English words suffixed with -s; just imagine how big such a category would be! So what I would recommend is to use this on the entry yamukunauntope: {{affix|wau|yamukunaun|t1=children|-tope|t2=all, every}}. After all the word was formed by adding -tope to the plural, not by first creating the plural. The plural already existed before.

CodeCat 02:12, 12 November 2014 (UTC)

Yes, you are right on all counts. This is very helpful. I will remove the list of words and put it in my personal "words to post" list. Then the words will show up on the appropriate category page. The list is somewhat useful (unlike a list of English words suffixed with -s) because only a few categories of nouns can take plural at all. Emi-Ireland (talk) 02:55, 12 November 2014 (UTC)

I changed the Etymology for yamukunauntope as suggested. Makes more sense this way. People can still find the singular if they care to do so. Emi-Ireland (talk) 03:00, 12 November 2014 (UTC)

An alternative you could try is, instead of categorising the plural forms, categorise the singular. Many languages divide lemmas of words based on the type of inflected forms they take. For example, there's Category:Dutch nouns with plural in -en, Category:Finnish kala-type nominals or Category:Latin second conjugation verbs. So if you think it's better, you can make a category like Category:Wauja nouns with plural in -naun or if you don't need to be specific, Category:Wauja nouns with plural. There is no ready-made category for this, so you'll have to add the parent categories yourself if you create it. —CodeCat 03:04, 12 November 2014 (UTC)

Thanks, that sounds like a very good idea. I noticed that some of my -naun plurals (seen here https://en.wiktionary.org/wiki/Category:Wauja_non-lemma_forms) are not showing up on the page "Wauja words suffixed with -naun" (at https://en.wiktionary.org/wiki/Category:Wauja_words_suffixed_with_-naun). Shouldn't they be listed there, as with any other suffix? Emi-Ireland (talk) 03:18, 12 November 2014 (UTC)

They don't just magically appear in such a category. You have to use a template that's designed to add them. That's what {{suffix}} or {{affix}} or {{prefix}} does. Quite often, half of what templates do is add entries to categories. Some of them, such as {{head}}, can add quite a variety of categories, depending on the parameters, while others are very simple and straightforward. Chuck Entz (talk) 03:34, 12 November 2014 (UTC)
Thank you, Chuck, that's what I needed. I put in an etymology for amunaunaun, and it did the trick. Emi-Ireland (talk) 03:50, 12 November 2014 (UTC)

Question about adding a parameter to create a footer link to the other suffix category

One last question tonight -- I like yamukunauntope much better now with your edits. It's cleaner and more concise. However, at the bottom of the page, there's a link to "Wauja words suffixed with -tope," but no longer a link to "Wauja words suffixed with -naun". This word has two suffixes and it's good to keep track of that. Some words have three or more suffixes and it will be interesting to monitor which can combine with which, in what sequence, and which are never seen together. Is there some way I can add a parameter to one of the templates on the page that will not add anything to this entry except a link at the bottom: "Wauja words suffixed with -naun"? Thanks for all your advice. It is a challenge to keep up! Emi-Ireland (talk) 04:40, 12 November 2014 (UTC)

There is no clear answer to this because it depends on how you interpret "suffixed with". So far, most people on Wiktionary have interpreted it to mean "this word was created by adding this suffix". In the case of yamukunauntope, the suffix -naun was already there, so it wasn't created by adding it. Therefore it's not added to the category. There are probably some people who think that the categories should show all the affixes that a word contains rather than only those which were used to create the word from its parts. So this is really something that might need a Beer Parlour discussion first as there are probably a lot of arguments in favour of one approach or the other, and also many possible ways to solve it. —CodeCat 14:46, 12 November 2014 (UTC)

Miscellanea

  • aminya — 6K worth of examples is overkill, and the first definition suggests that this is a particle, not an adverb.
  • Category:wau:Obligatory Possession — category names of this form are reserved for topical, not grammatical categories.
  • Please stop making links to nonexistent sections in your edit summaries.
  • What kind of reference is '"Aminya pagatapai" written by Tukupe Waura, Facebook IM with E. Ireland, 10/09/2014.'?

Keφr 13:11, 31 December 2014 (UTC)

Thanks for your comments, as always.

  • Re: examples for entry aminya: I normally use fewer examples. In the case of this word, there is not a ready gloss in English. It can be translated many ways. I will look through the examples and see which, if any, are redundant and can be deleted or used for another lemma, instead. An important goal in contributing to Wiktionary is language sustainability for an endangered language and the people who speak it. For the grandchildren of today's fluent speakers, thorough documentation via examples is critical. Entries without examples are of minimal value, at best. From the perspective of endangered languages, the problem is not that there are too many documented examples, but that there are too few. Native North Americans who are working on language revitalization in their communities have impressed upon me that detailed examples, and variety of examples, are what is needed, along with good audio recordings. Your comment, however, suggests that "too many examples" is indeed a problem for Wiktionary admins, and I want to understand your concerns and abide by community norms. I had not considered, for instance, that 6K might be seen as a problem. Is it a server load issue? I understand that millions of 6K increments add up. In any case, I will make sure that my examples are not redundant.
    • Have you looked into using something like {{+obj}}? There is no documentation for it yet, but the syntax looks like {{+obj|wau|infinitive|means=the negated activity}} = [+ (infinitive) = the negated activity]. I think it would show the grammatical structure much more clearly and concisely than a series of examples (which for the reader, especially one not familiar with the language, essentially means guesswork) could. Keφr 06:07, 1 January 2015 (UTC)
  • Re: adverb vs. particle: aminya is hard to pin down. I would say it's not a particle, since a particle is "a minor function word that has comparatively little meaning and does not inflect, in particular." This is not the case with aminya, a negator, which completely changes the meaning of the associated verb clause, and also makes the statement imperative or conditional. The closest English equivalents are "not" (an adverb) and "don't", "shouldn't", "mustn't", etc. (all verb constructions). It cannot be a verb, as it does not inflect. Of course, lack of inflection is not a property exclusive to particles. I will ask Alexandra Aikhenvald, an authority on Arawak languages, how she advises that I label it.
    • I think words that determine a verb's mood (especially in a visibly grammatical way) are typically labelled particles, since this quality is not typical of adverbs (even though you could argue that words like probably denote some kind of modality). English "mustn't", "shouldn't" and "don't" are contractions of an auxiliary verb and a particle. Maybe this is a "predicative" (I saw this part of speech proposed once for some language, it might have been Slovene)? Maybe a defective verb? How similar is this word to other (words that can doubtlessly be called) adverbs? Does it occupy the adverb slot in standard word order? Can it modify an adjective? Does it even make sense to apply "Indo-European" part of speech classification to Wauja? And so on. Keφr 06:07, 1 January 2015 (UTC)
  • Re: request not to use "obligatory possession" as a category: OK, I'll abide by that. I was under the impression that I could create custom categories, but apparently I was mistaken. That being the case, is there any way that I can somehow tag or mark or categorize certain entries to allow me to sort them into a group, and see them listed together on one page, the same way you can do for suffixes? An important feature of Arawak languages is that nouns fall into classes according to how they are inflected. It would be useful to be able to click on a link and see several hundred nouns meeting certain criteria, then click another link to compare that group to another such set. Is that possible on Wiktionary?
    • I did not say that this sort of category is undesirable — maybe it is. I am saying that it uses the wrong naming convention: prefixing by language code is reserved for topical categories, not grammatical ones. Though in fact, we do have a fairly elaborate category boilerplate system, and it would be nice to integrate it into that. Also, I dislike using the term "possession" for the word. I suggest something like "Wauja nouns taking an obligatory possessive". Keφr 06:07, 1 January 2015 (UTC)
  • Re: links to nonexistent sections in my edit summaries. Being relatively new to Wiktionary, I mistakenly thought that on a page with entries in multiple languages, it was expected that I would add the code for my language along with all the others. I did not realize this applied only to languages with their own dictionary, but now I do. If this is not the issue you are flagging, please give me an example so that I will understand.
    • This is not what I was referring to. Every time you put something between /* and */, you link to a section whose name is between those two markers. These are not mere boilerplate you put there to show off your markup-savviness. Keφr 06:07, 1 January 2015 (UTC)
      • Ah, thanks for the clarification. That's very different from the conventions I am familiar with as a web developer. When you code CSS, a "comment" is anything put between /* and */, and it is ignored by the browser. It's seen only by those who look for it. You put comments in there as a courtesy to the coders who may have to edit your work at a later date, so they can understand how you intended the code to be structured. Clearly-commented code is one aspect of doing the job well. So I assumed the "edit summary" was supposed to contain a few words summarizing the edit I had just made. Apparently this is not the case. —This unsigned comment was added by Emi-Ireland (talk • contribs) at 18:14, 2 January 2015 (UTC).
        • No, you still failed to get it. Edit summaries are for precisely what you described. I was pointing out that the semantics of /* */ in edit summaries are different from CSS (which I know quite well, you need not explain it to me). Look at the history of this page and see.
          • I looked at the history and see "Miscellanea: head to wall. thump. thump. thump. thump. thump." next to your name. This does not help me understand. I will research this over the weekend and find out what you are trying to say. BTW, why did you change what I had written on my talk page? Was it because of the link? Thanks. Emi-Ireland (talk) 22:23, 2 January 2015 (UTC)
            • To the left of that is an arrow. If you click the arrow, you are taken to this very section, as you can see in the browser's status bar. Now look at the other arrows in edit summaries and see where they take you. Keφr 00:29, 3 January 2015 (UTC)
        Also, could you put a {{Babel}} tower on your user page? I would appreciate it. Keφr 18:29, 2 January 2015 (UTC)
  • Re: reference to non-published works: Wauja is an endangered language. To date, there are no published works (other than a few medical information leaflets) by native speakers written in Wauja. (There are bible translations written by English-speaking missionaries with translation assistance from Wauja speakers, but that is not the same as a bible written by a native speaker of Wauja.) The Wauja have a rich literary tradition, but it is oral. This means that all examples refer to utterances that are recorded and transcribed or, in recent years, gleaned from emails and other written electronic communications. I regularly communicate in written Wauja with native speakers via email, Skype, Facebook and Facebook IM. Since these short texts are written by the native speakers themselves, and not transcribed by me or another non-native speaker, such email and Facebook IM communications are perhaps even more authoritative than the other forms of reference I have used (such as transcripts of recordings by village elders, which I transcribed). Emi-Ireland (talk) 19:15, 31 December 2014 (UTC)
    • For WT:LDLs a single citation will suffice, even a mention (i.e. from a dictionary). I am of the opinion that in a wiki, "mommy told me" is not an acceptable reference. The risk of perpetrating hoaxes is too high to allow that. Keφr 06:07, 1 January 2015 (UTC)

You raise an important point, and I address it below. (I will think about your other points above and respond after speaking to some colleagues.)

Understanding Constraints of Documenting Languages that have no Tradition of Writing

Existing references in Wauja

I respect your desire to avoid any risk of invalid entries. That certainly is important, not only for this Wauja Wiktionary, but for the future of Wiktionary as a whole. With this in mind, the issue we must address together is that, for this endangered language, with its rich oral tradition, there are — as yet — no dictionaries or published works to cite. There are none. (Except perhaps for one 4-5 page pamphlet for health workers, written by native speakers, which I am using as a reference mainly for anatomical vocabulary. There are some problems even with that thin publication, as it was written in Portuguese by Brazilian doctors, and translated to Wauja by young Wauja health workers. I recall how difficult and awkward it was for them to translate bizarre concepts such as "the food pyramid" and "carbohydrates" into Wauja, which does not need or use these concepts, much less have vocabulary to describe them. The result, overall, is not an authoritative reference for the Wauja language.)

So although I would gladly follow your very reasonable advice ("a single citation will suffice, even a mention (i.e. from a dictionary)"), alas, I cannot, at least until the Wauja publish in their language. Does that mean the language should not be documented on Wiktionary? Of course not! All the more reason to document it. I am sure we agree on that. In fact, I believe that having a robust Wauja Wiktionary will make it more likely that the Wauja will publish in their language, and that we will be able to cite published works someday soon.

Let's agree on what we can use for references, given the issue we are facing – a rich oral tradition, but the complete absence of published works authored by native speakers. Until now, I have relied mostly on:

  1. transcribed oral communications that have been recorded during public performances by skilled and linguistically authoritative storytellers and other elders and,
  2. written electronic communications produced by the first generation of literate native speakers of Wauja. These are written by young Wauja adults whose first language is Wauja, but who have learned Portuguese well enough to be sent out to the border towns to continue their formal education, whether high school or university.

Which sources are most reliable and authoritative?

Each of these sources has advantages and disadvantages. The audio recordings of elders speaking in public are very authoritative linguistically, but they represent only one kind of speech (formal). Typically, they are orations, and they underrepresent natural conversational patterns used in daily life. In addition, there is the problem of transcription. When I first transcribed these recordings thirty years ago, there were no literate Wauja at all, and so the transcription had to be done by a non-native speaker (me). I speak Wauja, but I am not a native speaker. Therefore, although I consider my transcripts worthy references (they have all been checked phonetically by native speakers), I consider emails from young literate Wauja in some ways more authoritative than anything I personally transcribe.

That's because even if I transcribe the sounds perfectly and translate the meaning perfectly (a tall order), I might well put the breaks between words in different places than a native speaker would. In fact, because the current generation of Wauja speakers is the FIRST ever to read and write, I find that there is wide latitude in how different native-Wauja-speaking writers put the sounds of their language into written letters. Much like the authors of older English texts, who often did not spell words uniformly, the Wauja today are in a protean stage of adopting literacy. After all, they don't even have a dictionary. (I'm trying to help them remedy that.)

Yet, despite these minor inconsistencies, every written text by an articulate and knowledgeable native speaker is valuable because it is unfiltered by non-native speakers. I realize that people typically do not write emails carefully, and often misspell words. That's why I never use such communications unless I have specifically checked the spelling and the utterance with the author. Each utterance that has been checked is associated with a named person in the published lemma, and they know that. (The only exceptions are statements that must be anonymous because of their content, to avoid embarrassing any individual. I have not needed to do that yet, but I know that such instances will arise.)

Request for suggestions I can implement

I have responded in detail to your comment because I want to assure you that I fully share your concern that each Wiktionary entry be reliably and responsibly referenced. I hope that you now understand some of the constraints I am working under, given that there are no published works by native speakers. If you can suggest specific ways I can improve the way I reference entries, I would be grateful. Specifically, if carefully checked written communications that I receive from articulate native speakers are not acceptable, please let me know what is. I feel uncomfortable giving greater priority to my own transcriptions than to written communications authored by native speakers. They are both valuable for documenting the language, but surely written artifacts produced entirely by native speakers and unfiltered by outsiders should not be discarded.

Why Wiktionary is particularly important for endangered language communities

I am very committed to documenting Wauja in Wiktionary, precisely because I want the Wauja themselves (and the world) to have full global access and fully shared opportunities to participate in this project. I want to abide by Wiktionary community norms, and I also hope the Wiktionary community will welcome the Wauja project, and allow the Wauja language to be documented, despite the lack of published works. Some of my academic colleagues are skeptical that Wiktionary is a good platform for what I am trying to do, saying that Wiktionary cannot accommodate the special requirements of documenting an endangered language. But I don't want another proprietary project on a proprietary platform that does not give native speakers full participation in building their own lexicon, and ultimately is inaccessible to the people who need it most. I want the three planned Wauja Wiktionary sites to show that you can have a globally accessible open-source platform, full language-community participation from anywhere in the world, and outstanding scholarship.

I have been skyping the Wauja over the holidays, and they are very excited about the three-part project that we are planning: (1) the Wauja-English dictionary, already underway; (2) a Wauja-Portuguese dictionary (to be launched in 2015), and (3) a Wauja-Wauja dictionary with a Wauja interface (perhaps launched in 2016). Too often, despite the best intentions of outside linguists, speakers of endangered languages have not had access to tools that would allow them to participate fully in building their own lexicons. Wiktionary is the antidote to that. But to make this possible, the Wiktionary community must understand that building a lexicon for an endangered language with an exclusively oral tradition requires methodologies that may be somewhat different from those suitable for languages with an established tradition of literacy.

In the summer of 2015, I will travel to Brazil to train a team of young lexicographers on Wiktionary norms and on how to post lemmas on the Wauja-Portuguese site. This project will be led by a Wauja village schoolteacher who is also a third-year university student in Language and Literature at the state university near his reserve. He and his colleagues, on their own initiative, have already convened a two-day open meeting on orthography, attended by members of all three villages, who patiently sat through two days of discussions on spelling, all while crammed into a single schoolroom in the sweltering heat. These people are committed! This meeting was convened precisely because newly literate people are spelling things inconsistently. He and the other young schoolteachers are thrilled that they will be trained to manage the Wauja-Portuguese site, and ultimately build a Wauja-Wauja one (perhaps starting 2016). They are amazed and delighted that there will be no printing and distribution costs as they create and maintain a digital dictionary that can be used in their bilingual curriculum.

How Wiktionary and Endangered Languages can strengthen each other

I am doing my very best with each entry I post, but I know that the Wauja eventually will suggest corrections and improvements to my contributions. That is as it should be. As the community gradually works out the details of their orthography, the spellings of certain entries will change. Those corrections will make the current site even better. In the meantime, if you see ways I can improve my methods, please give me specific suggestions that will contribute to the success of this project, and that take account of the special constraints of documenting a language that does not have a written tradition. Wauja has a magnificent oral literary tradition, by the way. It's a shame I have to frame the discussion around whether they "lack writing," or whether their first hard-won attempts at writing, using email and other digital tools, are "acceptable" (to us) or not in documenting their own language. Instead, I respectfully request that we find ways to carefully and responsibly document everything that they do have, and in so doing, perhaps help them keep it.

Emi-Ireland (talk) 22:26, 2 January 2015 (UTC)

You have good reasons for the choices you've made, but I have to admit that the whole Wauja project is technically in violation of our Criteria for inclusion. This isn't, by itself, a fatal problem: the CFI are the result of community consensus, and can be modified by the community. So far, those of us who have been watching this unfold have simply held back on enforcing some of the rules, but someone of a perverse/contrarian disposition (we have a few) could request verification for any or all of the Wauja entries, and the admins would be forced to delete them.
I think it may be time to bring this to the Beer parlour and make it official. It would probably also require a formal vote. I would guess the simplest route to take would be declaring that we would accept certain sources for verification even though they don't meet the requirements of our CFI. This might require some extra safeguards to avoid leaving us open to future fraud, as Kephir alluded to. It would also help to minimize the possibility of "Randy from Boise"-type issues (see the Wikipedia essay of that name) with people who know nothing about Wauja saying "if they can make edits not based on published sources, why can't I?"
The main obstacle I see is the "no original research" principle, which is one of the philosophical pillars of all Wikimedia projects. We allow more leeway in that respect due to the highly condensed and synthesized nature of dictionary entries, but we can't ignore it entirely.
Even if we end up with the worst-case scenario of the Wauja project being rejected, the data would be retrievable for a while as part of the XML dumps, and the MediaWiki software can be used by non-Wikimedia sites, so it might be possible to revive it somewhere else. I hope it doesn't come to that, though. Chuck Entz (talk) 23:57, 2 January 2015 (UTC)
Thank you, Chuck, for your very reasonable, thoughtful, and constructive comments. I agree with all of them, and was just about to ask you whether I could post my comments in the Beer Parlour. If I cut out the details, people may not fully understand why the current inclusion criteria, though perfectly reasonable for languages with a written tradition, effectively prevent languages that have an exclusively oral tradition (such as Wauja) from being included in any meaningful way. Let me know whether I should point to this page, or copy and paste my comments to the Beer Parlour.
Off the top of my head, I think we can set up general criteria that would avoid any "Randy from Boise" issues. For instance, in the absence of published materials written by native speakers, it seems reasonable for Wiktionary to ask for documentation supporting a certain percentage of entries, randomly chosen by an admin serving a quality control or auditing function. The contributor would comply by providing documentation for the entries under consideration. This could include original audio and video recordings in the endangered language, transcripts, digital manuscripts and communications. I have written up some additional ideas, but will sleep on it before posting them here tomorrow. Thanks again for your candid and very helpful comments. Emi-Ireland (talk) 05:20, 3 January 2015 (UTC)

Chuck, I'm thinking through the distinction between LDLs and Languages without a Written Tradition (which I'll call LWTs). LWTs are a subset of LDLs. Many LDLs have an extensive written tradition, and can indeed be referenced with published sources. LWTs, on the other hand, by definition, do not have a written tradition and therefore do not have a body of publications authored by native speakers of that language. I think the current LDL criteria are probably appropriate for any LDL with an established written tradition. The issue is LWTs. Obviously, we cannot ask languages without publications to reference only publications. I am thinking through the issues and possible solutions and will post my suggestions in the next day or so. Emi-Ireland (talk) 22:03, 3 January 2015 (UTC)

Request to Add New Subcategory "LWT" Within LDL

This is a request for the Wiktionary Community to consider adding a new subcategory, Languages without a Written Tradition (LWT), under LDL (Less Documented Languages).

What is an "LWT"?

An LWT is a language that has an oral tradition, but has no tradition of writing and no written publications authored by native speakers. LWTs are a subset of LDLs. (Note that documents authored in other languages by outsiders and merely translated by native speakers, such as the Bible and government documents, are not suitable as sources for documenting a language.)

Why not Simply Call LWTs "Unwritten Languages"?

The term "unwritten" can be misleading, because the boundary between languages that are "unwritten" and languages that are "written" is actually quite fuzzy. Presumably we can all agree that a language community that has no writing system, no notion of literacy, and has never had its speech transcribed by outsiders can be considered an "unwritten language."

But when that community is visited by linguists who develop an orthography, and (perhaps imperfectly) transcribe some words and phrases from the spoken language into written form, perhaps publishing the results, what then? Is this language "written," even if no one in the language community is literate, and the published "results" contain the errors of a non-native speaker? Some of you might call such a language "written," and others just as reasonably might say it is "unwritten."

Let us now consider a third example. What about a small indigenous language community in Brazil that is completely unfamiliar with writing, and yet, through a process of increasing contact with the national society, develops an orthography and village schools, where children are taught to read and write in both their indigenous language and Portuguese? Obviously, when nearly every child can write words in their own language, the language cannot be considered an "unwritten language." Yet is it a "written" language? Does it have great literature? Yes, in oral form. Poetry? Absolutely, in oral form. Historical narratives, sacred texts, genealogies, song lyrics, compendiums of botanical and zoological knowledge? Yes, all in oral form. What, then, is written in this language? Aside from basic word lists and literacy primers modeled on Portuguese examples, virtually nothing — yet. Today's young adults are the first literate generation.

This is the case with Wauja, an Arawak language spoken by 400 indigenous people in lowland Amazonia. Although Wauja was "unwritten" a generation ago, today it is "written," in the sense that children are taught basic literacy in their village schools. However, as yet (and this doubtless will change), there is no written tradition in this language, no body of publications authored by native speakers. All their literature is still in oral form.

For the purposes of Wiktionary, the key issue is not whether a missionary or professional linguist has phonetically transcribed snippets from the language, but whether there exists a body of work authored by native speakers that is large enough to provide references for every word in the language. For languages like English, Chinese, and all "major" languages, the answer is yes. These languages have extensive written traditions. For thousands of small and endangered languages, the answer is no. These are languages with rich intellectual and literary traditions, in oral form. Their speech communities may have some recently acquired knowledge of writing, but they have no tradition of writing. This presence or absence of published references authored by native speakers is the distinction that matters for the Wiktionary community, at least with respect to inclusion criteria.

Why is the Subcategory LWT Needed?

LWTs, by definition, lack a body of published sources authored by native speakers. As a result, it is not possible to use published sources to attest to Wiktionary entries for LWTs. Nevertheless, LWTs are important members of the family of human languages, with rich literary and intellectual traditions, and they deserve to be included in Wiktionary. In fact, these LWTs are typically endangered languages spoken by language communities that are most in need of the permanent, globally accessible, open source, cultural commons platform that only Wiktionary can provide. Therefore, it is proposed that the Wiktionary community define this limited category of languages (LWTs) and agree upon attestation criteria that are sensible and appropriate for such languages.

Can LWTs Meet Current Attestation Standards?

Current Wiktionary attestation standards call for verification either through widespread use (hard to verify for a language without publications) or "use in permanently recorded media, conveying meaning, in at least three independent instances spanning at least a year (different requirements apply for certain languages)." For spoken languages that are living [but not well documented on the Internet], only one use or mention is adequate, subject to the following requirements:

  • the community of editors for that language should maintain a list of materials deemed appropriate as the only sources for entries based on a single mention,
  • each entry should have its source(s) listed on the entry or citation page, and
  • a box explaining that a low number of citations were used should be included on the entry page (such as by using the LDL template).

Assuming that the first bulleted requirement above refers to a list of materials that are permanently available online, probably most LWTs cannot meet this requirement. For example, in the case of the Wauja language, spoken as a first language by 400 people in the Amazonian rainforest, there are hundreds of audio recordings, and several dozen carefully transcribed traditional stories, but none of them currently are available online. (Though they could be made available to Wiktionary admins upon request.)

Before these stories are posted online, the community must agree that they are correctly transcribed. That's because they were first recorded and transcribed several decades ago by an anthropologist (myself, in this case), at a time before any Wauja were able to read and write. Today, there is a cadre of young university-educated Wauja bilingual schoolteachers who are deeply committed to standardizing their orthography and documenting their language. However, this process takes time, because it is not decided by fiat. Instead, the Wauja, like many communities that speak LWTs, take time to reach decisions through building consensus. It's a chicken-and-egg situation. Without a standard orthography, it's hard to build a dictionary, but without a dictionary, it's hard to standardize the orthography.

Proposed Attestation Standards for LWTs

To allow responsible documentation to proceed within Wiktionary while members of LWT communities increasingly move toward standard orthography, publications by native speakers, and full compliance with Wiktionary LDL attestation standards, the following interim attestation standards for LWTs are proposed:

  • The community of editors for that language should maintain a list of materials deemed appropriate as the only currently existing sources for entries.
  • These sources may include audio or video recordings of native speakers, and transcripts of such recordings.
  • Sources also may include direct quotes from letters and written messages produced by literate native speakers, provided that the quoted material is archived online and annotated as described below.
  • All sources must include mention of the date of the recording or transcription, the names of the native speakers recorded, the location of the recording, the name of the person making the recording, and the location where the source is archived, if not online.
  • Once the transcript has been authorized by the language community as a faithful transcription, the names of community members involved in verifying the transcription also must be noted, and a copy must be posted to a permanent online location, such as Wikisource.
  • If Wiktionary admins find any reason to doubt the authenticity of the sources cited, they shall be allowed to examine the source material.

The overall goal of attestation standards for LWTs is to ensure responsible and reliable attestation for LWT entries, while making Wiktionary the best platform for documenting the world's many LWTs.

Honoring the "No Original Research" Principle

For a language with a written tradition, it is appropriate to refer to published sources written in that language. However, for a language that consists of an exclusively oral tradition, it is appropriate to refer to authoritative oral sources that have been recorded and transcribed. To ensure that the "no original research" principle is honored, transcriptions of traditional stories, historical narratives, public oratory, and sacred incantations performed by elders before an audience can be given priority as sources, since these linguistic sources are particularly authoritative and reliable for LWTs.

Proposed Standard for Transitioning from LWTs to LDLs

When a language has a sufficient body of publications (authored by native speakers) so that every word in the language can be referenced to a published work authored by native speakers, that language is no longer an LWT.

In practical terms, there is no hard and fast cut-off point, but perhaps we can say that once an LWT community has achieved a minimum threshold of 3,000 entries in Wiktionary, the community will have become aware of the importance of lexicography and its methods, and it will have benefited greatly from using Wiktionary to document, analyze, and teach literacy in their language. The language community will have had an opportunity to standardize their orthography, properly review transcriptions of older recordings of traditional oral literature, have native speakers produce new publications based on new recordings, and permanently archive online all such transcripts and publications. As a result, this language community will be considered capable of meeting LDL attestation standards going forward.

"There are two words currently in the Wauja Wiktionary that I know to be incorrect"

Please post an explanatory note on the Talk pages for those two words, so that anybody working on Wauja in future will be alerted to the problem. Thanks! Equinox 22:56, 3 February 2015 (UTC)

I would be glad to do so, but I don't think I have a recording of the first word, and given the issues regarding attestation, I want to be meticulous about that. The second word would surely be in a recording of a sacred story that I am currently having converted from tape to digital format. In the meantime, I could ask one of the Wauja schoolteachers, who is a native speaker and a leader of the team who is developing a standard Wauja orthography, to write the word in an email, but Kephir has objected to my referencing email communications from native speakers, so I will avoid that unless I hear otherwise from the Wiktionary community. When I return to Brazil in September 2015, I can record the Wauja pronouncing the word and writing it on the blackboard of their village school, and use that as a source, if you think it would be accepted. At this time, I am adding only words that are recorded from orations by native speakers, transcribed, with the transcriptions verified by native speakers. Emi-Ireland (talk) 23:33, 3 February 2015 (UTC)
You can basically say what you want on Talk pages. They are for discussion; they don't have the status of entries. Equinox 23:36, 3 February 2015 (UTC)
OK, then, I will be glad to post a note. I just posted notes on three pages. Emi-Ireland (talk) 00:47, 4 February 2015 (UTC)