Wiktionary:Beer parlour/2017/August


travel game

Would travel game (a board game or card game that was modified to be playable by passengers during a trip) be considered a SoP? W3ird N3rd (talk) 05:34, 1 August 2017 (UTC)

Looks good to me. I'd class I spy and the number plate game as my favourite travel games from when I was a kid in a car. --WF on Holiday (talk) 23:04, 1 August 2017 (UTC)
That's actually not even the definition I meant. I meant games like chess or Ludo that have been modified (e.g. with magnetic game pieces) to be played in a car or on a train. Amazon link to clarify. W3ird N3rd (talk) 00:10, 2 August 2017 (UTC)
I thought about it and the definition I was originally thinking of (and yours as well) is SoP after all because there are other "travel" things. But travel didn't have an adjective section yet. It does now.
  1. (in a compound) An object or activity that has been designed or reworked for use while travelling.
    (object) I've packed the chess travel game in my travel bag and I've got my travel cup in the cupholder, I'm ready to go!
    (activity) Let's play a travel game. I spy with my little eye...

W3ird N3rd (talk) 04:26, 2 August 2017 (UTC)

Aaaaand it's gone. @SemperBlotto, is there a reason you just chucked the whole thing instead of moving it into an additional definition for the noun? I had looked at running (like "running man") and noticed it had an adjective section, but on closer inspection that is used for other meanings of running... I think. I'm not even sure. I can't entirely explain why running is an adjective in all meanings mentioned but travel isn't. W3ird N3rd (talk) 06:28, 2 August 2017 (UTC)
(@SemperBlotto. —suzukaze (tc) 06:45, 2 August 2017 (UTC))
Thanks, I'm still learning how these things work. I looked it up: https://en.wikipedia.org/wiki/Attributive_verb. So it appears travel acts as a deverbal adjective. So it seems either SemperBlotto is wrong or somebody needs to remove the adjective section from exciting or I may be losing my marbles. W3ird N3rd (talk) 06:57, 2 August 2017 (UTC)
I dunno, travel in travel game sounds like the noun travel to me: a game used during travel. It's weird to try to think of it as a verb. — Eru·tuon 07:07, 2 August 2017 (UTC)
Right, so it's https://en.wikipedia.org/wiki/Noun_adjunct. So "travel game" can't be added because I suspect it's SoP, yet the information in travel and game doesn't really allow one to figure out what a "travel game" would be. And this information can't be added to travel either. Okay, my marbles are definitively gone. W3ird N3rd (talk) 07:28, 2 August 2017 (UTC)
Were I looking at this naively, it would be ambiguous to me whether this meant "a game suited for travel" or "a game related to travel" or "the travel industry" or "one of a genre of games somehow related to some definition of travel", or ..... I don't think that dictionaries should act as if they are well suited to hold users' hands as they try to figure out what a phrase or sentence or larger unit of language means, unless there is true novelty or obscurity worse than what I have advanced as my own naive view of alternative meaning. DCDuring (talk) 19:41, 2 August 2017 (UTC)
Well, it's not a phrase, it's a compound noun. I think from what you're saying, it's (for a naive reader) not transparent. — Eru·tuon 19:57, 2 August 2017 (UTC)
Gee, lots of people would call it a noun phrase or NP. Do we have a policy about which school of labels we follow? DCDuring (talk) 21:35, 2 August 2017 (UTC)
Not that I'm aware of. The criterion of spacing bothers me because it means that if you happen to add spaces between the parts of a compound, then it suddenly changes to a phrase. So honeybee is a compound, while honey bee is a phrase. Utterly arbitrary. There has to be a more solid criterion than spelling. — Eru·tuon 22:02, 2 August 2017 (UTC)
If a multi-word expression is attestably spelled solid, we have decided that is sufficient evidence to say that phrase, usually a bare NP, is includable. That criterion is intended to shortcut our repetitive, amateurish arguments about including such terms. DCDuring (talk) 22:29, 2 August 2017 (UTC)
Huh. I was talking about criteria for whether something is a compound noun, not CFI. — Eru·tuon 22:36, 2 August 2017 (UTC)
I think that, in practice, we try to avoid academic discussions with only indirect application to Wiktionary. It seems to me a good practice. DCDuring (talk) 23:46, 2 August 2017 (UTC)
You're probably right. I'm quite annoyed by compounds being called phrases, but it isn't particularly useful to discuss. Back to the content of your post, you recognize potential ambiguity with travel game but still don't think it should be included. I find that baffling, given that English Wiktionary is used by lots of people who don't speak English well. I would imagine that at least some of them would misunderstand travel game in the ways you mention. — Eru·tuon 01:59, 3 August 2017 (UTC)
@Erutuon: To me those ambiguities are typical of those that arise in interpreting any NP/compound noun that one hasn't heard before. In normal speech, the context shows one definition to be the most relevant of all of the ones that are possible from the definitions of the component terms. I consider the situation to be illustrative of why we focus on transparency of meaning in the context in which a term is used, given the definitions of the component terms. DCDuring (talk) 07:48, 3 August 2017 (UTC)
It really, really, REALLY wouldn't be the first time I turn to Wiktionary (or any other dictionary) to look up a word that I have no proper context for. For example when something like this happens in a TV show:
So what do you hate most?
-Any travel game.
Why do you hate that so much?
-I JUST DO NOW DROP IT ALRIGHT?
And then the show continues. Maybe it's a running gag. Maybe it's a reference to something in a previous episode that I missed. Maybe it refers to some character trait. Maybe it refers to some event or tradition that I'm not aware of, like some scandal in the country where the show was made. Maybe it's just plain random.
Alternatively, some word will pop up in my head randomly but I can't remember the context I heard it in. Out of curiosity I try looking it up. What it comes down to is as simple as this: Wiktionary is useless to look up any ambiguous SoP so I'll be forced to go elsewhere. If that's your goal I'd say mission accomplished. W3ird N3rd (talk) 23:47, 3 August 2017 (UTC)
Why wouldn't it be arbitrary? Why should you be able to draw a clean line between a compound and a noun phrase? There's no clear line between a "canoe truck", a "turnip truck" and a "fire truck". Certainly, though, English words spelled without spaces are more likely to be organic unions with a unique meaning, whereas noun phrases are more likely to be spelled with spaces and have meanings obvious from the individual words.--Prosfilaes (talk) 23:24, 2 August 2017 (UTC)
I dunno, it seems axiomatic that syntactic categories (word, phrase) should be based on something other than spelling, such as syntactic behavior. If they coincide with spelling, great. Honey bee behaves no differently from honeybee, so it is in the same syntactic category. Maybe there are spaced-out compounds that could with more justification be called phrases. I agree, though, that there is something determining whether a compound can be written with spaces: if it would be too long as a single word, or its meaning is obvious from its constituent parts. At some point on the continuum of each characteristic, it's acceptable to write a word either way. But I don't think either characteristic has anything to do with syntactic category (word or phrase) either. — Eru·tuon 01:59, 3 August 2017 (UTC)
Given my background in computer science, it seems axiomatic that you lex before you parse, and that you have to figure out what a word is before you start figuring out what stuff means. That's sometimes not possible in computer or human languages, and pauses in audio would be more reliable than spaces in text, but things should ideally be broken into words before we get into syntax.--Prosfilaes (talk) 03:20, 3 August 2017 (UTC)
@Prosfilaes: I don't know anything about computer science or quite what lex and parse mean, but what I mean by syntactic category is word, phrase, clause, or noun, verb, adjective, etc. So which things are words is connected to syntax. Anyway, from what programming I've done (mostly on Wiktionary), programming languages are far more tightly constrained and more straightforward to analyze (if not figure out what their actual purpose is) than human languages, so I don't know how much of the process is similar to analyzing the lexical or syntactic categories of human words. — Eru·tuon 21:22, 4 August 2017 (UTC)
The basic ideas of lexing and parsing used in computer languages were designed by Chomsky for use in human linguistics. The point is, we can't talk about nouns and adjectives before we figure out what words are. In both human and computer languages, you lex (split text into words and specific punctuation marks) and then you parse, and occasionally you're forced to go back and relex the text in light of the parsing. But in both cases, you do the vast majority of breaking stuff into words before you start trying to figure out the meaning. There's a reason why spaces and verbal pauses exist in languages; it's to make it easy to clearly split things into words.--Prosfilaes (talk) 23:04, 4 August 2017 (UTC)
Well, it seems my use of the name syntactic category for word and phrase got you on the tangent of lexing before parsing. I don't know, maybe syntactic category isn't the right term. I have no idea. And I don't see how lexing before parsing relates to whether compounds are words or phrases. — Eru·tuon 23:22, 4 August 2017 (UTC)
I would imagine (but I can't speak for Prosfilaes) that if "travel game" is a word in your dictionary, you can just look it up and you know what it means. If it's not in your dictionary, you will assume it's just two words and you look up travel and game. From that, some systems (like Google translate, Babel Fish, etc) could probably end up being fooled into assuming this is roadkill or a really annoying basketball game. W3ird N3rd (talk) 00:01, 5 August 2017 (UTC)
DCDuring, I hadn't even thought of those interpretations yet. Thinking about that, I realized game also means wild animals hunted for food. It would depend heavily on context whether a non-native speaker could actually make that mistake, but I think it would be funny as hell. In the text "We were very hungry because we didn't pack enough food. But at least while on this trip, we enjoyed some travel game." the "travel game" could actually be interpreted as roadkill. Bon appétit! I'm hoping Wiktionary:Beer parlour/2017/August#Allow more SoP compounds, similar to Dutch and German, or another rule change based on that will fix this in the future, but I don't think I'm going to hold my breath. W3ird N3rd (talk) 02:12, 3 August 2017 (UTC)
The MWE is also a synonym of away game. DCDuring (talk) 01:08, 5 August 2017 (UTC)

order Arabic disambiguating entries orthographically, not by verbal forms

Currently Arabic disambiguating entries are ordered by verbal forms instead of orthographically, which is not the optimal lexicographical approach. Thus, for ease of reference, يُوجدُ should appear just once in the page for يوجد, specifying it could belong to either verbal form I or verbal form IV. --Backinstadiums (talk) 08:47, 1 August 2017 (UTC)

It should be just as easy as modifying a line of code --Backinstadiums (talk) 12:39, 4 August 2017 (UTC)

@Backinstadiums: Huh? What line of code? — Eru·tuon 18:14, 4 August 2017 (UTC)
@Erutuon: I mean it cannot be that much of a fuss, just a different grouping in a specific case. If anything should be clarified further, please let me know. --Backinstadiums (talk) 20:57, 4 August 2017 (UTC)
@Backinstadiums: To do this, the template {{ar-verb-form}} would have to no longer display the form number and many entries would have to be edited (there are 35,188 entries in Arabic verb forms, some of which will contain homophonous verbs with different Form numbers). The editing part would be a lot of work, and would probably have to be done by bot, as the entries were in large part created by bot. I'm agnostic on whether the change would be helpful or consistent with Wiktionary organizational principles, and no one else has responded: @Atitarev, Wikitiki89, Benwing2? — Eru·tuon 22:22, 4 August 2017 (UTC)

Just like any issue in life, no matter how much is already done, if it's not in accordance with the optimal lexicographical approach, which enables ease of reference and so improves usability, action must be taken on it as soon as possible so as not to worsen resources even more --Backinstadiums (talk) 15:39, 10 August 2017 (UTC)

August LexiSession: circus

Let's go to the circus!

The monthly suggested collective task is to collect words about the circus. I've noticed that Wikisaurus:circus does not exist, and auguste is a kind of clown, so this is a great opportunity to look around this topic together!

Let's stop clowning around and juggle some ideas together!

By the way, LexiSession is a collaborative experiment without any guide or direction. You're free to participate however you like and to suggest next month's topic. If you do something this month, please let us know here or on Meta, to let people know that English Wiktionarians are doing something on this topic. I hope there will be some people interested in making some contributions! Noé 13:43, 1 August 2017 (UTC)

Here's a good start - to be added to Category:en:Circus if appropriate

Circus and sideshow attractions

Maybe this is me being slightly grumpy because some people in another discussion I started don't seem to entirely grasp what I was suggesting, but aren't some of these SoP?
I personally don't have a problem with any of these and luckily I'm not a SoP nazi, but after a few RfDs this project could end up with more red links than it had when it started. W3ird N3rd (talk) 03:30, 4 August 2017 (UTC)
  • Some of these seem lame and/or SoP. On the other hand, sources such as Carny Lingo show that there is a large vocabulary of great charm and linguistic interest. I doubt that we will get very far into that highly desirable content this month, but extracting a list of terms from that and similar sources would be useful for Wiktionary, IMO. I'm not at all sure that the terms fit well into the categories suggested; many would be better assigned to a category based on usage context, e.g., Category:English circus slang or similar. Examples: barnstorm, blow a tip, blow one's pipes, build a tip, burn the lot, carry the banner, clean the Midway, cool out, bail the counter, bat away. Unfortunately, I don't know that we have a good system of such categories, instead duplicating encyclopedic-type "topical" categories. DCDuring (talk) 08:29, 4 August 2017 (UTC)
    I suppose much of this would fit in Category:English circus slang. I hope that {{lb|en|circus slang}} or {{lb|en|circus|slang}} would work. DCDuring (talk) 08:36, 4 August 2017 (UTC)
    My hopes are in vain. I hope someone can rectify the operation of {{lb}} so anyone interested can help us play along with this cross-project effort. Note that there is a considerable overlap between criminal slang and circus slang. DCDuring (talk) 08:48, 4 August 2017 (UTC)

Next steps for Wikidata access

Hello all,

Thanks to @Daniel Carrero there's now a page to centralize all the discussions and information related to accessing Wikidata data from English Wiktionary. I hope we can improve it soon with examples and documentation :)

We also suggest an enabling date for the arbitrary access: September 7th. If you have any question or concern, feel free to ask. Thanks to the people who worked on this! Lea Lacroix (WMDE) (talk) 13:52, 1 August 2017 (UTC)

Thank you. September 7th looks good to me. --Daniel Carrero (talk) 03:18, 2 August 2017 (UTC)

Best practices for Oxford -ise/-ize variants

I just made Birminghamize. What should be put at Birminghamise? —Justin (koavf)TCM 00:58, 2 August 2017 (UTC)

It is ridiculous that Wiktionary lacks a basic policy on how to handle these variant English spellings: afraid of offending others / nobody willing to take charge / deeming status quo as good enough / etc., ... so there we go - both color and colour can evolve in parallel. Wyang (talk) 06:05, 2 August 2017 (UTC)
The problem is that someone who dares to set a standard will likely get into an edit war. So nobody touches it with a pole. —CodeCat 19:49, 2 August 2017 (UTC)
This would be a perfect application of Wikidata. —Justin (koavf)TCM 23:56, 2 August 2017 (UTC)
I have never heard of Birminghamise, so unless you can find it being used there is no point in making an entry. But normally -ise verbs are labelled "British spelling" so they can appear in Category:British English forms. DonnanZ (talk) 23:45, 5 August 2017 (UTC)

Etymology giving me problems

Can someone review:

for the etymologies that I've added? All of these words are directly taken from Spanish but I've clearly not made them all correctly formatted. Also, I'm not sure if there's a different way of noting a language which is a creole based on [x] versus a language which simply adopts one word from [x]. (E.g. the difference between a Haitian Kreyol word derived from French versus using "facade" in contemporary English). Thanks. —Justin (koavf)TCM 02:24, 2 August 2017 (UTC)

The relation between a creole word and its etymon from the lexifier doesn’t fit very well with the inherited/borrowed dichotomy. We should consider adding templates for other special kinds of derivation like this and substrate “borrowings”, semi-learned borrowings, etc. — Ungoliant (falai) 02:45, 2 August 2017 (UTC)
It's good to see someone else basically ratify that. For creoles/pidgins, it's really a different matter than to go from stages of a language (Old English → Middle English → Modern Englishes) or inheritance in a family (Proto-Germanic → English). —Justin (koavf)TCM 04:43, 2 August 2017 (UTC)

French Wiktionary monthly news - Actualités


Hello!

I am happy to inform you that the 28th issue of Wiktionary Actualités just came out in English!

As usual, Actualités is in English but talks about French Wiktionary and lexicography in general.

In this edition main articles are: a presentation of the Lingua Libre project to record words, a summary of a strange dictionary and a thought about lemmas and grammatical categories. And more: shorts, statistics (including new ones like the number of pages that include a link to a thesaurus) and an explanation about the Linter.

As usual, it is translated into English by non-native speakers, so it is not perfect, but it can be improved by readers (wiki-spirit as usual). Please note that we do not receive any money for this publication and we are not supported by any user group or chapter. It is written solely by the community. Feel free to leave us comments! Noé 09:09, 2 August 2017 (UTC)

Allow more SoP compounds, similar to Dutch and German.

So there was a discussion last month about deleting SoP compounds in German and Dutch. Now triggered by "travel game", perhaps we could explore pros and cons for the opposite. That is, allowing English SoP compounds in ways similar to the way they would be allowed in German and Dutch.
So exactly what does that mean? Put simply, if some SoP would pass an RfV and is not using any common/universal word (like "brown" or "fan"), it would be allowed. This means you still can't create brown leaf or large box, but you could create burger joint and sheep farmer. Also computer chip and lab rat; those already exist, but I'm not sure how they could be justified by the current rules. Optionally you could exclude any SoP with a space that is unambiguous (like sheep farmer).

  • Pro: while it may be possible to figure out the meaning of a SoP by looking up the parts, it's not always easy. The parts may have more than one possible meaning so you need to figure out the correct meaning for all the parts.
  • Pro: in the case of "travel game", travel and game don't really make it clear what a travel game is. Travel game is probably SoP, and adding the attributive noun use to travel just results in an instant revert. So basically it's impossible to describe a "travel game" on Wiktionary... in English. I could, however, describe the Dutch word reisspel.
  • Pro: fietshater (bike hater) would pass RfV and should be allowed by the current rules. It wouldn't be allowed with these new rules because hater is universal and can apply to thousands of nouns and verbs.
  • (added august 4) Pro: translations. How would you translate juice extractor (SoP) to Dutch? Juice is sap, but how to translate extractor? The correct answer is sapcentrifuge, but would you have ever guessed it? Wiktionary is useless in this case, and this example wasn't even that ambiguous.
  • (added august 5) Pro: We can delete ex-pilot. (Wiktionary:Requests_for_deletion#ex-pilot).
  • Con: there will be more entries on Wiktionary.

I'm not taking a stance on this myself yet, I just think it's worth thinking about. I may not be seeing the whole picture. I haven't made up my mind yet whether I think it's a good idea. I wonder what you think. W3ird N3rd (talk) 09:18, 2 August 2017 (UTC)

Sorry W3ird N3rd, strong oppose on that. As I mentioned in the discussion you're referring to, we need to have some sort of quality control around here – having more entries on Wiktionary doesn't necessarily boost our credibility if said entries are redundant. --Robbie SWE (talk) 09:57, 2 August 2017 (UTC)
There would still be a form of quality control. RfV requirements still apply and common words are not allowed. Optionally you could add that if the SoP is fully transparent (like sheep farmer) it is still not allowed, while allowing burger joint and travel game. You talk about quality control, but you actually don't have that control right now, as I could create fietshater (bike hater) and you probably couldn't do a thing about it. W3ird N3rd (talk) 10:47, 2 August 2017 (UTC)
I support allowing all attestable English compound words, and no longer making it spelling-dependent. Consequently, WT:COALMINE would be superfluous, as coal mine would no longer depend on the attestability of coalmine for inclusion. —CodeCat 10:53, 2 August 2017 (UTC)


Perhaps we could deal with SOP compounds differently than with other lemmas, effectively soft redirecting them to their constituents while keeping them for consistency, maybe like
(literally) A mine from which coal is dug
or
(literally) An exhibition (tentoonstelling) of Khoekhoe (Hottentot) tents (tent)
While being subject to usual attestation rules and linked from translation tables as hottentottententententoonstelling f where applicable (of course we won't have a page for Khoekhoe tent exhibition to link translations from).
It could get messy with languages with transliteration though. Crom daba (talk) 13:04, 2 August 2017 (UTC)
Why don't we include any rubbish as terms and forget about CFI? Who cares about this dictionary and its reputation, anyway? --Anatoli T. (обсудить/вклад) 13:33, 2 August 2017 (UTC)
We already host all sorts of rubbish, my approach would make it more manageable and invisible in most use cases. Crom daba (talk) 14:00, 2 August 2017 (UTC)
I don't see the connection between being more inclusive of compounds and quality. The more useful lexicographical content we can provide, the better. RFV provides good quality control, alongside making sure our entries are clean and properly formatted. —CodeCat 14:33, 2 August 2017 (UTC)
@Atitarev and @Crom daba, please be aware that hottentottententententoonstelling is a terrible example that isn't even related to this discussion, because it is a joke word and tongue-twister. This word is not, will not be, and has never been used to refer to any kind of actual exhibition. I used it in the other discussion to demonstrate how hard it can be to break down Dutch compound words, but hottentottententententoonstelling isn't SoP. W3ird N3rd (talk) 14:42, 2 August 2017 (UTC)
Yes, I've read the entry, I'm merely using it to show how non-idiomatic words could be formatted. Crom daba (talk) 14:49, 2 August 2017 (UTC)
You may know that, but I think Atitarev possibly doesn't and now thinks Wiktionary will be filled with thousands of rubbish words like that. W3ird N3rd (talk) 14:54, 2 August 2017 (UTC)
oppose. There's no need for most multi-word English terms in English, and nobody will look them up.--Prosfilaes (talk) 23:02, 2 August 2017 (UTC)
Wanna bet? How can you be so sure? Nobody has any idea what passive users search for. DonnanZ (talk) 14:16, 3 August 2017 (UTC)
Also, people do look them up. Pageviews for lab rat are similar to minibar. In addition, how can you say nobody will look something up when the thing in question doesn't (or isn't allowed to) exist? And another thing: translations. We can't have a juice extractor because it's SoP. So now translate juice extractor into Dutch. Good luck with that. You will correctly find sap for juice but how are you supposed to translate extractor? Here's the answer: a juice extractor in Dutch is a sapcentrifuge. Which you could have found if you had looked up juicer (it just so happens a single-word synonym exists here, this is not always the case), but you won't find that if you're looking for a juice extractor, which is the term I'm most familiar with. The very fact is that I had to look up this example on Wikipedia: w:Juice extractor which helped me find juicer. And it's just sheer luck that a juice extractor happens to be encyclopedia-worthy. W3ird N3rd (talk) 01:10, 4 August 2017 (UTC)
This is true. Professional translators seldom need ordinary dictionaries (such as collegiate dictionaries); we want dictionaries that are mainly multi-word, such as the French-English Dictionary of Petroleum Technology. Multi-word dictionaries are the gold standard and they command high prices. My Dictionary of Petroleum Technology cost me $115 in 1980. In my translating company, we virtually never used any of the ordinary dictionaries (such as Webster's, OED, Random House, American Heritage); we only purchased and used the very expensive multi-word dictionaries. Even now that I'm retired, I never use the simple word dictionaries. Almost all the terms I ever have to look up are multi-word terms, and Wiktionary does not handle those. Translators have to equip themselves with a pile of very expensive dictionaries, and all of them multi-word. —Stephen (Talk) 03:46, 4 August 2017 (UTC)
  • I find the juice extractor argument convincing. So far our CFI mainly cover "does anyone want to look it up?" and less "might anyone want to translate it?" Been on a treasure hunt within Wiktionary for translations myself in the past. Korn [kʰũːɘ̃n] (talk) 14:33, 4 August 2017 (UTC)

General question: what does SoP compound even mean?

Compounds have a continuum of transparency of meaning, but they generally do not have a single possible meaning. If they are formed from two nouns (as for instance travel game), there are several possibilities. I'm somewhat rusty on them, but I gather that travel game is a tatpurusha, where travel is added to game to signify a particular type of game, and travel has the meaning of a particular prepositional phrase (or in Sanskrit, a grammatical case). Putting aside the other forms of compound, the relationship of travel to game is unknown when you're newly encountering the word. The actual relationship, in terms of grammatical cases, is locative: "a game played during travel". But there are other possible interpretations, such as "a game consisting of travel" (like, I dunno, a long-range treasure hunt?). Meh, it's not a very good example, or I'm not very good at brainstorming about possible meanings.

There are compounds that might be clearer: for instance, bike-hater. A noun combined with hater is often the object of hater (the thing that is hated). But even there, theoretically it could mean "a hater who is on a bike".

So I don't think compounds can be SoP in the same way that regular phrases or sentences are, like "some people hate bikers". There isn't one predictable semantic relationship between the elements of a compound the way there is with phrases. In the previous case, some people is the subject of hate, and bikers is the direct object of hate: that's the only way it can go down.

So what is a SoP compound? I have no idea. I think it should be well-defined for it to serve as a CFI. — Eru·tuon 21:15, 4 August 2017 (UTC)

  • "We were very hungry because we didn't pack enough food. But at least while on this trip, we enjoyed some travel game." roadkill!
  • travel game: A game to play on a journey (like I spy or punch buggy)
  • a physical game, like chess, designed for use on a journey (magnetic pieces etc)
  • geocaching / geohashing
  • travel business (Doug Parker is a big name in the travel game)
  • Something that resembles a game with rules, despite not being designed: in the travel game, being held up for security checks is becoming less of a drag and more of a routine nowadays
  • The ability to seduce someone, usually by strategy:
Watch him. He's got a great travel game.
-He's got a WHAT?
Travel game. Basically he just takes any chick he picks up to Paris. Guaranteed success.
  • The travel game that is used by airlines where they offer cheap tickets but charge extra for additional luggage, meals, toilet visits and use of the oxygen mask is really disgusting.
  • (basketball) I'm getting tired of these travel games. They just travel for most of the game time. It's not funny anymore.
  • (childbirth) There's nothing fun about the travel game, but all that is forgotten when the mother is holding her newborn baby.
Will this suffice? ;-) W3ird N3rd (talk) 23:45, 4 August 2017 (UTC)
Then again, all of these sound valid to me, which could be an argument that compounds really are a sum of parts, or rather a product. Crom daba (talk) 16:03, 5 August 2017 (UTC)
That's the trick. They may sound valid, but most of them are completely invalid.
Unlikely to pass RfV:
  • roadkill
  • a game of basketball with lots of travelling
  • the ability to seduce someone
  • game that involves travelling
  • a questionable or unethical practice
  • childbirth
Using a universal part:
  • travel business (possibly won't pass RfV either)
  • something that resembles a game with rules, despite not being designed (possibly won't pass RfV either)
Valid:
  • a game to play on a journey
  • a physical game, like chess, designed for use on a journey
So by the proposed guideline, only the last two definitions would be included. But that's not final, you could argue about exactly what should and should not be included. For example, you could argue that if a valid entry for travel game already exists, it's acceptable to add travel business (if that would pass RfV) while at the same time not allowing an entry to be created solely for travel business. W3ird N3rd (talk) 16:56, 5 August 2017 (UTC)

What about if I would word it like this:

  • SoP compounds with an irregular translation in another language are allowed. (or at least their translation section would be) This will allow juice extractor because of sapcentrifuge.
  • SoP compounds with a space or hyphen that have no irregular translations in another language, have only one meaning, and whose meaning can be reasonably obtained by looking up the first definition of the separate words are not allowed. This would possibly cover sheep farmer, assuming there are no irregular translations in another language.
  • SoP compounds using parts that can be universally applied (the parts are not related in any way) are not allowed, unless they are idiomatic. This excludes "brown leaf", "large box", "luxury boat" and ex-pilot but allows more cowbell. (w:More Cowbell)
  • (added August 6) Compounds without a space or hyphen that have only one non-universal part (like sockless) are only allowed if their usage is vast: far beyond the current three-independent-durably-backed-up-sources rule. Common words like hopeless or pointless should be kept, but exactly how much sense does an entry for boatless make?
  • Any entry still needs to be able to pass an RfV.

Maybe this is more clear? W3ird N3rd (talk) 16:56, 5 August 2017 (UTC)

I believe we should have some rules similar to WT:COALMINE in order to avoid unproductive RFD discussions and give contributors a chance to predict whether their new SOP entries will pass RFD. I guess many editors don't feel like spending time on creating entries that might later get deleted. This will slow down the growth rate of compound-term entries, which IMHO are necessary for a usable multilingual dictionary. I don't believe that a vote to allow all attested terms currently has any chance to pass. In the past some such rules have been proposed:
  • Including all terms with less common single-word synonyms.
  • The lemmings principle, which would grant inclusion if a term is covered by a list of trusted dictionaries (which still have to be specified).
  • We already have some translations-only entries (Category:English non-idiomatic translation targets); however, there is no rule yet to prevent their deletion. We probably want to keep them if they have idiomatic translations for a number of languages.

Matthias Buchmeier (talk) 23:32, 5 August 2017 (UTC)

From what I understand, I would have to create a vote at Wiktionary:Votes. I would probably need some help to have any chance of getting that right. You say such a vote would have no chance to pass, but if there's anything I learned from politics it's this:
  • If you want something to pass, bring it up for voting when everybody who is against it is on vacation.
  • If you want something to pass, just attach it to another bill that is being voted on that will pass. (not possible on Wiktionary)
  • If neither of those is feasible but at least a third of eligible voters are in favor, just bring it up for voting again and again and again and again. Sooner or later it'll pass because either those who are against it missed the vote, those who are against it don't vote because they figure it'll never pass anyway (that's one of the reasons Trump was able to win) or some current event or hype changes what people think and the vote passes.
There are more strategies, but these are the big ones. Once it has passed, it'll be virtually impossible to take it off the books again. W3ird N3rd (talk) 03:38, 6 August 2017 (UTC)

I just came across the following: towelless, fishless, bikeless, streetless, boxless, fireless, woodless, barless, magazineless, goldless, bronzeless, schoolless, cardless, mapless, pantless, sockless, appleless, watchless, morningless, kingless, bossless, condomless, monitorless... (this goes on endlessly) Next time you see somebody saying Dutch or German needs to be treated differently on Wiktionary, slap them in the face with this list. W3ird N3rd (talk) 07:00, 6 August 2017 (UTC)

The problem is that a lot of contributors have the justified fear that allowing all attested multiword compounds would flood the database with low quality entries. I believe that the best way to overcome this problem would be some set of well-designed inclusion rules. Matthias Buchmeier (talk) 17:48, 6 August 2017 (UTC)
I think the load of -less variants is low quality. A bunch of these wouldn't even pass RfV. So be it German, multiword compounds or just plain English -less variants of words: we need better inclusion rules. The current inclusion rules allow rubbish like boatless while prohibiting travel game. They will also allow fietshater and perhaps even bike-hater while nothing prevents lab rat from being deleted. I think the five-bullet point list I made above is at the very least a good start. But if there's no chance of any change ever becoming policy, I might as well give up. In that case a completely new wiktionary needs to be started, which would be a downright shame. W3ird N3rd (talk) 06:30, 7 August 2017 (UTC)
boatless would easily pass RfV, and is a translation of an Egyptian term (iww, with a hook above the i) that would pass your translation terms argument. I recall an old dictionary has a page of un- compounds without definitions; it hardly hurts us to give stuff like that boilerplate entries.--Prosfilaes (talk) 09:23, 7 August 2017 (UTC)
Looks like boatless is a bit of an odd duck. It's not used a lot on websites (which is what I initially checked for), but quite a few books use the word. As for the translation, I wasn't aware of that and was only referring to the RfV. I don't terribly mind having such entries around, but it just feels like insanity to have those while not allowing entries that are not nearly as obvious "because SoP". W3ird N3rd (talk) 13:30, 7 August 2017 (UTC)

Kajkavian – language, dialect or something in between?

Recent changes to Kajkavian prove that there is a dispute in the linguistic community as to the classification of this dialect/language. I hate to see this entry be turned into a political battlefield, so let's decide once and for all – is it a dialect or language, and should this page be protected to avoid any future disputes? --Robbie SWE (talk) 10:12, 2 August 2017 (UTC)

I'd stick with the conservative option and call it a dialect, at least until it gets an army and a navy. Crom daba (talk) 10:52, 2 August 2017 (UTC)
It's still not settled. Its status has been disputed for a long time, but it has been classified as a dialect of Serbo-Croatian since about 1950 or so. Many of the Yugoslavs get very worked up about it, one way or another. I agree with Crom daba, I think we should keep it as a dialect until there is something closer to a consensus that it's a separate language. I thought about getting an opinion from User:Ivan Štambuk, but Ivan seems to be absent. I think it's been over a year since Ivan's last serious edit. —Stephen (Talk) 11:17, 2 August 2017 (UTC)
If you're interested in opinion of other Yugos, @Vorziblix, Biblbroks might respond. Crom daba (talk) 11:31, 2 August 2017 (UTC)
Opinions from other Yugos might be helpful, but only if they are linguists and are philosophically moderate. The last time we asked for Yugoslav opinions, everybody from the Serbian, Croatian, and Bosnian Wikipedias came here and we almost had a shooting war. With User:Ivan Štambuk, we knew his education and philosophy, so he was very helpful in things such as this. Ethnologue does not recognize it yet. SIL mentions it only as a literary language. I don't know what to make of that. —Stephen (Talk) 13:35, 2 August 2017 (UTC)
I don’t have a strong opinion either way, as I’m not knowledgeable enough about Kajkavian to say whether it would be more convenient to keep it merged or split. For reference, however, here’s an old discussion of this same subject with Ivan Štambuk. — Vorziblix (talk · contribs) 21:33, 2 August 2017 (UTC)
"at least until it gets an army and a navy" Wait, all I need to have my own language is an army and a navy? Why has no one told me this before! **starts gathering troops**
On a more serious note, you may want to look at and compare the West Frisian language, a dialect that relatively recently became recognized as a language. W3ird N3rd (talk) 15:18, 2 August 2017 (UTC)
We are completely indifferent to official "recognition". We consider things separate languages (and give them separate codes) based on linguistic considerations, though admittedly our results are not always consistent: we treat all Serbo-Croatian and Chinese varieties as a single language (each), but we treat Bokmaal and Nynorsk as separate languages. —Aɴɢʀ (talk) 16:23, 2 August 2017 (UTC)
In considering these types of questions, I would like us to put more emphasis on lexicographic convenience and less on "linguistic considerations"- that is, will splitting or merging these languages make it easier to maintain the dictionary? Will it make it easier for users to find information that they want? DTLHS (talk) 16:58, 2 August 2017 (UTC)
I think the Frisian case is still interesting to look at. People who only speak Dutch can barely, if at all, understand Frisian, but for a long time Frisian speakers were (for example) not allowed to use evidence in Frisian in court. It was not until 1980 that Frisian got the status of a required subject in primary schools. I think it also took a while before they got their own Wikipedia. And they are very, very, very proud of their language and it sounds like that is a factor with Kajkavian as well. If you are curious how different it really is, try https://www.youtube.com/watch?v=m1WTTX_ITIE. The narrator is speaking Frisian, the man who appears after 14 seconds into the video is speaking regular Dutch. For written text, try https://nl.wikipedia.org/ versus https://fy.wikipedia.org/. For a long time this wasn't recognized as a separate language. W3ird N3rd (talk) 17:08, 2 August 2017 (UTC)
We're already led by convenience, Serbo-Croatian wouldn't have won out were it not massively inconvenient to quadruple our work here. Crom daba (talk) 18:28, 2 August 2017 (UTC)
Only in some cases. We have both Scots and English and two different varieties of Norwegian (as well as just "Norwegian"). DTLHS (talk) 18:36, 2 August 2017 (UTC)

New competition

Hello. If anyone wants to play Emoji-Pictionary, I set up a game at User:WF on Holiday/Comp. As with most games I started in Wiktionary, there are probably loads of mistakes, loopholes, spellos, bad grammars and confusing instructions. But once we've got used to them, we can play happily. On a side note, I'm sure some of our Previous games could be modified by some tech-savvy folks in such a way as to allow normal people to play them. --WF on Holiday (talk) 23:18, 2 August 2017 (UTC)

Arbitrary behavior of certain administrators

There is an administrator being completely arbitrary on certain entries, as you might see here for example: [1] where he eliminates a translation of a word on the basis that he does not feel that it is a good translation, and yet the example he leaves in place about an LGBT film festival contradicts his assertion. This is, sadly, a consistent pattern and not merely one example; originally I had added "queer" as a translation while citing a specific example of it being translated that way in the name of an Israeli organization, and he eliminated it on the basis that he personally felt it did not fit and was offensive. His behavior is despotic; instead of requesting verifications he just acts as an absolute authority and is nothing but combative when I ask for simple things like justifications for his actions.

It's bad for the project because there are processes. He does not seem to be holding himself to the standards that other wiktionary users are held to, but acting as if it's his personal dictionary. He disagrees with a translation so instead of putting a RFV template on it, he just deletes it and locks the page.

Furthermore he's projecting a considerable amount in his responses, acting as if I am trying to impose my personal views when I am citing specific examples and he is citing no examples other than "I speak Hebrew," which I don't think is the way things normally go on Wiktionary? Like, I speak Esperanto but I still have to justify my work on Esperanto terms, as 99% of Wikimedia users have to do.

I don't think Wiktionary was created so that certain people could impose their opinions without justifying them, and people who justify their edits by giving specific examples are treated as if they are troublemakers. I think it was created for the opposite reason and that fairness and transparency are still supposed to be important. Ligata (talk) 14:04, 3 August 2017 (UTC)

I recommend people engaging in a discussion over this take a look at the respective admin's talk page and think of the fact that Wiki-projects are known to prevent new users from joining by stubborn aggressive culture of long-term users. I also strongly advocate that the discussion here not get derailed by a smokescreen (talking about Hebrew definitions) but instead stay on topic (proper conduct and bureaucracy). Korn [kʰũːɘ̃n] (talk) 14:48, 4 August 2017 (UTC)
I had a similar issue with this travel edit. It may have been the wrong place, but I think those were some good examples. Instead of correcting it or requesting a fix/cleanup he just chucked it. In most cases that would be the end of it, but I mentioned him on this page asking to explain this. I don't expect most new users to be that assertive or to even notice their edit has been undone. He still hasn't shown up here and I thought he ignored it, but only just now do I see he did do something in response to that (or so the timelines would suggest): https://en.wiktionary.org/w/index.php?title=travel&diff=47159186&oldid=47158420 which is nice, but I think that would still benefit from the examples I had written. But I can't risk putting something back in that was removed by an administrator. I can understand his time is limited and he can't properly fix every mistake he finds. I get that. But isn't that what Wiktionary:Requests_for_cleanup would be for? W3ird N3rd (talk) 20:30, 4 August 2017 (UTC)
We don't even have time to resolve everything at WT:RFC as it is now (see all the archived unresolved requests). --WikiTiki89 20:38, 4 August 2017 (UTC)
Is that a valid argument for deleting/reverting edits that aren't perfect? The idea behind a wiki is that a valuable contribution doesn't have to be complete or perfect. But by reverting edits that are not perfect, you can quickly discourage any new users from hanging around. In the long term, you will indeed not have enough manpower to verify and clean edits. The cleanup request page isn't very well advertised, and that may also contribute to this. W3ird N3rd (talk) 21:16, 4 August 2017 (UTC)
Some badly formatted entries are found many years after they are created. Thus, dealing with them as soon as they are noted is essential. —CodeCat 21:18, 4 August 2017 (UTC)
Some - so you just delete everything before anyone could even have a chance to fix it. If mice keep getting into your house, the solution is not to burn down your house. W3ird N3rd (talk) 01:40, 5 August 2017 (UTC)
If your house could do with some new furniture, but you can't afford any, the solution is not to fill it with mice... Equinox 10:25, 5 August 2017 (UTC)
But if many of your friends are carpenters, you might fill it with not-quite-perfect furniture and put a post-it on it to remind you something needs to be done about it, instead of sitting around in an empty house. And possibly chuck the nonperfect furniture anyway if it's still not fixed after a month. The very least IMHO is that the user who made the edit is informed (this could possibly be partially automated) about what was wrong and what needs to be changed before putting that content back. Right now it's just "POOF, it's gone, and if you put it back you risk a ban". Like my examples for travel, I think they would now fit perfectly below the usage note, but I feel like it's a risk to put them back in because SemperBlotto is an administrator. I would have already done it had SemperBlotto been a regular user.
Obviously edits that you would consider mice (vandalism) are not what I'm talking about here. W3ird N3rd (talk) 13:05, 5 August 2017 (UTC)
I do sort of take your point. It's bad that we automatically revert every mess when some (10%? who knows?) messes contain something good. But the entries are public-facing. It suggests that maybe we need some kind of "limbo" or intermediate edit-o-space that allows stuff to exist before it's shown to every random visitor. I can't be the first wikidork to think of this. For now, although it's annoying, I think our approach is as good as it gets. Equinox 00:29, 7 August 2017 (UTC)
Wikipedia uses "Wikipedia:Pending changes" on controversial pages so that edits don't go live until they have been reviewed. We could perhaps apply it to all pages here, and patrol the log of pending changes needing review, instead of our current system of "patrolling" Special:RecentChanges, which some changes slip through. But the actual result might be an extremely large backlog of pending changes awaiting review. This was discussed at least once before; I don't recall many people having strong opinions, but enough opposed it that it wasn't implemented. - -sche (discuss) 06:01, 15 August 2017 (UTC)
  • Since the actions complained of are not administrative in nature, perhaps it would be better to title this section "Arbitrary behavior of certain editors". Cheers! bd2412 T 14:58, 5 August 2017 (UTC)
@BD2412 But there's a difference. If an administrator removes something, you can't put it back. Even if you slightly alter it and believe that is sufficient to fix it, you can't put it back because the user who removed it happens to be an administrator. If you do it anyway you risk a ban. This wouldn't trouble me nearly as much if a regular user had deleted it, I would just fix it and put it back without having to worry about it. W3ird N3rd (talk) 03:03, 6 August 2017 (UTC)
I don't think that's true at all. It would be a substantial misuse of administrative authority to use that authority in connection with one's own editing dispute. bd2412 T 03:06, 6 August 2017 (UTC)
This.__Gamren (talk) 08:38, 6 August 2017 (UTC)
@BD2412 User_talk:Stephen_G._Brown#Abuse_of_blocking_and_page-deleting_powers_by_SemperBlotto.3B_de-cratting_and_de-sysopping_required feels too much like that for me to risk it. While the user in question was wrong (and making silly demands), it makes it clear to me that putting back any content deleted by an administrator is risky. W3ird N3rd (talk) 09:18, 6 August 2017 (UTC)

Extinct species

Are there any categories for extinct species, or do they go in other categories? I just unearthed Kangaroo Island emu. DonnanZ (talk) 16:02, 4 August 2017 (UTC)

A taxonomic approach would just put them in existing categories (where they exist) alongside extant species. A language-centered approach would favour putting them somewhere else, and not mixing them with extant species. —CodeCat 16:12, 4 August 2017 (UTC)
There is a convention in taxonomic names to place the symbol "†" before the name unless such a symbol is not necessary due to context. (See practice on Wikispecies.) We have begun implementing the practice of putting the "†" on the inflection line for entries of extinct taxa and elsewhere if the word extinct is not already in a label.
English vernacular names do not use the symbol, so it is arguable that a categorical distinction might be useful for some purposes. For many purposes, however, the presence or absence of the word extinct together with the capabilities of search would be sufficient. DCDuring (talk) 22:12, 4 August 2017 (UTC)
There is no lexical value in saying whether a particular species is extinct or not, any more than in saying whether a particular institution is defunct or a person is deceased. All that matters is whether the term still has some kind of usage or currency. —Justin (koavf)TCM 00:07, 5 August 2017 (UTC)
By what definition of lexical? Does lexical exclude definitions, ie, semantics? We have the word extinct in so many definitions. DCDuring (talk) 01:04, 5 August 2017 (UTC)

Can anyone get through to User:Jeff Weskamp?

They are adding Cherokee entries with manual transliterations, even though automatic transliterations work perfectly for Cherokee. This isn't really a big issue, but it's silly and so I left a message on their talk page. They don't seem to have noticed it at all, though, even after I sent another message. Is anyone able to get through to them? A user that ignores their talk page is bad, even if they aren't currently causing trouble. —CodeCat 17:10, 4 August 2017 (UTC)

Jeff also edits other Native American languages, including Navajo. I have not checked all of his edits, but I have checked quite a few of them. Those I've checked always seem good, even if he adds transliterations unnecessarily. I have attempted to talk with him a time or two, but I don't believe he has ever replied to anyone. I've known other editors who try to avoid interpersonal communication, so it does not seem all that odd. Jeff just takes it to an extreme level. —Stephen (Talk) 22:11, 5 August 2017 (UTC)
Jeff is now adding improper categories to entries, so I hope they will start listening. —CodeCat 19:10, 19 August 2017 (UTC)

Languages distinguishing dotted and undotted i

Recently I added some code to distinguish dotted and undotted i (Iı, İi) in Turkish and Azeri sortkeys. Until now, they were merged by being converted to lowercase (→ iı, ii) and then uppercase (→ II, II) using English rules (mw.ustring.upper). Thus, words beginning with both i and ı were sorted under I when they were categorized using templates.

Currently the fix only applies to Turkish and Azeri. Are there any other languages currently on Wiktionary that distinguish dotted and undotted i? — Eru·tuon 20:42, 4 August 2017 (UTC)
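The merging described above can be illustrated with a short sketch. It is hypothetical: it is written in Python rather than the Lua used by Wiktionary modules, and the function names and the language-name set are assumptions for illustration, not the actual code in Module:languages. Python's default `str.upper()` uses language-neutral Unicode casing, which, like the English rules described above, turns both i and ı into I; the Turkic variant maps the two letters explicitly and uppercases everything else normally.

```python
# Hypothetical sketch, not the real Module:languages (Lua) code.

def naive_sortkey(word: str) -> str:
    """Old behaviour: lowercase, then uppercase with default Unicode casing.

    Default casing maps both dotted 'i' and dotless 'ı' to 'I',
    so the two letters end up under the same heading.
    """
    return word.lower().upper()


# Turkic case pairs: dotless ı <-> I and dotted i <-> İ.
TURKIC_UPPER = {"i": "İ", "ı": "I", "İ": "İ", "I": "I"}


def turkic_sortkey(word: str) -> str:
    """Uppercase a word while keeping dotted and dotless i distinct."""
    return "".join(TURKIC_UPPER.get(ch, ch.upper()) for ch in word)


# Languages the thread settled on (assumed names, not Wiktionary language codes).
DISTINGUISHES_DOTTED_I = {
    "Turkish", "Azeri", "Crimean Tatar", "Gagauz",
    "Karakalpak", "Tatar", "Zazaki",
}


def sortkey(word: str, language: str) -> str:
    """Pick the Turkic-aware sortkey only for the languages listed above."""
    if language in DISTINGUISHES_DOTTED_I:
        return turkic_sortkey(word)
    return naive_sortkey(word)


if __name__ == "__main__":
    print(sortkey("iki", "English")[0], sortkey("ılık", "English")[0])  # I I
    print(sortkey("iki", "Turkish")[0], sortkey("ılık", "Turkish")[0])  # İ I
```

Only the first letter is compared here, mirroring the "sorted under I" symptom; placing İ directly after I in a full alphabetical ordering would need a proper collation table, which this sketch does not attempt.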

The following languages have entries with both dotted and undotted i's: Azeri, Crimean Tatar, Egyptian, English, Gagauz, German, Italian, Karakalpak, Tatar, Translingual, Turkish, Zazaki. DTLHS (talk) 21:10, 4 August 2017 (UTC)
Egyptian, German, Italian? And even English? Really? —CodeCat 21:14, 4 August 2017 (UTC)
Italian: dımlı, German: homurdanmayı, Egyptian: ḥtrı͗, English: Category:English terms spelled with ı. DTLHS (talk) 21:26, 4 August 2017 (UTC)
The Italian and German look like errors. The Egyptian is used with a combining diacritic, and it should just use a regular i. As for the English, most of them are probably better attested with a regular i and therefore should probably be moved to those spellings. Regardless, English speakers would not treat i and ı as different letters, so sorting them together is correct. —CodeCat 21:31, 4 August 2017 (UTC)
@DTLHS I don't think the German entry you fixed is correct, still. In the lemma entry, the inflection table says it's the definite accusative form. —CodeCat 21:54, 4 August 2017 (UTC)
I guess, then, what I'm really asking is for which languages would we actually want the sortkeys to distinguish the two? — Eru·tuon 21:30, 4 August 2017 (UTC)

I'm going to guess that all the Turkic (and Turkic-influenced) languages in the list should have dotted and dotless i distinguished: in addition to Turkish and Azeri, Crimean Tatar, Gagauz, Karakalpak, Tatar, Zazaki. — Eru·tuon 21:51, 4 August 2017 (UTC)

(edit conflict) Judging by w:Dotted and dotless i, there's the potential in Turkic languages that use the Latin script, even as an alternative, but nowhere else except for ad-hoc use in romanization. Our entry for ı lists only Azeri, Crimean Tatar, Gagauz, and Turkish. Of course, texts in other languages can have names attested in their original spelling, but such cases are so rare that I doubt there are many (if any, at all) with dotting determining their order in any confusing way. Chuck Entz (talk) 21:57, 4 August 2017 (UTC)
I've put the languages that I listed above in a table in Module:languages. I should verify that each one actually has a regular orthographic system that uses the letters, though. — Eru·tuon 00:05, 5 August 2017 (UTC)
Okay, I looked at Wikipedia articles and Wiktionary categories, and Crimean Tatar, Gagauz, Karakalpak, Tatar, and Zazaki all seem to either regularly use dotted and dotless i, or have entries that use them. — Eru·tuon 00:31, 5 August 2017 (UTC)

Category name: "words pseudosuffixed with" or "words ending in"

Which naming convention should be used for suffixlike endings: "words pseudosuffixed with" or "words ending in"? Examples for both: Category:Esperanto words pseudosuffixed with -acio; Category:Esperanto words pseudosuffixed with -enco; Category:Hungarian words ending in -ikus. --Panda10 (talk) 23:58, 4 August 2017 (UTC)

I prefer "ending with", because I haven't heard "pseudosuffix" before, but I wonder how we could prevent the creation of ridiculous categories for every sequence of letters at the end of a word: for naming, for example, ending with -g, ending with -ng, ending with -ing (though that's a suffix), ending with -ming, ending with -aming. That is, what counts as a "pseudosuffix" or ending such that it gets to have a category? — Eru·tuon 00:03, 5 August 2017 (UTC)
I think these things (pseudosuffixes) are called formatives. Crom daba (talk) 00:26, 5 August 2017 (UTC)
Also desinence. --Vriullop (talk) 07:41, 5 August 2017 (UTC)
See also: previous discussion in July at Etymology Scriptorium.
"Desinence" typically refers to inflectional rather than derivational endings. With some overlap with "formative", there's also "formant", used to refer to endings that are not known to be certainly segmentable at all (so e.g. ölyv would have a "formant" -v). "Ending in" is probably a good enough starting point, provided that we craft descriptions for these that clarify that they are not pseudo-rhyme categories (e.g. we would not want sing in a category "English words ending in -ing").
Something that specifies the etymological origin, such as "ending in Latinate -ikus" might work. This also prevents the risk of bloat through people starting to add "ending in -X" as useless "wrapper" categories for every "suffixed with -X" category.
I'm not sure how these categories should be meshed with the pre-existing suffix categories, though. Do we put them in parallel, or as a parent category for the corresponding proper suffix category? I would lean towards the former, with crosslinks from the category description, but I'm open to arguments in other directions. --Tropylium (talk) 07:48, 6 August 2017 (UTC)

Sanskrit vs. Old Indo-Aryan

Currently, Module:languages lists only Sauraseni Prakrit as a direct descendant of Sanskrit. This is IMO completely misleading because there is nothing to prove that Sauraseni is any more a descendant of the Vedic dialect of Old Indo-Aryan than any other Prakrit. A simple example is Sanskrit क्षेत्र (kṣetra, region), from Proto-Indo-Iranian *ĉšáytram. The regular outcome of *ĉš in Middle Indo-Aryan is "ch". This is found in all of the Dramatic Prakrits as "chetta" (alongside a "kh" form that likely came later as part of artificial alignment with Sanskrit), including Sauraseni. Indeed, where Sanskrit simplifies Proto-Indo-Iranian clusters to क्ष (kṣa), the Middle Indo-Aryan languages preserve the original cluster. If Sauraseni were a direct descendant of Vedic Sanskrit, we would see only "khetta", no "chetta". So, that being said, we have two options.

  1. Remove Sauraseni as a Sanskrit descendant – Note that CAT:Terms inherited from Sanskrit has been cleared out with Wyang's help, so no module errors will occur. This is keeping in line with our treatment of Sanskrit as only Vedic Sanskrit (+Classical Sanskrit), not all Old Indo-Aryan.
  2. List all of the Dramatic Prakrits (Sauraseni, Maharastri, Ardhamagadhi) as direct Sanskrit descendants – This was suggested at Category talk:Hindi Tadbhava, and would involve treating Sanskrit as a dialect continuum of all Old Indo-Aryan + Classical Sanskrit. WT:ASA would have to be modified accordingly.

Personally, I think either option is better than the status quo. —Aryaman (मुझसे बात करो) 04:00, 6 August 2017 (UTC)

Pinging @JohnC5, माधवपंडित, DerekWinters. —Aryaman (मुझसे बात करो) 04:01, 6 August 2017 (UTC)
I would prefer option #2, ie, considering Sanskrit to be the entire group of mutually intelligible dialects, for the sake of convenience. Wiktionary treats Avestan, Old Norse & Serbo-Croatian as one language while in reality they're all two or more dialects. We can do the same for Sanskrit. ɱɑɗɦɑѵ (talk) 04:46, 6 August 2017 (UTC)
Not to mention none of the non-Vedic dialects are (well-)attested. And we could always have a reconstructed entry *च्शेत्र/*च्षेत्र (*cśetra/*cṣetra) if it is needed. —Aryaman (मुझसे बात करो) 06:34, 6 August 2017 (UTC)
There is already dialectal diversity within "Sanskrit". Strictly speaking even Classical Sanskrit does not descend from Vedic Sanskrit precisely, but from a parallel dialect that was not written down until later. This in mind, we could probably treat all Middle Indo-Aryan (and most of New Indo-Aryan) as descendants of "Sanskrit". Where MIA diverges from Classical Sanskrit, it would be possible to create reconstructed Sanskrit forms (similar to Category:Latin reconstructed terms). Perhaps we could outright consider merging "Proto-Indo-Aryan" into Sanskrit? Same deal as how we already equate Latin with Proto-Romance. --Tropylium (talk) 07:58, 6 August 2017 (UTC)
I agree that Sanskrit should be the collection of OIA dialects put together. However we cannot merge it with PIA because we need PIA for the Mitanni language. DerekWinters (talk) 15:11, 6 August 2017 (UTC)

making Tagalog an LDL

This was supported in WT:RFVN#hagok by @Metaknowledge, Mar vin kaiser, Atitarev, Stephen G. Brown (I think). @Rgt2002, TagaSanPedroAko may also have opinions. Please discuss.__Gamren (talk) 08:19, 6 August 2017 (UTC)

I agree that Tagalog is an LDL. —Stephen (Talk) 08:42, 6 August 2017 (UTC)
I also agree that Tagalog is an LDL. --Mar vin kaiser (talk) 09:22, 6 August 2017 (UTC)
Do we have any quotations of Tagalog in use in Wiktionary? Do we know of any online corpora that we can use? Is http://sealang.net/tagalog/corpus.htm a usable corpus to find quotations in use? Can Tagalog texts be found in Google books? What methods can a third party use to verify that Tagalog is so poorly documented that we should allow single mentions for it? --Dan Polansky (talk) 09:55, 6 August 2017 (UTC)
Yes, Tagalog is very poorly documented here in Wiktionary, but as a native speaker of Tagalog, I am making efforts to turn Tagalog here from a least documented language (LDL) into a well-documented one. I agree that Tagalog is still an LDL, and yes, there will be efforts to add quotations showing sample use of Tagalog words for a certain sense. Maybe finding interesting quotes in Tagalog by notable persons, if not from Tagalog-language publications, may help. -TagaSanPedroAko (talk) 11:23, 6 August 2017 (UTC)
@TagaSanPedroAko: The discussion is not about whether Tagalog is well documented in the English Wiktionary but rather whether it is well enough documented on the Internet, by which the users of the phrase mean whether there are enough quotations of Tagalog in use (not dictionaries) to be found on the Internet. These quotations of Tagalog in use are what the English Wiktionary uses for verification, per WT:ATTEST. And there is a proposal to allow single mentions in dictionaries to suffice for verification of Tagalog; single mentions do not suffice for English, Spanish, German, and multiple other languages. --Dan Polansky (talk) 11:50, 6 August 2017 (UTC)
There are very few mainstream Internet sources for use in quotes that use Tagalog. The vast majority of Tagalog sources on the Internet will mostly be self-published, but if you can find a reliable one, like a book in Google Books or a Tagalog news website, then here we go. I'm aware that there are reliable Tagalog (or Filipino) sources on the Net that attest use of certain words, but finding them will be difficult since the majority of Philippine Internet media use English. If I can dig up a reliable source, then good. -TagaSanPedroAko (talk) 11:58, 6 August 2017 (UTC)

Unsolicited Babel requests

User_talk:Gfarnab#Babel
User_talk:Awewewe#Babel
User_talk:Pedrianaplant#Babel
User_talk:Leonardo_José_Raimundo#Babel
User_talk:ZH8000#Babel
User_talk:LexiphanicLogophile#Babel

"Could you please add {{Babel}} to your user page? I'd appreciate it. --Dan Polansky (talk) 08:41, 5 August 2017 (UTC)"

I suppose @Dan Polansky means well, but in my book this is spam. The biggest problem I have with this is that he makes it look like it's a personal message. He says he re-types it every time he posts it, but it's still the same message every time. I wouldn't mind if he wrote a personal message for every request and explained why it would be so valuable to him to see that user getting a Babel, or if he would make it clear in the message that it's not really personal.

I personally don't appreciate these messages, but maybe it's just me. W3ird N3rd (talk) 10:17, 6 August 2017 (UTC)

  • The primary purpose of user pages is to give other editors an idea of an editor's competence in a particular language. Babel boxes are the best way of achieving this. Please add a babel box to your own user page (if and when you create one). SemperBlotto (talk) 10:22, 6 August 2017 (UTC)
LOL.
I have seen plenty of users with a babel box, I thought about it and decided not to create a user page at this moment. If and when I do, I don't think I'll add a babel box. I don't really like them. W3ird N3rd (talk) 10:34, 6 August 2017 (UTC)
  • Funny how you're complaining about Dan "spamming" talk pages with something useful to the project... by spamming this forum page. —Μετάknowledgediscuss/deeds 00:44, 7 August 2017 (UTC)
  • I think you don't know what spam is. Spam means unsolicited bulk electronic messages. Being useful or not doesn't matter, although useful spam is less likely to be frowned upon. If you are getting e-mail that you didn't ask for from your local supermarket with various offers that you actually like, it's still spam. I have only brought this up here, nowhere else. I don't have any intention of posting this anywhere else either. I'm also not asking anyone to do or buy anything. You may find this pointless and you are entitled to your opinion, but that does not make this forum post spam. In my opinion the babel box is getting enough exposure as it is. If such messages are accepted, it might lead to a slippery slope. I just wanted the community to be aware of this phenomenon; if the community thinks it's fine I'll say no more. W3ird N3rd (talk) 05:49, 7 August 2017 (UTC)
The Beer Parlour is the place to discuss these things. This discussion is not spam. That said, personally I'm OK with Dan requesting people to use babel boxes. Sometimes we need to know who speaks a certain language, and the boxes make that job easier. --Daniel Carrero (talk) 05:53, 7 August 2017 (UTC)
I think Babel boxes are a good thing, requesting them is a good thing, and not responding constructively to such a request is a bad thing. DCDuring (talk) 06:06, 7 August 2017 (UTC)
I also think that it pays for such a request to have some explanation of the purposes served. DCDuring (talk) 06:08, 7 August 2017 (UTC)
Adding a Babel table should be our standard policy, if it's not already. A standard {{welcome}} message includes that request. If users refuse to tell other users which languages they know or don't know, they should go somewhere else. Not knowing a language doesn't mean that you can't edit in that language, but other editors can then check or monitor your edits accordingly. --Anatoli T. (обсудить/вклад) 06:21, 7 August 2017 (UTC)
Technically you've fulfilled the request. You've added {{Babel}} to your userpage. Wyang (talk) 06:29, 7 August 2017 (UTC)
And now that we know that you can't speak any languages, any of your contributions will be ignored. SemperBlotto (talk) 06:42, 7 August 2017 (UTC)
[2]suzukaze (tc) 06:48, 7 August 2017 (UTC)
But, at an earlier, saner time: [3]. DCDuring (talk) 11:16, 7 August 2017 (UTC)
I don't exactly like that W3ird N3rd doesn't have a Babel box, but if the user doesn't want one, don't make them feel forced to have one. Some very active contributors to Wiktionary don't have user pages at all. That said, W3ird N3rd isn't exactly spamming this forum, but I just don't feel like this discussion is appropriate for the Beer parlour, especially since it's targeted at one user alone (Dan). PseudoSkull (talk) 01:49, 8 August 2017 (UTC)
It feels a bit out of place indeed, but I've looked around and Wiktionary:Information desk, Wiktionary:Tea room and Wiktionary:Grease pit were clearly the wrong places. Although this post is indeed about one user, my comment was about the phenomenon. I don't know if any other users are doing this, but what I said would apply to them all the same. My biggest issue is probably this line: "I'd appreciate it." which was repeated for all users. Maybe it's because I'm Dutch (the Dutch are known for being direct), but I just can't stand it when someone pretends to care.
Just one more thing. I mentioned the possibility of a slippery slope. One of the reasons I don't want a babel box is because (depending on how many languages you know) it looks like a unicorn just barfed a rainbow. We all know the average Wikipedia user page looks like a Christmas tree and while it won't happen overnight, it must have started somewhere and the road to hell is paved with good intentions. It may not happen at all, but if users start pushing a template, even if this one now is a useful one, it might. I believe it would be wiser not to allow any users to promote templates this way and, if it is believed the babel box isn't getting enough exposure, to have the administrators decide on a way to inform users. But clearly, I'm standing alone on this one. W3ird N3rd (talk) 03:26, 8 August 2017 (UTC)
If the issue you have with the babel box is too much unicorn barf on user pages, then you could use a different method to give information on what your native language is and what your levels of proficiency are in other languages. — Eru·tuon 03:53, 8 August 2017 (UTC)
There's no slippery slope here: Wikipedia-style user boxes aren't allowed, with the exception of Babel, time zone, and maybe one or two others that provide useful information. That's the way it's been since long before I started here 5 years ago, and I doubt it will change. Chuck Entz (talk) 04:40, 8 August 2017 (UTC)
  • I also see no slippery slope. The Wiktionary community has been very careful to avoid unicorn barf.
And I also see no disingenuousness on Dan's part. I, too, appreciate it when users add Babel boxes to their user pages -- at least, when those Babel boxes are at least vaguely accurate, as they provide the community with useful and usable information on who understands which languages, and roughly to what degree. For a multilingual dictionary project, this kind of user metadata is very useful.
FWIW, W3ird N3rd's behavior comes across as immature, and willfully disrespectful of Wiktionary norms, albeit on a minor scale that's more of a slight annoyance than anything actionable. I suspect some of his (her?) reticence comes from the Wikipedia culture and a lack of familiarity with the Wiktionary project. On Dan's part, I see no spam, and nothing inappropriate in asking for a Babel box.
I hope W3ird N3rd can learn more about how Wiktionary functions, and grow to be a comfortable and productive member of the community. ‑‑ Eiríkr Útlendi │Tala við mig 06:09, 8 August 2017 (UTC)
If my contributions in the main dictionary space are not productive, I might as well stop contributing. It's not going to be all that much better in the future. I thought I was being productive, but thanks for pointing out to me that I'm not. I know you think this is immature, but why should I care? Either I really am not productive, in which case you should just think "good riddance" or I am but you insult me (at the very least that's how this comes across), in which case why should I stay? W3ird N3rd (talk) 14:21, 8 August 2017 (UTC)
  • My perspective: 1. Yes, distributing the same message electronically to a larger number of people is spam. Textbook definition. 2. I see no harm in every user receiving this spam message once as it is merely a request for a useful addendum. 3. This is a Wiki-project, not Lord of the Flies, Jante or a Catholic School in a Celtic country. Wiki itself is based on and centered around voluntary contributions. Of course the community can come together and regulate things to prevent harmful additions to the project, but demanding that any user share any information on himself or add a specific thing, that is, forcing involuntary contributions, is the fucking opposite of what this project is supposed to be, and everyone who entertains that train of thought is indeed about to open Pandora's Box and pervert Wiktionary (an open project where everyone can partake) into a generic online dictionary run by a junta of seniors. Korn [kʰũːɘ̃n] (talk) 10:11, 8 August 2017 (UTC)
Just the request isn't even what bothers me most. Had it been worded like "Could you please add {{Babel}} to your user page? The Wiktionary community would appreciate it." I wouldn't have been even close to as annoyed as I was now. I know what many here will say: "what am I complaining about, that's hardly any different at all, what sort of moron are you, yadda yadda yadda". To me this would make all the difference. It would make it clear Dan isn't personally asking me to do this, he is asking on behalf of the Wiktionary community. Which also means that if I decide to ignore it, I'm not letting Dan down personally. To me, that's a big difference. Again, I don't expect anyone to side with me. It's just my opinion. Yes it is a stupid opinion. I'm a stupid person and there's no need to further comment on that, I admit it, move on. W3ird N3rd (talk) 14:21, 8 August 2017 (UTC)
Instead of "Could you place Babel to your user page? I'd appreciate it," you wanted "Could you please add Babel to your user page? The Wiktionary community would appreciate it"? I can't see the difference and English is my native language. Dan is Czech and he does not have a perfect command of English. Most of our editors have a different language as their first language. It has never occurred to me to be offended by English comments that are not just so. I think most people write the best they can and they don't mean to offend or confuse. The reader should bear some of the load of communication by showing a more tolerance and understanding. It improves the atmosphere. —Stephen (Talk) 16:03, 8 August 2017 (UTC)
I tried to explain it; I'll do it again knowing full well it won't make a difference. If you say "Please do X, I'd appreciate it." I feel like I'm letting you down when I don't do it (and the community may or may not care about X). If you say "Please do X, the community would appreciate it." it tells me the community in general would prefer this, and I'm not letting you down personally if I don't. I wouldn't even think this difference, or at least what I perceive as a difference, would be language-dependent. I suppose not every individual would recognize this difference though. And maybe somehow I'm the only one. In which case I'm wrong and my faulty interpretation led to a long and useless argument of misunderstanding and contempt. Well, if my understanding of the English language is that shitty I probably shouldn't be here anyway. Which was another reason I wouldn't want to add a babel box: I can't judge to what degree I master any language. W3ird N3rd (talk) 16:38, 8 August 2017 (UTC)
@W3ird N3rd: I don't think that I would feel like I'm letting anybody down by not adding a Babelbox, no matter how the message asking for it was worded. It's really not that important to discuss this imo. —Aryaman (मुझसे बात करो) 04:55, 11 August 2017 (UTC)
It seems to me your English is just fine. Personally, I disagree that Dan's phrasing was due to him being Czech. I suspect he prefers in general not to speak on behalf of "the community". But I could be wrong. — Eru·tuon 17:23, 8 August 2017 (UTC)
Indeed, I don't like to speak on behalf of the community. The Babel practice is common but the appreciation is mine. --Dan Polansky (talk) 10:47, 19 August 2017 (UTC)
For example this sentence: "The reader should bear some of the load of communication by showing a more tolerance and understanding.". To me, this seems wrong. (the most obvious fix to me would seem to be to change "a more" to "a little more") It could be a joke (writing a broken sentence to prove your point), a genuine error (even a native could make mistakes) or (which would seem more likely as English is not my native language) this is correct but I just don't understand it. I also think I don't write text the way most people do today: I don't use any kind of spell checker or autocomplete. That may also result in me looking at language in a different way. W3ird N3rd (talk) 17:01, 8 August 2017 (UTC)
(An academic discussion on what is spam) "distributing the same message electronically to a larger number of people is spam": Not really. In my job, I receive job-related emails from management that are distributed to a larger number of people, and they are obviously not spam; spam filters are not designed to remove these kinds of messages. A message related to Wiktionary's purpose posted in multiple instances on Wiktionary is not necessarily spam. The definition of spam is not as simple as some people think; I don't think I have a good comprehensive definition. Being sent to a larger number of people is a component of being spam, but that alone does not suffice. By the way, our welcome messages are much more spam-like than these requests for Babel, given how long they take to read. --Dan Polansky (talk) 10:47, 19 August 2017 (UTC)

Weird arrow next to uses of {{taxlink}}?

@Erutuon, DCDuring, Sgconlaw There is a weird arrow that sometimes appears next to the name of species and such that are formatted using {{taxlink}}. What's its purpose? It looks wrong, and is mentioned nowhere in the documentation. Can we get rid of it? For an example, see пога́нка (pogánka). Thanks! Benwing2 (talk) 20:44, 6 August 2017 (UTC)

I have categories to detect the conditions that cause them, which I consulted as soon as I saw "weird arrow" in the alerts. I found поганка in one of the categories and eliminated it. If they occur when you use taxlink, that means we already have an entry for the taxon involved and the template should be removed. Besides the situation of a new use of the templates there can be "many" entries that are affected by adding a new taxon or vernacular name. When I add either type of entry I try to eliminate any uses of the template in linked entries that would generate the "weird arrow". I will add something about this in the documentation for the two templates, though I don't expect it will be consulted, this being the first time it has come up, though I might be wrong. DCDuring (talk) 21:06, 6 August 2017 (UTC)
Also, I watch the category (as well as most other taxon-related categories) and would have detected the entry the next time I checked my watchlist. DCDuring (talk) 21:09, 6 August 2017 (UTC)
I remember seeing the "weird arrow" before. DCDuring, wouldn't it be sufficient for the template to place entries that require your intervention in the category, without the arrow also appearing? — SGconlaw (talk) 21:23, 6 August 2017 (UTC)
We already have such categories, which I aggressively police to keep empty.
The trouble is that it takes me quite a while to find the instances of redundant templates without using ctrl-f on the displayed text to find "=>". It is always at least a bit faster with the "=>". The problem is worst in entries with unusually large Hyponyms or Derived terms sections, with multiple L2 sections, with the use of {{taxlink}} or {{vern}} in the middle of definitions for polysemous terms or in unexpected locations.
If someone knew a way to display something in the entry that optionally only an anointed few (me included) could see, we could eliminate the need for anyone to consult and grasp the documentation to eliminate the offending "=>". DCDuring (talk) 21:40, 6 August 2017 (UTC)
No idea how to do that. Maybe it could be made more understandable by replacing it with some reduced-size text like "needs attention" (compare the "Invalid ISBN" warning generated by {{ISBN}}), but I don't know whether you think that would make the warning too prominent. — SGconlaw (talk) 21:49, 6 August 2017 (UTC)
The offending "=>" can be eliminated with CSS. If we enclose this symbol in a HTML tag with a unique class name (say class="taxlink-redundant"), and create a CSS style rule that vanishes it (display: none;), which can either be placed in the HTML tag or in MediaWiki:Common.css, then the symbol can be un-vanished at will. Putting the style rule in MediaWiki:Common.css requires the help of an admin. Let me know which option you would prefer and I can give further help. — Eru·tuon 21:56, 6 August 2017 (UTC)
@Erutuon's solution seems great. I'm an admin. I would just need to be instructed as to what to put where so that I could still see the "=>" (which has the advantage of being easy to type and rarely used except for this purpose). The name for the style could be something like "redundant template finding aid" or a comprehensible abbreviation of that. DCDuring (talk) 22:16, 6 August 2017 (UTC)
I guess its value as a recruitment tool for proper (non-redundant) use of {{taxlink}} and {{vern}} is not much of a consideration. DCDuring (talk) 22:18, 6 August 2017 (UTC)
Why shouldn't I be taking the approach of having some red text telling folks that they should remove the offending template? I think there is precedent for that. It might even be in continuing use. DCDuring (talk) 22:21, 6 August 2017 (UTC)

(edit conflict) Well, I suggested the class name taxlink-redundant, but you can choose a different one. Another idea: redundant-taxlink-mark? Whatever you choose, it should be made up of basic Latin letters and hyphens. Spaces will be misinterpreted. The code to add to MediaWiki:Common.css (the period . indicates that what follows is a class name):

.taxlink-redundant {
    display: none;
}

And the code to add to your Special:MyPage/common.css:

.taxlink-redundant {
    display: inline;
}

And then, in the template code for {{taxlink}}, replace <sup>=></sup> with <sup class="taxlink-redundant">=></sup>.

If you want to use a different class name, just replace taxlink-redundant in each of the three code snippets with whatever class name you choose. — Eru·tuon 22:33, 6 August 2017 (UTC)

If you want to keep the mark, how about changing it to a message with instructions that only displays in preview mode? For example, <sup class="error previewonly"><small>(Replace {{temp|taxlink}} with a regular link.)</small></sup>. Admittedly, that will be even more annoying than the little arrow thingy. — Eru·tuon 22:37, 6 August 2017 (UTC)

I was hoping to use the same class for both {{vern}} and {{taxlink}}. It might be useful for other similar applications though I don't know of any.
We have plenty of instances of the much more annoying technique used to enforce correct use of templates by displaying 80 or more characters of red text, sometimes with incomprehensible messages buried in them.
I will sleep on this before implementing and give others a chance to weigh in, but thanks for the implementation suggestion. It seems to fit the bill perfectly. I take it that CSS is not much more burdensome on server resources than HTML and doesn't raise the risk of latency problems like JS. DCDuring (talk) 23:15, 6 August 2017 (UTC)
@Erutuon: You had mentioned above that we could accomplish the optional display of default-hidden text if we "create a CSS style rule that vanishes it (display: none;), which can either be placed in the HTML tag or [] ". Where exactly would the HTML tag reside? DCDuring (talk) 18:47, 8 August 2017 (UTC)
The HTML tag that I mean is the <sup>=></sup> that appears in the template source code. — Eru·tuon 18:51, 8 August 2017 (UTC)
That seems like a better implementation, since evidently I am the only one using and virtually the only one aware of this. I could include a reference to the decloaking technique in the documentation for {{taxlink}} and {{vern}}. No adminship required either. DCDuring (talk) 20:01, 8 August 2017 (UTC)

Well, if you choose that option, you will have to use the following code in your common.css:

.taxlink-redundant {
    display: inline !important;
}

The !important makes the style rule overrule the CSS in the HTML tag; otherwise, the CSS in the tag will win out. — Eru·tuon 20:37, 8 August 2017 (UTC)
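For the inline-style option, where the hiding rule lives in the HTML tag itself rather than in MediaWiki:Common.css, the tag in the {{taxlink}} template source might look roughly like the sketch here. It is only an illustration: the class name taxlink-redundant is the one suggested above, and reusing the same tag in {{vern}} is an assumption based on the wish to share one class between both templates.

```html
<!-- Sketch for the {{taxlink}} template source (class name assumed, see above). -->
<!-- The inline style hides the arrow for every reader by default. -->
<sup class="taxlink-redundant" style="display: none;">=></sup>
```

Because a declaration marked !important in a user's common.css outranks an inline style that is not marked !important, the display: inline !important; rule makes the arrow visible only to users who add that rule, while everyone else sees nothing.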

That is what I will do. Thanks for the help. If I have problems, I will see you on your talk page. DCDuring (talk) 20:53, 8 August 2017 (UTC)

reading out "-"

In results, such as 2-0, how is the hyphen spelled out? I think such a pronunciation should be added to its entry --Backinstadiums (talk) 21:16, 6 August 2017 (UTC)

Isn't it silent in many cases? The score would just be read as "two nil". I suppose on occasion it would be read "two to nil". (Also, it should really be an en dash.) — SGconlaw (talk) 21:24, 6 August 2017 (UTC)
@Sgconlaw: The singer Tee Grizzley, in his song First Day Out, says "two and o" for 2-0 at min. 3:48 --Backinstadiums (talk) 22:10, 6 August 2017 (UTC)
I agree with SGconlaw - two nil. It can be different in broadcast results, if the home team loses it would be "Team xx nil, Team yy two". DonnanZ (talk) 23:31, 6 August 2017 (UTC)
If we're talking about sports scores and the like, no one would ever say "two nil" or "two to nil" in Canada. It would be one of the following (in rough order of frequency): "two nothing", "two to nothing", "two to zero", or possibly "two zero". Andrew Sheedy (talk) 18:40, 7 August 2017 (UTC)
I just realized this is irrelevant, as the discussion is about the hyphen, not the 0, but oh well. Andrew Sheedy (talk) 19:03, 7 August 2017 (UTC)
:-D — SGconlaw (talk) 03:46, 8 August 2017 (UTC)
It could be either read as "to" or as nothing. When counting game wins rather than the in-game score, it's often read as "and" (in the US at least), as in "My friend and I are five-and-three", although this is more often done for wins-vs-losses of a single party (this is the case in Backinstadiums's song reference above, even though those are trials and not literal "games"). --WikiTiki89 18:55, 7 August 2017 (UTC)

Missing category?

I can't find any category for Washington, D.C. or District of Columbia, only for the state of Washington. I guess there should be one, but what should the name be? DonnanZ (talk) 23:17, 6 August 2017 (UTC)

  • The main form is at "Washington, D.C." so I made a category for that. There are plenty of words that refer to the District. Good call. —Justin (koavf)TCM 23:29, 6 August 2017 (UTC)
Brilliant, thanks. DonnanZ (talk) 23:37, 6 August 2017 (UTC)

Share your thoughts on the draft strategy direction

At the beginning of this year, we initiated a broad discussion to form a strategic direction that will unite and inspire people across the entire movement. This direction will be the foundation on which we will build clear plans and set priorities. More than 80 communities and groups have discussed and given feedback on-wiki, in person, virtually, and through private surveys[strategy 1][strategy 2]. We researched readers and consulted more than 150 experts[strategy 3]. We looked at future trends that will affect our mission, and gathered feedback from partners and donors.

In July, a group of community volunteers and representatives from the strategy team took on the task of synthesizing this feedback into an early version of the strategic direction that the broader movement can review and discuss.

The first draft is ready. Please read, share, and discuss on the talk page. Based on your feedback, the drafting group will refine and finalize this direction through August.

SGrabarczuk (WMF) (talk) 16:11, 8 August 2017 (UTC)

Unsorted formations

I've seen Unsorted formations in descendant trees formatted as either * Unsorted formations or ; Unsorted formations. Is there a written guideline for this? --Victar (talk) 16:48, 8 August 2017 (UTC)

The standard practice is with *, so that it's listed on the same level as all other formations. —CodeCat 15:55, 9 August 2017 (UTC)
@CodeCat: Is that outlined in a guide or something somewhere? Like I said, I've seen both, so there doesn't seem to be a true "standard". @JohnC5? --Victar (talk) 20:47, 11 August 2017 (UTC)
@Victar: I've always used ;. On an unrelated note, Victar, please don't just start moving around entries (specifically the new entries) without discussing with anyone. I'm not convinced that was a good choice and may now have to revert all that. If the is not phonemic then it should not be included; if it is then it shouldn't be subscript. It is extremely frustrating that you just did this. —JohnC5 02:43, 12 August 2017 (UTC)
@JohnC5: I only moved two entries; not the end of the world. Also, very unrelated and should have been discussed elsewhere. --Victar (talk) 03:03, 12 August 2017 (UTC)

Quotations vs. Citations

I'd like to know the protocol for using them, as well as the differences they are meant to represent --Backinstadiums (talk) 14:43, 9 August 2017 (UTC)

I don't know if we have a standard for the terms, but I have been using the term citation to refer to sources providing evidence for information stated in entries, which are usually placed in "References" or "Further reading" sections. The {{cite}} and {{R:}} groups of templates may be used for this purpose. On the other hand, a quotation is an extract from a source that is provided as an example of the entry in use, and which is placed directly under a definition. The {{quote}} and {{RQ:}} groups of templates are used for them. For example, at merlion, there is one "citation" in the "References" section, and a number of "quotations" under the various definitions. However, note that entry pages have a tab called "Citations" which really contains quotations. — SGconlaw (talk) 15:07, 9 August 2017 (UTC)
I've been confused by this as well. Seems like they are used interchangeably. I've seen plenty of quotations from books that are over a hundred years old, providing no clue of how the entry is used today or how you could use it yourself. Personally I prefer examples. They don't come in a collapsed box, there is no question about proper citing due to copyright issues and they are designed to show how the entry is and can be used without clutter. Personally I put quotes and citations all the same on the citations page. W3ird N3rd (talk) 17:16, 9 August 2017 (UTC)
here are my unpopular opinions: quotations and citations are used interchangeably, I don't think there's a meaningful distinction. "Examples" are made up and may not reflect actual usage, there are no potential copyright issues with quoting from parts of works. The citations page is at best useless and at worst actively harmful and should not be used except to collect evidence for missing words or senses. DTLHS (talk) 17:26, 9 August 2017 (UTC)
If examples don't reflect actual usage they are likely to be bad examples. Copyright issues could arise if a quotation is too long or not properly attributed and laws for this possibly vary around the world. Quotations often are not reflecting actual (common) usage either, so I don't think that's a good reason to have them. W3ird N3rd (talk) 20:18, 9 August 2017 (UTC)
My understanding is that Wiktionary's servers are based in the USA, so it is primarily US law that must be complied with. It is unlikely that the quotations we use would violate copyright for two main reasons. First, all material published before 1923 is in the public domain in the USA and can be freely reproduced. Secondly, most of our quotations are obtained from works available on Google Books and the Internet Archive. If it is possible to view either a snippet or a full page preview of a book on Google Books, then the use of that portion of the book must be fair use under the law. Ergo, quoting an even shorter portion on Wiktionary must also be fair use. — SGconlaw (talk) 11:29, 10 August 2017 (UTC)
I wouldn't go so far as to say that availability on Google Books is indicative of anything, but the amount of text in the kind of quotes we use should fall under fair use. If the quote is too long for fair use, it's way too long for our purposes. Chuck Entz (talk) 14:06, 10 August 2017 (UTC)
In the books I've been reading lately, I've come across at least one to two dozen words in each one that we don't have entries for. Is it safe to take quotations for each of those words from the same book? How many quotations should I limit myself to to avoid violating fair use? Andrew Sheedy (talk) 17:47, 10 August 2017 (UTC)
@Andrew Sheedy First of all you should obviously check those words haven't been made up by the writer and they pass WT:CFI. There is no limit. For each word you should limit yourself to one or two quotes, there is no point anyway in having more quotations from the same work. As for quoting from the same work but on different page entries on Wiktionary, I would say that if the total amount quoted from the work is less than 5% of the entire work you have absolutely nothing to worry about. For a book that means there is no practical limit. For a poem a bit more would be allowed, some poems just might accidentally end up being entirely quoted here in small bits. As long as there's no obvious intention to violate copyright by overquoting a work, you'll be safe. W3ird N3rd (talk) 19:11, 10 August 2017 (UTC)
I'd completely forgotten about the 5% rule--thanks for reminding me. And don't worry, I always make sure to find citations for words before I add them (which is the main reason I haven't gotten around to adding more...). Andrew Sheedy (talk) 17:15, 11 August 2017 (UTC)
@Andrew Sheedy not sure if you are being sarcastic, genuinely grateful, referring to the 5% rule in general or if there really is a 5% rule for quotes/citations. It's just a number I picked, it could have also been 1%, 3%, 7%, etc. The point remains the same however, for fair use (which I think includes the right to quote for the U.S.) it would generally be a reasonable safe threshold. It could be exceeded in various cases, if I wrote a review for a poem that is twice the length of the original poem, there's a good chance I could cite the entire poem in small pieces. But under 5% for all quotes combined you simply don't have to worry about it - which is the majority of the time. W3ird N3rd (talk) 18:33, 11 August 2017 (UTC)
I thought it was an actual rule (i.e. you can legally reproduce 5% of a work). Maybe it is in Canada? I'll have to look that up. Andrew Sheedy (talk) 02:40, 12 August 2017 (UTC)
Yes, Wiktionary servers are in the U.S., but Wiktionary content might be reused by people in other countries without fair use. W3ird N3rd (talk) 19:11, 10 August 2017 (UTC)
Not that it is terribly relevant to the conversation, but Wikimedia servers are not all based in the US, nor should we expect that they will reside in the US exclusively in the future. - TheDaveRoss 12:53, 11 August 2017 (UTC)
Indeed irrelevant to the discussion at hand. I suspect servers outside the U.S. are caching servers, caches have different rules, but if someone wants to know more they should start a new discussion. W3ird N3rd (talk) 05:06, 12 August 2017 (UTC)
  • I think (but this is just my interpretation) that citations and the citation page are perhaps meant for long quotes. "I have a dream", "Yes we can" or "Build a wall" would be quotes. This would explain why quotes are allowed in the main dictionary space: copyright generally shouldn't apply to a quote. For example:
We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard
This is a quote and there's pretty much no chance this is copyrighted, similar to the moon not being copyrightable. However:
We choose to go to the moon in this decade and do the other things, not because they are easy, but because they are hard, because that goal will serve to organize and measure the best of our energies and skills, because that challenge is one that we are willing to accept, one we are unwilling to postpone, and one which we intend to win, and the others, too.
This is citing Kennedy and could well be copyrighted, so it requires proper attribution (what counts as proper depends on the country you are in) and/or must be allowed by fair use. W3ird N3rd (talk) 19:11, 10 August 2017 (UTC)
I doubt very much if the latter quotation is a breach of copyright. It is still only a small portion of the entire speech. It might be a different matter if we reproduced, say, a third or half of the speech, but that isn't what we do anyway. I agree with Chuck that the quotations we use here at the Wiktionary are unlikely to raise copyright issues. — SGconlaw (talk) 21:50, 10 August 2017 (UTC)
Fair use is a lot more permissive than the right to quote, but is specific to the United States. W3ird N3rd (talk) 03:25, 11 August 2017 (UTC)
For what it's worth, speeches made by government officials in their official capacity are in the public domain, so none of that speech is copyrighted in the US. - TheDaveRoss 12:58, 11 August 2017 (UTC)
That's true for this example, I should have mentioned that. Thanks. W3ird N3rd (talk) 05:06, 12 August 2017 (UTC)

Idiom[edit]

Do we have a page which lists all entries containing "Idiom" as a headword? If not, can we get one made? I guess we prefer Verb rather than Idiom for things like take an axe to. -WF

  • Before it was declared forbidden, I used the ====Idioms==== header for set expressions using a particular word, such as at 糞#Idioms. I see that some other JA entries have these expressions listed under ====Derived terms====, which doesn't seem quite right either, as these aren't "terms", sometimes comprising even full sentences.
What is the accepted header for these items now? ===Verb=== is not applicable for most of the Japanese expressions I can think of. ‑‑ Eiríkr Útlendi │Tala við mig 17:24, 9 August 2017 (UTC)
Please limit this to English-only. “Idiom” is the conventional translation for the Chinese part of speech of chengyu. Wyang (talk) 07:42, 10 August 2017 (UTC)
Are we going to have "Haiku" as a part of speech next? —CodeCat 09:45, 10 August 2017 (UTC)
Tangentiality. Are you OK? Wyang (talk) 09:51, 10 August 2017 (UTC)
That doesn't seem to be a part of speech coordinate with noun, verb, and adjective, but I don't know how it would be used. Are they not used as nouns, verbs, adjectives, or something else? It seems like using "Word coined by Shakespeare" as a part of speech header. Admittedly, we have "Proverb", which might be similar. — Eru·tuon 17:32, 10 August 2017 (UTC)
A proverb is generally nothing more than a sentence. —CodeCat 17:40, 10 August 2017 (UTC)
  • Great discussion, but what I wanted was a page listing all entries with {{head|en|idiom}} —This unsigned comment was added by WF back from hols (talkcontribs) at 15:26, 10 August 2017‎ (UTC).
    There's no way to get a single-page listing, but you can search the wikicode for insource:/\{\{head\|en\|idiom/. [edit: Actually, it is a single page, just because there are so few.] — Eru·tuon 20:32, 10 August 2017 (UTC)
    It's also possible to de-list "idioms" from Module:headword/data. Then it would no longer be recognised as a valid POS category and end up in Category:head tracking/unrecognized pos. —CodeCat 20:35, 10 August 2017 (UTC)
    Thanks Erutuon! That's exactly what I wanted. I've been making my way through those pages. A little cleanup done, and a few of them sent to RFx. --WF back from hols (talk) 23:40, 11 August 2017 (UTC)
    Without a plan on what to do after that, that would be unwise. There are 2,305 entries using {{zh-idiom}} (insource:/\{\{zh-idiom/) and 4,317 entries in the category for Chinese idioms, so that tracking category would become cluttered. And the cooperation of editors who handle Chinese would be needed to get the entries moved to the proper part of speech. — Eru·tuon 21:16, 10 August 2017 (UTC)
    I really hate the mentality that everything that is “improper” in English is assuredly improper in other languages by default, and needs to be “fixed”. Idiom is a perfectly fine part of speech in Chinese, and is in fact the most common translation for Chinese chengyu. Chinese lexicography treats these as a separate category of words, and there are numerous dictionaries compiled just for words belonging to this category. The comprehensive Chinese dictionaries typically do not mark entries by their part of speech, due to the analyticity of the language. In those monolingual and bilingual dictionaries that do, these words are either marked by 成 (cheng, idiom) (primarily in bilingual dictionaries) or unmarked (in Chinese–Chinese dictionaries), in juxtaposition to 名 (noun), 动 (verb), 形 (adjective), 副 (adverb), 惯 (phrase), 谚 (proverb), 歇 (xiehouyu), etc. Examples include the Contemporary Chinese Dictionary, the Comprehensive Standard Chinese Dictionary, the Oxford Chinese Dictionary, the Times New Chinese–English Dictionary and so on. The same idiom can be used as noun, verb, adjective, adverb, etc. depending on the context in the sentence, and their use is different from that of proverbs, phrases, and xiehouyu. Wyang (talk) 22:27, 10 August 2017 (UTC)
    Ahh. If they can be used as multiple other parts of speech, then I can see the lexicographic usefulness of keeping them as they are rather than trying to list all the other parts of speech they can be used as. However, it would be helpful to distinguish them somehow from the concept of idiom in English, which is quite different. The description in the category page Chinese idioms is probably not correct. — Eru·tuon 22:53, 10 August 2017 (UTC)
    I guess those entries should also contain {{lb|en|idiom}} so they show up in Category:English idioms if they are indeed an idiom. Since there are so few that shouldn't be a problem. Some recently disappeared already so this looks like it's getting phased out. W3ird N3rd (talk) 20:47, 10 August 2017 (UTC)
    But is idiom a context in which the word appears? If not, then it might be a misuse of the label template. —CodeCat 20:52, 10 August 2017 (UTC)
    I agree that the POS should be based on how they are used and not where they come from. Thus, I would say "proverbs" should really have the POS "clauses". --WikiTiki89 21:00, 10 August 2017 (UTC)
    Possibly, but in that case the English idiom category will have to be populated in some different way. insource:/\{\{lb\|en\|idiom\}\}/ gives 210 hits. W3ird N3rd (talk) 21:34, 10 August 2017 (UTC)

MW has a new feature to see dates of coinages[edit]

https://www.merriam-webster.com/words-by-first-known-date/1786 Justin (koavf)TCM 07:30, 10 August 2017 (UTC)

Very cool. But really, the dates are sense-specific, and hence word-specific only if the word is monosemic. Wyang (talk) 07:39, 10 August 2017 (UTC)
At first I thought you meant MediaWiki, and was worried they were up to another waste of human resources. --WikiTiki89 15:42, 10 August 2017 (UTC)

-градить and other "combining form"s[edit]

A bunch of Russian entries are appearing in Category:head tracking/unrecognized pos, because they use the POS category "verbal combining forms" which is not valid. They are also being categorised as verbs, which is even less correct because these forms don't actually exist. They are only found in compounds, and are thus comparable to creating cran for the first morpheme in cranberry, or liezen for the base verb of verliezen. Something should be done about these. They can't be moved to the reconstruction namespace, they are not reconstructions because they are not conjectured to exist; we know they don't exist. A valid POS should also be used so that they don't clog up cleanup categories anymore. —CodeCat 18:00, 10 August 2017 (UTC)

What's wrong with "Combining form"? Crom daba (talk) 18:54, 10 August 2017 (UTC)
Perhaps they should be recategorized as "combining forms" or have that category added. It is a recognized lemma type in Module:headword/data. I agree they don't really count as verbs in a sense. But I think you are against the idea of a combining form, because you've recategorized combining form entries that I've created. — Eru·tuon 19:03, 10 August 2017 (UTC)
A combining form is a non-lemma form that is used when combined with another morpheme. That's very different from this. —CodeCat 19:05, 10 August 2017 (UTC)
Why is it not a lemma? I see how you can say it's not a real word, but it is a lemma in that it is a form representative of a paradigm of related forms (i.e. the conjugated forms given in the conjugation table). --WikiTiki89 19:30, 10 August 2017 (UTC)
These are lemmas, I'm not disputing that. I'm saying that combining forms aren't lemmas. Most of the categories in Category:Combining forms by language contain nonlemmas, even though the categories themselves are categorised as lemmas. —CodeCat 20:15, 10 August 2017 (UTC)
Oh. Yeah, it looks like what we use "combining form" to mean is completely different from what these are. In fact I was actually in favor of removing the hyphen from these entry names. I think we need to come up with a special name for these. Something like "unused base verb". --WikiTiki89 20:20, 10 August 2017 (UTC)
There's also things like Judeo- which are a bit in between. They are of course combining forms of nouns in Ancient Greek, but in English they don't really belong to anything. Or do they? —CodeCat 20:28, 10 August 2017 (UTC)
I think that's a separate unrelated issue. **гради́ть (**gradítʹ) morphologically could have stood on its own if it existed, but it just so happens that it only exists with prefixes (although it's quite possible that it did exist in Proto-Slavic or earlier). Judeo- I would say is a combining form whose uncombined forms don't exist. --WikiTiki89 20:41, 10 August 2017 (UTC)

Wiktionary: a translation dictionary only?[edit]

Should we stop pretending to be a good monolingual dictionary, a goal for which the wiki way ("wisdom of crowds") seems ill-suited? Would we be better off playing to what seems to be our strength: translation? This would mean "translation target" would be an automatic justification for any English entry and would upgrade the importance of phrasebook entries and common collocations. It would probably benefit from simplification of complex polysemous entries like those for technical, let alone really polysemous terms. DCDuring (talk) 19:21, 11 August 2017 (UTC)

The project should probably be forked, to support the deletionist and never-delete-anything-ist camps. Equinox 19:29, 11 August 2017 (UTC)
A "good translation dictionary" necessarily describes complex polysemous English words, so no. DTLHS (talk) 23:12, 11 August 2017 (UTC)
How can we be a good translation dictionary now, then? DCDuring (talk) 04:23, 12 August 2017 (UTC)
I suspect this has been triggered (although probably not initiated) by the revert of my edit on "technical". Wiktionary:Requests_for_cleanup#technical W3ird N3rd (talk) 23:45, 11 August 2017 (UTC)
Don't take it too hard. I know that basic nouns, verbs and adjectives with multiple senses are hard and the most basic function words are harder yet. If cleaning up technical were easy, then I would have done it myself. I'm out of practice and never successfully tackled any basic function words. DCDuring (talk) 04:29, 12 August 2017 (UTC)
I think the solution, to appease both camps, is to actually allow the oft-discussed collocations section/namespace/whatever. This would allow us to be a far better translations dictionary because each collocation would have a translation section and we wouldn't have to resort to the controversial "translation target argument." The inclusionists would also be able to include far more, since much of what is hotly debated in RFD could be kept as a collocation. Deletionists could also be satisfied because there would be less pressure from inclusionists to keep SOP collocations in the mainspace. Andrew Sheedy (talk) 04:53, 12 August 2017 (UTC)
Wiktionary has included languages other than English for a long time, if not from day 1. So it's only right that Wiktionary should be translation-oriented. One inconsistency I have found regarding SoP terms is that there are entries for vegetable soup and pea soup, yet none for tomato soup, and there are bound to be translations for that. One nice touch I have just found is translations for soft-boiled egg and hard-boiled egg listed under boiled egg. DonnanZ (talk) 13:36, 12 August 2017 (UTC)
Inclusion of collocations is at most half of the solution. If, as User:DTLHS notes, "[a] 'good translation dictionary' necessarily describes complex polysemous English words", how do we improve our entries for such terms? Or is the current state of these entries good enough for translation work and for ESL learners, with native speakers mostly ignoring such entries anyway?
If we include more collocations, can we rely on the entries for collocations to share the burden of the definitions for verbs like go (go clubbing) and "particles" like abox and away?
Is it reasonable to admit that we can't really help those users who take a component-oriented approach to looking at sentences? Just as we say that users need help in determining where morphemes break in German and other compound nouns, we should also say that users can't be expected to know which meanings are only fully captured in collocations. Expecting users to wade through derived terms in go to find go clubbing does not seem very realistic. If a user knows to go to clubbing, that user probably doesn't need the go clubbing entry at all. DCDuring (talk) 16:22, 12 August 2017 (UTC)
I don't really object to making translation our focus. However, in order to be a truly comprehensive translation dictionary, I think we also have to be a comprehensive monolingual dictionary. And I don't think we're really doing as badly as you feel. Yes, we're a long way from being another OED, but we're also good enough that I'm able to use Wiktionary as my primary dictionary. Conversely, we actually suck at translations from English into other languages (even common ones like French and Spanish). I'm really not convinced it's our strength. FL to English translations tend to be much better, but even these are often lacking. The reality is that we're a work in progress on all fronts, and always will be.
Now, if we include collocations, I think we should handle them more or less as follows:
  1. Do not move definitions from main entries over to collocation entries (some duplication is fine, and people should still be able to find what they want in the main entry);
  2. Create separate entries for them, rather than hosting them in another mainspace or on the same page as any of their component words (we could potentially treat collocations like "forget about" differently from "piece of furniture", the latter having its own entry, the former sharing a page with "forget");
  3. Label them with a banner just as we do for phrasebook entries, to mark them as SOP and allow us to continue to function as a monolingual dictionary, regardless of our focus;
  4. Include full definitions in collocation entries, for clarity;
  5. Eliminate obvious information like pronunciation or etymology from collocation entries, but retain things like translations and synonyms;
  6. Link to collocations from the entries of each of their component words (excluding really basic words, like articles);
  7. Use either "Derived terms" or "Related terms" (possibly renamed) or a new "Collocations" section to host lists of collocations, subdividing the list into different categories as necessary;
  8. Allow collocations in all languages so that we can truly function as a translation dictionary: someone translating from French should be able to look up "pointe de pizza" or "se faire tuer" (or find these in the entries for pointe and pizza / faire, and tuer) and find the corresponding English collocations, "piece of pizza" and "get oneself killed".
I doubt we'll ever solve the problem of people taking a component-oriented approach to looking up multiword terms or collocations. But that doesn't mean we shouldn't try to be helpful to those who do know how to identify multiword expressions. The best we can do is list multiword terms and collocations in the entries for each of the constituent parts, and make long lists easier to navigate by splitting them up by category. Andrew Sheedy (talk) 17:46, 12 August 2017 (UTC)
I'd prefer it if we hosted collocations but they were not listed at constituent lemma pages and generally had close to zero incoming links to them. Crom daba (talk) 18:48, 12 August 2017 (UTC)
How would a person find them all then? Andrew Sheedy (talk) 21:50, 12 August 2017 (UTC)
  • One interesting aspect of mass inclusion of collocations as a new class of entry is that we would be substituting two boundaries that needed some kind of policing for one. Instead of a single include/exclude decision, we would need to decide whether to include or exclude and whether something was a collocation or an idiom. I am not confident that we would achieve any more agreement in total on these two decisions than we do now on one. Are we imagining that any collocation at all would be entered, subject to current RfV? live free or die? parlare con tono di condiscendenza? Wouldn't we be increasing the number of truly offensive items? Should we exclude full sentences that are not proverbs and not phrasebook entries? (More decisions!!!) DCDuring (talk) 19:31, 12 August 2017 (UTC)
Very true, although maybe we would be able to create a stricter set of criteria for determining whether something is SOP or not? I think a lot of RFD debates would be mostly solved if entries could be kept as collocations: those where terms are technically SOP, but not transparently so (sometimes because they use obscure senses of a word), and are not necessarily easily understood (e.g. "nature preserve"); those where an expression uses a more or less consistent word order and has become a fixed phrase; and those where the only justification for keeping an entry is its value as a translation target. I think most people could agree to keep such entries, but label them as collocations. Andrew Sheedy (talk) 21:50, 12 August 2017 (UTC)

IPA ≠ audio[edit]

Entries where the pronunciation is different from that in the audio, as for example in polemic, should be automatically detected and listed --Backinstadiums (talk) 20:25, 11 August 2017 (UTC)

How do you propose we do that? DTLHS (talk) 20:28, 11 August 2017 (UTC)
@DTLHS: Auto-generated subtitles could be created using some software and then compare both columns of data. 90% of the job would be done that way, the rest could be manually reported individually as @Wyang has proposed --Backinstadiums (talk) 07:12, 12 August 2017 (UTC)
You vastly overestimate the ease of generating written transcriptions, much less IPA, from audio files. Others can probably explain better why this is so difficult. See e.g. [4]. DTLHS (talk) 07:32, 12 August 2017 (UTC)
Not automatically, but perhaps via a more accessible feedback system: “Saw an error on the page? Report it here.” Wyang (talk) 21:52, 11 August 2017 (UTC)

Merging Category:Chinese language and Category:Sinitic languages to a single category (Category:Chinese language(s)?)[edit]

Sinitic languages is just another name for the Chinese languages. It is confusing to have both categories on Wiktionary. It seems there is room for improvement in the category system for macrolanguages; there are categories such as Category:Mandarin terms derived from Sinitic languages which really should be renamed to Category:Mandarin terms derived from other Chinese languages. Wyang (talk) 07:14, 12 August 2017 (UTC)

I agree that the current situation, in which we have two sets of categories for what is basically the same entity, is confusing. It would be hard to merge the two categories, however. x language is a category created by {{langcatboiler}} that uses data from Module:languages, while x languages is a category created by {{famcatboiler}} that uses data from Module:families. And currently only a language with a data file can have entries; a family cannot. I'm not sure how to merge the two in the existing system. And what code would we use for the combined entity? How can we make something be simultaneously a language and a family? — Eru·tuon 23:41, 12 August 2017 (UTC)

Language request: Old Kannada[edit]

Old Kannada (Kannada: ಹಳೆಗನ್ನಡ (haḷegannaḍa)) needs to be included. Proposed code: okn. It is a Dravidian language. Immediate ancestor: Proto-Tamil-Kannada. Scripts: Brahmi, Kadamba, Kannada. Descendants: Middle Kannada -> Modern Kannada. ɱɑɗɦɑѵ (talk) 07:28, 12 August 2017 (UTC)

That seems like a reasonable language to add. Can you give any examples or indication of how different it is from Kannada kn? (Other notes: Exceptional codes need to be formatted differently, so it would have to be dra-okn. We cannot add Kadamba script because it seems that it is not in Unicode. Proto-Tamil-Kannada is also not registered as a language.) —Μετάknowledgediscuss/deeds 07:33, 12 August 2017 (UTC)
@Metaknowledge: There's a significant difference between Old Kannada & its modern descendant. It's barely intelligible with modern Kannada. Some sound changes (like the transformation of Proto-Dravidian *p to Kannada [h]) are not present in Old Kannada. The case-suffixes are also different. As for the script, I hope it'll be acceptable to create lemmas in Old Kannada in the brahmi or the kannada script. -- ɱɑɗɦɑѵ (talk) 12:26, 12 August 2017 (UTC)
@माधवपंडित There seems to be a distinction between Old Kannada and Purva Halegannada. Should we encode them separately? DerekWinters (talk) 13:27, 12 August 2017 (UTC)
@DerekWinters: I saw that as well. About 500 years of time gap. I think Pūrva-Halegannada is what we'd call pre-Old Kannada or Proto-Kannada. But the source material is scant... as it is, Halegannada is poorly documented on the internet. If I'm not wrong, Proto-Kannada attestations are from just a few of the oldest inscriptions. Perhaps we can make Proto-Kannada an etymology-only language, used in etymologies but unable to have lemmas of its own. -- ɱɑɗɦɑѵ (talk) 13:35, 12 August 2017 (UTC)
It can be like Primitive Irish or Pictish, attested from very few sources. Personally I think it better to add it separately. DerekWinters (talk) 13:39, 12 August 2017 (UTC)

A quick update on changes of translation adder[edit]

I have updated the gadget to fetch language scripts from the module. Also, it fails (gracefully of course) if the input script is not in the list of scripts from module. So, you may notice some functionality changes. Let me know if the changes are for the worse. Dixtosa (talk) 19:21, 12 August 2017 (UTC)

Is it anything to do with the annoying little +- signs that have popped up in translations sections? DonnanZ (talk) 19:30, 12 August 2017 (UTC)
Those were always there, but the spacing is off now ([5]) (Chrome) DTLHS (talk) 19:37, 12 August 2017 (UTC)
Yes. Fixed. Dixtosa (talk) 20:29, 12 August 2017 (UTC)
Also added the ability to hide the transliteration input if the language has automatic transliteration that overrides manual.--Dixtosa (talk) 13:08, 20 August 2017 (UTC)

Distinction between derived and related terms[edit]

It's been a long while since I did any serious editing here. I've been updating some of the derived words sections. I noticed that the section for "language#Derived terms" looked very sparse, so I added some more entries. After doing so, I saw that many of them were already in the Related terms section.

Has policy changed lately? My understanding has always been that Derived terms is for words formed by appending affixes ("metalanguage") and compounds ("dead language", "language lab") and that Related terms was reserved for words that are etymologically related in some other way ("linguistics", "lingua franca").

I notice that the Wiktionary:Entry_layout page doesn't make this very clear and doesn't give any examples. Perhaps it could be updated?

In the meantime, I'll tidy up Derived terms and Related terms for language, but please revert if this is no longer the way things are done.

Paul G (talk) 19:38, 12 August 2017 (UTC)

Technically a derived term is also a related term. So sometimes there are lists of terms that people have just put all together under "related terms" without distinction. Your understanding matches mine and your edits to language look fine. DTLHS (talk) 19:43, 12 August 2017 (UTC)
That, too, is my understanding of the distinction between the two terms. — SGconlaw (talk) 20:06, 12 August 2017 (UTC)
Confusion can also be caused by placing some derived terms under hyponyms and others under derived terms or related terms. Personally I would like to see hyponyms done away with - I can hear the protests already. DonnanZ (talk) 20:38, 12 August 2017 (UTC)
Thanks for the responses. There seem to be a number of pages where derived terms are words formed with affixes and related terms are compounds — rock, for example — so some editors at least seem to have thought this is what the sections are for. — Paul G (talk) 20:47, 12 August 2017 (UTC)

IPA policy[edit]

The (phonemic) English pronunciation keys in most of the major dictionaries (as well as the associated Wikipedia article) use ⟨r⟩ as a standard phoneme. I feel that if this is the common usage it ought to be a standard policy across Wiktionary pronunciation sections. In many articles people have replaced ⟨r⟩ with ⟨ɹ⟩, ⟨ɚ⟩ et al., and while this is phonetically correct, it goes against the phonemic standard, and has created a disparate mess with little to no consistency. The best solution in my opinion is to just have both phonetic and phonemic pronunciations wherever possible, and make it a policy that ⟨r⟩ belongs in phonemic /slashes/ and ⟨ɹ⟩ belongs in phonetic [brackets], etc. This has the advantage of giving the maximum amount of information, while remaining in line with M-W, Collins, etc. Any input would be appreciated. --Pariah24 05:07, 13 August 2017 (UTC)

I wonder if it is possible to create {{en-IPA}} to standardise the generation of IPA for English (and represent the dialectal variation; cf. International Phonetic Alphabet chart for English dialects). Having manual IPA on all 480,000+ English lemmas would be a logistical nightmare. Wyang (talk) 05:17, 13 August 2017 (UTC)
We had a discussion about this many years ago. At first I was in favor of using /r/ in the phonemic representation of English, but eventually I came around to the idea of using /ɹ/, chiefly because we are not an English-only dictionary. If we were, if English Wiktionary had only English entries, I still would prefer /r/; but because we have entries in thousands of languages, including ones where /r/ really does stand for [r], I think it's ultimately less misleading to use /ɹ/ for English. —Aɴɢʀ (talk) 07:39, 13 August 2017 (UTC)
I agree with Angr. If we use /r/ for [ɹ], readers seeing /r/ in other languages might mistakenly believe that they represent the same, or similar sounds. We fill a different niche than other English dictionaries, and as a result, our policies might differ in some areas. Andrew Sheedy (talk) 17:14, 13 August 2017 (UTC)
I agree with Angr and Andrew. Since most English speakers pronounce ⟨r⟩ as [ɹ], it's appropriate to use /ɹ/. I'd second Wyang on creating an English IPA template. — justin(r)leung (t...) | c=› } 19:28, 13 August 2017 (UTC)
There's no need for a separate template- normalization can take place in the IPA module. DTLHS (talk) 19:29, 13 August 2017 (UTC)
How can one implement a module without a template, exactly? —Aryaman (मुझसे बात करो) 20:39, 13 August 2017 (UTC)
Huh? All IPA is already processed through Module:IPA. All we would need to do is implement specific rules for English. DTLHS (talk) 20:47, 13 August 2017 (UTC)
@DTLHS: I would assume we would make MOD:en-IPA and implement it in {{en-IPA}}, just like every other language with an IPA module. —Aryaman (मुझसे बात करो) 23:03, 14 August 2017 (UTC)
What about English dialects? How do we ensure the correct symbols are used and symbols are used in a consistent manner, for say, RP? Wyang (talk) 21:34, 13 August 2017 (UTC)
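As a rough illustration of DTLHS's point that English-specific normalization could live in the IPA module rather than in a separate template, here is a minimal sketch. It is written in Python, not Lua (the language Module:IPA is actually written in). The function name `normalize_english_phonemic` and the one-entry rule table are hypothetical. It assumes the only rule is rewriting ⟨r⟩ as ⟨ɹ⟩ inside slash-delimited phonemic transcriptions, and that phonetic [ ] transcriptions are left alone.

```python
# Hypothetical sketch, NOT Wiktionary's Module:IPA: rewrite <r> as <ɹ>
# in broad English phonemic transcriptions written between slashes.

# Assumed rule table; real English rules would need many more entries.
ENGLISH_PHONEMIC_RULES = {
    "r": "ɹ",  # English rhotic written /ɹ/, per the /ɹ/ convention above
}


def normalize_english_phonemic(transcription: str) -> str:
    """Apply symbol substitutions to a /.../ transcription only."""
    is_phonemic = (
        len(transcription) >= 2
        and transcription.startswith("/")
        and transcription.endswith("/")
    )
    if not is_phonemic:
        # Phonetic [...] transcriptions (or anything else) pass through.
        return transcription
    body = transcription[1:-1]
    # Per-character lookup: unknown symbols are copied unchanged.
    normalized = "".join(ENGLISH_PHONEMIC_RULES.get(ch, ch) for ch in body)
    return f"/{normalized}/"


print(normalize_english_phonemic("/rɛd/"))  # /ɹɛd/
print(normalize_english_phonemic("[rɛd]"))  # [rɛd]
```

A per-character lookup keeps the sketch simple. Rules that span several symbols would need ordered string replacements instead.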

──────────────────────────────────────────────────────────────────────────────────────────────────── I'm more concerned with just having a standard to go by than what that standard is. I guess I'll start changing /r/ when I see it, although I still think it would be helpful—especially on pronunciations that differ significantly from the standard phonemes—to have separate /phoneme/ and [phone] pronunciations. Pariah24 (talk) 23:41, 13 August 2017 (UTC)

What do you mean by "pronunciations that differ from the standard phonemes"? — Eru·tuon 23:55, 13 August 2017 (UTC)
Sorry if that's an awkward way to put it...I mean pronunciation differences in accents/dialects, and loanwords, and cases like pun and spun whose pronunciations are both /(s)pʌn/ but phonetically are [pʰʌn] and [spʌn]. It would be helpful to someone learning English to have both versions. Pariah24 (talk) 00:11, 14 August 2017 (UTC)
Ahh, I see. Phonetic transcriptions showing the exact pronunciation of stops are welcome. I think there are some transcriptions like that already. As for accents, keep in mind that many dialectal features are phonemic, because dialects do not all share the same phonological system, and so we show them in the phonemic transcriptions. You can see examples in Appendix:English pronunciation. Not shown on that page are the phonemic transcriptions for American English dialects without the horse–hoarse merger. (See hoarse for an example.) — Eru·tuon 00:38, 14 August 2017 (UTC)

Regarding whether to create a separate module and template for English IPA transcriptions: I think it would be much neater than adding a lot of English-specific stuff to Module:IPA. I say a lot, because I think it would be a good idea to automatically convert between different transcription systems for RP, if we could get someone who knows enough about them. For instance, automatically displaying both the OED's more old-fashioned transcription of lot, /lɒt/, and Geoff Lindsey's more modern one, /lɔt/. @Mr KEBAB proposed something like this, but I haven't done anything with the idea yet. — Eru·tuon 00:02, 14 August 2017 (UTC)

@Erutuon: Yes I did, but if we're going to use Lindsey system here we should fully follow it, not cherry-pick some of the symbols and not others (I'm saying this because I believe I proposed a mixed system a year ago, this is not a good idea for several reasons). Mr KEBAB (talk) 00:40, 14 August 2017 (UTC)

I think it would be very nice to have something similar to what we have for Ancient Greek and Latin, with different regional or period pronunciations all indicated. The template input for English would obviously have to be the broad phonemic transcription, though, rather than the spelling of the word. We'd have to be careful, however, of cases where pronunciation variants actually represent different phonemes rather than differences of realization; in such a case we'd probably need to call the template multiple times on the page, each time with a separate phonemic transcription and corresponding generated phonetic transcriptions for various dialects – there would be parameters for which dialects/variants to include or not include. – Krun (talk) 00:51, 14 August 2017 (UTC)

Hi,
Some input from outside. In French Wiktionary, we had a large discussion on pronunciation and neutrality two years ago, and we renewed our policy. We started by establishing that phonetic information has to be based on audio recordings, and that there have to be several recordings to describe variation (in space, time, and social groups). Phonemic information has to be based on a specific analysis, made of a specific dialect, and can't stand for a whole language. There is a diversity of phonemic representations. Being neutral in this respect does not mean selecting and promoting one analysis (which amounts to promoting one variety), but giving the different analyses, with sources. So: phonetics with audio sources, phonology with written sources (works of linguistics).
Finally, we considered the needs of readers, and concluded that they do not need dozens of phonetic and dozens of phonological representations. They want short information giving a usual way of pronouncing a word, consensual and as unmarked as possible, so we created a third way to indicate this specific information, with backslashes, like \θis\. This last one is provided in the first part of the page, and the other ones in the second part of the page, for people eager for more precise information. It was not a huge change, but it was a great improvement in the framework it offers for people to add new information without colliding with existing information. There are fewer controversies over "false phonological representations" and more accurate descriptions. If you want to know more about this, I can help you, or translate some pieces of French Wiktionary policy. Noé 10:16, 15 August 2017 (UTC)
I think Wiktionnaire has a good system. I find it interesting how the very broad pronunciation is included in the header, but I don't like how more detailed pronunciation is relegated to the bottom of the entry and often neglected. Having a very broad transcription with everything else in a collapsible box might be a good solution. On the other hand, it would be hard to decide what transcription to use when a word has been affected by a merger or split in many dialects. Andrew Sheedy (talk) 16:07, 18 August 2017 (UTC)

Words with uncertain reading

Recently the Egyptian entry jsqꜣrwnj (signs: i s q A rw n y T14 N25) was added alongside our previously existing entry jsqꜣrnj (signs: i s q A r n y T14 N25).

But these aren’t in fact two different attestations with two different spellings; they’re both representing a single attestation from the Merneptah Stele, where the original engraver inscribed a hieroglyph very poorly and modern authors have proposed two different readings of what it was intended to be. Do we have any policy about what to do in such a case — keep only the more plausible/widely accepted entry? Keep both? (And, if so, what would they be marked as? Alternative forms, even though they really aren’t?) — Vorziblix (talk · contribs) 10:01, 14 August 2017 (UTC)

Perhaps create an "alternative reading" template, and use it in the entry for the less widely accepted reading. Then list the less widely accepted reading in the Alternative forms section for the more widely accepted form. — Eru·tuon 16:30, 14 August 2017 (UTC)
Sounds good. For now I’ll just do {{form of|Alternative reading}} rather than an altogether new template, but if more of these start cropping up, so that categorizing them becomes useful, I’ll go for a separate template. — Vorziblix (talk · contribs) 23:18, 14 August 2017 (UTC)
Thanks, that clears things up a lot. — Vorziblix (talk · contribs) 23:18, 14 August 2017 (UTC)
Another example is ᚐᚆᚓᚆᚆᚈᚈᚋᚅᚅᚅ / ᚐᚆᚓᚆᚆᚈᚈᚐᚅᚐᚅ. In most cases, it's possible to be reasonably certain how to read an inscription, but when it's not (in individual cases), the practice does seem to be to have multiple (cross-linked) entries. Whether or not it is sensible for one of the entries to be a "form of" redirect to the other entry depends on whether the difference in reading entails a difference in meaning. - -sche (discuss) 06:30, 15 August 2017 (UTC)

Flag gadget edit request

Could an admin change the URL for the Ancient Greek flag in MediaWiki:Gadget-WiktCountryFlags.css from Flag_of_Palaeologus_Dynasty.svg to Byzantine_imperial_flag,_14th_century,_square.svg? The file has been moved, and there's been an error message in the browser console because the CSS file tries to load the file using the old name. — Eru·tuon 17:41, 14 August 2017 (UTC)

Done. Dixtosa (talk) 17:55, 14 August 2017 (UTC)

What's the deal with the garbage "American Sign Language" entries?

Is there an editing tool that produces these, maybe with ASL as the first in a list of languages? I don't think it's a single vandal producing all of them. DTLHS (talk) 20:41, 14 August 2017 (UTC)

@DTLHS: Do you have a link or diff? —Justin (koavf)TCM 21:09, 14 August 2017 (UTC)
People often create fully-formed ASL entries with all the usual headings, but with no actual content or definition. Yes, there is a tool that creates these, but I can no longer find it. I've seen it before. Equinox 21:11, 14 August 2017 (UTC)
  • It's the New Entry Creator; one of its defaults is ASL. —Μετάknowledgediscuss/deeds 21:35, 14 August 2017 (UTC)
    Actually, it's the second-from-the-top entry template, on the search results page, not the New Entry Creator. There should really be an AbuseFilter to take care of those. --Yair rand (talk) 01:21, 21 August 2017 (UTC)

@DTLHS: how would you improve them? --Backinstadiums (talk) 22:12, 14 August 2017 (UTC)

I don't think you understand. They're contentless entries that are deleted on sight. —Μετάknowledgediscuss/deeds 22:16, 14 August 2017 (UTC)

Adding language code 'ghc'

Hi all, I am thinking it might be useful to add the code 'ghc' for the historic common written language of Ireland and Scotland, particularly in cases where it's not clear whether a word derives from Irish or Scottish Gaelic. Gherkinmad (talk) 21:33, 14 August 2017 (UTC)

I don't know what lect you are referring to or when it was used. We have codes for Old Irish (sga) and Middle Irish (mga), and those should suffice. —Μετάknowledgediscuss/deeds 21:37, 14 August 2017 (UTC)
I can kinda see the point. While, technically, Scottish Gaelic can be seen to be differentiating itself from Irish as early as the Book of Deer, for pretty much the entire Middle Ages you can't really tell them apart. And everything after 1200 is currently classified as either ga or gd. So a Classical Gaelic could be seen as a useful intermediary step:
  • pgl Primitive Irish (–c.600)
    • sga Old Irish (c.600–c.900)
      • mga Middle Irish (c.900–c.1200)
        • ghc Classical Gaelic (c.1200–c.1800)
          • ga Modern Irish (c.1800–)
          • gd Scottish Gaelic (c.1800–)
That would require some refactoring, though. It would make etymologies slightly less messy: as it is, there appears to be an issue with taking a gd word back through ga to mga. This way, they could both branch from ghc. --Catsidhe (verba, facta) 21:55, 14 August 2017 (UTC)
Do we have a resolution? Gherkinmad (talk) 23:08, 14 August 2017 (UTC)
Resolution? We barely have the start of a discussion! Also, this sort of thing has been suggested before (by me at least once, IIRC) and not happened, so maybe a wider debate will have some impact. --Catsidhe (verba, facta) 23:26, 14 August 2017 (UTC)
(Without expressing an opinion on whether this is a good or bad idea,) it would be possible to add 'ghc' as an "etymology-only language" so that etymologies could refer to it, even if we don't want to add it as a "full language" with its own entries / language sections (which might duplicate many mga and ga entries?). - -sche (discuss) 06:46, 15 August 2017 (UTC)
  • @Angr is the expert, and he hasn't voiced a need for this as far as I've seen. But I'd like his thoughts. —Μετάknowledgediscuss/deeds 03:57, 15 August 2017 (UTC)
    For reference, the code was removed following this discussion in 2013. (I have no great knowledge of the subject and defer to people like Angr and Catsidhe who are familiar with the Irish language(s).) - -sche (discuss) 06:37, 15 August 2017 (UTC)
    My views haven't changed since that 2013 discussion. I think mga, ga, gd, and gv are sufficient to cover all Goidelic lects from the 10th century to today. The problem with making it an etymology-only language is that etymology-only languages are varieties of one particular existing language, but the whole motivation behind ghc is to avoid calling it either Irish or Scottish Gaelic (because it's basically both). —Aɴɢʀ (talk) 08:33, 15 August 2017 (UTC)
    For the purpose Catsidhe is talking about, it seems like it could be considered a variety of Middle Irish... but then, I don't see why branching Scottish Gaelic and Irish from ghc is any better than branching them both from mga, or why branching them from mga like we do now causes "an issue" — Catsidhe, can you explain? - -sche (discuss) 09:39, 15 August 2017 (UTC)
    No one considers Middle Irish going as late as 1800, though. Middle Irish is generally seen as ending around 1200 (much earlier than Middle English, for example), so we consider everything after that to belong to one of the modern languages, even though the literary language (as opposed to the colloquial language) is virtually identical in Ireland and Scotland until around 1800. —Aɴɢʀ (talk) 11:55, 15 August 2017 (UTC)
    Which is why I find the distinction between Early Modern Irish and Early Scottish Gaelic (to 1800) to be annoyingly artificial. There is nothing linguistic which distinguishes just about any given 14C Irish from 14C Scottish. The only way you can tell in most cases is by knowing beforehand where or by whom it was written.
    Also, having ga cover 800 years of history makes it tricky to use for both historical research and for current usage. Unless you're paying attention, it can be easy to miss that one word became moribund in the 16C, and another entered the language in the 1980s. The former case isn't going to help if you're writing a letter to someone in Gaoth Dobhair, the latter isn't going to help if you're doing Mediaeval research. --Catsidhe (verba, facta) 12:16, 15 August 2017 (UTC)
    Yes, the motivation is to avoid calling the language either Irish or Gaelic, because in the case of the English word Gael we simply don't know which variety it came from, and we might be a little more honest if we simply said so. The OED has the word first in modern English in 1775/1810 from Scottish Gaelic, but I completely accept that the word might have a longer history in the language, and so I was thinking we could meet Angr halfway by saying it derives from Classical Irish/Gaelic, or otherwise simply that it derives from Middle Irish. Gherkinmad (talk) 16:48, 15 August 2017 (UTC)
    @Angr OK, the matter has been more or less resolved. However, I would still advocate the ghc code for cases where there is a further intermediary stage; otherwise there are three words for Gael in modern Irish: Gaoidheal, Gaedheal and Gael, all covering the same time period. Any thoughts? Gherkinmad (talk) 16:03, 16 August 2017 (UTC)
    There will have to be three entries for modern Irish anyway, since the spellings Gaoidheal and Gaedheal were used up until the mid-20th century, long after ghc would be over. That's the main reason for my opposition to ghc: it would increase unnecessary redundancy. If we had it, we would have to have Gaoidheal in both ga and ghc instead of just ga; likewise we would have to have new ghc entries for common words whose spellings haven't changed, like fear, bean, mac, , , athair, máthair, and so on and so forth. It doesn't seem worth it to me to duplicate the effort. —Aɴɢʀ (talk) 16:15, 16 August 2017 (UTC)
    @Angr, @-sche Could we add ghc as an etymology-only language? Because as the matter stands, one would have to say that the modern Irish word is first attested in print in 1567 in Scotland, without further explanation. I just don't see a way to credit this properly without referring to a further intermediary stage of the language which is of course still Irish. Gherkinmad (talk) 16:53, 16 August 2017 (UTC)
    As was said before, etymology-only languages always have a parent language that they belong to. This parent is used, for example, in determining which section should be linked to. So which language does ghc belong to, Irish or Scottish Gaelic? It doesn't solve the problem at all, just moves it. —CodeCat 18:29, 16 August 2017 (UTC)
    It belongs to Middle Irish: Scottish Gaelic and (Late) Modern Irish could both branch from ghc whose parent language is mga. I'm sorry for pressing this so much, I just know you really have to make your case if you want to edit Wiktionary. Gherkinmad (talk) 19:28, 16 August 2017 (UTC)
    If ghc's parent language is mga, then this carries the implication that all ghc terms are mga terms. Every link to a ghc term in fact creates a link to a mga entry because of how the parent of an etymology language works. Since every link should have an entry behind it, it implies that any links in etymologies to ghc terms attested from 1200 to 1800 are implicit requests for Middle Irish entries to be created on those pages. So, you want us to create Middle Irish entries for terms attested as late as 1800? —CodeCat 20:05, 16 August 2017 (UTC)
    If that's what it would lead to, then no, I don't, though this does need further discussion, as it's clear not everyone accepts the current policy as it is. Gherkinmad (talk) 21:03, 16 August 2017 (UTC)
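To make CodeCat's point about parents concrete, here is a rough JavaScript illustration of how a link to an etymology-only language might resolve through its parent. Wiktionary's real implementation lives in Lua modules and differs in detail; the data below (including a ghc entry whose parent is mga) is hypothetical and only models the behaviour described in this thread.

```javascript
// Hypothetical data modelled on the discussion; not Wiktionary's actual tables.
const FULL_LANGUAGES = {
  mga: "Middle Irish",
  ga: "Irish",
  gd: "Scottish Gaelic",
};

// An etymology-only language has no sections of its own, only a parent.
const ETYMOLOGY_ONLY = {
  ghc: { name: "Classical Gaelic", parent: "mga" },
};

// Follow parents until we reach a full language, whose name
// determines which section a link points to.
function resolveLanguage(code) {
  let current = code;
  while (ETYMOLOGY_ONLY[current]) {
    current = ETYMOLOGY_ONLY[current].parent;
  }
  if (!FULL_LANGUAGES[current]) {
    throw new Error(`Unknown language code: ${code}`);
  }
  return current;
}

// Build the page#section target that a link would use.
function linkTarget(term, code) {
  const name = FULL_LANGUAGES[resolveLanguage(code)];
  return `${term}#${name.replace(/ /g, "_")}`;
}

console.log(linkTarget("Gaoidheal", "ghc")); // Gaoidheal#Middle_Irish
```

Under this model, any term tagged ghc, even one attested around 1800, would be linked to a Middle Irish section, which is exactly the implication CodeCat raises.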

Tagalog enclitic forms

In Tagalog, any word ending in "a, e, i, o, u, or n" has an enclitic form (sort of). For example, with the word "malaki" (big), to say "big person" one says "malaking tao", adding an "ng" at the end. And that goes for adjectives, nouns, verbs, all words. The question is, do we make an entry for the enclitic form of every Tagalog word that has one? --Mar vin kaiser (talk) 10:51, 15 August 2017 (UTC)

It sounds a bit like English -'s or Latin -que, i.e. a clitic that can be added to virtually anything. And we don't have entries for person's or virumque, so I'd say we shouldn't have an entry for malaking either, but just one for malaki and one for -ng. BTW, how do words ending in other sounds behave? —Aɴɢʀ (talk) 11:58, 15 August 2017 (UTC)
@Angr: Well, for example, with the word "maliit" (small), "small person" would be "maliit na tao". Actually, some see the word "malaking" as a contraction of the word "malaki" and the word "na", which links words together. One problem is that the word "taong", for example, could mean four things:
  1. "taong" - a black veil for mourning
  2. "taóng" - water container (we don't write diacritics to indicate stress in Tagalog, so both are under the same entry)
  3. "taong" - the word "tao" (person) + "na"
  4. "taóng" - the word "taón" (year) + "na"
So my point is, shouldn't the last two be in the entry "taong" also? --Mar vin kaiser (talk) 13:23, 15 August 2017 (UTC)
@Mar vin kaiser: Well, look at butcher's: it has several meanings of its own, but the transparent one of butcher + the clitic -'s isn't actually listed. —Aɴɢʀ (talk) 13:42, 15 August 2017 (UTC)
@Angr: Good point. Although the entry it's does have it. But I do see your point. --Mar vin kaiser (talk) 13:45, 15 August 2017 (UTC)
@Angr: The reason I feel it's important is that, for any two words that are beside each other, the first one has to be in its enclitic form, and think of the number of entries that consist of two words. For example, "free will" is "malayang loob", but there won't be any entry for "malayang", only "malaya". And that would go for all the other two-word entries. --Mar vin kaiser (talk) 13:59, 15 August 2017 (UTC)
@Mar vin kaiser: It's probably at it's because in the standard written language, the one thing it's isn't is it + the possessive -'s, but only it + the contracted verb -'s. As for the headword line, that's not a problem. At the entry for malayang loob, just add |head=[[malaya]][[-ng|ng]] [[loob]] to the headword template. —Aɴɢʀ (talk) 14:13, 15 August 2017 (UTC)
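The pattern described above can be sketched as a small rule: a modifier ending in a vowel takes -ng (malaki, malaking tao), one ending in -n takes just -g (taón, taóng), and anything else is followed by the separate linker na (maliit na tao). The JavaScript below is a simplified, hypothetical helper, not part of any Wiktionary module; it ignores exceptions and only shows why the attached forms are so regular.

```javascript
// Hypothetical helper illustrating the Tagalog linker described above.
// Simplified: real usage has exceptions this does not model.
function linkWords(modifier, head) {
  // Strip stress diacritics (e.g. "taón") so the final letter can be tested.
  const bare = modifier.normalize("NFD").replace(/[\u0300-\u036f]/g, "");

  if (/[aeiou]$/i.test(bare)) {
    return `${modifier}ng ${head}`; // e.g. malaki + tao -> malaking tao
  }
  if (/n$/i.test(bare)) {
    return `${modifier}g ${head}`; // e.g. taón -> taóng
  }
  return `${modifier} na ${head}`; // e.g. maliit + tao -> maliit na tao
}

console.log(linkWords("malaki", "tao")); // malaking tao
console.log(linkWords("maliit", "tao")); // maliit na tao
console.log(linkWords("malaya", "loob")); // malayang loob
```

Because the attached form is predictable from the base word, malayang loob can point at malaya and -ng in its headword, as in Angr's head= example, rather than needing a separate malayang entry.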

Checkusers

Versageek has been inactive for more than a year, so per the WMF policy her checkuser rights have been revoked. The policy requires that any local wiki with checkusers have two or more of them, so my rights have been suspended as well pending our electing another. We can opt not to bother having local checkusers and simply rely on the stewards to take care of requests, or we can nominate one or more new checkusers and have some elections.
From my perspective it is not strictly necessary to have local checkusers, but it is convenient. Almost all of the work these days is keeping track of and blocking the long-term pests, and making sure we are actually blocking Wonderfool when we think we are. - TheDaveRoss 12:59, 15 August 2017 (UTC)

Having local checkusers is definitely a good thing. I'm surprised WF hasn't made any votes to encheckuserify anyone. —Μετάknowledgediscuss/deeds 16:57, 15 August 2017 (UTC)
"Encheckuserify"? Beware, lest you affixiate. — Kleio (t · c) 20:02, 15 August 2017 (UTC)
I thought User:Chuck Entz was a checkuser, since he does a good job of keeping track of the IPs/locations of various vandals. He seems like a good candidate for the position. - -sche (discuss) 19:47, 15 August 2017 (UTC)
Oddly enough, I probably wouldn't have as much to say if I were a checkuser, since I understand there are fairly strict rules about what information obtained with the checkuser tools can be disclosed and when you can use them. Right now, I get pretty much all my information from geolocating just about every IP that does something out of the ordinary and looking for patterns (that and monitoring the abuse filter logs). I'm not sure what I would be allowed to say/do if I spotted an IP that had earlier turned up in a checkuser investigation (though I could probably block them). That said, I'm game, if everyone thinks it's a good idea. Chuck Entz (talk) 02:43, 16 August 2017 (UTC)
I actually think Chuck is a great candidate, but I was under the impression that we had an old (unwritten?) rule that no one user should have all the user rights at en.wikt simultaneously. —Μετάknowledgediscuss/deeds 04:19, 16 August 2017 (UTC)
You both bring up good reasons for pause. Who else wants the job? We could nominate WF; then he'd have to ID himself to the Foundation to get the flag... ;) lol - -sche (discuss) 05:10, 16 August 2017 (UTC)
I think Chuck is a great choice as well. Re "having all the rights", I don't see a problem there. Our 'crats have a fairly limited scope of responsibility, which doesn't much change how they might be able to (ab)use the CU tools. This is a different story from other wikis, which have roles such as ombudsmen, ArbCom, etc.
Re limiting your ability to act, I have not found that to be a problem. In the cases where an anonymous contributor is connected to a previously blocked logged-in account you may have to be somewhat oblique (e.g. not using the name of the blocked account, just saying that they are evading a block) but that is actually a fairly rare situation. - TheDaveRoss 14:56, 16 August 2017 (UTC)
Probably better to have some local ones. Equinox 19:49, 15 August 2017 (UTC)
Local is good, but what about Chuck's stated concerns? Where is it written that checkusers can't disclose publicly available info? Who can be asked about this? DCDuring (talk) 04:27, 16 August 2017 (UTC)
Perhaps we can create a new class of superuser: "Chuckuser". DCDuring (talk) 04:28, 16 August 2017 (UTC)
lol! - -sche (discuss) 05:10, 16 August 2017 (UTC)
@DCDuring: The policy dictating the use of the tool is here, and is also governed by the privacy policy and the access to nonpublic information policy. There are lots of words there, but essentially it is OK to talk about publicly available information, and it only gets tricky when your interpretation of public information is affected by nonpublic information. - TheDaveRoss 15:02, 16 August 2017 (UTC)
So Chuck's concerns are in maintaining a "Caesar's wife" standard, probably appropriate. DCDuring (talk) 22:04, 16 August 2017 (UTC)

For what it's worth, I have these user rights on Wikispecies, so I am already vetted by the WMF. I would be willing to have those tools here. —Justin (koavf)TCM 05:12, 16 August 2017 (UTC)

Metaknowledge started a vote for Koavf, and I made a comment on the discussion page there suggesting that we also vote on admin status at the same time. - TheDaveRoss 12:35, 21 August 2017 (UTC)

I would like to become a checkuser. --Daniel Carrero (talk) 07:13, 16 August 2017 (UTC)

DI CheckUser. PseudoSkull (talk) 22:09, 16 August 2017 (UTC)
@PseudoSkull, I don’t think we accept most of the specialized terminology and abbreviations used by Wikipedia/Wiktionary here, such as CheckUser, RfV, RfD, and so on, but we put them in the Wiktionary:Glossary. —Stephen (Talk) 22:27, 16 August 2017 (UTC)
If there are 3+ external citations, I would disagree. PseudoSkull (talk) 22:28, 16 August 2017 (UTC)
But let's discuss that elsewhere. Perhaps in WT:TR so that the discussion at hand can continue. PseudoSkull (talk) 22:28, 16 August 2017 (UTC)

Review of Ecjklangs (talkcontribs)' contributions

Most of these sex-related entries appear only in Urban Dictionary (OneLook backs me up on this), but some entries - such as sexcess - are somewhat citable. Not sure how durable they are though (I mean, floorcest anybody?). Anyone in the mood for a look-through? --Robbie SWE (talk) 08:47, 16 August 2017 (UTC)

Translations added by IvanScrooge98 (talkcontribs)

This erroneous edit by User:IvanScrooge98 in Recent Changes attracted my attention. A quick check of their recent additions of Chinese translations shows that he/she is certainly a non-speaker of Chinese. A large proportion of their added Chinese translations were outright incorrect, others often problematic. Some recent, outright erroneous examples include: diff, diff, diff, diff, diff, diff. It's a shame that such sloppiness was not picked up earlier and was allowed to persist for such a long time. Their additions of translations in other languages also need to be thoroughly checked. Wyang (talk) 10:23, 16 August 2017 (UTC)

Hmmm… excluding the first one (from zh.wiktionary), I based the other edits, as I usually do, on the respective Wikipedia articles. I'm sorry if there's something wrong and willing to fix my mistakes. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 14:32, 16 August 2017 (UTC)
You should be careful when using non-English Wikipedias as a source because they are full of made-up garbage. Every now and then I have to remove a Portuguese translation that you add because it doesn't meet our attestation criteria or is inaccurate. There's no harm in using Wikipedia as a starting point when researching translations, but you should at least check Google Books. — Ungoliant (falai) 14:48, 16 August 2017 (UTC)
Guess I should check more when I can. Sorry. [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 15:14, 16 August 2017 (UTC)
Most of the errors in Chinese translations are in your inferred Pinyin readings and traditional/simplified forms; these are more serious factual errors. Please see if you can fix the examples above, now knowing that they contain errors. Many of your added Chinese translations are sum-of-parts terms which do not warrant inclusion on Wiktionary, but that is a less serious problem. Wyang (talk) 00:59, 17 August 2017 (UTC)
@Wyang: is diff fine, for instance? [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 13:40, 17 August 2017 (UTC)
It's better, though both terms are SoP and should link to the individual components. Could you please also fix the six others? Wyang (talk) 21:54, 17 August 2017 (UTC)
@Wyang: would you mind checking whether my attempts are correct? Also, should I undo my additions at Warwick and Portoferraio? [ˌiˑvã̠n̪ˑˈs̪kr̺ud͡ʒʔˌn̺ovã̠n̪ˑˈt̪ɔ̟t̪ːo] (parla con me) 10:22, 18 August 2017 (UTC)
Not really, you haven't fixed the factual errors on those pages. It's all right- I will fix them. Please do not add other Chinese translations. Wyang (talk) 10:26, 18 August 2017 (UTC)

definitions vs. predicates

The entry for alt-right and the Tea Room/2017/August#alt-right discussion are recent manifestations of a failure to respect the concept of a definition. We do know how to do so, but sometimes some contributors act as if they believe that any predicate about a definiendum that they or someone else puts down in writing is a potential definition.

"Headquarters of US military imperialism" is not a definition of Pentagon, whether or not you believe the truth of "The Pentagon is the headquarters of US military imperialism".

What do we have to do to see to it that this basic notion of lexicography is respected? Would voting on a policy help? A definition style guide? DCDuring (talk) 22:02, 16 August 2017 (UTC)

Statistics for numbers of etymologies

For those of you who wanted to know, as of August 16, 2017, the largest number of etymologies any entry on the English Wiktionary has is 15. That entry is zꜣ. (Wouldn't it be amazing if you ever saw "Etymology 27", "Etymology 9432", etc. LOL ROFL LMAO) PseudoSkull (talk) 01:57, 17 August 2017 (UTC)

"Etymology 4320603, Etymology 4320604" PseudoSkull (talk) 01:58, 17 August 2017 (UTC)
You're ready for Wikidata. Noé 15:29, 17 August 2017 (UTC)
Quite impressive! — Eru·tuon 19:11, 17 August 2017 (UTC)

AWB Rights

I'd like to use AWB on Wiktionary, mainly to run typo fixes, and a regex I made to split long See Also/Related terms/etc sections into columns, and to be able to search inside templates, and to dump Wiktionary data offline for faster searching and whatnot. I already have rights on English Wikipedia. Pariah24 (talk) 19:08, 17 August 2017 (UTC)

I notice that no one has even nominated you for autopatroller status, which means that all your edits are marked for review. It seems silly to have people not trusting your edits enough to stop checking all of them, but at the same time giving you the ability to make them in bulk. Chuck Entz (talk) 02:36, 18 August 2017 (UTC)
DI AWB (wiki sense). PseudoSkull (talk) 04:44, 18 August 2017 (UTC)
I really don't care if people review my edits, and the AWB policy makes no mention of that as a prerequisite. I've been editing Wikipedia for quite a while longer than I've had this account, and this "we don't trust your edits" business sounds pretty anti-AGF to me. Never had an admin say something like that to me before; do you speak this way to everyone? A simple no would have sufficed. Pariah24 (talk) 08:53, 18 August 2017 (UTC)
I am sure Chuck was not intending to offend. We have the edit patrol feature enabled here, and the general practice is that once someone has been editing for a little while the people who patrol edits notice that they make reasonable edits and don't need to be patrolled any longer. If you are not yet set to autopatrolled status it may indicate that you have not edited here sufficiently (or, sometimes, sufficiently well) to have been noticed and flagged by a patroller. I would suggest that you just continue making good edits, and I am sure you will be autopatrolled and eligible for AWB in no time. - TheDaveRoss 12:21, 18 August 2017 (UTC)
Thank you. Somehow I managed to give the impression that we're so suspicious of them that we have them under surveillance, or that we think there's something wrong with their edits, or that we only talk to the cool people who already know the secret handshake... The simple fact is that AWB access requires that we know the contributor in question well enough to be sure they know Wiktionary's standards and practices well enough to avoid making mistakes, since those mistakes would be propagated much more widely using the AWB tool, and that we just don't know them well enough- yet. Chuck Entz (talk) 08:41, 20 August 2017 (UTC)

Tsolyáni language

Do we include words in this fictional language? I'm working my way through some missing French nouns and came across zaqé which our French friends define as "Troisième jour de la semaine dans le calendrier tsolyáni". SemperBlotto (talk) 05:48, 18 August 2017 (UTC)

No, see Wiktionary:Criteria_for_inclusion#Constructed_languages. DTLHS (talk) 05:50, 18 August 2017 (UTC)

Using HTML attributes instead of classes for WT:ACCEL

Currently, WT:ACCEL has data passed to it using CSS classes, so that the resulting Wikicode looks like this on bar: <span class="form-of lang-en plural-form-of"><b class="Latn" lang="en">[[bars]]</b></span>. There are a few points to note about this.

  1. There are two wrapping HTML elements, span and b, even though these could easily be combined into a single b element, as long as WT:ACCEL is modified to recognise not just span elements.
  2. If step 1 is done, then there is no more need for the lang-en CSS class, because WT:ACCEL can extract it directly from the b element's lang= attribute.
  3. HTML allows you to specify custom attributes named data- followed by any text. We can use this, rather than CSS classes, to specify the inflectional data.

All in all, the line above would end up looking like this: <b class="Latn" lang="en" data-accel-form="plural">[[bars]]</b>.
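A rough sketch of what the gadget would have to do under each scheme, in plain JavaScript. The class parsing below is an assumption about the general idea (splitting the class list and looking for lang- and -form-of entries), not a copy of the actual code in MediaWiki:Gadget-WiktAccFormCreation.js; the attribute version simply reads the proposed lang and data-accel-form attributes.

```javascript
// Current scheme (sketch): the data is packed into CSS classes, e.g.
// class="form-of lang-en plural-form-of".
function parseFromClasses(className) {
  const result = { lang: null, form: null };
  for (const cls of className.trim().split(/\s+/)) {
    if (cls.startsWith("lang-")) {
      result.lang = cls.slice("lang-".length);
    } else if (cls.endsWith("-form-of")) {
      result.form = cls.slice(0, -"-form-of".length);
    }
  }
  return result;
}

// Proposed scheme: one element carries everything in ordinary attributes,
// e.g. <b class="Latn" lang="en" data-accel-form="plural">.
// Works with anything that has getAttribute, such as a DOM element.
function parseFromAttributes(element) {
  return {
    lang: element.getAttribute("lang"),
    form: element.getAttribute("data-accel-form"),
  };
}

console.log(parseFromClasses("form-of lang-en plural-form-of"));
// in Node, logs { lang: 'en', form: 'plural' }
```

In a browser, element.dataset.accelForm returns the same value as getAttribute("data-accel-form"), so either spelling would work in the gadget.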

What do people think of this change? @Dixtosa in particular. —CodeCat 12:54, 19 August 2017 (UTC)

Looks a bit cleaner. Equinox 12:56, 19 August 2017 (UTC)
Looks cleaner, yes, but I do not see any other benefit... yet. --Dixtosa (talk) 13:21, 19 August 2017 (UTC)
I very much like this, even if there are no benefits besides the neatness. — Eru·tuon 23:46, 19 August 2017 (UTC)
@Dixtosa Can MediaWiki:Gadget-WiktAccFormCreation.js line 20 be modified to $('.form-of a.new').each(function(){, and line 23 to var formof_classnames = $(this).closest(".form-of")[0].className.split(' ');? This will allow elements other than span to contain the acceleration information, which facilitates step 1. —CodeCat 11:22, 20 August 2017 (UTC)
Step 1 has been completed, and the link on bar now looks like this: <b class="Latn form-of lang-en plural-form-of" lang="en">[[bars]]</b>. Step 2 can now be implemented. It might be as simple as putting something else in line 76 of MediaWiki:Gadget-WiktAccFormCreation.js, but I'm not sure what. The way the code is currently written, it only passes the classes to the function (the details parameter), not the wrapper element itself. Would replacing line 76 with lang: $(link).closest("[lang]").attr("lang"), be sufficient? Perhaps the code should be restructured so that the element itself is passed around instead of only the classes, but I will leave that to Dixtosa to implement. —CodeCat 18:34, 20 August 2017 (UTC)
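One possible shape for step 2 that does not rely on the lang- class, sketched with the standard DOM method Element.closest (the plain-DOM counterpart of the jQuery .closest() calls proposed above). The helper name and return shape are hypothetical; how the result is threaded through the gadget is left to Dixtosa's restructuring.

```javascript
// Hypothetical helper: given a red link inside an accelerated form-of
// element, collect what the gadget needs from the surrounding markup.
function getAccelDetails(link) {
  const wrapper = link.closest(".form-of");
  if (!wrapper) {
    return null; // not an accelerated link
  }
  // The language comes from the nearest ancestor with a lang attribute,
  // which after step 1 is the <b> element itself.
  const langHolder = link.closest("[lang]");
  return {
    element: wrapper, // pass the element itself, not only its classes
    classes: wrapper.className.split(/\s+/).filter(Boolean),
    lang: langHolder ? langHolder.getAttribute("lang") : null,
  };
}

// In the browser this could be run over every accelerated red link:
// document.querySelectorAll(".form-of a.new").forEach((link) => {
//   const details = getAccelDetails(link);
//   // ... build the new-entry wikitext from details ...
// });
```

Passing the element around, rather than a pre-split class list, would also make a later move to data- attributes a local change.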

Disambiguate WS entries by language

So wikisaurus:juoppo -> wikisaurus:fi:drunkard, wikisaurus:drunkard/Finnish or wikisaurus:drunkard/fi and

wikisaurus:insane -> wikisaurus:en:insane, wikisaurus:insane/en or wikisaurus:insane/English

Whether we use English or native words in the pagetitle, collisions would quickly happen as soon as someone added non-English words (which they may have refrained from out of uncertainty). I personally prefer the first scheme, because it is similar to what we use for topical categories like Category:de:Graph theory and because it does not imply the existence of useless superpages (parent page? root page? the opposite of a subpage). As for using native versus English words: the WS entry is tied to meaning seen as abstract from specific words, so I do not see why we should not use English. Are there any large synonym groups that cannot be succinctly expressed in English?__Gamren (talk) 13:23, 19 August 2017 (UTC)

I prefer Wikisaurus:English/drunkard, following the same scheme as Rhymes pages. —CodeCat 13:26, 19 August 2017 (UTC)
I prefer to keep the current Wikisaurus setup for its simplicity until it becomes obvious that collisions are an actual problem. --Dan Polansky (talk) 14:02, 19 August 2017 (UTC)
Here are some strings that might be expected to have many synonyms in more than one language: god (Danish/English), person, gut (Nynorsk/German), pen (Welsh/Norwegian/Mindiri/Mapudungun). Is it obvious yet?__Gamren (talk) 14:18, 19 August 2017 (UTC)
From what I have seen, collisions have not become an actual problem yet. Currently, we cater for collisions by being set up for multiple languages per Wikisaurus page, on the model of the mainspace. If you start expanding the Danish part of Wikisaurus and you run into obstacles preventing you from productively expanding that part, we can see how best to remove them. --Dan Polansky (talk) 14:27, 19 August 2017 (UTC)
For reference, one of the subject home pages: Wiktionary:Wikisaurus#Multilingualism. One past discussion: Wiktionary:Beer_parlour/2009/March#Wikisaurus_-_non-English_entries - here, a suggestion was made that would lead to wikisaurus:fi:juoppo. --Dan Polansky (talk) 14:39, 19 August 2017 (UTC)
I have now edited WS:god and created WS:da:beautiful (I would be fine with CodeCat's suggestion above, as well).__Gamren (talk) 18:25, 19 August 2017 (UTC)
I support having pages like Wikisaurus:English/drunkard, per CodeCat. It would be consistent with rhymes and reconstruction pages.
--Daniel Carrero (talk) 12:42, 21 August 2017 (UTC)

employment category?

Do we have a category for employment-related terms like job title, trade union, severance pay, etc.? This would be eminently useful IMO. ---> Tooironic (talk) 02:16, 20 August 2017 (UTC)

English names for letters of the Arabic language

Do we include these? Our page Arabic script has a table of them, but they link to the Arabic letters themselves. SemperBlotto (talk) 04:47, 20 August 2017 (UTC) (I've just added the French zhâl - hope it's OK)

Category:en:Arabic letter names DTLHS (talk) 04:48, 20 August 2017 (UTC)
So my French term seems to be wrong - I can't figure out how to correct it. SemperBlotto (talk) 04:51, 20 August 2017 (UTC)
What do you mean wrong? DTLHS (talk) 04:54, 20 August 2017 (UTC)
We're probably missing the English names of some letters if you're concerned that it's a red link. DTLHS (talk) 04:56, 20 August 2017 (UTC)
Also some of the entries currently in Category:en:Arabic letter names are not letters; they are Arabic diacritics. Wyang (talk) 04:57, 20 August 2017 (UTC)
OK, I'll leave it alone - totally outside my comfort zone. SemperBlotto (talk) 05:00, 20 August 2017 (UTC)

Edits by 217.76.10.22

This IP user has been adding "Ancient Armenian" (= Old Armenian) terms as etymons for Modern or Ancient Greek terms, while deleting the old etymologies, as well as some other things. The etymologies are dubious: for all of them, because the terms were attested before the time of Old Armenian, and doubly so for some, because they are phonologically implausible (առասպել (aṙaspel) supposedly yielding μῦθος) or the etymology actually goes the other way (ῥινόκερως was calqued by ռնգեղջիւր (ṙngełǰiwr)). Not sure what to do here besides revert the edits, which I've done. It would probably be better manners to explain to him or her, but I don't feel like it. — Eru·tuon 06:42, 20 August 2017 (UTC)

I reverted an edit by this same idiot (using a slightly different IP) that added "ancient Armenian" to the etymology for an Old Armenian term- they're even further out there than you give them credit for. They're changing their IP, so I doubt they'll read anything you leave on their talk page, but it never hurts to try, I guess. That said, feel free to revert them- as far as I'm concerned, they're only one step removed from the vandals who randomly replace language headers with the names of their own languages. Chuck Entz (talk) 08:13, 20 August 2017 (UTC)

break

Hi there. I'm taking a month off Wiktionary to concentrate on things IRL. You won't be hearing from me at all. So, in the unlikely case that you see someone here who you think might be me, who's following my edit patterns or whatever, it won't be. Thanks. --WF on Holiday (talk) 17:49, 20 August 2017 (UTC)