Talk:can'idate

RFD discussion: September–October 2022


The following information passed a request for deletion.

This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.

~~can'idate~~

Equinox ◑ 07:08, 21 September 2022 (UTC)

(((Romanophile))) ♞ (contributions) 08:17, 21 September 2022 (UTC)

@Romanophile Was there meant to be a vote here? PseudoSkull (talk) 12:32, 22 September 2022 (UTC)

(((Romanophile))) ♞ (contributions) 22:06, 24 September 2022 (UTC)

Keep. We allow pronunciation spellings, and this one's cited, so what's the problem? Binarystep (talk) 10:07, 21 September 2022 (UTC)

The problem is that when you are deliberately writing down non-standard speech, such as hiccups or dialect, you may, oh fuck it, I now remember we also have ze as "how a mysterious French villain says the". Pearls and swine. Equinox ◑ 12:27, 21 September 2022 (UTC)

Keep. As above. Overlordnat1 (talk) 10:31, 21 September 2022 (UTC)

Delete, unhelpful. - TheDaveRoss 12:33, 21 September 2022 (UTC)

Delete, inane. Equinox ◑ 17:57, 21 September 2022 (UTC)

Does anyone have a writeup about how to deal with Category:English eye dialect? Which should be kept, which deleted and why? --Dan Polansky (talk) 18:23, 21 September 2022 (UTC)

I don't know if there is a writeup, but my opinion is that almost all eye dialect should be out-of-scope. There are likely rare terms which might merit inclusion if they are used widely, especially outside of eye dialect passages. These are not actually words, they are transformations of other words, like "leet" spellings, contractions, etc. - TheDaveRoss 12:25, 22 September 2022 (UTC)

I think the key is that they're also systematic (within a given pronunciation), while most contractions aren't. Unless we're going to add these systematically, I don't really think having a smattering of the few that happen to have been used 3 times is particularly useful, as there's no staying power to them. Theknightwho (talk) 14:08, 22 September 2022 (UTC)

I don't see how can'idate is in any way formulaically predictable. For example, would a pronunciation spelling form for predictable be predic'able or predic'ble or predict'ble or pred'table? Could multiple of these be valid? Why? What formula did we use to determine that? I honestly don't know how to answer that question without looking up book evidence for each form. I don't think leetspeak is quite the same; as far as I know, leetspeak has a specific formula for switching letters with numbers, very much like the r->w formula I talked about below with childish speak. And yes, I do think there should be a massive effort to add these forms. That would be great. Fill that category. We do it for a lot of other topics; entry-mining through some Twain would be fun. PseudoSkull (talk) 15:05, 22 September 2022 (UTC)

The formula is: if a sound is replaced with a different sound, replace instances of the sound with the other sound (e.g. r -> w, dragon to dwagon). If a sound is omitted, replace the sound with an apostrophe (e.g. can'idate). If a sound is added, add the sound (e.g. grrreat). Boom, eye dialect in three easy steps. In your example, predictable could be some of those, depending on the accent being represented, though I am not sure whether anyone would use any but the first. - TheDaveRoss 13:01, 23 September 2022 (UTC)
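
As a rough illustration, the three-step formula above can be sketched in Python. This is a minimal sketch that assumes letters can stand in for sounds (the formula as stated operates on sounds, and no phonetic library is used here); the function names substitute, omit and add are hypothetical.

```python
def substitute(word: str, old: str, new: str) -> str:
    """Step 1: a sound is replaced by another sound, so replace every instance of it."""
    return word.replace(old, new)


def omit(word: str, index: int, length: int = 1) -> str:
    """Step 2: a sound is omitted, so put an apostrophe where it was.

    The position is supplied by hand; this sketch does not decide which sound is dropped.
    """
    return word[:index] + "'" + word[index + length:]


def add(word: str, index: int, extra: str) -> str:
    """Step 3: a sound is added, so insert it at the given position."""
    return word[:index] + extra + word[index:]


if __name__ == "__main__":
    print(substitute("dragon", "r", "w"))  # dwagon
    print(omit("candidate", 3))            # can'idate
    print(add("great", 1, "rr"))           # grrreat
```

Steps 1 and 3 are mechanical once the sounds are chosen, while step 2 takes the position of the dropped sound as an input rather than deriving it.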

@TheDaveRoss I still fail to understand how that makes can'idate formulaic. In other words, could I write a script in fewer than 20 lines that parsed 10,000 English words the same way, including candidate, so that they all came out the same and were correct and (at least mostly) attestable? No. BUT I could do that for the dragon->dwagon case, because, as I have said and as the opposition seems to keep missing, that is a totally different thing from can'idate and a separate discussion entirely. That's literally just letter-swapping with a clear and obvious formula that can be expressed as written rules, and it doesn't require human thinking far beyond those rules to comprehend. What you're talking about with dialects is a lot more arbitrary, and arbitrariness of this nature requires human thought, based on experience or whatever the case may be, to determine which letter is dropped.

As you say, there are many dialects, but you left out that there are many representations of each of those dialects. In my experience these are most often representations of a dialect, and not necessarily accurate ones, because the writers of 19th- and early-20th-century books often didn't speak those dialects themselves, and since they're not proper forms anyway there's really no standardization of them (which is another point against them being formulaic). Take the misrepresentations of Germanic or Irish dialect or African American vernacular in books back in the day. Because of this I've seen (and I should mention that I am an active Wikisource contributor who has worked on multiple works with a lot of dialect in them; I can name some if you'd like) that different sources using the same supposed "dialect" very often don't even agree on how words in that dialect should be represented in writing.

So no, written representations of dialects are not at all straightforward, formulaic things; they're pretty complicated. A good rule of thumb: if no one can write a script in fewer than 20 lines to represent multiple entire dialects consistently and without failure, the words should probably be allowed as entries here, assuming proper attestation. Beyond that, they would be helpful to many users, and it just so happens that this deletionist rhetoric against the obscure but useful always ends up undermining the value of wiki projects. PseudoSkull (talk) 15:27, 23 September 2022 (UTC)

I would argue the opposite, excessive inclusionism undermines the value of the project; there is far more dross than gold out there, and the more dross which is included the harder it is to find the gold.

To your scripting point: I don't see how that is relevant. Just because it is hard to write a program around something doesn't mean it isn't formulaic; I gave you a simple algorithm. Also, I bet that such a script would not be all that difficult to implement, assuming you had a library that parsed words phonetically and could rigorously define what constitutes the accent or dialect you were trying to represent. At the end of the day there is a lot of license in how various authors choose to represent eye dialect terms, and there is no lexical value to the terms themselves, since they are merely alternative representations of other words. It would be like including Golf and GOLF rather than merely golf: there is a reason why the two former terms exist, but they are not meaningfully distinct from the final term lexically. - TheDaveRoss 15:40, 23 September 2022 (UTC)

I would argue the opposite, excessive inclusionism undermines the value of the project; there is far more dross than gold out there, and the more dross which is included the harder it is to find the gold.

How? The only way you'd encounter any of these pronunciation spellings is if you go out of your way to look for them. Including them on the site doesn't reduce functionality for anyone, but deleting them certainly would. Binarystep (talk) 04:31, 24 September 2022 (UTC)

That seems to presume that the primary or exclusive way Wiktionary data is, or will be, used is by people coming to Wiktionary.org and looking up a word they would like the definition of. That is certainly not the exclusive way the content is used, and my guess is that it is the minority use-case. Imagine that you maintain a third-party application which would like to leverage lexical content. Wiktionary has the advantage of being free and permissive; however, it is an incredible pain to try to use the data from Wiktionary because it is full of noise and only moderately well structured. I would like it to be easy for someone who is, say, making a word game app or an open-source spell checker to be able to make use of all of the work which has been put into this project, but including misspellings, pronunciation spellings and the like makes the content less and less useful for those types of uses. Increased surface area also requires increased upkeep and increased resources. While those are not high-priority concerns, they are still real. - TheDaveRoss 13:07, 26 September 2022 (UTC)

Realistically, that would need us to use a structured data model (which I support a gradual transition towards). Commons uses a mixed approach at the moment (traditional wiki + structured data), as an example of how it can be done. One advantage of that would be that it would be trivial for anyone who doesn’t want them to exclude these, so long as they are actually marked as misspellings or eye dialect. Theknightwho (talk) 14:53, 26 September 2022 (UTC)

Structured data would be great, but I have used Wiktionary data in its current form for outside projects, and I know there are lots of others doing the same thing. - TheDaveRoss 17:10, 26 September 2022 (UTC)

@PseudoSkull The formula is that a word-medial ⟨nd⟩ becomes ⟨n'⟩. Theknightwho (talk) 15:00, 26 September 2022 (UTC)
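
The word-medial ⟨nd⟩ → ⟨n'⟩ rule can be sketched as a regular-expression substitution. This is a minimal illustration that works on spelling only and assumes "word-medial" means "with a letter on both sides"; the names MEDIAL_ND and drop_medial_d are hypothetical, and whether any particular output is attested is a separate question.

```python
import re

# "nd" with a letter immediately before and after it, i.e. word-medial
# (lowercase words assumed).
MEDIAL_ND = re.compile(r"(?<=[a-z])nd(?=[a-z])")


def drop_medial_d(word: str) -> str:
    """Rewrite every word-medial <nd> as <n'>, leaving initial or final <nd> alone."""
    return MEDIAL_ND.sub("n'", word)


if __name__ == "__main__":
    for w in ["candidate", "window", "hand", "understand"]:
        print(w, "->", drop_medial_d(w))
```

Run over a word list, this reproduces can'idate but also yields forms such as un'erstand, so the rule by itself does not establish which outputs actually occur in print.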

We have Category:English leet, and a search of WT:VOTES and WT:BP finds support for leet; to keep this short, I won't post the search results. A vote to add explicit support for leet to CFI would be preferable.

Category:English pronunciation spellings is full of -in' spellings such as bakin'. These are kept per the consensus in Wiktionary:Beer_parlour/2008/March#-in'_forms. This diff shows more keep votes for an -in' form.

The -in' forms are perfectly predictable and numerous. If we want to delete predictable pronunciation spellings, we should start with the -in' forms and do so via a policy change. They could then be deleted in bulk without RFD: there are going to be hundreds of entries, per Category:English pronunciation spellings.

Keep per above. --Dan Polansky (talk) 16:03, 23 September 2022 (UTC)

I archived two RFDs to Talk:bein' and Talk:frontin'. --Dan Polansky (talk) 07:17, 24 September 2022 (UTC)

For tracking, this is per User:Dan Polansky/IA § Pronunciation spellings. --Dan Polansky (talk) 08:17, 9 October 2022 (UTC)

Keep, it's not inane at all to help someone understand the meaning of a word, which is what a dictionary is for. Someone, especially a non-native speaker, could have a difficult time discerning forms like this. There's no clear formula for dropping letters or groups of letters and adding apostrophes, like there was for dwagon (the r's are always replaced with w's in these representations of childish language), so it's not tantamount to SOP. If you don't like eye dialect/pronunciation spelling forms at Wiktionary, bring it to a CFI vote, which I'll happily oppose too. PseudoSkull (talk) 19:02, 21 September 2022 (UTC)

Keep. The current CFI policy does not provide any basis for excluding this term. The only alt forms that seem to be categorically disallowed are typos. Moreover, there are over 1700 entries in Category:English pronunciation spellings, and this one doesn't seem substantially worse than average. Deleting this would seem to imply that many/most of the others should go and that we should ban such words in future; I think a decision of that magnitude deserves a formal vote. All in all, I think these entries could be mildly useful or interesting, and I don't see much downside, so I'd support lax inclusion for (attested) pronunciation spellings. 70.172.194.25 04:45, 24 September 2022 (UTC)

RFD-kept: pretty much snowball by now. --Dan Polansky (talk) 09:52, 9 October 2022 (UTC)

As a late comment, I am also inclined to keep this. "Changing the spelling of a word to indicate pronunciation" is a spectrum, and on one end I don't think anyone would propose to delete things like bosun, and on the other end we've agreed to redirect things like baaaaaaaaad. Personally, I think this is on the includable end of the spectrum, because the apostrophe could be anything (here it's d, but in 'alf it's h, and in fo'c'sle it's r in one place and a in another), so someone who doesn't already know the word can't reconstruct the original spelling to look it up, whereas someone who sees bad can be expected to figure it out (notably, the possibilities are pretty limited: it either represents one a, or maybe two, the way you could draw soon out to sooooon). I also don't know that it's right to say it's formulaic: you say "The formula is: if a sound is replaced with a different sound, replace instances of the sound [...]", but in this very word, not all instances of the sound have been replaced. Basically, I see no benefit to excluding this, and marginal utility to including it. - -sche (discuss) 21:43, 9 October 2022 (UTC)
