Category talk:Taos lemmas
Latest comment: 8 months ago by ExcarnateSojourner in topic RFM discussion: May 2017–November 2023
The following discussion has been moved from Wiktionary:Requests for moves, mergers and splits (permalink).
This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.
Entries in CAT:Taos lemmas with curly apostrophes
Many Taos entries use curly apostrophes to represent glottal stops. They should either use the easy-to-type straight apostrophe ' that many other languages use, or the apostrophe letter ʼ that Navajo and a few other languages use. - -sche (discuss) 21:36, 20 May 2017 (UTC)
- I agree. The headword template interprets the curly apostrophe as a punctuation mark (because it is), and automatically links words such as adùbi’íne as adùbi’íne. (Personally, I think the apostrophe letter looks better, but there may be other considerations.) — Eru·tuon 21:45, 20 May 2017 (UTC)
- Oh, and I just learned of the Unicode character ꞌ for the saltillo. But no entries use it, and I am averse to introducing yet another visually-almost-identical symbol to represent the glottal stop, next to the three (counting the curly apostrophe) mentioned above that are already in use, plus the ˀ that some entries use. - -sche (discuss) 02:23, 21 May 2017 (UTC)
- I'm in favor of standardizing on U+02BC MODIFIER LETTER APOSTROPHE for any language that uses an apostrophe-looking thing as a letter. —Aɴɢʀ (talk) 13:52, 21 May 2017 (UTC)
- Probably reasonable for glottalizationy apostrophes. At least Skolt Sami uses ʹ U+02B9 MODIFIER LETTER PRIME for suprasegmental palatalization though, which should likely be kept separate. --Tropylium (talk) 16:55, 21 May 2017 (UTC)
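Since the characters mentioned above are nearly indistinguishable on screen, one way to tell them apart is by code point and Unicode general category. The following is only an illustrative sketch using Python's standard `unicodedata` module; the helper names `describe` and `is_letter` are made up for this note, and the ˀ used in some entries is assumed to be U+02C0.

```python
import unicodedata

# Apostrophe-like characters named in the discussion, written as escapes
# so they cannot be confused with one another in this listing.
# (U+02C0 for the raised glottal stop is an assumption.)
CANDIDATES = ["\u0027", "\u2019", "\u02BC", "\uA78C", "\u02C0", "\u02B9"]

def describe(ch):
    """Return (code point, general category, Unicode name) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))

def is_letter(ch):
    """True if the general category starts with 'L', i.e. a letter rather than punctuation."""
    return unicodedata.category(ch).startswith("L")

for ch in CANDIDATES:
    print(*describe(ch))
```

The category is what matters for the linking behaviour described above: the straight and curly apostrophes are punctuation, whereas the modifier letter apostrophe and the saltillo are letters.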
- I've moved quite a few of these; about 140 remain to be moved. - -sche (discuss) 04:49, 24 July 2018 (UTC)
- @-sche Is there an easier way to tell which entries still need to be moved than opening each of them individually? I've tried using Ctrl+F on the category page, but apparently Mozilla thought it would be helpful for all these apostrophe-like characters to match each other. — excarnateSojourner (talk · contrib) 04:58, 8 September 2023 (UTC)
- @ExcarnateSojourner If you Ctrl+F and then select "match diacritics" it matches only the character you want. Another option: if you download AWB you can use it to pull the contents of a category and then to filter and keep only titles containing the curly quote, which yields this list. The AWB software enforces the rule that you can't actually change pages with AWB unless you're approved, but you can use it to search database dumps and filter categories and generate lists like that regardless of whether you're approved or not. I pulled that list from the "Taos lemmas" category; the few non-lemma forms we have seem to use the modifier letter apostrophe already, so unless there are forms which are neither categorized as lemmas nor as non-lemmas, that should be all the entries ... but note that there may be occurrences in translations tables, links in one Taos entry to another, etc, which also need changing. - -sche (discuss) 19:13, 8 September 2023 (UTC)
- Well that was silly of me not to see "Match Diacritics". Thanks. — excarnateSojourner (talk · contrib) 21:17, 8 September 2023 (UTC)
- Done: I (as ExcarnateSojournerBot) have used a Python script to replace curly apostrophes (U+2019) with modifier letter apostrophes (U+02BC) in the titles and text of all entries in cat:Taos lemmas, cat:Taos non-lemma forms, and the sole subcategory of the latter. — excarnateSojourner (talk · contrib) 23:45, 9 September 2023 (UTC)
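For readers curious what such a replacement might look like, here is a minimal sketch. It is not the bot's actual script, which is not reproduced in this discussion; the function names `needs_move`, `normalize` and `plan_moves` are hypothetical, and the list of titles is assumed to have been fetched from the categories by some other means.

```python
CURLY = "\u2019"     # RIGHT SINGLE QUOTATION MARK (the curly apostrophe)
MODIFIER = "\u02BC"  # MODIFIER LETTER APOSTROPHE

def needs_move(title):
    """True if a page title still contains a curly apostrophe."""
    return CURLY in title

def normalize(text):
    """Replace every curly apostrophe with the modifier letter apostrophe."""
    return text.replace(CURLY, MODIFIER)

def plan_moves(titles):
    """Return (old title, new title) pairs for the titles that need moving."""
    return [(t, normalize(t)) for t in titles if needs_move(t)]

# Example with the entry mentioned earlier in this discussion.
print(plan_moves(["adùbi\u2019íne", "river"]))
```

Note that applying `normalize` to an entire page body would also change any curly quotation marks in English prose on that page, so a real run would presumably restrict the replacement to Taos text.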
- Thanks. Are you able to run a script to clean up translations? E.g. river/translations and stream still have the form with the curly single quote mark. Plausibly there could also be mentions of Taos words in etymology sections. I think searching a database dump for instances of {{t|twf|...}}, or the equivalent with {{t+}}, {{t-check}}, {{t+check}}, {{tt}}, {{tt+}}, {{tt-check}}, {{tt+check}}, {{m}}, {{m-lite}}, {{l}}, and {{l-lite}}, or any etymology template ({{bor|FOO|twf|...}}, etc) where FOO is any language code (or any string of 2-11 a-z or - characters) and the ... is any number of characters other than | or }} that includes one or more curly apostrophes, would find relevant instances. If I could work out how to write that as a regex string, I could search the database dump myself with AWB and provide the list. - -sche (discuss) 06:24, 11 September 2023 (UTC)
- I have started working on this, but am not sure when I'll be done. — excarnateSojourner (talk · contrib) 06:17, 28 September 2023 (UTC)
- Done: I believe I have replaced every instance in a template which includes |twf| everywhere in mainspace. See my bot project for details. — excarnateSojourner (talk · contrib) 02:42, 26 November 2023 (UTC)
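The search described in the request above (a listed template, then |twf|, then a term containing a curly apostrophe) could be written as a regular expression roughly as follows. This is a sketch under stated assumptions, not the pattern the bot used: it is written for Python's `re` module rather than AWB, only `bor` is listed as an etymology template because it is the only one named in the discussion, and the term is simplified to exclude any `|`, `{` or `}` rather than just `|` and `}}`. The name `find_terms` is hypothetical.

```python
import re

CURLY = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# Link and translation templates listed in the request.
LINK_TEMPLATES = ["t", "t+", "t-check", "t+check", "tt", "tt+",
                  "tt-check", "tt+check", "m", "m-lite", "l", "l-lite"]
# Etymology templates take a destination language code before |twf|.
# Only "bor" is named in the discussion; extending this list is an assumption.
ETYM_TEMPLATES = ["bor"]

link_names = "|".join(re.escape(name) for name in LINK_TEMPLATES)
etym_names = "|".join(re.escape(name) for name in ETYM_TEMPLATES)

PATTERN = re.compile(
    r"\{\{(?:(?:" + link_names + r")"               # e.g. {{t or {{tt+check
    r"|(?:" + etym_names + r")\|[a-z-]{2,11})"      # e.g. {{bor|en
    r"\|twf\|"                                      # the Taos language code
    r"(?P<term>[^|{}]*" + CURLY + r"[^|{}]*)"       # term with a curly apostrophe
)

def find_terms(wikitext):
    """Return the Taos terms in wikitext that still contain a curly apostrophe."""
    return [match.group("term") for match in PATTERN.finditer(wikitext)]

# "x\u2019y" is a made-up placeholder, not a real Taos word.
sample = "{{t|twf|adùbi\u2019íne}} {{t+|fr|rivière}} {{bor|en|twf|x\u2019y}} {{m|twf|abc}}"
print(find_terms(sample))
```

A pattern like this only inspects the first parameter after |twf|, so terms given in later parameters (alternative display forms, for example) would need a separate check.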