Module talk:term etymology

From Wiktionary, the free dictionary
Latest comment: 7 years ago by Victar in topic More examples

@Erutuon, I'm trying to write a module to grab the etymology of a term, based on {{Module:descendants tree}}, but it doesn't seem to be working. Could you have a look at what I'm doing wrong? Much obliged. --Victar (talk) 18:24, 18 May 2017 (UTC)

I think you needed to use match rather than find: find returns the first index of the content as its first value, while match returns the content. I changed the regex as well because I didn't understand it. You can change it back if you like. — Eru·tuon 18:35, 18 May 2017 (UTC)
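The difference between the two functions can be sketched in plain Lua. This is only an illustration: the sample string and pattern are made up, and the standard string library stands in for mw.ustring, which is assumed to behave the same way on this ASCII input.

```lua
-- Hypothetical page content, invented for this illustration.
local content = "Etymology: from Latin pomum."

-- find returns the start and end indices of the match.
local s, e = string.find(content, "from [^.]+")

-- match returns the matched text itself.
local m = string.match(content, "from [^.]+")

print(s, e)  --> 12  27
print(m)     --> from Latin pomum
```

So code that expects the text but calls find ends up with a number instead.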
@Erutuon: Ah, thanks! That would have taken me forever to figure out. How can I get the rendered content instead of the raw code? --Victar (talk) 18:40, 18 May 2017 (UTC)
@Victar: I think you use the frame:preprocess(text) function. You'll have to use mw.getCurrentFrame to return the current frame object first. — Eru·tuon 18:44, 18 May 2017 (UTC)
@Erutuon: again, you just saved me at least 30 mins. =) --Victar (talk) 18:51, 18 May 2017 (UTC)
Christ, why is it so difficult to make the first character in a string lower case?! My PHP roots are failing me. --Victar (talk) 19:27, 18 May 2017 (UTC)
You can use mw.ustring.gsub to do this... though there's also a mw.language:lcfirst that I haven't used yet. — Eru·tuon 19:37, 18 May 2017 (UTC)
Weird. I guess Lua and I just don't see eye to eye yet. --Victar (talk) 20:01, 18 May 2017 (UTC)
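A minimal sketch of the gsub approach mentioned above, written against the plain string library so it runs outside Scribunto. The helper name lcfirst is hypothetical, and this version only handles ASCII letters; in a module, mw.ustring.gsub or mw.language:lcfirst would presumably be needed for non-ASCII first letters.

```lua
-- Hypothetical helper: lowercase the first character if it is an
-- ASCII uppercase letter. "^%u" anchors the pattern at the start.
local function lcfirst(s)
  -- gsub returns (result, count); the parentheses keep only the result.
  return (s:gsub("^%u", string.lower))
end

print(lcfirst("From Latin"))  --> from Latin
```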
@Erutuon, on another note, what do you think of the concept of nesting etymologies? --Victar (talk) 18:51, 18 May 2017 (UTC)
@Victar: I don't know. What do you mean? — Eru·tuon 18:56, 18 May 2017 (UTC)
@Erutuon: I'll put together an example and show you. --Victar (talk) 19:01, 18 May 2017 (UTC)
@Erutuon: Have a look at this, albeit dirty, example (ignore the template loop error). --Victar (talk) 20:01, 18 May 2017 (UTC)
@Victar: Ahh. It's kind of cool, but not sure what I think of it. (The template loop error can be fixed in the same way as in {{desctree}}.) It puts one etymology at the mercy of another, and it might not work when the etymology is oddly formatted or complicated. Some people might object to automatically adding a full derivational chain to every entry, no matter how it's done. Actually, one problem is that the first language code in the transcluded etymology is not changed, so apple will display as a Middle English word derived from Old English, an Old English word derived from Proto-Germanic, etc. That could be fixed, I suppose. One would also have to make it fix the types of derivation (inheritance, borrowing). — Eru·tuon 21:35, 18 May 2017 (UTC)
@Erutuon: Yeah, it's all sorts of buggy and has a long ways to go. I think though it's the direction things should move, if we want to cut down duplication and make Wiktionary accurate and consistent. It's really pretty godawful how so many parent and child entries have conflicting etymologies. I'm not following. Do you mean to say that you think we'll have looping issues from {{der}}, {{bor}}, etc as well? I can also see it possibly being integrated into such templates, as |etyl=1. --Victar (talk) 02:17, 19 May 2017 (UTC)
@Victar: No, it's not about template looping errors. It's about the language codes inside the templates and the etymological categories. If you transclude {{bor|fr|la|word}} from a French entry to an English one, then the English entry will have the category of "French word borrowed from Latin", rather than the correct category of "English word derived from Latin". — Eru·tuon 02:26, 19 May 2017 (UTC)
@Victar: That's because the categories aren't added unless you're in the main or Reconstruction namespace. — Eru·tuon 03:34, 19 May 2017 (UTC)
@Erutuon: I'm not seeing the problem, ex. User:Victar/duvet. I tried fixing the looping issue, but no luck. --Victar (talk) 03:27, 19 May 2017 (UTC)
@Erutuon: OH, OK, I understand now. --Victar (talk) 03:46, 19 May 2017 (UTC)
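The language-code problem described above could be handled roughly as follows. This is a sketch under assumptions, not what the module does: the function name retarget and the DERIVATION table are hypothetical, only {{inh}}, {{bor}} and {{der}} are considered, and the template name is left as it is (fixing the type of derivation is a separate step).

```lua
-- Templates whose first positional parameter is the entry's own language.
local DERIVATION = { inh = true, bor = true, der = true }

-- Hypothetical helper: replace the first language code of each
-- derivation template with the code of the entry doing the transcluding.
local function retarget(etymology, lang)
  return (etymology:gsub("{{(%l+)|[^|{}]+|", function(name)
    if DERIVATION[name] then
      return "{{" .. name .. "|" .. lang .. "|"
    end
    -- Returning nothing leaves other templates (e.g. {{m}}) unchanged.
  end))
end

print(retarget("From {{bor|fr|la|word}}.", "en"))
--> From {{bor|en|la|word}}.
```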
I thought etymology = mw.ustring.gsub(etymology, "{{termetyl|", "{{#invoke:User:Victar/term etymology/templates|show|") would fix the looping issue, but no such luck. --Victar (talk) 04:08, 19 May 2017 (UTC)
Me too. It worked for Module:descendants tree. I'll look at it at some point and see if I can figure out why it isn't working here. — Eru·tuon 04:50, 19 May 2017 (UTC)
Cool, thanks. If that gets fixed, I can present it to everyone and see what the hive can come up with. --Victar (talk) 04:59, 19 May 2017 (UTC)
@CodeCat, this module is based on your code. Any idea what's causing the loop? --Victar (talk) 04:56, 20 May 2017 (UTC)
@Erutuon: Doh! It's always the little things that drive you crazy! --Victar (talk) 01:30, 23 May 2017 (UTC)

borrowings

@Victar, unlike {{inh}}, {{bor}} can only be used once in an entry and only as the first etymological template. So this template should always convert all instances of {{bor}} to {{der}}. I just noticed you're doing a lot of work for something that isn't necessary. —JohnC5 02:00, 24 May 2017 (UTC)

@JohnC5: HAH, I suppose you're right... --Victar (talk) 02:15, 24 May 2017 (UTC)
@Victar: Yeeeah, I feel bad for not noticing earlier. —JohnC5 02:23, 24 May 2017 (UTC)
@JohnC5: No worries. --Victar (talk) 02:55, 24 May 2017 (UTC)
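The blanket conversion suggested above fits in one substitution. A minimal sketch in plain Lua, with a hypothetical function name; in the module the same pattern would presumably be passed to mw.ustring.gsub. Neither "{" nor "|" is a magic character in Lua patterns, so the template opener can be written literally.

```lua
-- Hypothetical helper: turn every {{bor|...}} in a transcluded
-- etymology into {{der|...}}, leaving all other templates alone.
local function demote_borrowings(etymology)
  return (etymology:gsub("{{bor|", "{{der|"))
end

print(demote_borrowings("From {{bor|fr|la|word}}."))
--> From {{der|fr|la|word}}.
```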

PIE root

@Victar, just an FYI, {{PIE root}} can be more complex than just two positional parameters. Also, you're intending to pull it through, not delete it, right? —JohnC5 00:07, 24 May 2017 (UTC)

@JohnC5 Hmm, I hadn't really thought it through, other than removing it from the loop, which it was disrupting. It would be smart to have it re-inserted though. Maybe at the end?
Could work. —JohnC5 00:17, 24 May 2017 (UTC)

More examples

@Victar, could I entreat you to make some more example pages, like pikake? KTHXBAI! —JohnC5 23:03, 24 May 2017 (UTC)

@JohnC5: sure. Did you want me to make them in Appendix, or just edit the actual entries? --Victar (talk) 23:42, 24 May 2017 (UTC)
@Victar: Appendix, I think. I think we need some more examples in order to convince people. —JohnC5 23:44, 24 May 2017 (UTC)
@JohnC5: Hmm, Appendix:pikake is getting placed into Category:Hawaiian_lemmas and Category:Hawaiian_nouns for some reason. --Victar (talk) 00:34, 25 May 2017 (UTC)
@Victar: You forgot to put a period at the end of the etymology for Appendix:pīkake. We should probably beef up the regex for detecting where the Etymology section ends.
@JohnC5: Ah! Yeah, a period or linebreak would be a good start.
@JohnC5, could you have a look at Appendix:pīkake again? I'm not sure why from is being repeated before {{com}}. --Victar (talk) 17:37, 25 May 2017 (UTC)
@Victar: ;P —JohnC5 19:14, 25 May 2017 (UTC)
@JohnC5: Doh! --Victar (talk) 19:21, 25 May 2017 (UTC)

@Erutuon you're good at Lua patterns. Can you convert my PHPified pattern to Lua? mw.ustring.match(content, "([^?!.]*).*", index + 1) I just need it to get the first sentence before period or first line. --Victar (talk) 01:55, 25 May 2017 (UTC)

@Victar: Well, that looks like a valid Lua pattern. Has it been failing? — Eru·tuon 02:00, 25 May 2017 (UTC)
Well I need it to stop at either the first period or the first line break. --Victar (talk) 02:09, 25 May 2017 (UTC)
Woop woop! Looks like @JohnC5 figured it out. Thanks man! --Victar (talk) 02:23, 25 May 2017 (UTC)
Yeah, I got it. I was trying ([^%.\n]*), but this was returning nothing. I realized that the first character after the header is actually a new line, so it was matching an empty string. But if we do ([^%.\n]+), forcing it to find a non-empty result, it matches the correct thing. —JohnC5 02:25, 25 May 2017 (UTC)
Looks good. (The escaping of the period seems to be unnecessary, as period apparently doesn't mean any character when it's inside set notation.) It might run into problems with the nonstandard language codes VL. and LL. for Vulgar and Late Latin. And I wonder if any languages have periods in their orthography. If so, I have a workaround in mind. — Eru·tuon 02:29, 25 May 2017 (UTC)
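The behaviour of the two quantifiers can be reproduced in plain Lua. The sample content below is invented, and index is assumed to be the position of the last character of the header, as in the call quoted earlier. The period is left unescaped inside the set, which works as noted above.

```lua
-- Hypothetical wikitext: the header is followed directly by a newline.
local content = "===Etymology===\nFrom Latin pomum. More text."
local _, index = content:find("===Etymology===", 1, true)

-- "*" may match zero characters, so it succeeds immediately on the
-- newline at index + 1 and captures an empty string.
local empty = content:match("([^.\n]*)", index + 1)

-- "+" needs at least one character, so the match starts on the next
-- line and runs up to the first period or line break.
local sentence = content:match("([^.\n]+)", index + 1)

print(#empty)    --> 0
print(sentence)  --> From Latin pomum
```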
I also realize this is breaking things with {{PIE root}} in them because they have a new line. I think we should try to catch the \n=== of the next section. —JohnC5 02:32, 25 May 2017 (UTC)
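One way to stop at the next section header rather than at the first line break is sketched below. The function name and sample text are hypothetical, it assumes a single plain ===Etymology=== header, and it uses plain-text find (fourth argument true) so that "=" and other characters need no escaping.

```lua
-- Hypothetical helper: return everything between the Etymology header
-- and the next header, or nil if there is no Etymology header.
local function etymology_section(content)
  local _, header_end = content:find("===Etymology===\n", 1, true)
  if not header_end then
    return nil
  end
  -- The next header of any level starts with "\n==".
  local next_header = content:find("\n==", header_end + 1, true)
  local stop = next_header and next_header - 1 or #content
  -- Trim trailing whitespace left before the next header.
  return (content:sub(header_end + 1, stop):gsub("%s+$", ""))
end

local page = "===Etymology===\n{{PIE root|en|root}}\nFrom Old English.\n\n===Noun===\napple"
print(etymology_section(page))
```

With this input the result keeps both lines of the section, including the {{PIE root}} line, and drops everything from ===Noun=== onward.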
We could always rescrape for it. --Victar (talk) 02:44, 25 May 2017 (UTC)
So yeah, I now have everything in the Etymology section (PS, we're gonna need to figure out how to deal with Etymology 1, Etymology 2, etc.). You need to determine a heuristic for picking out only the part of the etymology you want (i.e. first sentence or whatever). Maybe @Erutuon has ideas. —JohnC5 02:52, 25 May 2017 (UTC)
{{senseid}} is how it's "supposed to" work in {{desctree}}. --Victar (talk) 02:57, 25 May 2017 (UTC)
OK, now I'm fairly certain I've got the entire etymology section without the last period if it exists. We may want to figure out a way to trim this down to just the content we want. Also, I'm going to remove any ref tags, because they could be very annoying. —JohnC5 02:41, 25 May 2017 (UTC)
Good call on the <ref> tags! --Victar (talk) 02:44, 25 May 2017 (UTC)
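Removing the ref tags could look something like the sketch below. It is an assumption about the approach, not the module's code: the function name is hypothetical, and the frontier pattern %f[%W] is there so that <references/> is not mistaken for a ref tag.

```lua
-- Hypothetical helper: strip <ref> footnotes from raw wikitext.
local function strip_refs(text)
  -- Self-closing tags first (<ref name="a"/>); otherwise the paired
  -- pattern below could run from one of them to a later </ref>.
  text = text:gsub("<ref%f[%W][^>]*/>", "")
  -- Paired tags: ".-" is the shortest match, so each footnote is
  -- removed on its own.
  text = text:gsub("<ref%f[%W][^>]*>.-</ref>", "")
  return text
end

print(strip_refs('From Latin.<ref name="a"/> See<ref>Smith 2001</ref> also.'))
--> From Latin. See also.
```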