Template talk:ja-usex

How many characters can it handle?

See 海行かば.

I tested a very large usage example. How many characters can this template handle before it fails with the "did not match the kana" error (due to overflow)? Just wondering. --POKéTalker (talk) 05:33, 9 November 2017 (UTC)

@Poketalker: I don't know if I understand this properly, but I'll attempt an answer. The Lua pattern length limit (mw.ustring.maxPatternLength) is 10,000 bytes, so I guess the pattern used internally by the ruby function in Module:ja has to be smaller than or equal to that. The pattern won't be the same length as the text in the template, so who knows what the byte limit or character limit for input to the template are. (The patterns used for your usage example were "(.+)%%(.+) (.+)%%(.+)%%(.+) (.+) (.+)%%(.+)'''(.+)%%(.+) (.+)%%(.+) (.+)%%(.+) (.+)%%(.+)%%(.+) (.+)%%(.+) (.+)%%(.+)%%(.+) (.+) (.+)'''(.+) (.+)%%(.+)%%(.+) (.+)%%(.+)%%(.+)%%(.+)", 180 bytes long, and "(.+) の (.+)%%(.+)の(.+) を (.+)り (.+)へし (.+)'''(.+) (.+)かば (.+)く (.+) (.+) (.+)かば (.+)%%(.+)す (.+) (.+) の (.+) に こそ (.+)なめ かへり(.+) は せじ'''と (.+)%%(.+)て(.+)%%(.+) の さき の (.+)けば (.+)み", 245 bytes long, so they didn't come anywhere close to the limit. The man'yougana and modern spelling in the wikitext are 231 and 280 bytes.) There are other unpredictable factors, like whether Lua will run out of memory or processing time with longer text. — Eru·tuon 06:29, 9 November 2017 (UTC)
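To make the byte-versus-character distinction in the reply above concrete, here is a minimal sketch in Python (not the Lua that Module:ja actually runs). It assumes the pattern is UTF-8 encoded and takes the 10,000-byte figure from the comment above; the names `MAX_PATTERN_BYTES`, `utf8_len` and `fits_limit` are made up for illustration.

```python
# Assumption: the limit quoted in the comment above (mw.ustring.maxPatternLength).
MAX_PATTERN_BYTES = 10_000


def utf8_len(text: str) -> int:
    """Return the length of text in UTF-8 bytes, not in characters."""
    return len(text.encode("utf-8"))


def fits_limit(pattern: str, limit: int = MAX_PATTERN_BYTES) -> bool:
    """Check whether a pattern's byte length is within the limit."""
    return utf8_len(pattern) <= limit


# A short fragment in the style of the patterns quoted above.
sample = "(.+) の (.+)%%(.+)の(.+)"

print(len(sample))         # characters: 22
print(utf8_len(sample))    # bytes: 26, because each kana takes 3 bytes in UTF-8
print(fits_limit(sample))  # True
```

Each kana character counts once as a character but three times as bytes, which is why a character limit for the template input cannot be read directly off the byte limit for the pattern.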
@Erutuon: See [1]. I think this is what POKéTalker means. —suzukaze (tc) 07:51, 11 November 2017 (UTC)
@Suzukaze-c: That's strange. I don't understand it, because the pattern isn't over 10,000 bytes, and it looks like it should match. — Eru·tuon 08:41, 11 November 2017 (UTC)
Okay, my new theory is that it was because of characters or sequences of characters that appear more than once in the kana text. The "(.+)" elements in the pattern are greedy and they were grabbing too much text (that is, skipping from the first of the repeated sequences of characters to the second, or something), leaving not enough for the ones later on in the pattern. Hence the pattern didn't match. It's hard to verify this theory, because you can't monitor the matching function in the C or PHP code while it's trying to match and figure out where it fails; you just get the result, match or no match.
I changed these elements to non-greedy "(.-)", and now the problem seems to be fixed. — Eru·tuon 05:14, 14 November 2017 (UTC)
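The difference between greedy and non-greedy captures described above can be sketched in Python's `re` module, used here only as a stand-in for the Lua patterns in Module:ja. The assumptions: `(.*?)` plays the role of Lua's `(.-)`, the sample kana string is illustrative, and `split_on_marker` is a hypothetical helper, not anything in the module.

```python
import re


def split_on_marker(text: str, lazy: bool):
    """Split text around the marker かば using a greedy or a lazy first capture."""
    # "(.*?)" is the Python analogue of Lua's non-greedy "(.-)".
    first = "(.*?)" if lazy else "(.+)"
    match = re.fullmatch(first + "かば(.+)", text)
    return match.groups() if match else None


# The sequence かば appears twice in this sample.
kana = "うみ ゆかば みづく かばね"

# Greedy: the first capture runs on to the second かば.
print(split_on_marker(kana, lazy=False))  # ('うみ ゆかば みづく ', 'ね')

# Lazy: the first capture stops at the first かば.
print(split_on_marker(kana, lazy=True))   # ('うみ ゆ', ' みづく かばね')
```

In this sketch both forms still find a match, so it only shows the captures landing in different places when a sequence repeats; it does not reproduce the failure reported above, whose exact cause inside the matching code remains unverified.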
If I may, could I put the entire chōka here for testing? If not, how about the talk page of the link above, or Wikiquote? --POKéTalker (talk) 02:23, 17 November 2017 (UTC)