Wiktionary talk:About Czech

Basic entries

  1. I would place an empty line between {{cs-noun|g=m}} and # [[wind]] (''movement of air''). This is the way most Czech entries and most English entries are formatted.
  2. I would indicate that the most basic entry is one without declensions and conjugations, not one with them. I understand declensions and conjugations are useful. But they take a lot of additional work to create; I think it should be of high priority to have a correctly translated entry in the first place. --Daniel Polansky 11:21, 4 March 2008 (UTC)
Agreed and updated. ThomasWasHere 15:24, 4 March 2008 (UTC)

References

I think references are not part of the most elementary article. --Daniel Polansky 11:31, 4 March 2008 (UTC)

Agreed and updated. ThomasWasHere 15:24, 4 March 2008 (UTC)

Proprietary resources on Czech

I would avoid referring to proprietary resources on Czech from this policy page. --Daniel Polansky 11:33, 4 March 2008 (UTC)

Agreed and updated. ThomasWasHere 15:24, 4 March 2008 (UTC)

Relevance of references

I estimate that dictionaries are not even proper references. There is a policy saying that Wiktionary is a secondary source, for which the primary sources are the texts in which the terms occur. To use another dictionary, a secondary source, as a basis for Wiktionary is to make Wiktionary a tertiary source, which is not wanted. That is at least my understanding of this policy: Wiktionary:Wiktionary is a secondary source.

Based on this consideration, I would propose to drop references from the policy altogether. --Daniel Polansky 16:12, 4 March 2008 (UTC)

OK. Let's just drop this section; it's not a priority for Czech entries anyway. I have read what you pointed out, but I have also read Wiktionary:Entry_layout_explained#References, which links to Category:Reference_templates, where you can see a lot of public domain dictionaries. Maybe it will have to be reconsidered in the far future, or if a public domain Czech-English translation dictionary appears. Thanks for your reviews. ThomasWasHere 16:36, 4 March 2008 (UTC)
I see. Thanks for referring me to Wiktionary:Entry_layout_explained#References. I must have misunderstood something; it seems that referring to public domain dictionaries is wanted. --Daniel Polansky 18:41, 4 March 2008 (UTC)
A common-sense rule seems to be that dictionaries are weak references but still useful. See Wiktionary:Referencing dictionaries. ThomasWasHere 11:17, 5 March 2008 (UTC)

Adjectives - feminine and neuter gender

I am so far not convinced that bílý is a good pattern or model for adjectives. Above all, I would prefer to align the Czech policy for adjectives with multilingual policy for all the languages having gender. Unfortunately, I do not know of such a policy.

So far, I have avoided creating any other than masculine adjectives. And when entering masculine adjectives, I have avoided entering feminine and neuter forms.

So instead of:

==Czech==

===Adjective===
{{cs-adj}}, [[bílá]] {{f}}

# [[white]]

I would prefer

==Czech==

===Adjective===
{{cs-adj}}

# [[white]]

If you really want to have feminine and neuter forms in the policy, then I propose that you research the current practice in other languages.

The current proposal, even if accepted, would have to be extended to:

==Czech==

===Adjective===
{{cs-adj}}, [[bílá]] {{f}}, [[bílé]] {{n}}

# [[white]]

And it leaves other questions open: what should the entry for bílá look like? Should it also link to bílý and bílé?

Also, the minimal entry should IMHO not require the statement of feminine and neuter forms; the main point should be that there is a correct translation.

--Daniel Polansky 09:03, 5 March 2008 (UTC)

Yes, it was not very clever of me to use this example : /. I have replaced the example with malý. The masculine singular nominative form should be the only entry, as for nouns (gender apart), but I propose adding the comparative and superlative forms to the template, as is done in English for small. See also malý in the Czech version and malý in the German version (full table). It will later need a declension template, {{cs-decl-adj}}. ThomasWasHere 10:17, 5 March 2008 (UTC)
Thanks. I agree that adding a comparative and superlative to {{cs-adj}} would be valuable, modelled on {{en-adj}}. Still, to provide for automatic formation of comparative and superlative would be a bit more tricky, which is why I have avoided the task so far.
A declension template is still missing; right. --Daniel Polansky 10:42, 5 March 2008 (UTC)

Multiple gender

From the page text it seems that no decision has been made yet on how to handle adjectives with multiple genders, like "noční". At Wiktionary:Index to templates the template {{c}} is listed, standing for "common" - why not use that one? (Or does "common gender" mean something else?) Duncan MacCall 18:41, 15 September 2008 (UTC)

AFAIK common stands for a merging of the masculine and feminine genders in certain languages, such as Swedish, judging from W:Grammatical gender, and specifically W:Grammatical_gender#Common_and_neuter. So common and the template {{c}} are better avoided in Czech entries. --Dan Polansky 08:07, 18 March 2009 (UTC)

Proverbs

I have created a dedicated section for proverbs, as a more detailed policy is probably needed. It is unclear--to me anyway--where to add literal translations, and what to do in case of not finding a semantically equivalent English proverb. --Daniel Polansky 10:47, 5 March 2008 (UTC)

I haven't found any policy for proverbs in any language, but most entries give the English equivalent, if it exists, and the literal translation. Here are some ideas:
  1. If there is no equivalent in English, we should provide a link to the Czech Wiktionary. Unfortunately there is no proverbs category in the Czech Wiktionary. Maybe we can find a public domain repository of Czech proverbs explained in English to link to?
  2. As for the literal translation, even though it is useful, I would rather just wikify the words of the proverb, because translating the proverb is redundant. The only exception is when there is no English equivalent of the proverb. Yet, there is still an exception = ) if you use the Etymology header, where you can give the etymology of the Czech proverb. ThomasWasHere 12:02, 5 March 2008 (UTC)
It seems that providing an explanation of the idiomatic meaning of the proverb is what is wanted, judging from à bon chat, bon rat and falsum in uno, falsum in omnibus, and also from Connel MacKenzie's comment in his RFC in my tak dlouho se chodí se džbánem pro vodu, až se ucho utrhne. Explanations of idiomatic meanings are what is found in English proverb entries. I do not think that providing a link to Czech Wiktionary solves the problem of information missing in English Wiktionary.
There is Q:Czech_proverbs, explaining some Czech proverbs. I added a link to it to Category:Czech proverbs some time ago.
I do not see that it is redundant to translate the proverb. The literal translation is still an interesting piece of information, isn't it?
I still do not know how to format the literal translations. --Daniel Polansky 12:34, 5 March 2008 (UTC)
Yes, providing an explanation of the idiomatic meaning of the proverb is useful. However, if there is an equivalent proverb in English, a link to it can be enough. I missed the quotation page; it's a valuable resource, thanks. As for the literal translation, it is useful too, but if you take the principle of Wiktionary to the extreme, you can say that the user should refer to the entry of each word or group of words in the proverb, so wikilinking the proverb could be enough. As for the formatting of the literal translation, the nicest form I have seen is the one used for Japanese proverbs, which add it after the translation with Lit. in front. I prefer this to adding the literal translation in brackets after the entry header. Below is a prototype you can copy and paste into tak dlouho se chodí se džbánem pro vodu, až se ucho utrhne to see how it looks. Also, it was a good idea to put the lemma of each word in the wikilinks. Damn, it's not easy to format this = ) ThomasWasHere 14:41, 5 March 2008 (UTC)
==Czech==

===Proverb===
{{cs-prov|sg=[[tak|Tak]] [[dlouho]] [[se]] [[chodit|chodí]] [[se]]
[[džbán|džbánem]] [[pro]] [[voda|vodu]] [[až]] [[se]] [[ucho]] [[utrhnout|utrhne]].}}

# '''No English equivalent'''. ''Literally'', so long does one walk with a jug for water, until one day the handle breaks off.
#: Explanation.

From what I now understand, the literal translation should not be after "#". And there is no need to state "No English equivalent" explicitly; that is obvious from not providing a link to that equivalent. The only issue open right now is where to put the literal translation; what belongs after "#" is either a link to the English equivalent, if there is one, or an explanation of the meaning of the proverb.

I do not know which "principle of Wiktionary" you are referring to. I assume that you simply mean what you say after invoking it, namely that you can click on the individual words. But clicking on some ten words is in some cases a lot of work; forming a translation from these is yet another task, not necessarily an easy one for a non-native speaker of Czech. --Daniel Polansky 17:21, 5 March 2008 (UTC)

Sorry for "the principle of Wiktionary"; I must have been tired : ). I have looked at every language's About page and the talk page that goes with it and found nothing on proverbs. The same goes for a search for "proverb" in all talk pages. I am waiting for Connel's answer too.

Summary

To summarize:
  1. Wikilinks on the words or groups of words of the Czech proverb: yes, in lemma form
  2. Translation with the equivalent proverb in English: with a # after the entry; if there is none, just omit it
  3. Literal translation of the Czech proverb: either on the same line as the entry, at the end, in brackets, using the tr= parameter of the template {{infl}} (though that parameter should only be used for transliteration), or at the end of the definition, in brackets?
  4. Explanation of the idiomatic meaning of the Czech proverb: after the translation with #:, in italics?
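
Point 1 of the summary (wikilinks in lemma form) can be sketched in code. The following Python snippet is only an illustration, not an existing Wiktionary tool: the function names link_word and link_proverb are hypothetical, and the lemmas have to be supplied by hand, as in the prototype above.

```python
def link_word(form, lemma=None):
    """Return a wikilink for `form`, piped to `lemma` when the two differ."""
    if lemma is None or lemma == form:
        return f"[[{form}]]"
    return f"[[{lemma}|{form}]]"


def link_proverb(words):
    """Join (form, lemma) pairs into one wikilinked line."""
    return " ".join(link_word(form, lemma) for form, lemma in words)


# First words of the proverb from the prototype; a lemma of None means
# that the form already is the lemma.
words = [
    ("tak", None), ("dlouho", None), ("se", None), ("chodí", "chodit"),
    ("se", None), ("džbánem", "džbán"), ("pro", None), ("vodu", "voda"),
]
linked = link_proverb(words)
print(linked)
```

Each inflected form stays visible in the proverb while the link points to the entry of its lemma, which is the behaviour agreed on in point 1.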

Formatting of literal translations

The formatting of literal translations in Wiktionary is currently inconsistent, as the following entries show:

Entry | Note
a todo cerdo le llega su san Martín | Entry line.
adar o'r unlliw hedant i'r unlle | Entry line.
man kan inte lära gamla hundar sitta | Entry line.
betri er krókur en kelda | Etymology section.
Дурак дурака видит издалека | Definition line.
船頭多くして船山に登る | Definition line.
Bindfäden regnen | Definition line.

Connel MacKenzie mentioned Category:zh-cn:Proverbs as a good model. In there, the literal translations, when present, are found in etymology section. I find it a bit strange though, as literal translations do not indicate the origin of the term or how the term came about. Sticking to the convention of putting the literal translations into the etymology section may be an okay temporary solution. --Daniel Polansky 10:42, 8 March 2008 (UTC)

I am not convinced by using the Etymology section either. This is definitely a question to ask in the Grease pit, with a summary of our discussions and links to this page and to the section on Connel's talk page. ThomasWasHere 21:37, 8 March 2008 (UTC)

See also

Phrase punctuation and first letter

I would rather use upper case for the first letter and end punctuation. There seems to be no consensus, because how are you has an upper-case first letter and how do you do does not. But it seems more correct to me anyway.

Another matter is the format of the entry. An upper-case first letter or end punctuation is not necessary, because a user will usually forget to type it when doing a search, and if the user does type it, it will be first in the possible results. So there is no need to add a redirect for the entry with an upper-case first letter or end punctuation. ThomasWasHere 12:22, 7 March 2008 (UTC)

==Czech==

===Phrase===
'''Dobrý den!'''

# [[good day|Good day!]]

[[Category:Czech phrasebook]]
Hi, IMHO the prevailing Wiktionary practice is that phrases start in lowercase. Proverbs start with lowercase too, despite being complete sentences. How are you is an anomaly, entered in a lower-case entry anyway. --Daniel Polansky 19:43, 7 March 2008 (UTC)
There is a policy concerning capitalization that suggests using an uppercase first letter if the phrase is a sentence; however, there is no period at the end. --Thomas was here ☻Talk 17:46, 26 March 2008 (UTC)
Okay. But the policy does not match the current common practice, as you can see from Category:Proverbs. --Daniel Polansky 09:10, 27 March 2008 (UTC)

Punctuation

There is no period at the end of a proverb entry. The punctuation at the end of a phrase could be omitted too. Some English phrase entries have an exclamation mark or a quotation mark at the end of the lemma, while many don't. I would tend to omit the punctuation, based on the proverb model. --Daniel Polansky 20:02, 7 March 2008 (UTC)

I don't want to change the format of the name of the page but the format of the entry line. However, if you think that both should be the same I understand better why it should stay like this. ThomasWasHere 20:54, 8 March 2008 (UTC)
I understand that you want to change the format of the entry line, not the name of the page. I think above all that we should stick to what is a common practice: looking for what is already most common, checking examples and models, seeing what the community of the authors of Wiktionary has already been doing, instead of coming up with our own solutions. --Daniel Polansky 07:34, 9 March 2008 (UTC)

Formatting of disambiguations/gloss

I have always formatted the disambiguations in brackets in italic. In the policy, you have changed the formatting to roman. What made you change it? Do you think it is common Wiktionary practice to format it in roman? --Daniel Polansky 21:20, 7 March 2008 (UTC)

Oops. Sorry, I changed it too fast. I should have asked here first before changing anything. I have put the italics back in the policy. I have to learn to be more patient; it is so tempting to change a page : /
As for the disambiguation at the end of the definition line (called a gloss, if I am not mistaken, as in the template {{sense}}, but not {{context}} or {{qualifier}}), I have found nothing on its format in the English policy. When I look at the most edited pages, like cat and dog, the gloss seems mainly non-italic. But those are English entries; in non-English entries it can be italic or not. There is a template that could be used: {{italbrac}}. It allows people to choose how it looks by modifying their own style sheet. The template {{i}} you are using is a shortcut for {{qualifier}} and is maybe not the best one to use for that. I have just seen in the history of the template that italbrac is a split from qualifier, so I understand better why you use this one. I propose using italbrac or even sense, because it is better to use semantic templates than mere formatting templates. ThomasWasHere 20:47, 8 March 2008 (UTC)
I started to use {{i}} when I accidentally came across it, without looking at its documentation, assuming it was equivalent to the longer {{italbrac}}. It seems that {{sense}} would fit the purpose, as the gloss is in a Czech entry to indicate which of the several senses is meant; it is there in an English entry in the synonyms section for the same purpose.
It seems okay to me if the Czech policy uses (''...'') instead of using a template. I personally am going to let my practice evolve, as things change here in Wiktionary, codifying only these things at my user page that I feel a need to get codified. --Daniel Polansky 08:05, 9 March 2008 (UTC)
Unfortunately, {{sense}} cannot be used because it adds a colon at the end, so only {{italbrac}} can be used or, as you do now, just (''...''). ThomasWasHere 16:53, 20 March 2008 (UTC)

Model language policy

Is there any nice, model language policy in Category:Wiktionary language considerations that the Czech policy considerations could be modeled on? That could save some work. --Daniel Polansky 07:48, 9 March 2008 (UTC)

I have formatted this page with Wiktionary:Entry layout explained as a model, and I find this structure easier to read than the variants you can find in other About-language pages. However, we should have only two levels in the table of contents at the beginning of the page. A new section at the end of the page entitled Czech in non-Czech entries does not seem useful for the moment, as the link to Wiktionary:Translations given at the beginning of the page is enough. Wiktionary:About Latin, Wiktionary:About Greek and Wiktionary:About Japanese are the biggest pages at this time.
Here are some rules I have followed:
  • Start with basic entries, then more complex ones
  • For each section: first the example, then short explanations, then long explanations
  • Give an example with the source code and a link to the entry whenever necessary
  • Do not explain something already explained in Wiktionary:Entry layout explained
ThomasWasHere 16:49, 20 March 2008 (UTC)
Wiktionary:About Hungarian has also grown to considerable size. Since it was written more recently than some of the other pages, it may contain ideas not in the older pages. --EncycloPetey 17:48, 26 March 2008 (UTC)

Desired entries

In the "Noun form other than singular nominative" section of the policy page, it currently states, "To use only for an already existing page in another language or for a very frequent form." I think this is backwards. The rare forms, not the frequent forms, are the ones people are going to go hunting for in the dictionary (I do, anyway).

I guess I think the better answer is to have complete declension tables and have Wiktionary's search be able to find those things, but until that happens, adding any form of a word should be encouraged, not discouraged. — V-ball 08:36, 26 November 2010 (UTC)

Rejzek 2015

I created the template {{R:Rejzek 2015}}, which is similar to {{R:Rejzek 2007}}. Besides the year there are two differences:

  1. It does not use {{pagename}} at the beginning, because some information can be referenced by entries with a different name than the Wiktionary page name. For example, orba is referenced by Rejzek's entry "orat" (which includes info on the expression "orba", too). There is an optional parameter to be filled with the name of the entry instead.
  2. ISBN and page number were added.

See also the documentation subpage. --Jan Kameníček (talk) 18:42, 12 July 2015 (UTC)

I have removed the ISBN as excessive for unique identification. It is IMHO visual noise that the reader should not be presented with. Most reference templates in the English Wiktionary do not provide ISBN; I like that practice. --Dan Polansky (talk) 21:31, 12 July 2015 (UTC)
I strongly disagree. It helps the reader to find the book on the Internet. I often use ISBN when I search books and so I suppose that there are other people who do it too. Besides that, it was an optional parameter. So I will put it back. Jan Kameníček (talk) 22:27, 12 July 2015 (UTC)
It is uncustomary in English publications to provide ISBN in references; I checked the references sections of multiple English books including Gödel, Escher, Bach. I am okay with putting ISBN into a tooltip.
For a comparison of ease of finding a book, here's google:978-80-7335-393-3, and here's google:2015 Český etymologický slovník Rejzek. --Dan Polansky (talk) 07:07, 19 July 2015 (UTC)
Paper publications do not enable the reader to make full use of ISBN search: for example, the publisher cannot link it to a search engine, and the reader cannot copy it with Ctrl+C/Ctrl+V, so it is no wonder that publishers of paper books do not find it so useful, but this is not our case. I understand that you do not like it, but you do not have to use it; you can still use the other information provided to search for the book. But that does not mean that other people, who are accustomed to using it, should not be able to use it either. It might not be common practice in English-language paper publications, but it is common practice at en.wiktionary (and other Wikimedia projects, including English Wikipedia, too). Jan Kameníček (talk) 16:51, 19 July 2015 (UTC)
It is not a common practice in the English Wiktionary to provide ISBN in reference templates. I admit that, in attesting quotations, many editors are given to providing ISBN, and that I probably cannot do much about it. Nonetheless, the overwhelming majority of attesting quotations are provided without ISBN, fortunately.
As to the point that I do not have to use it, that is really irrelevant. The presence of ISBN increases the amount of material the eye has to scan through on a page. It makes the user experience for people like me much worse. --Dan Polansky (talk) 17:04, 19 July 2015 (UTC)
Well, you feel a problem if there are several more digits that your eyes have to scan, but I feel a problem if the digits are not there because I cannot use some common searching methods. I think that my problem is worse. Nevertheless, I asked at Beer parlour if the community could provide here more opinions. Jan Kameníček (talk) 17:33, 19 July 2015 (UTC)
The links I have posted above show that search methods that do not rely on ISBN are entirely adequate. Furthermore, I am okay with providing ISBN in a tooltip. --Dan Polansky (talk) 18:48, 19 July 2015 (UTC)
I include the ISBN in every reference template I create, provided there is one. I cannot think of a single reason to exclude such essential information, especially since MediaWiki automatically creates a link allowing the reader to find the book in a library or online bookseller. If anyone considers it "visual clutter", they're not obligated to look at it; if it isn't customary to include ISBNs at Wiktionary, we need to make it so. If anything, they should be required in both reference templates and citations for any work that has one. —Aɴɢʀ (talk) 17:40, 19 July 2015 (UTC)
I have provided that reason: it is visual noise. The "they're not obligated to look at it" argument is nonsense: people cannot avoid looking at visual noise presented to them. ISBN is not "essential information", and the referencing practice in the books I have checked confirms. --Dan Polansky (talk) 18:46, 19 July 2015 (UTC)

Template:cs-decl-noun

I added optional parameters to {{Template:cs-decl-noun}}, which a) make it possible to add alternative forms and link them to their entries, and b) add qualifiers after the alternative forms. See Template:cs-decl-noun/documentation. If there are no objections, I will also ask some bots if they could add the optional parameters to the templates where needed, as can be seen e. g. at the dative singular of chlap (the two forms are entered in a single parameter, which does not allow correct linking). --Jan Kameníček (talk) 19:52, 30 July 2015 (UTC)

I don't object to the above but it seems preferable to luacize the template. With Lua, parameters like "chlapovi, chlapu" could be automatically parsed by Lua and rendered into proper wikilinks, albeit without qualifiers. --Dan Polansky (talk) 21:12, 31 July 2015 (UTC)
I have nothing against this, but I am not able to do it :-( Jan Kameníček (talk) 22:24, 31 July 2015 (UTC)
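
A minimal sketch of the parsing idea mentioned above, written in Python for illustration only: an actual module would be written in Lua, and the function name link_forms is hypothetical. It assumes that alternative forms are separated by commas and, as noted above, it does not handle qualifiers.

```python
def link_forms(value):
    """Split a comma-separated parameter value and wikilink each form."""
    forms = [form.strip() for form in value.split(",")]
    # Skip empty pieces so that a trailing comma does not produce "[[]]".
    return ", ".join(f"[[{form}]]" for form in forms if form)


dative_singular = link_forms("chlapovi, chlapu")
print(dative_singular)
```

With this approach the existing single-parameter values would not need to be split by a bot; the splitting would happen when the table is rendered.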

Audio links categories

What is the difference between Category:Czech entries with audio links and Category:Czech terms with audio links? --Jan Kameníček (talk) 21:10, 1 August 2015 (UTC)

Category:Czech entries with audio links is an explicit category (not from the audio template) that should be removed. DTLHS (talk) 21:26, 1 August 2015 (UTC)
That is exactly what I thought, but I asked for sure. Thanks for the answer. Jan Kameníček (talk) 21:44, 1 August 2015 (UTC)

I nominated it for deletion. --Jan Kameníček (talk) 22:17, 2 August 2015 (UTC)

Czech "uncountable" nouns

It seems to me that Category:Czech uncountable nouns is redundant to Category:Czech singularia tantum. Czech grammar does not use this term; it uses only terms such as collective nouns, hromadná, or material nouns, látková (which do not have a category here yet, but still fall under the broader category singularia tantum). Therefore I suggest nominating the category Czech uncountable nouns for deletion. Jan Kameníček (talk) 22:37, 2 August 2015 (UTC)

Let us be careful when applying Czech terminology in the English Wiktionary. Trying to limit the grammatical terminology used in the English Wiktionary to describe Czech to English analogues of Czech terms used by Czech grammarians to describe Czech can all too easily do a disservice to the native English speaker. For a native English speaker, "uncountable noun" is a well understood concept: a noun for which a plural cannot be formed. There certainly are such Czech nouns, and they can be placed into the category. The term "collective noun" refers to the likes of "smečka", as per collective noun, but I am not sure it refers to the likes of "uhlí" or "listí"; maybe it does. Even if it does, "collective noun" is not a hyponym of "uncountable noun" per "smečka". The term "uncountable noun" occurs in Czech: An Essential Grammar, by James Naughton, 2006, and in Legal Translation and the Dictionary, by Marta Chromá, 2004. The following two searches do not suggest to me that "singularia tantum" is unequivocally preferable to "uncountable nouns" in reference to Czech: google books:Czech "uncountable nouns", google books:Czech "singularia tantum". An obvious disadvantage of "singularia tantum" is that it is a Latin term, less accessible than "uncountable nouns". --Dan Polansky (talk) 20:04, 3 August 2015 (UTC)
For the record: now that I have undone a premature category depopulation, it contains the following items: chudina, rákosí, uhlí, člověk. --Dan Polansky (talk) 20:28, 3 August 2015 (UTC)
You are right that the English term "collective nouns" is not the same as the Czech term "hromadná podstatná jména", which I did not realize.
Emptying the category "uncountable nouns" was not my primary goal; originally I only wanted to replace the parameter "uncountable" in {{template:context}} with "singulare tantum", just as "plurale tantum" is used. As a result the category became empty and I suggested deleting it. It does not make sense to me to have both categories, singulare tantum and uncountables, and it does not make sense to me that one of them is a subcategory of the other.
As for the comprehensibility, when you use the parameter "singulare tantum", it shows text saying "singular only", which is very understandable, I think. If words like plavky are accompanied by the text "plural only", then doubí should be accompanied by "singular only". This is my main point, and if this is fulfilled, I do not care very much whether it is also added to the (imo redundant) category of Czech uncountables or not (e. g. by adding the category manually at the end of the entry in square brackets). Jan Kameníček (talk) 00:53, 4 August 2015 (UTC)

Czech pronunciation module

I've created a module, Module:cs-pronunciation, to generate IPA for Czech entries, based on the Czech phonology and Czech orthography articles on Wikipedia. I think it's almost complete. Could someone more knowledgeable about Czech take a look at it and let me know if anything is missing? — Eru·tuon 09:13, 16 February 2017 (UTC)

@Erutuon Hi. I think there is the tie in [t͡ʃ], [t͡s], and [d͡ʒ] missing. For the usage of the tie see e. g. w:Czech phonology or Šimáčková et al.: Czech Spoken in Bohemia and Moravia. I can see no other issues. --Jan Kameníček (talk) 19:06, 16 February 2017 (UTC)
@Jan.Kamenicek: Ah, I removed the tie because I figured it was unnecessary. It is omitted in English transcriptions (for instance, choose is transcribed /tʃuːz/, not /t͡ʃuːz/). But I can add it back. — Eru·tuon 19:39, 16 February 2017 (UTC)
I believe it is useful because in some cases (though not very frequently) they can be pronounced as two separate phonemes, e. g. t and ʃ, such as in the word podšitý (/ˈpotʃɪtiː/). Compare podšít (/ˈpotʃiːt/) and počít (/ˈpot͡ʃiːt/). --Jan Kameníček (talk) 19:48, 16 February 2017 (UTC)
I prefer that there be no tie. The cases of ambiguity without the tie are rare. --Dan Polansky (talk) 13:08, 18 February 2017 (UTC)
And "podšít" can be marked up using syllable division, like IPA(key): /pot.ʃiːt/. Thus, tʃ would mean č unless a syllable division is used. This would lead to simpler typography. A related discussion is at User_talk:Jan.Kamenicek#Czech IPA for č. --Dan Polansky (talk) 13:19, 18 February 2017 (UTC)
They are not that rare, several examples written in a minute are: podšít (podšitý, podšívka), podšálek, podstavec (podstavit), nadstavit, podsekretář, odstavit (odstávka), odsloužit, Potštát, podsouvat, podseknout, odšťavit, odsunout, odšoupnout and many more... --Jan Kameníček (talk) 16:38, 18 February 2017 (UTC)
My conservative guess is that the ratio of the number of these to the number of uses of š and č is less than 1 in 100; a less conservative guess would be 1 in 1000. In any case, the syllable division proposed above is able to deal with them. --Dan Polansky (talk) 17:11, 18 February 2017 (UTC)
@Dan Polansky: I would be glad to add a syllable division function to Module:cs-pronunciation, and it would be easy to simply add syllable breaks for the sequence /t.s/; but to handle other consonant clusters, I would need a set of rules. — Eru·tuon 21:36, 18 February 2017 (UTC)
I am afraid this is not possible. E. g. in words like vystrčit the break is before -str (because of the prefix vy-), while in others it can be inside the consonant cluster, as in kostrbatý. What is more, there is often more than one way to divide a word into syllables: hospoda can be divided either ho-spo-da or hos-po-da. I am afraid this cannot be solved automatically. It would also be redundant, because the possible divisions are usually shown using the template {{hyphenation}}. --Jan Kameníček (talk) 23:06, 18 February 2017 (UTC)
Perhaps a hyphen could be added to the respelling to show where the syllable break should be. Then the module would replace it with the syllable break mark . in the output. In that case, {{cs-IPA|vy-strčit}} would yield IPA(key): [ˈvɪ.str̩.t͡ʃɪt]. — Eru·tuon 23:51, 18 February 2017 (UTC)
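
The hyphen proposal above could work roughly as follows. This is a Python sketch for illustration only, not the code of Module:cs-pronunciation (which is written in Lua): the function name mark_breaks is hypothetical, and the transcription step is left as a caller-supplied function that defaults to returning each segment unchanged.

```python
def mark_breaks(respelling, transcribe=lambda segment: segment):
    """Turn hyphens typed by the editor into IPA syllable breaks.

    Each hyphen-delimited segment is passed through `transcribe`, and the
    results are joined with the syllable break mark ".".
    """
    segments = respelling.split("-")
    return ".".join(transcribe(segment) for segment in segments)


# With the default (identity) transcription only the break is converted.
marked = mark_breaks("vy-strčit")
print(marked)
```

A break appears only where the editor typed a hyphen, so a respelling without hyphens passes through without any syllable marks.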
This would be a technical solution if we decided to use the syllable break only in some cases, such as when we need to distinguish the sequence t + ʃ (with the syllable break between the two phonemes) from č. I do oppose such a solution for the reasons written above; I believe that using the tie is a purer solution. Syllable breaks are shown on a different line of the pronunciation section. It would not be good to use it for all words. E. g. with kostrč we would have to write two different transcriptions, IPA(key): [ˈko.str̩t͡ʃ], [ˈkos.tr̩t͡ʃ], which is unnecessary, because the actual pronunciation is the same. --Jan Kameníček (talk) 23:59, 18 February 2017 (UTC)
Huh. How do we know that there are two possible syllabifications of kostrč, if the distinction has no effect on the actual pronunciation? — Eru·tuon 00:44, 19 February 2017 (UTC)
We know that there are 2 syllables, but the border between them is simply not clear. If the border has to be specified in such words for some reason, there is sometimes more than one place where it can be put. This is typical of clusters beginning with s or š: the border can be placed arbitrarily either before or after the s/š. It is discussed for example at CzechEncy, where they give the example of čeština: češ-ti-na or če-šti-na. A page of Masaryk University on phonetics and phonology gives the example of hrstka: hrst-ka, hrs-tka or hr-stka. --Jan Kameníček (talk) 01:11, 19 February 2017 (UTC)
Ahh, so it's somewhat like ambisyllabic consonants in English. Butter can be syllabified as /bʌt.ɚ/ (because the checked vowel /ʌ/ is supposed to only occur in closed syllables) or /bʌ.tɚ/ (so that the syllable has an onset). The module could add syllable breaks in clear cases, and omit them in these uncertain cases. (Perhaps the module could also automatically generate hyphenation too.) — Eru·tuon 02:04, 19 February 2017 (UTC)
I personally am not very fond of unsystematic edits, so I would prefer not to use syllable breaks at all.
As for the hyphenation: it should be possible in principle, because there are rules, but they are quite many, see Internetová jazyková příručka. Possible problems could be words like vystrč, which has just one possibility vy-strč (because of the prefix vy-), and kostrč, which can be hyphenated ko-s-trč. --Jan Kameníček (talk) 02:20, 19 February 2017 (UTC)
@Erutuon: In fact the hyphenation rules mentioned at Internetová jazyková příručka are (with some exceptions) the same as the rules for dividing words into syllables, from which it can be seen that implementing syllable divisions into the module would be very difficult. --Jan Kameníček (talk) 13:03, 19 February 2017 (UTC)
──────────────────────────────────────────────────────────────────────────────────────────────────── Let me add that the same problem as in Czech podšít exists in English batshit (sorry for the vulgar example), where the markup is IPA(key): /ˈbæt.ʃɪt/. Therefore, it seems that Czech does not differ from English as for the existence of the problem (tʃ vs. t.ʃ) and the applicability of the solution. As for how to automate it, it should be quite simple: once the code sees "tš", that is, t and š next to one another, the code should place a syllable break between t and ʃ; if this rule does not work universally, maybe it can be refined to work, or - can be used as proposed above to let the user do manual markup. --Dan Polansky (talk) 13:30, 19 February 2017 (UTC)
And again, my preference is not to worry about syllable division unless it is necessary for disambiguation of tʃ and ts. This seems similar to what is currently being done for English, that is, e.g. there is ˈpəːtɪnənt without syllable break, but there is ˈbæt.ʃɪt with a syllable break, although I have seen some other English entries that do use syllable breaks, e.g. ˈtɛmp.lət. --Dan Polansky (talk) 13:36, 19 February 2017 (UTC)
Other English entries that need disambiguation include courtship, hotshot, nutshell, nutshot, outshine, outshout, outshop, and outshoot. --Dan Polansky (talk) 13:48, 19 February 2017 (UTC)
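The rule proposed above (insert a syllable break whenever t and š are adjacent, with a manual hyphen in the respelling as a fallback) can be sketched roughly as follows. This is only an illustration in Python, not the code of Module:cs-pronunciation; the function name `sketch_ipa` and the tiny letter table are hypothetical, and voicing assimilation, syllabic consonants and all other letters are deliberately ignored.

```python
import re
import unicodedata

# Toy letter-to-IPA table covering only the letters needed for the examples;
# a real module would need the full Czech inventory and assimilation rules.
LETTER_TO_IPA = {"č": "tʃ", "š": "ʃ", "í": "iː", "e": "ɛ"}


def sketch_ipa(respelling: str) -> str:
    """Transcribe a respelling without ties, marking t + š with a syllable break."""
    s = unicodedata.normalize("NFC", respelling)
    # A manual hyphen in the respelling becomes an explicit syllable break.
    s = s.replace("-", ".")
    # Automatic rule: a break between t and a directly following š.
    # If the editor already wrote "t-š", the pattern no longer matches,
    # so no second break is added.
    s = re.sub(r"t(?=š)", "t.", s)
    ipa = "".join(LETTER_TO_IPA.get(ch, ch) for ch in s)
    # Czech stress falls on the first syllable.
    return "ˈ" + ipa


print(sketch_ipa("potšít"))   # t + š: break inserted automatically
print(sketch_ipa("počet"))    # č: affricate, no break
print(sketch_ipa("pot-šít"))  # manual hyphen gives the same result as the rule
```

Words where both pronunciations are possible would need something beyond this rule, since a single respelling yields only one output here.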
Looking at what Germans are doing: Klatsch is marked up as "klatʃ", while de:Klatsch is marked up as "klaʧ", so they use the ʧ ligature. I must have read somewhere that the ligature markup was proposed in IPA and then discontinued. Tratsch has tʀaːtʃ. --Dan Polansky (talk) 13:58, 19 February 2017 (UTC)
Yes, the ligature was abandoned in favour of the tie above t and ʃ.
As for the automation: I am afraid it is not that simple, as two variants of pronunciation are possible if the phonemes appear between the root and the suffix, see větší [vjɛtʃiː] and [vjɛt͡ʃiː] (applies also for its declensions and other forms), lidský [lɪtskiː] and [lɪt͡skiː] (+ decl.), lidštější [lɪtʃcɛjʃiː] [lɪt͡ʃcɛjʃiː] (+ decl.), dětský, dětštější, kratší, čistší, studentský, mladší, většina (all including declensions) and others... [1] [2] --Jan Kameníček (talk) 14:11, 19 February 2017 (UTC)
Does any contemporary linguistic literature on Czech phonology distinguish between t͡ʃ and tʃ using the syllable break? I have never seen it anywhere, and the reason is that syllable break is not meant to help distinguish between phonemes. That should be done using proper IPA characters for individual phonemes. --Jan Kameníček (talk) 14:26, 19 February 2017 (UTC)
I don't know what contemporary literature on Czech phonology does, but I know I like what the English Wiktionary does for English. The disambiguation argument you are using would apply to English as well, as I pointed out above. The tie screams "workaround" to me much more than a sporadic use of syllable mark does. --Dan Polansky (talk) 14:59, 19 February 2017 (UTC)
No, you are mistaken. The tie is an official IPA symbol which is meant to do exactly what we use it to do. If we do not want to confuse readers reading entries with Czech phonology, we should follow the customs and rules kept in the literature on phonology of Czech language. --Jan Kameníček (talk) 16:26, 19 February 2017 (UTC)
I report to you, honestly and accurately, that the tie is screaming workaround to me, based on observing my mental state. More to the point: I do not deny that the tie is an official IPA symbol, but it is an ugly fix added later, after using ligature was considered. I do not see why we should follow the literature on phonology of the Czech language rather than what is customary in the English IPA markup (you have not addressed the point that English has the same problem), but even then, Bičan[3] uses a tie that is below, and Bičan 2008[4] uses ligature instead of the tie. It is not obvious that all such literature uses the tie; what search did you do to find as many and as varied items from that literature as possible, and what items did you find? --Dan Polansky (talk) 09:32, 25 February 2017 (UTC)
One more item: Krčmová's chapter 2.3 Transkripce[5] in Fonetika a fonologie, 2009, uses ligature and does not use tie. --Dan Polansky (talk) 09:54, 25 February 2017 (UTC)
I do not know why you keep arguing with your feelings (such as screaming workaround to me) instead of with what is used. In the context of Czech language the syllable marks are not typically used. English Wiktionary accepts using the ties for foreign languages, for example Latin entries use them frequently together with syllable marks, and there are reasons to do the same in Czech entries. Yes, some literature uses ligatures because they were dismissed only a short time ago and some authors may have been using them for some time after that. I am personally not against ligatures either, but the Wikiprojects tend to abandon them.
By the way, there are some words like dštít [tʃciːt] where the syllable mark workaround would not help anyway. The only way to show the difference in pronunciation between dšti [tʃcɪ] and čti [t͡ʃcɪ] is the tie (or the ligature). --Jan Kameníček (talk) 14:50, 25 February 2017 (UTC)
We are not arguing facts about pronunciation but rather a convention for marking it up. Therefore, a personal preference does play a role, whether mine as an English Wiktionary contributor or the authors of the works that use the tie or other means. I have a general disregard for authority as a matter of principle. Of course, if IPA saw the tie as mandatory, then we would have to use it to provide what is properly IPA, but that is not the case as far as I know. You keep on repeating that the tie is used without supplying a list of works that actually use the tie, whereas I provided at least one example above that does not use the tie. As for "dštít", that seems to be an argument I do not know how to deal with right now; I thought it would be pronounced "dš" rather than "tš", but I am not so sure. --Dan Polansky (talk) 15:07, 25 February 2017 (UTC)
As for wikiprojects abandoning ligatures: which wikiprojects used ligatures and then abandoned them? --Dan Polansky (talk) 15:15, 25 February 2017 (UTC)
When talking about conventions, i. e. generally accepted standards and norms, then we have to have a look at what the convention is in our context (IPA transcription of Czech language) and forget about personal feelings.
A quick example where ties (and ligatures) are used: http://fonetika.ff.cuni.cz/o-fonetice/foneticka-transkripce/transkripce-cj-ipa/ . It is clear that ligatures are also still used, although they are not an official IPA symbol anymore. I have nothing against them, although I slightly prefer using only official symbols.
I have often seen replacing ligatures with ties in English Wikipedia and Czech Wiktionary also prefers ties.
The official IPA chart says: Affricates and double articulations can be represented by two symbols joined by a tie bar if necessary. I have shown numerous examples proving that in Czech language it is necessary. IPA chart offers just ties above or below the two characters. Some publications on Czech phonology use the ties and also (dated?) ligatures. Therefore I suggest ties. --Jan Kameníček (talk) 20:11, 25 February 2017 (UTC)

Jan Kameníček: I also dislike having inconsistency in the marking of syllable breaks, but I would rather have some syllabification than no syllabification, because it makes transcriptions easier to read for someone who doesn't speak the language like me. It breaks the transcription into meaningful units, rather than it appearing as a mass of undifferentiated symbols. — Eru·tuon 04:34, 20 February 2017 (UTC)

It might be useful to take the syllabification or hyphenation rules (linked to in Module talk:cs-pronunciation) and to translate them into module code, because it would save the effort of editors to know the rules and apply them. Unfortunately, I can't understand Czech.

I was thinking perhaps the module could show variant syllabifications in a collapsible list, so that they do not clutter up the entry, but they are still available if someone wants to see them. — Eru·tuon 20:42, 21 February 2017 (UTC)

I cannot imagine what it would look like. Could you show an example, please? --Jan Kameníček (talk) 21:07, 21 February 2017 (UTC)
It could look something like the box below. If there is only one possible syllabification (as with words consisting of CV syllables), the syllabification could be displayed as the main transcription, and the collapsible box could be omitted. — Eru·tuon 21:54, 21 February 2017 (UTC)
Pronunciation
Hmmm, looks fine. --Jan Kameníček (talk) 21:57, 21 February 2017 (UTC)
I think two IPAs in one row would not really clutter the entry, certainly no more than the English entries are cluttered with their U.S. and U.K. pronunciations. What I mean is something like this: "IPA(key): [ˈɦos.po.da], [ˈɦo.spo.da]". Are there words that would need more than two IPAs because of syllabification? --Dan Polansky (talk) 12:00, 25 February 2017 (UTC)
Yes, there are, and they are not rare, e. g. [ˈno.stri.fi.ko.vat], [ˈnos.tri.fi.ko.vat] and [ˈnost.ri.fi.ko.vat]. Another one: [ˈhr.stka], [ˈhrs.tka] and [ˈhrst.ka]. A more complicated one: [ˈpro.sto.pá.šný], [ˈpros.to.pá.šný], [ˈpro.sto.páš.ný] and [ˈpros.to.páš.ný]. Jan Kameníček (talk) 14:27, 25 February 2017 (UTC)
Could this be marked up as [ˈpro.s.to.pá.š.ný]? The implication would be that any two syllable marks separated by a single letter are a pair of alternatives to choose from. How does this existence of a multitude get handled when {{hyphenation}} is used? --Dan Polansky (talk) 14:45, 25 February 2017 (UTC)
I know such a principle is used to indicate hyphenation. Is there any precedent of such syllable indication? --Jan Kameníček (talk) 20:24, 25 February 2017 (UTC)
I think that would imply that the [s] and [ʃ] were syllabic ([s̩, ʃ̍]), something like Mandarin syllabic fricatives. — Eru·tuon 22:16, 25 February 2017 (UTC)
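To make the compact notation discussed above concrete: if a single consonant standing between two syllable marks is read as "may belong to either neighbouring syllable", the full list of variants can be generated mechanically. The following Python sketch is only an illustration of that reading; the function name `expand_syllabifications` and the vowel list are assumptions, and it does not address the objection that such markup might be read as a syllabic consonant.

```python
from itertools import product

# Vowel letters are assumed never to "float"; only a lone consonant
# between two syllable marks is treated as a pair of alternatives.
VOWELS = set("aeiouyáéíóúůýě")


def expand_syllabifications(compact: str) -> list[str]:
    """Expand e.g. 'pro.s.to' into ['pros.to', 'pro.sto']."""
    parts = compact.split(".")
    floating = [
        i for i in range(1, len(parts) - 1)
        if len(parts[i]) == 1 and parts[i] not in VOWELS
    ]
    results = []
    for choice in product(("left", "right"), repeat=len(floating)):
        side = dict(zip(floating, choice))
        syllables, pending, valid = [], "", True
        for i, part in enumerate(parts):
            if i not in side:
                # Ordinary syllable: prefix any consonants waiting for it.
                syllables.append(pending + part)
                pending = ""
            elif side[i] == "left":
                if pending:
                    # Attaching leftwards past a waiting consonant would
                    # reorder the letters, so this combination is skipped.
                    valid = False
                    break
                syllables[-1] += part
            else:
                pending += part
        if valid:
            results.append(".".join(syllables))
    return results


for variant in expand_syllabifications("ˈpro.s.to.pá.š.ný"):
    print(variant)
```

With the input above, this yields the four variants of prostopášný listed earlier in the thread, and `expand_syllabifications("hr.s.t.ka")` yields the three variants given for hrstka.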

Tie, syllabification and the test module[edit]

There is no agreement on the usage of tie and syllabification, so the test module will invariably fail. Adding automatic syllabification doesn't seem practically possible without manually supplying syllable boundaries, so the tie should be preserved as well.--Anatoli T. (обсудить/вклад) 20:32, 19 February 2017 (UTC)

That's not an argument for tie. The relatively rare case where disambiguation is required can be entered manually by providing a respelling that uses "-" to mark syllable separation. It is not clear what should be done in the absence of consensus, whether omit the tie or preserve it. A minimalist defaulting would lead to no tie, but I am not sure a minimalist defaulting in the absence of consensus is generally acceptable. --Dan Polansky (talk) 09:20, 25 February 2017 (UTC)

Conjugation tables[edit]

There is quite a mess in the conjugation sections of the entries on verbs.

  1. Sometimes they try to explain the whole grammar connected with the specific verb, like in ovládat. I believe this is a wrong attitude, because Wiktionary is a dictionary and not a grammar book. The table is too complicated, which is the reason why the template was not created for other verbs as well. Conjugation of Czech verbs is even much less regular than it may seem and we would either need many templates to cover all possible variations, or we could use just one general template where all the tenses, conditionals and so on would have to be filled manually one by one (extremely exhausting). Most of the "forms" in the conjugation template used with ovládat are a combination of real forms of the verb with auxiliaries, like byl bych ovládal, meaning "(I) would have controlled". Entries on English verbs do not show conditionals and the like, and so I believe there is no reason to show them in entries on Czech verbs.
    Additional comment: The current conjugation table used e. g. with the verb ovládat is not very consistent either. It lists both past participles and passive participles, but only past participles are listed also combined with auxiliaries, while passive participles are not. Thus the table is full of e. g. various active conditionals (such as "byli bychom ovládali"), while passive conditionals (like "byli bychom ovládáni") are missing completely. The same applies for transgressives: the table offers space only for active transgressives (like "ovládav") but not for passive ones (e. g. "byv ovládán"). There are two ways to make it consistent: 1) blow the table even more, or 2) get rid of all the combinations with auxiliaries. --Jan Kameníček (talk) 00:38, 13 March 2017 (UTC)
  2. Sometimes they have a simple conjugation table like the one in the entry prosit. This is used most often. The main disadvantage is that the table does not show all forms of the verb (like passive forms). Besides that it does not show the difference between some past forms like prosili/prosily/prosila (this could be solved by adding the info into the table, but it would make the table more complicated again).

For these reasons I suggest changing the approach and showing all existing forms, but not their combinations with auxiliaries. An example of what it may look like for the verb psát:

Conjugation

--Jan Kameníček (talk) 19:39, 1 March 2017 (UTC)

I am interested particularly in opinions of the following people: @Dan Polansky, Droigheann. --Jan Kameníček (talk) 23:21, 12 March 2017 (UTC)
Personally I don't mind large tables, as long as they are collapsible (like en-wikt French conjugations) or on a separate page (like fr-wikt French conjugations). Question is, how far do you want to go in the direction of "user-friendliness" and how far in "being academic". Somebody reasonably acquainted with the language indeed only needs to know that e.g. the neuter passive participle of psát is psáno to deduce that the future tense passive mood is bude psáno, but I guess that many a learner would prefer a conjugation table to have the full form bude psáno to its having forms like byli bychom ovládali or psav, which I suspect 99% of native speakers never came across outside high school classrooms. (Frankly, I would expect anybody advanced enough to care about přechodníky to look them up in cs-wikt.)
I would probably prefer something like this, possibly expanded for passive forms, but given I don't need these tables as a reader and don't intend to use them myself as an editor (too much like work whatever the template) I don't advocate this, just giving you my opinion since you've asked for it. --Droigheann (talk) 23:18, 14 March 2017 (UTC)
We have to decide how much grammar Wiktionary should teach. It is the same for English: some people might not be able to deduce the future tense passive mood will be written just from the fact that past participle of write is written. Despite that entries on English verbs do not contain such grammar tables.
As for the table from fr.wikt: it is quite clearer and better arranged, but it does not show some less common forms (transgressives). It offers grammar structures similarly as our tables, but again, some less common ones (past conditionals) are missing, which is confusing. I believe that we should decide what we want to show (forms or grammar) and then show it in as complete a way as possible. The fr.wiki table may mislead the reader that there are no past conditionals in Czech, which is not true. It is better not to show the grammar than showing it incompletely and thus confusing the readers.
I also believe that it should be unified with the attitude of the English part of en.wiktionary. --Jan Kameníček (talk) 00:13, 15 March 2017 (UTC)
I think having five collapsible sections is not so nice. If this can be reduced to two collapsible sections, it could be okay. The first collapsible section could contain the most commonly used forms and information that is supposed to be in the layer 1 of importance, as it were. Transgressives are an example of what does not belong to layer 1, I think, being an archaic feature. Similarly, the section on conditionals now present in psát via {{cs-conj-psát}} basically reuses the past participles, and seems outside of layer 1. By contrast, present forms, imperatives and past participles are all part of layer 1, in my view.
fr:Annexe:Conjugaison en tchèque/psát looks fine; that could be the layer 1 collapsible table. --Dan Polansky (talk) 09:35, 18 March 2017 (UTC)
In my understanding all the forms except the transgressives would go into the layer 1 of the importance (it may seem that passive participle psán is not used very often, which is true, but passive participles of many other verbs are used very frequently). I am also not sure how the two layers should be named. So, if we decided not to have 5 collapsible sections, then we could put them all into one.
I think this solution is also not bad, although I would slightly prefer the 5 separate sections so that the readers could open and close just the section they need. What do you think? --Jan Kameníček (talk) 15:40, 18 March 2017 (UTC)
I prefer the linked French template fr:Annexe:Conjugaison en tchèque/psát to the one you just posted, since yours is very narrow, which it compensates by being rather tall; it is so tall that it does not fit a single screen on my notebook while the French one does. OTOH, one could argue that the narrow template is better for mobile devices. I also like how the French template is formatted as one table and not a series of tables with varying column widths. Furthermore, I like how the French template uses much less boldface. Passive participle psán probably belongs to layer 1, agreed. As for the number of collapsibles, I find 5 collapsibles too many, especially if each collapsible is to contain only a small table.
If we should go for the narrow table version, it can still be improved by being formatted as one table, which removes part of the vertical whitespace. As per my preference, it would be improved by using less boldface, perhaps using italics instead or just a different background color. In fact, the table headings could be just in normal font: they can stand out by having no hyperlinks, unlike the data cells. --Dan Polansky (talk) 16:31, 18 March 2017 (UTC)
I understand. The fr.wiki table includes an incomplete list of various combinations of verb forms with auxiliaries and thus it is not compatible with my suggestion. However, I tried to modify it, so that it included less vertical space. (I also changed the model verb from "psát" to "nedopsat" to show what it would look like with a longer verb.)
Is this better? --Jan Kameníček (talk) 18:57, 18 March 2017 (UTC)
Thank you, I like your above proposal.
One thing I wonder about is the future tense. Is this something to be completely omitted? Or could it be covered in a summary note below the tables, like "The future tense is created by combining budu, budeš, bude, budeme, budete or budou with psát"? That would be for the imperfective "psát", not for "dopsat". --Dan Polansky (talk) 11:39, 19 March 2017 (UTC)
Thanks. Now I will have to think about how to make an easy-filling template.
I am not sure about the future tense. I understand the arguments for it, but if it is added, readers might start wondering why the future active combining "budu" + infinitive is explained, while future passive combining "budu" + passive participle is not. There is also the problem with perfective and imperfective verbs: perfective verbs (though I am not sure if all of them) use the present forms to express the future tense and in fact do not have the present tense, which would have to be explained too. Besides that there are also numerous present passives, conditionals (including present, passive, past and past passive conditionals) and passive transgressives to be explained too. English entries do not explain the grammar which makes the situation much easier. However, I will keep it in mind, because mentioning these things can be useful, and if I come to some easy solution, I will try to deal with it in the next step. --Jan Kameníček (talk) 14:58, 19 March 2017 (UTC)
──────────────────────────────────────────────────────────────────────────────────────────────────── You are right that English entries do not explain the English future tense will, and the tense grammar in general, nor do they explain conditionals and such. That's a good point. However, the question remains whether this is the most user-friendly option. German machen contains a collapsible table "Composed forms of machen", which I find to be quite a good solution. --Dan Polansky (talk) 15:30, 19 March 2017 (UTC)
I see. As I said, I will think about it too. --Jan Kameníček (talk) 17:46, 19 March 2017 (UTC)
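The summary note suggested above for the future tense amounts to a very small rule, which is sketched here in Python for illustration. The function name `analytic_future` is hypothetical, and the sketch applies only to imperfective verbs such as psát; it does not check aspect, so it would produce wrong output for a perfective verb such as dopsat.

```python
# Future forms of "být" in the order 1sg, 2sg, 3sg, 1pl, 2pl, 3pl,
# as listed in the proposed summary note.
BYT_FUTURE = ["budu", "budeš", "bude", "budeme", "budete", "budou"]


def analytic_future(infinitive: str) -> list[str]:
    """Combine each future form of být with the infinitive of an imperfective verb."""
    return [f"{aux} {infinitive}" for aux in BYT_FUTURE]


for form in analytic_future("psát"):
    print(form)
```

Because the rule is this regular, a one-line note under the table may be enough, which is one argument for not listing these combinations as separate table cells.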

Conjugation template[edit]

I have prepared the conjugation template, see {{Template:cs-conj-forms}}.

I was thinking about two possibilities: either to have several templates for various conjugation classes or patterns (nést, prosit…) or to make one universal template. The first option seemed to have the advantage of fewer parameters, but I am afraid that various irregularities would require either adding more parameters anyway or creating other special templates, and I liked none of these options. E. g. skákat should "officially" follow the class I, pattern péct, but forms following class V (skákají) can in fact be seen too. So I decided on one universal template. One of its advantages is that the user does not have to explore which template to choose, which reduces the possibility of some mistakes that I have seen before. I think the basic usage is simple and only some verbs need some additional parameters. --Jan Kameníček (talk) 00:28, 21 March 2017 (UTC)