User talk:Ivan Štambuk/Archive 6

From Wiktionary, the free dictionary

ratna

One more etymology request: for ratna, which is also one of the five Buddhist principles. --EncycloPetey 05:47, 1 November 2009 (UTC)Reply

intriguing...

Ivan, can you believe Interwicket was adding sh interwiki links - see this edit. When I requæsted from Robert Ullmann some 4 or 5 months ago to add the due functionality so that the bot can handle sh interwikis, it was rejected. Then I believed it needed some adjustment in order to provide sh interwikis, but now it turns out the bot owner has made adjustments in order to discontinue additions of sh interwikis and that the bot had been able to add them in the past until some woeful vicissitude betid it. Notwithstanding, I have no clue as to when this happened. But evidently not only you can change your mind (with reference to a certain section from the BP). The uſer hight Bogorm converſation 18:43, 1 November 2009 (UTC)Reply

LOL, I had no idea that Interwicket ceased adding sh: interwiki at some point. In any case, good to have it back ^_^ --Ivan Štambuk 19:01, 1 November 2009 (UTC)

posco

Another user derived this Latin verb from Sanskrit, which I assume means it's from PIE. Could you correct the etymology? --EncycloPetey 02:01, 2 November 2009 (UTC)

consensus please

Considering that there is -no- consensus at present to delete Bosnian, Croatian, and Serbian sections when you add Serbo-Croatian, would you please stop doing that? And to anticipate one argument from you, it does not matter if you added the material originally. No one owns edits and in the absence of consensus, what you do constitutes deletion, not revision. If you were to change your mind again, and consensus has not been achieved in the interim, you'd be just as wrong to delete Serbo-Croatian and replace it with Bosnian, Croatian, and/or Serbian sections. If the existing entries are inadequate, by all means bring them up to standard as you do your edits. — Carolina wren discussió 05:03, 2 November 2009 (UTC)

There is consensus among all Serbo-Croatian contributors. It matters a great deal who added the material originally. I have absolutely no desire to waste my precious time updating 8-11 different sections when I can do it in 2. --Ivan Štambuk 05:05, 2 November 2009 (UTC)
This is one Wiktionary, not multiple ones. Consensus among a subgroup of editors is not consensus. No one is forcing you to delete Croatian, Bosnian, or Serbian sections. — Carolina wren discussió 05:15, 2 November 2009 (UTC)
Consensus among the relevant group of contributors. Why should I care what trolls think, imagine to think, or pretend to think? I am adding and expanding entries in one language - Serbo-Croatian, and cannot simply pretend that this separate B/C/S/M nonsense "doesn't exist". I expand and merge entries I and other SC contributors supportive of the unified treatment created (that is, 99% of them), and am not touching those created by nationalists. --Ivan Štambuk 05:21, 2 November 2009 (UTC)Reply
Hypothetically, those nationalists were satisfied by existing entries as they were and chose not to touch them, but that's beside the point. No editor or group of editors owns edits. You're not under any obligation to do anything to existing B/C/S sections as you add S-C sections. If there are errors in existing B/C/S sections, fix them if leaving the errors in place bothers you, but deleting the sections is not an option since there is no consensus on the Wiktionary as a whole to do so. — Carolina wren discussió 05:40, 2 November 2009 (UTC)Reply
So I'm supposed to simply play dumb and pretend not to see them? Sorry, they hurt my eyes. As I said - I have no desire to clone my edits into an additional 8-11 sections, and I'm not touching entries created by nationalists (as they do not touch SC entries apparently). These cloned entries are worthless and misleading, and should be obliterated so as not to confuse our users. --Ivan Štambuk 05:48, 2 November 2009 (UTC)
Then either try again to gain Wiktionary-wide consensus to obliterate them or send the entries to RFD. Until then obliteration is against policy. — Carolina wren discussió 15:27, 2 November 2009 (UTC)Reply
Consensus is not going to be reached while people like Ullmann spread their noxious fud. Kindly stop wasting Ivan's time and go learn more Catalan. — [ R·I·C ] opiaterein 15:40, 2 November 2009 (UTC)
I know this debate seems like it will never end Opiaterein, but please refrain from the snide personal remarks. Carolina has been kind enough to keep things civil so I hope you will too. --Bequw¢τ 20:18, 2 November 2009 (UTC)Reply
With all due respect Bequw, Opiaterein is kinda right; while Carolina wren is not doing the wrong that RU was/is, all this "red tape"-ish policy stuff is something we can do without... Ivan has already written paragraph upon paragraph of information here on Wiktionary about why it makes sense to delete B/C/S in favour of SC, and the main problem seems to be that too many people who are ignorant of the matter are for some reason believing that RU is the "right" one even though, as Ivan has said, he cannot speak Serbo-Croatian or any Slavic language. And likening the actions of Ivan and like-minded contributors to (and I quote myself) "Serbian supremacist-related genocides" does not help Ivan's reputation in the eye of the random passerby. 50 Xylophone Players talk 21:27, 2 November 2009 (UTC)

(unindenting) Snide though I may have been, I was quite serious. With all the time all these people spend complaining about Serbo-Croatian, they could be spending time improving their own areas of work. I don't think our Catalan coverage is particularly impressive as of the start of this "discussion". Trying to exert all this influence in areas in which they have no knowledge while they could be doing something productive? But I suppose when Valencians finally succeed in convincing people that their language is different from that of Barcelona, we'll have at least 3 separate headers for Catalan dialects as well. — [ R·I·C ] opiaterein 02:13, 3 November 2009 (UTC)

etymology section

Hello, Ivan. I would like to ask you whether there would be any problem, if I add Danish tyst as a cognate to Sanskrit तूष्णीम् (tūṣṇīm)? The source is here and it states explicitly that Old Norse is not akin to tyst (which is akin to the Sanskrit word, ergo ON. is not akin to Sanskrit, right?). Therefore adding Old Norse is not an option. Since there are no other Germanic cognates (ODS would have listed them, if any), I hope that adding Danish would not be a faux pas from my side. What do you think about schwelen and सुरति (surati), Old Norse svalr? There is English swale, derived from ON., and German schwelen, but they are modern terms. So, just Old Norse in सुरति (surati) or ON and Eng. or all three? The uſer hight Bogorm converſation 08:31, 2 November 2009 (UTC)Reply

Which Old Norse word is not cognate to tūṣṇī́m? I can't read Danish :) Sanskrit tūṣṇī́m has cognates in II, but outside it it's very obscure (possibly Hittite tuḫušii̯ae- "to await, to wait and see" and Old Prussian tusnan "still", but all problematic). I'd rather not postulate I-E cognates at all.
ON svalr, OE swelan etc. - OK to Sanskrit root, but rather link to the spelling स्वरति (svarati) "to shine", this √sur is some kind of obscure variant root that got derived later and with mixed meanings. Also OK to Lithuanian, but not sure about Greek (check what Beekes says on that word).
Your etymologies are getting increasingly complicated lately! :) --Ivan Štambuk 12:25, 2 November 2009 (UTC)Reply
ODS is unambiguous about the kinship between tyst and तूष्णीम् (tūṣṇīm) - besl. med = beslægtet med = akin to. Further is explained that the Danish word vistnok i nogen grad sammenblandet med et ubesl. ord, der foreligger i oldn. tvistr = perhaps to a certain extent blent with a not-akin(unrelated) word, which is præsent (to be found) at ON tvistr. This means that tyst is akin to तूष्णीम् (tūṣṇīm), but underwent ON influence, whose origin is not akin to तूष्णीम् (tūṣṇīm). Well, the source is at hand, the kinship is mentioned in tyst, so if there are only uncertain IE cognates and the Danish word is the only certain one (ODS is the biggest and most authoritative dictionary of the Danish language, in which I have full confidence), why not add it? If you are curious or dubitative about some the etymologies added by me, you may add Template:rfv-etymology and I shall provide them with the source. The uſer hight Bogorm converſation 13:08, 2 November 2009 (UTC)Reply
I don't doubt ODS's veracity when it comes to Danish, but IE etymology has made lots of progress since the time ODS etymologies were compiled. As I said, I cannot find anything to refute the cognation of Danish tyst and Sanskrit tūṣṇī́m as mentioned in ODS. I suppose there is no harm in adding it if it's sourced (and if a better etymology eventually pops up by some Danish etymologist, we can always replace it). Feel free to add the etymologies without restriction though, I don't think I'll be checking Danish words anytime soon, but perhaps some of the other Scandinavian editors will get interested (I noticed that lots of folks here and on Wikipedia speaking Scandinavian languages are very interested in etymologies.) It's better to have a possibly somewhat outdated etymology than none at all. --Ivan Štambuk 13:22, 2 November 2009 (UTC)
Thus? It is sourced, but I asked you, because once you had advised against diachronic cognates in the etymology sections of ancient languages. I was asking not about the tyst#Etymology section, but about तूष्णीम्#Etymology, because you are the main contributor of Sanskrit and were particularly disapproving of diachronic cognates. I am glad that you permit this rare exception, as no other certain cognates are at hand. The uſer hight Bogorm converſation 13:32, 2 November 2009 (UTC)
Well if that Danish word is the only surviving cognate in the whole Germanic branch, with no ancient cognates at our disposal left to be listed, then there is no harm in mentioning it. Normally we would of course mention modern Scandinavian and Germanic cognates together, and compare the Sanskrit lexeme to that of ON, Gothic, OE and similar dinosaurs. --Ivan Štambuk 13:37, 2 November 2009 (UTC)

teli, telani

Do these words exist in Serbo-Croatian? They are supposed to mean “thread”, stolen from Turkish tel, who perhaps stole it from us. I can't find them in my sh dictionaries. --Vahagn Petrosyan 12:19, 3 November 2009 (UTC)Reply

All I could find is teli/тели, terli/терли, borrowed from Turkish telli (wired, stringed, funicular), which in turn might be derived from tel (probably is, but you should better check somewhere). --Ivan Štambuk 14:45, 3 November 2009 (UTC)Reply
Ivan, I think Vahagn is looking for тел/tel - vol. 3, p. 454 in Skok's dictionary. It means wire (at least in Bulgarian, in SC too ? ) and is of Armenian origin, something not mentioned by Skok. Is it current or dialectal? The uſer hight Bogorm converſation 15:24, 3 November 2009 (UTC)Reply
OK, it's recorded, but it hasn't made an appearance in the literature in the last century or so (except for some derived terms like above). I'd add it if I could find any quotation, but the volume of Akademijin rječnik for letter <T>, which apparently lists tel (and contains quotations for almost all the headwords), is unfortunately not available at the Internet Archive. I'll make a mental note for this word if I happen to come across it. --Ivan Štambuk 15:43, 3 November 2009 (UTC)
Sweet. Thanks. --Vahagn Petrosyan 08:17, 8 November 2009 (UTC)Reply

cognate, akin

Hello, Ivan. I need some help for the etymology sections in sh wiktionary where I am also contributing. cf. is upor., but how is cognate with or akin to in Serbo-Croatian? Please give it with the subsequent præposition. Old Norse is staronordijski, right? Proto-Slavic - praslovenski? Dutch is nizozemski or holandski? (I am particularly fond of nizozemski, since it is not a loanword and I am a staunch purist ^_^). The uſer hight Bogorm converſation 11:54, 4 November 2009 (UTC)Reply

One more detail: what is in English cf. Russian x is in Serbo-Croatian upor. ruski x or upor. rusko x? The uſer hight Bogorm converſation 12:02, 4 November 2009 (UTC)

cognate with or akin to = srodno s(a). Old Norse = staronordijski, correct. Proto-Slavic = praslovenski (B/S/M) or praslavenski (C). nizozemski better than holandski (even though holandski is much more used, even in purist Croatia). IMHO, the best would be however to use abbreviations for language names, i.e. stnord. for ON, psl. for PSl., niz. for Dutch and similar. --Ivan Štambuk 15:52, 4 November 2009 (UTC)Reply

User:Drago

Hi, Ivan. Recently the main SC contributors here agreed to have their entries renamed as SC. However, I noticed that User:Drago who has not turned up here for a long time has also made some such as prati (etymology by me). Do you think he would settle for SC header? Might I change the header? The uſer hight Bogorm converſation 14:14, 9 November 2009 (UTC)Reply

He'd probably agree. I could e-mail him for a confirmation if that would make you feel more comfortable. XW--Ivan Štambuk 16:52, 9 November 2009 (UTC)Reply
This is not necessary. I confide in your judgement. The uſer hight Bogorm converſation 18:40, 9 November 2009 (UTC)Reply
I believe there would not have been much point in such activities anyway; from what I have read Drago was a fairly dubious contributor who added entries in many languages he did not really speak often with many errors. Just have a quick look through his talk page. 50 Xylophone Players talk 21:14, 9 November 2009 (UTC)Reply

doslije, dosli?

Vasmer mentions Serbo-Croatian dosle, doslije, dosli as cognates of Russian доселе and Slovenian doslej (hitherto), but I fail to find them. In Skok's dictionary, the similarity to the past forms of doći complicates the search. Does any of those words exist? If so, are they much rarer than their synonym dosad? Is this the Ijekavian, Ekavian and Ikavian issue as in mliko, mleko and mlijeko? The uſer hight Bogorm converſation 19:40, 9 November 2009 (UTC)Reply

Correct, they're synonymous with dosad ("hitherto"). Ekavian form dosle, Ijekavian doslije or doslje. I suggest that you download Рјечник српскохрватскога књижевног језика, which contains a plethora of such rather obscure terms with citations and accents (this particular term is on p. 741 in the DjVu file). --Ivan Štambuk 20:10, 9 November 2009 (UTC)
Ok, I shall follow your advice. The uſer hight Bogorm converſation 20:43, 9 November 2009 (UTC)Reply

ἐλαία, եւղ (ewł)

Man, you're popular these days. Is this Katičić something serious? I want to replace that ety with the one at ἐλαία.--Vahagn Petrosyan 21:28, 9 November 2009 (UTC)Reply

Seems obscure, there is no such thing as PIE *loywom, and even if it existed it wouldn't have been the ancestor form of the Latin, Greek or OCS word. Flibjib8 seems to have an appetite for quite obscure etymologies. I suggest removing that and replacing it with some newer stuff (Katičić is old skul and has been out of touch with contemporary IE studies for several decades..) --Ivan Štambuk 21:35, 9 November 2009 (UTC)
Here he is described as one of the most prominent Croatian scholars in the field of humanities. If the source is not misquoted, then why replace it? The uſer hight Bogorm converſation 21:58, 9 November 2009 (UTC)Reply
Ok, now I see the point - is the Ancient Greek word a descendant of PIE or of some Mediterranean source. This reminds me of the hassle about μακεδνός and its descendence from μῆκος or from some substratum... Perhaps both theories should be repræsented as there? Provided that the source was quoted properly of course, if one assumes good faith. The uſer hight Bogorm converſation 22:09, 9 November 2009 (UTC)Reply
It's not about authority, but about the currency of the scholarship that proposed such a theory. If there is only Katičić propounding such an (IMHO dubious) PIE etymology in a work from 1976, I don't think it merits inclusion. --Ivan Štambuk 22:25, 9 November 2009 (UTC)

Request for translations

Hi there. You seem to know Serbo-Croatian pretty well, so could I ask you if you could add the hr and sh translations for these words:

leafstorm, snowstorm, hail storm, silver storm, duststorm, windstorm, sandstorm

Thanks in advance! Razorflame 21:48, 9 November 2009 (UTC)Reply

Only 2 of those are directly translatable AFAIK, the others should be translated by some kind of sum-of-parts phrase. --Ivan Štambuk 22:47, 9 November 2009 (UTC)Reply
Ok, thanks anyways for the help! Razorflame 22:48, 9 November 2009 (UTC)Reply

Adjective forms

Hi there. As you just saw, I made a few adjective forms for a few Serbo-Croatian words. Is that how you want them formatted, or would you like them formatted in a different way? Razorflame 08:05, 10 November 2009 (UTC)Reply

No please no more! I plan to add those by bot once I get inflection templates for adjectives running. There are lots of quirks with them so I haven't made them yet. The problem with your edits such as oskudniji is that they don't have case information: oskudniji is 1) nominative singular masculine comparative 2) vocative singular masculine comparative 3) nominative plural masculine comparative 4) vocative plural masculine comparative. I have a large database of inflected SC forms (> 1 million) on my computer and one day (when I feel like we have enough lemmata on Wiktionary) we'll upload them all at once. In the meantime, if you feel masochistic about adding SC inflected forms, you can use this website to generate them (use proba/proba for login), enter lemma (e.g. oskudan) hit "submit" and you'll get machine-readable output. --Ivan Štambuk 08:18, 10 November 2009 (UTC)Reply
Sure thing. I'll stop adding them. I figured that I'd ask you before I added any more. By the way, when you say that you are going to use a bot for inflections, does that mean for the nouns too, because I like adding the nouns :o) Cheers, Razorflame 08:20, 10 November 2009 (UTC)Reply
Yes, for nouns too. You can add inflected forms for nouns manually now if you wish to, because they have complete inflection. And some of the verbs too (but not too many, I have yet to tackle verbal inflection thoroughly). --Ivan Štambuk 08:37, 10 November 2009 (UTC)Reply
Ok. Cheers, Razorflame 08:38, 10 November 2009 (UTC)Reply

pruga

Why is this article entitled pruga and not prúga like it is stated in the declension table? Razorflame 03:01, 12 November 2009 (UTC)Reply

The acute accent mark is used to mark - accents :) See w:Pitch_accent#Serbo-Croatian. These are not normally written at all (except in some special cases, and usually reflecting a higher literary register), but are important for proper pronunciation (orthoepy) because accent in SC is not phonologically predictable (i.e. you cannot guess on which syllable it falls, and whether it's rising/falling, short or long...at least not normally, tho it can be guessed in lots of general patterns). Without it you can't write a proper IPA transcription (which I machine-generate from the accented lemma). We use the same style of writing accent/stress marks only in headwords for a number of other languages: Russian, Bulgarian, Slovenian, Lithuanian, Latvian and Sanskrit (this last one usually only in transcription) that I know of. In the declension table it is especially important because of so-called "mobile paradigms" where accents alternate apparently randomly (in both quality/quantity and position) within inflected forms of a word. There are a number of those in SC, but they're worth marking only in a small number of cases (which you memorize and then you can easily guess the accent in other forms if you have knowledge of accent-alternating paradigms). A similar thing happens in Lithuanian and Russian. --Ivan Štambuk 03:11, 12 November 2009 (UTC)
Thanks for the information! Cheers, Razorflame 03:15, 12 November 2009 (UTC)Reply

Albanian roots on the Proto-Indo-European roots page

I will be looking through the Albanian roots again for corrections. I have noticed that Pokorny, Demiraj, Orel, etc. do not always agree on an etymology (such as zot). Any suggestions for me when adding a root of disputed etymology? Thanks Azaleapomp2 06:45, 12 November 2009 (UTC)Reply

I suggest ignoring Pokorny completely when there are newer theories available. When there are several conflicting explanations involving different roots, such as with zot, I suggest adding possibly before the mentioned Albanian word. We don't want to clutter that page too much, and any discussion on the etymology with references should appear at the respective entry. --Ivan Štambuk 06:57, 12 November 2009 (UTC)Reply
Thanks for the suggestion. Azaleapomp2 02:20, 13 November 2009 (UTC)Reply

gišrinnu

Will you add script, blue link and possibly further ety to կշիռ (kšiṙ, scales), please? --Vahagn Petrosyan 18:33, 12 November 2009 (UTC)Reply

Nemzag

At this point, we may want to consider a permanent block. We've been over the same topics so many times, with no signs of any improvement. And unlike some other questionable editors, Nemzag has yet to produce any worthwhile content, as far as I can tell. -Atelaes λάλει ἐμοί 03:03, 14 November 2009 (UTC)Reply

OK, we'll see when/if he gets back in a month. --Ivan Štambuk 04:42, 14 November 2009 (UTC)Reply

my efforts at Serbo-Croatian

I am beginning to learn a little Serbo-Croatian, and find the Rečnik srpskohrvatskoga književnog jezika an invaluable asset. I have also started using it as a resource for contributing here. I've put in dosle/досле and endogamija/ендогамија. Looks good, no? I also just wanted to make sure I'm not making a mistake with the declension (basically just copied from astronomija) or with the pronunciation. – Krun 21:33, 14 November 2009 (UTC)Reply

Looks great! If you want to, leave me your e-mail and I'll send you some more (semi-)illegal books/weblinks that will save you lots of time learning SC and adding Wiktionary entries :D --Ivan Štambuk 22:00, 14 November 2009 (UTC)Reply

ævi

I just added the etymology on ævi (Old Norse); if you know the Indo-European root or the proper Sanskrit word, could you please add it? – Krun 11:31, 18 November 2009 (UTC)Reply

Thanks a lot. B.t.w., what reference work are you using for PIE? – Krun 11:57, 19 November 2009 (UTC)Reply
IEED is your friend (in this case Lubotsky's Indo-Aryan database, search for Sanskrit "ayus"), and for a bit more detailed discussion on quirks of the reconstructed PIE form see p.171 in Mayrhofer's Etymologisches Wörterbuch des Altindoarischen, Band I [1]. --Ivan Štambuk 12:08, 19 November 2009 (UTC)Reply

BCSM extraction

My script development on this project is nearly ready for alpha tests. On 2 November 2009 you stated:

If you need an algorithm for the extraction B/C/S/M metacrap from the merged entries, I can provide it.

I'd very much appreciate if you could provide me that algorithm. Regards, - Amgine/talk 16:47, 18 November 2009 (UTC)Reply

You mean for Wiktionary or some external project? --Ivan Štambuk 23:15, 18 November 2009 (UTC)Reply
Yes. - Amgine/talk 01:03, 21 November 2009 (UTC)Reply
That was or..or, logical exclusivity ^_^. Could you please provide a bit more detail on what exactly you are trying to accomplish with your script? It makes a lot of difference. I assume you are only interested in SC-English translations (and not vice versa), and are trying to build a translation database for such use, where users could select one of the BCSM standards and get their input properly translated? (If that is the case, then what you need is not the entry generation but a content elimination algorithm). --Ivan Štambuk 01:33, 21 November 2009 (UTC)
One of the several applications I have is extracting the top definition of a spelling in a given language. It is of course far faster to parse for L2 using regex /==[language pattern]==/. Your crusade destroys this simple and fast method for BCSM for terms you have 'consolidated'.
An additional project is to develop MRD data from Wiktionary dumps, in which every section is broken out into relevant data elements and presented non-visibly for ease of processing. This project is still in a formative stage.
A third project is work with external reusers for offline presentation of Wiktionary (and other WMF projects) content, for example using OpenZim and WikiReader, primarily as strategic methods of making WMF content available in regions where internet access is limited.
A fourth project is work toward data/layout standardization across Wiktionary languages, ultimately in support of the third and second projects.
Have I presented enough reasons for you to provide the algorithm you said you could provide? - Amgine/talk 02:20, 21 November 2009 (UTC)Reply
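The L2 lookup referred to above can be sketched roughly as follows. This is only an illustration, assuming standard wikitext where a language header sits on its own line as ==Language== and definition lines start with "#"; the function names and the sample text are invented for the example and are not taken from any script mentioned in this thread.

```python
import re

def language_section(wikitext, language):
    """Return the text under the L2 header ==language==, or None if absent."""
    header = re.compile(
        r"^==[ \t]*" + re.escape(language) + r"[ \t]*==[ \t]*$", re.MULTILINE
    )
    start = header.search(wikitext)
    if start is None:
        return None
    # The section ends at the next L2 header (exactly two '=' signs) or at EOF.
    next_l2 = re.compile(r"^==[^=].*?==[ \t]*$", re.MULTILINE)
    end = next_l2.search(wikitext, start.end())
    return wikitext[start.end(): end.start() if end else len(wikitext)]

def first_definition(wikitext, language):
    """Return the first '#' definition line of a language section, if any."""
    section = language_section(wikitext, language)
    if section is None:
        return None
    # Skip '#:' and '#*' lines, which carry examples and quotations.
    match = re.search(r"^#[ \t]*([^#:*\s].*)$", section, re.MULTILINE)
    return match.group(1).strip() if match else None

# Invented sample wikitext, not a real entry.
SEPARATE = """==Croatian==
===Noun===
# placeholder gloss A

==Serbian==
===Noun===
#: an example line, not a definition
# placeholder gloss B
"""

MERGED = """==Serbo-Croatian==
===Noun===
# placeholder gloss C
"""

print(first_definition(SEPARATE, "Serbian"))
print(first_definition(MERGED, "Croatian"))
print(first_definition(MERGED, "Serbo-Croatian"))
```

On the merged sample a lookup for "Croatian" finds nothing, which is the breakage being described; a consumer would have to look up "Serbo-Croatian" and then filter by standard.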
First of all, my "crusade" is doubtlessly beneficial to the current Wiktionary users and contributors. I don't give a flying ... about some hypothetical applications in some fantasy scenarios of yours. You're Ullmann's obnoxious little puppet-troll and I'm sick of your goddamn abuse. If you cannot comprehend in what terms common Serbo-Croatian treatment is a Good Thing, that is a personal problem of your ignorance and malice. This issue has been discussed for months, and I have provided dozens of arguments why we should change, from various perspectives. You haven't participated in a single one of those discussions, except in the last of Ullmann's disgusting trollfests, where you apparently argue that a word written in 2 different scripts belongs to "different languages", and mention some kind of "international linguistic standards" where you clearly demonstrate that you don't have a clue what you're talking about. So please spare me your "righteous concern" that the common SC treatment "destroys simple and fast method". You're damn right it does. In exchange for what - 100 times more optimized usage for humans. You know, millions of people who come to this website every month and click around. Ever thought of them?
For any external presentation of Wiktionary data there is absolutely no need to separate Serbo-Croatian entries simultaneously. You'd get pages such as misao which look completely retarded. OTOH, if one wants to build a database of entries containing only modern standard Croatian or e.g. Serbian lexis - that's another thing. The bestest method would be if you had a tagged list of such per-standard words which you use for filtering the SC entries. I have several such lists and I'm continuously improving them because I need them in one of my projects (machine translation among B/C/S standards, one day hopefully to be installed at sh wikipedia trivializing the importation of articles from bs/hr/sr ethnopedias presenting different versions of articles in different tabs. It already works pretty well, handling Ijekavian/Ekavian dualities, the only thing left is a boring task of building a completely tagged database for ~ 20-30k the most frequent words, which would take months of manual labor..).
Internally, all the SC data should be kept as it is structured in Wiktionary. Any kind of processing should occur only at the presentation time. To filter out particular standards, here's a general algorithm that should work in some 95% of cases (it's all explained in WT:ASH already):
  1. For Croatian disregard entries in Cyrillic script, entries which contain (Ijekavian) in ===Alternative forms===, and entries which are linked to by entries containing (Ekavian) in ===Alternative forms===. Disregard meanings which have (Bosnian, Serbian), (Bosnian), (Serbian) labels
  2. For Serbian disregard meanings which have (Bosnian, Croatian), (Bosnian), (Croatian)
  3. For Bosnian disregard meanings which have (Croatian, Serbian), (Croatian), (Serbian)
If an entry has e.g. only one definition line, and it is in one of the standards you're not interested in on that pass, you should disregard it completely. You should also remove all the instances of the ===Alternative forms=== section which are not interesting to a particular presentation (for Croatian that would be Ikavian and Ijekavian, for Bosnian only Ijekavian, for Serbian Ekavian and Ijekavian). You should also remove all the instances of side-by-side mentioned Cyrillic script entries in Croatian when used with {{term}}. The first pass of the algorithm should also preserve a list of disregarded lemmata, and the second pass should use it to filter out any dual mentioning of double forms by means of {{term}} or {{l}} (e.g. in "mlijeko / mleko" for Croatian you'd drop the second part "/ mleko", but for Serbian you'd preserve both).
The remaining issues that need to be fixed manually are "Ekavian" words in Bosnian/Croatian in dual forms such as prezir/prijezir, and Ikavianisms in Serbian (zaliv, proliv..). Words such as kašika, which are not really "proper" Croatian and are often frowned upon by the purists (people like Kubura who see Communist/Serbian bogeymen everywhere), but are nevertheless still abundantly used, should also be handled according to the reader's preference. So far there is no list of "proper modern standard Croatian" words (the first standardized dictionary is scheduled to be published in 2011/2012), nor is there one for Serbian to my knowledge, so it's all quite arbitrary where you draw the line. --Ivan Štambuk 03:24, 21 November 2009 (UTC)
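The filtering rules listed above can be sketched roughly like this. It is a minimal illustration, not anyone's actual script: the dictionary layout (lemma, script, alt_forms, senses with labels) is an assumed, already-parsed representation, lemma_a and lemma_b are placeholder entries, and the three per-standard label rules are read here as one rule ("drop a sense whose standard labels do not include the target"). Removing the unwanted ===Alternative forms=== sections and the manual cases is not covered.

```python
STANDARDS = {"Bosnian", "Croatian", "Serbian"}

def keep_sense(labels, target):
    """Keep a sense unless it is labelled only for other standards."""
    restricted = set(labels) & STANDARDS
    return not restricted or target in restricted

def filter_entries(entries, target):
    """First pass: return (kept entries, set of disregarded lemmata)."""
    ekavian_targets = set()
    if target == "Croatian":
        # Lemmata that other entries list as their (Ekavian) alternative form.
        for entry in entries:
            ekavian_targets.update(entry["alt_forms"].get("Ekavian", []))
    kept, dropped = [], set()
    for entry in entries:
        if target == "Croatian" and (
            entry["script"] == "Cyrillic"
            or "Ijekavian" in entry["alt_forms"]
            or entry["lemma"] in ekavian_targets
        ):
            dropped.add(entry["lemma"])
            continue
        senses = [s for s in entry["senses"] if keep_sense(s["labels"], target)]
        if not senses:
            # No definition line survives for this standard: drop the entry.
            dropped.add(entry["lemma"])
            continue
        kept.append({**entry, "senses": senses})
    return kept, dropped

def drop_dual_forms(forms, dropped):
    """Second pass: remove disregarded lemmata from a dual mention."""
    return [form for form in forms if form not in dropped]

# Placeholder data; only the mlijeko/mleko pair comes from the discussion.
SAMPLE = [
    {"lemma": "mlijeko", "script": "Latin",
     "alt_forms": {"Ekavian": ["mleko"]},
     "senses": [{"labels": [], "gloss": "sense 1"}]},
    {"lemma": "mleko", "script": "Latin",
     "alt_forms": {"Ijekavian": ["mlijeko"]},
     "senses": [{"labels": [], "gloss": "sense 1"}]},
    {"lemma": "млеко", "script": "Cyrillic", "alt_forms": {},
     "senses": [{"labels": [], "gloss": "sense 1"}]},
    {"lemma": "lemma_a", "script": "Latin", "alt_forms": {},
     "senses": [{"labels": [], "gloss": "sense 1"},
                {"labels": ["Bosnian", "Serbian"], "gloss": "sense 2"}]},
    {"lemma": "lemma_b", "script": "Latin", "alt_forms": {},
     "senses": [{"labels": ["Croatian"], "gloss": "sense 1"}]},
]

kept_hr, dropped_hr = filter_entries(SAMPLE, "Croatian")
print([entry["lemma"] for entry in kept_hr])
print(drop_dual_forms(["mlijeko", "mleko"], dropped_hr))
```

The stored data is left untouched and the filtering happens on a copy, in line with the suggestion that processing should occur only at presentation time.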
Thank you for this description. It's nearly completely incomprehensible from a data processing point of view, but I'm sure when I get to the SC terms it will be extremely helpful. I suspect you're simply not able to to grasp that I don't care a whit about the linguistics, and assume you're probably right about those issues. All I care about in this regard is making the data more-standardized, thus more accessible for re-use in any way by any one. From this point of view your crusade makes the data less accessible, less easily used, and so is a harm to those goals. In my opinion your actions are rather pointless - you're pro/prescriptive and human language is not - and I'd rather you did not continue to make Wiktionary's data less useful. But that's merely my opinion. - Amgine/talk 17:30, 21 November 2009 (UTC)Reply
So in your opinion millions of on-line users should bend to your needs to process Wiktionary database for machines? I should waste several hundred hours of my personal, manual labor, making retarded duplicates such as on [[misao]] or [[običaj]] so that you can write few hundreds lines of codes less? Ts ts ts, so typical. FYI, my "crusade" is merely reflective of the common scholarly, especially lexicographical, practice for treating SC in the last century and half or so in, and whether you perceive it as "pointless" or "prescriptive" is of course matter of your own, as you say it, non-linguistic judgment. I personally veeery much doubt that the above description will not end up in some of the Ullmann's "restoration" scripts one day, but we'll see about that soon enough, I suspect ^_^ --Ivan Štambuk 17:47, 21 November 2009 (UTC)Reply
English scholars for nearly a century railed against the inclusion of the word "ain't", and all contractions for that matter, as being less correct. Millions of online users, as you put it, are in my opinion far more likely to be looking for BCSM language headers than S-C; but since I don't have the evidence to make that a statement I'll just point it out as a possible circumstance rather than a baseless assertion. As for Ullman's use - I don't know. I don't work in the same development languages as he. <laughs> But I'm such a slow developer it'll be a while before I can finish building my square wheel to interface with the API. - Amgine/talk 07:01, 22 November 2009 (UTC)Reply
How exactly is the analogy to ain't and contractions in English spellings relevant to this? You totally miss the point. You bring up an arbitrary example where the scholarly tradition indeed is subject to change, but in a completely unrelated topic so much less in magnitude (spelling of a few words, and the usage of the apostrophe, vs. treating something as one language or not is not really in the same league), and generalize the conclusion. That logical fallacy probably has some nice Latin name but I'm too lazy to look it up.
In your opinion...ah! And the thing is, your opinion doesn't really count for much. Your contributions and knowledge in Slavic languages here are too infinitesimal for your opinion alone to matter. SC is taught as one language in almost every single university in Russia, Germany, Austria, the Netherlands and the US (those 5 countries have the strongest Slavic languages programs in the world), and if their teachers think that it makes sense to group them, who are you to defy them? Some monolingual American, right.
Please Amgine don't even try to detach yourself from Ullmann's trollings. What are the odds that you pick up the discussion exactly when he decides to step out (and not participating in SC discussions at all prior to that), and express your support for my desysop at meta within hours after he committed another disgusting mental vomit. And running his "restoration script" (with 5 out of 7 test edits being erroneous) on your own username account. I am supposed to believe that you genuinely care for Serbo-Croatian entries, and the effect the merger has on the users? Puh-lease.
The only reason why you're asking for this algorithm now is so that Ullmann can "upgrade" his useless script later, pretending that you need it for some of your pet projects. --Ivan Štambuk 08:20, 22 November 2009 (UTC)Reply
(undent) Before you begin your undoubtedly powerful rhetorical response, could you please read the final paragraph slowly and with consideration?
The reason I use the very broad argument regarding ain't is the reason you do not attempt to refute its validity. Academia loses to reality.
Yes, my opinion. Yours as well.
Similarly, your opinion that User:Ullmann trolls. Saying that loudly and over and over again does not make it so. It merely shows both your lack of facile pejorative vocabulary and inability to prove harm.
Please note that I concurred on meta only with those elements I was aware of, and pointed out that it was the wrong venue for the discussion. Which in my opinion is the ethical and responsible response to being informed of the effort.
Ullmann's bot should be run - the logic falls within guidelines and principles of Wiktionary - and it was blocked by one hypocritical admin who has since repeatedly violated the principles on which he blocked the running of that bot. (In case you're wondering, yes I have pointed this out to him directly in a real-time conversation.) You on the other hand violated any semblance of ethical behavior since you are clearly involved in the dispute - which was a not-unexpected result of running the bot. To be blunt, one of the two goals was to show that you would do whatever possible to achieve your sanitation even if it were demonstrably wrong for you to do so. And you did.
But Ullmann and I are not a part of some dark and mysterious anti-M.-Štambuk cabal. I believe you and I had several run-ins over a vote and messages you left on talk pages, but since I do not track on-wiki politics I do not know this was before/during/after your disagreement with User:Ullmann. We also don't program together; I'm a rather crappy coder working in a different set of development languages, and it's difficult for him to explain the basic-level information I need. I sincerely doubt User:Ullmann needs any support for his script; on the other hand I may depending on where the task force I'm working with goes with its mandate.
The other, and for me primary, goal of running the script was to restore the L2 headers, which are easier for humans and bots to locate.
Now - the above are things we are unlikely to agree on, and they are not generally related to the future. There are many possible ways to end the bickering between us and I would prefer to focus on these. You are the resident expert on S-C, and I'd like to talk with you about what would be the least-effort method to include an L2 header - perhaps similar to alternate spellings of entries? I understand your goal to avoid their creation except at presentation, but for re-users 'presentation' is when the database dump is created. (Most re-users will be processing the content to import to their own database format, and may be presenting the content very differently from the form it is here, so a javascript tool is not a solution in this case.) I'd like to hear anything at all that could lead to a single solution and expands the standardization of layout. - Amgine/talk 16:03, 22 November 2009 (UTC)Reply
If you want the dumps fixed, it would seem most sensible to run the dump through your tool (using the details of the algorithm set out above) you can then publish this as a "fixed" wiktionary dump without wasting anyone's time on-site. Conrad.Irwin 16:08, 22 November 2009 (UTC)Reply
Good idea! I'll write the tool to generate individual B/C/S(/M) sections in a format compatible with the Wiktionary dump. That way we'd have one argument less for keeping those stupid duplicates (they'd interfere with the generated entries which are always more up-to-date). --Ivan Štambuk 16:25, 22 November 2009 (UTC)Reply
That'd be a great solution, for you. Of course it does not change the dump for every other re-user who draws from downloads.wikimedia.org.
Vahag is encouraging me to explain a specific real-life example of how your approach makes things more difficult for re-users. The example I am familiar enough with is Wiktionary Lookup Hover, a javascript tool in use on wikimedia sites and a few blogs (it can be used on any website.) It looks up words based on a user's language preferences and the language of the site. It is a small, dumb, fast script because it's an add-on tool to help reader experiences/usability - it has to be as small and fast as possible.
If this tool is used on a website whose language is B/C/S/M, it would not find terms on en.Wiktionary that have been "consolidated", because it looks for the L2 header.
This tool is also a good example why Wiktionary should work toward a single standard for layout. Currently it is only available in 7 languages, with another 5 under development, because each wiktionary requires a completely unique template designed to extract language and definitions for that language. Similarly, every language database dump requires a separate technique to parse the contents. If Wiktionary wishes to maximize the reach of its content it should work toward a single way of presenting that content to publishers/application of that kind of content. - Amgine/talk 17:40, 22 November 2009 (UTC)Reply
I didn't give a link where you can try the tool. Try it at fr.Wikinews. Just double-click on any word that is not a link. - Amgine/talk 17:49, 22 November 2009 (UTC)Reply
for every other re-user who draws from downloads.wikimedia.org. - and how many are those? tens? thousands? millions? Of them, how many would be interested in specifically Serbo-Croatian? You're again arguing with imaginary numbers.
Are you telling me that javascript cannot do if (userlanguage==bs|hr|sr) then lookup(sh) ? I've been using WikiLook for months and it works perfectly fine for Serbo-Croatian entries. Defaulting bs/hr/sr to sh is a matter of a trivial conditional.
You cannot "extract" definitions for a word, that cannot be done. Especially not a "top definition for a spelling" as you term it, and esp. not in English, because there is a great deal of coalescence between different parts of speech, words sharing the same spelling but having different etymologies, pronunciations etc. The only possible way is to present the entry completely. --Ivan Štambuk 20:07, 22 November 2009 (UTC)Reply
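For illustration, the "trivial conditional" described here might look roughly like the sketch below. The names `SERBO_CROATIAN_CODES` and `lookupCodes` are hypothetical and are not taken from WikiLook or Wiktionary Lookup Hover; the sketch only assumes that a tool knows the reader's language code and is able to try more than one code in turn.

```javascript
// Hypothetical helper, not part of any existing gadget.
// Codes whose entries may live under a single merged Serbo-Croatian section.
const SERBO_CROATIAN_CODES = new Set(["bs", "hr", "sr", "sh"]);

// Return the language codes to try, in order, for a reader's preference.
function lookupCodes(userLanguage) {
  if (SERBO_CROATIAN_CODES.has(userLanguage) && userLanguage !== "sh") {
    // Try the specific section first, then fall back to the merged one.
    return [userLanguage, "sh"];
  }
  return [userLanguage];
}
```

Returning an ordered list rather than a single code means the same tool would keep working on entries that still carry separate Bosnian, Croatian or Serbian sections.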
I believe there are about a dozen scheduled reusers at the moment; one of the goals of Strategy is to increase this number - particularly in middle-to-low internet accessibility region applications. Part of the mandate of Task force/Offline is to collect solid data regarding this, in fact. We would hope that every single one of them is interested in Serbo-Croatian, but whether they are or not does not prejudice whether en.Wiktionary should be developing and presenting generically useful information.
I am not saying that javascript cannot present the information. I am saying it cannot do so when the information is being processed into a database, or other format depending on the re-use to which it is being put.
Depending upon the use of the term extract, it is generally easy to select the top definition by some criteria and present it, assuming you have some available source of data to draw from. Both WikiLook and Wiktionary Lookup Hover do so tens of thousands of times per day. And they both have the option of displaying the full entry. Here are a few examples of the latter doing so: [2] [3] [4] [5] [6] (English among others) [7] [8] [9] (Russian among others)
So you've finally spit out some numbers. Wiktionary has ~ 1.5 million pageview hits per day. The figure of a dozen, or a few dozen, looks ridiculously insignificant compared to that. It makes absolutely no sense at all to make the task more difficult for millions, in order to make it only a bit easier for a few. Especially when those few, if they genuinely are interested in SC data, can compensate for the Serbo-Croatian formatting scheme on Wiktionary rather easily. And, as the statistical probability goes, probably 99 out of 100 of those "database re-users" would not share your and Ullmann's opinions of Serbo-Croatian, which is according to 99.9% of English-language scholarship one language. Furthermore, I'm pretty sure almost all of them would be happy to have several user language preferences defaulted to a single Wiktionary L2 header. It would make the task of processing much easier. And hypothetically, if they don't, they can easily disregard the information they're not interested in, as outlined in the algorithm above (which is directly deducible from the WT:ASH guideline page for Serbo-Croatian).
You say: I am not saying that javascript cannot present the information. and above: If this tool is used on a website whose language is B/C/S/M, it would not find terms on en.Wiktionary that have been "consolidated", because it looks for the L2 header. - you're contradicting yourself. Defaulting bs/hr/sr language preferences to sh L2 on Wiktionary is a matter of a trivial conditional. Not to mention that there is absolutely no reason why the above algo cannot be implemented in javascript itself, or any other layer between database (XML/SQL) and the presentation (HTML), where it could present the users the option of looking up both SC and individual bs/hr/sr, depending on their choice, if they happen to have "problems" with the name Serbo-Croatian. The thing is, Amgine, that they'd get exactly the same content in either case (because it is, you know, one language?). If the user browses e.g. Croatian webpages, and you provide definitions for words from the SC entries on Wiktionary, it should work flawlessly (without any change!) in 99% of cases. The only thing you need to do is ignore Cyrillic spellings in the inflection line and ==Alternative forms==, and meanings preceded by (Bosnian), (Serbian). That's some 10 lines of code at most.
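A minimal sketch of the post-processing described here, for a reader of Croatian pages, could look like the following. It rests on assumptions that are not confirmed anywhere in this discussion: that definitions are already available as plain strings, that regional labels appear as a leading parenthesis such as "(Serbian)" or "(Bosnian, Serbian)", and that spellings arrive as a simple array. The names `appliesTo` and `latinFormsOnly` are hypothetical.

```javascript
// Hypothetical post-processing helpers; the input shapes are assumptions.
const REGIONS = new Set(["Bosnian", "Croatian", "Serbian"]);
const CYRILLIC = /[\u0400-\u04FF]/; // basic Cyrillic Unicode block

// Keep a definition unless its leading label restricts it to other standards.
function appliesTo(definition, standard) {
  const match = /^\(([^)]*)\)/.exec(definition);
  if (!match) return true; // unlabelled sense: shared by all standards
  const regions = match[1]
    .split(",")
    .map((label) => label.trim())
    .filter((label) => REGIONS.has(label));
  // Non-regional labels such as "(botany)" do not exclude anything.
  return regions.length === 0 || regions.includes(standard);
}

// Drop spellings that contain any Cyrillic character.
function latinFormsOnly(forms) {
  return forms.filter((form) => !CYRILLIC.test(form));
}
```

For example, `["misao", "мисао"]` passed to `latinFormsOnly` keeps only `misao`, and a sense labelled "(Serbian)" is skipped when `standard` is `"Croatian"`.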
The links you provide don't invalidate what I've said. In English language it's pointless to focus on the "topmost meaning of a spelling" because of polysemy and orthography. That's why Google's define: operator extracts from several websources all the meanings of the argument and usually combines them with a semicolon (it can also process Wiktionary). The notion of a "meaning of a word" is so 20th century. Words don't have meanings, it's the meanings themselves, i.e. phonetic outputs thereof, that have spellings. Spoken language does not abide by arbitrarily-defined orthographical conventions which define the notion of a "word". The users must be presented with a complete entry for a specific spelling, for them manually to find out which of the meanings suits in a given context. --Ivan Štambuk 06:29, 23 November 2009 (UTC)Reply
One of the data re-users is Google. The number of re-users is quite disconnected to its potential impact. If Wiktionary's data were suitably standardized it is reasonable to expect it would be presented/applied several million times per hour, rather than a couple million times a day. As I am typing into this edit box, my browser is using a pair of dictionaries to check my spellings - these in turn are connected to a couple of dictionary/thesaurus routines in my operating system, so in writing this entry approximately 4 000 dictionary-related actions have occurred, and all could have been done using Wiktionary content.
The Wiktionary Lookup Hover tool does not include the logic required to find B/C/S/M on the English Wiktionary. It uses a template of the current page via the Mediawiki API. Your suggestion is, effectively, to alter the Mediawiki API for the English Wiktionary's Serbo-Croatian entries. The Wikimedia Foundation recognizes at least B/C/S as languages; the Wiktionary Lookup Hover tool on such websites will be looking for those language headers and will be unable to find them.
The links I provided merely show the script extracts a portion of a document - in this case the first entry in the list of definitions. The user has the option to view complete entries, and the script implementer may choose the number of values to return. - Amgine/talk 19:15, 23 November 2009 (UTC)Reply
Your ill-logic is astounding. You're trolling Ullmann-style, inventing alleged claims of mine, fabricating "impact" and "technical difficulties" where there are none. Several million times per hour. LOL. This ridiculous claim is on par with Lmaltier's running spellchecker live while browsing webpages. Wiktionary won't reach the quality of commercial dictionaries in English language alone within a decade at least, and you're talking as if it's ready for prime time this very instant, about to rock the world of lexicography, and the only thing that's keeping it from doing so is how we chose to format an obscure language such as Serbo-Croatian, of which most English speakers probably think that it's spoken by some tribe in Africa. You guys are simply unbelievable.
My suggestion is not to alter the Mediawiki API, but to post-process the SC data that it retrieves. Mediawiki API is completely irrelevant. The SC data can be post-processed directly from the database dump, or by extracting the data from the page HTML, like Google robots do it, or dynamically via javascript or whatever. No need to touch the Mediawiki software at all. And as I said, that post-processing would be unnecessary in most of the cases (i.e. the differences between the individual Serbo-Croatian standards are irrelevant, only the meaning of a particular word is, and whether that word is Serbian or Croatian or whatever it doesn't matter).
Wikimedia Foundation recognizes at least B/C/S as languages - LOL, you're hallucinating. Projects for individual languages are assigned by some kind of Language Committee at meta, and the Foundation has nothing to do with it. They already assigned a bunch of wikipedias for languages that don't exist. E.g. there is something called "Old Church Slavonic Wikipedia" where they're writing in sth that has absolutely nothing to do with OCS. On other "ancient" and constructed language wikipedias they're doing OR by inventing thousands of new never-used words for modern concepts. E.g. handspreca for "cellphone" in Anglo-Saxon - never existed. I won't even mention the "Siberian" fiasco. On examples such as these everyone can see that the competence of that "language committee" is not really something that can be relied on.
Furthermore, how they assign different language wikipedias is of no concern to us. Your imagination that the very existence of separate bs/hr/sr wikipedias somehow legitimizes the existence of "separate languages" breaks on its first stumbling point: there is also something called Serbo-Croatian Wikipedia, and it was originally the first and the only one, before enough bigoted nationalists expressed their separatist desires for their own individual 'pedias. So today we have 4 ethnopedias written in one and the same language, with the 5th one (Montenegrin) soon coming. And FYI, these 4 ethnopedias routinely copy/paste thousands of articles among one another, with trivial changes. Hence, stop speaking of some kind of "official recognition" by the WMF because there is none. This claim was already refuted several times in the previous discussion and I'm unsure of how you happened to miss it.
Next you say Wiktionary Lookup Hover tool on such websites will be looking.. - as I said above, that tool can be trivially adapted. Now, you either:
  1. Don't understand that part at all. Let me repeat it again: any stupid javascript tool can trivially post-process Wiktionary SC entries disregarding the content it's not interested in.
  2. Do understand but choose to ignore it, deliberately trolling again, by spreading obnoxious untruths. Lookup Hover tool wouldn't work...wow. Because we chose to implement SC on Wiktionary in human-friendly and not Lookup-Hover-tool-friendly way? Right. Poor tool.
Look Amgine, I gave you all the answers you need to hear, and even offered help that you apparently don't need/want, and if you cannot comprehend what I'm trying to say that's from my perspective only your problem. You're an isolated case and not the Wiktionary target audience I have in mind. I would appreciate if you stopped trolling on my talkpage further unless you have sth useful to say. --Ivan Štambuk 20:13, 23 November 2009 (UTC)Reply
I'm very happy to hear you can prove how trivial such an adaptation is. Here is the xslt we're using: n:MediaWiki:Api-stylesheets/wiktEN.xsl. Here is the javascript: n:MediaWiki:Gadget-dictionaryLookupHover.js.
In the expectation that you might find it somewhat challenging to come up with a proof-of-concept in any reasonable conversational time frame, may I again request that we consider examining some possible methods of compromise - ones in which you are not put upon to do any more than you are currently volunteering and yet which allows both simplified machine parsing of the data and some TOC-level redirection for readers? - Amgine/talk 22:56, 23 November 2009 (UTC)Reply
What TOC redirection do you speak of? All "redirection" that is required in your program is a matter of adding bs/hr/sr/sh -> "Serbo-Croatian" resolve to a few of the code->langname maps in the code. The Wiktionary TOC doesn't need to be touched at all. --Ivan Štambuk 04:13, 24 November 2009 (UTC)Reply
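The code->langname resolution suggested here can be sketched as below. This is not the actual code of the gadget or stylesheet linked above; `L2_HEADER` and `l2Section` are hypothetical names, the map holds only a few sample entries, and the sketch assumes that L2 headers appear in raw wikitext on their own line in the exact form `==Name==` with no surrounding spaces.

```javascript
// Assumed map from language codes to English Wiktionary L2 header names.
const L2_HEADER = new Map([
  ["bs", "Serbo-Croatian"],
  ["hr", "Serbo-Croatian"],
  ["sr", "Serbo-Croatian"],
  ["sh", "Serbo-Croatian"],
  ["fr", "French"],
]);

// Return the wikitext of the L2 section for a code, or null if absent.
function l2Section(wikitext, code) {
  const name = L2_HEADER.get(code);
  if (name === undefined) return null;
  const lines = wikitext.split("\n");
  const start = lines.indexOf(`==${name}==`);
  if (start === -1) return null;
  let end = start + 1;
  // The section runs until the next L2 header: "==" not followed by "=".
  while (end < lines.length && !/^==[^=]/.test(lines[end])) end++;
  return lines.slice(start, end).join("\n");
}
```

With a map of this kind, a page requested for `hr`, `bs` or `sr` resolves to the same Serbo-Croatian section, while deeper headers such as `===Noun===` stay inside the section they belong to.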

krzno, Sogdian

Hi. Vasmer explains in the Russisches etymologisches Wörterbuch the derivation from an Eastern language and juxtaposes it with Ossetian кæрц and Sogdian k´r´zkh, which is a romanisation... Whilst the proper script for Ossetian is easy to render, I do not have any clue as to whether Sogdian script is digitised. Surprisingly, we do not have Category:Sogdian language. Do you think we need such one? Or only after the script becomes available in Unicode? Also, I noticed some diacritical sign under the r (in k´r´zkh), but not a dot as in the transliteration of the Sanskrit vowel. How many r had Sogdian? The uſer hight Bogorm converſation 11:32, 19 November 2009 (UTC)Reply

There is no support for the Sogdian script in Unicode, and there won't be any for a long time to come. I would advise to avoid adding Sogdian entries completely until the script becomes officially supported. Sadly, I have no knowledge or resource on Sogdian so I cannot answer your question. We also have some entries for Tocharian A/B in a romanized form (because the script is not yet supported in Unicode), but these few are added using a consistent scholarly transcription. If you really want to add Sogdian entries I suggest you create Appendix:Sogdian alphabet (similar to e.g. Appendix:Avestan alphabet) where one could see which transcription scheme we should prefer. --Ivan Štambuk 11:55, 19 November 2009 (UTC)Reply
No, I would not create entries in languages I am not familiar with. But might I requæst here the creation of Hittite kurša, which is supposed to mean skin, leather and is also a cognate, as there is no Wiktionary:Requested entries:Hittite? The uſer hight Bogorm converſation 15:10, 20 November 2009 (UTC)Reply
There you go: Wiktionary:Requested entries:Hittite :D
About Hittite kurša- - definitely not of PIE origin (according to the 2 most recent Hittite etymological dictionaries), so I'm not sure what you would consider it cognate with.. This is an extraordinary word nevertheless, for it underlies Ancient Greek βύρσα (búrsa, (skin)bag, wineskin) > Latin bursa > French bourse > English purse and dis-burse ! I can create it nevertheless if you want (just place a request at WT:RE:hit). --Ivan Štambuk 16:11, 20 November 2009 (UTC)Reply
According to Václav Machek it is cognate with krzno. Please, create it. Do you think the rest are cognate with krzno too? The uſer hight Bogorm converſation 19:42, 20 November 2009 (UTC)Reply
Well, Václav is wrong, at least from the modern perspective... --Ivan Štambuk 00:49, 21 November 2009 (UTC)Reply
:( But may I mention his theory in the etymology section? I see there are numerous other sourced theories and this is from Etymologický slovník jazyka českého, ČSAV, 1968, str. 298: odvozeno od nezachovaného slova, které bylo příbuzno s het. kurša. I am eager to mention it. The uſer hight Bogorm converſation 11:57, 21 November 2009 (UTC)Reply
Feel free! It's just that we must put the newest theories in proper perspective. --Ivan Štambuk 11:59, 21 November 2009 (UTC)Reply
𒆳𒊭𒀸 has been created. --Ivan Štambuk 01:44, 21 November 2009 (UTC)Reply

ṣāṣu = moth

Yet another request: could you add script to ցեց (cʻecʻ, moth), please? It appears there are two similar Akkadian words meaning “moth”: ṣāṣu and sāsu. I need the first one. It can be found in Muss-Arnolt, Ass. Handwb. 887. --Vahagn Petrosyan 15:46, 20 November 2009 (UTC)Reply

I can find only the sāsu spelling in my sources.. That is the only possible "proper" outcome of the Proto-Semitic word which had */s/ inside and has /s/ reflex in all the daughter languages (and where it doesn't, it's by regular sound changes).. Perhaps ṣāṣu is a misreading, hapax, or some intermediary form needed to yield the Armenian word? I'd have to investigate this a bit more, there is no trace of that Kouyunjik tablet 3726 on the Internet for a closer inspection :/ BTW, I've created WT:RE:akk for Akkadian requests! --Ivan Štambuk 16:45, 20 November 2009 (UTC)Reply
ṣāṣu is supposed to be completely different from sāsu and its numerous cognates, which, indeed, cannot phonetically render Armenian cʿecʿ according to Hübschmann. --Vahagn Petrosyan 17:43, 20 November 2009 (UTC)Reply

Cognates

Where may I add the removed two Germanic cognates of трчати? I do not think there are many cognates in Slavic languages to put them in (just SC and Bulgarian търча) and the etymology was not too replete with just three cognates... And since I am particularly fond of Germano-Slavic kinships, I would like to link them somehow. The uſer hight Bogorm converſation 13:35, 21 November 2009 (UTC)Reply

I found only τρέχω, which derives from PIE *dʰregʰ- (as SC, Bg, Gothic and OHG, right?). I have no experience in PIE appendices, so if you create one for *dʰregʰ-, I would be glad to see there all descendants - Ancient Greek, Bulgarian, Gothic, OHG, Serbo-Croatian. Is this acceptable? There is no page for requested appendices, right? The uſer hight Bogorm converſation 13:40, 21 November 2009 (UTC)Reply

Use WT:RE:ine for PIE requests from now on.
May I ask for the source of this PIE *dʰregʰ- connection ? I personally don't see how this could yield Slavic *tъrk- and Balto-Slavic *turk- (Lithuanian tùrkterėti being a clear cognate).. --Ivan Štambuk 15:05, 21 November 2009 (UTC)Reply
I copied *dʰregʰ from τρέχω#Etymology, which is cognate with Gothic 𐌸𐍂𐌰𐌲𐌾𐌰𐌽 (per Atelaes) which is (per Skok, Et. r. s. h. j.: vol. 3, str. 495: U obzir dolazi upoređenje s gr. τρέχω »bježim« - upoređenje - comparison, but it implies kinship, does it not?) cognate with трчати. I suppose, if those three words are cognates, they derive from the same PIE root, am I right? The uſer hight Bogorm converſation 16:31, 21 November 2009 (UTC)Reply
No, these can't be related to Balto-Slavic. PIE *dʰ always yields Balto-Slavic *d, you can account for *ur as a reflex of syllabic sonorant in zero grade (*dʰrgʰ-), and PIE *gʰ always yields Balto-Slavic g (unless changed by palatalizations or something similar). So it should be *derg- or *dъrg- in Slavic which we find no traces of. Skok is obsolete: he claims that there is no parallel in Baltic which there is (tùrkterėti, which I'm not sure what it means). He only mentions Old Irish, Ancient Greek, Gothic and OHG words as worth comparing to, not listing them as formally valid cognates. This is typical of older works - they just list a bunch of lexemes with the same meaning and approximately the same form in other branches, and let the reader "draw the conclusion". I know that you're eager to list Germanic cognates but please refrain unless there is some (other) strict evidence. --Ivan Štambuk 16:53, 21 November 2009 (UTC)Reply
Ok. But I listed a Germanic cognate in trn, because Vasmer explicitly says urverwandt and we already discussed the difficulty of rendering this word and Bulgarian, and Serbo-Croatian прасродство into English, so I listed thorn as cognate. The uſer hight Bogorm converſation 17:06, 21 November 2009 (UTC)Reply
That's OK, these are the doubtless cognates. You could've also added German Dorn :D Really archaic word, phonologically very "isomorphic" with respect to common sound changes. --Ivan Štambuk 17:15, 21 November 2009 (UTC)Reply
I wanted to, but I thought the rule allowed only one cognate from non-Slavic language families, so I desisted... The uſer hight Bogorm converſation 17:25, 21 November 2009 (UTC)Reply

bijel, bel

I see you’ve added this as an Ijekavian/Ekavian pair, but were you perhaps too hasty? I have at least not found any trace of bel, but the Rečnik srpskohrv. knj. jezika lists only beo for Ekavian, bijel and bio for Ijekavian. – Krun 13:58, 25 November 2009 (UTC)Reply

The entry for beo in Srpski elektronski rječnik (which contains all the data from R. sh. knj. jez. + one other massive dictionary) also lists bel form as dialectal. But you're right - Ekavian form should've been lemmatized at beo, not bel. The only reason why I've put it there is for the sake of consistency (bijel : bel & bio : beo fits more nicely than bijel : beo !).
bel is a historical pre-form of beo, where l turned to o word-finally (it has been sporadically retained where the loss would ruin the word structure or shift the meaning of the resulting stem, e.g. with *bijeo sequence there would be 4 consecutive vowels which is no good), and certainly must (have) exist(ed). In Chakavian and Kajkavian dialects the change of such word-final l hasn't occurred at all. (And it has been preserved word-medially, as you can see in the inflection).
Anyway, thanks for pointing that out and feel free to correct such blunders of mine in the future :P --Ivan Štambuk 14:44, 25 November 2009 (UTC)Reply

prudovito = shoal?

I was looking at the translations of shoal, and found prudovito as one for SC. However, all the web results linked to Wiktionary. Is this an error? --Volants 13:22, 27 November 2009 (UTC)Reply

That's an adjective (lemma form being prudovit, prudovit-o is neuter singular), which means something different: "abounding in sandbanks, that which has many sandbanks". That "translation" was added by a well-known IP - that dude has been adding crappy translations for years, and I've blocked him many times for that. --Ivan Štambuk 13:27, 27 November 2009 (UTC)Reply

željezan

I was just adding this and was wondering about its usage, because it is given as željezan/železan in the dictionaries, but HJP doesn't list any indefinite forms with the "IZVEDENI OBLICI" option. – Krun 17:35, 27 November 2009 (UTC)Reply

Because their algorithm for the generation of inflected forms is somewhat poor. Look for the headword line, the adjective ends in -an so it's an indefinite form for sure, and it lists -zni as odr. (određen (definite)), so it must have both. HML has better algorithm for the generation of inflected forms, but the problem with them is that they frequently generate hypothetical forms (in this particular case, the comparative and superlative forms - this type of adjective cannot be graduated at all). So use your wits to eliminate junk, or tag with {{attention|sh}} and I'll check it out for ya.. --Ivan Štambuk 17:40, 27 November 2009 (UTC)Reply

artus

Can more be said about the etymology of this word? It looks like it has cognates, but L&S are cautious about saying anything. --EncycloPetey 00:20, 28 November 2009 (UTC)Reply

Does the adjective descend from arceo, as suggested by Dvoretsky and Koroljkov? The connection is not mentioned as of now. The uſer hight Bogorm converſation 08:09, 28 November 2009 (UTC)Reply

No, it's a direct reflex of a listed PIE adjective, with clear cognates in 3 other branches. --Ivan Štambuk 08:11, 28 November 2009 (UTC)Reply

osel, осао

Could you explain whether it is possible for a Proto-Germanic word to be derived from Latin? Vasmer explicitly explains its origin as Latin asinus > Gothic 𐌰𐍃𐌹𐌻𐌿𐍃 > Russian осёл and all other Slavic cognates of course. At osel#Etymology I found a Proto-Slavic reconstruction which is to be placed between Gothic and the concrete Slavic word, id est asinus > Gothic 𐌰𐍃𐌹𐌻𐌿𐍃 > Proto-Slavic ... > Russian осел. However, I found also a proto-Germanic reconstruction, which did not surprise me since there is German Esel. Therefore I would like you to check whether the whole chain of succession is asinus > Proto-Germanic ... > Gothic 𐌰𐍃𐌹𐌻𐌿𐍃 > Russian осёл (and bg, sl, sh, sk...). I am also curious why I could not find осао in either the sr or the hr wiki. Is this a rare word? I added it amongst the requæsted entries. The uſer hight Bogorm converſation 11:08, 28 November 2009 (UTC)Reply

Yeah, that is borrowed from Latin originally, but not directly from asinus but from its diminutive asellus (otherwise you get the unexplained change *-l- < *-n-!). It's a perfect reflex: OCS & Common Slavic osьlъ < Early Proto-Slavic *asilu < Germanic *asilu-. The word is attested in several Germanic branches (OE, Gothic, OHG, OS..) which guarantees its antiquity to Proto-Germanic.
SC osao is a rather obscure dialectalism. You can find some attestations on Croatian Wikisource:
It's a cute word.. --Ivan Štambuk
Yes, I am fond of rare and archaic words too ^_^ Could you please take care of the conjugation of prisegnuti? I am not aware of the differences in the templates for verbs... The uſer hight Bogorm converſation 16:53, 2 December 2009 (UTC)Reply

Two things

The first one, I've been listening to some Serbocroatian speakers and the /a/ sound has been causing me some confusion.../a/ is used to describe Serbocroatian and Slovene "a", but in these languages, the sound of "a" is very different from Romanian and Spanish... it sounds more like the "a" in Armenian to me... I'm not sure how familiar you are with IPA, but what do you think about this in general?

The other is about the ISO code thing that people tend to bring up from time to time. Do you want to just switch to ISO 639-3's hbs instead of Wikimedia's sh? — [ R·I·C ] opiaterein14:15, 28 November 2009 (UTC)Reply

Well, there is just one /a/ phoneme in SC and Slovene, and it's usually transcribed with the <a> symbol... Could you elaborate a bit more on the differences to Romanian and Spanish? You think that the IPA [a] is not the most precise notation? I also wonder what you have been listening to..the most "proper" literary dialect (Neoštokavian) is spoken in the villages, and in colloquial speech speakers tend to utter much faster in some local idiom, ignoring vowel lengths and tones.. We're chiefly focusing on the prescribed literary idiom with the label Serbo-Croatian.
I have absolutely no problems with either one. sh is one letter shorter (and thus more "elite" :D), used by Wikimedia, and more intuitive to users, and hbs is kind of newish. sh is officially "obsolete", but they don't assign 2-letter codes anymore so its usage is completely safe. If you think that we should switch to hbs because sh is somehow "unsafe" or something, I suppose you could mention it on some discussion board (WT:ASH comes to mind) and we could see what the others think. --Ivan Štambuk 14:54, 28 November 2009 (UTC)Reply
As for the codes, it really makes zero difference to me, but Ullmann was whining about it, so I thought I'd bring it up :D
I've been listening to the Pimsleur Croatian and the Bosnians, Croatians and Serbians on the BBC language thing. It sounds (not surprisingly) like the Slovene /a/, but I'm not sure what IPA symbol I would use to describe it... I'd have to listen more closely and compare it to other languages, I think. — [ R·I·C ] opiaterein15:11, 28 November 2009 (UTC)Reply

Citations:osao

*Should this be for oslom rather than osao? - sorry, didn't read the table

Well, I added translations for citations one time, but it's really tricky with translating these sometimes because they're often written in archaic and literary language, or in verse (with meter and rhyme, or "dropped" letters such as in os'o' = osao), and lots would be lost in translation.. E.g. the quote from Marin Držić I just added: it contains some embedded Italian (which was the trade lingua franca in the 16th-century Republic of Ragusa where the writer lived), and its usage is for deliberate artistic effect.. I suppose I could translate some of them (simpler sentences from relatively modern sources), but this ancient poetry and drama I wouldn't dare. --Ivan Štambuk 15:49, 28 November 2009 (UTC)Reply
BTW, I only tend to add citations for terms and meanings that are normally not found in the dictionaries: dialectal and regional words, archaic words and spellings and similar. Persons who'd be looking them up are likely to already have some kind of proficiency in the language to understand its context of usage (otherwise they're unlikely to encounter the word in the first place). --Ivan Štambuk 15:59, 28 November 2009 (UTC)Reply
That's a good idea - I have just been trying to estimate how many years it would take me to add a single citation to every Italian word (I would be long dead before I got half way!).SemperBlotto 16:02, 28 November 2009 (UTC)Reply

zaman (time)

Hi. Are the Middle Iranian zamān, Parthian žamān (borrowed as Armenian ժամանակ (žamanak)), Persian زمان (zamān) borrowed from the descendants of Appendix:Proto-Semitic *zaman-? --Vahagn Petrosyan 03:27, 29 November 2009 (UTC)Reply

Ultimately all from Arabic. --Ivan Štambuk 05:33, 29 November 2009 (UTC)Reply
There is also Akkadian simānu (season), but unfortunately I could not find it in cuneiform... The uſer hight Bogorm converſation 07:53, 29 November 2009 (UTC)Reply

Category:Serbo-Croatian words prefixed with do-

Is this right? A Latin d and a Cyrillic o? Mglovesfun (talk) 09:52, 29 November 2009 (UTC)Reply

Are you sure? That's Latin <o> in my browser... --Ivan Štambuk 10:10, 29 November 2009 (UTC)Reply
Well I typed {{prefixcat|do|lang=sh}} and it produced an error message. That's all I know. Mglovesfun (talk) 10:11, 29 November 2009 (UTC)Reply

Wiktionary:Requests for verification#Alahu ekber

Hi Ivan,

Could you do your merge-y/clean-up-y thing on this entry? I assume that the reason someone changed "Croatian" to "Croatian (not really)" is because they consider Muslim borrowings to be "Bosniak" rather than "Croatian". Given that, I don't want to RFV-fail it just because no one is working on citing these languages. Merging it now seems like the best approach.

Thanks in advance!
RuakhTALK 18:15, 29 November 2009 (UTC)Reply

Thanks! —RuakhTALK 18:26, 30 November 2009 (UTC)Reply

хи‧драт?

You Jugoslavs and your strange hyphenations :D Good teamwork. ATTAAAACK — [ R·I·C ] opiaterein18:00, 3 December 2009 (UTC)Reply

Oh yeah, I dunno if you meant to remove the gender, so I won't put it back :D — [ R·I·C ] opiaterein18:01, 3 December 2009 (UTC)Reply
Bug in my program...fixed now ^_^
For hyphenations - I'm doing it "by the ear", following a rule: all syllables end in a vowel, except when weird consonant clusters occur, or when the split would break an otherwise obvious morpheme boundary. The dr sequence is abundantly attested in Serbo-Croatian, so I figured out it was OK to hyphenate it as hi-drat, especially when there is a short rising tone on medial -i- which itself is a kind of micro-syllable-breaker :D --Ivan Štambuk 18:10, 3 December 2009 (UTC)Reply
When I start learning Serbocroatian, remind me not to worry about speaking perfect literary language... to just stick with regular day-to-day speech, or I'll go insane :D — [ R·I·C ] opiaterein18:16, 3 December 2009 (UTC)Reply
LOL xD Worry not my friend - the knowledge of literary SC is an arcane art reserved for language freaks. 95% of students of kroatistika ("Croatian studies") I spoke to (usually in types of discussions: "How can one dialect be '4 different languages', and 3 different dialects 'one language' at the same time" ^_^) cannot even properly accentuate Ja govorim hrvatski.. When you start learning SC make sure you notify me first on my talkpage and I'll send you some goodies on e-mail that will make your Wiktionary editing experience easy as pie. --Ivan Štambuk 18:26, 3 December 2009 (UTC)Reply
Mmmm I like pie :D Add HBS to apple pie and pumpkin pie! Those are delicious :) — [ R·I·C ] opiaterein18:37, 3 December 2009 (UTC)Reply
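The core of the "by the ear" hyphenation rule from this thread (every non-final syllable ends in a vowel) can be sketched in a few lines of Python. This is only an illustration: the function name `hyphenate` and the vowel set are assumptions, syllabic r is ignored, and the exceptions mentioned above (awkward consonant clusters, morpheme boundaries) are not handled.

```python
# Vowel letters for both scripts (assumption: syllabic r is not treated as a vowel).
VOWELS = set("aeiouAEIOU" "аеиоуАЕИОУ")


def hyphenate(word, sep="-"):
    """Break after every vowel except the last, so each non-final syllable is open."""
    vowel_positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    if len(vowel_positions) < 2:
        return word  # zero or one vowel: nothing to split
    pieces = []
    start = 0
    for pos in vowel_positions[:-1]:
        pieces.append(word[start:pos + 1])
        start = pos + 1
    pieces.append(word[start:])  # trailing consonants stay with the last syllable
    return sep.join(pieces)


print(hyphenate("hidrat"))
print(hyphenate("хидрат", sep="‧"))
```

Keeping the separator as a parameter lets the same sketch produce either a plain hyphen or the hyphenation point used in the section heading.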

xšaθra

Is the xšaθra in աշխարհ (ašxarh) the uninflected form of 𐎧𐏁𐏂𐎶 (xšaçam) or is it a little different word? Can you create xšaθra if it’s attested, please? --Vahagn Petrosyan 11:16, 5 December 2009 (UTC)Reply

Same shit, what is transliterated as OP <ç> is of dubious phonetic value ([s], [sr]...). Historically it comes from older *θr so some transcribe it as <θr> or <θr>. The noun is of neuter gender, thus traditionally IEish nominative=accusative (nominoaccusative :D). Unlike (cognate) Avestan (xšaθra) and Sanskrit (क्षत्र (kṣatrá), English kshatriya and satrap are thus deeply related!) lexemes, due to the extreme scarcity of OP attestation, we (well, I..) lemmatize OP as attested, and not as stems, i.e. flexional endings included, which is in this case good-old IE -m for N (also A and V) sg. neuter thematic stems.. --Ivan Štambuk 11:40, 5 December 2009 (UTC)Reply

hydrate

Hi there Ivan. Can you add the Serbo-Croatian translation for the noun form of the word hydrate please? Thanks :) Razorflame 20:38, 6 December 2009 (UTC)Reply

It's already there! --Ivan Štambuk 20:40, 6 December 2009 (UTC)Reply
That it is. Sorry for the confusion, I didn't think it was there. :( Razorflame 21:31, 6 December 2009 (UTC)Reply

tjelesni

Fix this bad boy up for me, would ya? :) — [ R·I·C ] opiaterein02:12, 7 December 2009 (UTC)Reply

kakaovac

Could you take care of this entry? The uſer hight Bogorm converſation 07:59, 8 December 2009 (UTC)Reply

Wiktionary:Citations

Hi. What is the reason for this edit? In Wiktionary:Citations it is written: Citations pages should link back to all main entries using {{citation}}. This link got lost in your edit. The uſer hight Bogorm converſation 13:02, 8 December 2009 (UTC)Reply

That template has changed in functionality since that was written, and its current behavior is broken and it shouldn't be used now. Use ==Serbo-Croatian== + manual categorization instead, because otherwise section linking from {{seeCites}} wouldn't work. See the discussion at {{citation}} page. We still need to develop a well-thought-out policy on citations. --Ivan Štambuk 13:10, 8 December 2009 (UTC)Reply

Avestan.

Apparently the encoding did change since you created our Avestan entries.

See, for example, the entry 𐬞𐬀𐬌𐬭𐬌𐬸𐬛𐬀𐬉𐬰𐬀. It uses the character U+10B38, which is unassigned in the Avestan Unicode block. Since I don't know which character this is, I'm unable to fix these. -- Prince Kassad 17:44, 9 December 2009 (UTC)Reply

Really? I hacked the ALPHABETUM demo version to relocate Avestan to the range in the Unicode proposal when I originally created that. I haven't dealt with Avestan since. In the meantime Unicode 5.2 came out and apparently no one has taken half an hour to create one simple 60-glyph font *sigh*. Feel free to fix these, I won't be messing around with Avestan until 2010. --Ivan Štambuk 17:49, 9 December 2009 (UTC)Reply
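One way to locate entries affected by the encoding change is to scan a title for code points that the Unicode database reports as unassigned. The sketch below uses Python's standard `unicodedata` module; the helper name `find_unassigned` is hypothetical, and the result depends on the Unicode version bundled with the Python interpreter, so treat it as a rough check rather than a definitive one.

```python
import unicodedata


def find_unassigned(text):
    """Return (index, 'U+XXXX') for each code point whose general category is Cn (unassigned)."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cn"
    ]


# U+10B00 is an assigned Avestan letter; U+10B38 is the code point reported above.
print(find_unassigned("\U00010B00\U00010B38"))
```

It only flags the positions; deciding which character was originally intended still needs someone who knows the old font mapping.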

Appendix:Proto-Indo-European *seh₂wel-

Does the main section really need to be named Cognates instead of Descendants? Could you take care of this appendix? The uſer hight Bogorm converſation 21:12, 9 December 2009 (UTC)Reply

No. And forget that page, use *sóh₂wl̥. --Ivan Štambuk 21:14, 9 December 2009 (UTC)Reply
Ok, I was obviously too lazy to catch a glimpse of the category of PIE nouns and notice it. What do you think about кобь being loaned from Celtic (proposed by Aleksej Šaxmatov, but dismissed by Vasmer)? What does recent research suggest? See Talk:кобь. The uſer hight Bogorm converſation 21:22, 9 December 2009 (UTC)Reply

plurals and mass nouns

I was just doing zec/зец, and there is the mass noun zečad, like jagnjad for jagnje. I'm just wondering how these should be mentioned in the entry. As you can see, I just put it under See also, but I'm not entirely satisfied. – Krun 23:56, 9 December 2009 (UTC)Reply

I used to place them in the inflection line (next to "normal" plural), but have later decided to place them in the ====Derived terms==== (which strictly speaking they are, regular derivations by means of collectivizing suffix -čad/-ad.) Mass/Collective nouns have their own fully separate entries (with their own plural forms), so I felt uncomfortable denoting them as "plurals" solely on the semantic grounds. If you have any suggestions on how to specifically handle these kinds of important regular derivations like diminutives and collectives, I'd be happy to hear them. --Ivan Štambuk 09:01, 10 December 2009 (UTC)Reply

The origin of elephant

Could you please trace down the ultimatest source of Persian پیل (pīl, elephant)? Is it Sanskrit पील (pīlu)? I am looking for a place to list the numerous descendants. --Vahagn Petrosyan 01:08, 10 December 2009 (UTC)Reply

pīlu is पीलु (pīlu). Monier-Williams mentions also Arabic فيل, but did not explain that it has been borrowed from Persian, as stated here. Interestingly, पीलु (pīlu) also means Salvadora persica... The uſer hight Bogorm converſation 08:26, 10 December 2009 (UTC)Reply
The Arabic form is definitely from Persian (Arabic has no /p/ and regularly replaces it with /f/), which is from Sanskrit pīlu or Akkadian pīlu or pīru, which is from Egyptian (prefixed by article) p-āb(u), which is from some alleged Afroasiatic source eḷu "elephant" which I've been unable to pin down more precisely. It is also the source of the first part of Ancient Greek el-ephas, Latin eb-ur and probably Sanskrit ibha. I'd place the descendants in the Egyptian entry, but in its proper hieroglyphic spelling, if any of you can make it.. (I know I can't, because there are still no Unicode 5.2 fonts for Egyptian available). --Ivan Štambuk 08:53, 10 December 2009 (UTC)Reply
I created the entry for the Aramaic æquivalent ܦܝܠܐ based on its alternative spelling and added the etymology according to this conversation. Could you check whether Akkadian pīru is spelt as AM.SI (𒄠𒋛)? The uſer hight Bogorm converſation 09:56, 10 December 2009 (UTC)Reply
Yes, it's written as am-si. --Ivan Štambuk 10:37, 10 December 2009 (UTC)Reply

help with translation

Hi. May I ask you to explain what лончиће means in this edit? I really do not understand this expression, although I understand its parts, but I suspect that this may be something ad hominem. To derange the small pots? Is this an idiom? The uſer hight Bogorm converſation 16:01, 10 December 2009 (UTC)Reply

Right, pobrkati lončiće or polupati lončiće would idiomatically mean sth like "to be completely wrong in sth", "to completely confuse the issue". You probably won't find that in any dictionary tho... It is usually not considered an ad hominem, more like a more cynical variant of a "Don't be silly" type of remark. --Ivan Štambuk 19:42, 10 December 2009 (UTC)Reply
Ok, thanks for the explanation. I needed to ask, because the first thought when I read this expression was of the German nicht alle Tassen im Schrank haben (Tasse = cup ~ small pot), but the German proverb conveys a ruder connotation. The uſer hight Bogorm converſation 07:41, 11 December 2009 (UTC)Reply

метеор

Hi there. Can you fix up the Serbian part of this entry please? Thanks, Razorflame 19:02, 11 December 2009 (UTC)Reply

You mean the Serbo-Croatian part? In Serbia, Croatia, Montenegro and Bosnia and Hercegovina people speak one language - Serbo-Croatian. Serbian and Croatian in linguistics are not that different than Austrian or Swiss with regard to the German language. The uſer hight Bogorm converſation 19:16, 11 December 2009 (UTC)Reply
Thanks very much for the help with that word :). Yes, I meant the Serbo-Croatian part ;) метеоре was also made by me (accelerated from the root word), and since I see it in the declension table for the word in the heading, could you add the Serbo-Croatian language to that entry as well? Thanks, Razorflame 19:18, 11 December 2009 (UTC)Reply
I don't do SC inflected forms manually, because I consider it a waste of time. (Except for some irregular inflections and similar.) I'll add them by bot all at once one day once I add enough lemmata. --Ivan Štambuk 19:24, 11 December 2009 (UTC)Reply
Ok, thanks for the help anyways, Razorflame 19:30, 11 December 2009 (UTC)Reply

Čad/Чад

I noticed you just changed the instrumental from Čadem to Čadom, but I had the declension directly from HJP. I just checked HML, and it seems to use a very different paradigm (acc. -a vs. -Ø, voc. -e vs. -u). Which, if either, is correct? – Krun 14:56, 12 December 2009 (UTC)Reply

Typical example when both are wrong :) instr. must be -em not -om because stem doesn't end in a palatal, and accusative must be -Ø because it's an inanimate object. --Ivan Štambuk 15:00, 12 December 2009 (UTC)Reply
Wait a minute, did you say the instrumental must be -em, not -om; because you changed it to -om! It sounds like you're actually saying HJP is perfectly correct. I'm getting a bit confused here. – Krun 15:05, 12 December 2009 (UTC)Reply
Whoops, I meant the other way around: -em when it ends in a palatal, -om otherwise. e.g. polje -> poljem, rad -> radom. Soft consonants bind front vowels, hard consonants bind back vowels.. --Ivan Štambuk 15:07, 12 December 2009 (UTC)Reply
OK, that explains a lot, but what about the vocative? What is the rule on -e vs. -u? – Krun 15:10, 12 December 2009 (UTC)Reply
If the stem ends in a palatal, it can only be -e. Otherwise, generally both can occur, but there is a strong preference for -u for words where depending on the semantics vocative is very unlikely to occur (i.e. for inanimate objects, abstract nouns and similar). --Ivan Štambuk 15:17, 12 December 2009 (UTC)Reply
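The corrected instrumental rule from this thread (-em after a palatal-final stem, -om otherwise) can be written as a small Python sketch. The list of palatal endings below is an assumption made for illustration and is not exhaustive, the function name `instrumental_singular` is hypothetical, and real paradigms have exceptions that this ignores. The vocative rule is left out.

```python
# Assumed set of palatal stem endings (Latin script); not an exhaustive list.
PALATAL_ENDINGS = ("lj", "nj", "dž", "č", "ć", "đ", "š", "ž", "j")


def instrumental_singular(stem):
    """Attach -em to a palatal-final stem and -om to any other stem."""
    suffix = "em" if stem.lower().endswith(PALATAL_ENDINGS) else "om"
    return stem + suffix


for stem in ("polj", "rad", "Čad"):
    print(stem, "->", instrumental_singular(stem))
```

The function takes the stem rather than the nominative, so polje is passed in as polj-.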

Hey there

Hey there Ivan. Earlier, you questioned me about whether or not you could get a count of entries that you've made here on the English Wiktionary. Soxred's article count tool is now available to use and will show you all the entries that you've made. See the link in my userpage underneath the Entries created header for more information. Cheers, Razorflame 19:13, 12 December 2009 (UTC)Reply

OK, thanx for informing me :) --Ivan Štambuk 20:30, 12 December 2009 (UTC)Reply
No problems. It says that I have made about 11,780 entries so far, so since you have been here longer, it'll probably say that you've made quite a few more. Razorflame 20:32, 12 December 2009 (UTC)Reply

visoravan

Hi there. The genitive singular form of this word seems to be screwed up. Maybe you could fix it? Razorflame 20:59, 12 December 2009 (UTC)Reply

Thanks for catching that :D (that's what happens when you're on an edit spree semi-drunk lol) --Ivan Štambuk 21:00, 12 December 2009 (UTC)Reply
No problems :). Have fun :) Razorflame 21:01, 12 December 2009 (UTC)Reply

Latin dialect templates

Per WT:RFDO#Dialect etymology templates, the separate dialect templates ({{VL.}}, {{ML.}}, {{LL.}}) will be deleted. Please use the more functional and standard {{etyl}} approach ({{etyl|VL.}}, {{etyl|ML.}}, {{etyl|LL.}}). The template parameters work just the same. Thanks. --Bequw¢τ 15:16, 14 December 2009 (UTC)Reply
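For anyone converting existing entries, the replacement described above is mechanical enough to sketch as a regular-expression rewrite in Python. This is an illustrative sketch only: the function name `migrate_dialect_templates` is hypothetical, and it assumes the template parameters contain no nested templates or links.

```python
import re

# Matches {{VL.}}, {{ML.}}, {{LL.}} with optional simple |parameters (no nested braces).
_DIALECT_TEMPLATE = re.compile(r"\{\{(VL\.|ML\.|LL\.)((?:\|[^{}|]*)*)\}\}")


def migrate_dialect_templates(wikitext):
    """Rewrite {{XX.|params}} as {{etyl|XX.|params}}, keeping the parameters unchanged."""
    return _DIALECT_TEMPLATE.sub(r"{{etyl|\1\2}}", wikitext)


print(migrate_dialect_templates("From {{VL.|fr}} and {{LL.}}"))
```

Text that already uses {{etyl|VL.}} is left alone, because the pattern only matches when the dialect code directly follows the opening braces.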

urnek

I'm looking for Serbo-Croatian urnek meaning “example, sample” borrowed from Turkish örnek, itself from Armenian օրինակ (ōrinak). Not in my dictionaries, so must be something rare like tel. --Vahagn Petrosyan 16:46, 16 December 2009 (UTC)Reply