Wiktionary talk:Normalization of entries


Removed items

Items that are more controversial or undiscussed, or that belong in WT:ELE instead, were removed pending further discussion:

  1. Whitespace or non-printing characters other than space and newline may be encoded as HTML entities, such as &nbsp; or &ltr;.
  2. No other HTML entities allowed; these should be converted to UTF-8. &amp; --> &
  3. For templates, newlines are allowed for clarity.
  4. To be discussed: What to do with Unicode character boxes when there is no Translingual section?
  5. Floating boxes like {{wikipedia}} can appear on a line in between, but maybe we want to discourage this?
  6. (Controversial/Unwanted) Definition lines should be sentences, starting capitalized, ending with a period "."
    1. Alternate proposal: If they are sentences they must be capitalized and end with a full stop, if they are simple glosses they should be all lower case with no full stop.
  7. Reducing the number of third level headings in en.wikt, by combining things like "===Pluralish noun===" or "===Noun phrase===" into 'standard headings' like "===Noun==="
    1. Note: Apparently it does not require any change at the moment as we already have a number of standard-ish headers here and maybe here. Any serious change to that would most likely require its own discussion.
  8. Every POS header must be immediately followed by a headword line.
  9. Floating boxes like {{wikipedia}} can appear on a line in between the POS header and the headword line.
  10. Categories are placed at the end of each language section.
  11. Language names should not be linked or templated:
    • Neapolitan: {{t|nap|acqua|f}}
  12. All templates, boxes, or images should be in a certain language section except for templates like {{also}} that are designed to be at the very top and do not have anything to do with any specific entries.
  13. POS sections may contain at most one headword line and one definition list. Thus, entries like this or this are not correct.
  14. All tables should be either templates or wikitables. There should be no HTML tables.
  15. Markup such as gender should be provided within the {{t}}/{{t+}} template, except for qualifiers, which should use {{qualifier}}.
  16. All POS (or equivalent sections) must have at least one sense, starting with "#", before the next heading. If no definition is known, use {{rfdef}}.
  17. When necessary, indicate gender and number only with templates (e.g. {{m|fr|livre||book|g=m}}, {{m|fr|livre||pound|g=f}}). Use {{g}} only if no other template can be used.
  18. Context labels should use {{context}}, {{label}} or the shortcuts {{cx}} and {{lb}}.
  19. Lines beginning with whitespace are formatted in monospace font by the software; these should not occur in entries.
  20. Interwikis must link to the exact same spelling and are usually maintained (added/removed/sorted) by bots.
  21. One blank line before the first category of a language.
  22. Each image on its own line in the wiki text.
  23. No blank lines between a heading and an image directly below the heading.
  24. No blank lines between an image and a headword line.
  25. No blank lines between two images.
  26. One blank line after an image, before any content that follows that is not another image or a headword line.
  27. Each right-aligned box — such as {{wikipedia}}, {{wikispecies}}, etc. — on its own line in the wiki text.
  28. No blank lines between a heading and a right-aligned box directly below the heading.
  29. No blank lines between a right-aligned box and a headword line.
  30. No blank lines between two right-aligned boxes.
  31. One blank line after a right-aligned box, before any content that follows that is not another right-aligned box or a headword line.

--Daniel 17:23, 27 May 2015 (UTC) (then edited later to add more items)

  • This list was trimmed to contain only the norms that I believe to be the least controversial. Some of them seem to be the status quo already; this policy would just formalize them. Others that seem more controversial or undiscussed, such as the capitalization/punctuation of senses discussed here and the rule of one headword line per POS section formulated here, were left out. If Wiktionary:Normalization of entries becomes a policy, these may be discussed further and added to the page in the future. --Daniel 07:29, 28 May 2015 (UTC)
  • I've been striking a few items in my message, for my own purposes. (Basically, those that I've managed to make votes for.) --Daniel Carrero (talk) 22:00, 1 December 2015 (UTC)

Discussions

Related discussions:

--Daniel 06:55, 28 May 2015 (UTC)

Some small changes

I made a few small changes in the wording and formatting. I also added an extra point about no whitespace around category sort keys. I don't think anyone would mind. —CodeCat 20:01, 6 September 2015 (UTC)

Things that should appear on a line by themselves

This is a proposal for an additional rule that certain things, mostly specific templates, should be placed on a line of their own with nothing before or after on the same line. Nothing is said about preceding or following blank lines, so they can be packed tightly. As a general rule, this applies to all items that are, in HTML terms, block-level elements. This includes tables and right-aligned content.

Please comment on this proposal. —CodeCat 18:41, 10 October 2015 (UTC)

Support --Daniel Carrero (talk) 18:10, 18 October 2015 (UTC)

Always use templates for headword lines

Per Wiktionary:Beer parlour/2012/July#A suggestion, I suggest adding the rules:

  1. Headword lines should always use templates, such as {{en-noun}} or {{head}}, not just wikitext, as in '''example'''.
  2. Entries should be categorized into POS categories (like Category:English nouns) using the headword-line templates, not using manual category links.

--Daniel Carrero (talk) 18:16, 18 October 2015 (UTC)

Support CodeCat 18:21, 18 October 2015 (UTC)
Support - here is a list of entries that have explicit categorization (just English for now): User:DTLHS/cleanup/explicit pos cat. DTLHS (talk) 18:31, 18 October 2015 (UTC)
Support JohnC5 19:09, 18 October 2015 (UTC)
Support Ungoliant (falai) 20:29, 28 October 2015 (UTC)

Blank line before first heading

Currently, one of the rules says: "One blank line before all headings, including between two headings, except for before the first language heading." This particular exception is apparently a bit controversial, as several people have said that they think a blank line should be added before the first heading as well, if there is content before it. I personally agree. So I would like to propose this change in wording:

One blank line between a heading and any preceding content, including between two consecutive headings. If the heading is the first thing on a page, no blank lines are to precede it.

The consequences are that whereas before, the norms prescribed

{{also|foo}}
==English==
...

they now prescribe

{{also|foo}}

==English==
...

Any thoughts? —CodeCat 20:11, 28 October 2015 (UTC)

Weak oppose. Tomato tomato, but seeing that having no blank line after {{also}} is the most common situation, we might as well stick to that. — Ungoliant (falai) 20:29, 28 October 2015 (UTC)
Weak oppose too, as I said elsewhere recently. Per Ungoliant. --Daniel Carrero (talk) 01:25, 29 October 2015 (UTC)
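
For bot owners weighing the two wordings, the proposed rule could be sketched in Python roughly as follows. This is only an illustration of the proposal as worded above, not code from any existing bot; the function name and the heading regex are assumptions made for the example.

```python
import re

# Assumed heading shape: the same number of "=" on both sides, e.g. ==English==.
HEADING = re.compile(r"^(=+)[^=\n].*\1\s*$")

def blank_line_before_headings(wikitext: str) -> str:
    """Put exactly one blank line before each heading that has content above it."""
    out = []
    for line in wikitext.split("\n"):
        if HEADING.match(line):
            # Drop any blank lines already sitting directly above the heading.
            while out and out[-1].strip() == "":
                out.pop()
            # Add one blank line only if something precedes the heading;
            # a heading that is the first thing on the page gets none.
            if out:
                out.append("")
        out.append(line)
    return "\n".join(out)

print(blank_line_before_headings("{{also|foo}}\n==English==\n..."))
```

Under the current rule, the only difference would be to skip the inserted blank line for the first language heading.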

Collapse multiple spaces into one space

Right now, there is no particular rule about having several spaces in a row. MewBot's current run has been collapsing these, but some have suggested that maybe they should be allowed. Apparently there is an older typing style in which two spaces after a period are prescribed, though personally I think it's a bit silly. The software ignores multiple spaces anyway, so I figure we might as well put it in writing that they can be collapsed into one space. —CodeCat 20:15, 28 October 2015 (UTC)

It appears that Eirikr has been actively undoing these edits. Apparently he considers having two spaces important enough to edit war with a bot over it. —CodeCat 00:52, 29 October 2015 (UTC)

Support Use a proportional font if it bothers you. This is a relic from when people used typewriters and has absolutely no place here. DTLHS (talk) 20:19, 28 October 2015 (UTC)
Support --Daniel Carrero (talk) 20:21, 28 October 2015 (UTC)
Support Ungoliant (falai) 20:29, 28 October 2015 (UTC)
Support Enosh (talk) 11:21, 29 October 2015 (UTC)
Oppose -- CodeCat, I very much dislike your misrepresentation of the facts here. Reverting a bot's edits, when the given summary says "Applied WT:NORM rules" and the change implemented by the bot is not part of said rules, is pretty far from edit warring. When I asked you about this behavior of your bot and requested that you change the bot's coding to no longer implement this change, you stated that the bot was nearly done so there wasn't any point in altering the bot's code. I then saw several more days of such bot edits, including of pages I had edited after I asked you about this issue.
This spacing convention is most definitely *not* a relic of typewriters: please read up on the topic some. I provided several relevant links in the thread on CodeCat's talk page. I am not opposed to the practice of other editors using a single space after a sentence-final period. I personally prefer two spaces, because I find it makes it much easier for me to quickly scan a text and find the sentence breaks. I am not alone in this preference, as numerous online resources clearly state. I am opposed to forcing this convention one way or the other, especially when the only rationale given is that some editors apparently find it “silly”.
A proportional font is not a solution: I am looking at a proportional font right now in the current editing window, and there is a clear visual difference between single- or double-spacing after a period. Neither the font nor the rendering software appears to make any distinction in how to render single spaces after non-final periods such as in Mr. or asst. or mgr., vs. how to render single spaces after sentence-final periods. ‑‑ Eiríkr Útlendi │Tala við mig 07:28, 2 November 2015 (UTC)
Oppose Mostly for enhanced legibility. Further, I'd like us to extend the legibility to users, not just editors, by requesting that two spaces after a period be so displayed by the MediaWiki software at least for our site, though the benefit is greater for sites with paragraphs with multiple sentences. The double spaces can also aid the reliable detection by regular expressions of sentence-ending periods rather than those after abbreviations. DCDuring TALK 01:52, 8 November 2015 (UTC)
Support. It's outdated; I don't know anyone under the age of 50 IRL who does this and doesn't have dyslexia. —Μετάknowledgediscuss/deeds 02:02, 8 November 2015 (UTC)
@Metaknowledge I think you will find that virtually all printed works have extra space after periods. This is actually more useful for proportional fonts and, especially, kerned ones. If we are willing to settle for perpetuating the quaint amateurishness of our displayed pages, by all means we should continue current practice. I'd hoped for better. DCDuring TALK 02:12, 8 November 2015 (UTC)
@DCDuring: That's, er, patently untrue. And I'm surprised that the single space between sentences, as mandated by the Modern Language Association (MLA), Chicago Manual of Style, and every modern professional journalistic style guide I could find quickly, is something you can refer to as "quaint amateurishness". Let me guess: you are likely over the age limit I mentioned or nearly so, and you don't recognise that Wiktionary might reflect more modern standards than the ones you were taught in school. —Μετάknowledgediscuss/deeds 03:53, 8 November 2015 (UTC)
@Metaknowledge You are right. I was wrong. The tiny sample that, in my enfeebled and befogged aged state, was all I could muster the energy to take was apparently not enough to detect the now-common practice, which I still think makes text harder for those who share my enfeebled and befogged condition. DCDuring TALK 15:18, 8 November 2015 (UTC)
  • Μετάknowledge, style guides aside, I'm well under 50, and I much prefer having extra space between sentences. Another 2p, anyway. Also, the stylistic convention of using identical spacing between words and between sentences appears to have arisen since WWII, at least in the English-writing world. Works published before then tend to have greater spacing between sentences. See also w:Space_(punctuation)#Spaces_between_sentences and w:Sentence spacing. And by way of other examples, see the list of older publications that exhibit greater inter-sentence spacing in this post of mine on CodeCat's Talk page.
I also find it significant that later versions of CSS added an option to preserve additional whitespace, presumably to prevent the HTML and XML processing standard that collapsed multiple spaces into single spaces, effectively enforcing this style by programmer fiat. This suggests substantial demand for more-flexible spacing standards. ‑‑ Eiríkr Útlendi │Tala við mig 20:28, 9 November 2015 (UTC)
Oppose I sense that single or double space as a sentence separator is the sort of matter of taste that a bot should leave alone. It should not be consistent; it should reflect the preference or habit of the author of the sentence. --Dan Polansky (talk) 15:49, 8 November 2015 (UTC)
  • Just a thought: We have an Oxford-comma template, {{,}}, transcluded about 1200 times. Formerly users could select whether they did or did not want to see the serial comma. It was used to satisfy the different tastes of those who do and those who do not like the Oxford comma. DCDuring TALK 18:17, 8 November 2015 (UTC)
Support I am aware of the debate here and have in the past used both. In truth, I don't care whether the story of the practice's origins is true or not; it is a convention, and I will not begrudge people their typographical conventions. At least on my browser (Chrome using Arial for sans-serif), no difference in spaces appears in the rendered page.
  • Foo. Bar. (1 space)
  • Foo. Bar. (2 spaces)
  • Foo. Bar. (10 spaces)
These all are rendered identically to me, so remove the extra space for all I care. —JohnC5 18:29, 8 November 2015 (UTC)
  • John, the discussion here is (or at least was originally) about the wikitext as shown in the editor, not the rendered HTML as shown in the basic page view or preview. Does that change your view at all? ‑‑ Eiríkr Útlendi │Tala við mig 20:28, 9 November 2015 (UTC)
  • Oh, oops. Still, not particularly. If it doesn't show up on the output, I'd prefer not to have it in the input. —JohnC5 20:35, 9 November 2015 (UTC)

Symbol support vote.svg Support ① The Wikimedia server collapses extra spaces (AND, when the paragraph is not indented by ":", lone hard line-breaks). Any number of spaces (or up to one hard line-break, or both) (as seen in the markup for this sentence) are replaced by exactly one space in the sent HTML (as seen in the HTML of this sentence). ② Even if unlimited numbers of extra spaces and unlimited hard line-breaks were included in the sent HTML, every web browser would collapse every cluster into exactly one space. ③ Therefore every extra space in Wikipedia and Wiktionary markup (and in HTML) is cruft. They are useless, except maybe as a form of steganography. (Hidden messages encoded in the number and placement of non-functional extra characters in the markup.) ④ There is an exception: extra spaces and line-breaks that lie between formatted code tags get passed verbatim in the HTML, surrounded by HTML tags that force the browser to render them verbatim. If not for that exception, it would have been wisest to make the Wikimedia server collapse every string of consecutive spaces into exactly one space, during the save operation of every edit. I wish the Wikimedia server would automatically remove extra spaces and other cruft from the markup, before and after every edit, but I don't expect to see that any time soon. ⑤ Meanwhile, I remove cruft with every edit, and I applaud everyone who does so when they edit. (However, logged edits that only adjust invisible "white space" in the markup, in either direction, by bot or by human, should be a punishable offense.) Likewise, I totally deprecate anyone who intentionally adds cruft, puts back cruft, or wastes everyone's time advocating or defending cruft. 
⑥ That all said, I strive to consistently put TWO SPACES after every sentence in my emails, documents, and typesetting, because their formatters DO NOT automatically render wider spaces between sentences, like many have claimed, probably because they cannot reliably tell which periods end sentences. ⑦ In programming, I strive to consistently put ONE SPACE after every sentence, in part because of the monospace fonts used in the IDE editors (even though, when I print out programs (possibly showing my age), I only use proportional fonts).

Ever-so-slightly eroding my support: @Eiríkr Útlendi: "later versions of CSS added an option to preserve additional whitespace, presumably to [override] the HTML ... [rendering] standard that [by default] collapses multiple spaces into single spaces." From one Wikipedia article: "Web browsers usually do not differentiate between single and multiple spaces in source code when displaying text, unless text is given a "white-space" CSS attribute." (That's "white-space:pre" or "white-space:pre-wrap".) Isn't that nice. Now web developers have the option of using two consecutive spaces in the HTML to implement spacing between sentences of exactly 2 spaces... assuming that exactly 2 spaces is what they want there, and that they don't mind rendering 3 or 4 spaces between sentences and between words whenever someone fails to proofread the spaces. This option doesn't affect Wikimedia (or any web site) at all, until they at least change their code to send out all those extra spaces in the HTML. The extra spaces will only render if the web site also adds the needed CSS option, though Wikimedia users could probably opt to enable it in their configuration file, and users of other sites could maybe add it using a script or installing an add-on. IF Wikimedia ever changes to pass the extra spaces (unlikely), then the extra spaces will affect rendering, and then I MIGHT consider tolerating or imposing 2 spaces between sentences (in addition to enforcing 1 space between words). The problem with that is: It breaks my simple editing action. My current action is: Search-and-replace every {space-space} with {space} (but NOT between "formatted text" tags and in aligned tables (while they last)). After the change, the rule is far too complex: only replace {period-space-space} with {period-space} IF it is followed by a lower-case letter OR, when it is followed by an upper-case letter, it was also preceded by a whole word of Mr, Mrs, Dr, Sr, Sra, St, Ste, A, B, C, ... Z, etc.

(Other solutions for wider sentence spacing in HTML were not as easy. Some said use nbsp+sp; I thought of that, tried it, and stopped bothering. A lagging Wikipedia article says "sentence spacing can be controlled in HTML by separating every sentence into a separate element (e.g., a span), and using CSS to finely control sentence spacing."; I hope no one does that.)

In summary, heck yes. Make it a recommendation, or at least NEVER recommend against it. Every user is allowed and encouraged to replace every {space-space} with {space} (except between "formatted text" tags and in aligned tables), incidental to actual edits. No exception for doubled spaces after ".", "!", "?", "¿", "‽", "!!", "??", ":", or even "...". And bots can do the same. (Because Wikimedia should have quietly done it internally, except for the risk of damaging preformatted text if the surrounding tags are missing.) -A876 (talk) 05:14, 11 January 2016 (UTC)
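
The search-and-replace action described above ({space-space} to {space}, but not inside "formatted text" tags) could be sketched in Python roughly like this. It is a minimal illustration, not MewBot's code: the tag list, the function name, and the choice to leave leading indentation alone are assumptions, and the exception for aligned tables is not handled.

```python
import re

# Either a block whose whitespace is rendered verbatim (assumed tag list),
# or a run of two or more spaces that is not leading indentation.
TOKEN = re.compile(
    r"(?P<verbatim><(?P<tag>pre|nowiki|syntaxhighlight|source)\b.*?</(?P=tag)\s*>)"
    r"|(?<=[^ \n]) {2,}",
    re.DOTALL | re.IGNORECASE,
)

def collapse_spaces(wikitext: str) -> str:
    """Collapse runs of spaces to one space, leaving verbatim blocks untouched."""
    # A verbatim block is copied back unchanged; a run of spaces becomes one space.
    return TOKEN.sub(lambda m: m.group("verbatim") or " ", wikitext)

print(collapse_spaces("Foo.  Bar.          Baz."))
```

If a verbatim block is missing its closing tag, this sketch treats its contents as ordinary text and collapses the spaces inside it, which is the risk to preformatted text mentioned above.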

Order of named parameters in templates

Can or should this be standardized? Should named parameters always come after positional parameters? Should they be sorted? Should "lang=" come after everything? DTLHS (talk) 19:54, 7 November 2015 (UTC)

I like the idea of placing "lang=" after everything. This sounds like something that bots could easily fix, even if people forget or don't bother to do it every time. --Daniel Carrero (talk) 09:13, 8 November 2015 (UTC)
  • Some templates no longer have an explicit lang parameter.
  • I prefer the opposite, where the language comes first. When looking at the wikitext, I'm often interested first in the language of a given term.
  • More importantly, why is this an issue to enforce throughout the entire wiki? Different editors have different preferences. Generally, editors work with specific languages. To some extent, this (and a few of the other WT:NORM proposals) sound like non-issues, where a certain subset of editors is looking to enforce their preferences -- regardless of impact on the site's functioning, regardless of different sets of preferences common to different language sub-communities, and regardless of usability.
As an answer to your stated question, yes, this could be standardized (as you note, it's just another set of rules for a bot), but no, I don't think it should be. ‑‑ Eiríkr Útlendi │Tala við mig 20:40, 9 November 2015 (UTC)
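
To show what "just another set of rules for a bot" might look like if the "lang= last" variant were ever adopted, here is a hedged Python sketch. The function name is made up for the example, and it deliberately skips any template that contains nested templates or links, since splitting those on "|" would be unsafe.

```python
def move_lang_last(template: str) -> str:
    """Move a named lang= parameter to the end of a simple template call."""
    if not (template.startswith("{{") and template.endswith("}}")):
        raise ValueError("expected a single {{...}} template call")
    inner = template[2:-2]
    # Nested templates or links may contain their own pipes; leave them alone.
    if "{{" in inner or "[[" in inner:
        return template
    name, *params = inner.split("|")

    def is_lang(param: str) -> bool:
        key, sep, _ = param.partition("=")
        return bool(sep) and key.strip() == "lang"

    rest = [p for p in params if not is_lang(p)]
    lang = [p for p in params if is_lang(p)]
    return "{{" + "|".join([name, *rest, *lang]) + "}}"

print(move_lang_last("{{term|lang=fr|livre}}"))
```

Moving a named parameter does not renumber the positional ones, so the rendered output should stay the same. The opposite preference (language first) would be the same sketch with `[name, *lang, *rest]`.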

Space between context and definition

There's a norm that has not been discussed recently, but that I suggest adding to the policy later. It should be very uncontroversial, just formalizing existing practice:

  • One space between {{context}} and the start of the definition.

--Daniel Carrero (talk) 06:04, 15 November 2015 (UTC)

I created Wiktionary:Votes/pl-2016-02/Space after context labels. --Daniel Carrero (talk) 06:25, 16 February 2016 (UTC)
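
A rough Python sketch of how a bot might apply this norm, assuming the label is the first thing on a definition line and uses one of the four template names mentioned earlier on this page ({{context}}, {{label}}, {{cx}}, {{lb}}). The function name and regex are illustrative only, and labels that contain nested templates are not handled.

```python
import re

# A definition line that starts with a context label, followed by any number
# of spaces (including none) and then the start of the definition.
LABEL = re.compile(
    r"^(#+ *\{\{(?:context|label|cx|lb)\|[^{}]*\}\}) *(?=\S)",
    re.MULTILINE,
)

def space_after_label(wikitext: str) -> str:
    """Leave exactly one space between a context label and the definition."""
    return LABEL.sub(r"\1 ", wikitext)

print(space_after_label("# {{lb|en|slang}}A thing."))
```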

Tests

It would be nice to have a set of examples that all bot owners could test their code against. DTLHS (talk) 21:18, 1 December 2015 (UTC)

@DTLHS If I understood correctly, I'd probably support that if that would help, but maybe the specific set of examples won't be able to meet the needs of all bots. Suppose the bot owner needs to test something that affects specifically Old Church Slavonic entries and the set of examples contains only English entries. So, the bot owner should be allowed to ignore the list in the process of testing their bot anywhere they see fit. --Daniel Carrero (talk) 22:20, 1 December 2015 (UTC)
There isn't anything specific to one language in this set of norms. So even if a bot is only editing one section of a page, it should be able to edit any other section without changing any logic. DTLHS (talk) 22:42, 1 December 2015 (UTC)
I understood your suggestion to mean "We should have some specific entries, such as maybe pizza, chocolate, establishment, that all bot owners could test their code against." Is that correct? --Daniel Carrero (talk) 23:27, 1 December 2015 (UTC)
I think what was meant was artificial test pages outside the main namespace. --WikiTiki89 23:34, 1 December 2015 (UTC)
That's correct. So we would have tests that covered every rule described on the page, outside of the main namespace. DTLHS (talk) 23:40, 1 December 2015 (UTC)
Sure, as far as I'm concerned, go ahead, if that helps. We already have WT:Sandbox and Special:MyPage/Sandbox (or Special:MyPage/sandbox). Maybe create the artificial entries as subpages of WT:Sandbox. --Daniel Carrero (talk) 23:43, 1 December 2015 (UTC)
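
One possible shape for such a shared test set, sketched in Python: each case pairs a non-normalized snippet with its expected normalized form, and a bot's normalizer is run against all of them. Everything here is hypothetical (the `Case` type, the two sample cases, and `toy_bot`); real cases would live on test pages outside the main namespace and cover every rule on the policy page.

```python
from typing import Callable, List, NamedTuple

class Case(NamedTuple):
    rule: str    # which norm the case exercises
    before: str  # wikitext that violates the norm
    after: str   # the expected normalized wikitext

# Hypothetical sample cases, taken from proposals discussed on this page.
CASES = [
    Case("spaces in links", "[[ example ]]", "[[example]]"),
    Case("multiple spaces", "Foo.  Bar.", "Foo. Bar."),
]

def failures(normalize: Callable[[str], str], cases: List[Case]) -> List[Case]:
    """Return the cases a normalizer gets wrong.

    A case passes if the bad input is fixed and the good input is left alone.
    """
    return [
        c for c in cases
        if normalize(c.before) != c.after or normalize(c.after) != c.after
    ]

def toy_bot(text: str) -> str:
    # Stand-in for a real bot's normalization function.
    return text.replace("[[ ", "[[").replace(" ]]", "]]").replace("  ", " ")

print([c.rule for c in failures(toy_bot, CASES)])
```

Because the cases are plain snippets and not whole entries, nothing in them is tied to a particular language section.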

Categories at end of language section

Per the old vote Wiktionary:Votes/2007-05/Categories at end of language section, I suggest adding this rule to the policy:

  • Categories are placed at the end of each language section.

--Daniel Carrero (talk) 21:47, 1 December 2015 (UTC)

Proposal: encompassing reconstruction pages

I suggest editing this policy to make it encompass reconstruction pages, too. (random page: Appendix:Proto-Germanic/hirdijaz)

Reason: They are already supposed to be formatted like an entry, AFAIK.

One way to implement this is editing the introduction of the policy.

  • Current text: "This is a list of aspects that govern how the wiki code behind an entry should be formatted."
  • Proposed text: "This is a list of aspects that govern how the wiki code behind an entry or a reconstruction page should be formatted."

--Daniel Carrero (talk) 21:57, 1 December 2015 (UTC)

It already applies to them, as reconstructions are also entries. —CodeCat 22:17, 1 December 2015 (UTC)
I didn't realize that; I was interpreting "entries" to mean "0th namespace pages". I propose:
  1. Making it explicit in the policy that reconstructions are included in its scope anyway.
  2. Doing the same for Appendix:Capital letter (which, too, is formatted like an entry) and all the subpages of Appendix:Gestures (they are not perfectly formatted as entries right now, but that can be fixed).
--Daniel Carrero (talk) 22:24, 1 December 2015 (UTC)
I think we should make this explicit. --WikiTiki89 22:29, 1 December 2015 (UTC)

Spaces in links

It's probably obvious in any wiki, but this rule might be added for completeness later:

  • No spaces between a linked term and the opening or closing brackets. ([[example]], not [[ example ]])

--Daniel Carrero (talk) 21:02, 29 December 2015 (UTC)

It's not obvious. Some wikis seem to encourage those spaces. But I agree. --WikiTiki89 21:11, 29 December 2015 (UTC)
On second thought, I forgot to mention the pipe in the middle:
  • No spaces between a linked term and the opening or closing brackets, or between a linked term and the pipe. ([[examples|example]], not [[ examples | example ]])
--Daniel Carrero (talk) 21:22, 29 December 2015 (UTC)
I created Wiktionary:Votes/pl-2016-02/Spaces in links. --Daniel Carrero (talk) 04:53, 15 February 2016 (UTC)
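
As an illustration of the rule with the pipe included, a bot could trim links along these lines. This is a minimal Python sketch with assumed names; it leaves a whitespace-only part after the pipe untouched (as in a category sort key consisting of a single space) and does not look inside nested links.

```python
import re

# [[target]] or [[target|label]]; no nested brackets inside either part.
LINK = re.compile(r"\[\[([^\[\]|]*)(?:\|([^\[\]]*))?\]\]")

def _tidy(match: re.Match) -> str:
    target = match.group(1).strip()
    label = match.group(2)
    if label is None:
        return f"[[{target}]]"
    # Keep an empty or whitespace-only label exactly as written.
    if label.strip() == "":
        return f"[[{target}|{label}]]"
    return f"[[{target}|{label.strip()}]]"

def strip_link_spaces(wikitext: str) -> str:
    """Remove spaces next to the brackets and the first pipe of each link."""
    return LINK.sub(_tidy, wikitext)

print(strip_link_spaces("[[ examples | example ]]"))
```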

wiki code -> wikitext

Proposal: replace the 2 instances of "wiki code" in the introduction by "wikitext".

Rationale: Per @This, that and the other, at Wiktionary:Votes/pl-2015-11/NORM: 10 proposals#Support option 3: Wiktionary:Wikicode normalization (WT:WCN). --Daniel Carrero (talk) 11:24, 23 January 2016 (UTC)

Interwikis

Please remove the interwiki mentions in the #Categories and interwikis section. They may be confusing since the deployment of Cognate. Or reword the section simply as "no interwiki links". --Vriullop (talk) 18:47, 27 June 2017 (UTC)

Thanks, I have gone ahead and removed any reference to interwikis; now that we don't have them, this should be noncontroversial. As for adding a section that explicitly disallows interwiki links, that could be a good idea, but any substantive addition will require consensus, and it seems fairly unnecessary. —Μετάknowledgediscuss/deeds 18:50, 27 June 2017 (UTC)

Leading or trailing underscores in template names

Turns out that underscores are trimmed in template names because they are equivalent to spaces in page names: {{_m_}} (or even {{___m_____}}) transcludes {{m}}. Kinda hilarious. There don't seem to be any template names with leading or trailing underscores, but the Templates section could explicitly indicate that it's not allowed. — Eru·tuon 03:31, 15 April 2019 (UTC)
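
A small Python sketch of the behaviour described above, for anyone who wants to scan for such names. It models only the underscore and space handling (underscores treated as spaces, runs collapsed, ends trimmed); the function names are made up for the example and this is not MediaWiki's actual title code.

```python
import re

def canonical_template_name(name: str) -> str:
    """Approximate how a template name is resolved: "_" equals " ", ends trimmed."""
    return re.sub(r"[ _]+", " ", name).strip()

def has_edge_underscores(name: str) -> bool:
    """True if the name as written starts or ends with an underscore."""
    stripped = name.strip()
    return stripped.startswith("_") or stripped.endswith("_")

for written in ["_m_", "___m_____", "m", "en-noun"]:
    print(written, "->", canonical_template_name(written), has_edge_underscores(written))
```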