The following discussion has been moved from Wiktionary:Requests for cleanup.
This discussion is no longer live and is left here as an archive. Please do not modify this conversation, but feel free to discuss its conclusions.
The definition is unclear as to whether files in HTML, RTF, and similar formats are considered "text files" (it says only that one extreme, files without formatting, is so considered, and that the other is not), so it needs clarification.—msh210℠ 20:09, 19 February 2009 (UTC)
- I'd say the term itself is unclear. Even if you stick to the fairly objective criterion of the MIME media type (see http://www.iana.org/assignments/media-types/; it's what goes in HTTP's "Content-Type" headers and so on), there's some vacillation. HTML is text/html; unspecified XML is both text/xml and application/xml; and XHTML is application/xhtml+xml. (IIRC, the relevant spec says that the difference between text/… and application/… is that if a user agent doesn't recognize a text/… type, it can treat it as text/plain — i.e., it's something expected to be vaguely useful to a human. With this in mind, you can take the change in MIME type from HTML to XHTML either as a change in expected userbase — time was, it was expected that most Netizens could look at HTML and recognize angle brackets as containing "stuff I don't care about" — or as a change in expected context — time was, it was expected that most HTML files were mostly text documents with a bit of markup, which is certainly not the case today. I'm sure there's a document somewhere giving the rationale for this change, but I haven't looked for it.)
- The Google hits for might interest you; you can see some of the different ways people interpret it. Some seem to take it to mean "an ASCII-coded plain-text file, with no special markup and no non-ASCII characters" (one of the extremes you mention); others seem to take it to mean "a file with a specific text-minded character encoding, that you can open in a text editor (assuming it supports the encoding) and do useful things with"; and at least one seems to take it to mean something like "a file with a .txt extension" (regardless of what's actually in the file), which is both more extreme and less extreme than the extreme you mention.
- —RuakhTALK 20:33, 19 February 2009 (UTC)
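To make the three readings described above concrete, here is a minimal Python sketch that treats each one as a separate test. The function names are hypothetical and chosen only for illustration, and the first test covers only the "no non-ASCII characters" half of the strictest reading: it makes no attempt to detect markup, so an all-ASCII HTML file still passes it.

```python
from pathlib import PurePath


def is_plain_ascii(data: bytes) -> bool:
    """Strictest reading (character-set half only): every byte is ASCII."""
    return data.isascii()


def is_decodable(data: bytes, encoding: str = "utf-8") -> bool:
    """Middle reading: the bytes decode cleanly in a given text encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def has_txt_extension(name: str) -> bool:
    """Loosest reading: only the file name is consulted, never the content."""
    return PurePath(name).suffix.lower() == ".txt"
```

Under these assumptions the three tests can disagree about the same file: UTF-8 bytes for "café" fail the first test but pass the second, while a file named `notes.txt` passes the third whatever it contains.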
- This term encompasses at least two different, but perhaps overlapping, attributes of a file: its encoding and its content.
- Technically, the encoding of a file is, broadly, either text or binary. Text encoding itself covers a range of types: ASCII, ISO Latin or another code page, Unicode, and so on. Most text files are streams of 8-bit units, but some Unicode text files (UTF-16 and UTF-32, for example) are not. In this sense, all HTML and XML files, all UNIX mbox mailboxes, all tab-delimited data tables, all or most RTF formatted text documents, etc., are text files.
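As a rough illustration of the encoding sense, the Python sketch below encodes one short HTML fragment in two Unicode encodings and applies an assumed heuristic for "text": the bytes decode cleanly in the stated encoding and contain no NUL characters. The helper name `looks_like_text` and the heuristic itself are assumptions made for this example, not an established definition.

```python
# One HTML fragment, stored in two different text encodings.
text = "<p>na\u00efve caf\u00e9</p>"
utf8 = text.encode("utf-8")        # 8-bit code units
utf16 = text.encode("utf-16-le")   # 16-bit code units, little-endian, no BOM


def looks_like_text(data: bytes, encoding: str) -> bool:
    """Assumed heuristic: decodes cleanly and contains no NUL characters."""
    try:
        decoded = data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return "\x00" not in decoded


# First eight bytes of a PNG file, used here as a stand-in for binary data.
png_signature = b"\x89PNG\r\n\x1a\n"
```

Here `looks_like_text(utf8, "utf-8")` and `looks_like_text(utf16, "utf-16-le")` both hold even though the content is HTML markup, while the PNG signature fails as UTF-8. Reading the UTF-16 bytes as UTF-8 also fails, which shows that the answer depends on knowing which encoding to assume.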
- I think so, since it has no clear definition.—msh210℠ 22:39, 24 February 2009 (UTC)
- Well, it would meet CFI if textfile were verified, but that doesn't make it non-SOP. (textfile and textfiles appear only once each in COCA, so it seems somewhat rare. Perhaps the closed compound is mainly used attributively, and is not exactly equivalent to text file.) —Michael Z. 2010-05-25 13:28 z