Definition from Wiktionary, the free dictionary
- The naming test: Can the candidate for a lexeme be referred to in questions or statements such as the following: ‘What is it called?’ ‘It is called X.’ ‘We call it X, but they call it Y.’
- Membership in a terminological system: [...] Does X encompass other terms; can one say ‘it (dog) is a kind of X (animal)’ (=generic)? Is it a member of a set of similar things; can one say ‘X (a chair) is a kind of Y (furniture)’ (=specific)? Can it be used to show contrast; can one say ‘it is a kind of X (fruit), but not a Y (vegetable)’? Does it have synonyms or antonyms?
- Customary status: Does the use of the phrase imply certain behavior patterns, values, or sequences of activities that are known by society at large? Such phrases represent conventionalized knowledge. For example, expected behavior at the front door is different from that at the back door (besides their participation in idioms), indicating that these function as cultural units (lexemes) that are more significant than the sum of the parts. Consider go to the mosque, get off work, take a vacation.
- Legal status: Some phrases have such status that they are codified in legal usage: driving under the influence, breaking and entering, assault and battery, justifiable homicide. Even so-called ‘primitive’ societies with unwritten languages have categories of this sort for dealing with things like marriage negotiations and litigations over land, property, and adultery.
- Speech act formulas: Every language has some formulas “which carry out conversational moves” (Pawley 1986:106). For example, excuse me, how are you, y'all have a nice day, etc.
- Use of acronyms: This is often proof that a multi-word phrase represents concepts that have attained conventionalized or institutionalized status. Consider: VIP, DWI/DUI, IQ, RBI, SAT, ASAP, PTO, PTL, AWOL, BS, RSVP, R and R; in Indonesia: KB, DKI, KK, ABRI, DPRD, GBHN, etc.
- Single-word synonyms: the only one of its kind ↔ unique.
- Belonging to a terminological set: This is similar to (2), membership in a terminological system, but focuses more on pairs of antonyms. Consider: tell the truth ↔ tell a lie, take care of ↔ neglect.
- Base for inflected or derived forms: short temper → short-tempered; ooh and ah → oohing and ahing, Indonesian ke mana → dikemanakannya (‘to where’ → ‘wind up where’).
- Internal pause unacceptable: The unacceptability of inserting a pause in the middle of clichés, idioms, and compounds is a partial indication that they function as a unit. Consider the functional differences between bunch of baloney vs. bunch of bananas. One can say two bunches of bananas, but cannot do the same with the figurative sense of bunch of baloney.
- Inseparability of constituents: Insertion of other material changes the unity or naturalness of a phrasal lexeme. Consider: lead up the garden path. Saying lead up the beautiful garden path shifts it from a figurative to a literal interpretation. This is similar to (10), the unacceptability of an internal pause.
- Ambiguity as to whether it should be written as a single word: whatchamacallit, thingamajig, man-in-the-street, oneupmanship.
- Conventionally reduced pronunciation: bosun (boatswain), won't, can't, o'clock, Newfoundland, Christmas, Worcestershire, thruppence (threepence), etc.
- Conventionally truncated forms: Widespread occurrence of shortened forms often indicates their role as lexemes in the language: exam(ination), rad(ical), ex-con(vict), con(vict), con(fidence man), con(fidence trick), ex(-husband/-wife), pro and con, etc.
- Omission of headword: The modifier stands metonymically for the whole: She had an oral (examination), He had a physical (examination), A short (circuit) cut off the (electrical) power.
- Omission of final constituents: This often implies conventionalized knowledge: If you can’t beat ’em..., A stitch in time..., I haven’t the faintest (idea). These elided forms are often marked by peculiar intonation.
- Stress and intonation patterns: Different languages give different phonological clues for what is seen to function as a unit. English often uses stress and intonation. Government jargon is often coined through these means. Consider political matters memorandum.
- Invariable constituents or grammatical frame: The demanding and rhetorical Who do you think you are? does not have the same impact in the future tense. Kick the bucket does not mean the same thing when put in the passive. The thought had crossed my mind and he took the law into his own hands are unnatural in the passive. Compare also stripped-down formulaic sentences such as easier said than done, spoken like a man! There are also syntactically irregular or archaic idioms like easy does it, no go, no way, be that as it may, (she) wants in, once upon a time.
- Use of definite article on first mention: In English this can indicate the conventionalized nature of the ‘object’, showing the speaker assumes the identity is understood by the addressee: the fire department, the foreign legion, the eight ball.
- Writing conventions: Where there is a written tradition these may provide clues to perceived status as a unit. Capitals may indicate lexemes that are not typical proper nouns: Third World, Big Bang, Inner City. Beware that where a society has the luxury of supporting a literary community, some writers manipulate the use of capitals for unconventional purposes. Quotation marks may also indicate unitary status: he was considered a ‘bad boy’. Orally, some speakers use so-called or a preceding pause to mark an equivalent to quote marks.
- Unpredictability of form-meaning relation in semantic idioms: kick the bucket, chew the fat, shoot the breeze.
- Arbitrary selection of one meaning: Notice that button hole is a hole FOR putting buttons THROUGH, whereas bullet hole is a hole MADE BY bullets, post hole is a hole FOR setting posts IN, etc.
- Use in ritual language of parallelism: This is a special case of (2) and (8), membership in a terminological system and belonging to a terminological set. Ritual language in parallelisms is widespread. It is found, for example, in Biblical Hebrew and many Austronesian languages, particularly in eastern Indonesia (Fox 1988). Existence of a phrase as a paired entity in this context is sufficient to justify its status as a conventionalized unit, and hence a lexeme.