alignment tax

English

Etymology

First attested in a 2019 speech by computer scientist Paul Christiano (see quote), who attributed the idea to AI researcher and writer Eliezer Yudkowsky.

Noun

alignment tax (plural alignment taxes)

(artificial intelligence) A cost to the capabilities of an artificial intelligence incurred by aligning it with human ethics and morality. [from late 2010s]
- 2019 August 29, Paul Christiano, Current work in AI alignment, EA Global San Francisco 2019:
  I like this notion of an "alignment tax" […] the reason I might compromise is if there's some tension, between having the AI that's robustly trying to do what I want, and having the AI that is competent or intelligent, and the alignment tax is intended to capture that gap—that cost that I incur if I insist on alignment.
- 2021 December 1, Askell, A. et al., “A General Language Assistant as a Laboratory for Alignment”, in arXiv:
  The fact that larger models are less subject to forgetting may be related to the fact that larger models do not incur significant alignment taxes.
- 2022 March 4, Ouyang, L. et al., “Training language models to follow instructions with human feedback”, in arXiv:
  We want an alignment procedure that avoids an alignment tax, because it incentivizes the use of models that are unaligned but more capable on these tasks.
- 2023 February 27, Kornai, A. et al., “Safety without alignment”, in arXiv:
  We note that instead of an alignment tax our proposal entails a safety dividend – the more rational the system the more capable and the safer it will be.
