Module:User:Theknightwho/get header

From Wiktionary, the free dictionary

This is a private module sandbox of Theknightwho, for their own experimentation. Items in this module may be added and removed at Theknightwho's discretion; do not rely on this module's stability.


-- Both functions return the number of the page section containing the calling template, i.e. the total number of ==headers== (of all levels) above it in the wikitext. This includes the page title (an L1 header).

-- They both make several assumptions about how pages are parsed, and cannot necessarily be relied on in the long term.

-- Currently, pages are parsed forwards. frame:preprocess("=a=") increments the parser's current header count and generates a strip marker of the form '"`UNIQ--h-X--QINU`"', where X is the current section header count plus 1 (i.e. it generates a new 'fake' header). From that, we can determine the current section.

-- However, because the only way to read the header count is to increment it, the result must be offset to discount the fake headers generated by each previous call on the page. As such, the offset needs to be stored somewhere that can be accessed later.

-- WARNING: These will break if anything else runs frame:preprocess on text with headers. That hopefully shouldn't be an issue, because the 'fake' headers don't work properly if you return them: their section edit links are broken.
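
-- Illustrative sketch only (nothing else in this module depends on it): one way to read the header number out of the text returned by frame:preprocess("=a="), assuming the marker format described above. The name parse_header_marker is hypothetical, and the stricter pattern with a decimal X is an assumption; the functions below use a looser "[0-9A-F]+" match instead.
local function parse_header_marker(text)
	-- Capture X from `UNIQ--h-X--QINU`; n is nil if the text contains no header marker.
	local n = text:match("UNIQ%-%-h%-(%d+)%-%-QINU")
	return n and tonumber(n)
end
-- e.g. parse_header_marker("=\127'\"`UNIQ--h-5--QINU`\"'\127a=") gives 5, and text without a header marker gives nil.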

local export = {}

local frame = mw.getCurrentFrame()

-- get_header1 is more elegant and efficient, but takes advantage of a nowiki strip marker hack that will likely be patched out at some point, as it allows arbitrary information to be passed directly between invokes.

-- It's possible to add new nowiki strip markers to the parser's internal storage by running frame:extensionTag("nowiki", text), which generates a strip marker of the form '"`UNIQ--nowiki-XXXXXXXX-QINU`"', where XXXXXXXX is an eight-digit hexadecimal number. Its contents can be retrieved in another invoke with mw.text.unstripNoWiki. Unlike headers (which fortunately use a dedicated counter), all other strip markers share one counter. So to find the stored offset, we increment the counter with frame:extensionTag, and then count backwards in a loop, running mw.text.unstripNoWiki on each marker until we find a match. To avoid false positives, the offset is stored in the form HEADER_OFFSET:X.
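function export.get_header1()
	-- Generate a fake header and read its number from the strip marker.
	local header = tonumber(frame:preprocess("=a="):match("[0-9A-F]+"))
	-- Generate an empty nowiki marker to read the current value of the shared strip marker counter.
	local i, offset = tonumber(frame:extensionTag("nowiki", ""):match("[0-9A-F]+"), 16)
	-- Count backwards through earlier markers until a stored offset is found.
	repeat
		offset = mw.text.unstripNoWiki(("\127'\"`UNIQ--nowiki-%08X-QINU`\"'\127"):format(i)):match("HEADER_OFFSET:(%d+)")
		i = i - 1
	until offset or (i < 0)
	-- The new offset is 0 on the first call, and the stored value plus 1 otherwise.
	offset = (offset or -1) + 1
	-- Discount the fake headers generated by previous calls.
	header = header - offset + 1
	-- Store the new offset for the next call to find.
	frame:extensionTag("nowiki", "HEADER_OFFSET:" .. offset)
	return header
end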
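
-- Illustrative sketch only: a pure-Lua model of the bookkeeping in get_header1, which may help when checking the offset arithmetic outside MediaWiki. Everything in it is an assumption made for illustration: simulate_get_header1 and its locals are hypothetical names, header markers are modelled as numbered from 0, and the nowiki store is a plain table rather than the parser's real strip state.
local function simulate_get_header1(real_headers_per_call)
	local header_count, nowiki, next_marker, results = 0, {}, 0, {}
	local function preprocess_header() -- stands in for frame:preprocess("=a=")
		header_count = header_count + 1
		return header_count - 1
	end
	local function add_nowiki(text) -- stands in for frame:extensionTag("nowiki", text)
		nowiki[next_marker] = text
		next_marker = next_marker + 1
		return next_marker - 1
	end
	for _, real_headers in ipairs(real_headers_per_call) do
		-- Real ==headers== that appear on the page since the previous call.
		header_count = header_count + real_headers
		local header = preprocess_header()
		local i, offset = add_nowiki("")
		repeat
			offset = nowiki[i]:match("HEADER_OFFSET:(%d+)")
			i = i - 1
		until offset or (i < 0)
		offset = (offset or -1) + 1
		results[#results + 1] = header - offset + 1
		add_nowiki("HEADER_OFFSET:" .. offset)
	end
	return results
end
-- e.g. simulate_get_header1({2, 0, 3}) gives {3, 3, 6}: the first two calls sit below the page title plus two headers, and the third sits below three more.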

-- get_header2 is more cumbersome, but uses a method that is less likely to be seen as an exploit, and is harder to patch out.

-- It takes advantage of the fact that anything loaded by mw.loadData remains static: it increments the header count with a local frame:preprocess("=a="), and then runs mw.loadData(...get_header/1), which increments the header count again with the same call. On subsequent runs, it loads ...get_header/1, ...get_header/2, etc. in turn; any that have been loaded before return their original values, which are lower than the newly calculated header count. The loop ends when a higher value is returned, and the offset is then calculated from the number of loops.

-- Obviously this becomes an issue for large pages, where 200+ submodules may be required.
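function export.get_header2()
	-- Generate a fake header and read its number from the strip marker.
	local header = tonumber(frame:preprocess("=a="):match("[0-9A-F]+"))
	local offset, n = 0
	-- Load the numbered submodules in turn until one runs for the first time and returns header + 1.
	repeat
		offset = offset + 1
		n = mw.loadData("Module:User:Theknightwho/get_header/" .. offset)[1]
	until n == header + 1
	-- Each previous call generated two fake headers: one here and one in its submodule.
	offset = 2 * (offset - 1) - 1
	return header - offset
end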
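
-- Illustrative sketch only: a pure-Lua model of get_header2, which may help when checking the loop and the offset formula outside MediaWiki. The assumptions are labelled here because the submodules are not shown on this page: each one is taken to return a table whose first item is the header number from its own frame:preprocess("=a=") call, header markers are modelled as numbered from 0, and simulate_get_header2 and its locals are hypothetical names.
local function simulate_get_header2(real_headers_per_call)
	local header_count, loaded, results = 0, {}, {}
	local function preprocess_header() -- stands in for frame:preprocess("=a=")
		header_count = header_count + 1
		return header_count - 1
	end
	local function load_data(n) -- stands in for mw.loadData: each submodule runs once, then its result is reused
		if not loaded[n] then
			loaded[n] = {preprocess_header()}
		end
		return loaded[n]
	end
	for _, real_headers in ipairs(real_headers_per_call) do
		-- Real ==headers== that appear on the page since the previous call.
		header_count = header_count + real_headers
		local header = preprocess_header()
		local offset, n = 0
		repeat
			offset = offset + 1
			n = load_data(offset)[1]
		until n == header + 1
		offset = 2 * (offset - 1) - 1
		results[#results + 1] = header - offset
	end
	return results
end
-- e.g. simulate_get_header2({2, 0, 3}) gives {3, 3, 6}, using one modelled submodule per call.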

return export