Filters and Shortcodes

Extending Quarto workshop @ posit::conf(2025)

Charlotte Wickham & Mine Çetinkaya-Rundel

What are filters?

Filters manipulate the abstract syntax tree (AST) between the parsing and writing phases. To write a filter, first understand the AST.

AST: A document is composed of Blocks

example.qmd
I **really** like bold and *really* like italics, and *really **really*** can't decide which to use.

:::{.shout}
And sometimes I **really** need to shout!

It's like I want to `toupper()` everything!
:::

AST: Some Blocks contain other Blocks

AST: Some Blocks contain Inlines

example.qmd
I **really** like bold and *really* like italics, and *really **really*** can't decide which to use.

...

AST: Diagrams might collapse Str and Space

A filter function transforms a type of node

E.g. A Strong filter function

Output: a single node of the same kind

E.g. A single Inline. Node is replaced.

Output: an array of nodes of the same kind

E.g. an array of Inline nodes, a.k.a. an Inlines. The nodes are spliced in where the original node was.

Output: an empty array

Node is removed

Output: nil

Node is unchanged

AST and filter review

  • A document is an array of Block elements
  • Some Block elements contain other Block elements
  • Some Block elements contain Inline elements
  • Some Inline elements contain other Inline elements

A filter function is called on every instance of a particular type of node.

The input is the node itself, the output replaces the node.

Your turn: AST

Take a look at the AST diagram on the next slide.

  • What are some other types of Block nodes?

  • What are some other types of Inline nodes?

  • If we wrote a filter for Para, how many times would it be called?

  • If we wrote a filter for Str, which of the following would be affected?

    • The text Filter in the title
    • The text Introduction in the heading
    • The text Lua in the link text
    • The text lua-filters in the link URL
    • The text quarto in the code block

Exercise: 03-filters/your-turns/1-explore-ast

AST

Solution

  • Other Block nodes: Header, BulletList, CodeBlock. Meta is a special case: it holds the document metadata rather than being a Block.

  • Other Inline nodes: Link, Image, Code

  • A filter function for Para would be called four times.

  • Affected by a Str filter function?

    • The text Filter in the title: Yes
    • The text Introduction in the heading: Yes
    • The text Lua in the link text: Yes
    • The text lua-filters in the link URL: No
    • The text quarto in the code block: No

Problems solved with filters

  • Remove # fmt: skip comments from code cells. (Discussion)

  • Number all callouts. (Discussion)

  • Put the contents of an SVG image in a raw HTML block rather than using <img>. (Discussion)

  • Display the language on every code cell. (Discussion)

  • Collect all code chunks and display in a code appendix. (Discussion)

Writing filters

Filters are written in the programming language Lua.

A filter is a Lua file that contains one or more filter functions.

A filter function is a function whose name is a type of node.

A filter function on Strong nodes

Define a function named Strong:

no-change.lua
Strong = function(el)
  return nil
end

To use the filter, specify it in the document header:

example.qmd
---
title: "Filter Basics"
filters:
  - no-change.lua
---

A filter function that returns nil leaves the node unchanged.

Example: 03-filters/examples/1-writing-filters

Live Code: “Print” debugging

no-change.lua
Strong = function(el)
  quarto.log.output("Here!")
  quarto.log.output(el)
  return nil
end

Where quarto.log.output() prints: in Positron/VS Code, look in the Terminal; in RStudio, look in Background Jobs.

  • Strong filter function is called twice.
  • el is a Strong object, an example of an Inline.
  • el contains a content field which is an Inlines.

Live Code: Replace bold text with italic text

replace-strong.lua
Strong = function(el)
  return pandoc.Emph(el.content)
end

  • pandoc.Emph() creates an Emph node, another example of an Inline node.
  • el.content gets the content field from the el object.

Other similar types of Inline elements

See Pandoc Lua types Quick Reference

Your turn: Write a filter

  1. Write a filter, replace-emph.lua, that turns all italic text to underlined text.

  2. Add the Strong filter function from replace-strong.lua to replace-emph.lua. What happens?

Other challenges:

  • Write a filter that removes all bold and italic formatting, leaving just the text.

  • Write a filter that converts all double quotes to single quotes.

Exercise: 03-filters/your-turns/2-write-a-filter

Solution

replace-emph.lua
Emph = function(el)
  return pandoc.Underline(el.content)
end

Solution

remove-all.lua
Emph = function(el)
  return el.content
end

Strong = function(el)
  return el.content
end

Solution

replace-double-quotes.lua
Quoted = function(el)
  if el.quotetype == "DoubleQuote" then
    return pandoc.Quoted("SingleQuote", el.content)
  end
end

Valid return values

A filter function on an Inline must return one of:

  • nil: the node is unchanged, e.g. no-change.lua
  • an Inline, which replaces the original, e.g. replace-strong.lua
  • a list of Inline elements (known as an Inlines), which replaces the original and is spliced into its siblings; an empty list removes the node.

An Inlines with three elements

double-strong.lua
Strong = function(el)
  return pandoc.Inlines({
    el, 
    pandoc.Space(), 
    el
  })
end

I really really like bold and really like italics, and really really really can’t decide which to use.

Example: 03-filters/examples/2-return-types

An empty list

remove-strong.lua
Strong = function(el)
  return {}
end

I like bold and really like italics, and really can’t decide which to use.

Example: 03-filters/examples/2-return-types

Common mistake: an array of Inlines

This won’t work because el.content is an Inlines object, and each item in the list passed to pandoc.Inlines() must be a single Inline:

return pandoc.Inlines({el.content, el.content})

 

Error running filter /Applications/quarto/share/filters/main.lua:
Inline expected, got Inlines

See a useful pattern in the next section.
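
One way to get the intended doubled content is to build the result up with extend() and insert(). The following is a minimal sketch, not part of the workshop materials; the filename double-content.lua is hypothetical.

```lua
-- double-content.lua (hypothetical filename)
-- Replace each Strong node with its content, repeated twice.
Strong = function(el)
  local result = pandoc.Inlines({})
  result:extend(el.content)      -- extend() adds every Inline from an Inlines
  result:insert(pandoc.Space())  -- insert() adds a single Inline
  result:extend(el.content)
  return result
end
```

Because the Strong node itself is replaced by its content, the repeated text is no longer bold.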

Targeting specific content

Live Code: Targeting specific content

target-span.qmd
---
title: "Filter Basics"
filters:
  - shout.lua
---

I **really** like bold and *really* like italics, and *really **really*** can't decide which to use.

[And sometimes I **really** need to shout]{.shout}

Example: 03-filters/examples/3-target-text

Live Code: Smallcaps all spans with class shout

shout.lua
Span = function(el)
  if el.classes:includes("shout") then
    return pandoc.SmallCaps(el.content)
  end  
end

Example: 03-filters/examples/3-target-text

Live Code: Constructing content

shout.lua
Span = function(el)
  if el.classes:includes("shout") then
    local result = pandoc.Inlines({})
    result:extend(el.content)
    result:insert(pandoc.Str("!"))
    return result
  end  
end

A useful pattern:

  • Create an empty Inlines object
  • Use extend() to add Inlines
  • Use insert() to add Inline

Your turn: Simon says

Complete says.lua, a filter that:

  • targets Span elements with class says, and
  • turns them into “Simon says” instructions.

E.g. source:

Before
[Write a filter]{.says}

Becomes equivalent to:

After
Simon says "Write a filter"

Challenge: Instead of Simon, let the user specify the name as an attribute, e.g. [Write a filter]{.says name="Charlotte"}

Exercise: 03-filters/your-turns/3-simon-says

Solution

says.lua
Span = function(el)
  if el.classes:includes("says") then
    local result = pandoc.Inlines({})
    result:insert(pandoc.Str("Simon says "))
    result:insert(pandoc.Quoted('DoubleQuote', el.content))
    return result
  end
end

Solution

says.lua
Span = function(el)
  if not el.classes:includes("says") then    
    return nil
  end
  local name = el.attributes.name or "Simon"
  local results = pandoc.Inlines({})
  results:insert(pandoc.Str(name))
  results:insert(pandoc.Str(" says "))
  results:insert(pandoc.Quoted('DoubleQuote', el.content))
  return results
end

Filters in practice

Target content in a div

target-div.qmd
---
title: "Filter Basics"
filters:
  - shout.lua
---

::: {.shout}
And sometimes I **really** need to shout

```r
library(scream)
```
:::

shout.lua
Div = function(el)
  if not el.classes:includes("shout") then
    return nil
  end
  local result = pandoc.Blocks({})
  -- Transform el, construct result
  return result 
end

Example: 03-filters/examples/4-filters-in-practice/1-target-content-in-div
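
As one possible way to fill in the placeholder comment, the sketch below keeps the div's blocks and appends one extra paragraph. The "!!!" paragraph is only an illustration and is not taken from the workshop example.

```lua
Div = function(el)
  if not el.classes:includes("shout") then
    return nil
  end
  local result = pandoc.Blocks({})
  result:extend(el.content)                          -- el.content is a Blocks
  result:insert(pandoc.Para({ pandoc.Str("!!!") }))  -- add one more Block
  return result
end
```

Returning a Blocks from a Div filter function splices those blocks in place of the div, so the div wrapper itself is dropped.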

Use walk to apply filter functions to children

target-div.qmd
---
title: "Filter Basics"
filters:
  - shout.lua
---

::: {.shout}
And sometimes I **really** need to shout

```r
library(scream)
```
:::

shout.lua
Div = function(el)
  if not el.classes:includes("shout") then
    return nil
  end
  local result = el:walk({
    Str = function (el) 
      -- filter function on Str
    end
  })
  return result.content
end

Example: 03-filters/examples/4-filters-in-practice/2-walk-children-nodes
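
A sketch of what the inner Str function might do: upper-case every piece of text inside the div. It assumes pandoc.text.upper() from Pandoc's text module is an acceptable way to shout; the inner parameter is named str here only to avoid shadowing the outer el.

```lua
Div = function(el)
  if not el.classes:includes("shout") then
    return nil
  end
  local result = el:walk({
    Str = function(str)
      -- pandoc.text.upper() is Unicode-aware, unlike string.upper()
      return pandoc.Str(pandoc.text.upper(str.text))
    end
  })
  return result.content
end
```

Only Str nodes are visited, so the text of the code block inside the div is not a Str and stays as written.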

Construct format specific output

example.qmd
---
title: "Format-Specific Output"
format: pdf
filters:
  - shout.lua
---

::: {.shout}
And sometimes I **really** need to shout

```r
library(scream)
```
:::

shout.lua
Div = function(el)
  if not el.classes:includes("shout") then
    return nil
  end

  if quarto.format.is_latex_output() then
    local result = pandoc.Blocks({})
    -- add raw LaTeX with pandoc.RawBlock('latex', text), where text is a LaTeX string
    return result
  end
  
end

Example: 03-filters/examples/4-filters-in-practice/3-format-specific-output
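
One way the LaTeX branch could be completed is to wrap the div's content in raw LaTeX. This is a sketch only: the choice of a center environment with \Large is an assumption made for illustration, not the workshop's solution.

```lua
Div = function(el)
  if not el.classes:includes("shout") then
    return nil
  end

  if quarto.format.is_latex_output() then
    local result = pandoc.Blocks({})
    -- raw LaTeX blocks are written to the .tex output as-is
    result:insert(pandoc.RawBlock("latex", "\\begin{center}\\Large"))
    result:extend(el.content)
    result:insert(pandoc.RawBlock("latex", "\\end{center}"))
    return result
  end
  -- other formats fall through: returning nothing leaves the div unchanged
end
```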

Filter function on Meta to examine metadata

article.qmd
---
author:
  - name: Mine Çetinkaya-Rundel
    orcid: 0000-0001-6452-2420
    email: mine@posit.co
    affiliations:
      - name: Duke University
      - name: Posit, PBC
  - name: Charlotte Wickham
    orcid: 0000-0002-6365-5499
    email: charlotte.wickham@posit.co
    affiliation:
      - name: Posit, PBC
---

meta.lua
Meta = function(meta)
  quarto.log.output(meta)
end

Example: 03-filters/examples/4-filters-in-practice/4-meta-filter
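
Beyond logging the whole table, a Meta filter function can pick out individual fields. The sketch below logs each author's name; it assumes meta.author is still the list of maps written in article.qmd at the point the filter runs.

```lua
Meta = function(meta)
  if meta.author == nil then
    return nil
  end
  for _, author in ipairs(meta.author) do
    -- stringify() flattens a formatted metadata value to a plain string
    quarto.log.output(pandoc.utils.stringify(author.name))
  end
  return nil  -- the metadata is left unchanged
end
```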

Controlling the order of filter functions

Filter functions in the same filter are run in a specific order: Inline elements, Inlines(), Block elements, Blocks(), Meta(), Pandoc().

Specify a different order by returning an array of filter sets.

meta.lua
local string_out = ""
return {
  { -- this set is run first
    Meta = function (meta)
      -- store `string_out`
    end
  },
  { -- this set is run second
    Div = function(el)
      -- use `string_out`
    end
  }
}

Example: 03-filters/examples/4-filters-in-practice/5-filter-sets

Controlling when a filter runs

Quarto’s internal filters are grouped and run in sequence: ast, quarto, render.

By default, custom filters are run pre-quarto.

You might need to run a filter later, e.g. after quarto has processed cross-references.

You can specify when a filter runs with the at key, e.g.:

example.qmd
filters:
  - at: post-quarto
    path: shout.lua

Other values for at: pre-ast, post-ast, pre-quarto, pre-render, and post-render.

Wrapping Up

Filter extensions

quarto create extension filter creates boilerplate. Drop your .lua files in.

shout/
├── README.md
├── _extensions
│   └── shout
│       ├── _extension.yml
│       └── shout.lua
└── example.qmd

_extension.yml
title: Shout
author: Charlotte Wickham
version: 1.0.0
quarto-required: ">=1.7.0"
contributes:
  filters:
    - shout.lua

Users must opt in to the extension under filters:

example.qmd
---
filters:
  - shout
---

Filters in custom format extensions

shouty/
├── README.md
├── _extensions
│   └── shouty
│       ├── _extension.yml
│       └── shout.lua
└── template.qmd

_extension.yml
title: Shouty
author: Charlotte Wickham
version: 1.0.0
quarto-required: ">=1.7.0"
contributes:
  formats:
    html:
      filters:
        - shout.lua

Users specify format: shouty-html and get the filter applied automatically.

Shortcodes

Lua functions that insert their output into the AST.

hello.qmd
---
shortcodes:
  - hello.lua
---

{{< hello >}}

hello.lua
hello = function ()
  return pandoc.Str("Hi there!")
end

Shortcode functions can take arguments: args, kwargs, meta, raw_args, context
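
As a sketch of using the first of those arguments, the hello shortcode could greet a name passed positionally. The fallback "there" and the use of pandoc.utils.stringify() on the argument are assumptions made for this illustration.

```lua
hello = function(args)
  local name = "there"
  if args[1] ~= nil then
    -- stringify() turns the argument into a plain string
    name = pandoc.utils.stringify(args[1])
  end
  return pandoc.Str("Hi " .. name .. "!")
end
```

With this version, {{< hello Charlotte >}} should greet Charlotte, while {{< hello >}} keeps the original greeting.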

Learning Lua

https://quarto.org/docs/extensions/lua.html#learning-lua

I also quite liked: https://ebens.me/posts/lua-for-programmers-part-1/

Questions?

AST diagrams are WIP

The AST diagrams you’ve seen are produced using Pandoc’s version of markdown.

Quarto-specific features won’t appear in the AST diagrams as you might expect, e.g. cross-references, executable code blocks (ones with {), shortcodes, callouts, etc.

Use quarto.log.output() to examine the AST as it is when your filter is run.

This will improve!