Skip to article frontmatterSkip to article content

Cross Reference Simplifications using Markdown Links

Authors
Affiliations
Curvenote Inc.
Curvenote Inc.

Summary

We propose a cross-reference syntax that uses CommonMark links to support all use cases of cross-referencing content internal to a project. The syntax aims to be familiar and work across different rendering platforms. Most internal content can be referenced using a hash-link, [](#my-id), which is the recommended replacement for the multiple role options that can do this in MyST currently (e.g. {ref}`my-id`, {eq}`my-id`, {numref}`my-id`). We provide options for increasing specificity for these links in all cases to deal with duplicate references across pages in a project.

Existing SyntaxNew Syntax
[](my-id)[1][](#my-id)
{ref}`my-id`[](#my-id)
{eq}`my-equation`[](#my-equation)
{ref}`Custom Text <my-id>`[Custom Text](#my-id)
{numref}`See "{name}" <my-id>`[See "{name}"](#my-id)
{numref}`Custom Number %s <my-id>`[Custom Number {number}](#my-id)
{numref}`Custom Number {number} <my-id>`[Custom Number {number}](#my-id)
{doc}`my-doc`[](my-doc.md)
{doc}`my-doc`[](../examples/my-doc.md)
{download}`my-doc.zip`[](my-doc.zip)

Context

In MyST (and Sphinx) there are many ways to cross-reference content:

  • {ref}`reference-target` - generic referencing
  • {numref}`Custom Table %s text <my-table-ref>` - numbered references for most numbered elements
  • {eq}`my-equation` - referencing equations
  • {any}`any-reference` - referencing anything, including terms and python classes
  • {doc}`./my-file.md` - referencing other documents
  • {download}`pdf <doc/mypdf.pdf>` - downloading content

These are all powerful roles, encoding semantic meaning and providing rich inter-linked content. These links can also be used to power rich user-interfaces, such as sphinx-hoverref. There are also simple configuration options for adding new external links in Sphinx. However, the breadth, verbosity, and overlapping functionality of these roles can be confusing and unfamiliar to new users.

For example:

  • It is not possible to use numref for equations, you must use the specific {eq} roles.
  • There is functional overlap between numref and ref, with numbering -- a common activity in scientific/technical writing -- not possible with a generic ref.

Additionally, there is currently not overlap with CommonMark syntax that can, for example, reference a section header with a hash [header](#context). This has the advantage that the syntax works in multiple platforms and is a familiar pattern from using website links.

Design Goals

Our goal with this MEP is to provide a simplified syntax to make use of markdown links, and tap into rich cross-referencing capabilities. In this MEP we aim to balance:

Reuse Existing Standards
where possible syntax should reuse existing practices and standards, for example, CommonMark compliance
Graceful Degradation
syntax should aim to render with reduced functionality in places that don’t support MyST
Memorability
having a syntax that is easy to remember
Readability
having a syntax which people can understand at a glance
Terseness
limiting “boilerplate” syntax
Extensibility
having syntaxes that will not limit us from adding features in the future

Specifically for this MEP, our proposal aims to:

  • Provide a concise markdown link syntax to hook into Sphinx cross-referencing infrastructure
  • Ensure that this syntax can be used by multiple projects (e.g. myst-parser (sphinx/python), mystjs, and future ways to cross-reference content (e.g. intersphinx, JATS or other structured formats))
  • Ensure that the markdown rendering gracefully degrades on other rendering platforms (e.g. GitHub, Jupyter)
  • Continue to support existing roles, and have as few deprecations in syntax as possible
  • Support styled links and cross-references (e.g. with bold or italic in the reference)
  • Ideally provide an extension point for new types of cross-references (e.g. academic DOIs or other structured content like wikipedia)
  • Follow web-standards / conventions for URLs where possible (e.g. query strings, protocols)
  • Design new additions to the myst-spec AST that can provide rich information to renderers

These link improvements are completed in the context of supporting (1) academic citations; and (2) intersphinx cross-references. However, this MEP does not specifically support the intricacies of intersphinx, bibliographies or referencing. We encourage future MEPs to address these concerns.

Background

The MEP aims to build on the existing CommonMark link format, which come in three forms (see spec). In the current MEP we are not proposing any changes to CommonMark - and are designing a cross-referencing syntax that can work with existing links. For context, the three CommonMark link types are:

  1. Inline links with optional text or titles:

    [Explicit *Markdown* text](destination "optional explicit title")
    
    or, if the destination contains spaces,
    
    [text](<a destination>)
  2. Reference links, which define the destination separately in the document and can be used multiple times:

    [Explicit *Markdown* text][label]
    
    [label]: destination "optional explicit title"
  3. Autolinks are URIs surrounded by < and >:

    <scheme:path?query#fragment>

In most cases, the scheme[2] (e.g. https:, mailto:, or ftp:) is optional and assumed to be a web URL (http:). For autolinks, however, the scheme is required; this is designed to disambiguate inline HTML elements (i.e. <b> is not a link, but <https://executablebooks.org/> is).

Current MyST Markup

Currently MyST supports external URLs (e.g. http:, https:, ftp:, mailto:) and uses Sphinx or Docutils for cross-references. The current supported syntax is listed for each component below:

External Links

  • <https://example.com> - CommonMark auto link syntax can be different configurable schemes/protocols (e.g. http, https, ftp, or mailto).
  • [Some *text*](https://example.com "title") - A styled link to an external URL, including a title shown on hover.

Figures, Sections

  • {ref}`my-fig-ref` - standard reference, will be filled in with the figure/section title
  • [](my-fig-ref) - using existing MyST link syntax (note this does not have a #)
  • [My cool fig](my-fig-ref) - labeled reference
  • {numref}`my-fig-ref` - numbered reference (e.g. “Fig. 1”)
  • {numref}`Custom Figure %s text <my-fig-ref>` - custom numbering for the figure, using %s or {number}

Equations

  • {eq}`my-eqn` - uses a custom role, only for equations, no ability to override title
  • {math:numref}`my-eqn` - a verbose form of the eq role (see note about “Domains” below).
  • [](my-math-ref) - using existing MyST link syntax (note this does not have a #)

Documents

  • {doc}`my-eqn` - a doc role.
  • [](../file-types/myst-notebooks.md) - a relative path with POSIX path separators /, if no title provided MyST will fill in with the referenced title (or file name).
  • [A different page](../file-types/myst-notebooks.md) - using a custom title.

Intersphinx

Multiple other sphinx documentation sites can be referenced in MyST syntax (Sphinx documentation). For example, the python documentation can be referenced from a configuration (e.g. the intersphinx_mapping in conf.py), which points to the appropriate intersphinx inventory (e.g. https://docs.python.org/3) containing a *.inv file.

  • {external+python:py:class}`zipfile.ZipFile` - a reference to the Python class ZipFile
  • {py:class}`zipfile.ZipFile` - a short-hand reference that will search locally first, then any referenced inventories

Styling

All link syntax supports styling inside of the reference, (e.g. [A **bolded _reference_** to a page](./myst.md)) the reference role syntax currently does not support styling of the inner content.

Proposal

We propose a cross-reference syntax that uses CommonMark links in all three forms. The goal is to support all use cases of cross-referencing with the most common use cases of referencing a document, file, section or element being simple, terse and familiar.

Overview:

Existing SyntaxNew Syntax
[](my-id)[1][](#my-id)
{ref}`my-id`[](#my-id)
{eq}`my-equation`[](#my-equation)
{ref}`Custom Text <my-id>`[Custom Text](#my-id)
{numref}`See "{name}" <my-id>`[See "{name}"](#my-id)
{numref}`Custom Number %s <my-id>`[Custom Number {number}](#my-id)
{numref}`Custom Number {number} <my-id>`[Custom Number {number}](#my-id)
{doc}`my-doc`[](my-doc.md)
{doc}`my-doc`[](../examples/my-doc.md)
{download}`my-doc.zip`[](my-doc.zip)

All of the above link examples can be easily complemented by both adding Title Link syntax and Reference Link syntax (i.e. [Explicit *Markdown* text][label]). We have omitted the auto-link syntax from the overview for brevity, they are shown in detail below. In all cases, the existing role syntax should continue to work and receive ongoing support from the parser(s).

Syntax

The parts of the link are [text](link "title") with an optional scheme ([text](scheme:link "title")). The "title" is not modified in our proposed syntax. Auto Link syntax, <scheme:link>, requires the scheme to be present, we follow the CommonMark definition of a scheme.

text

If the text is not included, it will be filled in by the default of the target.

  • If the reference target is enumerated this will by default be “Sec 2.3”; “Fig. 1” for enumerated figures; “(5)” for equations; “Thm. 2”, “Lemma 3”, etc. depending on the target type. This can be customized in project configuration or frontmatter, however, the details of that are out of scope for this MEP.
  • If the reference target is not enumerated, the title will be used. This may be the section header text (including syntax/style) or a figure caption.
  • If the node is not enumerated and does not have a title, the reference label will be used.

If the text is included it will be used as is with two additional template values ({number} and {name})

  • Enumeration ({number})

    • Any reference target that is enumerated can reference that number or string with a {number}; the {number} is only the enumerator and does not include any preceding text.
      • For example, if referencing #my-section with default text of Section 2.1.2, you can change the reference to See 2.1.2 with [See {number}](#my-section).
    • If a {number} is used and the node is not enumerated, the {number} will be replaced by “??” and a warning raised.
  • Reference text ({name})

    • Any node can include the text of a reference including any styles (e.g. Sections are the section title; Figures and Tables are the caption).
    • If {name} is used and the node does not have an explicit name or title, the node reference label will be used.

In both cases, the template can be escaped with a preceding backslash, that is \{number} or \{name}, and the text will not be replaced.

The links are defined by a scheme, which can be standard protocols (http:, mailto:). Here we propose two new schemes, project and path: the project scheme allows for cross-referencing pages, sections, equations, figures, or other components of a MyST project; the path scheme allows for referencing files outside of the project or explicitly downloading the source of a document in the project. The schemes are an extensibility point specifically described by CommonMark and are used to indicate that the link should be resolved by MyST specific logic. They follow standard URI syntax:

URI = scheme ":" pathname ["?" query] ["#" fragment]

In most cases, as seen in the summary above the scheme is optional and can be inferred safely by the context. The exception is when explicitly referring to an external MyST site, Jupyter Book or Sphinx documentation site. These URIs can be safely and easily parsed by any common URL parser. For example in Javascript:

const url = new URL('project:target.md#my-ref');
url.protocol; // "project:"
url.pathname; // "target.md"
url.hash; // "#my-ref"

The following links and references are supported:

Link TypeAuto LinkInline
External URL<https://example.com>[](https://example.com)
Local file download<path:file.txt>[](file.txt)
File download (explicit)<path:file.md>[](path:file.md)
Project document<project:file.md>[](file.md)
Target in a document<project:target.md#file>[](file.md#target)
Target in project<project:#target>[](#target)

Search Order and Specificity

All references search the local document first[3], then the local project in the order of the table of contents. A xref_ambiguous warning is raised if multiple matches are found.

In large documentation sites, a referenced target can be present in multiple documents, in that case, the parser will emit a xref_ambiguous warning letting you know that there are multiple matches for the intended target. If a link cannot be resolved, an external link should be rendered, for example, <a href="#target">#target</a>.

Implicit Section Headers

We suggest a configuration option to create anchor “slugs” for section headers, which stay close to the GitHub implementation and produces references that:

  • lower-case text
  • remove punctuation
  • replace spaces with -
  • enforce uniqueness via suffix enumeration -1

For example, ## Links and Referencing can be referenced as [](#links-and-referencing). Every heading level in a document should have an anchor, however, these are implicit references, and referring to them can raise an xref_implicit warning, which can optionally be suppressed by users.

Implicit references are not available project wide, and are only accessible in the current document, as many documents follow similar structures (Abstract, Introduction, Methods, Summary). Adding two sections of the same name does not raise a duplicate identifier warnings (xref_duplicate), section identifiers are only unique to the document.

Paths

  • Links can be relative from the containing file.
  • Links can be absolute paths from the project root.
  • Absolute paths start with /.
  • The path separator is POSIX /.
  • The path must include the extension.

Downloads

Files that are outside of the table of contents of the project and are referenced directly are downloads.

  • These follow the same path rules as above when referring to a unknown file-type (e.g. ./my-file.txt).
  • Files with known file extensions (e.g. *.md, *.ipynb) will default to being document links, not downloads, with the document title being used as the default text.
  • Downloads can be specified explicitly by a <path:my-file.md>, which will revert to a download regardless of the extension. (i.e. this overrides the default for [](my-file.md), which is <project:my-file.md>).

Warnings and Errors

xref_missing
There is a missing reference, that could not be found.
xref_implicit
You are referencing an implicit reference which could change easily in the future, consider making this explicit.
xref_unsupported
Raised if the the current environment does not support the reference look up. For example, single page builds.
xref_ambiguous
Raised when multiple conflicting targets are matched.
xref_legacy
Raised when a [](ref) is used in place of [](#ref).
For example, "Legacy syntax used for link target, please prepend a ‘#’ to your link url: “{link.url}” in “{document}”.

Specification AST

The links should follow the link AST for external links. For internal project cross-references, these should be resolved to a crossReference node (spec).

For external project links, we will extend the link object with additional data that includes the url source (urlSource), the scheme name (e.g. project or download), whether the link is internal (e.g. false), and additional optional metadata about the page that may be helpful to a renderer.

Extensibility

We hope that this syntax will be helpful in simplifying the cross-reference experience in MyST. Additionally, we believe that the scheme/protocol extension point is a powerful way to add rich cross-referencing ability to other types of structured data sources. We expect a future MEP to introduce additional logic to resolve intersphinx references, and other structured data. For example, one could imagine a <wiki:Gravitational_Waves> extension that cross-references pages in Wikipedia, or a <doi:10.5281/zenodo.6476040> extension that adds additional information about DOIs. For simple link replacements, this syntax could also be extended with simple configuration options, similar to the extlinks feature in Sphinx (see documentation).

UX implications & migration

All of the syntax is CommonMark compliant and introduces new capabilities to resolve cross references. All existing roles are being maintained indefinitely to ensure compatibility with existing content as well as long-term compatibility with Sphinx. We suggest that documentation is updated to highlight the new, consistent markdown-link references with the old styles being moved to compatibility sections.

There is a single deprecation of the existing markdown link syntax that references a target and does not have a #. When parsers encounter a legacy linked reference, they should raise an xref_legacy warning.

Questions or objections

File Protocol
We want to minimize the additional syntax, and it was suggested that we use the file: protocol. The file: protocol is a security concern in markdown parsing, and was not chosen.

References

Additional projects, specs, configuration and syntax consulted:

Other context and links:

Footnotes
  1. This is backwards compatible, however, now raises a xref_legacy warning for old syntax.

  2. the URL scheme is also known as the URL protocol.

  3. With the exception of an explicit reference to a specific page, i.e. [](./examples/my-doc.md#explicit-reference)