Cross Reference Simplifications using Markdown Links
Summary¶
We propose a cross-reference syntax that uses CommonMark links to support all use cases of cross-referencing content internal to a project.
The syntax aims to be familiar and work across different rendering platforms.
Most internal content can be referenced using a hash-link, [](#my-id)
, which is the recommended replacement for the multiple role options that can do this in MyST currently (e.g. {ref}`my-id`
, {eq}`my-id`
, {numref}`my-id`
).
We provide options for increasing specificity for these links in all cases to deal with duplicate references across pages in a project.
Existing Syntax | New Syntax |
---|---|
[](my-id) [1] | [](#my-id) |
{ref}`my-id` | [](#my-id) |
{eq}`my-equation` | [](#my-equation) |
{ref}`Custom Text <my-id>` | [Custom Text](#my-id) |
{numref}`See "{name}" <my-id>` | [See "{name}"](#my-id) |
{numref}`Custom Number %s <my-id>` | [Custom Number {number}](#my-id) |
{numref}`Custom Number {number} <my-id>` | [Custom Number {number}](#my-id) |
{doc}`my-doc` | [](my-doc.md) |
{doc}`my-doc` | [](../examples/my-doc.md) |
{download}`my-doc.zip` | [](my-doc.zip) |
Context¶
In MyST (and Sphinx) there are many ways to cross-reference content:
{ref}`reference-target`
- generic referencing{numref}`Custom Table %s text <my-table-ref>`
- numbered references for most numbered elements{eq}`my-equation`
- referencing equations{any}`any-reference`
- referencing anything, including terms and python classes{doc}`./my-file.md`
- referencing other documents{download}`pdf <doc/mypdf.pdf>`
- downloading content
These are all powerful roles, encoding semantic meaning and providing rich inter-linked content. These links can also be used to power rich user-interfaces, such as sphinx-hoverref. There are also simple configuration options for adding new external links in Sphinx. However, the breadth, verbosity, and overlapping functionality of these roles can be confusing and unfamiliar to new users.
For example:
- It is not possible to use
numref
for equations, you must use the specific{eq}
roles. - There is functional overlap between
numref
andref
, with numbering -- a common activity in scientific/technical writing -- not possible with a genericref
.
Additionally, there is currently not overlap with CommonMark syntax that can, for example, reference a section header with a hash [header](#context)
.
This has the advantage that the syntax works in multiple platforms and is a familiar pattern from using website links.
Design Goals¶
Our goal with this MEP is to provide a simplified syntax to make use of markdown links, and tap into rich cross-referencing capabilities. In this MEP we aim to balance:
- Reuse Existing Standards
- where possible syntax should reuse existing practices and standards, for example, CommonMark compliance
- Graceful Degradation
- syntax should aim to render with reduced functionality in places that don’t support MyST
- Memorability
- having a syntax that is easy to remember
- Readability
- having a syntax which people can understand at a glance
- Terseness
- limiting “boilerplate” syntax
- Extensibility
- having syntaxes that will not limit us from adding features in the future
Specifically for this MEP, our proposal aims to:
- Provide a concise markdown link syntax to hook into Sphinx cross-referencing infrastructure
- Ensure that this syntax can be used by multiple projects (e.g. myst-parser (sphinx/python), mystjs, and future ways to cross-reference content (e.g. intersphinx, JATS or other structured formats))
- Ensure that the markdown rendering gracefully degrades on other rendering platforms (e.g. GitHub, Jupyter)
- Continue to support existing roles, and have as few deprecations in syntax as possible
- Support styled links and cross-references (e.g. with bold or italic in the reference)
- Ideally provide an extension point for new types of cross-references (e.g. academic DOIs or other structured content like wikipedia)
- Follow web-standards / conventions for URLs where possible (e.g. query strings, protocols)
- Design new additions to the
myst-spec
AST that can provide rich information to renderers
These link improvements are completed in the context of supporting (1) academic citations; and (2) intersphinx cross-references. However, this MEP does not specifically support the intricacies of intersphinx, bibliographies or referencing. We encourage future MEPs to address these concerns.
Background¶
The MEP aims to build on the existing CommonMark link format, which come in three forms (see spec). In the current MEP we are not proposing any changes to CommonMark - and are designing a cross-referencing syntax that can work with existing links. For context, the three CommonMark link types are:
Inline links with optional text or titles:
[Explicit *Markdown* text](destination "optional explicit title") or, if the destination contains spaces, [text](<a destination>)
Reference links, which define the destination separately in the document and can be used multiple times:
[Explicit *Markdown* text][label] [label]: destination "optional explicit title"
Autolinks are URIs surrounded by
<
and>
:<scheme:path?query#fragment>
In most cases, the scheme[2] (e.g. https:
, mailto:
, or ftp:
) is optional and assumed to be a web URL (http:
). For autolinks, however, the scheme is required; this is designed to disambiguate inline HTML elements (i.e. <b>
is not a link, but <https://executablebooks.org/>
is).
Current MyST Markup¶
Currently MyST supports external URLs (e.g. http:
, https:
, ftp:
, mailto:
) and uses Sphinx or Docutils for cross-references.
The current supported syntax is listed for each component below:
External Links
<https://example.com>
- CommonMark auto link syntax can be different configurable schemes/protocols (e.g.http
,https
,ftp
, ormailto
).[Some *text*](https://example.com "title")
- A styled link to an external URL, including a title shown on hover.
Figures, Sections
{ref}`my-fig-ref`
- standard reference, will be filled in with the figure/section title[](my-fig-ref)
- using existing MyST link syntax (note this does not have a#
)[My cool fig](my-fig-ref)
- labeled reference{numref}`my-fig-ref`
- numbered reference (e.g. “Fig. 1”){numref}`Custom Figure %s text <my-fig-ref>`
- custom numbering for the figure, using%s
or{number}
Equations
{eq}`my-eqn`
- uses a custom role, only for equations, no ability to override title{math:numref}`my-eqn`
- a verbose form of theeq
role (see note about “Domains” below).[](my-math-ref)
- using existing MyST link syntax (note this does not have a#
)
Documents
{doc}`my-eqn`
- a doc role.[](../file-types/myst-notebooks.md)
- a relative path with POSIX path separators/
, if no title provided MyST will fill in with the referenced title (or file name).[A different page](../file-types/myst-notebooks.md)
- using a custom title.
Intersphinx
Multiple other sphinx documentation sites can be referenced in MyST syntax (Sphinx documentation).
For example, the python
documentation can be referenced from a configuration (e.g. the intersphinx_mapping in conf.py
), which points to the appropriate intersphinx inventory (e.g. https://docs.python.org/3
) containing a *.inv
file.
{external+python:py:class}`zipfile.ZipFile`
- a reference to the Python classZipFile
{py:class}`zipfile.ZipFile`
- a short-hand reference that will search locally first, then any referenced inventories
Styling
All link syntax supports styling inside of the reference, (e.g. [A **bolded _reference_** to a page](./myst.md)
) the reference role syntax currently does not support styling of the inner content.
Proposal¶
We propose a cross-reference syntax that uses CommonMark links in all three forms. The goal is to support all use cases of cross-referencing with the most common use cases of referencing a document, file, section or element being simple, terse and familiar.
Overview:
Existing Syntax | New Syntax |
---|---|
[](my-id) [1] | [](#my-id) |
{ref}`my-id` | [](#my-id) |
{eq}`my-equation` | [](#my-equation) |
{ref}`Custom Text <my-id>` | [Custom Text](#my-id) |
{numref}`See "{name}" <my-id>` | [See "{name}"](#my-id) |
{numref}`Custom Number %s <my-id>` | [Custom Number {number}](#my-id) |
{numref}`Custom Number {number} <my-id>` | [Custom Number {number}](#my-id) |
{doc}`my-doc` | [](my-doc.md) |
{doc}`my-doc` | [](../examples/my-doc.md) |
{download}`my-doc.zip` | [](my-doc.zip) |
All of the above link examples can be easily complemented by both adding Title Link syntax and Reference Link syntax (i.e. [Explicit *Markdown* text][label]
).
We have omitted the auto-link syntax from the overview for brevity, they are shown in detail below.
In all cases, the existing role syntax should continue to work and receive ongoing support from the parser(s).
Syntax¶
The parts of the link are [text](link "title")
with an optional scheme ([text](scheme:link "title")
).
The "title"
is not modified in our proposed syntax. Auto Link syntax, <scheme:link>
, requires the scheme to be present, we follow the CommonMark definition of a scheme.
text
¶
If the text
is not included, it will be filled in by the default of the target.
- If the reference target is enumerated this will by default be “Sec 2.3”; “Fig. 1” for enumerated figures; “(5)” for equations; “Thm. 2”, “Lemma 3”, etc. depending on the target type. This can be customized in project configuration or frontmatter, however, the details of that are out of scope for this MEP.
- If the reference target is not enumerated, the title will be used. This may be the section header text (including syntax/style) or a figure caption.
- If the node is not enumerated and does not have a title, the reference label will be used.
If the text
is included it will be used as is with two additional template values ({number}
and {name}
)
Enumeration (
{number}
)- Any reference target that is enumerated can reference that number or string with a
{number}
; the{number}
is only the enumerator and does not include any preceding text.- For example, if referencing
#my-section
with default text ofSection 2.1.2
, you can change the reference toSee 2.1.2
with[See {number}](#my-section)
.
- For example, if referencing
- If a
{number}
is used and the node is not enumerated, the{number}
will be replaced by “??” and a warning raised.
- Any reference target that is enumerated can reference that number or string with a
Reference text (
{name}
)- Any node can include the text of a reference including any styles (e.g. Sections are the section title; Figures and Tables are the caption).
- If
{name}
is used and the node does not have an explicit name or title, the node reference label will be used.
In both cases, the template can be escaped with a preceding backslash, that is \{number}
or \{name}
, and the text will not be replaced.
link
¶
The links are defined by a scheme, which can be standard protocols (http:
, mailto:
).
Here we propose two new schemes, project
and path
:
the project
scheme allows for cross-referencing pages, sections, equations, figures, or other components of a MyST project;
the path
scheme allows for referencing files outside of the project or explicitly downloading the source of a document in the project.
The schemes are an extensibility point specifically described by CommonMark
and are used to indicate that the link should be resolved by MyST specific logic. They follow standard URI syntax:
URI = scheme ":" pathname ["?" query] ["#" fragment]
In most cases, as seen in the summary above the scheme is optional and can be inferred safely by the context. The exception is when explicitly referring to an external MyST site, Jupyter Book or Sphinx documentation site. These URIs can be safely and easily parsed by any common URL parser. For example in Javascript:
const url = new URL('project:target.md#my-ref');
url.protocol; // "project:"
url.pathname; // "target.md"
url.hash; // "#my-ref"
The following links and references are supported:
Link Type | Auto Link | Inline |
---|---|---|
External URL | <https://example.com> | [](https://example.com) |
Local file download | <path:file.txt> | [](file.txt) |
File download (explicit) | <path:file.md> | [](path:file.md) |
Project document | <project:file.md> | [](file.md) |
Target in a document | <project:target.md#file> | [](file.md#target) |
Target in project | <project:#target> | [](#target) |
Search Order and Specificity¶
All references search the local document first[3], then the local project in the order of the table of contents. A xref_ambiguous
warning is raised if multiple matches are found.
In large documentation sites, a referenced target can be present in multiple documents, in that case, the parser will emit a xref_ambiguous
warning letting you know that there are multiple matches for the intended target.
If a link cannot be resolved, an external link should be rendered, for example, <a href="#target">#target</a>
.
Implicit Section Headers¶
We suggest a configuration option to create anchor “slugs” for section headers, which stay close to the GitHub implementation and produces references that:
- lower-case text
- remove punctuation
- replace spaces with
-
- enforce uniqueness via suffix enumeration
-1
For example, ## Links and Referencing
can be referenced as [](#links-and-referencing)
.
Every heading level in a document should have an anchor, however, these are implicit references, and referring to them can raise an xref_implicit
warning, which can optionally be suppressed by users.
Implicit references are not available project wide, and are only accessible in the current document, as many documents follow similar structures (Abstract, Introduction, Methods, Summary).
Adding two sections of the same name does not raise a duplicate identifier warnings (xref_duplicate
), section identifiers are only unique to the document.
Paths¶
- Links can be relative from the containing file.
- Links can be absolute paths from the project root.
- Absolute paths start with
/
. - The path separator is POSIX
/
. - The path must include the extension.
Downloads¶
Files that are outside of the table of contents of the project and are referenced directly are downloads.
- These follow the same path rules as above when referring to a unknown file-type (e.g.
./my-file.txt
). - Files with known file extensions (e.g.
*.md
,*.ipynb
) will default to being document links, not downloads, with the document title being used as the default text. - Downloads can be specified explicitly by a
<path:my-file.md>
, which will revert to a download regardless of the extension. (i.e. this overrides the default for[](my-file.md)
, which is<project:my-file.md>
).
Warnings and Errors¶
xref_missing
- There is a missing reference, that could not be found.
xref_implicit
- You are referencing an implicit reference which could change easily in the future, consider making this explicit.
xref_unsupported
- Raised if the the current environment does not support the reference look up. For example, single page builds.
xref_ambiguous
- Raised when multiple conflicting targets are matched.
xref_legacy
- Raised when a
[](ref)
is used in place of[](#ref)
. - For example, "Legacy syntax used for link target, please prepend a ‘#’ to your link url: “{link.url}” in “{document}”.
Specification AST¶
The links should follow the link AST for external links.
For internal project cross-references, these should be resolved to a crossReference
node (spec).
For external project links, we will extend the link object with additional data that includes the url source (urlSource
), the scheme name (e.g. project
or download
), whether the link is internal (e.g. false
), and additional optional metadata about the page that may be helpful to a renderer.
Extensibility¶
We hope that this syntax will be helpful in simplifying the cross-reference experience in MyST.
Additionally, we believe that the scheme/protocol extension point is a powerful way to add rich cross-referencing ability to other types of structured data sources.
We expect a future MEP to introduce additional logic to resolve intersphinx references, and other structured data.
For example, one could imagine a <wiki:Gravitational_Waves>
extension that cross-references pages in Wikipedia, or a <doi:10.5281/zenodo.6476040>
extension that adds additional information about DOIs.
For simple link replacements, this syntax could also be extended with simple configuration options, similar to the extlinks
feature in Sphinx (see documentation).
UX implications & migration¶
All of the syntax is CommonMark compliant and introduces new capabilities to resolve cross references. All existing roles are being maintained indefinitely to ensure compatibility with existing content as well as long-term compatibility with Sphinx. We suggest that documentation is updated to highlight the new, consistent markdown-link references with the old styles being moved to compatibility sections.
There is a single deprecation of the existing markdown link syntax that references a target and does not have a #
.
When parsers encounter a legacy linked reference, they should raise an xref_legacy
warning.
Questions or objections¶
- File Protocol
- We want to minimize the additional syntax, and it was suggested that we use the
file:
protocol. Thefile:
protocol is a security concern in markdown parsing, and was not chosen.
References¶
Additional projects, specs, configuration and syntax consulted:
- Sphinx cross-references
- Sphinx extlinks
- JATS
ref-type
- A prototype of this extended link type is currently defined in myst-transforms.
Other context and links: