Per-Output AST Representation for Code Cells

Authors
Affiliations
2i2c
Curvenote Inc.

Summary

We propose extending the MyST AST with a new node type that represents each code-cell output as a separate AST node. In the existing schema, the Output node type represents a collection of outputs and corresponds one-to-one with a code-cell. This structure cannot associate a distinct AST subtree with each output, so code-cell outputs cannot take part in subsequent transformations of the document. With this proposal, a new Outputs node, a unist Parent container, replaces the Output node as the direct child of a code-cell. The Outputs node may contain several Output child nodes, each holding one IOutput bundle returned by code execution. The new AST structure enables each Output to carry its own AST subtree. Subsequent enhancements may build upon this proposal to process code-cell outputs and build derived ASTs, e.g. parsing Markdown or LaTeX outputs into MyST AST that may reference and be referenced by other content.

Context

Current Limitations

Code-cells from Jupyter Notebooks can produce rich outputs such as Markdown, LaTeX, and tables. At present, each output is effectively treated as a black box that is interpreted only at export time (to PDF, web, etc.), so every static exporter and web build must interpret the MIME-bundle outputs itself. As a result, the output content does not participate in a MyST build, i.e. Markdown or LaTeX output cannot generate or consume reference labels.

This limitation prevents programmatic generation of content that integrates with the rest of the document. For example, if a code-cell generates Markdown output containing a figure with a label, that label cannot be referenced from other parts of the document, and the figure cannot appear in cross-reference resolution.

Use Cases

This enhancement lays the foundation to enable several important use cases in future MEPs:

  1. Programmatic Content Generation: Users can generate MyST Markdown from code-cells and have the MyST Document Engine parse the results.

  2. Integrated Outputs: Generated Markdown can define and consume reference targets that are visible to the rest of the project, enabling richer integration between computational content and documentation.

  3. MyST-aware Kernels: Libraries running in Jupyter Kernels may output AST via MIME bundles with a MyST-aware MIME type, e.g. application/vnd.mystmd.ast+json;version=1. This would enable richer integrations than simple Markdown code generation.

This proposal addresses the requirements described in #1026, which tracks the need to associate AST subtrees with individual cell outputs.

Proposal

AST Structure Changes

The MyST AST will be extended to support per-output AST representation through the following changes:

Previous Structure (Version 2):

type OutputV2 = {
  type: "output";
  data?: any[]; // Array of IOutput bundles
  visibility?: any;
};

In this structure, a single Output node contained an array of output data bundles in the data field.

New Structure (Version 3):

// Outputs contains one or more Output nodes (below)
type Outputs = {
  type: "outputs";
  children: (Output | FlowContent | ListContent | PhrasingContent)[]; // Support placeholders in addition to outputs
  visibility?: Visibility; // `show`, `hide`, or `remove`
  scroll?: boolean;
};

type Output = {
  type: "output";
  children: (FlowContent | ListContent | PhrasingContent)[];
  jupyter_data: IOutput; // Single IOutput bundle from Jupyter Notebooks
};

The key changes are:

  1. Output nodes now represent a single output with its own AST subtree in children and a single jupyter_data field (instead of an array).

  2. An Outputs node is introduced as a container with Output children, where there is a 1:1 correspondence between Output children and IOutput bundles.

  3. Each Output child can have its own AST subtree, enabling per-output parsing and reference resolution.
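To make the 1:1 correspondence concrete, the following is a minimal sketch of how a consumer might walk an Outputs node and recover one IOutput bundle per Output child. The types are simplified stand-ins for the ones above, and the names `AnyNode`, `isOutput`, and `bundlesOf` are hypothetical helpers introduced only for illustration; they are not part of the proposal.

```typescript
// Simplified stand-ins for the proposal's types. `IOutput` is reduced to the
// one field used here and is NOT the full Jupyter definition.
type IOutput = { output_type: string; [key: string]: unknown };
type AnyNode = { type: string; children?: AnyNode[]; [key: string]: unknown };
type Output = { type: "output"; children: AnyNode[]; jupyter_data: IOutput };
type Outputs = { type: "outputs"; children: (Output | AnyNode)[] };

// Hypothetical type guard: an Outputs node may also hold placeholder content,
// so children are checked before being treated as Output nodes.
function isOutput(node: Output | AnyNode): node is Output {
  return node.type === "output" && "jupyter_data" in node;
}

// Hypothetical helper: return the single IOutput bundle of each Output child,
// in document order. Non-output children are skipped.
function bundlesOf(outputs: Outputs): IOutput[] {
  return outputs.children.filter(isOutput).map((output) => output.jupyter_data);
}
```

Because each Output owns exactly one bundle, a later transform could attach parsed content to `children` of that same node without affecting its siblings.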

Migration Strategy

The version migration is handled through upgrade and downgrade functions that transform between V2 and V3 representations. The migration preserves:

Existing identifiers for the Output / Outputs nodes are not modified, as these are considered “content”. The migration path includes:

Upgrade (V2 → V3):

Downgrade (V3 → V2):

Example Transformation

Before (V2):

{
  "type": "output",
  "data": [
    { "output_type": "stream", "text": "Hello" },
    { "output_type": "display_data", "data": { "text/markdown": "**Bold**" } }
  ],
  "children": [{ "type": "text", "value": "Shared content" }]
}

After (V3):

{
  "type": "outputs",
  "children": [
    {
      "type": "output",
      "jupyter_data": { "output_type": "stream", "text": "Hello" },
      "children": []
    },
    {
      "type": "output",
      "jupyter_data": {
        "output_type": "display_data",
        "data": { "text/markdown": "**Bold**" }
      },
      "children": []
    }
  ]
}

The text/markdown output could be parsed into native MyST AST nodes stored in children; however, that is out of scope for this proposal.
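The before/after example above can be sketched as a small function that splits the V2 data array into one V3 Output node per bundle. This is an illustration only, not the actual upgrade function: the name `upgradeOutput` and the simplified types are assumptions, and the sketch deliberately does not handle visibility, identifiers, or any children present on the V2 node.

```typescript
// Simplified stand-ins for the V2 and V3 shapes (assumptions for illustration).
type IOutput = { output_type: string; [key: string]: unknown };
type OutputV2 = { type: "output"; data?: IOutput[] };
type OutputV3 = { type: "output"; children: unknown[]; jupyter_data: IOutput };
type OutputsV3 = { type: "outputs"; children: OutputV3[] };

// Hypothetical upgrade: one V3 Output per entry of the V2 `data` array.
// Each new Output starts with an empty `children` array, as in the example.
function upgradeOutput(node: OutputV2): OutputsV3 {
  const children = (node.data ?? []).map(
    (bundle): OutputV3 => ({
      type: "output",
      jupyter_data: bundle,
      children: [],
    }),
  );
  return { type: "outputs", children };
}
```

A V2 node without a data field yields an Outputs node with no children in this sketch.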

Implementation Details

Backward Compatibility

Existing content using V2 format can be automatically upgraded during processing (e.g. #2551). The downgrade path ensures that tools expecting V2 format can still work with V3 content when needed.
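As a rough sketch of what a downgrade could look like for a tool that only understands V2, the function below gathers the jupyter_data of each Output child back into a single data array. The name `downgradeOutputs` and the simplified types are hypothetical. The sketch also assumes that per-output children and placeholder nodes are simply dropped, which the real downgrade may handle differently.

```typescript
// Simplified stand-ins for the V2 and V3 shapes (assumptions for illustration).
type IOutput = { output_type: string; [key: string]: unknown };
type OtherNode = { type: string; [key: string]: unknown };
type OutputV3 = { type: "output"; children: unknown[]; jupyter_data: IOutput };
type OutputsV3 = { type: "outputs"; children: (OutputV3 | OtherNode)[] };
type OutputV2 = { type: "output"; data: IOutput[] };

// Hypothetical downgrade: merge every Output child's bundle into one array.
// Assumption: non-output children (placeholders) have no V2 equivalent here.
function downgradeOutputs(node: OutputsV3): OutputV2 {
  const data = node.children
    .filter(
      (child): child is OutputV3 =>
        child.type === "output" && "jupyter_data" in child,
    )
    .map((child) => child.jupyter_data);
  return { type: "output", data };
}
```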

UX Implications & Migration

Migration Impact

Theme Considerations

As a direct consequence of this MEP, themes and renderers will need to be updated to handle the new Outputs container node and iterate over its Output children. In addition, AST renderers such as web themes and templates can encounter incompatible ASTs in several ways:

Content-Renderer Separation

The MyST Document Engine enforces a strong separation between the MyST AST production engine (the MyST Document Engine itself) and the MyST AST rendering engine (e.g. MyST Theme for web applications). This separation is typically invisible, but it is entirely possible (and indeed desired) to perform the project build and project rendering steps at different times. For example, one may produce AST and push it to a CDN, where a MyST Theme application can render it.

Because content rendering is separated from content generation, it is important to ensure that deployed AST rendering applications do not attempt to render projects with unsupported AST versions. Whilst there is tooling to perform bidirectional AST migrations, it is not widely deployed in existing web themes. As such, users deploying MyST rendering engines must take care to anticipate AST mismatches, and perform AST migration steps themselves.
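One way a renderer could guard against such mismatches is to compare the version of incoming AST with the range it supports before rendering. The sketch below is hypothetical: it assumes the AST version is available as an integer (as with "Version 2" and "Version 3" above), and the names `SupportedRange` and `checkAstVersion` are invented for illustration rather than taken from any existing theme.

```typescript
// Hypothetical description of the AST versions a renderer can handle.
type SupportedRange = { min: number; max: number };
type VersionCheck = "supported" | "needs-upgrade" | "needs-downgrade";

// Decide what must happen before rendering AST of the given version.
// Assumption: versions are integers and increase over time.
function checkAstVersion(version: number, supported: SupportedRange): VersionCheck {
  if (version < supported.min) return "needs-upgrade"; // content older than renderer
  if (version > supported.max) return "needs-downgrade"; // content newer than renderer
  return "supported";
}
```

A renderer that gets anything other than "supported" could then run a migration step or refuse to render, rather than failing part-way through a page.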

Although typical usage of the MyST Document Engine as a static site generator hides this separation, the existing lack of constraint between the MyST Theme and MyST Document Engine versions can result in users encountering incompatible versions at build time. We anticipate a future MEP that attempts to address this UX problem.

Dynamic loading of AST

Outside of deploying AST renderers in a CDN context, it is also possible for MyST Themes to pull in incompatible AST via the external cross-reference (xref) mechanism. External xrefs are resolved both at build time, where the fetched AST is automatically migrated, and at read time, where it presently is not. This may affect users who build sites with myst build --site and deploy them to static web servers like GitHub Pages. Sites that use external xrefs fetch AST dynamically when a page is read, not when it is built, so xrefs may pull in incompatible (future or past) versions of the AST that fail to render properly. Future work may address AST upgrading and/or AST downgrading in these contexts.

References

This MEP addresses requirements from: