"convert" command

BabelFSH was recently refactored to use a truly modular build with self-contained plugins. This is not yet documented.

Command Line arguments

The convert command has these command line arguments:

| Argument | Kotlin type | Help Text | Notes |
| --- | --- | --- | --- |
| -i/--input | File | The input folder to read BabelFSH files from. Paths will be evaluated relative to this folder. | |
| -o/--output | File | The output folder for generated JSON files | |
| -e/--exclude | List<PathMatcher> | Exclude files matching this glob | Reduces the tool's runtime during testing. The pattern uses regular Unix globbing, e.g. foo*.babel.fsh. |
| --r4b/r5 | BabelfshContext.ReleaseVersion enum | FHIR version, default: R4B | Determines against which elements the resulting resources are evaluated; this mostly affects the metadata for CS/VS. |
| --pretty-print/no-pretty-print | Boolean | Pretty print JSON output (default: true) | |
| --config | File? | Path to a configuration file | See here for options |
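The glob matching behind -e/--exclude can be illustrated with the JDK's PathMatcher. This is a minimal sketch, not BabelFSH's actual code: the helper names excludeMatcher and isExcluded are hypothetical, and matching against the file name only (rather than the full path) is an assumption.

```kotlin
import java.nio.file.FileSystems
import java.nio.file.Path
import java.nio.file.PathMatcher

// Hypothetical helper: turn a Unix-style glob into a JDK PathMatcher.
fun excludeMatcher(glob: String): PathMatcher =
    FileSystems.getDefault().getPathMatcher("glob:$glob")

// Assumption: the glob is matched against the file name only, not the full path.
fun isExcluded(file: Path, matchers: List<PathMatcher>): Boolean =
    matchers.any { it.matches(file.fileName) }
```

With a matcher built from foo*.babel.fsh, a file named foo-icd10.babel.fsh would be skipped while bar.babel.fsh would still be converted.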

When this command is run, the command line arguments are parsed, the BabelfshContext is created, the plugin registries are instantiated (see here), and the BabelfshApp is instantiated and run.

BabelfshApp

The BabelfshApp class does the main work of the application. Here, the input folder is queried for source files, they are parsed using the FSH parser, rule sets are resolved, and the terminology comments are parsed for high-level syntactic conformance. Finally, the referenced plugin is called, which parses the comment command line, and converts the terminology to FHIR.

Input files are those that have a file extension .babel.fsh or .babelfsh.fsh.
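A minimal sketch of how such files could be collected from the input folder. The names isBabelfshSource and findInputFiles are hypothetical, and the recursive search is an assumption; the actual lookup in BabelfshApp may differ.

```kotlin
import java.io.File

// The two extensions named in the text above.
val inputExtensions = listOf(".babel.fsh", ".babelfsh.fsh")

// Hypothetical helper: name-based check for a BabelFSH source file.
fun isBabelfshSource(fileName: String): Boolean =
    inputExtensions.any { fileName.endsWith(it) }

// Assumption: the input folder is searched recursively; sorted for a stable order.
fun findInputFiles(inputFolder: File): List<File> =
    inputFolder.walkTopDown()
        .filter { it.isFile && isBabelfshSource(it.name) }
        .sortedBy { it.path }
        .toList()
```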

FSH Parser

The identified input files are parsed in BabelfshApp (line 55):

val parsedItems = inputFiles.fold(FshParseResult()) { parseResult, file ->
  parseFshFile(parseResult, file.absoluteFile)
}

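Since the input FSH items (CodeSystem, ValueSet, RuleSet, Alias, Resource (for ConceptMap)) could be distributed across multiple files, the parsing process is run using fold. In Kotlin, fold takes an initial accumulator value (here an empty FshParseResult) and passes it to the lambda as the first parameter (parseResult in the snippet above). The value returned by the lambda becomes the accumulator for the next file, so the final result contains the items from all files.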

The FSH parsing is implemented using the ANTLR v4 parser generator. The parser/lexer grammar is copied from the SUSHI reference implementation for FSH with some changes.

  • The lexer was changed to handle comments and whitespace differently from the standard implementation. SUSHI disregards comments entirely, so they never reach the parser. BabelFSH assigns meaning to special comments and handles whitespace in these comments, so they are sent to a different channel. At the top of the lexer grammar, the special comments are lexed as the TERM_PLUGIN_MULTILINE_COMMENT token. This lexer rule requires a start-of-comment token /*, a recognition token ^babelfsh, the content itself, an (optional) recognition token, and an end-of-comment token */.

  • The parser grammar is changed in the instance, valueSet, codeSystem, ruleSet, and paramRuleSet rules to add an optional terminologyPluginComment rule. Note that for the instance, valueSet, and codeSystem rules, this terminologyPluginComment is not really optional, but this is handled in the parser listener rather than in the grammar.
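The shape of the special comment can be approximated outside ANTLR with a regular expression. The following is only an illustration of the token structure described above, not the lexer rule itself; the function name extractPluginComment is hypothetical, and allowing whitespace between /* and ^babelfsh is an assumption.

```kotlin
// Approximates: "/*", "^babelfsh", content, optional "^babelfsh", "*/".
// Assumption: whitespace may appear between "/*" and the recognition token.
val specialComment = Regex(
    """/\*\s*\^babelfsh\b(.*?)(?:\^babelfsh\s*)?\*/""",
    RegexOption.DOT_MATCHES_ALL // the content may span several lines
)

// Returns the trimmed comment content, or null if no special comment is found.
fun extractPluginComment(source: String): String? =
    specialComment.find(source)?.groupValues?.get(1)?.trim()
```

An ordinary comment such as /* just a note */ yields null, which corresponds to the lexer treating it as a regular comment.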

The content of the FSH file(s) is parsed into the FshParseResult data class, which is immutable. New FSH items are added through the plus operator, which yields a new FshParseResult instead of modifying the existing one.
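The combination of an immutable container, a plus operator, and fold can be sketched as follows. ParseResultSketch is a simplified, hypothetical stand-in for FshParseResult with only two item lists; the real class is assumed to hold more item types and richer item objects.

```kotlin
// Simplified stand-in for FshParseResult; item names are plain strings here.
data class ParseResultSketch(
    val aliases: List<String> = emptyList(),
    val codeSystems: List<String> = emptyList()
) {
    // plus returns a new instance; the receiver is never modified.
    operator fun plus(other: ParseResultSketch) = ParseResultSketch(
        aliases = aliases + other.aliases,
        codeSystems = codeSystems + other.codeSystems
    )
}

// Fold the per-file results into one accumulated result, starting from an empty one.
fun mergeAll(perFile: List<ParseResultSketch>): ParseResultSketch =
    perFile.fold(ParseResultSketch()) { acc, next -> acc + next }
```

Because every step produces a fresh value, intermediate results can be shared safely and no step can corrupt the state of an earlier file.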

The FSH grammars are compiled by ANTLR (through the Gradle build task) to a set of Java source files, including the Listener class. In the FshListener class, the parser rules that are needed for the FSH implementation are listened for through overridden enter functions (enterAlias etc.), and sometimes corresponding exit functions. The listener has mutable lists for each FSH item type that are written to as the file is parsed. The lexer tokens and the position in the token stream can then be accessed in the respective ANTLR context. The listener functions also enforce semantic rules for the FSH files. For example, the id of an FSH item must comply with the id datatype in FHIR, CodeSystems must not directly declare concepts, and an Alias must not be quoted and must start with a $.
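Two of these semantic rules can be sketched as standalone checks. The regular expression is the one FHIR defines for the id datatype; the function names and error messages are hypothetical and do not reflect the actual listener code.

```kotlin
// Regex of the FHIR id datatype: letters, digits, '-' and '.', 1 to 64 characters.
val fhirIdPattern = Regex("""[A-Za-z0-9\-.]{1,64}""")

fun isValidFhirId(id: String): Boolean = fhirIdPattern.matches(id)

// Hypothetical check mirroring the alias rules described above.
// Returns a problem description, or null if the alias name is acceptable.
fun aliasNameProblem(name: String): String? = when {
    name.startsWith('"') || name.endsWith('"') -> "alias must not be quoted"
    !name.startsWith('$') -> "alias must start with a dollar sign"
    else -> null
}
```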

Alias and RuleSet resolution

After the FSH items are parsed into the FSH item data structures, any aliases and inserts of reusable RuleSets are resolved.

First, wherever the name of an Alias (e.g. $SCT) appears in a rule value, the alias name is replaced by the alias value.
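Alias replacement of this kind can be sketched as a string substitution over the rule value. This is an illustration under assumptions, not the BabelFSH implementation: the function name resolveAliases is hypothetical, and matching alias names as whole tokens (so that $SCT does not match inside $SCT_DE) is an assumption.

```kotlin
// Replace every alias name that occurs in a rule value with the alias value.
// Assumption: a name only matches when it is not followed by another name character.
fun resolveAliases(ruleValue: String, aliases: Map<String, String>): String =
    aliases.entries.fold(ruleValue) { value, (name, target) ->
        // Regex.escape keeps the leading '$' from being read as a regex anchor.
        val token = Regex(Regex.escape(name) + """(?![A-Za-z0-9_\-])""")
        value.replace(token, Regex.escapeReplacement(target))
    }
```

For example, with $SCT mapped to http://snomed.info/sct, the value $SCT#123 becomes http://snomed.info/sct#123.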

After Alias resolution, RuleSets are applied. RuleSets can themselves reference other RuleSets, so the insertions within RuleSets are resolved first. To make it clear where an error actually occurred in the FSH declaration, the context in which each rule was originally declared is kept in the data structure, so that error messages remain meaningful.

To ensure correct and meaningful RuleSet resolution, the inserts are added to a graph data structure in FshParseResult.resolveInsertsAndParameters. The graph must be acyclic.
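The acyclicity requirement and the "dependencies first" order can be illustrated with a plain depth-first search. This sketch does not use the graph data structure BabelFSH relies on; the function name resolutionOrder and the map-based graph representation are assumptions made for the example.

```kotlin
// Insert graph: RuleSet name -> names of the RuleSets it inserts.
// Returns an order in which inserted RuleSets come before the RuleSets that
// insert them, or throws IllegalStateException if the graph contains a cycle.
fun resolutionOrder(inserts: Map<String, List<String>>): List<String> {
    val done = LinkedHashSet<String>()      // fully resolved, in resolution order
    val inProgress = mutableSetOf<String>() // nodes on the current DFS path

    fun visit(name: String, path: List<String>) {
        if (name in done) return
        // add returns false if the node is already on the current path: a cycle.
        check(inProgress.add(name)) {
            "Cyclic RuleSet insertion: ${(path + name).joinToString(" -> ")}"
        }
        inserts[name].orEmpty().forEach { visit(it, path + name) }
        inProgress.remove(name)
        done.add(name)
    }

    inserts.keys.forEach { visit(it, emptyList()) }
    return done.toList()
}
```

Reporting the full path in the error message serves the same purpose as keeping the original declaration context: the user can see which chain of inserts caused the problem.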

While walking this graph, the rules of each referenced RuleSet are inserted in place of the corresponding insert rule.

The insert code also resolves parametrized RuleSets, as in the example. Parameters are referenced in curly brackets, and replaced with the respective value. This is done using a string replacement in the rule value.
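The string replacement for parameters can be sketched like this. The function name substituteParameters is hypothetical, and both the allowed parameter name characters and the choice to leave unknown placeholders untouched are assumptions.

```kotlin
// Replace each {parameter} placeholder in a rule value with its argument.
// Assumption: placeholders without a matching argument are left untouched.
fun substituteParameters(ruleValue: String, arguments: Map<String, String>): String =
    Regex("""\{\s*([A-Za-z0-9_\-]+)\s*\}""").replace(ruleValue) { match ->
        arguments[match.groupValues[1]] ?: match.value
    }
```

Called with the value * ^version = "{version}" and the argument version = 2019, the sketch returns * ^version = "2019".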

Importantly, every FSH item that is ultimately converted to a FHIR resource on disk needs exactly one "terminology comment", i.e. a multi-line comment that starts with the recognition token, identifies the plugin ID, and then provides the needed command line arguments for the plugin. This comment can be present in the resource declaration itself (e.g. the CodeSystem `Bar`), or can be brought in through a RuleSet further down the insertion chain. In combination with parametrized RuleSets, which also work with the terminology comments, this can greatly simplify converting multiple resources in the same BabelFSH source file.

Consider this example, used for providing the German Alpha-ID-SE resource (truncated after the 2019 version, but otherwise exactly as the one used in the SU-TermServ project):

With the use of those parametrized RuleSets, which in turn reference other RuleSets, each CodeSystem declaration can be kept to the absolute minimum of source code lines, allowing very quick conversion of new versions.

Resource Conversion

Now that the RuleSets and Alias inserts are resolved, each resulting FSH item is converted to a FHIR resource through the respective ResourceFactory class.

To be compatible with both R4(B) and R5, the resources are converted not to HAPI FHIR structures directly, but rather to version-independent proxy classes and later to JSON resources. The relationships between the classes are shown here:

UML Class Diagram for ResourceFactory and implementing classes

Terminology Plugin Parsing

Using a supplemental grammar, the command line in each terminology plugin comment is parsed at a high level to ensure a minimum of syntactic compliance. This is independent of the arguments declared in each plugin, and merely enforces that the command line "looks like" a Unix command line (single and double dashes, equals signs, quoted strings etc.).

This is accomplished in the combined Parser/Lexer grammar BabelFSHCommandLine. Importantly, this is also where the plugin ID is captured, which needs to be a primitive lower-case string conforming to /[a-z-]+/.
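The plugin ID constraint can be illustrated without the grammar. This sketch is not the BabelFSHCommandLine grammar: the function name pluginIdOf is hypothetical, and treating the first whitespace-separated token of the comment content as the plugin ID is an assumption.

```kotlin
// Pattern quoted in the text: lower-case letters and hyphens only.
val pluginIdPattern = Regex("[a-z-]+")

// Hypothetical helper: take the first whitespace-separated token of the comment
// content as the plugin ID and return it only if it matches the pattern.
fun pluginIdOf(commentContent: String): String? =
    commentContent.trim()
        .split(Regex("\\s+"))
        .firstOrNull()
        ?.takeIf { pluginIdPattern.matches(it) }
```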

CodeSystemFactory

In createResourceWithComment, the FSH rules are first serialized into a JSON data structure and validated, without the terminology content (see here).

Next, the plugin is called to add the content to the metadata resource. It first parses the command line, using its declared command line arguments, into a type-safe data structure. Then the produceContent method is called to produce the entries (concepts for a CS).

Lastly, the factory validates some semantics (CS properties that are used must be declared or implicit, properties that are declared should be used, etc.).
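The property checks amount to two set differences. The following sketch uses hypothetical names (PropertyCheck, checkProperties) and leaves open which property codes count as implicit; treating undeclared properties as errors and unused ones as warnings is an interpretation of "must" and "should" above, not confirmed behaviour.

```kotlin
// Result of comparing declared CodeSystem properties with those used by concepts.
data class PropertyCheck(val undeclared: Set<String>, val unused: Set<String>)

// Assumption: the caller supplies the set of implicit property codes.
fun checkProperties(
    declared: Set<String>,
    used: Set<String>,
    implicit: Set<String> = emptySet()
) = PropertyCheck(
    undeclared = used - declared - implicit, // used but neither declared nor implicit
    unused = declared - used                 // declared but never used
)
```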

Resource Serialization

The resource containers that are generated through the factories are finally encoded to JSON and written to disk. This is accomplished using HAPI FHIR with two FhirContext instances, to benefit from the serialization and validation facilities this library provides. To ensure that the correct HAPI classes are used, each resource container that is encoded to FHIR implements the interface SerializableToFhir, which declares two methods, toR4B() and toR5().
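The idea behind the two conversion methods can be sketched without a HAPI dependency. Everything here is a stand-in: the interface is made generic only so the example is self-contained, plain strings replace the version-specific HAPI resource classes, and the names SerializableToFhirSketch, toVersion and CodeSystemSketch are hypothetical.

```kotlin
enum class ReleaseVersion { R4B, R5 }

// Sketch of the interface; T4B and T5 stand in for the version-specific HAPI classes.
interface SerializableToFhirSketch<T4B : Any, T5 : Any> {
    fun toR4B(): T4B
    fun toR5(): T5
}

// Pick the conversion that matches the requested FHIR version.
fun <A : Any, B : Any> SerializableToFhirSketch<A, B>.toVersion(version: ReleaseVersion): Any =
    when (version) {
        ReleaseVersion.R4B -> toR4B()
        ReleaseVersion.R5 -> toR5()
    }

// Toy container: plain strings instead of HAPI resources.
data class CodeSystemSketch(val id: String) : SerializableToFhirSketch<String, String> {
    override fun toR4B() = "R4B CodeSystem/$id"
    override fun toR5() = "R5 CodeSystem/$id"
}
```

Keeping both conversions on the container means the rest of the pipeline never has to know which HAPI package is in use; only the final encoding step selects a version.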

Most implementations of this interface are almost identical (only the packages of the classes used differ), but for ConceptMap (CM) in particular, the two implementations differ considerably.

Finally, back in BabelfshApp.runFromFsh, the resources are written to disk, with the filenames being auto-generated from the resource type and resource ID (enforced to be present).
