CliSchema Language Reference
The language reference for CliSchema.
Caution
This reference is not yet finished, and neither is the underlying implementation. Notable aspects not yet designed include program presence resolution and effect definitions as part of the schema language.
Metadata
Each schema describes a single program. Its metadata identifies that program, the version of the schema itself, and the schema's authors.
program_name
The unique name of the program used as the command name when invoking the program on the command line.
schema_version
The version of the schema itself using the semver standard.
authors
The list of the schema's authors, each given as a full name and email address in RFC 5322 format, for example “Full Name <email@address.com>”.
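As a sketch of how a tool might check these three fields, the code below validates a hypothetical dict form of the metadata. The dict layout and the simplified semver pattern are assumptions; `email.utils.parseaddr` is the Python standard library's RFC 5322-style address splitter.

```python
import re
from email.utils import parseaddr

# Simplified SemVer 2.0.0 shape: MAJOR.MINOR.PATCH plus optional
# pre-release and build suffixes (leading-zero rules are not enforced).
SEMVER = re.compile(r"^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$")


def validate_metadata(meta: dict) -> list:
    """Return the problems found in a hypothetical dict of schema metadata."""
    problems = []
    if not meta.get("program_name"):
        problems.append("program_name is missing")
    if not SEMVER.match(meta.get("schema_version", "")):
        problems.append("schema_version is not a semver string")
    for author in meta.get("authors", []):
        # parseaddr splits "Full Name <user@host>" into (name, address)
        name, address = parseaddr(author)
        if not name or "@" not in address:
            problems.append(f"author {author!r} is not 'Full Name <email>'")
    return problems
```

For example, `validate_metadata({"program_name": "foo", "schema_version": "1.2.0", "authors": ["Jane Doe <jane@example.com>"]})` returns an empty list, while a bare address without a name is reported.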
Invocation Fundamentals
CliSchema supports the mainstream invocation patterns used by most programs: modern CLI tools have converged on a common structure, which CliSchema exposes. An invocation has three essential components: commands, arguments and options. The following example illustrates each one:
```
git push -d origin fix/typo
    │    │  │      │
    │    │  │      └─── Argument_2
    │    │  └────────── Argument_1
    │    └───────────── Option
    └────────────────── (Sub)Command
```
- git: the program name, which also refers to the root invocation.
- push: a sub-command that groups push-related functionality under a single command.
- -d: a short option, denoted with a single dash (-), whose presence sets the “delete” option to true.
- origin and fix/typo: two string arguments that name the remote destination and the branch, respectively.
The governing parsing pattern can be summarized as follows:
- Commands are usually single words appearing at the start of the argument list.
- Options are optional by default, can appear anywhere in the argument list, and are identified by a leading single or double dash.
- Arguments are positional, meaning their order of appearance matters. Unless specified otherwise, arguments are also optional by default.
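The three rules above can be sketched as a tiny token classifier. This is a hypothetical illustration, not the CliSchema parser: it does not know option arities, so the value of an option such as `--format x` would be mislabelled as an argument.

```python
def classify(argv: list, subcommands: set) -> list:
    """Label each token after the program name as command, option or argument.

    Simplified: sub-commands are only recognised before the first positional
    argument, a lone "-" is treated as an argument (the stdin/stdout
    convention), and the "--" end-of-options marker is not handled.
    """
    labels = []
    seen_argument = False
    for token in argv[1:]:  # argv[0] is the program name, i.e. the root command
        if token.startswith("-") and token != "-":
            labels.append(("option", token))
        elif not seen_argument and token in subcommands:
            labels.append(("command", token))
        else:
            seen_argument = True
            labels.append(("argument", token))
    return labels
```

Applied to the git example, `classify(["git", "push", "-d", "origin", "fix/typo"], {"push", "status"})` labels push as a command, -d as an option, and the last two tokens as arguments.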
Schema Settings
The schema settings define a set of globally tunable rules that affect the parsing logic. For example, some programs use an isolated single dash (-) to signal that data is read from stdin or written to stdout. The following (incomplete) list of schema settings extends the parsing capabilities of the language, mostly to support conventions adopted from legacy tools.
| Setting | Description |
|---|---|
| Arguments Defined Last (default: false) | By default, positional arguments can be interleaved with options. When true, positional arguments must be specified last, and any interleaving is a validation error. |
| DashLess Available (default: false) | By default, options must be specified with a leading dash. When true, options may omit the dash entirely, enabling the style popularized by tar. |
| Enforce Single Dash Long Options (default: false) | By default, long options must be specified with a double dash. When true, long options can also be set using a single dash. |
| Allow Clustering Options (default: false) | When true, boolean options can be clustered together, and the last option in a cluster may take a value. For example, tar -c -f backup.tar /home/user can be written as tar cf backup.tar /home/user (together with DashLess Available). |
| Allow Adjacent Options (default: false) | When true, an option's value can be written immediately after the option name with no separator. For example, this enables gcc -I./my/include. |
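To make Allow Clustering Options and Allow Adjacent Options concrete, the sketch below shows one way a parser could expand such tokens. The function names and the split into `flags` (Bool options) and `valued` (options taking a value) are hypothetical; the rule that only the last option in a cluster may take a value comes from the table.

```python
from typing import Optional, Tuple


def expand_cluster(token: str, flags: set, valued: set) -> list:
    """Expand a cluster such as "-cf" (or "cf" with dash-less options).

    flags holds short names of Bool options; valued holds short names of
    options that take a value, which may only appear last in the cluster.
    """
    letters = token.lstrip("-")
    expanded = []
    for i, ch in enumerate(letters):
        is_last = i == len(letters) - 1
        if ch in flags or (ch in valued and is_last):
            expanded.append("-" + ch)  # a valued option reads the next token
        else:
            raise ValueError(f"{ch!r} cannot appear at this point in {token!r}")
    return expanded


def split_adjacent(token: str, valued: set) -> Optional[Tuple[str, str]]:
    """Split "-I./my/include" into ("-I", "./my/include") if I takes a value."""
    if token.startswith("-") and not token.startswith("--") and len(token) > 2:
        name, value = token[1], token[2:]
        if name in valued:
            return "-" + name, value
    return None
```

With tar's flags, `expand_cluster("cf", {"c", "x", "v"}, {"f"})` yields `["-c", "-f"]`, after which backup.tar is consumed as the value of -f.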
Members
The following section covers the first-class members of CliSchema and shows how their organization makes constructing argument parser logic intuitive.
Commands
Commands define the entry point of any invocation and a logical grouping of options and arguments. They answer the question: what functionality or sub-functionality is being invoked? By default, every invocation contains the root command, which is simply the invocation of the program. Options and arguments available to the root invocation are in “global” scope, meaning they are valid for all sub-commands unless specifically opted out. Each sub-command creates its own scope; for example, git push invokes the “push” sub-command, which has a different family of options and arguments than the git status command.
Commands are specified using the cmd keyword.
Schema Entry
Each schema is required to have exactly one command entry point named “Main”, which references the program_name to use. Any effects attached to Main refer to the canonical invocation of the program without any options or arguments. The following example shows the entry point for a program called foo.
```
cmd Main("foo") [] {}
```
Sub-Commands
The entry point or any other command definition can specify sub-commands, resulting in a tree-like hierarchy of commands.
```
cmd Main("git") [
    subcommands GitPush & GitStatus
] {}

cmd GitPush("push") {}
cmd GitStatus("status") {}
```
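The tree of commands above can be walked to find which command an invocation targets. This sketch uses a hypothetical nested-dict form of the tree, keyed by the names users type (push) rather than the schema identifiers (GitPush), and follows matching tokens as deep as they go.

```python
# Hypothetical nested-dict form of the tree above, keyed by the command
# names users type ("push"), not the schema identifiers (GitPush).
GIT_TREE = {"git": {"push": {}, "status": {}}}


def resolve_command(argv: list, tree: dict) -> tuple:
    """Follow argv down the command tree.

    Returns (command path, remaining tokens). Options placed before a
    sub-command are not skipped in this simplified version.
    """
    path = [argv[0]]
    children = tree[argv[0]]
    rest = argv[1:]
    while rest and rest[0] in children:
        path.append(rest[0])
        children = children[rest[0]]
        rest = rest[1:]
    return path, rest
```

`resolve_command(["git", "push", "origin", "main"], GIT_TREE)` returns `(["git", "push"], ["origin", "main"])`; the remaining tokens are then parsed against the push scope.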
Arguments
Arguments in an invocation provide the program with the data required to carry out its core functionality. They tend not to be optional, as they are usually the “subject” of the program. They are identified by their position in the argument list. For example, rm foo.txt bar.txt passes two arguments to the rm command.
```
cmd Main("rm") [
    requires RemoveFiles
] {
    /// Files to be removed
    arg RemoveFiles(..) => List<Path>
}
```
Arguments are defined with the arg keyword and given a unique name, here “RemoveFiles”. Other members can reference this name; for example, Main declares that the argument is required. The definition also takes a positional range specifier: (..) indicates that all passed arguments are bound to RemoveFiles. The argument has the data type List<Path>, since it spans a range of path values. Both List and Path are types from the CliSchema standard library.
Argument ranges must not overlap and must not leave gaps. For example, the cp command requires at least one source file and exactly one destination directory. All arguments except the last are source files, as denoted by the range specifier (..-1), and the destination is the last argument, specified with (-1).
```
cmd Main("cp") [
    requires Sources & Destination
] {
    /// Source files or directories to copy
    arg Sources(..-1) => List<Path>
    /// Destination directory
    arg Destination(-1) => Path
}
```
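The range rules above (no overlaps, no gaps) can be checked while binding tokens. This is a sketch under an assumed encoding of the specifiers, with Python slice semantics for ranges; the function name and dict layout are hypothetical, and minimum counts such as "at least one source" are not enforced here.

```python
def resolve_positionals(args: list, specs: dict) -> dict:
    """Bind positional tokens to named arguments and check the ranges.

    Assumed encoding: an int is a single position (negative counts from the
    end, so (-1) becomes -1), and a (start, stop) pair uses Python slice
    semantics with None for an open end, so (..) is (None, None) and (..-1)
    is (None, -1).
    """
    n = len(args)
    owner = [None] * n  # which argument claimed each position
    bound = {}
    for name, spec in specs.items():
        single = isinstance(spec, int)
        if single:
            index = spec + n if spec < 0 else spec
            indices = [index] if 0 <= index < n else []
        else:
            indices = list(range(n))[slice(*spec)]
        for i in indices:
            if owner[i] is not None:
                raise ValueError(f"{name} overlaps {owner[i]} at position {i}")
            owner[i] = name
        values = [args[i] for i in indices]
        bound[name] = (values[0] if values else None) if single else values
    gaps = [i for i, claimed in enumerate(owner) if claimed is None]
    if gaps:
        raise ValueError(f"positions {gaps} are not bound to any argument")
    return bound
```

For cp, `resolve_positionals(["a.txt", "b.txt", "backup/"], {"Sources": (None, -1), "Destination": -1})` binds the first two tokens to Sources and backup/ to Destination.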
Options
Options provide configuration for a particular invocation. For example, ls -h enables the human-readable option of the ls command. Options are identified by a short name or a long name; ls, for example, also has the long name “--human-readable” for this option. By default, single-character names are treated as short names and matched with a single dash, while multi-character names are matched with a leading double dash. The schema settings further govern how options are parsed.
```
cmd Main("ls") {
    /// with -l and -s, print sizes in human readable form (e.g. 1K 234M 2G)
    opt HumanReadable("h", "human-readable") => Bool
    /// do not list implied entries matching shell PATTERN (overridden by -a or -A)
    opt Hide("hide") => String
}
```
Each option is tied to a specific type, which provides early validation of static invocations. For example, the Bool type associated with the HumanReadable option indicates that the option acts as a binary flag: no additional data can be associated with it. Thus, the invocation ls -h=always does not type check against Bool.
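The default name matching and the Bool check can be sketched together. `OptSpec` and `match_option` are hypothetical names, and only the inline `=value` form is modelled; a String option without one would take its value from the next token.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class OptSpec:
    name: str               # schema identifier, e.g. "HumanReadable"
    names: Tuple[str, ...]  # e.g. ("h", "human-readable")
    type_name: str          # only "Bool" and "String" are modelled here


def match_option(token: str, specs: list) -> Tuple[OptSpec, Optional[str]]:
    """Match "-h", "--human-readable" or "--hide=PATTERN" to an option spec.

    Single-character names match with "-", longer names with "--". Returns
    the spec and any inline "=value"; a String option without one would
    take its value from the next token (not shown).
    """
    body, sep, value = token.partition("=")
    inline = value if sep else None
    for spec in specs:
        for n in spec.names:
            prefix = "-" if len(n) == 1 else "--"
            if body == prefix + n:
                if spec.type_name == "Bool" and inline is not None:
                    raise TypeError(f"{token!r}: {spec.name} is Bool and takes no value")
                return spec, inline
    raise KeyError(f"unknown option {token!r}")
```

With specs for HumanReadable and Hide, `-h` and `--human-readable` both match HumanReadable, `--hide=*.o` yields the inline value `*.o`, and `-h=always` raises a TypeError.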
Environment Variables
Just as programs rely on arguments passed on the command line, they might also rely on environment variables being set. For this reason, environment variables are first-class members. For example, the aws s3 ls invocation depends on AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
```
cmd Main("aws") [
    subcommands S3
] {
    // ...
}

cmd S3("s3") [
    subcommands List
] {
    // ...
}

cmd List("ls") {
    env AccessKeyID("AWS_ACCESS_KEY_ID") => String
    env AccessKeySecret("AWS_SECRET_ACCESS_KEY") => String
}
```
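A consumer of the schema could check the declared variables before running the program. This sketch treats every declared variable as required, which is an assumption (the section does not say whether env members can be optional); the mapping of schema identifiers to variable names mirrors the ls command above.

```python
import os
from typing import Mapping, Optional

# Variables declared on the ls command of the aws example above,
# keyed by schema identifier.
AWS_S3_LS_ENV = {
    "AccessKeyID": "AWS_ACCESS_KEY_ID",
    "AccessKeySecret": "AWS_SECRET_ACCESS_KEY",
}


def missing_env(declared: dict, environ: Optional[Mapping] = None) -> list:
    """Return the identifiers of declared variables that are not set.

    Treats every declared variable as required; defaults to os.environ.
    """
    env = os.environ if environ is None else environ
    return [name for name, var in declared.items() if var not in env]
```

Passing an explicit mapping keeps the check testable; with only AWS_ACCESS_KEY_ID set, the result is `["AccessKeySecret"]`.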
Types
Types play a central role in representing data in CliSchema. They constrain the space of possible inputs across all other entities and are fully reusable across members of the schema. Types can be referenced directly by options, arguments and environment variables.
Primitive Types
| Name | Data Representation |
|---|---|
| Bool | Single-bit boolean |
| Int64 | 64-bit signed integer |
| Float64 | 64-bit floating-point number |
| String | Heap-allocated string buffer |
| Path | Heap-allocated string with path format validation |
Collections
| Name | Data Representation |
|---|---|
| List<T> | Heap-allocated resizable vector of type T |
| T \| T \| T ... | Arbitrarily large enumeration of type T, allocated on the heap |
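The sketch below shows how raw command-line strings might be converted into the primitive types and checked against an enumeration. The function names are hypothetical, Bool is omitted because an option of that type is set by presence alone, and the Path check (non-empty, no NUL byte) is a placeholder assumption rather than CliSchema's actual path validation.

```python
from pathlib import PurePath

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def parse_value(type_name: str, raw: str):
    """Convert one command-line string into a value of a primitive type.

    Bool is left out because a Bool option is set by its presence alone.
    The Path check (non-empty, no NUL byte) is a placeholder assumption.
    """
    if type_name == "Int64":
        value = int(raw)
        if not INT64_MIN <= value <= INT64_MAX:
            raise ValueError(f"{raw!r} does not fit in a signed 64-bit integer")
        return value
    if type_name == "Float64":
        return float(raw)  # Python floats are IEEE 754 doubles
    if type_name == "String":
        return raw
    if type_name == "Path":
        if not raw or "\0" in raw:
            raise ValueError(f"{raw!r} is not a valid path")
        return PurePath(raw)
    raise KeyError(f"unknown primitive type {type_name!r}")


def parse_enum(variants: tuple, raw: str) -> str:
    """Accept raw only if it is one of the enumeration's variants."""
    if raw not in variants:
        raise ValueError(f"{raw!r} is not one of {variants}")
    return raw
```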
Refinements
Types can have arbitrary refinement functions attached to them for compile-time validation. The most basic kind of refinement applies to enums and specifies a default value to use when no value is passed to the option.
```
type WhenSpecifier => "always" | "auto" | "never"

cmd Main("ls") {
    /// color the output WHEN
    opt Color("color") => WhenSpecifier [
        @default("always")
    ]
}
```
As another example, refinements can be used to narrow the range of numeric values:
```
cmd Main("foo") {
    // privileged ports on Unix-like systems are those below 1024
    opt PrivilegedPort("p", "port") => Int64 [
        @min(1)
        @max(1023)
    ]
}
```
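Both kinds of refinement can be sketched as a small post-processing step on an option value. Here `raw` is None when the option is given without a value, and the refinements are passed as a hypothetical dict rather than as CliSchema annotations; the bounds are treated as inclusive, which is an assumption.

```python
from typing import Any, Callable, Optional


def refine(raw: Optional[str], parse: Callable[[str], Any], refinements: dict) -> Any:
    """Apply @default, @min and @max to one option value.

    raw is None when the option is given without a value. refinements is a
    hypothetical dict form of the annotations, e.g. {"default": "always"}
    or {"min": 1, "max": 1023}; bounds are treated as inclusive.
    """
    if raw is None:
        if "default" not in refinements:
            raise ValueError("a value is required and no @default is set")
        raw = refinements["default"]
    value = parse(raw)
    if "min" in refinements and value < refinements["min"]:
        raise ValueError(f"{value} is below @min({refinements['min']})")
    if "max" in refinements and value > refinements["max"]:
        raise ValueError(f"{value} is above @max({refinements['max']})")
    return value
```

`refine(None, str, {"default": "always"})` returns "always", mirroring ls --color with no WHEN, and `refine("8080", int, {"min": 1, "max": 1023})` is rejected.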
The following refinements could be added to provide better out-of-the-box support for type-level validation:
- Email (RFC 5322)
- Domain (RFC 1035)
- Url (RFC 3986)
- IPv4 (RFC 791)
- IPv6 (RFC 8200)
- MacAddress (IEEE 802)
- Port (constrained u16: 1-65535)
- Cidr (RFC 4632)
- Date (ISO 8601)
- DateTime (ISO 8601/RFC 3339)
- Duration (ISO 8601)
- Time (time without date)
- Timezone (IANA timezone)
- Uuid (RFC 4122)
- Base64 (RFC 4648)
- Semver (semver.org)
- MimeType (RFC 2046)
- Currency (ISO 4217)
- CountryCode (ISO 3166)
- LanguageTag (BCP 47)
- Isbn (ISBN-10/13)
- Issn (ISSN)
- Jwt (RFC 7519)
- GitHash (SHA-1/SHA-256)
- Json (RFC 8259)
- Yaml (RFC 9512)
- Regex
- DockerTag
- PackageVersion
- Toml
- Csv
Dependent Types
Warning
This feature is actively being considered. While dependent types are a powerful concept, they risk needlessly complicating the interface surface.
Effects
Invocations of programs can either deposit side effects onto the host or require certain effects to be present before the program is invoked. Effects can be attached to members, signaling that whenever an invocation includes that member, the effect is either required beforehand or established afterwards. For example, the rm command requires that all files passed as arguments exist before the program executes (precondition) and, upon successful execution, removes those files, ensuring that they no longer exist (postcondition). In this case, every argument has a precondition of FileExists and a postcondition of FileNotExists.
```
cmd Main("rm") [
    requires RemoveFiles
] {
    /// files to be removed
    arg RemoveFiles(..) => List<Path> [
        requires FileExists
        ensures FileNotExists
    ]
}
```
An effect such as FileExists is tied to a specific type; in this case, it can only be associated with List<Path> or Path. Furthermore, effects can be inverses of one another: FileExists is the negation of FileNotExists and vice versa. Effects play a key role in systems that consume CliSchema, allowing them to reason statically about effects on the host OS.
| Effect Name | Associated Type |
|---|---|
| FileExists | Path |
| FileNotExists | Path |
| FileReadable | Path |
| FileNotReadable | Path |
| FileWriteable | Path |
| FileNotWriteable | Path |
| FileExecutable | Path |
| FileNotExecutable | Path |
| DirExists | Path |
| DirNotExists | Path |
| DirReadable | Path |
| DirNotReadable | Path |
| DirWriteable | Path |
| DirNotWriteable | Path |
| EnvVarSet | String |
| EnvVarUnSet | String |
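As a sketch of the static reasoning described above, the code below replays invocations over an abstract set of existing files, modelling only FileExists and FileNotExists. The step tuple layout and function names are hypothetical. Representing the host as a set makes the two effects inverses automatically: adding a path makes FileExists true and FileNotExists false.

```python
def holds(effect: str, path: str, files: set) -> bool:
    """Evaluate a Path effect against an abstract set of existing files."""
    if effect == "FileExists":
        return path in files
    if effect == "FileNotExists":
        return path not in files
    raise KeyError(f"effect {effect!r} is not modelled")


def simulate(steps: list, files: set) -> set:
    """Replay invocations statically and return the resulting file set.

    Each step is (paths, requires, ensures) for a Path or List<Path> member.
    All preconditions are checked before any postcondition is applied.
    """
    files = set(files)  # do not mutate the caller's set
    for paths, requires, ensures in steps:
        for path in paths:
            for effect in requires:
                if not holds(effect, path, files):
                    raise ValueError(f"precondition {effect} fails for {path}")
        for path in paths:
            for effect in ensures:
                if effect == "FileExists":
                    files.add(path)
                elif effect == "FileNotExists":
                    files.discard(path)
                else:
                    raise KeyError(f"effect {effect!r} is not modelled")
    return files
```

Replaying `rm a.txt b.txt` once leaves only the untouched files; replaying it twice fails, because FileExists no longer holds for a.txt on the second run.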
Dependencies and Exclusions
Programs often have options that conflict with or depend on one another. In the case of conflicts, some Unix/POSIX conventions simply honor the option specified last. For example, git log --oneline --format="%H" displays commits using the specified format and ignores the --oneline option. If such silent overrides are not desired, explicit exclusions can be declared so that validation reports an error.
```
cmd Main("git") [
    subcommands Log
] {}

cmd Log("log") {
    opt SingleLine("oneline") => Bool [
        excludes Format
    ]
    opt Format("format") => String
}
```
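Checking an excludes clause during validation amounts to looking for declared pairs that are both present. In this sketch, the set of present options and the dict form of the exclusions are hypothetical representations keyed by schema identifiers.

```python
def find_conflicts(present: set, excludes: dict) -> list:
    """Return every (option, excluded option) pair used together.

    excludes maps an option identifier to the identifiers it excludes,
    mirroring the excludes clause in the schema.
    """
    conflicts = []
    for option, excluded in excludes.items():
        if option in present:
            conflicts.extend((option, other) for other in sorted(excluded & present))
    return conflicts
```

An invocation that sets both SingleLine and Format yields `[("SingleLine", "Format")]`, which a validator would report as an error instead of silently letting the last option win.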
Cancellations
Warning
This feature has not yet been fully designed. Effects should have the option of canceling out other effects.
Scopes
Members are visible based on the scope of the command that they are tied to.
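A minimal sketch of scope resolution, following the rule from the Commands section that root members are global unless a sub-command opts out. The scope table, the NoPager member and the opt_out field are all hypothetical, since the syntax for opting out is not specified here; whether members of intermediate commands propagate to their children is also left open, so only root members are inherited.

```python
# Hypothetical scope table: members declared directly on each command, plus
# a hypothetical opt-out set naming root members a sub-command rejects.
GIT_SCOPES = {
    "git": {"members": {"NoPager"}, "opt_out": set()},
    "log": {"members": {"SingleLine", "Format"}, "opt_out": set()},
    "status": {"members": set(), "opt_out": {"NoPager"}},
}


def visible_members(command: str, scopes: dict, root: str) -> set:
    """Members a command can use: its own plus the root's global members."""
    node = scopes[command]
    visible = set(node["members"])
    if command != root:
        visible |= scopes[root]["members"] - node["opt_out"]
    return visible
```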