SUMMARY

   Wordgen is, mathematically speaking, a sentence generator for
	arbitrary annotated context-free grammars, with an additional
	transformation step that may or may not (I haven't yet proven
	it) allow a subset of context-sensitive grammars to be expressed.

   It has been primarily designed for linguistic use, and the
	fullest use of its features is in syllables.yml, the datafile
	for Firen. sajemtan.yml and ffb.yml are similar but much less
	complex. english.yml takes a distinctly different approach, and
	makes use of different features than the others, so it is worth
	a look even though it is not complete. CFGs.yml is the main
	'playground' datafile, while recursive.yml contains a number of
	stress tests. cmap.yml is a simple fuzzer for a different
	project. numbers.yml is a very simple file meant to generate a
	variety of interesting numbers along with reasonably natural
	English renderings, but it hasn't been touched in years.

DATAFILE SYNTAX

   wordgen datafiles use YAML 1.2, rather than the more standard
	format for grammars, EBNF, because wordgen makes extensive use
	of annotations which would be annoying to express in EBNF.
	However, conversions between EBNF and the datafile format are
	planned for a future version.

   A datafile is structured as a number of optional special
	structures, and several "nodes", which are equivalent to
	nonterminals in a more traditional CFG paradigm. Each node is a
	sequence of alternatives, and each alternative is a collection of
	"channels", of which three are given special meaning:

   "val" is the structure channel, which contains the actual CFG.
	In addition to the templatization which applies to other
	channels, val-strings have a particular format, which is
	detailed later. If not present, it defaults to the empty string.

   "freq" is the weighting channel, which controls how wordgen
	selects from the alternatives. Its data can be either a
	floating-point number, or a templatized string which evaluates
	to a number. If not present, it defaults to 1.

   "path" is a reserved internal channel, used for representing the
	parse tree of a generated sentence. It is an error to mention it
	in a datafile.

   Additionally, the "ipa" channel has an abbreviated print flag,
	"-p", and, like "val", it accepts an alternate replacement
	description syntax kept for backwards compatibility. Otherwise,
	"ipa" is not treated specially by the program.

   All other channel names are available to the user. Their type is
	a templatized string with no particular restrictions.
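
   As an illustration only, the channel defaults above can be
	sketched in Python, assuming the datafile has already been loaded
	into plain lists and dicts by a YAML 1.2 parser. The function
	normalize_node is hypothetical and not part of wordgen, and a
	templatized "freq" string would still need to be evaluated later.

```python
# Hypothetical sketch: apply the documented channel defaults to one node.
RESERVED_CHANNELS = {"path"}  # internal parse-tree channel; an error in datafiles


def normalize_node(name, alternatives):
    """Return a copy of a node's alternatives with the channel defaults applied."""
    normalized = []
    for index, alt in enumerate(alternatives):
        reserved = [ch for ch in alt if ch in RESERVED_CHANNELS]
        if reserved:
            raise ValueError(
                f"node {name!r}, alternative {index}: reserved channel(s) {reserved}"
            )
        channels = dict(alt)
        channels.setdefault("val", "")  # structure channel: defaults to ""
        channels.setdefault("freq", 1)  # weighting channel: defaults to 1
        # Any other channel (e.g. "ipa") is passed through untouched.
        normalized.append(channels)
    return normalized


alts = normalize_node("example", [{"val": "0{example}0"}, {"ipa": "x"}])
print(alts)  # [{'val': '0{example}0', 'freq': 1}, {'ipa': 'x', 'val': '', 'freq': 1}]
```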

VAL-STRING SYNTAX

   Val-strings use an interpolation syntax based on Python format
	strings, wherein a node name is enclosed in {}. To illustrate, a
	simple node is excerpted from CFGs.yml below.

```
binPalindrome:
  - val: ""
    freq: .1
  - val: "0"
    freq: .15
  - val: "1"
    freq: .15
  - val: "0{binPalindrome}0"
  - val: "1{binPalindrome}1"
```

   This node produces binary palindromes, that is, sequences of 0 and
	1 that read the same forwards and backwards. The first three
	alternatives do not recurse, and simply produce themselves. The
	latter two, however, contain "{binPalindrome}", which is a node
	reference ("noderef"); it is replaced by an expansion of the
	named node, which in this case makes the reference recursive. The
	special characters { and } can be escaped as either "\{ \}" or as
	"{{ }}". When a run of { or } has odd length, it is scanned from
	the left, every pair being collapsed into one literal character,
	and the last, unpaired one is interpreted. If different behavior
	is needed, use the unambiguous \ form.
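
   The escaping rules above can be sketched as a small tokenizer.
	This is a hypothetical Python illustration, not wordgen's own
	parser; it assumes that noderefs do not nest and that \ escapes
	only { and }. Following the rule above, a brace run such as "{{{"
	yields one literal { and then opens a noderef.

```python
# Hypothetical tokenizer for val-strings (assumes noderefs never nest).
def tokenize(val):
    """Split a val-string into ("text", s) and ("ref", body) tokens."""
    tokens = []
    buf = []        # characters collected for the current token
    in_ref = False  # True between an interpreted { and its matching }
    i = 0
    while i < len(val):
        ch = val[i]
        if ch == "\\" and i + 1 < len(val) and val[i + 1] in "{}":
            buf.append(val[i + 1])  # \{ or \} is always a literal brace
            i += 2
            continue
        if ch in "{}":
            # Measure the run of identical braces starting here.
            j = i
            while j < len(val) and val[j] == ch:
                j += 1
            run = j - i
            buf.append(ch * (run // 2))  # every pair collapses to one literal
            if run % 2:  # the leftover, unpaired brace is interpreted
                if ch == "{":
                    if in_ref:
                        raise ValueError("nested noderefs are not handled in this sketch")
                    tokens.append(("text", "".join(buf)))
                    in_ref = True
                else:
                    if not in_ref:
                        raise ValueError(f"unmatched }} at index {j - 1}")
                    tokens.append(("ref", "".join(buf)))
                    in_ref = False
                buf = []
            i = j
            continue
        buf.append(ch)
        i += 1
    if in_ref:
        raise ValueError("unterminated noderef")
    tokens.append(("text", "".join(buf)))
    return [t for t in tokens if t != ("text", "")]  # drop empty text tokens


print(tokenize("1{binPalindrome}1"))  # [('text', '1'), ('ref', 'binPalindrome'), ('text', '1')]
print(tokenize("{{{Dyck}"))           # [('text', '{'), ('ref', 'Dyck')]
```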

   Noderefs are not limited to this simple case, as the following
	example shows.

```
Dyck:
  - val: "{Dyck:.8 1 1.2}{Dyck:.8 1 1.2}"
  - val: "[{Dyck:1 .8 1.1}]"
  - val: ""
    freq: .1
```

   The noderef "{Dyck:.8 1 1.2}" contains an annotation called an
	"flist", short for "frequency list", which overrides the
	frequencies of the alternatives in the referenced node. An flist
	is simply a list of floating-point numbers separated by spaces. If
	the flist contains fewer values than there are alternatives, the
	remaining alternatives keep their original frequencies; excess
	values are ignored.
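
   A sketch of how an flist might be applied, together with weighted
	selection among the resulting alternatives, is shown below for
	the Dyck node. It is hypothetical Python: the node is hard-coded,
	only "{Dyck:...}" noderefs are recognized, and the depth cap that
	falls back to the empty alternative is an assumption loosely
	modeled on the maximum expansion depth, not wordgen's behavior.

```python
import random
import re

# Hypothetical in-memory copy of the Dyck node: (val, freq) pairs.
DYCK = [
    ("{Dyck:.8 1 1.2}{Dyck:.8 1 1.2}", 1.0),
    ("[{Dyck:1 .8 1.1}]", 1.0),
    ("", 0.1),
]
DYCK_REF = re.compile(r"\{Dyck:([^}]*)\}")  # captures the flist text


def apply_flist(freqs, flist):
    """Override leading frequencies; missing values keep the old ones, extras are ignored."""
    out = list(freqs)
    for i, f in enumerate(flist[: len(out)]):
        out[i] = f
    return out


def expand_dyck(flist=(), depth=0, rng=random, max_depth=40):
    """Expand the hard-coded Dyck node, honoring flists on recursive noderefs."""
    freqs = apply_flist([freq for _, freq in DYCK], list(flist))
    if depth >= max_depth:
        val = ""  # assumed depth cap: take the non-recursive alternative
    else:
        val = rng.choices([v for v, _ in DYCK], weights=freqs)[0]
    # re.split with one capture group alternates literal text and captured flists.
    parts = DYCK_REF.split(val)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2:
            sub_flist = [float(x) for x in part.split()]
            pieces.append(expand_dyck(sub_flist, depth + 1, rng, max_depth))
        else:
            pieces.append(part)
    return "".join(pieces)


rng = random.Random(1)
print([expand_dyck(rng=rng) for _ in range(5)])
```

   Because every alternative either adds a matched pair of brackets
	or recurses, each result is a balanced bracket string regardless
	of the random choices made.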

   Another frequency control mechanism is the "ilist", which is used
	to select only certain alternatives from a node, and optionally
	override their normal frequencies. A simple example is
	"{Cons/Start!0:.5 3}", which refers to either the first or the
	fourth alternative of the node "Cons/Start", using .5 as the
	frequency of the first alternative, and the regular frequency of
	the fourth.
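
   The ilist example can be sketched the same way. The Python below
	is illustrative only: apply_ilist is a hypothetical helper, the
	"Cons/Start" alternatives are made up, and indices are taken as
	0-based, matching the example above in which 0 names the first
	alternative and 3 the fourth.

```python
import random

# Made-up alternatives for a hypothetical "Cons/Start" node: (val, freq).
CONS_START = [("p", 1.0), ("t", 2.0), ("k", 1.5), ("s", 0.7)]


def apply_ilist(alternatives, ilist):
    """Keep only the listed alternatives, optionally overriding their frequencies."""
    selected = []
    for item in ilist.split():
        index, sep, freq = item.partition(":")
        val, base_freq = alternatives[int(index)]  # 0-based index (assumed)
        selected.append((val, float(freq) if sep else base_freq))
    return selected


choices = apply_ilist(CONS_START, "0:.5 3")  # as in {Cons/Start!0:.5 3}
print(choices)  # [('p', 0.5), ('s', 0.7)]
vals, weights = zip(*choices)
print(random.choices(vals, weights=weights, k=10))
```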

   The frequency control mechanisms are intended to reduce
	duplication of alternatives between related nodes that differ only
	in frequencies, or that use different subsets of the same list of
	alternatives.

   The full syntax of a noderef is (in datafile format):

```
NodeRef:
  - val: '\{{text}{args}{NRSuf}\}'
args:
  - val: ""
  - val: "|{text}{args}"
NRSuf:
  - val: ""
  - val: ":{flist}"
  - val: "!{ilist}"
flist:
  - val: "{float}"
  - val: "{float} {flist}"
ilist:
  - val: "{number}"
  - val: "{number}:{float}"
  - val: "{number} {ilist}"
  - val: "{number}:{float} {ilist}"
```

   The nodes "text", "number", and "float" are omitted for brevity.
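
   The grammar above can be approximated with a regular expression.
	In the hypothetical Python sketch below, node names and arguments
	are assumed not to contain |, : or !, and the surrounding braces
	are assumed to have been stripped already.

```python
import re

# Hypothetical regex for the body of a noderef (the text between the braces).
NODEREF = re.compile(
    r"""
    (?P<name>[^|:!]+)          # node name (assumed free of | : !)
    (?P<args>(?:\|[^|:!]*)*)   # zero or more "|arg" segments
    (?:
        :(?P<flist>.+)         # ":" followed by a frequency list
      | !(?P<ilist>.+)         # "!" followed by an index list
    )?
    """,
    re.VERBOSE,
)


def parse_noderef(body):
    """Split a noderef body into (name, args, flist, ilist)."""
    m = NODEREF.fullmatch(body)
    if m is None:
        raise ValueError(f"malformed noderef: {body!r}")
    args = m.group("args").split("|")[1:]  # drop the empty text before the first |
    flist = None
    if m.group("flist") is not None:
        flist = [float(x) for x in m.group("flist").split()]
    ilist = None
    if m.group("ilist") is not None:
        ilist = []
        for item in m.group("ilist").split():
            index, sep, freq = item.partition(":")
            ilist.append((int(index), float(freq) if sep else None))
    return m.group("name"), args, flist, ilist


print(parse_noderef("Dyck:.8 1 1.2"))      # ('Dyck', [], [0.8, 1.0, 1.2], None)
print(parse_noderef("Cons/Start!0:.5 3"))  # ('Cons/Start', [], None, [(0, 0.5), (3, None)])
```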

TEMPLATE SYNTAX

   Almost every string in a datafile can include templatized
	expressions, or "argrefs", which depend on the arguments passed
	to the node. These are introduced with < and terminated with >,
	and there are two main forms: short and function-style. The
	short form consists of < followed by an argument number or name
	followed by >, and it is simply replaced by the specified
	argument.

   Numeric arguments are user-defined, and passed by the caller.
	Named arguments are defined by wordgen implicitly, and the full
	list is presented below:

   List args:

   a	All declared numeric arguments (not varargs)
	...	All varargs
	A	All numeric arguments (including varargs)

   Scalar args:

   d	The current expansion depth
	D	The maximum expansion depth (see -d option)

   e	The current expansion count
	E	The maximum expansion count (see -e option)

   c	The number of numeric arguments passed to the node
	C	The number of declared numeric arguments for this node
	   (this is a constant expression)

   p	The '|' character (may be used for escaping)
	lt	The '<' character
	gt	The '>' character
	b	The '\' character
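
   A rough Python sketch of the named-argument table and of
	short-form substitution follows. It is hypothetical: named_args
	and expand_short are illustrative names, numeric arguments are
	assumed to be numbered from 0, and since the text does not say
	how list arguments render in the short form, the sketch refuses
	to substitute them.

```python
import re


def named_args(args, declared, depth, max_depth, count, max_count):
    """Hypothetical table of wordgen's implicit named arguments."""
    return {
        "a": list(args[:declared]),    # declared numeric arguments
        "...": list(args[declared:]),  # varargs
        "A": list(args),               # all numeric arguments
        "d": depth, "D": max_depth,    # expansion depth and its limit
        "e": count, "E": max_count,    # expansion count and its limit
        "c": len(args),                # numeric arguments actually passed
        "C": declared,                 # numeric arguments declared
        "p": "|", "lt": "<", "gt": ">", "b": "\\",
    }


# <digits>, <name> or <...>; function-style argrefs contain "(" and are skipped.
SHORT_ARGREF = re.compile(r"<(\d+|\w+|\.\.\.)>")


def expand_short(text, args, named):
    """Replace short-form argrefs with the values they name (0-based numbers assumed)."""
    def replace(match):
        key = match.group(1)
        value = args[int(key)] if key.isdigit() else named[key]
        if isinstance(value, list):
            raise ValueError(f"list argument <{key}> is not rendered in this sketch")
        return str(value)

    return SHORT_ARGREF.sub(replace, text)


args = ["x", "y", "z"]
named = named_args(args, declared=2, depth=3, max_depth=20, count=7, max_count=1000)
print(expand_short("<0>-<1> at depth <d>/<D>, <lt>c<gt> = <c>", args, named))
# x-y at depth 3/20, <c> = 3
```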

   The other form of argref is function-style. A function call is a
	name followed by '(', then any number of arguments separated by
	'|', then ')'. Each function argument is one of: an argument name
	or number, which must be prefixed by # (except for ..., which is
	written without #); a nested function expression; or, in any
	other case, a literal string.
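
   The function-call syntax can be illustrated with a small
	recursive-descent parser. This is a hypothetical Python sketch,
	not wordgen's implementation: it assumes that "name()" takes zero
	arguments, represents calls and arguments as tuples of its own
	design, and does not handle the raw pseudo-function.

```python
def parse_argref(text):
    """Parse a function-style argref such as "<+(#0|1.5|len(...))>" (hypothetical)."""
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError("an argref is delimited by < and >")
    body = text[1:-1]
    node, pos = parse_call(body, 0)
    if pos != len(body):
        raise ValueError(f"unexpected text after call: {body[pos:]!r}")
    return node


def parse_call(src, pos):
    """Parse NAME(arg|arg|...) at pos; return (("call", name, args), next_pos)."""
    paren = src.find("(", pos)
    if paren < 0:
        raise ValueError(f"expected '(' after position {pos}")
    name, pos = src[pos:paren], paren + 1
    args = []
    if src[pos:pos + 1] == ")":  # assumed: "name()" takes zero arguments
        return ("call", name, args), pos + 1
    while True:
        arg, pos = parse_arg(src, pos)
        args.append(arg)
        sep = src[pos:pos + 1]
        if sep == ")":
            return ("call", name, args), pos + 1
        if sep != "|":
            raise ValueError(f"expected '|' or ')' at position {pos}")
        pos += 1  # step over the '|' separator


def parse_arg(src, pos):
    """Classify one argument as a nested call, an argument reference, or a string."""
    end = pos
    while end < len(src) and src[end] not in "|()":
        end += 1
    if end == len(src):
        raise ValueError("unterminated function call")
    if src[end] == "(":  # the text before '(' is a function name
        return parse_call(src, pos)
    token = src[pos:end]
    if token == "...":  # varargs: written without '#'
        return ("ref", "..."), end
    if token.startswith("#"):  # argument name or number
        return ("ref", token[1:]), end
    return ("str", token), end  # anything else is a literal string


print(parse_argref("<+(#0|1.5|len(...))>"))
# ('call', '+', [('ref', '0'), ('str', '1.5'), ('call', 'len', [('ref', '...')])])
```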

   Currently, functions cannot be user-defined, and only the builtin
	set is supported. This set is detailed below.

   +	Flatten the arguments and return their sum, interpreted as
	   floating-point numbers. The empty sum is 0.

   *	Flatten the arguments and return their product, interpreted
	   as floating-point numbers. The empty product is 1.

   -	Interpret all arguments as floating-point numbers and
	   return a chained difference. The empty difference is 0.

   /	Interpret all arguments as floating-point numbers and
	   return a chained quotient. Note that this is a left fold, so
		/(a|b|c) computes (a / b) / c. The empty division is 1.

   ^	Interpret all arguments as floating-point numbers and
	   return a chained exponentiation. The empty power is 1.

   len	Flatten the arguments and return the number of elements in
	   the resulting list.

   flatten	Produce a single list which consists of all of the
	   arguments passed to flatten, such that all arguments are
		interpreted as lists and then concatenated together. This
		function is used in the definitions of many other functions.
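
   The builtins above might be sketched in Python as follows. This
	is illustrative only: the names f_add, f_mul and so on are
	hypothetical, non-list arguments are assumed to flatten to
	one-item lists, the empty results used here are 0 for + and -
	and 1 for *, / and ^, the single-argument behavior of - and / is
	a guess, and ^ is assumed to be a left fold like /, which the
	text does not state.

```python
import math
import operator
from functools import reduce


def flatten(*args):
    """Concatenate the arguments, treating each non-list value as a one-item list."""
    out = []
    for arg in args:
        out.extend(arg if isinstance(arg, list) else [arg])
    return out


def f_add(*args):  # "+": flattens; the empty sum is 0
    return sum(float(x) for x in flatten(*args))


def f_mul(*args):  # "*": flattens; the empty product is 1
    return math.prod(float(x) for x in flatten(*args))


def f_sub(*args):  # "-": no flattening; chained difference, empty result 0
    nums = [float(x) for x in args]
    return reduce(operator.sub, nums) if nums else 0.0


def f_div(*args):  # "/": left fold, so (a / b) / c; empty result 1
    nums = [float(x) for x in args]
    return reduce(operator.truediv, nums) if nums else 1.0


def f_pow(*args):  # "^": fold direction assumed to be left, like "/"
    nums = [float(x) for x in args]
    return reduce(operator.pow, nums) if nums else 1.0


def f_len(*args):  # "len": number of elements after flattening
    return len(flatten(*args))


print(f_add("1", ["2", "3.5"]), f_div("8", "2", "2"), f_len("a", ["b", "c"]))  # 6.5 2.0 3
```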

   Additionally, there is a 'pseudo-function', raw, which is not a
	function but rather a means of escaping a string. It can be used
	like `<raw(some|text\)>` to produce the literal text "some|text\".