The enhype binary expects exactly one argument: a file describing the language it's supposed to be processing. It looks up a few things in this "enhype keyword" file, then copies standard input to standard output, making a few edits along the way.
The enhype keyword file consists of a number of named "sections". The first one enhype looks for is the "[General]" section, which tells it which HTML tags you'd like to use. The defaults are shown below:
[General]
keyword tag = strong
comment tag = em
code tag = pre
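To make the "sections plus defaults" idea concrete, here is a minimal Python sketch of how a keyword file of this shape could be read. It is not enhype's own code: the function names read_sections and general_tags are hypothetical, and I'm assuming that any tag missing from [General] simply keeps the default shown above.

```python
# Defaults taken from the [General] listing above.
DEFAULT_TAGS = {"keyword tag": "strong", "comment tag": "em", "code tag": "pre"}


def read_sections(text):
    """Split a keyword file into {section name: [non-blank lines]}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def general_tags(sections):
    """Start from the defaults, then apply any overrides found in [General]."""
    tags = dict(DEFAULT_TAGS)
    for line in sections.get("General", []):
        key, sep, value = line.partition("=")
        if sep and key.strip() in tags:
            tags[key.strip()] = value.strip()
    return tags
```

With this sketch, a file whose [General] section contains only "keyword tag = b" would yield b for keywords and the defaults (em, pre) for the other two tags.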
After reading the [General] section, enhype then looks for two language-defining sections, one giving the approximate lexical syntax of the language, the other listing the keywords. At heart, enhype thinks all programming languages have the following characteristics:
(These assumptions seem so reasonable that it's perhaps a little surprising that I've yet to encounter any programming language that doesn't break at least one of them.)
By way of example, here are the sections describing C++. First, the characters section, which defines any interesting characters (and character sequences) in the language:
[characters]
comment = /* */
comment = //
comment = #
letters = _
quotes = '"
escape = \
What's this telling us?
Firstly, there are three different comment styles: one with both start and end markers, and two of the "until end of line" style. (Note that # doesn't really mark a comment in C++; it's just that I prefer to have preprocessor lines formatted that way.)
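The two kinds of comment line can be modelled as (start, end) pairs, with the end marker absent for the "until end of line" style. The Python sketch below illustrates that reading of the comment= lines; parse_comment_styles and comment_end are hypothetical names, and the real program may well do this differently.

```python
def parse_comment_styles(lines):
    """Turn 'comment = ...' lines into (opener, closer) pairs.

    closer is None for comments that run to the end of the line.
    """
    styles = []
    for line in lines:
        key, sep, value = line.partition("=")
        if not sep or key.strip() != "comment":
            continue
        parts = value.split()
        if len(parts) == 1:
            styles.append((parts[0], None))
        elif len(parts) >= 2:
            styles.append((parts[0], parts[1]))
    return styles


def comment_end(text, start, style):
    """Return the index just past the comment that opens at text[start]."""
    opener, closer = style
    if closer is None:
        end = text.find("\n", start)
        return len(text) if end == -1 else end
    end = text.find(closer, start + len(opener))
    # An unterminated comment is assumed to run to the end of the input.
    return len(text) if end == -1 else end + len(closer)
```

Here the end-of-line styles stop before the newline, so the newline itself stays outside the comment markup.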
By default enhype expects that identifiers contain letters and/or digits. If, as is often the case, your language allows other things, you can define them with a letters= line. Here I just add the underscore.
The next two lines, quotes= and escape=, are related to strings. We need to say which characters start and end strings, and which (if any) hide these characters once you're in a string.
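As a rough illustration of what "hiding" means, the following Python sketch finds the end of a string literal given the quote and escape characters from the listing above. The function name scan_string is hypothetical, and I'm assuming a string must be closed by the same quote character that opened it.

```python
def scan_string(text, start, quotes="'\"", escape="\\"):
    """Return the index just past the string literal that opens at text[start].

    text[start] must be one of the quote characters. An unterminated
    string is assumed to run to the end of the text.
    """
    opener = text[start]
    if opener not in quotes:
        raise ValueError("text[start] is not a quote character")
    i = start + 1
    while i < len(text):
        ch = text[i]
        if escape and ch == escape:
            i += 2  # skip the escape and the character it hides
            continue
        if ch == opener:
            return i + 1
        i += 1
    return len(text)
```

Passing escape="" models a language with no escape character, in which case the first matching quote always closes the string.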
OK - on with the keywords list:
[keywords]
if else
do while for
break continue
switch case default
int float unsigned signed double char void long short
const volatile
typedef struct union enum
static register auto extern
return
sizeof
goto
friend inline this virtual
class private protected public
template operator
new delete
try catch finally
That was fairly painless.
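To show how the keyword list and the letters= setting fit together, here is a small Python sketch that wraps keywords in the keyword tag. It is only an approximation under my own assumptions: mark_keywords is a hypothetical name, a "word" is taken to be any run of letters, digits and the extra letters, and comments and strings are deliberately ignored to keep it short.

```python
import html


def mark_keywords(code, keywords, extra_letters="_", tag="strong"):
    """Wrap each keyword in <tag>...</tag> and HTML-escape everything else."""
    out = []
    i = 0
    while i < len(code):
        ch = code[i]
        if ch.isalnum() or ch in extra_letters:
            # Collect the whole word before deciding whether it is a keyword.
            j = i
            while j < len(code) and (code[j].isalnum() or code[j] in extra_letters):
                j += 1
            word = code[i:j]
            if word in keywords:
                out.append("<%s>%s</%s>" % (tag, word, tag))
            else:
                out.append(word)
            i = j
        else:
            out.append(html.escape(ch))
            i += 1
    return "".join(out)
```

Because the underscore counts as a letter, an identifier such as x_if is treated as one word and is not mistaken for the keyword if.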
The next example is for Rexx, which is case-insensitive. This gives enhype the chance to look clever: I can ask it to display all keywords in upper case, and all variables in lower case. Oh, and, believe it or not, Rexx's comments nest:
[characters]
letters = _
quotes = '"
comment = /* */ nested
keyword case = upper
variable case = lower
[keywords]
if then else
select when otherwise
do to by for while until forever end leave iterate
call return
(etc ad tedium)
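The two Rexx-specific features, nested comments and case folding, can be sketched in a few lines of Python. As before, this is an illustration under my own assumptions rather than enhype's actual code: nested_comment_end and apply_case are hypothetical names, and the keyword set is assumed to be stored in lower case.

```python
def nested_comment_end(text, start, opener="/*", closer="*/"):
    """Return the index just past the nested comment that opens at text[start]."""
    depth = 0
    i = start
    while i < len(text):
        if text.startswith(opener, i):
            depth += 1
            i += len(opener)
        elif text.startswith(closer, i):
            depth -= 1
            i += len(closer)
            if depth == 0:
                return i
        else:
            i += 1
    # An unterminated comment is assumed to run to the end of the input.
    return len(text)


def apply_case(word, keywords, keyword_case="upper", variable_case="lower"):
    """Fold a word to the case configured for keywords or for variables."""
    mode = keyword_case if word.lower() in keywords else variable_case
    if mode == "upper":
        return word.upper()
    if mode == "lower":
        return word.lower()
    return word  # any other setting leaves the word untouched
```

The depth counter is what distinguishes this from the C++ case: an inner */ only closes the inner comment, so the scan carries on until the outermost one is closed.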