stf
stf [-v] [-h headerfile] [-f footerfile] [-i] [-t] [-u] [-l] [-s] filename
-h headerfile   insert header file before stf output
-f footerfile   insert footer file after stf output
-i              include table of contents
-t              include timestamp after stf output, before the footer file
-u              convert document and sub titles to uppercase
-l              convert document and sub titles to lowercase
-s              only parse the TOC for a file (beta)
-v              display version information and exit
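The header comment of the source listed further down names "re-code the argument parsing with the getopt function" as a TODO. A minimal sketch of what that could look like, assuming a POSIX system where getopt() is available from <unistd.h>; struct stf_options and parse_options() are hypothetical names, not part of stf.c:

```c
#define _POSIX_C_SOURCE 200809L /* expose getopt() in <unistd.h> */
#include <unistd.h>

/* Hypothetical container for the parsed options (not part of stf.c). */
struct stf_options {
    const char *header;   /* -h headerfile, NULL when absent */
    const char *footer;   /* -f footerfile, NULL when absent */
    int toc;              /* -i */
    int timestamp;        /* -t */
    int charcase;         /* 0 = unchanged, 1 = upper (-u), 2 = lower (-l) */
    int toc_only;         /* -s */
    int version;          /* -v */
    const char *filename; /* the single non-option argument */
};

/* Returns 0 on success, -1 on an unknown option or a missing filename. */
int parse_options(int argc, char *argv[], struct stf_options *opt)
{
    int c;

    *opt = (struct stf_options){0};
    while ((c = getopt(argc, argv, "h:f:itulsv")) != -1) {
        switch (c) {
        case 'h': opt->header = optarg; break;
        case 'f': opt->footer = optarg; break;
        case 'i': opt->toc = 1; break;
        case 't': opt->timestamp = 1; break;
        case 'u': opt->charcase = 1; break;
        case 'l': opt->charcase = 2; break;
        case 's': opt->toc_only = 1; break;
        case 'v': opt->version = 1; break;
        default:  return -1; /* getopt() has already printed a message */
        }
    }
    if (opt->version)
        return 0;            /* -v needs no filename */
    if (optind != argc - 1)
        return -1;           /* exactly one filename is expected */
    opt->filename = argv[optind];
    return 0;
}
```

Unlike a hand-written loop, getopt() also accepts combined flags such as -it, and with the GNU C library (which reorders arguments by default) the filename no longer has to be the last argument.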
Simple Text Format (STF) is a simple set of rules for converting plain text files to html. STF was initially set up to convert readable text files to html, mainly for manuals and tutorials.
Note that each line is limited to a maximum of 200 characters; all characters after the 199th are discarded.
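As a minimal sketch of this rule (read_line() and STF_LINE_MAX are hypothetical names), the function below reads at most 199 characters of a line into a 200-byte buffer and throws the rest of that line away; a truncated line comes back without its newline. Note that fgets() on its own does not discard anything: the source listed further down reads lines with fgets() into 1024-byte buffers, and the remainder of a longer line would simply be returned by the next call.

```c
#include <stdio.h>
#include <string.h>

#define STF_LINE_MAX 200 /* buffer size: 199 characters plus the terminating '\0' */

/* Reads one line into buf. If the line is longer than 199 characters, the
 * rest of it (including its newline) is discarded.
 * Returns 1 when a line was read, 0 at end of file. */
int read_line(char buf[STF_LINE_MAX], FILE *fp)
{
    int c;

    if (fgets(buf, STF_LINE_MAX, fp) == NULL)
        return 0;
    if (strchr(buf, '\n') == NULL) {
        /* No newline in buf: the line was too long or the file ended. */
        while ((c = fgetc(fp)) != EOF && c != '\n')
            ; /* skip the characters after the 199th */
    }
    return 1;
}
```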
STF currently recognizes six types of content: document title, sub title, text paragraph, code paragraph, list items and include file.
STF has a set of rules for each of these content types, which should be followed. STF looks at the first character of a line to determine the content type of that line; any character other than [tab], [newline], [-] or [+] is considered a 'normal character'.
Document title
  - starts with the first line of the document
  - ends when a blank line is encountered
Sub title
  - starts when the first character of a line is a normal character and the line is preceded by more than one blank line
  - ends when a blank line is encountered
Text paragraph
  - starts when the first character of a line is a normal character and the line is preceded by exactly one blank line
  - ends when a blank line is encountered
Code paragraph
  - starts when the first character of a line is a [tab] character
  - ends when a blank line is encountered
List items
  - start when the first character of a line is a [-] character and the line is preceded by one or more blank lines
  - end when a blank line is encountered
Include file
  - starts when the first character of a line is a [+] character and the line is preceded by one or more blank lines
  - ends immediately
Once a content type has started, STF continues with that content type until a blank line is encountered, even if a later line matches one of the rules above. This means that the first character of a line loses its STF meaning inside a content block.
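The rules above amount to a small state machine: the first character of a line only matters when the line follows a blank line (or is the first line of the document). The sketch below is a hypothetical tag_lines() function, not code taken from stf.c; it assigns a content type to every line using the same type numbers as the source, and follows the rule that an include file ends immediately, so the line after it starts a new block.

```c
#include <stddef.h>

/* Content types, numbered as in the "DETERMINE CONTENT TYPES" comment of stf.c. */
enum stf_type {
    STF_BLANK = 0, STF_TEXT = 1, STF_CODE = 2, STF_TITLE = 3,
    STF_SUBTITLE = 4, STF_LIST = 5, STF_INCLUDE = 6
};

/* Type of a line that opens a block, given the number of blank lines before it. */
static enum stf_type start_type(const char *line, int blanks)
{
    switch (line[0]) {
    case '\t': return STF_CODE;
    case '-':  return STF_LIST;
    case '+':  return STF_INCLUDE;
    default:   return blanks > 1 ? STF_SUBTITLE : STF_TEXT;
    }
}

/* Assigns a content type to each of the n lines. */
void tag_lines(const char *lines[], size_t n, enum stf_type types[])
{
    enum stf_type current = STF_BLANK; /* type of the open block, BLANK if none */
    int blanks = 0;                    /* blank lines since the last block */
    size_t i;

    for (i = 0; i < n; i++) {
        if (lines[i][0] == '\n' || lines[i][0] == '\0') {
            types[i] = STF_BLANK;          /* a blank line closes any block */
            current = STF_BLANK;
            blanks++;
        } else if (i == 0) {
            types[i] = current = STF_TITLE; /* the first line is the document title */
        } else if (current == STF_BLANK) {
            types[i] = current = start_type(lines[i], blanks);
            blanks = 0;
        } else {
            types[i] = current;            /* inside a block: first char is ignored */
        }
        if (types[i] == STF_INCLUDE)
            current = STF_BLANK;           /* an include file ends immediately */
    }
}
```

With the test input below, the "- not a list item" line stays part of the text paragraph because it is not preceded by a blank line.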
The [+] character introduces a special type of content. All characters directly after the [+] character are treated by STF as a filename. STF opens that file and includes its complete content in the output, replacing every [<] and [>] character with its html entity, &lt; and &gt; respectively.
Also all [<] and [>] characters in lines which belong to a code paragraph are replaced with their html entities.
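A minimal sketch of the escaping described above, together with the handling of an include line. write_escaped() and include_escaped() are hypothetical names, and the include start and end tags (si and ei in the configuration) are left out to keep the example short. As described, only [<] and [>] are replaced, so an [&] passes through unchanged: a literal "&gt;" inside a code paragraph is rendered by the browser as [>].

```c
#include <stdio.h>
#include <string.h>

/* Writes line to out with '<' and '>' replaced by "&lt;" and "&gt;". */
void write_escaped(FILE *out, const char *line)
{
    for (; *line != '\0'; line++) {
        if (*line == '<')
            fputs("&lt;", out);
        else if (*line == '>')
            fputs("&gt;", out);
        else
            fputc(*line, out);
    }
}

/* Handles an include line such as "+example.c\n": everything after the '+'
 * (minus the trailing newline) is the filename. Every line of that file is
 * written to out with '<' and '>' escaped.
 * Returns -1 if the line is not an include line or the file cannot be opened. */
int include_escaped(FILE *out, const char *incline)
{
    char name[256];
    char buf[1024];
    FILE *fp;

    if (incline[0] != '+')
        return -1;
    strncpy(name, incline + 1, sizeof name - 1);
    name[sizeof name - 1] = '\0';
    name[strcspn(name, "\n")] = '\0';   /* drop the newline, if any */

    fp = fopen(name, "r");
    if (fp == NULL)
        return -1;
    while (fgets(buf, sizeof buf, fp) != NULL)
        write_escaped(out, buf);
    fclose(fp);
    return 0;
}
```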
All content types have default html tags bound to them:
CONTENT TYPE      DEFAULT HTML TAG
document title    <h1></h1>
sub title         <h2></h2>
text paragraph    <p></p>
code paragraph    <pre></pre>
list items        <ul><li></ul>
include file      <pre><i></i></pre>
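To make the table concrete, here is a minimal sketch of how the mapping could be held in C and used to wrap one block. The enum, struct stf_tags and render_block() are hypothetical names for this illustration (stf.c itself keeps each tag in a separate global such as sd or ed); the include file type, which is handled by its own routine, is left out, as is the escaping of code lines.

```c
#include <stdio.h>
#include <stddef.h>

/* Content types that map onto a start/end tag pair (numbering as in stf.c). */
enum stf_type { STF_TEXT = 1, STF_CODE = 2, STF_TITLE = 3, STF_SUBTITLE = 4, STF_LIST = 5 };

struct stf_tags {
    const char *start;
    const char *end;
};

/* Default tags from the table above, indexed by content type (C99 designated initializers). */
static const struct stf_tags default_tags[] = {
    [STF_TEXT]     = { "<p>",   "</p>"   },
    [STF_CODE]     = { "<pre>", "</pre>" },
    [STF_TITLE]    = { "<h1>",  "</h1>"  },
    [STF_SUBTITLE] = { "<h2>",  "</h2>"  },
    [STF_LIST]     = { "<ul>",  "</ul>"  },
};

/* Writes a block of n lines wrapped in the tags for its type. In a list,
 * the leading '-' of each line is replaced by the <li> item tag. */
void render_block(FILE *out, enum stf_type type, const char *lines[], size_t n)
{
    size_t i;

    fprintf(out, "%s\n", default_tags[type].start);
    for (i = 0; i < n; i++) {
        if (type == STF_LIST && lines[i][0] == '-') {
            fputs("<li>", out);
            fputs(lines[i] + 1, out);   /* the text after the '-' */
        } else {
            fputs(lines[i], out);
        }
    }
    fprintf(out, "%s\n", default_tags[type].end);
}
```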
However, all html tags can be customized by creating an STF configuration file named 'stf.conf' in the /etc/ directory. Each line of the configuration file has the following format:
IDENTIFIER=HTMLTAG
Identifiers are:
sd   Start document title
ed   End document title
ss   Start sub title
es   End sub title
st   Start text paragraph
et   End text paragraph
sc   Start code paragraph
ec   End code paragraph
sl   Start list
el   End list
sb   List item
si   Start include file
ei   End include file
sa   Start anchor TOC
ea   End anchor TOC
so   Start TOC
eo   End TOC
sv   Version information
Note that an html tag is limited to a maximum of 256 characters.
Here is an example file:
sv=
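The IDENTIFIER=HTMLTAG format can also be read with a lookup table instead of one switch case per identifier, as parse_conf() does in the listing below. This is a minimal sketch covering only a few identifiers; apply_conf_line() and tag_table are hypothetical names, and the buffers keep the 256-character limit mentioned above. Unknown identifiers are ignored, and both "\n" and "\r\n" line endings are stripped.

```c
#include <stdio.h>
#include <string.h>

#define TAG_MAX 256

/* A few of the tag buffers, with the defaults from stf.c. */
static char sd[TAG_MAX] = "<h1>", ed[TAG_MAX] = "</h1>";
static char ss[TAG_MAX] = "<h2>", es[TAG_MAX] = "</h2>";
static char so[TAG_MAX] = "<hr>", eo[TAG_MAX] = "<hr>";

/* Maps each identifier to the buffer it sets (only a subset is shown). */
static const struct { const char *id; char *tag; } tag_table[] = {
    { "sd", sd }, { "ed", ed }, { "ss", ss }, { "es", es },
    { "so", so }, { "eo", eo },
};

/* Applies one "IDENTIFIER=HTMLTAG" line. Returns 0 if the identifier is
 * known, -1 otherwise (the line is then ignored). */
int apply_conf_line(const char *line)
{
    const char *eq = strchr(line, '=');
    size_t i, idlen;

    if (eq == NULL)
        return -1;
    idlen = (size_t)(eq - line);
    for (i = 0; i < sizeof tag_table / sizeof tag_table[0]; i++) {
        if (strlen(tag_table[i].id) == idlen &&
            strncmp(line, tag_table[i].id, idlen) == 0) {
            char *dst = tag_table[i].tag;
            strncpy(dst, eq + 1, TAG_MAX - 1);
            dst[TAG_MAX - 1] = '\0';
            dst[strcspn(dst, "\r\n")] = '\0'; /* strip the line ending */
            return 0;
        }
    }
    return -1;
}
```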
Download stf-1.0.tar (source and manual included) or view the source below.
/*****************************************************************************************************************
 SIMPLE TEXT FORMAT

 Version:       1.01
 Date:          28 July 2000
 Author:        Fred Wijnsma
 Email address: wijnsma@yahoo.com
 Homepage:      www.hacom.nl/~wijnsma/stf/

 STILL TODO AND BUGS TO BE FIXED
 - when a content type has started, in some situations it will be changed to another content
   type without a blank line; this should be fixed!
 - create a -d argument which also outputs (probably to a file as secondary output) debugging
   information for development purposes
 - re-code the argument parsing with the getopt function and follow the GNU argument convention
 - create a filename convention which makes it possible to create "index" pages based on a part
   of the filename, the document title and the sub titles
 - create a Makefile which installs stf and the manual page
 - create a package for download with:
     - stf.c     -> source file
     - stf       -> compiled version
     - stf.txt   -> example plain text file
     - stf.html  -> example output file
     - stf.conf  -> example configuration file
     - stf.man   -> manual page
     - README    -> readme file
     - INSTALL   -> installation instructions
     - do_stf.sh -> example shell script
*****************************************************************************************************************/

#include <stdio.h>
#include <stdlib.h>     /* exit() */
#include <string.h>
#include <time.h>
#include <ctype.h>

/* Prototypes for functions that are called before they are defined */
int change_case(int state, char caseline[1024]);
int replace_html(char myline[1024]);
int include_file(char fname[256]);

/*****************************************************************************************************************
 VERSION INFORMATION
*****************************************************************************************************************/

char version[256] = "*!STF version 1.0.2 (c) 2000 QaD!*";  /* Printed by -v and -t, can be set with sv */

/*****************************************************************************************************************
 DEFAULT VARIABLES
 CHANGE WHEN NEEDED
*****************************************************************************************************************/

char sd[256] = "<h1>";          /* Document title start */
char ed[256] = "</h1>";         /* Document title end */
char ss[256] = "<h2>";          /* Sub title start */
char es[256] = "</h2>";         /* Sub title end */
char st[256] = "<p>";           /* Text paragraph start */
char et[256] = "</p>";          /* Text paragraph end */
char sc[256] = "<pre>";         /* Code paragraph start */
char ec[256] = "</pre>";        /* Code paragraph end */
char sl[256] = "<ul>";          /* List start */
char el[256] = "</ul>";         /* List end */
char sb[256] = "<li>";          /* List item */
char si[256] = "<pre><i>";      /* Include start */
char ei[256] = "</i></pre>";    /* Include end */
char sa[256] = "";              /* Start anchor (TOC) */
char ea[256] = "";              /* End anchor (TOC) */
char so[256] = "<hr>";          /* Start TOC */
char eo[256] = "<hr>";          /* End TOC */

/*****************************************************************************************************************
 NO CHANGES SHOULD BE MADE BELOW THIS LINE
*****************************************************************************************************************/

FILE *inputfile;                            /* File to process */
FILE *incfile;                              /* Include file (+ content type) */
FILE *insertfile;                           /* Header or footer file (-h, -f) */
FILE *tocfile;                              /* Table of contents file */
int firstline = 0;                          /* If 1 first line is parsed */
int linetype = 0;                           /* Type of line */
int prev_linetype = 0;                      /* Previous type of line */
int counter = 0;                            /* Number of blank lines */
int incheader = 0;                          /* Include header when 1 */
int incfooter = 0;                          /* Include footer when 1 */
int inctoc = 0;                             /* Include TOC when 1 */
int inctime = 0;                            /* Display timestamp */
int sitemap = 0;                            /* Parse sitemap information */
int charcase = 0;                           /* Char case, 1 upper, 2 lower */
char filename[256] = "";                    /* Filename to parse */
char configname[256] = "/etc/stf.conf";     /* Configuration filename to parse */
char lines[1024] = "";                      /* Line to parse */
char toclines[1024] = "";                   /* Line to parse for TOC */
char headername[256] = "";                  /* Header filename to process */
char footername[256] = "";                  /* Footer filename to process */
char thetime[1024] = "";                    /* Time of formatting */
char doctitle[1024] = "";                   /* Document title for sitemap */

/*****************************************************************************************************************
 PARSING THE ARGUMENTS
 Note that this function should actually be rewritten to use the "getopt" (3) function.
 Also the convention as specified on the GNU site should be followed (e.g. long and short argument notation).
*****************************************************************************************************************/

void parse_args(int argcount, char *arglist[])
{
    int i = 0;

    if (argcount < 2) {                                 /* No arguments, exit */
        printf("Usage: %s [-v] [-h headerfile] [-f footerfile] [-i] [-t] [-u] [-l] [-s] filename\n", arglist[0]);
        exit(1);
    }
    for (i = 1; i < argcount; i++) {
        if (strcmp(arglist[i], "-v") == 0) {            /* Display version information */
            printf("%s", version);
            exit(2);
        } else if (i == argcount - 1) {                 /* Last argument is the filename */
            strcpy(filename, arglist[i]);
        } else if (strcmp(arglist[i], "-h") == 0) {     /* Header filename */
            i++;
            strcpy(headername, arglist[i]);
            incheader = 1;
        } else if (strcmp(arglist[i], "-f") == 0) {     /* Footer filename */
            i++;
            strcpy(footername, arglist[i]);
            incfooter = 1;
        } else if (strcmp(arglist[i], "-i") == 0) {     /* Include TOC */
            inctoc = 1;
        } else if (strcmp(arglist[i], "-t") == 0) {     /* Include timestamp */
            inctime = 1;
        } else if (strcmp(arglist[i], "-s") == 0) {     /* Only create sitemap information */
            sitemap = 1;
        } else if (strcmp(arglist[i], "-u") == 0) {     /* Convert doc/sub titles to uppercase */
            charcase = 1;
        } else if (strcmp(arglist[i], "-l") == 0) {     /* Convert doc/sub titles to lowercase */
            charcase = 2;
        } else {
            printf("Usage: %s [-v] [-h headerfile] [-f footerfile] [-i] [-t] [-u] [-l] [-s] filename\n", arglist[0]);
            exit(1);
        }
    }
}

/*****************************************************************************************************************
 CREATE TABLE OF CONTENTS
 The table of contents is built from the sub titles
*****************************************************************************************************************/

void parse_toc(void)
{
    int validtoc = 0;
    int toccounter = 0;

    tocfile = fopen(filename, "r");
    if (tocfile == NULL) {
        printf("Cannot open file %s for parsing the TOC\n", filename);
        exit(1);
    }
    while (fgets(toclines, 1024, tocfile)) {
        if (toclines[0] == '\n') {
            toccounter++;
        } else {
            if (toccounter > 1) {               /* Preceded by more than one blank line: sub title */
                if (validtoc == 0) {
                    printf("%s\n", so);
                    validtoc = 1;
                }
                toclines[strlen(toclines) - 1] = '\0';
                change_case(3, toclines);
            }
            toccounter = 0;
        }
    }
    if (validtoc != 0) {
        printf("%s\n", eo);
    }
    fclose(tocfile);
}

/*****************************************************************************************************************
 DETERMINE CONTENT TYPES
 Determines the type of line we are dealing with, based on the first character of the line.
 Types can be one of the following:
 0 = blank line
 1 = text line
 2 = code line
 3 = document title line (started in the main function)
 4 = sub title line
 5 = list line
 6 = include file line
*****************************************************************************************************************/

int determine_type(char myline[1024])
{
    switch (myline[0]) {
    case '\n':                              /* Blank line */
        linetype = 0;
        break;
    case '\t':                              /* Code line */
        linetype = 2;
        break;
    case '-':                               /* List line */
        linetype = 5;
        break;
    case '+':                               /* Include file */
        linetype = 6;
        break;
    default:
        if (counter > 1) {
            linetype = 4;                   /* Sub title line */
        } else {
            linetype = 1;                   /* Text line */
        }
    }
    return linetype;
}

/*****************************************************************************************************************
 PARSING THE CONFIGURATION FILE
 Read "/etc/stf.conf". If the file cannot be found, STF uses the defaults specified in "DEFAULT VARIABLES".
*****************************************************************************************************************/

int parse_conf(void)
{
    FILE *configfile;                       /* Configuration file */
    char confline[256] = "";                /* Line to parse for conf file */

    configfile = fopen(configname, "r");
    if (configfile == NULL) {
        /* When we can't find the configuration file, we use the defaults */
        return 0;
    }
    while (fgets(confline, 256, configfile)) {
        switch (confline[0]) {
        case 's':                           /* Line starts with an "s": start tag */
            switch (confline[1]) {
            case 'd':                       /* Start Document title (sd) */
                strncpy(sd, &(confline[3]), 256);
                sd[strlen(sd) - 1] = '\0';
                break;
            case 's':                       /* Start Sub title (ss) */
                strncpy(ss, &(confline[3]), 256);
                ss[strlen(ss) - 1] = '\0';
                break;
            case 't':                       /* Start Text (st) */
                strncpy(st, &(confline[3]), 256);
                st[strlen(st) - 1] = '\0';
                break;
            case 'c':                       /* Start Code (sc) */
                strncpy(sc, &(confline[3]), 256);
                sc[strlen(sc) - 1] = '\0';
                break;
            case 'l':                       /* Start List (sl) */
                strncpy(sl, &(confline[3]), 256);
                sl[strlen(sl) - 1] = '\0';
                break;
            case 'b':                       /* List Item (sb) */
                strncpy(sb, &(confline[3]), 256);
                sb[strlen(sb) - 1] = '\0';
                break;
            case 'i':                       /* Start Include File (si) */
                strncpy(si, &(confline[3]), 256);
                si[strlen(si) - 1] = '\0';
                break;
            case 'a':                       /* Start Anchor (sa) */
                strncpy(sa, &(confline[3]), 256);
                sa[strlen(sa) - 1] = '\0';
                break;
            case 'o':                       /* Start TOC (so) */
                strncpy(so, &(confline[3]), 256);
                so[strlen(so) - 1] = '\0';
                break;
            case 'v':                       /* Version Information (sv) */
                strncpy(version, &(confline[3]), 256);
                version[strlen(version) - 1] = '\0';
                break;
            }
            break;
        case 'e':                           /* Line starts with an "e": end tag */
            switch (confline[1]) {
            case 'd':                       /* End Document title (ed) */
                strncpy(ed, &(confline[3]), 256);
                ed[strlen(ed) - 1] = '\0';
                break;
            case 's':                       /* End Sub title (es) */
                strncpy(es, &(confline[3]), 256);
                es[strlen(es) - 1] = '\0';
                break;
            case 't':                       /* End Text (et) */
                strncpy(et, &(confline[3]), 256);
                et[strlen(et) - 1] = '\0';
                break;
            case 'c':                       /* End Code (ec) */
                strncpy(ec, &(confline[3]), 256);
                ec[strlen(ec) - 1] = '\0';
                break;
            case 'l':                       /* End List (el) */
                strncpy(el, &(confline[3]), 256);
                el[strlen(el) - 1] = '\0';
                break;
            case 'i':                       /* End Include File (ei) */
                strncpy(ei, &(confline[3]), 256);
                ei[strlen(ei) - 1] = '\0';
                break;
            case 'a':                       /* End Anchor (ea) */
                strncpy(ea, &(confline[3]), 256);
                ea[strlen(ea) - 1] = '\0';
                break;
            case 'o':                       /* End TOC (eo) */
                strncpy(eo, &(confline[3]), 256);
                eo[strlen(eo) - 1] = '\0';
                break;
            }
            break;
        }
    }
    fclose(configfile);
    return 0;
}

/*****************************************************************************************************************
 CONTENT TYPE END BLOCK
 Print the end of a content block based on the current content type.
 Types can be one of the following:
 0 = blank line (cannot be ended)
 1 = text line
 2 = code line
 3 = document title line (started in the main function)
 4 = sub title line
 5 = list line
 6 = include file line
*****************************************************************************************************************/

int end_block(int mytype, char myline[1024])
{
    switch (mytype) {
    case 1:                                 /* End text block */
        lines[strlen(lines) - 1] = '\0';
        printf("%s", lines);
        printf("%s\n", et);
        break;
    case 2:                                 /* End code block */
        lines[strlen(lines) - 1] = '\0';
        printf("%s", lines);
        printf("%s\n", ec);
        break;
    case 3:                                 /* End document title */
        lines[strlen(lines) - 1] = '\0';
        change_case(1, lines);
        printf("%s\n", ed);
        if (inctoc == 1) {
            parse_toc();
        }
        break;
    case 4:                                 /* End sub title block */
        lines[strlen(lines) - 1] = '\0';
        change_case(1, lines);
        printf("\n%s\n", es);
        break;
    case 5:                                 /* End list items block */
        lines[strlen(lines) - 1] = '\0';
        printf("%s", lines);
        printf("%s\n", el);
        break;
    case 6:                                 /* End include file: include_file() already printed ei */
        break;
    }
    return 0;
}

/*****************************************************************************************************************
 CONTENT TYPE START BLOCK
 Print the start of a content block based on the current content type.
 Types can be one of the following:
 0 = blank line (cannot be started)
 1 = text line
 2 = code line
 3 = document title line (started in the main function)
 4 = sub title line
 5 = list line
 6 = include file line
*****************************************************************************************************************/

int start_block(int mytype, char myline[1024])
{
    char incname[256];                      /* Filename after the '+' character */

    switch (mytype) {
    case 1:                                 /* Start text block */
        printf("%s\n", st);
        printf("%s", lines);
        break;
    case 2:                                 /* Start code block */
        printf("%s\n", sc);
        replace_html(lines);
        break;
    case 3:                                 /* Start document title */
        printf("%s\n", sd);
        change_case(1, lines);
        break;
    case 4:                                 /* Start sub title block */
        lines[strlen(lines) - 1] = '\0';
        printf("%s\n", ss);
        change_case(2, lines);
        break;
    case 5:                                 /* Start list items block */
        lines[0] = ' ';
        printf("%s\n%s", sl, sb);
        printf("%s", lines);
        break;
    case 6:                                 /* Start include file block */
        strncpy(incname, &lines[1], sizeof incname - 1);
        incname[sizeof incname - 1] = '\0';
        include_file(incname);
        break;
    }
    return 0;
}

/*****************************************************************************************************************
 REPLACE HTML INTERPRETED CHARACTERS WITH THEIR CORRESPONDING ENTITIES
 For now only the < and > characters are replaced. Characters are printed directly, so long lines
 cannot overflow a buffer.
*****************************************************************************************************************/

int replace_html(char myline[1024])
{
    int i = 0;

    for (i = 0; myline[i] != '\0'; i++) {
        if (myline[i] == '>') {
            printf("&gt;");
        } else if (myline[i] == '<') {
            printf("&lt;");
        } else {
            putchar(myline[i]);
        }
    }
    return 0;
}

/*****************************************************************************************************************
 CHANGE CASE OF CHARACTERS
 If "charcase" matches:
 0 -> do not change case
 1 -> change to uppercase
 2 -> change to lowercase
 If "state" matches:
 1 -> format line appearance as document title
 2 -> format line appearance as sub title
 3 -> format line appearance as TOC item
*****************************************************************************************************************/

int change_case(int state, char caseline[1024])
{
    int i;

    switch (charcase) {
    case 0:                                 /* Keep the case as it is */
        break;
    case 1:
        for (i = 0; caseline[i] != '\0'; i++)
            caseline[i] = toupper((unsigned char) caseline[i]);
        break;
    case 2:
        for (i = 0; caseline[i] != '\0'; i++)
            caseline[i] = tolower((unsigned char) caseline[i]);
        break;
    }
    switch (state) {
    case 1:
        printf("%s", caseline);
        break;
    case 2:
        printf("<a name=\"%s\">%s</a>", caseline, caseline);
        break;
    case 3:
        printf("<a href=\"#%s\">%s%s%s</a><br>\n", caseline, sa, caseline, ea);
        break;
    }
    return 0;
}

/*****************************************************************************************************************
 INCLUDE FILE
 Files included with the + content type are formatted so that every appearance of the < and >
 characters is replaced by the &lt; and &gt; html entities
*****************************************************************************************************************/

int include_file(char fname[256])
{
    printf("%s\n", si);
    fname[strlen(fname) - 1] = '\0';        /* Strip the newline */
    incfile = fopen(fname, "r");
    if (incfile == NULL) {
        printf("Error parsing include file (%s)\n", fname);
    } else {
        while (fgets(lines, 1024, incfile)) {
            replace_html(lines);
        }
        fclose(incfile);
    }
    printf("%s\n", ei);
    return 0;
}

/*****************************************************************************************************************
 INSERT FILE
 Files included with the -h and -f arguments are included as is, no additional formatting is done
*****************************************************************************************************************/

void insert_file(char insertname[256])
{
    insertfile = fopen(insertname, "r");
    if (insertfile == NULL) {
        printf("<i>Error including file (%s)</i>\n", insertname);
    } else {
        while (fgets(lines, 1024, insertfile)) {
            printf("%s", lines);
        }
        fclose(insertfile);
    }
}

int main(int argc, char *argv[])
{
    time_t timer;

    timer = time(NULL);
    strcpy(thetime, asctime(localtime(&timer)));
    thetime[strlen(thetime) - 1] = '\0';            /* Strip the newline added by asctime() */

    parse_args(argc, argv);
    parse_conf();

    if (incheader == 1)
        insert_file(headername);

    inputfile = fopen(filename, "r");
    if (inputfile == NULL) {
        printf("Cannot open file %s (main file)\n", filename);
        exit(1);
    }
    while (fgets(lines, 1024, inputfile)) {
        if (firstline == 0) {
            linetype = 3;
            firstline = 1;
        } else {
            linetype = determine_type(lines);
        }
        if (linetype == 0) {                        /* Current line is a blank line */
            counter++;
            if (prev_linetype == 0) {               /* Previous line is a blank line */
                printf("%s", lines);
            } else {                                /* Previous line is _not_ a blank line */
                end_block(prev_linetype, lines);
            }
        } else {                                    /* Current line is _not_ a blank line */
            if (prev_linetype == 0) {               /* Previous line is a blank line */
                start_block(linetype, lines);
            } else {                                /* Previous line is _not_ a blank line */
                linetype = prev_linetype;           /* Keep the linetype regardless of */
                if (linetype == 2) {                /* what the start character is */
                    replace_html(lines);
                } else if (linetype == 5) {
                    lines[0] = ' ';
                    printf("%s%s", sb, lines);
                } else {
                    printf("%s", lines);
                }
            }
            counter = 0;
        }
        prev_linetype = linetype;                   /* Set PREV_TYPE to TYPE for next run */
    }
    fclose(inputfile);

    switch (linetype) {                             /* Close the content neatly when the */
    case 1:                                         /* last line of the file is not blank */
        printf("%s\n", et);
        break;
    case 2:
        printf("%s\n", ec);
        break;
    case 3:
        printf("%s\n", ed);
        break;
    case 4:
        printf("%s\n", es);
        break;
    case 5:
        printf("%s\n", el);
        break;
    case 6:                                         /* include_file() already printed ei */
        break;
    }
    if (inctime == 1)
        printf("<p align=\"right\"><font size=\"1\">Formatted %s %s</font></p>\n", thetime, version);
    if (incfooter == 1)
        insert_file(footername);
    return 0;
}
Free to use, modify or delete :)
STF Version 1.0 QaD (c) 2000
Fred Wijnsma [wijnsma@hacom.nl]
Formatted Sun Jul 30 01:22:36 2000