How to parse template languages in Ragel?
I've been working on a parser for a simple template language. I'm using Ragel.

The requirements are modest. I'm trying to find [[tags]] that can be embedded anywhere in the input string.

I'm trying to parse a simple template language that can have tags such as {{foo}} embedded within HTML. I tried several approaches to parse this, but had to resort to using a Ragel scanner and the inefficient approach of matching a single character as a "catch all". I feel this is the wrong way to go about it. I'm abusing the longest-match bias of the scanner to implement my default rule (it can only be 1 char long, so it should always be the last resort).
```ragel
%%{
  machine parser;

  action start { tokstart = p; }
  action on_tag { results << [:tag, data[tokstart..p]] }
  action on_static { results << [:static, data[p..p]] }

  tag = ('[[' lower+ ']]') >start @on_tag;

  main := |*
    tag;
    any => on_static;
  *|;
}%%
```
(The actions are written in Ruby, but should be easy to understand.)

How would you go about writing a parser for such a simple language? Is Ragel maybe not the right tool? It seems you have to fight Ragel tooth and nail if the syntax is unpredictable, such as this.
Ragel works fine. You just need to be careful about what you're matching. Your question uses both `[[tag]]` and `{{tag}}`, but your example uses `[[tag]]`, so I figure that's what you're trying to treat as special.
What you want to do is eat text until you hit an open-bracket. If that bracket is followed by another bracket, then it's time to start eating lowercase characters until you hit a close-bracket. Since the text in the tag cannot include any bracket, you know that the only non-error character that can follow that close-bracket is another close-bracket. At that point, you're back where you started.

Well, that's a verbatim description of this machine:
```ragel
tag = '[[' lower+ ']]';

main := (
  (any - '[')*        # eat text
  ('[' ^'[' | tag)    # try to eat a tag
)*;
```
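For comparison, here is a minimal hand-written C sketch of the same "eat text until a bracket, then try a tag" idea, without Ragel. It is not what Ragel generates. The names `match_tag`, `scan_template`, `append_node` and `struct out_buf` are made up for illustration. Unlike the machine above, it treats a malformed tag such as `[[Foo]]` as plain text instead of an error, and it keeps lone brackets inside the surrounding text node.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: kind is "text" or "tag". */
typedef void (*node_fn)(const char *kind, const char *start, size_t len, void *ctx);

/* Length of a well-formed "[[lower+]]" starting at s, or 0 if there is none. */
static size_t match_tag(const char *s, const char *end)
{
    const char *q = s;
    if (end - q < 2 || q[0] != '[' || q[1] != '[') return 0;
    q += 2;
    const char *name = q;
    while (q < end && *q >= 'a' && *q <= 'z') q++;   /* ASCII lowercase only */
    if (q == name) return 0;                          /* need at least one letter */
    if (end - q < 2 || q[0] != ']' || q[1] != ']') return 0;
    return (size_t)(q + 2 - s);
}

/* Walk the input once, emitting text and tag nodes in order. */
static void scan_template(const char *data, size_t len, node_fn emit, void *ctx)
{
    const char *p = data, *end = data + len;
    const char *text_start = p;

    while (p < end) {
        size_t n = (*p == '[') ? match_tag(p, end) : 0;
        if (n == 0) { p++; continue; }                /* ordinary text character */
        if (p > text_start)                           /* flush pending text first */
            emit("text", text_start, (size_t)(p - text_start), ctx);
        emit("tag", p + 2, n - 4, ctx);               /* strip "[[" and "]]" */
        p += n;
        text_start = p;
    }
    if (p > text_start)                               /* trailing text, if any */
        emit("text", text_start, (size_t)(p - text_start), ctx);
}

/* Hypothetical sink that appends "kind(content)" to a fixed buffer. */
struct out_buf { char data[256]; size_t used; };

static void append_node(const char *kind, const char *start, size_t len, void *ctx)
{
    struct out_buf *out = ctx;
    assert(out != NULL);
    size_t room = sizeof out->data - out->used;
    int n = snprintf(out->data + out->used, room, "%s(%.*s)", kind, (int)len, start);
    if (n < 0 || (size_t)n >= room)
        out->data[out->used] = '\0';                  /* did not fit: drop this node */
    else
        out->used += (size_t)n;
}
```

Calling `scan_template` on `a [ b [[tag]] c` with `append_node` as the callback leaves `text(a [ b )tag(tag)text( c)` in the buffer.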
The tricky part is, where do you call your actions? I don't claim to have the best answer to that, but here's what I came up with:
```c
static char *text_start;

%%{
  machine parser;

  action markstart { text_start = fpc; }
  action printtextnode {
    int text_len = fpc - text_start;
    if (text_len > 0) {
      printf("text(%.*s)\n", text_len, text_start);
    }
  }
  action printtagnode {
    int text_len = fpc - text_start - 1;  /* drop closing bracket */
    printf("tag(%.*s)\n", text_len, text_start);
  }

  tag = '[[' (lower+ >markstart) ']]' @printtagnode;

  main := (
    (any - '[')* >markstart %printtextnode
    ('[' ^'[' %printtextnode | tag) >markstart
  )* @eof(printtextnode);
}%%
```
There are a few non-obvious things:

- The `eof` action is needed because `%printtextnode` is only ever invoked on leaving a machine. If the input ends with normal text, there will be no input to make it leave that state. Because it will also be called when the input ends with a tag, and there is then no final, unprinted text node, `printtextnode` tests that it has some text to print.
- The `%printtextnode` action nestled in after the `^'['` is needed because, though we marked the start when we hit the `[`, after we hit a non-`[` we'll start trying to parse text again and re-mark the start point. We need to flush those two characters before that happens, hence that action invocation.
The full parser follows. I did this in C because that's what I know, but you should be able to turn it into whatever language you need pretty readily:
```c
/* ragel so_tag.rl && gcc so_tag.c -o so_tag */
#include <stdio.h>
#include <string.h>

static char *text_start;

%%{
  machine parser;

  action markstart { text_start = fpc; }
  action printtextnode {
    int text_len = fpc - text_start;
    if (text_len > 0) {
      printf("text(%.*s)\n", text_len, text_start);
    }
  }
  action printtagnode {
    int text_len = fpc - text_start - 1;  /* drop closing bracket */
    printf("tag(%.*s)\n", text_len, text_start);
  }

  tag = '[[' (lower+ >markstart) ']]' @printtagnode;

  main := (
    (any - '[')* >markstart %printtextnode
    ('[' ^'[' %printtextnode | tag) >markstart
  )* @eof(printtextnode);
}%%

%% write data;

int main(void) {
  char buffer[4096];
  int cs;
  char *p = NULL;
  char *pe = NULL;
  char *eof = NULL;

  %% write init;

  do {
    /* Read a chunk from stdin and point the machine at it. */
    size_t nread = fread(buffer, 1, sizeof(buffer), stdin);
    p = buffer;
    pe = p + nread;
    /* A short read at end of file tells the machine to run its EOF actions. */
    if (nread < sizeof(buffer) && feof(stdin)) eof = pe;

    %% write exec;

    if (eof || cs == %%{ write error; }%%) break;
  } while (1);

  return 0;
}
```
Here's some test input:
```
[[header]]
<html>
<head><title>title</title></head>
<body>
<h1>[[headertext]]</h1>
<p>i feeling [[emotion]].</p>
<p>i brackets: [ is cool. ] cool. [] cool. [[tag]] special.</p>
</body>
</html>
[[footer]]
```
And here's the output from the parser:
```
tag(header)
text(
<html>
<head><title>title</title></head>
<body>
<h1>)
tag(headertext)
text(</h1>
<p>i feeling )
tag(emotion)
text(.</p>
<p>i brackets: )
text([ )
text(is cool. ] cool. )
text([])
text( cool. )
tag(tag)
text( special.</p>
</body>
</html>
)
tag(footer)
text(
)
```
The final text node contains only the newline at the end of the file.