parsing - How to parse template languages in Ragel? -

- February 15, 2010

I've been working on a parser for a simple template language. I'm using Ragel.

The requirements are modest. I'm trying to find [[tags]] that can be embedded anywhere in the input string.

I'm trying to parse a simple template language, one that can have tags such as {{foo}} embedded within HTML. I tried several approaches to parse this but had to resort to using a Ragel scanner, and to use the inefficient approach of matching only a single character as a "catch all". I feel this is the wrong way to go about this. I'm essentially abusing the longest-match bias of the scanner to implement my default rule (it can only be 1 char long, so it should always be the last resort).

    %%{
      machine parser;

      action start      { tokstart = p; }
      action on_tag     { results << [:tag, data[tokstart..p]] }
      action on_static  { results << [:static, data[p..p]] }

      tag  = ('[[' lower+ ']]') >start @on_tag;

      main := |*
        tag;
        any      => on_static;
      *|;
    }%%

(The actions are written in Ruby, but should be easy to understand.)

How would you go about writing a parser for such a simple language? Is Ragel maybe not the right tool? It seems you have to fight Ragel tooth and nail if the syntax is unpredictable like this.

Ragel works fine. You just need to be careful about what you're matching. Your question uses both [[tag]] and {{tag}}, but your example uses [[tag]], so I figure that's what you're trying to treat as special.

What you want to do is eat text until you hit an open-bracket. If that bracket is followed by another bracket, then it's time to start eating lowercase characters until you hit a close-bracket. Since the text in the tag cannot include any bracket, you know that the only non-error character that can follow that close-bracket is another close-bracket. At that point, you're back where you started.

Well, that's a verbatim description of this machine:

    tag = '[[' lower+ ']]';

    main := (
      (any - '[')*      # eat text
      ('[' ^'[' | tag)  # try to eat a tag
    )*;
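If it helps to see those transitions spelled out without Ragel, here is a minimal hand-written sketch of the same idea in C. It is not what Ragel generates; the function name `count_tags` and the state names are made up for illustration, and treating input that ends partway through a tag (or right after a lone `[`) as an error is a choice of this sketch, not something the machine above states.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical state names for a hand-rolled version of the machine. */
enum state {
    S_TEXT,   /* eating ordinary text */
    S_OPEN,   /* saw one '[' */
    S_NAME0,  /* saw "[[", need the first lowercase letter */
    S_NAME,   /* inside the tag name */
    S_CLOSE   /* saw the first ']' of the closer */
};

/* Returns the number of [[tags]] in s, or -1 if a tag is malformed
 * or the input ends outside plain text (an assumption of this sketch). */
int count_tags(const char *s)
{
    enum state st = S_TEXT;
    int tags = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        switch (st) {
        case S_TEXT:                  /* (any - '[')* */
            if (c == '[') st = S_OPEN;
            break;
        case S_OPEN:                  /* '[' ^'['  or the start of a tag */
            st = (c == '[') ? S_NAME0 : S_TEXT;
            break;
        case S_NAME0:                 /* lower+ needs at least one letter */
            if (!islower(c)) return -1;
            st = S_NAME;
            break;
        case S_NAME:
            if (c == ']') st = S_CLOSE;
            else if (!islower(c)) return -1;
            break;
        case S_CLOSE:                 /* only another ']' may follow */
            if (c != ']') return -1;
            tags++;
            st = S_TEXT;
            break;
        }
    }
    return (st == S_TEXT) ? tags : -1;
}
```

For example, `count_tags("[] cool. [[tag]] special.")` finds one tag, while `count_tags("[[Tag]]")` reports an error because of the uppercase letter.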

The tricky part is, where do you call your actions? I don't claim to have the best answer to that, but here's what I came up with:

    static char *text_start;

    %%{
      machine parser;

      action markstart { text_start = fpc; }
      action printtextnode {
        int text_len = fpc - text_start;
        if (text_len > 0) {
          printf("text(%.*s)\n", text_len, text_start);
        }
      }
      action printtagnode {
        int text_len = fpc - text_start - 1;  /* drop closing bracket */
        printf("tag(%.*s)\n", text_len, text_start);
      }

      tag = '[[' (lower+ >markstart) ']]' @printtagnode;

      main := (
        (any - '[')* >markstart %printtextnode
        ('[' ^'[' %printtextnode | tag) >markstart
      )* @eof(printtextnode);
    }%%

There are a few non-obvious things:

The eof action is needed because %printtextnode is only ever invoked on leaving a machine. If the input ends with normal text, there will be no input to make it leave that state. Because it will also be called when the input ends with a tag, and there is then no final, unprinted text node, printtextnode tests that it has some text to print.
The %printtextnode action nestled in after the ^'[' is needed because, though we marked the start when we hit the [, after we hit a non-[ we'll start trying to parse anything again and re-mark the start point. We need to flush those two characters before that happens, hence that action invocation.
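To make the mark-and-flush bookkeeping concrete, here is a rough hand-written approximation in C, with no Ragel involved. The names `tokenize` and `emit` are hypothetical, the nodes are written into a caller-supplied buffer instead of printed so the result is easy to inspect, and rejecting input that ends partway through a tag or right after a lone `[` is an assumption of this sketch.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Append one node such as "tag(foo)\n" to out; -1 if it does not fit. */
static int emit(char *out, size_t outsz, const char *kind,
                const char *start, size_t len)
{
    size_t used = strlen(out);
    int n = snprintf(out + used, outsz - used, "%s(%.*s)\n",
                     kind, (int)len, start);
    return (n < 0 || (size_t)n >= outsz - used) ? -1 : 0;
}

/* Hypothetical hand-rolled tokenizer; returns 0 on success, -1 on error. */
int tokenize(const char *in, char *out, size_t outsz)
{
    const char *p = in;

    assert(in != NULL && out != NULL && outsz > 0);
    out[0] = '\0';

    while (*p != '\0') {
        const char *mark = p;              /* mark the start of a text run */

        while (*p != '\0' && *p != '[')    /* eat text */
            p++;
        /* Flush on leaving the run, and also at end of input; skip if empty. */
        if (p > mark && emit(out, outsz, "text", mark, (size_t)(p - mark)) < 0)
            return -1;
        if (*p == '\0')
            break;

        mark = p;                          /* re-mark at the '[' */
        p++;
        if (*p == '\0')
            return -1;                     /* lone '[' at the end: sketch's choice */
        if (*p != '[') {
            p++;                           /* '[' plus one non-'[' character */
            if (emit(out, outsz, "text", mark, 2) < 0)
                return -1;                 /* flush the pair before re-marking */
            continue;
        }

        p++;                               /* second '[' */
        mark = p;                          /* mark the start of the tag name */
        while (islower((unsigned char)*p))
            p++;
        if (p == mark || p[0] != ']' || p[1] != ']')
            return -1;                     /* empty name or bad closer */
        if (emit(out, outsz, "tag", mark, (size_t)(p - mark)) < 0)
            return -1;
        p += 2;                            /* skip "]]" */
    }
    return 0;
}
```

Calling `tokenize("x [ y", out, sizeof out)` produces three text nodes, with the `[` and the character after it in a node of their own, which is the same flush-the-two-characters behavior described above.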

The full parser follows. I did it in C because that's what I know, but you should be able to turn it into whatever language you need pretty readily:

    /* ragel so_tag.rl && gcc so_tag.c -o so_tag */
    #include <stdio.h>
    #include <string.h>

    static char *text_start;

    %%{
      machine parser;

      action markstart { text_start = fpc; }
      action printtextnode {
        int text_len = fpc - text_start;
        if (text_len > 0) {
          printf("text(%.*s)\n", text_len, text_start);
        }
      }
      action printtagnode {
        int text_len = fpc - text_start - 1;  /* drop closing bracket */
        printf("tag(%.*s)\n", text_len, text_start);
      }

      tag = '[[' (lower+ >markstart) ']]' @printtagnode;

      main := (
        (any - '[')* >markstart %printtextnode
        ('[' ^'[' %printtextnode | tag) >markstart
      )* @eof(printtextnode);
    }%%

    %% write data;

    int main(void) {
      char buffer[4096];
      int cs;
      char *p = NULL;
      char *pe = NULL;
      char *eof = NULL;

      %% write init;

      do {
        size_t nread = fread(buffer, 1, sizeof(buffer), stdin);
        p = buffer;
        pe = p + nread;
        if (nread < sizeof(buffer) && feof(stdin)) eof = pe;

        %% write exec;

        if (eof || cs == %%{ write error; }%%) break;
      } while (1);
      return 0;
    }

Here's some test input:

    [[header]]
    <html>
    <head><title>title</title></head>
    <body>
    <h1>[[headertext]]</h1>
    <p>i feeling [[emotion]].</p>
    <p>i brackets: [ is cool. ] cool. [] cool. [[tag]] special.</p>
    </body>
    </html>
    [[footer]]

And here's the output from the parser:

    tag(header)
    text(
    <html>
    <head><title>title</title></head>
    <body>
    <h1>)
    tag(headertext)
    text(</h1>
    <p>i feeling )
    tag(emotion)
    text(.</p>
    <p>i brackets: )
    text([ )
    text(is cool. ] cool. )
    text([])
    text( cool. )
    tag(tag)
    text( special.</p>
    </body>
    </html>
    )
    tag(footer)
    text(
    )

The final text node contains just the newline at the end of the file.
