
Converting from lex & yacc to flex & bison



This is part one of two pages explaining how to convert example lex and yacc code from Lex & Yacc by Levine, Mason and Brown [1] to work on modern systems using flex and bison. The source code from the book is available from the publisher's site.

That code was written over 20 years ago for the old, buggy AT&T lex and yacc software, and compilers have since become much less forgiving about calling functions such as printf() without declaring them first (g++ rejects it outright). The message boards seem to be full of complaints along the lines of "OMG I downloaded the code and it doesn't compile. Help!".

This page takes a sample lex and yacc program and explains what we did to make it compile without any errors or warnings on a standard Linux system in 2013 (Bodhi 3.2.0.19-generic with gcc 4.4.3, bison 2.4.1 and flex 2.5.35). Part two, Using flex and bison in MSVC++, covers how we made it compile without any error or warning messages on Windows using Microsoft Visual Studio 2008. We also look at how to fix flex's memory leak problem.

UPDATE: There is a more recent book: Flex & Bison by John Levine [2]

We should add that we are AR* programmers who like our source code to compile without any warnings whatsoever, and we like to be able to compile the same source code with both our Windows MSVC compiler and on Linux, preferably without using convoluted preprocessor macros. The code we give here compiles for us on both systems without any warnings (OK, we do suppress a few of them) and has no memory leaks.

* AR = Overly obsessive concerning small details.

The sample program

We try the simple calculator with variables and real values from page 64 of [1], using the original files ch3-03.y and ch3-03.l. Compiling on Linux with the following commands

bison -d -v -b y ch3-03.y
flex ch3-03.l
g++ -o ch3-03 y.tab.c lex.yy.c -lfl
we get these error messages:
y.tab.c: In function ‘int yyparse()’:
y.tab.c:1258: error: ‘yylex’ was not declared in this scope
ch3-03.y:23: error: ‘printf’ was not declared in this scope
ch3-03.y:31: error: ‘yyerror’ was not declared in this scope
y.tab.c:1439: error: ‘yyerror’ was not declared in this scope
y.tab.c:1582: error: ‘yyerror’ was not declared in this scope
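We should explain why we use the above commands. The -b y option makes bison name its output files y.tab.c and y.tab.h, as the old yacc did; -d tells it to write the header file y.tab.h, which the lexer includes to get the token codes and the type of yylval; and -v writes a report of the parser states to y.output. Linking with -lfl pulls in the flex library, which supplies a default yywrap() function. We use g++ instead of gcc because, surprisingly enough, it seems to be the easier option on Linux for compiling the pure ANSI C output from flex and bison.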
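The errors above all have the same cause: in C++, and therefore with g++, calling a function that has not yet been declared is an error, whereas old C compilers simply assumed an undeclared function returned int. The following is a minimal C sketch of the declare-before-use rule, not the book's code: toy_parse() is a hypothetical stand-in for the generated yyparse(), and yylex() and yyerror() here are toy definitions that read a fixed string (toy_input, also hypothetical).

```c
#include <stdio.h>

/* Prototypes first: g++ requires a declaration before the first call,
   which is what the "'yylex' was not declared" errors complain about.
   In ch3-03.y these declarations sit in the %{ ... %} prologue. */
int yylex(void);
void yyerror(const char *s);

/* Hypothetical stand-in for the generated yyparse(): like the real one,
   it calls yylex() until it returns 0 and reports problems via yyerror().
   Returns the number of tokens read, or -1 on error. */
int toy_parse(void)
{
    int count = 0;
    int tok;
    while ((tok = yylex()) != 0) {
        if (tok == '?') {
            yyerror("unexpected '?'");
            return -1;
        }
        count++;
    }
    return count;
}

/* Hypothetical toy lexer: returns the characters of a fixed string one at
   a time, then 0 for end of input (the same convention flex uses). */
static const char *toy_input = "a=1+2";

int yylex(void)
{
    return *toy_input != '\0' ? *toy_input++ : 0;
}

void yyerror(const char *s)
{
    fprintf(stderr, "ERROR: %s\n", s);
}
```

Here toy_parse() reads the five characters of "a=1+2" and returns 5. If you move the two prototypes below toy_parse() and compile as C++ with g++, you get the same kind of "was not declared in this scope" error shown above.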

The fix

Revised file: ch3-03.l

 1 %{
 2 /* We usually need these... */
 3 #include <stdio.h>
 4 #include <stdlib.h>
 5 
 6 /* Include this to use yylex_destroy for flex version < 2.5.9 */
 7 
 8 #include "flex_memory_fix.h"
 9 
10 /* This is required and is generated automatically by bison
11    from the .y file */
12 #include "y.tab.h"
13 
14 /* Local stuff we need here... */
15 #include <math.h>
16 extern double vbltable[26];
17 %}
18 
19 /* Add this to get line numbers... */
20 %option yylineno
21 
22 %%
23 ([0-9]+|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?) {
24   yylval.dval = atof(yytext); return NUMBER;
25   }
26 
27 [ \t] ;    /* ignore white space */
28 
29 [a-z] { yylval.vblno = yytext[0] - 'a'; return NAME; }
30 
31 "$" { return 0; /* end of input */ }
32 
33 \n  |
34 . return yytext[0];
35 %%
36 
37 /* We need to add a main() function.
38  * It is more convenient to put it here to manage flex
39  * memory management issues.
40  * At the minimum it must call yyparse().
41  */
42 extern int yyparse();
43 
44 int main(int argc, char *argv[])
45 {
46   printf("Enter sums using + - * / and () or type $ to quit.\n");
47 
48   yyparse();  /* REQUIRED */
49   yylex_destroy();  /* Add to clean up memory leaks */
50 }

Revised file: ch3-03.y

 1 %{
 2 /* For printf() */
 3 #include <stdio.h>
 4 
 5 /* Prototypes for yyerror() (defined below) and yylex() (from flex)... */
 6 void yyerror(char *s);
 7 int yylex(void);
 8 
 9 /* Specific for here... */
10 double vbltable[26];
11 %}
12 
13 %union {
14   double dval;
15   int vblno;
16 }
17 
18 %token <vblno> NAME
19 %token <dval> NUMBER
20 %left '-' '+'
21 %left '*' '/'
22 %nonassoc UMINUS
23 
24 %type <dval> expression
25 %%
26 statement_list: statement '\n'
27   | statement_list statement '\n'
28   ;
29 
30 statement:  NAME '=' expression { vbltable[$1] = $3; }
31   | expression    { printf("= %g\n", $1); }
32   ;
33 
34 expression: expression '+' expression { $$ = $1 + $3; }
35   | expression '-' expression { $$ = $1 - $3; }
36   | expression '*' expression { $$ = $1 * $3; }
37   | expression '/' expression
38         { if($3 == 0.0)
39             yyerror("divide by zero");
40           else
41             $$ = $1 / $3;
42         }
43   | '-' expression %prec UMINUS { $$ = -$2; }
44   | '(' expression ')'  { $$ = $2; }
45   | NUMBER
46   | NAME      { $$ = vbltable[$1]; }
47   ;
48 %%
49 /* An optional but friendlier yyerror function... */
50 void yyerror(char *s)
51 {
52   extern int yylineno;  // defined and maintained in lex
53   extern char *yytext;  // defined and maintained in lex
54   fprintf(stderr, "ERROR: %s at symbol '%s' on line %d\n", s,
55           yytext, yylineno);
56 }

Differences

To see the highlighted differences between the revised and original files, look at ch3-03.l-differences and ch3-03.y-differences.

The revised source code files ch3-03.l and ch3-03.y and flex_memory_fix.h are in this zip file (2 kB).

Some explanation

  1. We need to explicitly include the standard ANSI C header files such as stdio.h and stdlib.h (see .l: lines 3-4, .y: line 3). Back in 1990 you generally didn't need to do that.
  2. We need to add a main() function and a yyerror() function, which yyparse() calls to report errors, and we need to declare yyerror() and yylex() before the grammar actions use them (see .y: lines 6-7). You could put main() and yyerror() in a separate .c module. If not, it is more convenient to put main() in the lex .l file (see .l: lines 44-50) and yyerror() in the yacc .y file (see .y: lines 50-56).
  3. The core lex regular expressions and yacc grammar rules between the %% lines are unchanged from the originals.
  4. We call yylex_destroy() (.l: line 49) to free the input buffer that flex allocates, which would otherwise show up as a memory leak. The optional flex_memory_fix.h file (.l: line 8) is needed only for versions of flex earlier than 2.5.9.
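Item 1 matters because a missing declaration can change what the compiled code does, not just trigger a warning. A minimal sketch using the atof() call from the NUMBER rule (number_value() is a hypothetical wrapper, not part of the book's code): <stdlib.h> supplies the prototype double atof(const char *). Without any declaration in scope, a pre-C99 C compiler assumes atof() returns int and the double result is misread, while g++ refuses to compile the call at all.

```c
#include <stdlib.h>   /* declares double atof(const char *) */

/* Hypothetical wrapper around the conversion done by the NUMBER rule in
   ch3-03.l (yylval.dval = atof(yytext)). With the prototype in scope the
   compiler knows atof() returns a double, not an int. */
double number_value(const char *lexeme)
{
    return atof(lexeme);
}
```

For example, number_value("2.5") returns 2.5 and number_value("1.5e2") returns 150.0.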

A diagram

If you are interested, the parser states that bison builds from the grammar are shown in this diagram (309 kB). To create it, we used bison's -g option, which writes a Graphviz description of the parser to a .dot file (here y.dot), and then ran dot (available from Graphviz) on that file to produce a GIF image:

bison -d -v -b y -g ch3-03.y
dot -Tgif y.dot -o ch3-03.dot.gif

Compiling without warnings

Actually we compiled the above C source code using the option
g++ -Wno-write-strings -o ch3-03 y.tab.c lex.yy.c -lfl
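to avoid an annoying and useless warning message [deprecated conversion from string constant to 'char *']. The warning appears because yyerror() is declared to take a char *, but it is passed string literals, both in our grammar action ("divide by zero") and in the code bison generates ("syntax error").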
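If you would rather fix the warning than suppress it, one option, assuming you are free to change yyerror()'s signature, is to declare its parameter as const char *, which a string literal converts to without any deprecated conversion. A minimal sketch follows: format_error() is a hypothetical helper added only so the message can be checked, and yytext and yylineno are defined locally as stand-ins for the variables flex defines in lex.yy.c. In ch3-03.y you would also change the prototype on line 6 to match.

```c
#include <stdio.h>

/* Stand-ins so this sketch compiles on its own. In the real program flex
   defines both in lex.yy.c, and you would declare them extern instead. */
static char current_token[] = "+";
char *yytext = current_token;
int yylineno = 3;

/* Hypothetical helper: builds the same message as the yyerror() in
   ch3-03.y, but into a buffer so the result can be checked. */
int format_error(char *buf, size_t size, const char *s)
{
    return snprintf(buf, size, "ERROR: %s at symbol '%s' on line %d",
                    s, yytext, yylineno);
}

/* const char * accepts string literals such as "divide by zero" (and the
   literal messages bison passes) with no deprecated conversion. */
void yyerror(const char *s)
{
    char msg[256];
    format_error(msg, sizeof msg, s);
    fprintf(stderr, "%s\n", msg);
}
```

With the stand-in values above, yyerror("divide by zero") prints ERROR: divide by zero at symbol '+' on line 3.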

Even better options

A more general set of options to use with g++ that eliminates most unnecessary warnings with flex and bison output is
-Wall -Wno-unused -Wno-deprecated -Wno-write-strings

Memory leak problem

There is a memory leak problem in GNU flex: see Memory leak - 16386 bytes allocated by malloc. In flex versions 2.5.9 and above you can fix this by calling yylex_destroy(). The fudge in flex_memory_fix.h should let you solve this for earlier versions. This is more of an issue for Windows users, where the latest available GNU version of flex is currently only 2.5.4 (as of March 2013). If your version of flex is 2.5.9 or above, you don't need this .h file.
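To decide whether you need flex_memory_fix.h, compare the version reported by flex --version with 2.5.9, field by field. A minimal C sketch of that comparison (needs_flex_memory_fix() is a hypothetical helper, not part of flex):

```c
/* Returns 1 if flex version major.minor.subminor is older than 2.5.9,
   the version from which yylex_destroy() is available, so that the
   flex_memory_fix.h workaround is needed; returns 0 otherwise.
   Fields are compared in order, most significant first. */
int needs_flex_memory_fix(int major, int minor, int subminor)
{
    if (major != 2)
        return major < 2;
    if (minor != 5)
        return minor < 5;
    return subminor < 9;
}
```

For example, needs_flex_memory_fix(2, 5, 4) returns 1 for the Windows version mentioned above, and needs_flex_memory_fix(2, 5, 35) returns 0 for the Linux system used on this page.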

Flex and bison in Windows

See Using flex and bison in MSVC++.

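References

[1] John R. Levine, Tony Mason and Doug Brown, lex & yacc, 2nd edition, O'Reilly, 1992.
[2] John Levine, flex & bison, O'Reilly, 2009.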


This page first published 3 March 2013. Last updated 24 June 2020.