Experimenting with Syntax...

From: "Juancarlo Añez"
Newsgroups: sdforum.want
Sent: Sunday, June 17, 2007 11:37 AM
Subject: Experimenting with syntax: Pascal-like

Here goes the experiment of translating WANT's want.xml to a Pascal-like syntax.

Some initial criteria:

  • Pascal's verboseness is sometimes a handicap. I'll try to avoid using "begin" when the construct is already known to be structured, and things like that.
  • Use // for comments.
  • Make all structured constructs end with an "end something", with the "something" being optional:
    project WANT
    end project; // "project" is optional here
  • WANT's current XML-based syntax is expandable because the registration of elements and tasks is done in a grammar-like fashion. This allows the language to grow and change without having to change the parser (a user could, for example, add a new sub-element to the existing "patternset"). This may be something to keep.

project WANT  // defaults to name="WANT"
   default compile; // will need two passes to allow forward ident refs
   // Use semicolons to tell the parser not to expect an "end xxx".
   // "name" defaults to the above
   // "basedir" defaults to "."
   const
     // the following definitions are assumed to
     // belong to the "project" construct

     want.master = "%{want_master}" // allow "." in identifiers?
     old.version = "?{release.ini:releases:current}"
     // need syntax for regular expressions
     old.build = "${old.version}" ~ /"^.*\.([0-9]+)$"/"\1"/
     build  = "={1 + ${old.build}}" if want.master
     build  = "${old.build}"    unless want.master

     version = "${old.version}" ~ /"\.[0-9]*$"/".${build}"/
     comma.version = "${version}" ~ /"\."/","/

     tstamp // allows for format element
        format when = "yyyy,mm,dd,HH,nn,ss"
        format date.tag = "yyyy-mm-dd"
     end tstamp
   end const
   // I think this "end const" is important as "const" can be
   // used within targets. It emphasizes the difference between
   // property definitions and task/element invocations.
   // Another option is to use "=" _only_ for properties.

   // skip other properties...

   patternset sources
     // we need to identify the "default" property,
     // the one to which the construct identifier is assigned.
     // It is "id" in this case, and "name" for the project construct.
     include  "${lib}/**";
     // no "const ... end const" means that these are NOT
     // property definitions but element invocations
     include  "${src}";
     include  "${src}/**";
   end patternset

   patternset resources
     // WANT's old "patternset refid" comes from Ant
     // and is inadequate
     reference sources;   // refid="sources"
     include "${bin}";
   end patternset

   target prepare
      task mkdir "${dcu}"; // defaults to dir="${dcu}"
      task mkdir "${bin}";
      task echo  "version=${version}";
      task echo  // any task or element can be structured
        message = "build=${build}"
      end task
   end target

   target clean
     task delete "${bin}"
       include "**" end  // or we could use "end" instead of ";"?
       exclude "ide" end
       exclude "ide/**" end
     end task
   end target

   // skip to a compilation target

   target compile-want
     // the following syntax breaks consistency
     depends prepare, versioninfo, resources, build;
     task dcc if true  // this is a good place for conditionals
       // in current WANT any attribute can be expressed as an element
       // <attribute-name value="the value" />
       basedir "${src}";  // defaults to value="${src}"
       source "want.dpr";

       exeoutput "${bin}";
       dcuoutput "${dcu}";
       debug     true;  // "true" is a predefined constant
       console   true;
       hugestrings true;

       define "USE_JEDI_JCL";
       define "DUNIT_DLL";
       define "SUPPORTS_WIDESTRING";

       unitpath refid="sources" end // a structured construct in one line
       includepath  refid="sources" end
       resourcepath refid="resources" end
     end task
   end target

   target compile
      depends compile-want,compile-dof2want;
   end target

  // skip more stuff

   target tarball
     task exec "wget" // defaults to executable="wget"
       arg "--quiet"; // defaults to value="--quiet"
       arg "--output-document=@{want-cvs-${date.tag}.tar.bz2}";
       arg "http://cvs.sourceforge.net/cvstarballs/want-cvsroot.tar.bz2";
     end task
   end target
end project

// finished with no need for functions or list/array literals
// the regexp syntax is candy that can be removed
// regexp var_name // defaults to name="var_name"
//   text "target text";
//   pattern "^(.*)match me";
//   subst "\1";
// end regexp;
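
To make the `~ /pattern/subst/` candy concrete, here is a small sketch of what the version properties in the listing would evaluate to. It is written in Python, not WANT's own language, and it makes several assumptions: the sample value "1.2.37" is invented, `={...}` is read as an arithmetic expression, and Python's `re.sub` stands in for whatever regexp engine WANT would use.

```python
import re

# Hypothetical value read from release.ini (assumed for illustration only).
old_version = "1.2.37"
want_master = True

# old.build = "${old.version}" ~ /"^.*\.([0-9]+)$"/"\1"/
old_build = re.sub(r"^.*\.([0-9]+)$", r"\1", old_version)

# build = "={1 + ${old.build}}" if want.master
# build = "${old.build}"        unless want.master
build = str(1 + int(old_build)) if want_master else old_build

# version = "${old.version}" ~ /"\.[0-9]*$"/".${build}"/
version = re.sub(r"\.[0-9]*$", "." + build, old_version)

# comma.version = "${version}" ~ /"\."/","/
comma_version = re.sub(r"\.", ",", version)

print(old_build, build, version, comma_version)
# prints: 37 38 1.2.38 1,2,38
```

The same four steps could be spelled with the longer `regexp ... end regexp` construct; the operator form only saves typing.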

A decision has to be made between:

task name
end task

or just:

name // language augmentation
end name;

In favor of the second option is that the parser can be generic like the current one in WANT, with the final syntax defined by how tasks and elements can be nested, as it's done now. The current parser doesn't even know that "project" is the top-level construct.
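
A rough sketch of what "generic" could mean here, in Python rather than WANT's Delphi. The `ALLOWED_CHILDREN` table and the `register` and `check_nesting` helpers are hypothetical names, not WANT's API, and the vocabulary in the table is only a sample. The point is that the parser stays ignorant of the vocabulary, while a registration table decides what may nest where.

```python
# Hypothetical registration table: parent construct -> allowed children.
ALLOWED_CHILDREN = {
    "project": {"target", "patternset", "const"},
    "target": {"depends", "dcc", "mkdir", "echo"},
    "patternset": {"include", "exclude", "reference"},
}

def register(parent, child):
    """Extend the language without touching the parser."""
    ALLOWED_CHILDREN.setdefault(parent, set()).add(child)

def check_nesting(tree, parent=None):
    """tree is (name, [children]); return a list of nesting errors."""
    name, children = tree
    errors = []
    if parent is not None and name not in ALLOWED_CHILDREN.get(parent, set()):
        errors.append(f"'{name}' is not allowed inside '{parent}'")
    for child in children:
        errors.extend(check_nesting(child, name))
    return errors

tree = ("project", [("patternset", [("include", []), ("selector", [])])])
print(check_nesting(tree))   # one error, for 'selector'

# A user adds a new sub-element to the existing "patternset".
register("patternset", "selector")
print(check_nesting(tree))   # no errors after registration
```

Note that nothing in `check_nesting` knows that "project" is special, which mirrors the remark about the current parser.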

The complete grammar could be:

start ::= element
element ::= ident [default] (";" | elements "end" [ident])
elements ::= (property | element)*
property ::= ident "=" value
value ::= ident | literal
literal ::= string

if we get rid of "const", or make it just a property-pass-through element.
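
As a feasibility check, the grammar fits in a short recursive-descent parser. The sketch below is in Python, not the Pascal that WANT is written in, and it rests on two assumptions that the grammar leaves open. First, `default` is taken to be a quoted string (so `project "WANT"`, not `project WANT`), because an unquoted ident in that position cannot be told apart from a child element without the registration table. Second, the ident after "end" is consumed only when it repeats the element's own name and is not followed by a string, ";" or "=", which is a heuristic and not a complete answer. All names (`tokenize`, `Parser`, `parse`) are made up for the example.

```python
import re

# comment | "string" | ident (allowing "." and "-") | ";" or "="
TOKEN = re.compile(r'\s*(?:(//[^\n]*)|("[^"]*")|([A-Za-z_][\w.-]*)|([;=]))')

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        pos = m.end()
        _comment, string, ident, punct = m.groups()
        if string is not None:
            tokens.append(("string", string[1:-1]))
        elif ident is not None:
            tokens.append(("end", ident) if ident == "end" else ("ident", ident))
        elif punct is not None:
            tokens.append((punct, punct))
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else ("eof", None)

    def take(self, kind):
        tok = self.peek()
        if tok[0] != kind:
            raise SyntaxError(f"expected {kind}, got {tok}")
        self.pos += 1
        return tok[1]

    def element(self):
        # element ::= ident [default] (";" | elements "end" [ident])
        node = {"name": self.take("ident"), "default": None,
                "props": {}, "children": []}
        if self.peek()[0] == "string":          # assumption: default is a string
            node["default"] = self.take("string")
        if self.peek()[0] == ";":
            self.take(";")
            return node
        # elements ::= (property | element)*
        while self.peek()[0] == "ident":
            if self.peek(1)[0] == "=":          # property ::= ident "=" value
                key = self.take("ident")
                self.take("=")
                kind = "string" if self.peek()[0] == "string" else "ident"
                node["props"][key] = self.take(kind)
            else:
                node["children"].append(self.element())
        self.take("end")
        # Heuristic: take the closing name unless it looks like a new sibling.
        if (self.peek() == ("ident", node["name"])
                and self.peek(1)[0] not in ("string", ";", "=")):
            self.take("ident")
        return node

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.element()                     # start ::= element
    if parser.peek()[0] != "eof":
        raise SyntaxError(f"trailing input: {parser.peek()}")
    return tree

source = '''
project "WANT"
  basedir = "."
  target "prepare"
    mkdir "${bin}";   // one-line element
    echo
      message = "build=${build}"
    end echo
  end target
end project
'''
tree = parse(source)
print(tree["children"][0]["children"][1]["props"])
# prints: {'message': 'build=${build}'}
```

The two workarounds suggest where the grammar would need tightening: either defaults and closing names get their own syntax, or the parser consults the registered vocabulary while parsing.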

I just took a look at the base classes and the execution engine, and I think they should go through a major rewrite. The multiple-pass scheme (Init, Configure, Execute) is mightily confusing. The check for required attributes must be made before starting execution, but that can be done right after a construct has been parsed.
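
A minimal sketch of checking required attributes at parse time, in Python and with invented names throughout: the `REQUIRED` and `DEFAULT_ATTR` tables and the `on_parsed` hook are hypothetical. Which attributes are actually required for `mkdir` or `exec` is an assumption made for the example.

```python
# Hypothetical tables; the real sets of required attributes are not given here.
REQUIRED = {"mkdir": {"dir"}, "exec": {"executable"}}
DEFAULT_ATTR = {"mkdir": "dir", "exec": "executable"}

def on_parsed(name, default=None, **attrs):
    """Called once a construct is fully parsed, before any execution starts."""
    # Map the positional default onto its attribute, e.g. mkdir "x" -> dir="x".
    if default is not None and name in DEFAULT_ATTR:
        attrs.setdefault(DEFAULT_ATTR[name], default)
    missing = REQUIRED.get(name, set()) - set(attrs)
    if missing:
        raise ValueError(f"{name}: missing required attribute(s) {sorted(missing)}")
    return attrs

print(on_parsed("mkdir", default="${bin}"))   # prints: {'dir': '${bin}'}

try:
    on_parsed("exec")                         # no executable given
except ValueError as err:
    print(err)
```

With a hook like this, a build file with a missing attribute fails while it is being read, and no separate configuration pass is needed for that check.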

BTW, there's infrastructure for lexical analysis and parsing in src/jal/JalParse.pas.