Parsing is the process of analyzing a character stream according to a formal lexical, syntactic, and/or semantic grammar, producing an output structure or an evaluation.
Lexical Analysis
Produces a stream of tokens from a stream of input characters. The stream can be a list. Lexing can be done using a sequential machine, regular expressions, or ad hoc splitting. AKA lexing, scanning, tokenizing.
Sequential Machine, AKA finite state machine, finite automaton. Uses a state transition table.
dyad ;: Sequential Machine
J implementation with an example of J lexer for Alphabet and Words
Essays/Word Formation on Lines
Sequential machine for J words with space and line tokens with extensive examples
Scripts/JavascriptCruncher
stripping out unnecessary content from the files to reduce file size (comments, etc.)
JWebServer/HttpParser
HTTP header lexer using ;: dyad, and elements of ad hoc parsing
Addons/graphics/graphviz
http://olegykj.sourceforge.net/scrshots/graphviz.html
visualizing sequential machines using transition diagrams
chat/2007-April/000464
JSON style backslash evaluator
chat/2007-April/000466
JSON tokenizer, with details of producing the sequential machine transition table
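To make the idea of a table-driven sequential machine concrete, here is a minimal sketch in Python. It is not a model of the J dyad ;: and is not taken from any of the pages listed above; the character classes, states, action codes, and the names `TABLE` and `tokenize` are all assumptions chosen for illustration. Each table cell holds a next state and an action, which is the general shape of a state transition table.

```python
# Character classes (columns of the transition table). Hypothetical choice.
SPACE, ALNUM, OTHER = 0, 1, 2

def char_class(ch):
    """Map a character to its column in the transition table."""
    if ch.isspace():
        return SPACE
    if ch.isalnum() or ch == "_":
        return ALNUM
    return OTHER

# States (rows of the table).
BETWEEN, IN_WORD, IN_PUNCT = 0, 1, 2

# Actions: do nothing, mark token start, emit token, emit then mark a new start.
NOP, START, EMIT, EMIT_START = 0, 1, 2, 3

TABLE = [
    # SPACE            ALNUM                  OTHER
    [(BETWEEN, NOP),  (IN_WORD, START),      (IN_PUNCT, START)],       # BETWEEN
    [(BETWEEN, EMIT), (IN_WORD, NOP),        (IN_PUNCT, EMIT_START)],  # IN_WORD
    [(BETWEEN, EMIT), (IN_WORD, EMIT_START), (IN_PUNCT, EMIT_START)],  # IN_PUNCT
]

def tokenize(text):
    """Run the machine over text and return the list of tokens."""
    tokens = []
    state, start = BETWEEN, 0
    for i, ch in enumerate(text):
        state, action = TABLE[state][char_class(ch)]
        if action in (EMIT, EMIT_START):
            tokens.append(text[start:i])
        if action in (START, EMIT_START):
            start = i
    if state != BETWEEN:
        tokens.append(text[start:])  # flush the token still open at end of input
    return tokens
```

In this sketch every punctuation character becomes its own token, while runs of letters, digits, and underscores are grouped into words. Changing the lexer means editing the table, not the loop.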
Regular Expressions may use a sequential machine internally, but have an intuitive, standard syntax.
Regular Expressions Lab
Guide to regex library
Essays/Regex Lexer
a lexer based on standard regular expressions and simple token declarations
Scripts/Regular Expressions Substitution
Regular expressions extended for Perl/awk/sed-like substitution
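As an illustration of a lexer driven by regular expressions and simple token declarations, here is a minimal sketch using Python's standard `re` module. It is not the Essays/Regex Lexer code; the token names and patterns in `TOKEN_SPEC` and the function name `lex` are assumptions made up for this example.

```python
import re

# Token declarations: (name, pattern). Order matters when patterns overlap.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("OP",     r"[-+*/=()]"),
    ("SKIP",   r"\s+"),
]

# Combine the declarations into one pattern with a named group per token type.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def lex(text):
    """Return a list of (token_type, lexeme) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Adding a token type is then a matter of adding one line to the declarations.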
Ad Hoc lexing looks for simple substrings on which to split, possibly iteratively.
programming/2007-January/004756
example of ad hoc splitting for a list of first/initial/last names
Scripts/Scheme
has a Lisp S-expression string tokenizer
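Iterative ad hoc splitting can be sketched in a few lines of Python. The input format here ("Last, First M." entries separated by semicolons) is a hypothetical one chosen for illustration and is not necessarily the format discussed in the forum post above; `split_names` is likewise an invented name.

```python
def split_names(text):
    """Split 'Last, First M.; Last, First' into (first, initial, last) tuples."""
    people = []
    for entry in text.split(";"):            # first split: one entry per person
        entry = entry.strip()
        if not entry:
            continue
        last, _, rest = entry.partition(",")  # second split: last name vs the rest
        parts = rest.split()                  # third split: first name and initial
        first = parts[0] if parts else ""
        initial = parts[1].rstrip(".") if len(parts) > 1 else ""
        people.append((first, initial, last.strip()))
    return people
```

No grammar or state table is involved: each level just splits on a known separator, which is what makes the approach quick to write and fragile on irregular input.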
Syntactic Analysis
Produces a structure from, or evaluates, a stream of tokens. The structure is typically a tree of grammar elements. AKA parsing.
Bottom-up, AKA Shift-reduce. E.g., LR parsers.
Parsing and Execution from J Dictionary, RogerHui, Kenneth Iverson
Parsing and Execution from J for C Programmers, HenryRich
trace script (packages/misc/trace.ijs)
provides a model of the J parser whose internal workings can be examined and experimented with
chat/2007-April/000462
JSON shift-reduce parser
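The shift-reduce idea can be sketched for simple arithmetic in Python. This is not the J parser model or the JSON parser referenced above; it is a small operator-precedence variant written for illustration, and the names `PRECEDENCE`, `reduce_once`, and `parse`, the `"$"` end marker, and the tuple tree shape are all assumptions. Tokens are shifted onto a stack, and the top of the stack is reduced whenever a rule matches and the lookahead token permits it.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def is_expr(item):
    """Reduced expressions are stored on the stack as ('expr', tree)."""
    return isinstance(item, tuple) and item[0] == "expr"

def reduce_once(stack, lookahead):
    """Apply one reduction to the top of the stack; return True if one applied."""
    if stack and isinstance(stack[-1], str) and stack[-1].isdigit():
        stack[-1] = ("expr", int(stack[-1]))            # expr <- NUMBER
        return True
    if len(stack) >= 3:
        a, b, c = stack[-3:]
        if a == "(" and is_expr(b) and c == ")":
            stack[-3:] = [b]                             # expr <- ( expr )
            return True
        if is_expr(a) and b in PRECEDENCE and is_expr(c):
            # Reduce unless the lookahead operator binds tighter than b.
            if lookahead not in PRECEDENCE or PRECEDENCE[lookahead] <= PRECEDENCE[b]:
                stack[-3:] = [("expr", (b, a[1], c[1]))]  # expr <- expr op expr
                return True
    return False

def parse(tokens):
    """Parse a list of token strings into a nested (op, left, right) tree."""
    stack = []
    for lookahead in list(tokens) + ["$"]:   # "$" marks end of input
        while reduce_once(stack, lookahead):
            pass
        if lookahead != "$":
            stack.append(lookahead)          # shift
    if len(stack) != 1 or not is_expr(stack[0]):
        raise SyntaxError(f"could not reduce input, stack: {stack!r}")
    return stack[0][1]
```

For example, `parse("1 + 2 * 3".split())` groups the multiplication first because the reduction of `1 + 2` is postponed while the lookahead is `*`.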
Top-down, AKA Recursive descent. E.g. LL parsers.
Essays/Recursive Descent Parser
framework for simple building of hand-coded LL parsers using Regex Lexer
Scripts/Scheme
has a tacit recursive-descent parser
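A recursive-descent parser has one function per grammar rule, and each function calls the others (or itself) for nested constructs. The following Python sketch parses Lisp-style S-expressions into nested lists. It is not the code from Essays/Recursive Descent Parser or Scripts/Scheme; the grammar (expr is an atom or a parenthesized sequence of exprs) and the names `tokenize`, `parse_expr`, `atom`, and `parse` are assumptions for illustration.

```python
def tokenize(text):
    """Ad hoc tokenizer: pad parentheses with spaces, then split on whitespace."""
    return text.replace("(", " ( ").replace(")", " ) ").split()

def atom(tok):
    """Convert a token to an int when possible, else keep it as a symbol string."""
    try:
        return int(tok)
    except ValueError:
        return tok

def parse_expr(tokens, pos=0):
    """Parse one expression starting at pos; return (tree, next position)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        items = []
        pos += 1
        while pos < len(tokens) and tokens[pos] != ")":
            item, pos = parse_expr(tokens, pos)   # recursive call for nested exprs
            items.append(item)
        if pos >= len(tokens):
            raise SyntaxError("missing closing parenthesis")
        return items, pos + 1                     # skip the ")"
    if tok == ")":
        raise SyntaxError("unexpected closing parenthesis")
    return atom(tok), pos + 1

def parse(text):
    """Parse a string containing exactly one S-expression."""
    tokens = tokenize(text)
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after expression")
    return tree
```

The parser decides what to do by looking only at the current token, which is what makes this grammar suitable for top-down parsing with one token of lookahead.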
Ad Hoc parsing alternates between splitting and combining substrings at multiple, typically non-recursive, levels.
csv script (packages/files/csv.ijs)
reads csv file into a boxed array
pp script
J pretty-print script formatter
ChrisBurke/Export Script utility (packages/export)
converts a script into various formats
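The alternation of splitting and combining can be sketched with a deliberately simplified CSV reader in Python. This does not describe how csv.ijs works; it assumes no escaped quotes and no line breaks inside quoted fields, and `parse_csv` is a made-up name. The text is split into lines, each line is split on commas, and pieces that belong to one quoted field are then combined again.

```python
def parse_csv(text):
    """Return a list of rows, each a list of field strings."""
    rows = []
    for line in text.splitlines():            # level 1: split into lines
        if not line:
            continue
        fields, pending = [], None
        for piece in line.split(","):         # level 2: split on commas
            if pending is not None:
                pending += "," + piece        # combine: comma was inside quotes
                if piece.endswith('"'):
                    fields.append(pending[1:-1])
                    pending = None
            elif piece.startswith('"') and not (len(piece) > 1 and piece.endswith('"')):
                pending = piece               # quoted field continues in next piece
            elif piece.startswith('"'):
                fields.append(piece[1:-1])    # complete quoted field
            else:
                fields.append(piece)
        if pending is not None:
            raise ValueError("unterminated quoted field")
        rows.append(fields)
    return rows
```

The levels are fixed (lines, then fields), so no recursion is needed, in contrast to the grammar-driven parsers above.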
Handling Structures
Since a lot of parsing is based on abstract syntax trees (ASTs), an introduction to efficient tree handling in J would help. You might look at:
the lab Huffman Coding
Roger's Essays/Huffman Coding
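For readers who have not seen Huffman coding, the tree handling involved can be sketched in Python as follows. This is a generic textbook construction, not a translation of the lab or of Roger's essay, and it says nothing about how trees are best represented in J. The nested-tuple tree, the tie-breaking counter, and the name `huffman_codes` are assumptions; symbols are assumed not to be tuples themselves.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman tree from {symbol: weight} and return {symbol: bit string}."""
    if not freqs:
        return {}
    tiebreak = count()  # unique sequence numbers so trees are never compared
    heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Merge the two lightest subtrees into one internal node (a 2-tuple).
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (t1, t2)))
    codes = {}

    def walk(tree, prefix):
        """Depth-first walk: left branch appends '0', right branch appends '1'."""
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes
```

The same two operations, building a tree bottom-up and walking it recursively, are what a parser and its consumers do with a syntax tree.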
See Also
J-related information
Guides/Strings string and text manipulation resources
programming/2007-November/008869 some initial links
DanBron/Temp/ParseLexExecute implementing J in J
Guides/Language FAQ/J BNF Is there a BNF description of J?
chat/2007-November/000678 J syntax easy to parse? I don't think so
General information
Parsing, Wikipedia
