This page presents two ways to extract runnable source from 'literate' MoinMoin pages.

The first is implemented in J and is intended to be built into the J system to provide seamless integration of literate programs.

The second is implemented in Perl and extracts parts from a file using a command-line interface.

J script

The recursive part-unwrapping algorithm uses global variables to hold the raw data and to track which sections are being processed. This is why all verbs reside in their own jtangle locale. The j prefix seems appropriate, since this is intended to be part of the development environment.

«j»=
cocurrent 'jtangle'
require 'regex'
«extract_pieces»
«unwrap»
«read»
NB. readlit_z_=:readlit_jtangle_

extract_pieces

The extract_pieces verb takes the entire file contents as a string and cuts it into named pieces according to the 'name' attribute of each piece.

It assigns the list of boxed section names to the SECTIONS global. The corresponding boxed text goes into the TEXT global, with pieces that share a name joined together. The input is assumed to be UTF-8 encoded text with each line terminated by an LF character (that includes CRLF).

«extract_pieces»=
extract_pieces=:3 : 0
  assert. LF-:{:y
  reB=.'^\s*',(3#'{'),'#!literate.*\sname\s*=\s*''([^'']*)''.*$'
  reE=.'^\s*',(3#'}'),'.*$'
  
  hdri=.reB rxmatches y
  sn=.y rxfrom~ {:"2 hdri
  strt=.>:@+/@{."2 hdri
  i=./: l=.strt , {.@{."2 reE rxmatches y
  TEXT=:sn <@;/. y rxfrom~ _2 ({. , -~/)\ l{~i#~ (+. _1&|.) i<#strt
  SECTIONS=:~.sn
  PREFIX=:''
  STACK=:i.0
  i.0 0
)
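
To see which header lines the reB pattern is meant to accept, here is a small Perl sketch. It is only an illustration: section_name is a hypothetical helper, the pattern is reB rewritten in Perl syntax with the braces escaped, and lines are tested one at a time rather than over the whole file as the J verb does.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the section name if $line opens a literate
# block, otherwise undef. The pattern follows reB, with the literal braces
# escaped.
sub section_name {
    my ($line) = @_;
    return $line =~ /^\s*\{\{\{#!literate.*\sname\s*=\s*'([^']*)'/ ? $1 : undef;
}

# The opener is built at run time so that this example contains no line
# that starts a block itself.
my $open = '{' x 3;
for my $line ("$open#!literate name='jtangle.ijs'\n",
              "  $open#!literate   name = 'unwrap'\n",
              "$open#!python\n") {
    my $name = section_name($line);
    print defined $name ? "section: $name\n" : "not a literate header\n";
}
```

Note that this pattern allows whitespace around the = sign, while the pattern in the Perl script further down expects name=' with no spaces.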

cleanup deletes all the temporary globals. It is not called anywhere right now (it is mostly useful for debugging) but should be included in readlit.

«extract_pieces»=
cleanup=:3 : 0
  4!:55 ;:'TEXT PREFIX STACK SECTIONS'
  i.0 0
)

unwrap

«unwrap»=
unwrap=: 3 : 0

unwrap returns the contents of the section named y. If this section contains references to other sections, they are substituted recursively. It uses the globals generated by extract_pieces.

«unwrap»=
  y=.boxopen y
  if. y e. STACK do. return. end. NB. prevent circular referencing
  assert. (#SECTIONS)>k=.SECTIONS i. y
  STACK=:STACK,y

Only one section reference per line is allowed, and the line may contain nothing else but whitespace. TODO: (?) The whitespace before a section reference should be inserted before each line of the section text.

«unwrap»=
  result=.>k{TEXT
  r=.'^(\s*)«(.*)»\s*$' rxmatches result
  n=.(2{"2 r) rxfrom result
  rplcdata=.unwrap &.>n
  result=.rplcdata  (0&{"2 r) rxmerge result

We used to update the section text with the unwrapped data, as in the line below, but then decided against it.

  TEXT=:(<result) k} TEXT

«unwrap»=
  STACK=:_1}.STACK
  result
)

readlit

«read»=
readlit=:3 : 0
  extract_pieces 1!:1 < jpath > y
  i=.1 i.~ (1&,^:([: -. +./)) ([: +./ '.ijs'&E.)&> SECTIONS
  unwrap i{SECTIONS
:
  extract_pieces 1!:1 < jpath > y
  unwrap x
)

The optional left argument specifies which section to output. If it is not given, the first section whose name contains .ijs is used. If there is no such section, the first section in the file is used.
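
For comparison, the same default rule can be sketched in Perl. This is not part of either script. default_section is a hypothetical helper that takes the section names in file order.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical helper mirroring the default rule of readlit: prefer the first
# section whose name contains '.ijs', otherwise fall back to the first section.
sub default_section {
    my @names = @_;
    my $hit = first { index($_, '.ijs') >= 0 } @names;
    return defined $hit ? $hit : $names[0];
}

print default_section('j', 'jtangle.ijs', 'unwrap'), "\n";   # jtangle.ijs
print default_section('perl', 'perlvar'), "\n";              # perl
```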

test commands

T=:1!:1 <jpath '~nsg\literate\jtangle.lit'
readlit '~nsg\literate\jtangle.lit'

Perl script download

For convenience the generated Perl code is provided as an attachment. It is not necessarily the latest version (which may also be a good thing).

jtangle.pl.txt

Perl script

Note on comments

The current literate parser puts a comment (pointing to the literate source from which the script was generated) at the top of the file. It usually guesses the comment style correctly (J-style NB. comments or Perl # comments), but may make a mistake once in a while. Please check the downloaded source for consistency if you encounter problems.

«perl»=
# 2007-03-19
use strict;
use bytes;
«perloptions»
«perlvar»

Perl script options

«perloptions»=
use Getopt::Std;
our(

«perloptions»=
  $opt_f, # use section names as filenames, otherwise dump everything to STDOUT;

The default behaviour is to ignore file names and dump everything to STDOUT. The problem is that a section name may contain a relative or absolute path, so a page could attempt to overwrite system files via

{ { {#!literate name='c:\autoexec.bat'
... something sinister

This way harm can be done during the source extraction stage, which is less expected. Some kind of checking mechanism needs to be implemented.
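
One possible shape for such a check is sketched below. It is an assumption, not something the script does today: is_safe_name is a hypothetical helper, and its rule (plain file names only, no directories at all) is deliberately stricter than strictly necessary.

```perl
use strict;
use warnings;

# Hypothetical check, not part of the script: accept a section name as an
# output file name only if it is a plain file name in the current directory.
sub is_safe_name {
    my ($name) = @_;
    return 0 if $name eq '';
    return 0 if $name =~ m{[/\\:]};            # no directory separators or drive letters
    return 0 if $name =~ /^\.+$/;              # not "." or ".."
    return 0 if $name =~ /[\x00-\x1f<>|"*?]/;  # no control or wildcard characters
    return 1;
}

for my $name ('jtangle.ijs', 'c:\autoexec.bat', '../etc/passwd') {
    printf "%-16s %s\n", $name, is_safe_name($name) ? 'ok' : 'rejected';
}
```

With a helper like this, the -f branch could skip (with a warning) any section whose name is rejected, instead of opening it for writing.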

«perloptions»=
  $opt_s, # extract only section with given name

If this option is not specified, the script extracts the sections that have '.' in their names, assuming those are source files. If only one file or one specific section is needed, the name of that section may be specified.

«perloptions»=
  $opt_l, # list section names and their relationships

«perloptions»=
  $opt_q, # quiet mode, do not show any warnings

«perloptions»=
);
getopts("fs:lq");
die "-f and -l are mutually exclusive" if $opt_f && $opt_l;

Instead of die we could have quietly turned -f off when -l is on, but it seems better not to guess the user's intentions.
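
As an illustration of how the option string "fs:lq" is read (s takes a value, the others are plain switches), here is a small sketch. It parses a made-up command line into a hash instead of the $opt_* globals. parse_args is a hypothetical helper.

```perl
use strict;
use warnings;
use Getopt::Std;

# Parse a sample command line with the same option string as the script.
# A hash reference is used here instead of the $opt_* globals.
sub parse_args {
    local @ARGV = @_;
    my %opt;
    getopts('fs:lq', \%opt) or die "bad options\n";
    return (\%opt, [@ARGV]);   # parsed options and the remaining file names
}

my ($opt, $rest) = parse_args(qw(-q -s jtangle.ijs z));
print "section: $opt->{s}\n";
print "quiet:   ", ($opt->{q} ? 'yes' : 'no'), "\n";
print "files:   @$rest\n";
```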

Scan entire file

«perlvar»=
our %piece;
our $section;

For each line of the file, grab the lines into the global hash %piece, which holds the named sections as arrays of strings. The current section name is in the global $section.

«perl»=
my $CLOSE='}' x 3; # kludge to work around current literate parsing
while(<>) {
  my $n;
  if( $n=/^\s*\{\{\{#!literate.*name='([^']*)'/ .. /^\s*$CLOSE\s*$/ ) {
    if( 1==$n ) {
      $section=$1;
      $piece{$section}=[] unless exists $piece{$section};
    }

In scalar context Perl's .. (range operator) returns a sequence number, with 'E0' appended when the line matches the final expression. This does not change the number's numeric value but gives something to look for when testing for the final expression.

«perl»=
    elsif( 'E0' ne substr($n,-2,2) ) {
      push @{$piece{$section}},$_;
    }
  }
}
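
The behaviour of the range operator that this loop relies on can be seen in isolation with a small sketch. The BEGIN and END markers are made up for the example and stand in for the opening and closing lines of a literate block.

```perl
use strict;
use warnings;

# Collect the value of the range operator for each line of a small sample.
my @lines = ("before\n", "BEGIN\n", "body 1\n", "body 2\n", "END\n", "after\n");
my @seen;
for (@lines) {
    my $n = /^BEGIN$/ .. /^END$/;   # scalar context: flip-flop, not a list range
    push @seen, $n;                 # '' outside the range, then 1, 2, 3, '4E0'
    printf "%-6s %s", "[$n]", $_;
}
```

The first line of the range has the value 1 and the last one ends in E0, which is exactly what the script tests for in order to skip the opening and closing lines of a block.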

Select and unwrap top-level section

Scan through the named sections and recursively unwrap those that contain '.' in their names (or, with -s, only the section named by that option).

«perlvar»=
our $PREFIX;

The global $PREFIX contains the string to prepend to indented sections (for now it can only be whitespace; TODO(?): comments).

«perlvar»=
our @STACK;

The global @STACK contains the list of pending sections, used to detect self-references.

«perl»=
close STDOUT if $opt_f;
for my$s(keys %piece) {
  if( $s eq $opt_s || ('' eq $opt_s && 0<=index($s,'.'))  ) {
    $PREFIX='';
    @STACK=();
    if( $opt_f ) {
      warn "Write section to $s\n" unless $opt_q;
      open STDOUT, '>', $s or die "Cannot write $s: $!\n";
    }
    unwrap($s);
    close STDOUT if $opt_f;
  }
}

Procedure that recursively unwraps sections

«perl»=
sub unwrap($)
{
  my $s=shift;
  if( !exists $piece{$s} ) {
    warn "Section $s is referenced but not defined. Nothing is substituted.\n" unless $opt_q;
    return;
  }

If the name of the current section is already in @STACK, the substitution would never finish. Give a warning and ignore this occurrence of the section.

«perl»=
  for my$e(@STACK) {
    if( $s eq $e ) {
      warn "Recursion detected: $s" unless $opt_q;
      return;
    }
  }

For each line of the section, either output it (with $PREFIX prepended) or, if it is a section reference, recursively unwrap it. For now there can be only one section reference per line, and nothing but whitespace is allowed around it.

«perlvar»=
our %unwrapped;

The hash %unwrapped keeps track of which sections were used and how many times. Currently a section may be used more than once. Maybe this should be flagged as a mistake.

«perl»=
  $unwrapped{$s}++;
  if( $opt_l ) {
    print "",("  " x @STACK),("@" x (1<$unwrapped{$s})),$s,"\n";
    return if $unwrapped{$s}>1;
  }
  push @STACK,$s;
  for my$l(@{$piece{$s}}) {
    if( $l=~/^(\s*)«(.*)»\s*$/ ) {
      my $p=$PREFIX;
      $PREFIX=$p.$1;
      unwrap($2);
      $PREFIX=$p;
    } else {
      print "",$PREFIX,$l unless $opt_l;
    }
  }
  pop @STACK;
}
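
To show the effect of the prefix handling on a tiny input, here is a simplified, self-contained variant of the procedure. It is a sketch under assumptions, not the script itself: expand is a hypothetical function that takes the sections as a hash reference and returns the text instead of printing it, and the reference markers are written as \x{ab} and \x{bb} escapes so that the example does not depend on the encoding of its own source.

```perl
use strict;
use warnings;

# Simplified variant of unwrap: returns the expanded text of section $name.
# \x{ab} and \x{bb} are the opening and closing reference markers.
sub expand {
    my ($pieces, $name, $prefix, @stack) = @_;
    return '' unless exists $pieces->{$name};    # undefined section: nothing substituted
    return '' if grep { $_ eq $name } @stack;    # circular reference: ignored
    my $out = '';
    for my $line (@{ $pieces->{$name} }) {
        if ($line =~ /^(\s*)\x{ab}(.*)\x{bb}\s*$/) {
            # the indentation of the reference is added to every substituted line
            $out .= expand($pieces, $2, $prefix . $1, @stack, $name);
        } else {
            $out .= $prefix . $line;
        }
    }
    return $out;
}

my %pieces = (
    'demo.pl' => ["sub f {\n", "  \x{ab}body\x{bb}\n", "}\n"],
    'body'    => ["my \$x = 1;\n", "return \$x;\n"],
);
print expand(\%pieces, 'demo.pl', '');
```

Both lines of the body section come out indented by the two spaces that precede the reference in demo.pl.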

Warn about unused sections

At the end, check whether any of the named sections were not used by unwrap and give a warning.

«perl»=
for my$s(keys %piece) {
  if( $opt_l ) {
    print "-$s\n" if !exists $unwrapped{$s};
  } else {
    if( !exists $unwrapped{$s} ) {
      warn "Section $s is defined but never used\n" unless $opt_q;
    } elsif( 1<$unwrapped{$s} ) {
      warn "Section $s is used more than once\n" unless $opt_q;
    }
  }
}

Obtaining literate source

For some reason

wget -U "Mozilla" -O- http://www.jsoftware.com/jwiki/AndrewNikitin/Jtangle?action=raw | perl jtangle.pl -s jtangle.ijs >z.ijs

garbles the line endings. Other than that, this command is a reasonable way to obtain the latest source.

Alternatively, the raw source may be downloaded into a local file and converted separately

wget -U "Mozilla" -Oz http://www.jsoftware.com/jwiki/AndrewNikitin/Jtangle?action=raw
perl jtangle.pl -s jtangle.ijs z >z.ijs

I keep copies of my own LPs on a local hard drive. When I want to make a change, I download the raw source from the Wiki (using a variant of the above command), compare, incorporate the changes and save the LP back to the Wiki. This way the Wiki acts as a kind of version control system.

jadeful hack

Performing a manual step of extracting the code portion from the literate source before execution may be just enough bother to discourage its use altogether.

The suggested hack replaces the standard load utility in system\extras\util\jadefull.ijs and/or system\extras\util\jadecon.ijs to perform this extraction step automatically when needed.

The hack recognizes files with the .lit extension as needing special treatment. It extracts the first .ijs section (or just the first section) from the file and runs it from a noun using the 0!:100 or 0!:101 foreigns. Note that script and scriptd do more than just 0!:0 and 0!:1, but this simplistic approach should work for now.

«jadeful hack»=
load_z_=: 3 : 0
0 load y
:
fls=. getscripts_j_ y
fn=. ('script',x#'d')~
for_fl. fls do.
  if. DISPLAYLOAD_j_ do. smoutput > fl end.
  if. '.lit' -: _4 {. > fl do.
    NB. special treatment for .lit files
    NB. modify location of jtangle.ijs if different
    require '~nsg/literate/jtangle.ijs'
    0!:(100+x) readlit_jtangle_ fl
  else.
    fn fl
  end.
  LOADED_j_=: ~. LOADED_j_,fl
end.
empty''
)

Contributed by AndrewNikitin

Discussion

(Notwithstanding that it's work in progress.) The Literate/Wiki Tool is implemented as a MoinMoin plugin. It has, naturally, its own "tangle", which is also naturally implemented in Python. It's not a question of whose choice of implementation language is better, but a practical one: wouldn't having an alternative Perl implementation be duplicating the effort? The Literate Wiki Tool will be evolving, and it only makes sense to have the same code base for tangle, which will be used in both places: the stand-alone script and the Wiki plugin. -- OlegKobchenko 2007-03-20 19:01:55

I need the Perl script for my internal process anyway. Besides, I do not have Python installed on any of my machines and will not have in the foreseeable future, and "duplicating effort" on a one-page script does not seem like such a waste to me. BTW, if you post your Python parser, preferably in literate form, I will try to ensure that the Perl and J implementations match it as closely as possible. -- AndrewNikitin 2007-03-20 19:10:13

It is published where it should be: in the parser market of MoinMoin. Making it literate is a good idea. I don't know how complicated the parser installation process is at the MoinMoin web site, but having some experience and a few rounds of improvements here at J Wiki will help convince them. -- OlegKobchenko 2007-03-20 19:47:40


CategoryLiterate CategoryWorkInProgress

AndrewNikitin/Jtangle (last edited 2010-05-27 19:34:31 by AndrewNikitin)