Richard Loveland
2012-11-05 03:31:22 UTC
Greetings scsh cognoscenti,
I've got a script throwing errors under Scsh 0.7 that runs under 0.6.
It's a word counting script that uses the `rx' package (code is below).
* Shell output
Here are the outputs from issuing the script command at the shell. The
first is for 0.7, the second for 0.6. The file is a plaintext Emacs org
file.
wc-0.7.scm < jelec-02.org
assertion-violation [ascii->char] with no handler in place: not an ASCII
code195
stack template id's: 3016 <- 1720 <- 3015 <- 914 <- <- <- <- <- <-
<- 1720 <- <- 1720 <- 6624 <- 3309 <-
wc-0.6.scm < jelec-02.org
25337
* Source of the script
...is as follows (also attached in case it gets garbled -- need to set
up a plaintext email program, apologies!):
#!/usr/local/bin/scsh \
-e main -s
!#
(define wc-rx
  (rx (:                     ; begin a matching sequence
       (* whitespace)        ; beginning with zero+ spaces,
       (+ alphanumeric)      ; match 1 or more [0-9a-zA-Z]
       (?                    ; then, optionally match:
        (or #\' #\`)         ; - apostrophe or backtick
        (* alphanumeric))    ; - 0 or more [0-9a-zA-Z]
       (* whitespace))))     ; finally, 0 or more spaces end

(define (main prog+args)
  (display
   (awk (read-line) (line) ((words 0))
     (#t (+ words
            (length
             (regexp-fold-right wc-rx
                                (lambda (m i lis)
                                  (cons (match:substring m 0) lis))
                                '() line))))))
  (newline))
* Checking `rx' sources
Finally, grepping the rx package's sources provides this output:
grep "ascii->char" *
parse.scm: ((0) (values (cons (ascii->char from) loose)
parse.scm: ((1) (values `(,(ascii->char from)
parse.scm: ,(ascii->char to)
parse.scm: ((2) (values `(,(ascii->char from)
parse.scm: ,(ascii->char (+ from 1))
parse.scm: ,(ascii->char to)
parse.scm: `((,(ascii->char from) .
parse.scm: ,(ascii->char to))
parse.scm: (if (char-set-contains? cset (ascii->char i))
posixstr.scm:(define *nul* (ascii->char 0))
posixstr.scm: ((> start end) (values (list (ascii->char c)) '())) ; Empty range
posixstr.scm: (values (list (ascii->char c) (ascii->char start))
posixstr.scm: (values (list (ascii->char c) (ascii->char start) (ascii->char end))
posixstr.scm: (else (values (list (ascii->char c))
posixstr.scm: (list (cons (ascii->char start) (ascii->char end)))))))
rx-lib.scm: (cs cset (char-set-adjoin! cs (ascii->char i))))
scsh-read.scm:; Ascii stuff: char->ascii, ascii->char, ascii-whitespaces, ascii-limit
scsh-read.scm:(define bel (ascii->char 7))
scsh-read.scm:(define bs (ascii->char 8))
scsh-read.scm:(define ff (ascii->char 12))
scsh-read.scm:(define cr (ascii->char 13))
scsh-read.scm:(define ht (ascii->char 9))
scsh-read.scm:(define vt (ascii->char 11))
scsh-read.scm: (ascii->char (+ (* 64 d1) (+ (* 8 d2) d3)))))
scsh-read.scm: (ascii->char (+ (* 16 d1) d2))))
scsh-read.scm: (string-set! p-c-v i (p-c (ascii->char i)))))
spencer.scm: (cset cset (char-set-adjoin! cset (ascii->char j))))
My layman's reading of the above is that, with 0.7, I should not do any
regexp matching on files that contain non-ASCII characters. Is that
correct? If so, it's still not clear to me why 0.6 works.
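For what it's worth, 195 is 0xC3, which is the first byte of many two-byte UTF-8 sequences, so the error is at least consistent with an accented character somewhere in the file. That assumes the org file is UTF-8 encoded, which is a guess. Below is a rough shell sketch for checking this, assuming a POSIX shell with `od`, `tr`, `sed`, and `wc`; the helper names `bytes_of` and `count_non_ascii` are made up for illustration.

```shell
# Hypothetical helper: print the decimal value of every byte read from stdin.
bytes_of() {
    od -An -tu1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Hypothetical helper: count the bytes on stdin that fall outside 0-127.
count_non_ascii() {
    LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

# U+00E9 (e with acute accent), written as its two UTF-8 bytes in octal.
printf '\303\251' | bytes_of          # prints: 195 169

# A line containing that one accented character holds two non-ASCII bytes.
printf 'caf\303\251\n' | count_non_ascii   # prints: 2
```

Running `count_non_ascii < jelec-02.org` should then show whether the input has any bytes that `ascii->char` would reject; a result of 0 would mean the problem lies elsewhere.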
Any help from a kindly Scsh wizard would be much appreciated.
Best,
Rich