Errors in script running with Scsh 0.7 vs. 0.6

Discussion:

Richard Loveland

2012-11-05 03:31:22 UTC

Greetings scsh cognoscenti,

I've got a script throwing errors under Scsh 0.7 that runs under 0.6.
It's a word counting script that uses the `rx' package (code is below).

* Shell output

Here are the outputs from issuing the script command at the shell. The
first is for 0.7, the second for 0.6. The file is a plaintext Emacs org
file.

wc-0.7.scm < jelec-02.org
assertion-violation [ascii->char] with no handler in place: not an ASCII
code195
stack template id's: 3016 <- 1720 <- 3015 <- 914 <- <- <- <- <- <-
<- 1720 <- <- 1720 <- 6624 <- 3309 <-

wc-0.6.scm < jelec-02.org
25337

* Source of the script

...is as follows (also attached in case it gets garbled -- need to set
up a plaintext email program, apologies!):

#!/usr/local/bin/scsh \
-e main -s
!#

(define wc-rx
  (rx (:                      ; begin a matching sequence
       (* whitespace)         ; beginning with zero+ spaces,
       (+ alphanumeric)       ; match 1 or more [0-9a-zA-Z]
       (?                     ; then, optionally match:
        (or #\' #\`)          ; - apostrophe or backtick
        (* alphanumeric))     ; - 0 or more [0-9a-zA-Z]
       (* whitespace))))      ; finally, 0 or more spaces end

(define (main prog+args)
  (display
   (awk (read-line) (line) ((words 0))
     (#t (+ words
            (length
             (regexp-fold-right wc-rx
                                (lambda (m i lis)
                                  (cons (match:substring m 0) lis))
                                '() line))))))
  (newline))

* Checking `rx' sources

Finally, grepping the rx package's sources gives this output:

grep "ascii->char" *

parse.scm: ((0) (values (cons (ascii->char from) loose)
parse.scm: ((1) (values `(,(ascii->char from)
parse.scm: ,(ascii->char to)
parse.scm: ((2) (values `(,(ascii->char from)
parse.scm: ,(ascii->char (+ from 1))
parse.scm: ,(ascii->char to)
parse.scm: `((,(ascii->char from) .
parse.scm: ,(ascii->char to))
parse.scm: (if (char-set-contains? cset (ascii->char i))
posixstr.scm:(define *nul* (ascii->char 0))
posixstr.scm: ((> start end) (values (list (ascii->char c)) '())) ; Empty range
posixstr.scm: (values (list (ascii->char c) (ascii->char start))
posixstr.scm: (values (list (ascii->char c) (ascii->char start) (ascii->char end))
posixstr.scm: (else (values (list (ascii->char c))
posixstr.scm: (list (cons (ascii->char start) (ascii->char end)))))))
rx-lib.scm: (cs cset (char-set-adjoin! cs (ascii->char i))))
scsh-read.scm:; Ascii stuff: char->ascii, ascii->char, ascii-whitespaces, ascii-limit
scsh-read.scm:(define bel (ascii->char 7))
scsh-read.scm:(define bs (ascii->char 8))
scsh-read.scm:(define ff (ascii->char 12))
scsh-read.scm:(define cr (ascii->char 13))
scsh-read.scm:(define ht (ascii->char 9))
scsh-read.scm:(define vt (ascii->char 11))
scsh-read.scm: (ascii->char (+ (* 64 d1) (+ (* 8 d2) d3)))))
scsh-read.scm: (ascii->char (+ (* 16 d1) d2))))
scsh-read.scm: (string-set! p-c-v i (p-c (ascii->char i)))))
spencer.scm: (cset cset (char-set-adjoin! cset (ascii->char j))))

My layman's reading of the above is that, under 0.7, I should not do any
regexp matching on files that contain non-ASCII characters. Is that
correct? If so, it's not clear to me why the same script works under 0.6.
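
A note on the number in the error: 195 is 0xC3, which is the first byte of
the UTF-8 encoding of many accented Latin letters (é, for instance, is the
two bytes 195 169), so the input quite possibly contains such a character.
The shell sketch below builds a tiny test file of that kind and counts its
non-ASCII bytes. The file name sample-utf8.org and its contents are made up
for illustration; they are not taken from jelec-02.org.

```shell
#!/bin/sh
# Hypothetical sample file; name and contents are illustrative only.
sample=sample-utf8.org

# "café au lait" in UTF-8: the é is the two bytes 0xC3 0xA9 (195 169).
printf 'caf\303\251 au lait\n' > "$sample"

# Dump every byte as an unsigned decimal, squeezed onto one line.
bytes=$(od -An -tu1 "$sample" | tr -s ' \n' ' ')
echo "bytes:$bytes"

# Delete all ASCII bytes (0-127) and count what is left.
nonascii=$(LC_ALL=C tr -d '\000-\177' < "$sample" | wc -c | tr -d ' ')
echo "non-ASCII bytes: $nonascii"
```

Feeding such a file to the script (for example, wc-0.7.scm < sample-utf8.org)
would be one way to check whether a single non-ASCII character is enough to
trigger the assertion.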

Any help from a kindly Scsh wizard would be much appreciated.

Best,
Rich

Roderic Morris

2012-11-05 13:47:55 UTC

Should be easy to fix. Could you send a small example it breaks on?

-Roderic

Roderic Morris

2012-11-05 14:22:19 UTC

Never mind, got one. It should be fixed now.

-Roderic

Richard Loveland

2012-11-06 21:12:29 UTC

Awesome, thanks!
