List mode was constrained to the BMP. This change introduces
the following new list mode convention, using Go string literal syntax:
Non-printing ASCII characters display as \xhh.
Non-ASCII characters in the BMP display as \uhhhh.
Characters beyond the BMP display as \Uhhhhhhhh.
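A minimal C sketch of the display rule above. It assumes that printable ASCII (0x20 to 0x7E) is shown as itself and that hex digits are lowercase, as in Go's own escapes. The function name listrune is hypothetical and is not taken from the program this change modifies.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: write the list-mode display form of rune r into buf.
   Assumes printable ASCII shows as itself and lowercase hex digits. */
void
listrune(uint32_t r, char *buf, size_t n)
{
	if(r < 0x80){
		if(r >= 0x20 && r < 0x7F)
			snprintf(buf, n, "%c", (int)r);			/* printable ASCII */
		else
			snprintf(buf, n, "\\x%02x", (unsigned)r);	/* non-printing ASCII */
	}else if(r <= 0xFFFF)
		snprintf(buf, n, "\\u%04x", (unsigned)r);		/* rest of the BMP */
	else
		snprintf(buf, n, "\\U%08x", (unsigned)r);		/* beyond the BMP */
}
```

For example, a tab displays as \x09, é as \u00e9, and U+1F602 as \U0001f602.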
Runes in Plan 9 were limited to the 16-bit BMP when I drew up
the RPC protocol between graphical programs and devdraw
a long time ago. Now that they can be 32-bit, use a 32-bit wire
encoding too. Use a new message number to avoid problems with
other clients (like 9fans.net/go).
Add the keyboard shortcut alt : , for U+1F602 (face with tears of joy)
to test that it all works.
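To show why the wire encoding had to widen, here is a hypothetical pair of put/get helpers in C. The helper names and the big-endian byte order are assumptions of this sketch, not a description of devdraw's actual message layout.

```c
#include <stdint.h>

/* Hypothetical helpers contrasting 16-bit and 32-bit rune encodings.
   Big-endian byte order is an assumption of this sketch. */
static void
put16(unsigned char *p, uint32_t r)
{
	p[0] = (r >> 8) & 0xFF;		/* bits above 15 are silently lost */
	p[1] = r & 0xFF;
}

static uint32_t
get16(const unsigned char *p)
{
	return (uint32_t)p[0] << 8 | p[1];
}

static void
put32(unsigned char *p, uint32_t r)
{
	p[0] = (r >> 24) & 0xFF;
	p[1] = (r >> 16) & 0xFF;
	p[2] = (r >> 8) & 0xFF;
	p[3] = r & 0xFF;
}

static uint32_t
get32(const unsigned char *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
		(uint32_t)p[2] << 8 | p[3];
}
```

U+1F602 survives the 4-byte round trip, but comes back from the 2-byte form as U+F602, a private-use code point.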
ASAN can't deal with the coroutine stacks.
In theory we can call into the ASAN runtime to let it know about them,
but ASAN still has problems with fork or exit happening from a
non-system stack. Bypass all possible problems by just having
a full OS thread for each libthread thread. The threads are still
cooperatively scheduled within a proc (in this mode, a group of OS threads).
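A rough C sketch of the idea: every thread is a real pthread, but a mutex-protected baton means only one runs at a time, and control moves only at explicit yields. This illustrates the technique under those assumptions and is not libthread's implementation; names such as runbaton are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical sketch: one OS thread per "thread", cooperatively scheduled
   by passing a baton (turn) in round-robin order. */
enum { NTHREAD = 3, NSTEP = 2 };

static pthread_mutex_t lk = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
static int turn;			/* thread currently holding the baton */
static char trace[NTHREAD*NSTEP + 1];	/* order in which steps ran */
static int ntrace;

/* Hand the baton to the next thread. Caller holds lk. */
static void
passbaton(int id)
{
	turn = (id + 1) % NTHREAD;
	pthread_cond_broadcast(&cv);
}

/* Sleep until thread id holds the baton. Caller holds lk. */
static void
waitbaton(int id)
{
	while(turn != id)
		pthread_cond_wait(&cv, &lk);
}

static void*
threadmain(void *arg)
{
	int id = (int)(intptr_t)arg;

	pthread_mutex_lock(&lk);
	waitbaton(id);
	for(int i = 0; i < NSTEP; i++){
		trace[ntrace++] = 'a' + id;	/* one unit of work */
		if(i < NSTEP-1){
			passbaton(id);		/* cooperative yield */
			waitbaton(id);
		}
	}
	passbaton(id);			/* finished: let the next thread run */
	pthread_mutex_unlock(&lk);
	return NULL;
}

/* Run all threads to completion and return the trace. Call once. */
const char*
runbaton(void)
{
	pthread_t t[NTHREAD];

	for(int i = 0; i < NTHREAD; i++)
		pthread_create(&t[i], NULL, threadmain, (void*)(intptr_t)i);
	for(int i = 0; i < NTHREAD; i++)
		pthread_join(t[i], NULL);
	trace[ntrace] = '\0';
	return trace;
}
```

The trace is always "abcabc" no matter how the OS schedules the pthreads, because only the baton holder runs. Each thread has an ordinary OS stack that ASAN and debuggers understand; the cost is one kernel thread per libthread thread.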
Setting the environment variable LIBTHREAD=pthreadperthread
will enable the pthreadperthread mode, as will building with
CC9FLAGS='-fsanitize=address' in $PLAN9/config.
This solution is much more general than ASAN. For example, if
you are trying to find all the thread stacks in a reproducible crash,
you can use pthreadperthread mode with any debugger that
knows only about OS threads.
If mk gets into a bad state, it's not obvious that you can
remove the mk binary to force a rebuild. Also, not rebuilding
means that bugs in mkmk.sh are not noticed.
Just rebuild from scratch every time. It doesn't take too long
compared to the rest of INSTALL.
Also, if CC9FLAGS includes -fsanitize=address (ASAN),
predefine PLAN9PORT_ASAN for use by programs that need
to know (mainly libthread).
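One way a library could combine the two switches, sketched in C. PLAN9PORT_ASAN and LIBTHREAD=pthreadperthread come from the text above. The function name and the exact-match comparison are assumptions, and libthread's real check may differ.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: decide whether to run one OS thread per thread. */
int
wantpthreadperthread(void)
{
#ifdef PLAN9PORT_ASAN
	/* Predefined by 9c when CC9FLAGS has -fsanitize=address:
	   ASAN cannot cope with coroutine stacks, so always use pthreads. */
	return 1;
#else
	/* Otherwise the user opts in through the environment. */
	const char *s = getenv("LIBTHREAD");
	return s != NULL && strcmp(s, "pthreadperthread") == 0;
#endif
}
```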
The 9c script used to have a variable called ngflags, which
was ccflags except -g (ng stood for "no g"), but nothing needs
it split out anymore, so simplify to just ccflags.
getdirentries(2) has been deprecated on macOS since 10.5 (ten releases ago).
Using it requires disabling 64-bit inodes, but that in turn makes binaries
incompatible with some dynamic libraries, most notably ASAN.
At some point getdirentries(2) will actually be removed.
For both these reasons, switch to opendir/readdir.
This is a little clunky, since we have to keep the DIR* hidden away
to preserve the int fd interfaces, but it lets us remove a bunch
of OS-specific code too.
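A C sketch of keeping a DIR* hidden behind an int fd. The table, its size limit, and the names fdreaddirname and fdclosedir are hypothetical; dup, fdopendir, readdir, and closedir are standard POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical sketch: callers keep passing an int fd, and the DIR*
   that readdir needs stays hidden in a table indexed by that fd. */
enum { MAXFD = 256 };
static DIR *dirs[MAXFD];

/* Copy the next entry name of directory fd into buf (nbuf > 0),
   truncating if needed. Returns 1 for an entry, 0 at end, -1 on error. */
int
fdreaddirname(int fd, char *buf, size_t nbuf)
{
	struct dirent *de;
	size_t len;

	if(fd < 0 || fd >= MAXFD || nbuf == 0)
		return -1;
	if(dirs[fd] == NULL){
		/* fdopendir takes over the descriptor it is given,
		   so hand it a duplicate and leave the caller's fd alone. */
		int nfd = dup(fd);
		if(nfd < 0)
			return -1;
		if((dirs[fd] = fdopendir(nfd)) == NULL){
			close(nfd);
			return -1;
		}
	}
	errno = 0;
	if((de = readdir(dirs[fd])) == NULL)
		return errno != 0 ? -1 : 0;	/* errno still 0 means end of directory */
	len = strlen(de->d_name);
	if(len >= nbuf)
		len = nbuf - 1;
	memcpy(buf, de->d_name, len);
	buf[len] = '\0';
	return 1;
}

/* Free the hidden DIR* and its duplicate fd; call before closing fd. */
void
fdclosedir(int fd)
{
	if(fd >= 0 && fd < MAXFD && dirs[fd] != NULL){
		closedir(dirs[fd]);
		dirs[fd] = NULL;
	}
}
```

Whatever wrapper closes the caller's fd should call fdclosedir first. Otherwise a later open that reuses the same fd number would find a stale DIR* in the table.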
This fixes at least one shell script (printfont) that expected
	'x'`{y}'z'
to mean
	'x'^`{y}^'z'
as it now does. Before, it meant:
	'x'^`{y} 'z'
One surprise is that adjacent lists get a free caret:
	(x y z)(1 2 3)
is
	(x1 y2 z3)
This doesn't affect any rc script in Plan 9 or plan9port.
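rc's list concatenation rule can be sketched in C as follows, based on the rc(1) description. Equal-length lists join pairwise, a one-word list is distributed over a non-empty other list, and an empty operand or any other length mismatch is an error. The List type and concat are illustrative and are not rc's own code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of rc's ^ on two word lists. */
typedef struct List List;
struct List {
	int	n;	/* number of words */
	char	**w;	/* the words */
};

/* Return a new malloc'd string holding a followed by b, or NULL. */
static char*
join(const char *a, const char *b)
{
	size_t na = strlen(a), nb = strlen(b);
	char *s = malloc(na + nb + 1);

	if(s == NULL)
		return NULL;
	memcpy(s, a, na);
	memcpy(s + na, b, nb + 1);	/* also copies b's terminating NUL */
	return s;
}

/* Concatenate a and b into *out. Returns 0 on success, -1 if either
   list is empty, the lengths mismatch, or memory runs out. */
int
concat(List a, List b, List *out)
{
	int i, n;

	if(a.n == 0 || b.n == 0)
		return -1;			/* empty operand */
	if(a.n != b.n && a.n != 1 && b.n != 1)
		return -1;			/* mismatched lengths */
	n = a.n > b.n ? a.n : b.n;
	out->w = malloc(n * sizeof out->w[0]);
	if(out->w == NULL)
		return -1;
	for(i = 0; i < n; i++){
		/* a one-word operand is reused at every position */
		const char *l = a.w[a.n == 1 ? 0 : i];
		const char *r = b.w[b.n == 1 ? 0 : i];
		if((out->w[i] = join(l, r)) == NULL){
			while(i-- > 0)
				free(out->w[i]);
			free(out->w);
			return -1;
		}
	}
	out->n = n;
	return 0;
}
```

With the free caret, (x y z)(1 2 3) corresponds to concat on {x, y, z} and {1, 2, 3}, giving x1 y2 z3; a(1 2 3) would distribute to a1 a2 a3.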
The old yacc-based parser is available with the -Y flag,
which will probably be removed at some point.
The new -D flag dumps a parse tree of the input,
without executing it. This allows comparing the output
of rc -D and rc -DY on different scripts to see that the
two parsers behave the same.
The rc paper ends by saying:
	It is remarkable that in the four most recent editions of the UNIX
	system programmer’s manual the Bourne shell grammar described in the
	manual page does not admit the command who|wc. This is surely an
	oversight, but it suggests something darker: nobody really knows what
	the Bourne shell’s grammar is. Even examination of the source code is
	little help. The parser is implemented by recursive descent, but the
	routines corresponding to the syntactic categories all have a flag
	argument that subtly changes their operation depending on the context.
	Rc’s parser is implemented using yacc, so I can say precisely what the
	grammar is.
The new recursive descent parser here has no such flags.
It is a straightforward translation of the yacc.
The new parser will make it easier to handle free carets
in more generality, as well as potentially allow the use of
unquoted = as a word character.
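To make "one function per syntactic category, no flags" concrete, here is a hypothetical toy recursive descent parser in C for a tiny pipeline grammar, far smaller than rc's. Each nonterminal is a function with no mode argument, and the left recursion of the yacc form becomes a loop.

```c
/* Hypothetical toy grammar, in yacc style:
	pipeline: simple | pipeline '|' simple
	simple:   word   | simple word
   Each nonterminal is one function with no context flag. */
static const char *p;	/* input cursor */

static void
skipspace(void)
{
	while(*p == ' ' || *p == '\t')
		p++;
}

/* word: a run of characters other than blank, '|', and NUL.
   Returns 1 if a word was consumed. */
static int
word(void)
{
	const char *s;

	skipspace();
	s = p;
	while(*p != '\0' && *p != ' ' && *p != '\t' && *p != '|')
		p++;
	return p > s;
}

/* simple: one or more words. Returns 1 on success. */
static int
simple(void)
{
	if(!word())
		return 0;
	while(word())
		;
	return 1;
}

/* pipeline: simple ('|' simple)*. Returns the number of commands
   in s, or -1 on a syntax error. */
int
pipeline(const char *s)
{
	int n;

	p = s;
	if(!simple())
		return -1;
	for(n = 1; ; n++){
		skipspace();
		if(*p != '|')
			break;
		p++;
		if(!simple())
			return -1;
	}
	skipspace();
	return *p == '\0' ? n : -1;
}
```

In this toy, pipeline("who|wc") returns 2, so the example from the rc paper parses as a two-command pipeline.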
Going through this exercise has highlighted a few
dark corners here as well. For example, I was surprised to
find that
	x >f | y
	>f x | y
are different commands (the latter redirects y's output).
It is similarly surprising that
	a=b x | y
sets a during the execution of y.
It is also a bit counter-intuitive that
	x | y | z
	x | if(c) y | z
are not both 3-phase pipelines.
These are certainly not things we should change, but they
are not entirely obvious from the man page description,
undercutting the quoted claim a bit.
On the other hand, who | wc is clearly accepted by the grammar
in the manual page, and the new parser still handles that test case.
Version 10 of gcc defaults to -fno-common, which breaks a lot of things.
This fix reverts to the pre-10 behaviour. The real fix is to clean up
stray redefinitions that should be declarations.
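A single-file C sketch of that real fix, using hypothetical names. The pre-10 behaviour corresponds to GCC's -fcommon. Under the newer -fno-common default, a header that says `int counter;` defines the variable in every file that includes it, and the link fails with a multiple-definition error.

```c
/* Hypothetical illustration; in real code the two halves are separate files. */

/* --- counter.h: included by every .c file that uses counter ---
   Writing "int counter;" here would be a tentative definition in each
   including file. -fcommon merged those; -fno-common rejects them at link time. */
extern int counter;	/* declaration only */

/* --- counter.c: exactly one file provides the definition --- */
int counter;		/* the single definition (zero-initialized) */

int
bump(void)
{
	return ++counter;
}
```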