We can move the pcibios code to its own module
and just provide a stub pcibiosinit() function
for pc64 so we do not have to pull in that code
and the data structures.
also lets us clean up mkdevlist hacks.
we should properly zero-pad the ethernet frames we send out
instead of sending random garbage.
for ppp -> ethernet, the ethernet header is fixed, so we
can generate it once before the read()/write() loop.
Use a 1514-byte buffer as the maximum for all ethernet
frames, and limit the mtu to 1514-14-8 -> 1492 (14 bytes
of ethernet header, 8 bytes of pppoe header).
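Roughly, the sending side then looks something like this
(a sketch only, assuming the usual 60-byte minimum
ethernet frame; the constants and names below are not
from the actual pppoe code):

enum {
	Ethhdr	= 14,		/* dst[6] + src[6] + type[2] */
	Pppoehdr= 8,
	Framemax= 1514,
	Mtu	= Framemax - Ethhdr - Pppoehdr,	/* 1492 */
	Framemin= 60,		/* minimum ethernet frame (no crc) */
};

static uchar frame[Framemax];

static void
sessionloop(int pppfd, int etherfd, uchar *dst, uchar *src)
{
	long n, m;

	/* the ethernet header never changes for the session,
	 * so build it once before the read()/write() loop */
	memmove(frame, dst, 6);
	memmove(frame+6, src, 6);
	frame[12] = 0x88;
	frame[13] = 0x64;	/* pppoe session ethertype */

	for(;;){
		n = read(pppfd, frame+Ethhdr+Pppoehdr, Mtu);
		if(n <= 0)
			break;
		/* ...fill in the pppoe header for this payload... */
		m = Ethhdr+Pppoehdr+n;
		if(m < Framemin){
			/* zero-pad short frames instead of sending stale bytes */
			memset(frame+m, 0, Framemin-m);
			m = Framemin;
		}
		write(etherfd, frame, m);
	}
}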
For xhci, we want to keep hubs around until all their
attached devices have been released, so have the Udev
take a reference to its parent hub's ep0.
This also means that we can now use just a pointer
to the tthub instead of duplicating the properties
needed for xhci, and the code becomes trivial.
Do a non-recursive implementation of putep() to
conserve stack space.
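Schematically, the non-recursive putep() becomes a loop
that walks the chain of parent-hub ep0 references instead
of recursing (a sketch; the field names and helpers are
assumptions, not the actual devusb code):

static void
putep(Ep *ep)
{
	Ep *parent;

	while(ep != nil && decref(ep) == 0){
		/* the reference this ep's device holds on its
		 * parent hub's ep0 is dropped in the loop here
		 * rather than by a recursive putep() call */
		parent = nil;
		if(ep->dev != nil)
			parent = ep->dev->hub;
		freeep(ep);	/* hypothetical: free everything but the parent ref */
		ep = parent;
	}
}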
For device detaches, we want to immediately cancel
all I/O so that the driver can release the device
as soon as possible.
For this, we add an epstop callback to the Hci struct.
Implement prefix delegation by requesting a
prefix and populating the ipnet=val entry (val
passed from the -i option).
Before, DHCPv6 was only implemented for stateless
one-shot operation, exiting once we got our
IA address.
Moodies mediacom-enterprise-enterprise-ISP...
... they actually do enterprise-grade dynamic dhcpv6
so here we are, implementing renewals...
The IA options were not parsed properly, assuming option 5
is the first option.
For managed networks, we might not get any prefix info
options, but dhcpv6 needs a gateway, so use the source
address of the RA.
The Routehint is embedded into the Translation struct
at an offset, so setting the Translation *q pointer to
nil results in a non-nil Routehint* pointer being passed
to ipoput4(), causing a crash.
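The pitfall, illustrated (the struct layout here is an
assumption; only the embedding at an offset comes from
the text):

typedef struct Routehint Routehint;
typedef struct Translation Translation;

struct Routehint {
	void	*rt;
};
struct Translation {
	int		proto;
	Routehint	rh;	/* embedded at a non-zero offset */
};

void
demo(void)
{
	Translation *q;
	Routehint *rh;

	q = nil;
	/* &q->rh is nil plus the offset of rh within Translation:
	 * a non-nil, bogus pointer.  passing it to ipoput4()
	 * dereferences it and crashes.  the fix is to check q
	 * before taking the address of the embedded member. */
	rh = q != nil ? &q->rh : nil;
	USED(rh);
}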
When a low/full speed device is connected to a USB2.0 hub,
the USB2.0 hub needs to be sent special split transaction
protocol messages to communicate with the device below.
This also applies if the fullspeed/lowspeed device is not
directly connected to the USB2.0 hub, but has a fullspeed
hub in between, like:
rootport -> usb2.0 hub -> usb1.1 hub -> fs/ls device
In this case, the transaction translator is actually
the first hub, not the direct parent of the device.
This was all totally wrong in the hci drivers.
Also, with the new interface, usbd passes the number of
ports and the TT properties in the "hub" ctl message,
so the port-count, TT Think-Time and Multi-TT properties
can be properly applied by the xhci driver.
Another bug was that the xhci route string was not
correct if a hub has more than 15 ports. A USB2.0
hub can have more than 15 ports and the standard says
that in this case a value of 15 should be used in
its 4-bit route string nibble.
Also, check the hub depth. We should not exceed
5 hubs.
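Putting the two rules together, building the route string
looks roughly like this (a sketch; the device walk and
field names are assumptions, and the low nibble holding
the hub tier closest to the root is my reading of the
spec):

static ulong
routestring(Udev *d)
{
	ulong rs;
	int depth, port;

	rs = 0;
	depth = 0;
	/* walk up towards the root; the root port itself is not
	 * part of the route string (depth < 0 marks the root hub) */
	for(; d->hub != nil && d->hub->depth >= 0; d = d->hub){
		if(++depth > 5)
			return ~0UL;	/* deeper than 5 hubs: reject */
		port = d->port;
		if(port > 15)
			port = 15;	/* hubs with >15 ports use nibble value 15 */
		rs = rs<<4 | port;
	}
	return rs;
}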
For xhci, it turns out the hub parameters were
actually never properly applied, as the spec says
only the first "create endpoint" command applies
the hub parameters. The "evaluate context" command
does not work.
Some pikeshedding in devusb:
- fix the freaking locking.
- remove redundant parameters (isroot -> depth < 0, ishub -> nports > 0)
- add TT properties to usb device struct
With these changes, the weird "middle port" issues
on mnt-reform xhci are gone.
In an upcoming commit, the interface for how to create hubs
and how to update endpoint parameters is going to change.
Device/endpoint properties should not be modified while
the data file is open (device is being used).
This also applies to the control endpoint when changing
the packet size.
The motivation here is to clean up the xhci driver
and not do these stupid hacks like parsing control
messages. It is easier to just have the hci drivers
apply everything at open time and be guaranteed
that properties do not change under them.
For this we need to make sure to only do these devctl's
while the data file is not open.
For hubs, the ctl command and some parameters change:
primarily the number of ports (required for xhci), which
lets devusb do some error checking, and the
USB2.0 -> USB1.1 transaction translator properties.
In usb/lib, the "isroot" property is redundant and is
replaced by depth < 0.
For usb3.0 hub descriptor, the led indicator fields are
different from usb2.0 descriptors.
Rest is pikeshedding.
This error was probably introduced when converting from
libthread to classic plan9 procs. umsrequest() used to
just sysfatal() once the error counter reached some
value. But this leaves 9p procs (created by srvrelease())
around, keeping the device hanging around.
Instead, reply first, then attempt some recovery.
If that fails, kill our notegroup.
Also, for upcoming devusb changes, make sure we do
devctl() while the endpoint is not in use.
Instead of having the driver allocate the temporary
READSTR buffer (and messing up the error handling),
allocate it in devether (netif) and pass the driver
start and end pointers to it.
Also, systematically check that the ifstat()
function handles a zero-length read (meaning
it is supposed to just update the statistics counters
for a stats file read).
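Under the new interface a driver's ifstat then reduces to
something like this (a sketch; the exact prototype and
the counter names are assumptions):

static char*
myifstat(Ether *edev, char *p, char *e)
{
	Ctlr *ctlr;

	ctlr = edev->ctlr;
	readcounters(ctlr);	/* hypothetical: refresh hw counters */
	if(p == e)
		return p;	/* zero-length read: counters only */
	p = seprint(p, e, "rx: %lud\n", ctlr->rx);
	p = seprint(p, e, "tx: %lud\n", ctlr->tx);
	return p;
}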
Drivers were allocating a READSTR-sized buffer,
then readstr()'ing it. But readstr() can raise an
error on a page fault, resulting in the buffer
being leaked.
Instead, we change the interface and allocate
the buffer in the devuart read handler, passing
the driver start and end pointers into it.
Also, provide a default implementation (when
status == nil), avoiding some duplication.
A user can create a large demand-paged segment
and then do a write to a ctl file with a very large
buffer, driving the kernel into an out-of-memory
condition.
For all practical purposes, limit the input buffer size
to something reasonable. READSTR is 8000 bytes, which
would be enough for even the largest ctl messages.
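Schematically, in a ctl write path (a sketch; the choice
of error string and the surrounding shape are
assumptions):

static long
ctlwrite(Chan *c, void *a, long n)
{
	char *buf;

	USED(c);
	if(n > READSTR)
		error(Etoobig);	/* refuse absurdly large ctl writes */
	buf = smalloc(n+1);
	if(waserror()){
		free(buf);
		nexterror();
	}
	memmove(buf, a, n);
	buf[n] = 0;
	/* ...parse and act on the ctl message... */
	poperror();
	free(buf);
	return n;
}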
We have to ensure that we do the putep() loop
only once for detach, so serialize the state
transition using the ep0 qlock().
Furthermore, once the state is Ddetach, we
must ensure never to set it to something else
(such as Dreset or Denabled).
usbid's were globally allocated with a generation counter,
but ids freed out of order were never reused,
eventually resulting in overflow.
instead, we use a different scheme, where we allocate the
next higher id until we run out and then fall back to the
lowest free id.
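The allocation policy, sketched (the helper names and
limits are assumptions; only the policy comes from the
text):

static int
newusbid(Hci *hp)
{
	int id;

	/* hand out the next higher id first... */
	for(id = hp->lastid+1; id < Maxid; id++)
		if(!idinuse(hp, id)){
			hp->lastid = id;
			return id;
		}
	/* ...and only when that runs out, take the lowest free id */
	for(id = 1; id < Maxid; id++)
		if(!idinuse(hp, id)){
			hp->lastid = id;
			return id;
		}
	return -1;	/* no ids left */
}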
properly maintain epmax as well when putep() happens out
of order.
make newdev() and newdevep() return the new Ep* with a
reference taken, preventing someone from freeing the ep
under us.
fix the locking, so once we release the epslock, all endpoints
have the ep->dev set properly and remove impossible checks.
remove the annoying "dump" ctl that spams the console.
The test just called date twice, assuming both calls
execute in the same second. This causes false positives
with the following errors (usually just 1 second
difference):
term% while(){./zones.rc}
/adm/timezone/US_Arizona Sun, 06 Oct 2024 09:09:12 -0700 1728230953 1728230952 are not equal
/adm/timezone/Uruguay Sun, 06 Oct 2024 14:09:17 -0200 1728230958 1728230957 are not equal
/adm/timezone/Japan Mon, 07 Oct 2024 01:09:19 +0900 1728230960 1728230959 are not equal
/adm/timezone/Iran Sun, 06 Oct 2024 19:39:25 +0330 1728230966 1728230965 are not equal
/adm/timezone/Australia_West Mon, 07 Oct 2024 00:09:27 +0800 1728230968 1728230967 are not equal
/adm/timezone/US_Eastern Sun, 06 Oct 2024 12:09:29 -0400 1728230970 1728230969 are not equal
/adm/timezone/GMT Sun, 06 Oct 2024 16:09:31 +0000 1728230972 1728230971 are not equal
/adm/timezone/local Sun, 06 Oct 2024 18:09:34 +0200 1728230975 1728230974 are not equal
/adm/timezone/Mexico_BajaSur Sun, 06 Oct 2024 09:09:36 -0700 1728230977 1728230976 are not equal
The fix is to get the current time once with date -n,
pass that to date to format the time, and
then convert back and compare.
remove the global statistics counters from taslock.c
as they're not particularly useful nor precise
and just cause unnecessary cache traffic.
if we want them back, we should place them into
the Mach structure.
also change the lock() function prototype to return void.
We cannot use lock() from screenputs() because lock()
calls lockloop(), which would try to print(), which on
very slow output (such as qemu) can cause a kernel
stack overflow.
It got triggered by noam with his rube-goldberg qemu setup:
lock 0xffffffff8058bbe0 loop key 0xdeaddead pc 0xffffffff80111114 held by pc 0xffffffff80111114 proc 339
panic: kenter: -40 stack bytes left, up 0xffffffff80bdfd00 ureg 0xffffffff80bddcd8 at pc 0xffffffff80231597
dumpstack
ktrace /kernel/path 0xffffffff80117679 0xffffffff80bddae0 <<EOF
We might want move this locking logic outside of screenputs()
in the future. It is very similar to what iprint() does.
git/save gets a list of paths (added or removed)
passed to it, and we have to ALWAYS stat the
file in the working directory to determine the
effective file-type.
There was a bug in the "skip children paths"
loop that would compare the next path element
instead of the full path prefix including
the next element.
reproducer:
git/init
touch a
git/add a
git/commit -m 'add a' a
rm a
mkdir a
touch a/b
git/add a/b
git/commit -m 'switch to folder' a a/b
For handling route invalidations, we have to allow
short bursts of traffic. Therefore we keep track
of the number of ra's received in the ra interval
and only start dropping packets when reaching 100
packets.
No idea who committed this in 2022, as it's "glenda@9front.local",
but qid.vers is incremented for each write, so we definitely
should not use it as the cache tag.
Also, the initial code was stolen from du.c as the comment says,
and that one does the right thing.
We want to run the tests before we do the installation
into the system.
So do a temporary install into the test/$cputype.git/
directory and bind it on /bin/git; that way,
all the scripts run the local source version.
When skipping objects, we need to process the full queue,
because some of the objects in the queue may have already
been painted with keep. This can cost a small amount of time,
but should not need to advance the frontier by more than
one object, so the additional time should be proportional
to the spread of the graph.
the previous bug wasn't a missing clamp, but a
mishandling of the 1-based closed intervals that
we were generating internally, and some asserts
that assumed open intervals.
Before we would refuse to recurse, but would still give
a response with hints back. Some nefarious clients will interpret the
lack of a Refused response code as us being an open resolver.
When clunking a Fid while the file-system is read
only, don't just free the Amsg, but also drop the
references to dent and mnt.
Make clunkfid() nil fid->rclose, so no reuse
after free is possible.
Make clunkfid() always set the return pointer,
avoiding reliance on prior initialization.
Do not abuse fidtab lock for serializing
clunking.
The clunk should serialize on Fid.Lock
instead, so add a canlock check here.
The lock order is strictly:
Fid.Lock > Conn.fidtab[x].Lock
The AuthRpc was attached to the Fid, but this doesn't
work as it does not handle dupfid() properly.
Instead attach the AuthRpc to the directory entry,
which is refcounted.
The dupfid() function returns the new Fid* struct with
an extra reference. If we don't use it, we have to
putfid() it.
Use ainc()/adec() consistently and don't mix them with
agetl().
Before, we would just leak all the "Conn"
structures.
fshangup() could cause problems as it just
forcefully closes the file descriptors,
not considering that someone else might
write to them afterwards.
Instead, we add a "hangup" flag to Conn,
which readers and writers check before
attempting i/o.
And we only close the file-descriptors
when the last reader/writer drops the
connection. (Make it ref-counted).
For faster teardown, also preserve the
"ctl" file descriptor from listen()
and use it to kill the connection quickly
when fshangup() is called.
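The scheme, sketched (the Conn layout and helper names
are assumptions, not the actual code):

typedef struct Conn Conn;
struct Conn {
	long	ref;	/* one per active reader/writer */
	int	fd;	/* data fd */
	int	cfd;	/* ctl fd preserved from listen() */
	int	hangup;	/* set by fshangup(), checked before i/o */
};

static void
putconn(Conn *c)
{
	if(adec(&c->ref) != 0)
		return;
	/* last reader/writer gone: only now close the fds */
	close(c->fd);
	close(c->cfd);
	free(c);
}

static void
fshangup(Conn *c)
{
	c->hangup = 1;	/* readers and writers stop before i/o */
	hangup(c->cfd);	/* tear the connection down quickly */
}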
When decomposing a rune into its
"button" and "rune" parts, also consider the keyboard
map table with the escaped scancodes.
This fixes Shift + left/right combinations in
drawterm.
For each connection, remember if the authentication
protocol ran successfully and only then allow
attaching as the 'none' user.
This prevents anonymous remote mounts of none.
The 'none' user also shouldn't attach to the dump
file system.
The Tauth for "none" should always fail,
but Tattach should only succeed when
the channel ran a successful authentication
before.
Also, prevent "none" from attaching "dump".
We want to implement "none" attaches for hjfs,
but only if the 9p-"channel" ran a successful
authentication before, to prevent anonymous
remote mounts as "none".
Add a flag to the Srv struct for this.
Before this it was possible to Tauth and Tattach with one
user name and then authenticate with factotum using a different
user name. To fix this we now ensure that the uname matches the returned
cuid from AuthInfo.
This security bug is still pending a cute mascot and theme song.
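The check, sketched (auth_getinfo() and AuthInfo.cuid are
the real libauth names; the surrounding shape and error
strings are assumptions):

static char*
attachcheck(AuthRpc *rpc, char *uname)
{
	AuthInfo *ai;
	char *err;

	ai = auth_getinfo(rpc);
	if(ai == nil)
		return "authentication failed";
	err = nil;
	if(strcmp(ai->cuid, uname) != 0)
		err = "uname does not match authenticated user";
	auth_freeAI(ai);
	return err;
}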
when appending to a directory, the copy of the offset in the dent was
set to the offset that the write was supposed to happen at -- however,
for DMAPPEND files the offset is always at the end of the file. Until
the file was closed, stat would show the wrong directory info.
Implement a hangup ctl command that flushes the
queues, but keeps the filter around.
This can be useful for low-overhead traffic blocking,
as only the file-descriptor needs to be kept around
and the queues can be flushed.
No user-space process is needed to consume packets
and no buffers are wasted.
example:
aux/dial -e -o hangup 'ipmux!ver=4;src=8.8.8.8' rc -c 'echo 0 > /srv/blocked'
rm /srv/blocked
It seems some protocols are unprepared to
deal with ipoput*() raising an error
(thrown from ifc->m->bwrite()),
so catch it and return -1 (no route) instead.
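As I read it, the change amounts to wrapping the medium
output inside ipoput*() like this (a sketch; the bwrite()
argument list and the cleanup are assumptions):

	if(waserror()){
		/* ifc->m->bwrite() threw: report no route
		 * instead of raising into the protocol */
		runlock(ifc);
		return -1;
	}
	ifc->m->bwrite(ifc, bp, V4, gate);
	runlock(ifc);
	poperror();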
We were accidentally searching the key for '&', instead of the value.
Inferno received this exact fix at some point, but it never made it back to Plan 9.
the installed version of git has a bug; removing
this file will trigger some spurious removals of
test files, so hold off deleting it until people
have time to install a fixed git
directories need to sort as though they end with a '/',
when running through them for comparison, otherwise we
flag files as added and removed spuriously, leading to
them incorrectly getting deleted when merging commits.
Instead of Proc { Mach *mp; Mach *wired; },
track affinity by an integer representing
the mach number.
This simplifies the code as it avoids needing
to compare with MACHP(m->machno).
Wiring a process to a processor is now done
by just assigning the affinity and then setting
a flag that it should not change.
Call procpriority() when we want to change
priority of a process instead of managing
the fields directly.
The idea is:
When we call sched() with interrupts disabled,
it must not return with them re-enabled.
The code becomes easier to reason about if
we make sched() preserve interrupt status,
which lets us call sched() from an interrupt
handler (with interrupts disabled) without
risking preemption by another interrupt once
sched() returns, which could pump up the stack.
This allows removing the Proc.preempted flag as
it is now impossible for interrupts to
preempt each other in preempted().
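The invariant, sketched (this is not the real scheduler,
just the shape of the interrupt-status handling):

void
sched(void)
{
	int s;

	s = splhi();	/* switch with interrupts off */
	/* ...pick the next process and context switch to it... */
	splx(s);	/* restore whatever state the caller had */
}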
Extra cleanups:
make interrupted() _Noreturn void
and remove unused Proc.yield flag.
the symptom is that ping is apparently skipping
transmits, which recover with the next send,
resulting in spikes of exactly one send period in
the ping rtt.
It appears that the core reorders writes
to uncached memory, which can result in the doorbell
being written before the descriptor status bits
are written.
Putting a coherence() barrier before writing the
doorbell fixes it.
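The fix pattern, sketched (descriptor and register names
are made up for illustration):

static void
transmit1(Ctlr *ctlr, Td *td, int len)
{
	td->status = len | Own;	/* hand the descriptor to the nic */
	coherence();		/* order the status write before the doorbell */
	csr32w(ctlr, Tdt, ctlr->tdt);	/* doorbell write that starts the dma */
}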
thanks sigrid for reporting the issue!
On a multiprocessor, the scheduler can run into
a very unfair distribution of processes to cpus
when there are more long-running processes than cpus.
Say we have a 4 cpu machine and we run 4 long-running
processes: each cpu will pick up a single process
and each process will get 100% of its fair share.
Everything is good so far.
If we start more long-running processes, all these
processes are going to be picked up by the cpu core
that runs the most sporadic / bursty work loads, as it
is calling sched() more often.
This results in all the extra long-running processes
clustering around the same core, resulting in very
unfair sharing of load.
The problem is that once a process runs on a cpu,
it stays on that cpu as processor affinity
is never reset.
Process migration only happens when a cpu cannot
find any process to run, given the affinity
constraints, but this can never happen when
the system is under full load and each cpu
always has a low-priority long-running
process to run.
How do we fix this?
The idea of this hack is to reset processor
affinity in ready() when the priority changes or
when it appears to be a long-running process.
That way, we give every cpu a chance to pick
it up and share the load.
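Roughly (a sketch of the idea only, not the actual code;
the field names and the long-running test are
assumptions):

void
ready(Proc *p)
{
	int pri;

	pri = reprioritize(p);
	if(pri != p->priority || p->cpu > Longrun)
		p->affinity = -1;	/* any mach may pick it up now */
	p->priority = pri;
	/* ...queue p on the run queue as before... */
}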
This is not an ideal solution of course. Long term,
we should probably have separate runqueues per cpu
and do the balancing explicitly.
In the sched() function, the call to reprioritize()
must be done before up is changed, as reprioritize()
calls updatecpu(), which determines if the process
was running or not based on p == up. So move
the call into runproc() itself.
while rehashing the same files over and over will work
just fine, it can be slow with a large number of large
files; this makes 'git/commit .' perform much better in
repos with a large number of large binary blobs.
compress when the log doubles in size, rather than
using a fixed size heuristic; this means that we
don't start compressing frequently as the log gets
big and the file system gets fragmented.
this happens in libframe:
/sys/src/libframe/frutil.c:80: x -= (x-f->r.min.x)%f->maxtab;
but there's no way to control when the user changes the
maxtab value, so it's samterm's responsibility to
sanitize it.
we were copying the owner and group of the parent dir into the fid
on create, but we forgot the mode; fix that, so that we don't check
perms against the wrong dir.