ppp causing Fatal data abort in ng_snd_item on Raspberry Pi 3

I am trying to migrate from my FreeBSD 10.1-RELEASE #0 r274401 amd64 system to a FreeBSD 12.0-CURRENT #0 r313109M arm64 (Raspberry Pi 3) system. So much has gone so well, but now I've reached a real show-stopper. My link to the Internet is PPPoE. I have migrated my ppp.conf file to the RP3, but when I run "/usr/sbin/ppp -ddial CenturyLink" like I do on my amd64 system, I get "Fatal data abort" (register dump) "Stopped at ng_snd_item+0x31c: ldrb w10, [x8]". x8 contains - you guessed it - 0x0. I guess this is similar to what we used to call a NULL pointer reference.

Any suggestions of where to go from here?
 
Tier2 platform = not supported for production use. Only meant for developers and adventurous people. Normal users should wait until a platform becomes Tier 1 (this could take years), or they should look at other options (like learning how to become a developer).
 
Certainly agree. Exactly! Learning to become a developer is absolutely an avenue I would like to pursue, and I would consider that an appropriate answer within the solution space. I have recently moved the only thing that could be called "production" off of my home server, so it is more of a playground again. Started this out as an experiment - and then got close enough to success that I got hopeful. You could call me a hobbyist and former FreeBSD contributor. So - yes - learning how to become a developer is something I would like to explore, and if you can point me to the bootstrap for that, I'd really appreciate it; that is just the sort of help I'm looking for. I'm just afraid that a lot of the FreeBSD internals and processes that I knew from FreeBSD 0.9 are irrelevant, and the learning curve may be significant, but I'm definitely willing to use my amd64 system as a cross-development platform for the ARM64 - I just need to find an entry point. I have written device drivers and device firmware for a living - and contributed kernel code for 0.9 - so I'm not entirely unfamiliar with what makes things tick.
 
Just in case anyone else is on the same journey as me, I'll leave some breadcrumbs. I first ran into trouble with my amd64 system because I am running FreeBSD 10.1, and pkg broke itself, and I needed "svn" to get the current system source distribution. I got around that by using /usr/bin/pkg-static - so I was able to get svn installed and check out the release. I got the idea to try pkg-static from https://www.digitalocean.com/community/tutorials/how-to-upgrade-freebsd-from-version-10-2-to-10-3. The article on cross-compiling which has been so helpful was, perhaps predictably, https://wiki.freebsd.org/FreeBSD/arm/crossbuild and I currently have a cross-build buildworld running for the arm64 on my amd64, and a native build running on the Raspberry Pi 3. I am using -j3 on the amd64 2-core, and -j4 on the arm64 4-core. We'll see how they go. The amd64 system has spinning disk, and the Raspberry is doing the build on a flash drive. If nothing else, it's a lot of fun. We'll see if I actually get to the point where I can contribute something. I hope so.
 
Couple of false starts - I took the script about cross-compiling and forgot to change the TARGET_ARCH from armv6 to aarch64. It probably successfully built the armv6 32-bit architecture, but that wasn't what I needed, so I started over with the cross-compile. That SEEMS to have worked. I copied the "date" program over from the amd64 cross-development to the Raspberry, and it runs. I haven't crossed the bridge of creating the bootable image yet, but it seems like I may be on the right track for the cross-development. On the native side, definitely not as successful. It was exciting to see it max out all four CPU threads with "-j4" on the buildworld, but it kept failing at crazy spots with "sh" errors:
Code:
Apr 23 14:42:41 tobias kernel: pid 41294 (sh), uid 1002: exited on signal 11 (core dumped)
Apr 23 15:16:33 tobias Àáÿÿÿÿ: stack overflow detected; terminated
Apr 23 15:16:48 tobias kernel: pid 49694 (sh), uid 1002: exited on signal 6 (core dumped)
Apr 23 20:44:40 tobias sh: stack overflow detected; terminated
Apr 23 20:45:06 tobias kernel: pid 64540 (sh), uid 1002: exited on signal 6 (core dumped)
./lib/libc/tests/stdlib/sh.core
./sbin/restore/sh.core
(note that while it failed on different files in the stdlib directory, it failed there twice, so that's why there are only two core files)
I was not too surprised that the native build did not work, since I think this is uncharted territory, but I was VERY surprised to find segmentation faults and stack overflows from sh! That doesn't really make sense to me. I'm rerunning the native build without the "-j4" and I swear it is going less than a quarter as fast. I really thought that each of the CPU threads on the arm was about 1/2 the speed of each of the cores on the AMD64, but that does not seem to be the case. The buildworld on the amd64 took three hours flat, and the buildworld on the arm is entering the 18th hour and still going.

I am very concerned about what is going on on the arm. The one stack overflow message shows a corrupted program name. The fact that sh is getting memory faults on anything is scary. I'm running the native one again just to see if it will eventually successfully complete. I'll be very intrigued if it actually succeeds in building the things it failed at when it was running fully parallel. In that case, I'm going to wonder if my little raspberry is overheating, or if this is some strange artifact of running out of memory, or what. It claims to have ~150MB active and ~300MB free right now. I stopped mysql server and apache just to free up as many resources as possible.

It doesn't seem to have recompiled the programs/files it was working on when it failed - but I don't understand the buildworld all that well yet, so this could be a lack of understanding on my part. I'm just getting back into this after decades of absence. It may even be that it is reporting the wrong program - this is a Tier 2 environment, so the territory is somewhat uncharted. I'm thrilled as much works as does, really.

The script which did the apparently successful buildworld and buildkernel cross-build was:
Code:
#!/bin/sh

export BASEDIR=$PWD
export MAKEOBJDIRPREFIX=$BASEDIR/obj
cd $BASEDIR/src
make -j3 buildworld TARGET_ARCH=aarch64 UBLDR_LOADADDR=0x00200000 && \
   make -j3 buildkernel TARGET_ARCH=aarch64 KERNCONF=GENERIC
I am not going to claim that this is "perfect". My "obj" tree has a crazy number of levels to it which doesn't seem optimal, but maybe there's a good reason for it I haven't figured out yet. Basically, it is $BASEDIR/obj/arm64.aarch64/$BASEDIR/src and I have no idea why $BASEDIR is represented twice in that path. I don't know that that is the expansion that created it - I'm just using that construct to represent the parts of the path that are duplicated, which by no small coincidence, is the BASEDIR. I don't see a reason for it, but as I said, I'm still learning.
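A minimal sketch of how a path of that shape can arise, assuming the build places objects under the prefix, then TARGET.TARGET_ARCH, then the full absolute path of the source directory. That rule is inferred from the paths quoted in this post, not taken from the makefiles, and objdir_for is a hypothetical helper name:

```sh
#!/bin/sh
# Hypothetical helper: objdir_for PREFIX TARGET TARGET_ARCH SRCDIR
# Assumption: object dir = PREFIX/TARGET.TARGET_ARCH followed by the
# absolute source path, which is why the base directory shows up twice.
objdir_for() {
    printf '%s/%s.%s%s\n' "$1" "$2" "$3" "$4"
}

BASEDIR=/home/john/freebsd   # example value only
objdir_for "$BASEDIR/obj" arm64 aarch64 "$BASEDIR/src"
```

Under that assumption the duplication is simply the absolute source path being appended to a prefix that already lives under the same base directory.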

If anyone else is going on this journey with me, or if there's a better forum for me to make these posts, just let me know. Anyway, we'll see what happens with the native build.
 
Well, that was spectacular. First off, the build filled up the 4GB flash drive I was using as development space. I dug up an 8GB flash drive and copied everything to that. Then I did "swapon" for the 4GB so that I could run all four cores. It was using the swap, but only to the tune of about 2.7MB when it totally panicked. The keyboard doesn't work in the kernel debugger (this is unfortunate) so I couldn't even see where the stack trace started. Not only that, but the filesystem on the flash drive was so badly corrupted that fsck didn't recognize it. This was also strange - mount knew it was a filesystem, and knew where it had last been mounted, but fsck didn't recognize it as a filesystem. So, I did a newfs, I am copying all of src from the cross-development amd64 platform, and we'll try this again. It might be a blessing in disguise. I had so many false starts on the last image, maybe being forced to start with a clean copy is actually good. We'll see if it survives another attempt. Who knows - I may eventually get to the point where I can work on my pppoe problem.
 
As a safety/sanity/consistency check, I dismounted the flash drive after copying src, and fsck still didn't recognize it. If I use fsck_ffs directly, however, it does recognize and process it. I think there is the distinct possibility that the fsck "wrapper" program needs to be fixed. If I ever get that far, I'll look into it. So - be aware - fsck may be broken.
 
Good news / bad news. The good news is that the fsck I just built on the cross-development platform works. The bad news is that the native build failed with a coredump on sh again:
Code:
Apr 25 17:50:36 tobias kernel: pid 2712 (sh), uid 1002: exited on signal 6 (core dumped)
So - I think I'm going to give up on the native build for now and see if I can actually get a bootable image from my cross-development platform. I still want to be able to build FreeBSD natively, but just being able to get caught up and have more things working is progress.
Here's the output of "file" on the working and non-working fsck programs, if you're interested (the first one is the broken one):
Code:
/sbin/fsck: ELF 64-bit LSB executable, ARM aarch64, version 1 (SYSV), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 12.0 (1200020), FreeBSD-style, stripped
./fsck:     ELF 64-bit LSB executable, ARM aarch64, version 1 (FreeBSD), dynamically linked, interpreter /libexec/ld-elf.so.1, for FreeBSD 12.0 (1200029), FreeBSD-style, stripped
The other weird thing is that they are dramatically different SIZES, even though both claim to be dynamically linked and stripped:
Code:
ls -l /sbin/fsck fsck
-r-xr-xr-x  1 root  wheel   20120 Feb  2 17:19 /sbin/fsck
-r-xr-xr-x  1 john  lind   200128 Apr 24 16:23 fsck
 
Oh jolly. I thought crochet was just to build the image. Now I see that it actually expects to do the buildworld and buildkernel as well. It is not obvious how to get it to use the target system I've already created... There's got to be a better forum for me to be participating in - but I can't find it.
 
OK. By doing a bunch of "mv" work I have managed to put the source and, maybe, the obj tree where crochet wants them. Crochet seems to need its reference libraries and files in self-relative folders, so you have to make your directory structure under it. I created my own crochet config file as follows (look at config.sh.sample for the definitions):
board_setup RaspberryPi3
FREEBSD_SRC=/home/john/freebsd/crochet/tobias/src
WORKDIR=/home/john/freebsd/crochet/tobias/work
IMG=${WORKDIR}/FreeBSD-${KERNCONF}.img
WORLDJOBS="3"
KERNJOBS="3"
The last two settings are for parallelism. They default to the output of "sysctl -n hw.ncpu", but I used the recommendation I found in the cross-development wiki to use 1.5 times that number if you are using a traditional spinning disk, and that seems to work well. I let the RaspberryPi3 board settings take care of everything else.
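A small sketch of that rule of thumb, for anyone who wants to compute the values instead of hard-coding them. The helper name jobs_for and the choice to round up are assumptions of this sketch, not something the wiki or crochet prescribes:

```sh
#!/bin/sh
# jobs_for NCPU: 1.5 x NCPU, rounded up, using integer arithmetic only.
jobs_for() {
    echo $(( ($1 * 3 + 1) / 2 ))
}

# On FreeBSD, hw.ncpu is the CPU count; fall back to 1 if the query fails.
ncpu=$(sysctl -n hw.ncpu 2>/dev/null || echo 1)
echo "WORLDJOBS=\"$(jobs_for "$ncpu")\""
echo "KERNJOBS=\"$(jobs_for "$ncpu")\""
```

For the 2-core amd64 box this gives 3, which matches the values in the config above.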

So - my directory tree that corresponds to that looks like this - starting with my "project" folder:
/home/john/freebsd/
    crochet/ -- this is where the crochet.sh master script lives and seems to expect to have its reference files in subdirectories
        board/
            ... (crochet stuff)
        lib/
            ... (crochet stuff)
        tobias/
            src/ -- The FreeBSD source tree from FreeBSD 12.0
            work/ -- this is the spot I moved the work directory to with the config directives
                obj/ -- my original obj directory, which I moved into "work" for crochet compatibility (used to be a peer of "src")
                    arm64.aarch64/

and this is where my hope to leverage my existing "obj" tree fell apart. Because I moved "tobias" into "crochet" - "crochet" is now part of the build path... it's that whole deal where the process repeats parts of the tree ..

So I have paths that look like:
/home/john/freebsd/crochet/tobias/work/obj/arm64.aarch64/home/john/freebsd/crochet/tobias/src/tmp/home/john/freebsd/crochet/tobias/src
Once you get to the "arm64.aarch64" part, I don't know why it has to replicate the entire path down to that point all over again. Perhaps there is a good reason - but I wonder if these amazingly long paths are why my "sh" keeps coredumping when I try to build this natively on the RP3.
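One cheap way to sanity-check the long-path theory is to compare the length of such a path with the system's PATH_MAX. A rough sketch follows; path_len is a hypothetical helper, and the 1024 fallback is FreeBSD's PATH_MAX, used only if getconf is unavailable:

```sh
#!/bin/sh
# path_len PATH: print the number of characters in PATH.
path_len() {
    printf '%s\n' "${#1}"
}

# The long object path quoted above.
p=/home/john/freebsd/crochet/tobias/work/obj/arm64.aarch64/home/john/freebsd/crochet/tobias/src/tmp/home/john/freebsd/crochet/tobias/src

# Ask the system for its limit; fall back to 1024 (assumption) on failure.
limit=$(getconf PATH_MAX / 2>/dev/null || echo 1024)
echo "path length: $(path_len "$p"), PATH_MAX: $limit"
```

A path like the one above comes out well under that limit, so this check alone would not explain the core dumps. It says nothing about other limits, such as stack usage inside sh.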

But, anyway - due to the fact that it replicates the components of the path that way, moving my original "obj" directory bought me nothing - it is recompiling everything with even LONGER path names...

I hope my experiences help someone avoid some of the pitfalls I'm facing.
 