Solved rsync code 13 after upgrade to 14.1

Dear All,

Having weird rsync issues after upgrading FreeBSD from 13.2 to 14.1. I cannot find any solution. Setup: doing rsync backup from the root ZFS to an SSD mounted to /media/da0p1. The rsync is initiated as root using /etc/crontab. The log:

Code:
2024/11/15 08:30:01 [61842] building file list
<... a few hundred files listed here ..>
2024/11/15 08:30:41 [61842] sent 15069073 bytes  received 77859 bytes  total size 104786993196
2024/11/15 08:30:41 [61842] rsync error: errors with program diagnostics (code 13) at log.c(245) [sender=3.3.0]

I assume code 13 is related to permissions, but given rsync is started as root, why would this happen? The same rsync commands worked on all previous versions of FreeBSD. It also fails with the same error to a remote SSH host. If I run my rsync backup scripts manually as root then everything works fine, so the issue appears only when the script is initiated by cron.

/etc/crontab entry:
Code:
30    *    *    *    *       root    /home/blackhaz/scripts/rsync-to-external-drive

rsync command in the script:

Code:
if mount | grep $TARGET_DISK; then
       nice rsync --log-file=$RSYNC_LOG_FILE \
       --archive --hard-links --delete --delete-excluded --sparse --xattrs --numeric-ids --acls --progress \
       / /media/$TARGET_DISK/freebsd-backup

Would appreciate any pointers.
 
Last edited:
I assume code 13 is related to permissions
Code 13 appears to be a generic error message, which could be a permission issue, but it could also be other reasons.

Some related errors on other forums seem to suggest the --hard-links might be using a lot of memory if you have many hard links, so you could be running out of memory?
 
It is likely that there was a specific error message between those plain file log lines.

Hardlink support builds a full list of all files in memory, basically with reduced stat(2) info. That is regardless of whether you actually have any hardlinks at all.
 
Triple-checked - no error messages, which is confusing indeed. I'll give it a go without --hard-links, but if I can run the script manually and it succeeds then why would it fail if ran by cron?

Update: same error without --hard-links.
 
If rsync is working when you run it manually, but not working from cron, then I believe it should be possible to "work the ends to the middle" to figure out why they are behaving differently.

I suggest double-checking that you are actually using the exact same arguments in both cases. Might also be informative to compare environment variables between cron and your manual invocation (in particular, $TARGET_DISK).

Whenever I've tried to call rsync with the destination-directory below the source-directory and using recursion (e.g.: -r), I've had problems where it can "loop" recursing unless using an exclude for the dest-dir; e.g.:
--exclude=/media/$TARGET_DISK/freebsd-backup
so that the source tree doesn't "include" (traverse-into) the destination directory. If you use an exclude like this, I'm guessing you probably don't want to use --delete-excluded either, but not certain about that.

Just a guess.. Hope this helps..
 
Hmmm.

Code:
static void filtered_fwrite(FILE *f, const char *in_buf, int in_len, int use_isprint, char end_char)
{
        char outbuf[1024], *ob = outbuf;
        const char *end = in_buf + in_len;
        while (in_buf < end) {
                if (ob - outbuf >= (int)sizeof outbuf - 10) {
                        if (fwrite(outbuf, ob - outbuf, 1, f) != 1)
                                exit_cleanup(RERR_MESSAGEIO);
                        ob = outbuf;
                }
                if ((in_buf < end - 4 && *in_buf == '\\' && in_buf[1] == '#'
                  && isDigit(in_buf + 2) && isDigit(in_buf + 3) && isDigit(in_buf + 4))
                 || (*in_buf != '\t' && ((use_isprint && !isPrint(in_buf)) || *(uchar*)in_buf < ' ')))
                        ob += snprintf(ob, 6, "\\#%03o", *(uchar*)in_buf++);
                else
                        *ob++ = *in_buf++;
        }
        if (end_char) /* The "- 10" above means that there is always room for one more char here. */
                *ob++ = end_char;
        if (ob != outbuf && fwrite(outbuf, ob - outbuf, 1, f) != 1)
                exit_cleanup(RERR_MESSAGEIO);
}

It is invoking the last line.

Is this invoked only for logging, aka is the error in logging?
 
Having weird rsync issues after upgrading FreeBSD from 13.2 to 14.1. I cannot find any solution. Setup: doing rsync backup from the root ZFS to an SSD mounted to /media/da0p1. The rsync is initiated as root using /etc/crontab. The log:

I confirm, although I manually perform rsync, but I get a similar error (UFS fs). I didn't have time to investigate yet, and I quickly used scp.
 
What I suspect is going on is that it doesn't warn about anything wrong with the file transfers, but about logging somewhere.

We had other reports of cron oddities with process output during the last year.
 
Confirming cracauer@'s findings, after looking at the rsync 3.3.0 sources, exit code 13 is only ever generated without a corresponding error explanation message in log.c. All other places in the rsync source that exit with RERR_MESSAGEIO (13) appear to already print an error message before exit.

IMHO, it's a bug that rsync does not print an error message in all cases before exit 13, so I am attaching a patch that can be dropped into to /usr/ports/net/rsync -- this patch prints an error message before every exit 13.

I verified that the patch applies and compiles cleanly on my 14.1-RELEASE ports tree. YMMV, use at your own risk. :)

To apply the patch:
  1. Download the 2 attached files (below)
  2. Run the following commands as root:
cp Makefile-rsync.pat.txt /usr/ports/net/rsync
cp extrapatch-log.c.txt /usr/ports/net/rsync/files
cd /usr/ports/net/rsync/files
mv extrapatch-log.c.txt extrapatch-log.c
cd ..
patch < Makefile-rsync.pat.txt
rm -rf work
make && make reinstall && echo Success


To revert back to the "pre-patched" rsync:
cd /usr/ports/net/rsync
mv Makefile.orig Makefile
rm -rf work
make && make reinstall && echo Success


The patched code should output the new error messages to stderr (not the log specified by --log-file="x") (FWIW, I did it this way because this patch is within the rsync logging-code, and I didn't want to risk the error-logging code recursion-looping)

So when running the patched rsync from cron, I suggest adding something like "2>/tmp/rsync.log" so that the error message on stderr will not just vanish. Then please be sure to check that logfile.

Thanks in advance to the OP (or anybody having the same problem) who tries this patch and reports back the error immediately-preceding the exit -- it suspect it will give us a much better understanding of the true root-cause.
 

Attachments

M F Thank you very much! Here's the rsync.log of a patched rsync:
Code:
filtered_fwrite(): fwrite() error: Broken pipe [sender]
rsync error: errors with program diagnostics (code 13) at log.c(248) [sender=3.3.0]
rsync error: received SIGUSR1 (code 19) at main.c(1621) [generator=3.3.0]
 
I would guess it is the write speed. Try adding
Code:
--bwlimit=1000
which is 1/10 normal and it may run to completion, and if so you can bump up the number to just before a failing one.
 
M F Thank you very much! Here's the rsync.log of a patched rsync:
Code:
filtered_fwrite(): fwrite() error: Broken pipe [sender]
rsync error: errors with program diagnostics (code 13) at log.c(248) [sender=3.3.0]
rsync error: received SIGUSR1 (code 19) at main.c(1621) [generator=3.3.0]

Thank you for providing the log -- I think this is an important step in understanding the root-cause of the problem..

An EPIPE (broken pipe) error is not what I expected, I'm now trying to figure-out why rsync's log descriptor is a pipe() (versus a plain file corresponding to --log-file=x). One theory I have is that rsync fork()'d internally (resulting in separate sender sub-process), and the child process might be trying to write a log message back to the parent over the pipe(), but the descriptor on the other end was closed somehow, resulting in the EPIPE error. Just a theory..

I'm trying to reproduce the problem on my end in the hope I can trace what's actually happening in more detail.

blackhaz: are you still running exactly the script that you posted at the beginning of this thread?
 
M F Thank you very much! Here's the rsync.log of a patched rsync:
Code:
filtered_fwrite(): fwrite() error: Broken pipe [sender]
rsync error: errors with program diagnostics (code 13) at log.c(248) [sender=3.3.0]
rsync error: received SIGUSR1 (code 19) at main.c(1621) [generator=3.3.0]

I reconfigured my 14.1-RELEASE system so I could run your rsync command script, and I was able to reproduce your problem!

Now I think I understand and can explain what's happening:
  • With the specific rsync options in your script, rsync will recurse into /media/da0p1/freebsd-backup multiple times, until the destination filesystem runs out of disk space, ultimately causing rsync to error-out.
  • To clarify, when the source-dir is "/" and the destination is "/media/da0p1/freebsd-backup", the first (normal and expected) copy will be made, but then when rsync's source-traversal descends into "/media/da0p1/freebsd-backup" and the destination then becomes "/media/da0p1/freebsd-backup/media/da0p1/freebsd-backup", another copy will be created recursively (almost certainly not what you want / expect), and this continues making deeper recursive copies until the fs fills.
  • It's a mystery to me how your script (when run from command-line) would not have eventually filled your destination filesystem.
  • If $RSYNC_LOG_FILE was under /media/da0p1/freebsd-backup, it's possible that any final error messages may not have been written to the log due to lack of disk space.
My recommendations:
  • Get rid of the extra/bogus recursive copies first by running:
    rm -rf /media/da0p1/freebsd-backup/media/da0p1
  • Add the option: --exclude=/media/da0p1/freebsd-backup (this will make it so rsync's source-traversal does not descend into the /media/da0p1/freebsd-backup directory, because you don't want to copy anything from that directory)
  • Add the option: --exclude=/dev (this will prevent copying device-nodes, which aren't needed or useful in your copy)
  • Get rid of --delete-excluded (otherwise, it's likely to delete files in your destination after the file-copying is done; probably not what you want)
  • If after making the above changes to your script and re-running it, it's still not working like you expect, check /var/log/messages to see if the last run generated any new system-level errors (filesystem full, out of memory, etc)
  • Maybe the net/rsync port maintainer might be willing to integrate this rsync log.c patch, so that future similar error messages for all rsync users are not similarly "lost"??
I tested your script with the changes I outlined, and it's working exactly as-expected for me from cron, and exiting with status code 0.

Hope this helps!
 
jb_fvwm2 Same error with --bwlimit.
M F I am terribly sorry, I should have included the full script, but I am excluding /media, /etc and other directories. The full script is:

Code:
#!/bin/sh

TARGET_DISK="da0p1"
RSYNC_LOG_FILE="/home/blackhaz/scripts/last-external-backup.log"

umount /dev/$TARGET_DISK
fsck -y /dev/$TARGET_DISK
mount -t ufs /dev/$TARGET_DISK /media/$TARGET_DISK

if mount | grep $TARGET_DISK; then
       nice rsync --log-file=$RSYNC_LOG_FILE \
       --archive --hard-links --delete --delete-excluded --sparse --xattrs --numeric-ids --acls --progress \
       --exclude=/usr/home/blackhaz/.cache --exclude=/dev --exclude=/tmp --exclude=/media \
       --exclude=/mnt --exclude=/proc --exclude=/var/cache --exclude=/compat/linux \
       --exclude="/var/log" \
       / /media/$TARGET_DISK/freebsd-backup
       echo "`date` rsync $TARGET_DISK code $?" >> /home/blackhaz/scripts/last-raspberry-backup
else
       echo "`date` $TARGET_DISK is not mounted" >> /home/blackhaz/scripts/last-raspberry-backup
fi

No error messages in /var/log/messages. I can see the script is being started by the cron (I've moved it to 26th minute for testing purposes):

Code:
Nov 16 08:26:00 freebsd /usr/sbin/cron[77113]: (root) CMD (/home/blackhaz/scripts/rsync-to-external-drive 2>/home/blackhaz/scripts/rsync.log)

After that, no errors there, but still Broken pipe in the rsync.log.
 
No error messages in /var/log/messages. I can see the script is being started by the cron (I've moved it to 26th minute for testing purposes):
[...]
After that, no errors there, but still Broken pipe in the rsync.log.

Thank you for the full script and the status update.

I ran your script multiple times (from cron).. FYI it's worked perfectly every time for me.

ATM, all I can think-of is to ask a few more questions:
  • What % of the time does rsync-to-external-drive script fail?
  • Have you compared the logs from one run to the next to see if it's failing at or near the same spot each run?
  • After you did the 13.2->14.1 OS upgrade, did you do a "pkg upgrade" or delete / reinstall all desired packages? If neither, then maybe rsync is using a library from an un-upgraded dependency -- incompatible API differences between versions could cause an unexpected rsync exit() that could explain a "broken pipe".
  • Is it possible that your script ever failed when you ran from an interactive shell, and you just didn't notice the failure?
  • What architecture is your system?
  • Is there any chance that your system might be overheating under high load?
Because it's apparently working fine for me and not for you, I'm left guessing what might be relevant differences between our setups.. Hmmm, do you have a free drive on which you could install a fresh 14.1-RELEASE, and if so, would you be willing / able to temporarily boot from that drive just long enough to run your script? Not suggesting you setup your entire user environment, just base OS and necessary packages to run your script. I only ask because I think it would be helpful to "partition the problem" by understanding if something went wrong with your 13.2->14.1 upgrade process, if it's a hardware problem, a software problem, or something else..

So now any changes you can make on your end that cause it to behave differently, or any changes on my end that get a broken pipe could really help home-in on the root-cause..
 
  • What % of the time does rsync-to-external-drive script fail?
100% - fails every time.
  • Have you compared the logs from one run to the next to see if it's failing at or near the same spot each run?
Fails on different file on each attempt, but they're all in the same directory.
  • After you did the 13.2->14.1 OS upgrade, did you do a "pkg upgrade" or delete / reinstall all desired packages? If neither, then maybe rsync is using a library from an un-upgraded dependency -- incompatible API differences between versions could cause an unexpected rsync exit() that could explain a "broken pipe".
Yes. pkg reports all packages are up to date.
  • Is it possible that your script ever failed when you ran from an interactive shell, and you just didn't notice the failure?
No. Script finishes fine when running manually. I am sure.
  • What architecture is your system?
% uname -a
FreeBSD freebsd.my.domain 14.1-RELEASE-p5 FreeBSD 14.1-RELEASE-p5 GENERIC amd64
  • Is there any chance that your system might be overheating under high load?
No, temperature is fine. The system is perfectly stable. It's my daily driver laptop. Everything else works fine.
Because it's apparently working fine for me and not for you, I'm left guessing what might be relevant differences between our setups.. Hmmm, do you have a free drive on which you could install a fresh 14.1-RELEASE, and if so, would you be willing / able to temporarily boot from that drive just long enough to run your script?
I might try this later, if nothing else helps. Thanks for the suggestion!

Really peculiar situation.
 
Gents,

Found the issue. It was --progress. Without it the script runs fine under cron. Very grateful for all the pointers provided and readiness to help! Amazing community.
 
Gents,

Found the issue. It was --progress. Without it the script runs fine under cron. Very grateful for all the pointers provided and readiness to help! Amazing community.

Great news!! Thank you for the update. You're welcome..

cracauer@: Maybe cron does something odd when it reads stdout/stderr lines from the children that are longer than some buffer length?
 
--progress just logs the same way as other logging facilities. The error will probably pop up the same way on the regular logging lines (filenames etc) eventually, on the install that sees the error with --progress now.

No, there is something in how cron handles processes and pipes thereto. I dunno whether it is just an overflow triggered by the carriage return chars without newline, but I doubt it. In any case the cron case worked with --progress for at least one person here, right?
 
--progress just logs the same way as other logging facilities. The error will probably pop up the same way on the regular logging lines (filenames etc) eventually, on the install that sees the error with --progress now.

No, there is something in how cron handles processes and pipes thereto. I dunno whether it is just an overflow triggered by the carriage return chars without newline, but I doubt it. In any case the cron case worked with --progress for at least one person here, right?

Yes -- when I included --progress and ran from cron, it worked fine for me, but I may not have the same pathname lengths of the OP.

It appears that rsync --progress has different style output when combined with --log-file=x (versus absence of that option); it does not appear to do the 1-second percentage updates followed by carriage-return-only.. You can try it and see for yourself..

I found that even with --log-file=x, rsync can still write occasional messages directly to stderr (something like "file X has vanished") that do not go to the x logfile.
 
Back
Top