Solved rsync code 13 after upgrade to 14.1

cracauer@:

Correction!

I just noticed that rsync --progress does output the normal 1-second progress followed by carriage-return to stdout (in addition to what is output via --log-file=x), but I failed to see that my root mailbox was getting huge emails from the rsync --progress cron output!

So now I think it's quite likely that the --progress output by rsync is indeed generating hugely-long lines, and I'll bet there is some corner-case where cron doesn't handle it properly. This easily could be the root-cause of the OP's issue and the other cron reports you've heard.
 
marq, let me know if you need anything from me if you'd like to investigate this further.

Can you wrap a `truss -f -o whatever` around rsync and does it still error in cron?

If so I would like to see that truss output.

The big question I have is whether the socket that is closed and causing the pipe error is connected to cron, or whether it is a connection between rsync parent and child.
 
Here you go, starting from some Firefox profile file it tried to backup - that's where rsync failed with code 13.

Code:
99585: openat(AT_FDCWD,"usr/home/blackhaz/.mozilla/firefox/3iw162tg.default-1533074132327/datareporting/glean/db/data.safe.bin",O_RDONLY,00) = 4 (0x4)
99585: fstat(4,{ mode=-rw-r--r-- ,inode=2273737,size=210781,blksize=131072 }) = 0 (0x0)
99585: write(1,"\r            107 100%    0.79kB"...,4096) ERR#32 'Broken pipe'
99585: issetugid()                 = 0 (0x0)
99585: fstatat(AT_FDCWD,"/usr/share/nls/C/libc.cat",0x8211d6520,0x0) ERR#2 'No such file or directory'
99585: fstatat(AT_FDCWD,"/usr/share/nls/libc/C",0x8211d6520,0x0) ERR#2 'No such file or directory'
99585: fstatat(AT_FDCWD,"/usr/local/share/nls/C/libc.cat",0x8211d6520,0x0) ERR#2 'No such file or directory'
99585: fstatat(AT_FDCWD,"/usr/local/share/nls/libc/C",0x8211d6520,0x0) ERR#2 'No such file or directory'
99585: write(2,"filtered_fwrite(): fwrite() erro"...,56) = 56 (0x38)
99585: sigprocmask(SIG_SETMASK,{ SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
99585: sigaction(SIGUSR1,{ SIG_IGN 0x0 ss_t },{ 0x8245f3890 SA_SIGINFO ss_t }) = 0 (0x0)
99585: sigprocmask(SIG_SETMASK,{ },0x0)         = 0 (0x0)
99585: sigprocmask(SIG_SETMASK,{ SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ }) = 0 (0x0)
99585: sigaction(SIGUSR2,{ SIG_IGN 0x0 ss_t },{ 0x8245f3890 SA_SIGINFO ss_t }) = 0 (0x0)
99585: sigprocmask(SIG_SETMASK,{ },0x0)         = 0 (0x0)
99585: wait4(99586,{ SIGNALED,sig=SIGFPE },WNOHANG,0x0) = 0 (0x0)
99585: getpid()                     = 99585 (0x18501)
99585: kill(99586,SIGUSR1)             = 0 (0x0)
99586: select(4,{ 3 },{ },{ 3 },{ 60.000000 })     ERR#4 'Interrupted system call'
99585: getpid()                     = 99585 (0x18501)
99586: SIGNAL 30 (SIGUSR1) code=SI_USER pid=99585 uid=0
99586: sigprocmask(SIG_SETMASK,{ SIGUSR1 },0x0)     = 0 (0x0)
99585: write(3,"2024/11/17 18:24:38 [99585] sent"...,96) = 96 (0x60)
99586: sigprocmask(SIG_SETMASK,{ SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ SIGUSR1 }) = 0 (0x0)
99585: getpid()                     = 99585 (0x18501)
99586: sigaction(SIGUSR1,{ SIG_IGN 0x0 ss_t },{ 0x8245f3890 SA_SIGINFO ss_t }) = 0 (0x0)
99585: write(3,"2024/11/17 18:24:38 [99585] rsyn"...,112) = 112 (0x70)
99586: sigprocmask(SIG_SETMASK,{ SIGUSR1 },0x0)     = 0 (0x0)
99585: write(2,"rsync error: errors with program"...,84) = 84 (0x54)
99586: sigprocmask(SIG_SETMASK,{ SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ SIGUSR1 }) = 0 (0x0)
99586: sigaction(SIGUSR2,{ SIG_IGN 0x0 ss_t },{ 0x8245f3890 SA_SIGINFO ss_t }) = 0 (0x0)
99586: sigprocmask(SIG_SETMASK,{ SIGUSR1 },0x0)     = 0 (0x0)
99586: getpid()                     = 99586 (0x18502)
99586: kill(99587,SIGUSR1)             = 0 (0x0)
99587: select(1,{ 0 },{ },{ 0 },{ 60.000000 })     ERR#4 'Interrupted system call'
99587: SIGNAL 30 (SIGUSR1) code=SI_USER pid=99586 uid=0
99586: write(2,"rsync error: received SIGUSR1 (c"...,74) = 74 (0x4a)
99587: sigprocmask(SIG_SETMASK,{ SIGUSR1 },0x0)     = 0 (0x0)
99587: sigprocmask(SIG_SETMASK,{ SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ SIGUSR1 }) = 0 (0x0)
99587: sigaction(SIGUSR1,{ SIG_IGN 0x0 ss_t },{ 0x8245f3890 SA_SIGINFO ss_t }) = 0 (0x0)
99585: write(1,"\r            107 100%    0.79kB"...,4096) ERR#32 'Broken pipe'
99587: sigprocmask(SIG_SETMASK,{ SIGUSR1 },0x0)     = 0 (0x0)
99587: sigprocmask(SIG_SETMASK,{ SIGHUP|SIGINT|SIGQUIT|SIGILL|SIGTRAP|SIGABRT|SIGEMT|SIGFPE|SIGKILL|SIGBUS|SIGSEGV|SIGSYS|SIGPIPE|SIGALRM|SIGTERM|SIGURG|SIGSTOP|SIGTSTP|SIGCONT|SIGCHLD|SIGTTIN|SIGTTOU|SIGIO|SIGXCPU|SIGXFSZ|SIGVTALRM|SIGPROF|SIGWINCH|SIGINFO|SIGUSR1|SIGUSR2 },{ SIGUSR1 }) = 0 (0x0)
99587: sigaction(SIGUSR2,{ SIG_IGN 0x0 ss_t },{ 0x8245f3890 SA_SIGINFO ss_t }) = 0 (0x0)
99587: sigprocmask(SIG_SETMASK,{ SIGUSR1 },0x0)     = 0 (0x0)
99585: exit(0xd)                
99585: process exit, rval = 13
99587: nanosleep({ 0.100000000 })         = 0 (0x0)
99586: nanosleep({ 0.100000000 })         = 0 (0x0)
99586: exit(0x13)                
99586: process exit, rval = 19
99587: exit(0x13)                
99587: process exit, rval = 19
 
So it is failing to write to fd 1, which normally is stdout.

It is possible to connect stdout back to the parent.

In this case it is the topmost rsync pid to fail (the parent), so I think it is actual stdout.

It later successfully writes to fd 2.

I need to think about this for a moment. It is odd that fd1 is broken and fd2 is not. Unless fd2 is the logfile. Can you go through the whole log and look for an open(write) call that returns fd 2?
 
I need to think about this for a moment. It is odd that fd1 is broken and fd2 is not. Unless fd2 is the logfile. Can you go through the whole log and look for an open(write) call that returns fd 2?

No, that can't be it. --log-file opens on 3 according to a quick test. No attempt is made to change stdout or stderr.

So it is cron closing one pipe too early but not the other, no?
 
It appears cron's child_process() function (in do_command.c) simply reads and copies [one character at a time via putc()] everything that was read from the cron-spawned process's stdout and writes it to the mail-command-pipe (i.e.: "dma"). FWIW I noticed that the putc() return code (error) is not checked by cron.

So I theorize that the following might be happening:
  1. rsync outputs --progress to stdout.
  2. cron reads the parent rsync's stdout one character at a time, and writes each character to the mail-pipe.
  3. If the mail command (earlier spawned by cron) exits for any reason, I theorize that cron's "runner process" may receive an uncaught SIGPIPE signal and exit.
  4. When rsync writes the next character of --progress output to it's stdout, the cron runner process at the other end of that pipe is no-longer there, so the rsync process itself is likely to receive a SIGPIPE signal.
I made a program to write lines where each one is 32 megabytes long, ran it from cron, and the runner did not abnormal-exit, so I don't think it's simply long lines. There could be some specific sequence of characters in the OP's stdout stream that might be causing sendmail (dma) to error/exit (possible, given the carriage-returns intermixed in that stream). Worth notiing that I haven't applied any freebsd-update patches on my system, so there could be a difference in cron and/or dma between the 14.1-RELEASE I'm running vs the 14.1-RELEASE-p5 that the OP is running.

It would be nice to root-cause this problem, even though the OP has a workaround.

blackhaz:

Could you change your script to add --progress back in (temporarily) and re-run from cron once after adding
>/tmp/cron.stdout
to the end of your /etc/crontab command-line? Then gzip & direct-message me with cron.stdout.gz? Then I could use that to try to reproduce your problem on my end by running a cron job that just cat's your specific output, and see what happens. No problem if this is too much effort and/or you're uncomfortable with sending me the file for any reason.
 
With >tmp/cron.stdout the script is now finishing fine without errors. I am not sure if something has changed - a file list is different now? I'll give it a few more runs tomorrow.
 
But cron connects both stdout and stderr to mail. The email you get has both.

I don't understand why one is closed and the other is not.

Hmmm, is there any evidence that only one of stdout/stderr is closed while the other remains open?

I'm saying both would be closed simultaneously if the mail process dies abnormally for any reason. I think it would be like dominoes: I believe the next write to the mail pipe by the cron runner process would generate an uncaught SIGPIPE (causing the cron runner to exit), and then when the program spawned by cron does it's next write on the pipe (to the cron runner (via either stdout or stderr), the cronjob itself likely would also then get a SIGPIPE.

It's likely I don't fully understand what you're getting-at and/or thinking, so please clarify if this doesn't click with you..
 
With >tmp/cron.stdout the script is now finishing fine without errors. I am not sure if something has changed - a file list is different now? I'll give it a few more runs tomorrow.

Sorry, I should have stated explicitly that I expected no SIGPIPE with >/tmp/cron.stdout. FWIW, this is because cron then directly-connects stdout to /tmp/cron.stdout -- specifically this bypasses sending it through the cron/mail pipeline and hopefully captures exactly what I need to reproduce the SIGPIPE problem on my end.

If your cron.stdout.gz is small enough to attach to a PM, please attach & send it.
 
Yes, the write to 2 succeeds after the one to 1 fails.

Hmmm..

Are you basing your conclusion from blackhaz's log (where rsync logged the SIGPIPE error that was obviously on stderr)?

If so, do you recall that the rsync stderr stream was redirected to a separate file via the crontab command-line? I am confident this means that rsync would have been writing stderr directly to the file /tmp/rsync.log, and therefore stderr would not have been going through the cron/mail pipe machinery (even though stdout still was). So I am wondering if we really can conclude that the write to 2 succeeding after the write to 1 failed means that the cron/mail pipeline was actually still open?

Maybe I'm missing something obvious here, and it could become clear to me if you would walk me through it.
 
Hmmm..

Are you basing your conclusion from blackhaz's log (where rsync logged the SIGPIPE error that was obviously on stderr)?

If so, do you recall that the rsync stderr stream was redirected to a separate file via the crontab command-line? I am confident this means that rsync would have been writing stderr directly to the file /tmp/rsync.log, and therefore stderr would not have been going through the cron/mail pipe machinery (even though stdout still was). So I am wondering if we really can conclude that the write to 2 succeeding after the write to 1 failed means that the cron/mail pipeline was actually still open?

Maybe I'm missing something obvious here, and it could become clear to me if you would walk me through it.

Good point but I verified with a test run that fd 2 is not connected to the logfile.
 
OK.. Sounds like you have a local reproducible test-case where you can regularly receive SIGPIPE in the process you're running from cron? If so, can you send me a copy of your program or script? I'd like to be able to reproduce it on my system..
 
OK.. Sounds like you have a local reproducible test-case where you can regularly receive SIGPIPE in the process you're running from cron? If so, can you send me a copy of your program or script? I'd like to be able to reproduce it on my system..

No. I just ran truss on rsync with the logfile option to see whether the logfile would end up on fd2, that's all.
 
Sorry, I should have stated explicitly that I expected no SIGPIPE with >/tmp/cron.stdout. FWIW, this is because cron then directly-connects stdout to /tmp/cron.stdout -- specifically this bypasses sending it through the cron/mail pipeline and hopefully captures exactly what I need to reproduce the SIGPIPE problem on my end.

If your cron.stdout.gz is small enough to attach to a PM, please attach & send it.
marq Just got the script running via this cron entry:
Code:
9    *    *    *    *       root    /home/blackhaz/scripts/rsync-test >/tmp/cron.stdout 2>/tmp/rsync.log

cron.stdout has no messages from rsync whatsoever - only routine messages from fsck.
rsync.log is clean - script has finished successfully. It appears that if I do >/tmp/cron.stdout then the script runs fine and I no longer get code 13 errors. This is with --progress.
 
Back
Top