Solved find - does not seem to return filenames as they are found?

We have a server with an excessively large /var/spool/clientmqueue. I am trying to clear this out. The technique I tried was:
Code:
find /var/spool/clientmqueue -type f -exec rm -f {} \;

This command exhibits behaviour that I was not expecting. I anticipated that this would remove files as find encountered them. This does not seem to be the case. There was an interruption due to a network error and when I checked the directory size it had not changed although the find had been running for a significant amount of time.

I then used this command to see what was going on:
Code:
find /var/spool/clientmqueue -type f -ok rm {} \;

This command eventually, after a very, very long time, began to request permission to delete files. The problem is that it took an extraordinary length of time before the deletions began. I had previously believed that find acted upon files as it encountered them, so I expected the first confirmation request to be nearly instantaneous. Either my belief is unfounded or something else is going on. Does find not execute commands as it encounters qualifying files? If so, what is causing the inordinate delay? If not, is there a utility that does exactly that?
 
No matter what is going on, my take would be to forget find's -exec rm in favor of xargs,

Code:
find /var/spool/clientmqueue -type f -print0 | xargs -0 -r rm

which saves a ton of forks.
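An even lighter-weight option, if your find supports it (both GNU and BSD find do), is the built-in -delete primary, which unlinks each match inside find itself with no external process at all. A minimal sketch, using a throwaway directory in place of the real queue:

```shell
# Scratch directory stands in for /var/spool/clientmqueue.
dir=$(mktemp -d)
touch "$dir/a" "$dir/b" "$dir/c"

# -delete unlinks each match inside find itself (note it implies
# -depth), so no rm processes are forked at all.
find "$dir" -type f -delete

ls -A "$dir"    # nothing left
rmdir "$dir"
```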
 
If you try find /var/spool/clientmqueue -type f -exec echo {} \; you will see that files are echoed immediately. Something else must be going on. However, as schweikh says, pipe to xargs instead of using -exec.
 
my WAG: filesystem operations are probably being cached. Don't look at the filesystem from another shell and expect 100% synchronous behaviour.

I see this all the time when processing large datasets. It is in line with the concept that if you delete a file that someone else has open, the dirent may go away, but the file still occupies cluster space until the other user closes it.

And in the OP's use case they are messing with a queue directory, which undoubtedly has other daemons concurrently accessing that directory tree.
 
The technique I tried was:
As others have said, this is just about the least efficient technique. Doing "rm -r ..." would be much more efficient, and find | xargs would still be better. If you have to be selective (only delete certain things; for example, your find selects only files, not directories): run ls -l into a file, filter that file, and use the filtered file as input to xargs, or turn it into a script. When doing this, minimize the number of times the rm executable needs to be started, so every time you run rm, give it hundreds or thousands of files at once.
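A sketch of that "list, filter, delete in batches" idea, assuming (hypothetically) that only queue files whose names begin with df should go:

```shell
# Work inside a scratch directory standing in for the queue directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch dfAAA dfBBB qfCCC    # hypothetical queue-file names

# List once, filter the listing, then feed rm large batches of names.
# -n 1000 caps the arguments per rm invocation: one fork per thousand
# files instead of one fork per file.
ls -1 | grep '^df' | xargs -n 1000 rm -f

ls -1    # only qfCCC remains
```

Note that this pipeline assumes filenames without whitespace, which holds for sendmail queue files; for arbitrary names, prefer find -print0 | xargs -0 as above.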

and when I checked the directory size it had not changed
What do you mean by "directory size"? Did you do an "ls -l" on the directory itself, and look at its filesize in bytes? Be careful with that, since the file size of a directory is not a well defined concept. In ZFS, the size in bytes is roughly the number of entries in the directory (typically +2 for . and .., and if any of the directory entries are directories themselves, there might be an extra +1 for each subdirectory's backlink, but I'm not sure). On UFS, the size is measured in sectors, and the rules for when directories are compacted are interestingly complex, so don't expect directories to shrink immediately.
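Given all that, a more reliable way to watch progress is to count the remaining entries rather than watching the directory's own byte size, which may not shrink even as files disappear:

```shell
# Count remaining queue files; this number should fall as deletions
# proceed, even when the directory's size in bytes does not change.
find /var/spool/clientmqueue -type f | wc -l
```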

This command eventually, after a very, very long time, began to request permission to delete files. The problem is that it took an extraordinary length of time before it began the delete process.
Something is going on with your underlying file system. As you said, the find should start pretty much immediately. You didn't by chance run top while it was running?

my WAG: filesystem operations are probably being cached. Don't look at the filesystem from another shell and expect 100% syncronous behaviour.
Yes and no. Deleting a file is synchronous from the viewpoint of the name space: when rm finishes (or equivalently when the unlink system call returns), the directory entry (the name of the file) shall be gone. Any ls (or readdir() or equivalent operation, such as the find command) shall not see the directory entry. From the viewpoint of the space usage of the file, the story is more complex. As you said, if the file is open by another process (or has another name = hardlink), then the underlying storage cannot be released. And cleaning up storage is theoretically allowed to happen later. I think UFS will release it immediately in the unlink call. With ZFS, the log cleaner adds a lot of complexity, and I don't know whether delete operations are synchronous or not by default. Even if they are synchronous, a large number of deletes can create a huge backlog for log cleaners to work on, which will slow them down. Now add dedup and snapshots, and the answer is complex.
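The name-space half of this can be demonstrated directly: the directory entry vanishes as soon as unlink returns, even while an open descriptor keeps the data alive:

```shell
# Create a file, hold it open on fd 3, then unlink it.
f=$(mktemp)
echo "still here" > "$f"
exec 3< "$f"

rm "$f"                        # the name is gone immediately...
[ ! -e "$f" ] && echo "dirent gone"

# ...but the data remains readable through the open descriptor;
# the storage is only released once fd 3 is closed.
read -r line <&3
echo "$line"                   # prints: still here
exec 3<&-
```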
 
you can rm -fr /var/spool/clientmqueue and recreate it
I was thinking about doing that. The problem arose from an inadvertent start of the sendmail service on a remote host. Sendmail was not configured, so a lot of local warning messages were generated before the problem was noticed. I thought it best to leave unfamiliar parts of the file system alone and just clean out the files. The unexpected results turned this into a learning experience for me.
 
If you try find /var/spool/clientmqueue -type f -exec echo {} \; you will see that files are echoed immediately. Something else must be going on. However, as schweikh says, pipe to xargs instead of using -exec.
Sadly, this does not work for me as suggested. This also results in a long delay with no visible output.
 
my WAG: filesystem operations are probably being cached. Don't look at the filesystem from another shell and expect 100% synchronous behaviour.

I see this all the time when processing large datasets. It is in line with the concept that if you delete a file that someone else has open, the dirent may go away, but the file still occupies cluster space until the other user closes it.

And in the OP's use case they are messing with a queue directory, which undoubtedly has other daemons concurrently accessing that directory tree.
Sendmail and its associated daemons are not running. We use dma on that host. I believe that /var/spool/clientmqueue is specific to sendmail. I was only checking whether or not there were any changes happening at all.
 