pkg is fundamentally slow

Hi

I noticed that pkg was downloading noticeably slower than axel on the same computer, and slower than on other Windows and Linux machines and even my Android phone on the same network.
Searching the forums I saw a lot of posts about pkg being slow, and the usual answer was that "the problem is the mirrors, not the pkg tool itself".
While I believe the mirrors can sometimes be a problem, I suspected that pkg was also slow on its own.
So I did some testing.
First I ran "pkg update -f", using netstat to get the IP of the mirror pkg used, and measured the time and speed of pkg downloading only the "packagesite.pkg" file: the download speed was 172.6 KB/s, taking 36 seconds for the 6.42 MB file. Then I copied the repo URL from /etc/pkg/FreeBSD.conf and started downloading packagesite.pkg from it with axel's -n 32 option (the exact file that pkg downloads, at repo/FreeBSD:FreeBSDVersion:architecture/latest/packagesite.pkg, with exactly the same size).
I used the IP that netstat gave just to make sure pkg and axel were downloading from the same mirror, and they were (the IPs were the same).
axel downloaded the same file in 12 seconds at 523.65 KB/s!
I tested that multiple times, and axel was always at least 3 times faster.
It's a simple test that you can do too.
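Since it's an easy test to repeat, here is a rough sketch of how such a comparison could be scripted. The URL is only an example placeholder - substitute the one your /etc/pkg/FreeBSD.conf expands to - and axel is assumed to be installed from ports:

```shell
# Example repo URL -- replace with the mirror your pkg config actually uses.
URL="https://pkg.freebsd.org/FreeBSD:14:amd64/latest/packagesite.pkg"

# Run each of these by hand and note the wall-clock times:
#   time fetch -o /dev/null "$URL"
#   time axel -n 1 -o /dev/null "$URL"

# Average speed in KB/s from bytes transferred and elapsed seconds,
# to cross-check the speed each tool reports against a stopwatch:
speed_kbs() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / 1024 / s }'
}

speed_kbs 1048576 10    # 1 MiB in 10 s
```

Comparing against the tools' own progress meters guards against either side mis-reporting.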
I always add -n 16 to 32 for axel just to speed it up even more.
Another benefit of axel is better resume behavior: pkg will sometimes print "size mismatch, fetching from remote" when resuming a download.
I also have a problem that I can't explain: sometimes, while downloading a package, the speed decreases over time, settling at 16.4 KB/s. I haven't done enough testing to call it a pkg problem, but all I can say is that the same doesn't happen when downloading manually from the mirrors with axel or aria2.

Pkg is slow, and that's a fact.
So I suggest a solution: just as the ports tree lets you change the fetch command with FETCH_CMD, please add an option in pkg.conf to change it for pkg.
You may ask: if it's possible to change the ports fetch command, why not use ports instead of pkg? I actually do that, and for many small programs it's nice and speedy and lets you customize a lot, but some bigger programs pull in many dependencies, sometimes going 10 layers deep, so it's not really efficient unless you want to customize the software.
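On the ports side this knob already exists; a make.conf sketch (the axel flags are illustrative, axel must be installed, and some ports' fetch logic may still assume fetch(1) semantics):

```
# /etc/make.conf -- sketch only: flags are illustrative and may need
# adjusting so the downloader writes to the file name ports expects.
FETCH_CMD=      axel -a -n 4
DISABLE_SIZE=   yes
```

An equivalent option in pkg.conf is essentially what is being requested here.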

Thanks in advance!
 
Two generic things to consider:
  • Caching is often relevant, downloading the same file twice is expected to be faster. So for your comparison, it's relevant which one you try first.
  • I don't know the tool you're using, but if this -n option opens multiple connections (as I assume), you're hogging more resources on the mirror, making things worse for other people using the same mirror *). IMHO, something like this should never be supported by pkg.
edit: and btw, neither pkg nor fetch is "slow". I have my own pkg repo on my LAN, with devices connected via different links (e.g. GBit Ethernet, 100 MBit Ethernet, different Wifis), and downloads always reach speeds between 50% and 100% of what the connection can support, just as expected with a single TCP connection.

---
*) well, unless everyone used such a tool, which would make things slightly worse for everyone: each connection has some overhead doing its own congestion control etc., and of course the server-side limit would be reached sooner.
 
Please take a look at my screenshot. I'm using my own package repository, and while the file transfer speed is nearly 4 GB/s (!) you can see one thing: Different download speeds, and none of it is as fast as expected. Instead: The smaller the package is, the slower it seems. And all of them took 00:01 - one second.

So if you measure the shortest displayable duration of 1 second for a 7 KiB file, the calculated speed will be 7.0 kB/s. If the file size grows, the calculated speed grows too. But in reality the download didn't take one second per package - it was a blink of an eye for all of them.

Nothing will fetch my packages faster; pkg isn't the limit here. It's about interpreting the messages pkg gives you.
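That rounding effect can be sketched with a toy model (assumption: the progress meter can't display durations under one second):

```shell
# If any sub-second download is displayed as taking 1 s, the printed
# "speed" for a small file is just size / 1 s, however fast it really was.
reported_speed() {    # args: bytes, actual duration in seconds
    awk -v b="$1" -v s="$2" 'BEGIN {
        if (s < 1) s = 1                      # 1 s display floor
        printf "%.1f kB/s\n", b / 1000 / s
    }'
}

reported_speed 7000 0.02       # 7 kB file, 20 ms real time
reported_speed 4000000 0.02    # 4 MB file, same real time
```

Both transfers took the same 20 ms, yet the displayed rates differ by three orders of magnitude.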
 

Attachments

  • pkg.png (114 KB)
Please take a look at my screenshot. I'm using my own package repository, and while the file transfer speed is nearly 4 GB/s (!) you can see one thing: Different download speeds, and none of it is as fast as expected. Instead: The smaller the package is, the slower it seems. And all of them took 00:01 - one second.
You're likely observing TCP slow-start in action.
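A toy model of slow-start shows why short transfers report low average rates (assumed parameters: initial window of 10 segments of 1448 bytes, window doubling every round-trip, no loss):

```shell
# Bytes a single TCP connection can deliver after n round-trips during
# slow-start. Small files finish before the window grows large, so their
# average rate over the whole transfer stays low.
slowstart_bytes() {    # arg: number of RTTs
    awk -v n="$1" 'BEGIN {
        cwnd = 10                  # initial window, in segments (assumed)
        mss  = 1448                # bytes per segment (assumed)
        total = 0
        for (i = 0; i < n; i++) { total += cwnd * mss; cwnd *= 2 }
        print total
    }'
}

slowstart_bytes 1    # one RTT: 10 segments
slowstart_bytes 5    # five RTTs: 10+20+40+80+160 = 310 segments
```

A 7 KiB package fits in the very first window, so its whole transfer runs at the slowest phase of the connection.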
 
Please take a look at my screenshot. I'm using my own package repository, and while the file transfer speed is nearly 4 GB/s (!) you can see one thing: Different download speeds, and none of it is as fast as expected. Instead: The smaller the package is, the slower it seems. And all of them took 00:01 - one second.

So if you measure the shortest displayable duration of 1 second for a 7 KiB file, the calculated speed will be 7.0 kB/s. If the file size grows, the calculated speed grows too. But in reality the download didn't take one second per package - it was a blink of an eye for all of them.

Nothing will fetch my packages faster; pkg isn't the limit here. It's about interpreting the messages pkg gives you.
Exactly for that reason I measured the times with a stopwatch, and the speeds are what pkg and axel themselves reported; I kept an eye on both, and the reported speeds were real - dividing the file size by the time gives approximately the same speed.
Also, a self-hosted repo is a different thing from a server over the net, but I wouldn't be surprised if axel were even faster than pkg against that repo too (although that's hard to measure on a 4 GB/s network).
 
On low-latency connections, parallel downloading won't do much on an unused pipe.
If you have other downloads in progress, the parallel download will do better (it will get n times its fair share).
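That fair-share effect can be sketched with a crude model (assumption: the bottleneck splits its bandwidth evenly per TCP connection, which is roughly what loss-based congestion control converges to):

```shell
# Your throughput share with n connections of your own competing against
# m foreign connections on a link of capacity c (in KB/s).
share_kbs() {    # args: n_yours, m_others, capacity_kbs
    awk -v n="$1" -v m="$2" -v c="$3" \
        'BEGIN { printf "%.1f\n", c * n / (n + m) }'
}

share_kbs 1 9 1000    # 1 connection among 9 others
share_kbs 8 9 1000    # 8 connections among the same 9 others
```

The extra share your 8 connections grab is exactly what the other 9 users lose - which is the "elbows in stomachs" objection above.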
 
Two generic things to consider:
  • Caching is often relevant, downloading the same file twice is expected to be faster. So for your comparison, it's relevant which one you try first.
  • I don't know the tool you're using, but if this -n option opens multiple connections (as I assume), you're hogging more resources on the mirror, making things worse for other people using the same mirror *). IMHO, something like this should never be supported by pkg.
edit: and btw, neither pkg nor fetch are "slow". I have my own pkg repo on my LAN, with devices connected on different lines (e.g. GBit Ethernet, 100 MBit Ethernet, different Wifis) and downloads always reach speeds between 50% and 100% of what the connection can support, just as expected with a single TCP connection.

---
*) well, unless everyone used such a tool, which would make things slightly worse for everyone: each connection has some overhead doing its own congestion control etc., and of course the server-side limit would be reached sooner.
For the first point: I don't know which caching you mean, but I ran these commands at least 3 times before measuring.
And for the second: yes, -n opens more connections, but even without it axel is faster than fetch, and more reliable.
And for the edit: pkg and fetch are both slow! You tried them on a LAN; pkg being faster on a LAN than pkg over the net doesn't mean pkg is fast. To test whether it's really fast you need to compare it on the same network with its competitors (axel, aria2, wget, and even curl)!
 
It makes more sense to read before commenting. I was explicitly talking about speed relative to what the connection can support.

Using multiple connections, you're that jerk in the line jabbing elbows into other people's stomachs.
 
Perhaps there should be a new category: "It Works Better On Linux Or Windows"

A tonne of threads could be moved there and comfortably ignored.
slower than axel on the same computer
?
*Same computer here means the same computer with the same OS (FreeBSD).
*Also, since it was the first example, it's usually the main one.
 
Code:
[12:25:00] [ns!user]/tmp/dir$time axel -q  -n 8  https://speed.hetzner.de/1GB.bin

real    0m13.882s
user    0m1.285s
sys    0m6.476s
[12:25:24] [ns!user]/tmp/dir$time fetch   https://speed.hetzner.de/1GB.bin
1GB.bin                                               1000 MB   69 MBps    15s

real    0m14.608s
user    0m3.470s
sys    0m3.646s
 
What about `axel -n 1`?
(To be fair in the comparison...)
On an uncontended network path, I would expect parallel connections to give a slight advantage (the larger the file, the less relevant) because multiple connections using slow-start will saturate the path more quickly.

That still doesn't mean it's a good idea. It's, to say the least, impolite towards anyone else using the same shared resources.
 
How are folks actually measuring things? What covacat posted is effectively the time for a command to complete, no? What does that entail - the shell forking/exec'ing the command, libraries loading, the command running, exiting, returning to the shell?
If axel is simply doing "fetch the file from the remote location", is the pkg fetch command doing just a fetch, or does it do more work, like checking the remote repo for updates, fetching the file, and verifying checksums?

I'm only asking because comparisons need to be apples to apples, not apples to kumquats.
I would think using something like wireshark (start the capture before hitting enter) would let one actually time the different pieces of the transaction - say, when the first connection to the repo happens, what gets pulled first, then the time spent actually fetching the files, etc.

I'm also with Zirias on this: setting up multiple connections to the server may be better for you, but has the potential to hurt others using that resource. It's "common knowledge" that you should wait a day or two after quarterly is done or "it's slow the day after a security advisory hits".

Above is all my opinion, based on nothing more than coffee and lack of sleep.
 
Code:
[12:25:00] [ns!user]/tmp/dir$time axel -q  -n 8  https://speed.hetzner.de/1GB.bin

real    0m13.882s
user    0m1.285s
sys    0m6.476s
[12:25:24] [ns!user]/tmp/dir$time fetch   https://speed.hetzner.de/1GB.bin
1GB.bin                                               1000 MB   69 MBps    15s

real    0m14.608s
user    0m3.470s
sys    0m3.646s
Code:
bsd# time axel -n 50 https://speed.hetzner.de/100MB.bin && time fetch https://speed.hetzner.de/100MB.bin
Initializing download: https://speed.hetzner.de/100MB.bin
File size: 100 Megabyte(s) (104857600 bytes)
Opening output file 100MB.bin
Starting download

Connection 1 finished
...
Connection 50 finished


[100%] [...........................................................] [ 587.7KB/s] [00:00]

Downloaded 100 Megabyte(s) in 2:54 minute(s). (587.69 KB/s)
axel -n 50 https://speed.hetzner.de/100MB.bin  1.43s user 0.35s system 1% cpu 2:55.95 total
100MB.bin                                              100 MB   39 kBps 43m02s
fetch https://speed.hetzner.de/100MB.bin  0.44s user 0.35s system 0% cpu 43:04.21 total
bsd#
 
These measurements make no sense at all, leading me to a different idea: Maybe it's just your IPv6 connectivity that sucks. Don't know about "axel", but "fetch" prefers IPv6 if available. Try fetch -4.

BTW, if you still see such a huge difference between using 1 connection or 50, better complain to your ISP. Again, this makes no sense. For comparison, result here with fetch is:
100MB.bin 100 MB 7969 kBps 13s.
 
Dear baaz,
for me it is similar about 5% of the time. Others have reported that in the past, too. One can select a different server by editing the configuration. As far as I remember there is also a tool in the ports which probes different servers. But the quality differs from time to time. Since I do not have to stare at the screen until a download is finished, I have kept everything at the defaults. Most of the time the upgrade performs well.
 
… like the ports that you can change the fetch commands with the FETCH_CMD please add an option in pkg.conf to change it for pkg. …

If you like, make a bug report; <{link removed}>

For pkg: before doing so, please also check that there's not already an issue or pull request (PR) here:

{link removed}

For some other points of discussion, answers might be in this recent topic:


a tool in the ports

– please see page 2.
 