Utility for finding source code for executable in FreeBSD?

Hi!

I've had a brief look at Plan 9 and it looks like quite an interesting operating system. I find some of its utilities very nice, and one of them is src(1), which finds the source code for an executable.

Honestly, I really miss a utility like this in FreeBSD: sometimes I slightly modify the source code in /usr/src to fit my needs, or sometimes I'm just curious how this or that program works and I want to look into its source. Usually, I do something like this to figure it out:
find /usr/src -name "ed"
to find the sources for ed(1), for example. However, this approach doesn't always work, because sometimes the directories (or even files) do not match the name of their executable.

I'm aware that devel/plan9port provides a "src" utility as well, but I'm a bit put off by having to install all the other programs the port provides just to get the one I really need. Moreover, for some reason I feel that this particular program (the idea behind it, finding the sources) does not require "mimicking" another operating system and "stealing" its programs as is.

I haven't found anybody else interested in this, so I want to ask here: is there a utility in FreeBSD (maybe in the base, maybe in the ports) that is similar to Plan 9's src(1) tool? The reason I'm particularly interested in FreeBSD is that it ships with its source code, so I suppose it should be possible to implement such a tool, and in my opinion it would fit right in.

Thank you in advance.

Artem
 
What doesn't work? Any other example? Never had problems with finding the source of world parts. If it's GNU, then it's in contrib, right?
 
Hi MG,

I agree, cases where it's not that easy to find the source are rare. However, it may be hard to find sources for stuff like strip(1). It's at /usr/bin/strip, but the source for it is in /usr/src/contrib/elftoolchain/elfcopy, not in /usr/src/usr.bin. It's still possible to locate it, but the find(1) trick won't work.

So I just thought, maybe there is a tool that can do the job unambiguously?
 
Code:
#!/bin/sh
# Print installed names, links and source dirs for every program Makefile.
TOP=/usr/src
for mf in $(find ${TOP}/bin/ ${TOP}/sbin/ ${TOP}/usr.bin/ ${TOP}/usr.sbin/ \
                 ${TOP}/libexec/ ${TOP}/secure/usr.bin/ ${TOP}/secure/usr.sbin/ \
                 ${TOP}/cddl/*bin \
                 -type f -name Makefile -maxdepth 3)
do
    mfd=$(dirname "$mf")
    # Only Makefiles that define PROG build an executable.
    grep -q PROG= "$mf" &&
        make .CURDIR="$mfd" -V '${BINDIR}/${PROGNAME} ${LINKS} ${.PATH}' -f "$mf"
done
This is a crude approximation. For strip it will output something like this:
/usr/bin/objcopy /usr/bin/objcopy /usr/bin/strip . /usr/src/usr.bin/objcopy /usr/src/contrib/elftoolchain/elfcopy
where the entries before "." are the command names (aliases) and the entries after it are the source directories.
 
Unfortunately, it doesn't always work as wanted.

Code:
% whereis -s cc
cc: /usr/ports/devel/py-pyperscan/files/cc
% whereis -s clang
clang: /usr/src/usr.bin/clang

In the example above, only the latter works as expected.
The desired behavior for the former would be to show the source of the hard link target (which would be clang). This would require tracking the Makefile* files.
 
Code:
#!/bin/sh
TOP=/usr/src
for mf in $(find ${TOP}/bin/ ${TOP}/sbin/ ${TOP}/usr.bin/ ${TOP}/usr.sbin/ \
                 ${TOP}/libexec/ ${TOP}/secure/usr.bin/ ${TOP}/secure/usr.sbin/ \
                 ${TOP}/cddl/*bin \
                 -type f -name Makefile -maxdepth 3)
do
mfd=$(dirname $mf)
grep -q PROG= $mf && make .CURDIR=$mfd  -V '${BINDIR}/${PROGNAME} ${LINKS} ${.PATH}' -f $mf
done
Looks cool, thank you!

I think I will extend your script a bit: its output could be dumped into, say, a /usr/src/src.index file (or maybe somebody will suggest a better location), and then another script could grep(1) it for the targets we're interested in.
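
The index idea could be sketched roughly as follows. This is only an illustration: it assumes each index line has the shape shown earlier for strip (installed names, then ".", then source directories), and the function name src_lookup and the awk-based matching are made up for the example rather than taken from any script in this thread.

```sh
#!/bin/sh
# Hypothetical helper: src_lookup INDEX COMMAND
# Prints the source directories recorded for COMMAND in INDEX.
# Assumed line format: <installed paths...> . <source dirs...>
src_lookup() {
    awk -v cmd="$2" '
    {
        hit = 0; sep = 0
        for (i = 1; i <= NF; i++) {
            if ($i == ".") { sep = i; break }   # "." separates names from dirs
            n = split($i, parts, "/")
            if (parts[n] == cmd) hit = 1        # compare the basename only
        }
        if (hit && sep)
            for (i = sep + 1; i <= NF; i++) print $i
    }' "$1"
}
```

With the scanning script's output saved to /usr/src/src.index, running src_lookup /usr/src/src.index strip would print the directories listed after "." on the strip line.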
 
Code:
% whereis -s cc
cc: /usr/ports/devel/py-pyperscan/files/cc
% whereis -s clang
clang: /usr/src/usr.bin/clang

In the example above, only the latter works as expected.
Not a complete solution to the OP's problem, but in cases with hard links whereis(1) may still be useful:
Code:
$ whereis -qsx $(find /usr/bin -samefile /usr/bin/cc)
/usr/src/usr.bin/clang
For strip(1), however, you have to look into the Makefile anyway:
Code:
$ whereis -qsx $(find /usr/bin -samefile /usr/bin/strip)
/usr/src/usr.bin/objcopy
 
Hi MG,

I agree, cases where it's not that easy to find the source are rare. However, it may be hard to find sources for stuff like strip(1). It's at /usr/bin/strip, but the source for it is in /usr/src/contrib/elftoolchain/elfcopy, not in /usr/src/usr.bin. It's still possible to locate it, but the find(1) trick won't work.

So I just thought, maybe there is a tool that can do the job unambiguously?
Never noticed. Not even a file with the name "strip" exists but the Makefile in /usr/src/contrib/elftoolchain/elfcopy refers to it:
Code:
LINKS=  ${BINDIR}/elfcopy ${BINDIR}/mcs         \
        ${BINDIR}/elfcopy ${BINDIR}/objcopy     \
        ${BINDIR}/elfcopy ${BINDIR}/strip

EXTRA_TARGETS=  mcs strip objcopy

Not sure if there's a standard for it. I think these references should somehow be indexed to easily find back the source that the executables are made from. It looks like this is the only mention.
In my opinion, this should already exist. Knowing the origin of all system files is relevant, with or without installed source
 
I think these references should somehow be indexed to easily find back the source that the executables are made from.
There seems to be no (complete) standard for it.
Some are in the form you mentioned for strip, but some have only a Makefile in /usr/src/usr.bin/, like xz.
Just in my humble opinion, every (non-builtin) command in base should have a directory that maps 1:1 to the installed command (like xz), and for something like strip / objcopy / mcs (one binary hardlinked under multiple names) that directory should contain a readme.txt (or readme.md) describing which directory to look into for the actual Makefile, and whereis should be taught to understand it.
 
I'd suggest searching the Makefiles with a recursive grep:

Bash:
grep -r -w --include 'Makefile' 'strip' .

The regexp could be refined into a more complex and more precise pattern.
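
One possible refinement, as a rough sketch: only report Makefiles where the name appears as a PROG value or as the last component of a path (as in a LINKS entry such as ${BINDIR}/strip). The function name find_makefiles and the pattern are illustrative assumptions, the name is assumed to contain no regexp metacharacters, and the match is a heuristic that can still produce false hits.

```sh
#!/bin/sh
# Hypothetical helper: find_makefiles NAME [TOP]
# Lists Makefile* files under TOP (default /usr/src) that mention NAME
# either as "PROG= NAME" or as the final component of a path.
find_makefiles() {
    grep -rlE --include='Makefile*' \
        "(^PROG(_CXX)?[[:space:]]*[:?+]?=[[:space:]]*|/)$1([[:space:]]|\$)" \
        "${2:-/usr/src}"
}
```

For example, find_makefiles strip /usr/src/contrib/elftoolchain restricts the search to one subtree.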
 
Instead of looking in src you might have a quicker way by searching for the executable in /usr/obj

Good idea, but there are cases where that won't work. Some binaries are produced at the installation phase (I suppose), and they do not exist in /usr/obj. For example, /usr/bin/penv
 
Here's an idea, but it is big, ugly and complicated: any time "the compiler" (which includes all languages, plus the linker) processes an input file, it adds the following information to its output file: path, mtime and cksum of what it read, both for the main input file (like test.c) and for all include files it read. If it finds this information already present in a file it reads, it copies it to the output. Doing this requires nearly zero CPU time: compared to what is needed to parse and process the input, computing the cksum is super easy. This information is kept in some debug string in the object file format, and as a debug string in the resulting ELF executable.

The amount of extra data is not huge: say an executable is ultimately made from 100 source files (including all the includes) and the data fits into 100 bytes per source file; that adds 10K. Given that unstripped executables are typically many dozens to low hundreds of kilobytes, that is not a big deal. If it is too much space, it goes away when the strip command is used. As all the tools such as ar that manage these files already leave debug strings in place, the modification is really just to the compilers and the linker.

For some of my programs at home, I already do some of this manually (it's missing the checksum, but has mtime and mercurial commit ID):
Code:
Versions:
  eqctl:        a985fbf632ad (2024/06/18 16:35:00)
  eqmon.meas:   08ed7f1b1d7f (2024/09/01 19:31:41)
  eqmon_daemon: 08ed7f1b1d7f (2024/09/01 19:31:41)
At work, for a while I was pushing that any data artifact created by our group should have "provenance" data embedded in it, with the provenance concatenated when further processing it. I pretty much failed with that, because it turns out that SQL databases are not a convenient format for this.
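
To make the "path, mtime and cksum" record from this post concrete, here is a minimal sketch of what a build step might compute per source file. It is not how any existing compiler behaves; the function name provenance and the one-line-per-file format are assumptions for illustration, and actually embedding the lines in an object file is left out.

```sh
#!/bin/sh
# Hypothetical helper: provenance FILE...
# Prints one line per file: <path> <mtime in epoch seconds> <cksum CRC>
provenance() {
    for f in "$@"; do
        # cksum(1) on stdin prints "<crc> <size>"; keep only the CRC.
        sum=$(cksum < "$f" | awk '{print $1}')
        # Try GNU stat syntax first, then fall back to BSD stat syntax.
        mtime=$(stat -c %Y "$f" 2>/dev/null || stat -f %m "$f")
        printf '%s %s %s\n' "$f" "$mtime" "$sum"
    done
}
```

Running provenance test.c on the sources of a program would yield the kind of data that could then be stored as a debug string.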
 
Hi all again!

If anybody is interested in my solution - I've written a shell script that does the thing. I especially want to say thank you to covacat - I built my script on top of the one he suggested (a few posts above).

I've posted it in the thread for useful scripts, if you're interested in it.

Now, some further thoughts on the topic: Plan9's src(1) has a feature to find source code even for a particular symbol. It uses their db(1) debugger for this. So I'm thinking: in this case, does it mean that all the binaries have to be compiled with some debugging symbols/information for it to work? Of course, it's a cool feature, but I personally would prefer my binaries to be stripped rather than filled up with debugging information.
 
If anybody is interested in my solution - I've written a shell script that does the thing.
Thank you.

Now, some further thoughts on the topic: Plan9's src(1) has a feature to find source code even for a particular symbol. It uses their db(1) debugger for this.
Exactly the same idea I proposed above. I may have subconsciously stolen it from Plan 9.

... but I personally would prefer my binaries to be stripped rather than filled up with debugging information.
Why? Does it make a measurable performance difference? Given that most executables use vanishingly little CPU time, I doubt it has a real-world effect. And even if it does, isn't it worth it to be able to find the real source code used?
 
Code:
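# Assumed input: a DWARF dump such as "readelf --debug-dump=info" output.
# Prints each distinct directory that holds a compile unit's source file.
{
 # A new DIE starts here; remember whether it is a compile unit.
 if ($2 == "Abbrev") {
  if ($5 == "(DW_TAG_compile_unit)") on = 1; else on = 0;
  next;
 }
 if (!on) next;
 if ($2 == "DW_AT_name") {
  src = $NF;
  sub("/[^/]*$", "", src);                  # strip the file name
  if (match(src, "/usr/src/lib/")) next;    # skip library sources
  if (dirs[src]) next;                      # already printed
  dirs[src] = 1;
  print src
 }
}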
 