SCO's evidence of copying between Linux and UnixWare
by Greg Lehey
Last updated: $Date: 2003/06/22 02:16:37 $
Note: The opinions expressed here are my own and have no relationship with the
opinions or official viewpoints of any organization with which I am associated.
In early June 2003, SCO showed
code samples to non-programmers who confirmed that there were large passages, in one case
80 lines, of identical code, including comments. SCO did not show the revision logs for the
code, nor did they reveal where in the kernel it was supposed to be. Both of these facts
strengthen the viewpoint that SCO has something to hide.
There's probably no single stretch of 80 lines of code anywhere in the UNIX or Linux kernels
that was all written at one time. There will be modifications. Running cvs annotate on FreeBSD
kernel code shows stretches of up to 30 lines here and there in newer code that were committed
at the same time. Older code, including code imported from 4.4BSD, seldom shows more
than 10 consecutive lines from the same commit. All of this code has remained functional through
those revisions. Therefore, when looking at the revision history:
- If all 80 lines show up at once, that suggests the code wasn't written there, but was
imported from somewhere else.
- If the 80 lines show a typical pattern of having been updated over a period of time, it
should be possible to at least compile each intervening revision.
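The first test above can be checked mechanically. The following Python sketch is only an illustration, not a tool used by anybody involved in this dispute. It assumes the usual `cvs annotate` line layout (revision number, then the author and date in parentheses, then a colon and the source text) and reports the longest run of consecutive lines whose last change came from a single revision. The function name `longest_same_revision_run` and the sample input are hypothetical.

```python
import re

# Assumed layout of a `cvs annotate` output line:
#   "1.3          (grog     22-Jun-03): source text"
# i.e. revision, whitespace, "(author date):", then the source line.
ANNOTATE_LINE = re.compile(r"^(\d+(?:\.\d+)+)\s+\([^)]*\):")


def longest_same_revision_run(annotate_output: str) -> tuple[str, int, int]:
    """Return (revision, start_line, length) of the longest stretch of
    consecutive source lines attributed to a single revision.

    start_line is a 1-based position among the annotated lines; lines
    that do not look like annotations (headers, blank lines) are skipped.
    Returns ("", 0, 0) if no annotated lines are found."""
    revisions = []
    for line in annotate_output.splitlines():
        match = ANNOTATE_LINE.match(line)
        if match:
            revisions.append(match.group(1))

    best = ("", 0, 0)
    run_start = 0
    for i in range(1, len(revisions) + 1):
        # A run ends at the last line or where the revision changes.
        if i == len(revisions) or revisions[i] != revisions[run_start]:
            length = i - run_start
            if length > best[2]:
                best = (revisions[run_start], run_start + 1, length)
            run_start = i
    return best


sample = """\
1.1          (grog     01-Jan-03): int
1.1          (grog     01-Jan-03): main(void)
1.2          (grog     05-Feb-03): {
1.2          (grog     05-Feb-03):     return 0;
1.2          (grog     05-Feb-03): }
"""
print(longest_same_revision_run(sample))  # ('1.2', 3, 3)
```

Because cvs annotate attributes each line to the revision that last changed it, a long run only shows that those lines were last touched together; it is a reason to look more closely at the history, not proof of where the code came from.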
It's unfortunate that Linux wasn't kept under revision control until a year or two ago. But
in this case there are so many versions out there, burnt on CD-ROM, that the history of the
code in Linux should not be difficult to prove more reliably than SCO can prove their case.
Programmers see the evidence
Since I first wrote this text, SCO has shown snippets of code to programmers, ostensibly under
a draconian non-disclosure agreement. Unfortunately, one lawyer in Germany forgot to get the
signature on the NDA, so the most interesting
information comes from Stefan Hildemann. There's also an English
translation of his page, but I haven't checked it for accuracy except for the passages below,
which I have quoted (with minor corrections) from the translation.
Under the supervision of a notary public, 46 pages were shown, each containing code from
Linux on one half (for the most part, print-outs of posts taken directly from the
Linux Kernel Mailing List) and SCO listings on the other half. There's no way of knowing
whether these are indeed System V sources, since they are taken out of context. Another
interesting thing is that all date and time details have been removed from both, even from
the comments.
It's funny that they quote code from the Linux kernel mailing list rather than from the kernel
sources. That's really no evidence at all. The lack of dates also looks very suspicious.
The crunch, however, is a function of the scheduler which, over a length of about 60
lines, is indeed identical except for slight differences. This section also contains a whole
lot of matching comments.
This is very interesting. In February
2002 I listened to a talk by Rusty Russell about kernel hacking. The example he used was
the Linux scheduler, and if I had been asked for an example of the biggest differences between
UNIX and Linux, I might well have chosen that. It looks strange, though Rusty has given
some good reasons why it looks that way. If that code is in UnixWare, I would expect
it to come from Linux.
Ian Lance Taylor's visit
Ian's report
is less conclusive from a technical point of view, since he had to sign an NDA. He was also
given less code to look at, but he too noted significant similarities in the code he examined.
He was not allowed to identify it, though he did say that it was not central to the
system. That definitely rules out the scheduler, which is arguably the most central part of
the entire system. The other possibility that Stefan mentions is the memory allocation
routines, which are pretty central too. The one thing both of these code segments have in
common is that, precisely because they're so central, they don't affect much other code and
can thus easily be changed.
So, did it happen?
The real question is: is the evidence valid? There are a number of possibilities:
- Yes, some unscrupulous programmer did take UnixWare code and put it into Linux. It's
possible, but given the central nature of the scheduler, I consider it extremely
unlikely. The mail exchange on the LKML also suggests that the information SCO has was
taken from the mailing list rather than from the kernel. Ian confirms that the code is in the
kernel as well, possibly in slightly modified form, but I find it hard to believe that SCO
found it there. Otherwise they would simply have quoted the kernel source rather than
searching for the corresponding messages on the LKML, where they might not even have found
them.
- Yes, some unscrupulous programmer did take Linux code and put it into UnixWare. This
seems much more probable: the best place to get it would be the LKML, where they would
find not just an easily digestible piece of code, but also a discussion of why it should
be used. Both are missing from the Linux kernel sources.
- The entire code came to both systems from a third source, such as BSD. I don't really see
anything that points in this direction. I've already mentioned that the BSD and Linux
schedulers look very different.
- SCO has planted evidence. There's no evidence for this either, but their behaviour looks
very suspicious. They have specifically refused to show the revision history.
I'm not a clairvoyant. I'll sit back and watch the fun.
$Id: evidence.html,v 1.3 2003/06/22 02:16:37 grog Exp $
