SCO's “evidence” of copying between Linux and UnixWare

by Greg Lehey
Last updated: $Date: 2013/06/18 06:25:57 $

Note: The opinions expressed here are my own and have no relationship with the opinions or official viewpoints of any organization with which I am associated.


I wrote this shortly after the report, but after careful examination of the evidence. I then continued my examinations and discovered that I was wrong.

I am now convinced that the code in question was, indeed, derived from UNIX System V.4, and not an earlier version of UNIX, as some other people have claimed. This does not mean that it was stolen. Nevertheless, this page may still be of interest, so I'm not deleting it. Here's the current analysis.

On 20 August 2003, Heise Verlag in Germany published an update to a report with the (translated) unemotional and objective title “SCO threatens to kill Open Source”. It's the only thing I've seen so far that refers to a presentation of textual similarities between UnixWare and Linux. I don't have time to translate the whole thing, but here's the important bit:

Assisted by his vice president, Chris Sontag, McBride showed examples from the code of Linux 2.5 and 2.6 which should prove that source code has been taken out of Unix without change–an example shown by SCO shows code commentaries ... Identical typing mistakes in the commentaries and unusual formulations had left traitorous traces, claimed Sontag. To prove this, McBride had hired a team for pattern recognition to hunt through tens of thousands of lines of code. The few code sequences near the comments were made illegible to protect SCO's copyrights.
This is stupid. There's no code in common, just a comment which, admittedly, looks to be the same. I also don't see any “typing mistakes”. But where does it come from? On the SCO side, it includes another line (“The swap map unit is 512 bytes”). Maybe this is not correct for Linux. But people who copy comments so literally don't remove things just because they're wrong. They haven't fixed the broken indentation, for example, assuming that this is really broken indentation in the code and not a badly prepared slide; there's every possibility that it's the latter.

But if two comments are the same except for an addition, which is the original? The one without the addition, obviously. I see this as an indication that the code was copied from Linux to UnixWare.

In addition, the alleged code sequences near the comments which were made illegible to protect SCO's copyrights are really additional commentary. You don't have to be a C programmer to recognize that comments start with /* and end with */, and that people frequently put a single * in multiline comments for stylistic reasons, something that the person who put together this slide obviously didn't consider important.
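To make the point about comment syntax concrete, here is a minimal sketch in C. The function name looks_like_comment_line is my own invention for illustration, and the check is only a rough heuristic (a statement such as *p = 0; would fool it): it treats a line as commentary if, after leading whitespace, it opens with /* or begins with a single *, which also covers the closing */ line.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

// Hypothetical helper, not taken from any kernel source.
// Returns true if `line` looks like part of a C block comment:
// it opens the comment, or it is a continuation or closing line
// that starts with a single '*'.
bool looks_like_comment_line(const char *line)
{
    assert(line != NULL);

    // Skip leading whitespace; comment bodies are usually indented.
    while (*line != '\0' && isspace((unsigned char)*line))
        line++;

    // Opening delimiter of a block comment.
    if (strncmp(line, "/*", 2) == 0)
        return true;

    // Stylistic leading '*' on a continuation line; this also
    // matches the closing delimiter when it sits on its own line.
    return line[0] == '*';
}
```

Run over the lines of a slide, a check like this flags the whole span between the opening and closing delimiters as commentary, even when the text in between has been made unreadable.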

The comment is in English written in approximate Greek letters and reads:

As part of the kernel evolution toward modular naming, the functions malloc and free are being renamed to rmalloc and rfree. Compatibility will be maintained by the following assembler code: (also see free/rfree below).
This comment is completely irrelevant to the Linux code to which the first half of the comment has been applied. The Linux version of both of these examples comes from the file arch/ia64/sn/io/ate_utils.c. There are a number of interesting things to note about this file:

Further down in the Heise report, you can read:

In total, SCO's testers claim to have found more than 800,000 lines of duplicate code–an example from SCO
OK, let's look at this example. In fact, it's a continuation of the previous example, the function atealloc in arch/ia64/sn/io/ate_utils.c. There are a number of things to note about it:

In summary, I think that this “proof” in fact proves what I have been claiming for some time:


$Id: wrong-code-comparison.php,v 1.6 2013/06/18 06:25:57 grog Exp $
