SCO's “evidence” of copying between Linux and UnixWare

by Greg Lehey
Last updated: $Date: 2013/06/18 06:25:57 $

Note: The opinions expressed here are my own and have no relationship with the opinions or official viewpoints of any organization with which I am associated.

IMPORTANT

I wrote this shortly after the report, but after careful examination of the evidence. I then continued my examinations and discovered that I was wrong.

I am now convinced that the code in question was, indeed, derived from UNIX System V.4, and not an earlier version of UNIX, as some other people have claimed. This does not mean that it was stolen. Nevertheless, this page may still be of interest, so I'm not deleting it. Here's the current analysis.

On 20 August 2003, Heise Verlag in Germany published an update to a report with the (translated) unemotional and objective title “SCO threatens to kill Open Source”. It's the only thing I've seen so far that refers to a presentation of textual similarities between UnixWare and Linux. I don't have time to translate the whole thing, but here's the important bit:

Assisted by his vice president, Chris Sontag, McBride showed examples from the code of Linux 2.5 and 2.6 which should prove that source code has been taken out of Unix without change–an example shown by SCO shows code commentaries ... Identical typing mistakes in the commentaries and unusual formulations had left traitorous traces, claimed Sontag. To prove this, McBride had hired a team for pattern recognition to hunt through tens of thousands of lines of code. The few code sequences near the comments were made illegible to protect SCO's copyrights.

This is stupid. There's no code in common, just a comment which, admittedly, looks to be the same. I also don't see any “typing mistakes”. But where does it come from? On the SCO side, it includes another line (“The swap map unit is 512 bytes”). Maybe this is not correct for Linux. But people who copy comments so literally don't remove things just because they're wrong. They haven't fixed the broken indentation, for example, assuming that this really is broken indentation in the code and not a badly prepared slide; there's every possibility that it's the latter.

But if two comments are the same except for an addition, which is the original? The one without the addition, obviously. I see this as an indication that the code was copied from Linux to UnixWare.

In addition, the alleged code sequences near the comments which were made illegible to protect SCO's copyrights are really additional commentary. You don't have to be a C programmer to recognize that comments start with /* and end with */, and that people frequently put a single * in multiline comments for stylistic reasons, something that the person who put together this slide obviously didn't consider important.
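To make the point about comment syntax concrete, here is a minimal sketch in C of how one might classify the lines of such a block. The helper names (skip_blanks, opens_comment, closes_comment, is_continuation) are hypothetical and invented for this illustration; they are not taken from either code base, and the checks are deliberately naive (they ignore string literals, for instance).

```c
#include <stdbool.h>
#include <string.h>

/* Skip leading spaces and tabs (hypothetical helper). */
static const char *skip_blanks(const char *s)
{
    while (*s == ' ' || *s == '\t')
        s++;
    return s;
}

/* True if the first non-blank characters on the line open a comment. */
bool opens_comment(const char *line)
{
    return strncmp(skip_blanks(line), "/*", 2) == 0;
}

/* True if the comment terminator appears anywhere on the line. */
bool closes_comment(const char *line)
{
    return strstr(line, "*/") != NULL;
}

/*
 * True if the line looks like the middle of a multiline comment:
 * its first non-blank character is a single '*' that is not
 * immediately followed by '/'.
 */
bool is_continuation(const char *line)
{
    const char *s = skip_blanks(line);
    return s[0] == '*' && s[1] != '/';
}
```

Under these assumptions, a line beginning with " * " in the middle of a block is classified as comment text, not code, which is all that is needed to see that the obscured lines on the slide are commentary.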

The comment is in English written in approximate Greek letters and reads:

As part of the kernel evolution toward modular naming, the functions malloc and free are being renamed to rmalloc and rfree. Compatibility will be maintained by by the following assembler code: (also see free/rfree below).

This comment is completely irrelevant to the Linux code to which the first half of the comment has been applied. The Linux version of both of these examples comes from the file arch/ia64/sn/io/ate_utils.c. There are a number of interesting things to note about this file:

It's copyrighted by Silicon Graphics. Nobody has mentioned them so far. But they have a UNIX source license for IRIX. And look at exactly what it says:
```
/* $Id: ate_utils.c,v 1.1 2002/02/28 17:31:25 marcelo Exp $
 *
 * This file is subject to the terms and conditions of the GNU General Public
 * License.  See the file "COPYING" in the main directory of this archive
 * for more details.
 *
 * Copyright (C) 1992 - 1997, 2000-2002 Silicon Graphics, Inc. All rights reserved.
 */
```
This is not new code, but the RCS identifier (the string starting with $Id: in the first line of the excerpt) shows that it was incorporated by marcelo on 28 February 2002. marcelo is Marcelo W. Tosatti, the main person active in developing the Linux virtual memory system.
The code is related to the Intel ia64 architecture.
There is no reference to the functions malloc or free in this file, only to the Linux equivalents kmalloc and kfree. Neither is referenced in this function, so the comment would be completely inappropriate.

Further down in the Heise report, you can read:

In total, SCO's testers claim to have found more than 800,000 lines of duplicate code–an example from SCO

OK, let's look at this example. In fact, it's a continuation of the previous example, the function atealloc in arch/ia64/sn/io/ate_utils.c. There are a number of things to note about it:

Since it doesn't use malloc or free, it's clearly not the same function as the one that SCO claims it was copied from.
It uses Linux-specific SMP functions like mutex_spinlock. This wouldn't be so strange if it had been a stolen algorithm. For example, if somebody took System V code which uses malloc or rmalloc and replaced it with a call to the Linux-specific call kmalloc, I'd still consider that stolen code. The point here is that SCO are claiming that the entire code, including the call to mutex_spinlock, is stolen.
My books about UnixWare SMP locking suggest that in this kind of situation, the lock call would be LOCK, not mutex_spinlock. It returns a value of type pl_t, whereas mutex_spinlock returns a value of type int, and it takes two parameters. This information has been available for years in Vahalia, UNIX Internals: The New Frontiers (Prentice-Hall, 1996), page 213. If SCO now has functions like mutex_spinlock in the kernel, it would appear that they have thrown out their own SMP implementation and incorporated the Linux version.
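The difference in the shape of the two lock calls can be sketched in C as follows. This is only an illustration under stated assumptions: the typedefs and function bodies are hypothetical stubs, not the real UnixWare or Linux headers. The only facts taken from the text above are that LOCK takes two parameters and returns a pl_t, and that mutex_spinlock returns an int; the parameter types, and the single parameter given to mutex_spinlock, are assumptions.

```c
/* Hypothetical stand-ins for illustration; NOT the real kernel headers. */
typedef int pl_t;                     /* assumed representation of a priority level */
typedef struct { int held; } lock_t;  /* assumed lock representation */

/*
 * Shape described for UnixWare: two parameters, returns pl_t.
 * Stub behaviour: mark the lock as held and return an assumed
 * previous priority level of 0.
 */
pl_t LOCK(lock_t *lockp, pl_t pl)
{
    (void)pl;            /* the stub ignores the requested level */
    lockp->held = 1;
    return 0;
}

/*
 * Shape described for the Linux file: returns int.
 * The single lock_t * parameter is an assumption of this sketch.
 */
int mutex_spinlock(lock_t *lockp)
{
    lockp->held = 1;
    return 0;
}
```

Even in this stripped-down form, a call site written for one of these functions would not compile unchanged against the other, which is why a verbatim copy containing mutex_spinlock is hard to reconcile with a kernel that uses LOCK.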

In summary, I think that this “proof” in fact proves what I have been claiming for some time:

SCO is wrong.
SCO don't have enough understanding of the case to be able to lie convincingly. Indeed, they have so little understanding that it's difficult to accuse them of lying.
If there is any common code in UnixWare and Linux, this “evidence” tends to suggest that it came from Linux.


$Id: wrong-code-comparison.php,v 1.6 2013/06/18 06:25:57 grog Exp $