The C programming language

From: Sean 'Captain Napalm' Conner <spc_at_armigeron.com>
Date: Sat Mar 11 15:14:35 2000

It was thus said that the Great Hans Franke once stated:
>
> I started it, so now a big answer...
>
> > > I wouldn't consider C as anything 'grown'. maybe evolved in the
> > > sense of degeneration.
>
> > Hey, Hans, I don't get this. C is the most versatile, flexible, and portable
> > language ever devised. It permits complete control of hardware while at the
> > same time allowing elegance in program design and structure.
>
> Now, portable - I assume you talk about the portability of
> the compiler and not the programms (I did spend more than
> just a few hours to convert C programms between different
> platforms and compilers).

  I do have to wonder what other programmers mean by ``portability.'' C is
portable, in both the compiler and programs, but it comes at some cost.
Portability of programs is possible but most programmers don't have the
skill or knowledge to do it right. Mainly, it comes down to knowing the
language; what is allowed, and what isn't. For reference, I developed a
metasearch engine in C under Unix (Linux and IRIX initially). I paid
careful attention to portability. As a result, the port to Windows
required only one line of code to be changed.

  But I've been programming under multiple platforms in C for 10 years now.
I've learned a thing or two about writing portable code. It's surprising
how little effort it actually takes to write portable code, but it does take
a different mindset, one that most programmers (in my experience) don't have.

> No, don't tell me that's just optimisation. It isn't. C unlike
> almost all other HLLs implies a specific CPU Model.

  Quite true. And there are optimizations that can't be performed because
of the language specification.

> If the CPU
> is not build acording to this model all code will suffer - even
> if the CPU offers all stuff needed to do the task. Lets take
> (the often used - but realy simple) example of string handling.
> Let's assume you want to move a 10 byte string. C strings are
> by definition byte orientated and terminated by a byte containing
> the NIL code (x'00').

  Careful, strings in C are *character* based (what's this ``byte'' you keep
talking about? 8-) This is one area where programmers don't quite grasp
portability issues. While it's true that characters in C must be at least 8
bits in size, that doesn't mean they *must* be 8 bits in size; an
implementation of C that uses Unicode natively could set the size of a
character to 16 bits. There is also the issue of signedness: whether a plain
char is signed or unsigned is implementation-defined.
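
  A minimal sketch of how code can ask the implementation instead of
assuming, using CHAR_BIT and CHAR_MIN from <limits.h>. The function names
are invented for illustration; only the macros and isalpha() come from the
standard library.

```c
#include <limits.h>
#include <ctype.h>
#include <stddef.h>

/* Width of a char in bits; the standard only guarantees at least 8. */
unsigned bits_per_char(void)
{
  return CHAR_BIT;
}

/* Nonzero when plain char is signed on this implementation. */
int plain_char_is_signed(void)
{
  return CHAR_MIN < 0;
}

/* Count alphabetic characters without depending on char signedness:
   the <ctype.h> functions expect a value representable as unsigned
   char (or EOF), so convert before calling. */
size_t count_alpha(const char *s)
{
  size_t n = 0;

  for ( ; *s != '\0' ; s++)
    if (isalpha((unsigned char)*s))
      n++;
  return n;
}
```

  The cast in count_alpha() is the habit that matters: it behaves the same
whether plain char turns out signed or unsigned.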

> Instead of
> a single MVC to move the 10 bytes, these 5 instructions have
> to be executed 10 times - resulting in 50 instructions instead
> of one - do you consider this as a good portable ? Even
> a UCSD P-Code machine can do better (and is an example of
> a even faster way to port a language).

  That's because Pascal uses counted strings---the first character of a
string is the count. The problem there is that strings are limited in size to
UCHAR_MAX. Another problem is getting that count into a usable format. The
Intel 386 class takes a penalty if you execute 16-bit instructions in a
32-bit segment (or vice-versa). So, to move a counted string, you have:

        ; assume ESI and EDI point to src and dest respectively
                lodsb           ; $AC     AL = count byte
                cbw             ; $66 98  16-bit op in a 32-bit segment
                cwde            ; $98     both sign-extend, so counts > 127 go negative
                mov ecx,eax     ; $89 C1
                rep movsb       ; $F3 A4

  Well, there is:

                movzx ecx,byte ptr [esi]
                rep movsb

  But that MOVZX isn't cheap---speed or sizewise. And besides, you want the
copy to take place as fast as possible, which means word based moves if
possible, which means more instructions to determine if the source is word
aligned, destination is word aligned, how many words to move, move any
remainder, etc etc (in fact, on the Pentiums, it appears to be faster to use
floating point loads/stores to move memory than register loads/stores---64
bit word transfers vs. 32 bit transfers).
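
  For comparison, here is a sketch of the same counted-string copy in C. The
layout (count in the first character, text after it) and the pstr_* names
are assumptions made for illustration, not any particular Pascal compiler's
format. The alignment and word-size decisions are left to memcpy(), which
the library or compiler can implement however suits the target.

```c
#include <limits.h>
#include <string.h>

/* Hypothetical counted-string layout: s[0] holds the length and the
   characters follow, so the length can never exceed UCHAR_MAX. */

/* Copy a counted string, count included, in one block move. */
void pstr_copy(unsigned char *dest, const unsigned char *src)
{
  memcpy(dest, src, (size_t)src[0] + 1);
}

/* Build a counted string from a NUL-terminated one.  Returns 0 on
   success, or -1 if the text is too long for the count to hold. */
int pstr_from_cstr(unsigned char *dest, const char *src)
{
  size_t len = strlen(src);

  if (len > UCHAR_MAX)
    return -1;
  dest[0] = (unsigned char)len;
  memcpy(dest + 1, src, len);
  return 0;
}
```

  The destination must have room for UCHAR_MAX + 1 characters in the worst
case; the sketch leaves that check to the caller.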

  The ANSI C spec states that the Standard C functions can be understood by
the compiler and treated specially. At least in the 386 line, most str*()
and mem*() functions compile to inline code and avoid the function call
overhead (a friend of mine actually triggered a bug in GCC using nested
strcpy() calls).

> Controll of hardware ? My memory may be fading, just I can not
> see any reference to hardware controll in my K&R copy. All
> hardware dependant stuff is proprietary to the compiler you
> are using. And that's the same way as for example in PASCAL

  True---but it depends upon how the hardware is hooked up to the CPU---is
it memory mapped I/O or I/O mapped I/O? If the former, you just declare a
pointer to the memory location (mapped to the appropriate size) and go. If
it's I/O mapped I/O there is probably a wrapper function that the compiler
knows about and can inline.
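
  A sketch of the memory mapped case, assuming a one-character-wide device
register. The reg_* names are made up for illustration, and the functions
take the register's address as a parameter so they can be tried against an
ordinary variable; on real hardware the pointer would be set from whatever
address the board documentation gives.

```c
/* Hypothetical helpers for a memory mapped register.  The volatile
   qualifier tells the compiler that every read and write matters, so
   it may not cache the value or drop an access. */

/* Turn on the bits in mask. */
void reg_set_bits(volatile unsigned char *reg, unsigned char mask)
{
  *reg = (unsigned char)(*reg | mask);
}

/* Turn off the bits in mask. */
void reg_clear_bits(volatile unsigned char *reg, unsigned char mask)
{
  *reg = (unsigned char)(*reg & ~(unsigned)mask);
}

/* Nonzero if any bit in mask is currently set. */
int reg_test_bits(volatile unsigned char *reg, unsigned char mask)
{
  return (*reg & mask) != 0;
}
```

  Everything here is plain C; the only compiler-specific part left is how
the address gets into the pointer in the first place.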

  But C (the actual language) never defined built-in I/O statements, leaving
I/O to library functions. WRITELN is a language element of Pascal,
but printf() is just a function. Depending upon your view, that is either a
good thing or a bad thing (I think the lack of I/O statements in C is an
elegant solution myself).
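
  One consequence of I/O being ordinary functions is that an output routine
can be passed around and swapped like any other function. A small sketch,
in which write_line(), counting_put() and counted() are invented names and
only fputs() is the real library call:

```c
#include <stdio.h>
#include <string.h>

/* Any function with fputs()'s shape can serve as the output routine. */
typedef int (*put_fn)(const char *s, FILE *fp);

/* Write text followed by a newline through whichever routine is given.
   Returns 0 on success, EOF on failure. */
int write_line(put_fn put, FILE *fp, const char *text)
{
  if (put(text, fp) == EOF)
    return EOF;
  return (put("\n", fp) == EOF) ? EOF : 0;
}

/* A stand-in output routine that counts characters instead of
   writing them; fp is ignored. */
static size_t g_count;

int counting_put(const char *s, FILE *fp)
{
  (void)fp;
  g_count += strlen(s);
  return 0;
}

/* Total characters seen by counting_put() so far. */
size_t counted(void)
{
  return g_count;
}
```

  Calling write_line(fputs, stdout, "hello") prints the line, while
write_line(counting_put, NULL, "hello") only tallies it. Nothing comparable
can be done with a statement that is wired into the language.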

> Things like messing up the whole programm by one wrong ; or }
> (something impossible on Assembly) or easyly produce memory
> leaks (hard to do on other HLL).

  Depends upon what you're used to. Pascal uses those pesky semicolons as
well, along with those annoying BEGIN and END statements. Assembly, on the
other hand, is fairly structured and tends to avoid the cascade of errors
that compilers are prone to (although Microsoft's MASM is also prone to
cascading errors).

> First a language should help me writing a program - thats
> I 'just' have to make my idea clear and the compiler has to
> generate code - best possible code of course. To reach this
> goal I need rather sophisticated construct and not only the
> most basic ones - this also goes with the good code - if a
> language like C offers onla the most basic operations it is
> (almost) impossible for the compiler to guess what I'm about
> to do and optimize the code accordingly.

  It also depends upon the library of code you call. Over the past few
years I've been working on C code that allows me to do the following:

        MLLexer htmldoc; /* `ML' stands for Markup Language */
        MLToken token;

        htmldoc = DocumentOpen("http://www.conman.org/people/spc/");

        while(MLLexerNext(htmldoc,token) != EOF)
        {
          if ((MLTokenType(token) == T_TAG) && StringE(MLTokenValue(token),"A"))
          {
            /* get link info ... blah blah blah ... */
          }
        }

        DocumentClose(htmldoc);

  That's C code. It took a lot of work to get to this point, though.

> With usable higher
> elements I may give more hints on my intention and the compiler
> may generate better machine code - example: C doesn't include
> any operation to move data other than the very basic elements
> (Byte, Int , Float). So if I order the programm to move a
> structure I use a loop to copy byte by byte - hard for a
> compiler to optimize this into a block move operation

  You're not using C then. While it's possible to do:

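        char       *pd = (char *)destpointer;       /* view the structs as raw characters */
        const char *ps = (const char *)srcpointer;
        size_t      i;                              /* sizeof yields a size_t */

        for (i = 0 ; i < sizeof(somestruct) ; i++)
                *pd++ = *ps++;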

  That's going about things the hard way. Why not:

        memcpy(destpointer,srcpointer,sizeof(somestruct));

  Or even:

        *destpointer = *srcpointer;

  Yes, ANSI C allows the assignment of structures, and even allows functions
to return whole structures and not just pointers to them.
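
  A short sketch of both features, using a made-up struct point; the type
and function names are for illustration only.

```c
struct point
{
  int x;
  int y;
};

/* Return a whole structure by value. */
struct point point_make(int x, int y)
{
  struct point p;

  p.x = x;
  p.y = y;
  return p;
}

/* Copy through pointers with plain assignment; the compiler is free to
   turn this into whatever block move suits the target. */
void point_copy(struct point *dest, const struct point *src)
{
  *dest = *src;
}
```

  Because the assignment names the whole object, the compiler knows the
exact size and type being moved, which is the kind of hint a
character-by-character loop hides.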

> And second: A programming trick is like Laws or Locks - it
> is about keeping honest people honest. If the effort to do
> it the bad / wrong way is bigger than to do it right, the
> number of 'faults' will be way reduced.

  I can see why you like Ada then 8-) Python is also good with respect to
Bondage and Discipline languages---it enforces good indentation.

> Dick, you say you use C to generate the basic stuff. Thats
> fine, just I can't see what an Assembler can't do. Let's
> first agree that Assembler equals to Macro Assembler - and
> than it's just a bit of _one_time_ work to make any C
> compimer senseless.

  One thing---I can't write Assembly on linus.slab.conman.org (an AMD 586)
and have it run on tweedledum.slab.conman.org (68040). C at least lets me
write code that will run on both machines.
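
  One habit that makes that work: the x86 stores integers least significant
byte first and the 68040 most significant byte first, so code that writes an
integer's memory image to a file or socket will not agree with itself across
the two. A sketch of the usual way around it, with invented function names,
assuming the data format is defined as four octets, most significant first:

```c
/* Store a 32-bit value as four octets, most significant first,
   regardless of the byte order of the machine running the code. */
void put_u32_be(unsigned char *buf, unsigned long value)
{
  buf[0] = (unsigned char)((value >> 24) & 0xFF);
  buf[1] = (unsigned char)((value >> 16) & 0xFF);
  buf[2] = (unsigned char)((value >>  8) & 0xFF);
  buf[3] = (unsigned char)( value        & 0xFF);
}

/* Rebuild the value from the same four octets. */
unsigned long get_u32_be(const unsigned char *buf)
{
  return ((unsigned long)buf[0] << 24)
       | ((unsigned long)buf[1] << 16)
       | ((unsigned long)buf[2] <<  8)
       |  (unsigned long)buf[3];
}
```

  The shifts operate on values, not on memory layout, so the same source
gives the same octets on either machine. unsigned long is used because the
standard guarantees it at least 32 bits.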

  Not that I think C is the Be-All-End-All of languages. It's not, and there
are limitations I hit all the time, but I've yet to come across a language
that is everything I want in a language.

  -spc (Pretty much convinced that programmers can't 8-)
Received on Sat Mar 11 2000 - 15:14:35 GMT
