82

What was the reasoning behind not explicitly storing an array's length with an array in C?

The way I see it, there are overwhelming reasons to do so but not very many in support of the standard (C89). For instance:

  1. Having length available in a buffer can prevent buffer overrun.
  2. A Java-style arr.length is both clear and spares the programmer from having to maintain many ints on the stack when dealing with several arrays.
  3. Function parameters become more cogent.

But perhaps the most motivating reason, in my opinion, is that usually no space is saved by leaving the length out. I would venture to say that most uses of arrays involve dynamic allocation. True, there may be some cases where people use an array allocated on the stack, but that's just one function call* - the stack can handle 4 or 8 bytes extra.

Since the heap manager has to track the free block size used up by the dynamically allocated array anyway, why not make that information usable (and add the additional rule, checked at compile time, that one can't manipulate the length explicitly unless one would like to shoot oneself in the foot)?

The only thing I can think of on the other side is that no length tracking may have made compilers simpler, but not that much simpler.

*Technically, one could write some kind of recursive function with an array with automatic storage, and in this (very elaborate) case storing the length may indeed result in effectively more space usage.

VF1
  • 6
    I suppose it could be argued, that when C included using structs as parameter and return value types, it should have included syntactic sugar for "vectors" (or whatever name), which would underneath be struct with length and either array or pointer to array. Language level support for this common construct (also when passed as separate arguments and not single struct) would have saved countless bugs and simplified standard library too. – hyde Apr 28 '14 at 19:38
  • 3
    You might also find Why Pascal is Not My Favorite Programming Language Section 2.1 to be insightful. –  Apr 28 '14 at 20:55
  • 34
    While all the other answers have some interesting points, I think the bottom line is that C was written so assembly language programmers would be able to write code easier and have it be portable. With that in mind, having an array length stored WITH an array automatically would have been a nuisance and not a shortcoming (as would have other nice candy-coating desires). These features seem nice nowadays, but back then it really was frequently a struggle to squeeze one more byte of either program or data into your system. Wasteful use of memory would have severely limited C's adoption. – Dunk Apr 28 '14 at 21:11
  • 1
  • 6
    The real part of your answer has already been answered many times in the way I would have, but I can extract a different point: "Why can't the size of a malloc()ed area be requested in a portable way?" That's a thing which makes me wonder several times. – glglgl Apr 29 '14 at 05:56
  • 5
    Voting to reopen. There's some reason somewhere, even if it's simply "K&R didn't think of it". – Telastyn Apr 30 '14 at 02:54
  • 2
    How is the question "What are the design reasons?" primarily opinion-based as those who closed it suggest? – VF1 Apr 30 '14 at 03:16
  • Bad analogy time: Why don't standard pencils have a cap or an eraser? – haylem Apr 30 '14 at 13:23
  • 1
    @MichaelT I'm not sure someone who would ask this question will be mollified by the claim that "The problem is how to handle the string argument of 'index'. The calls 'index('hello',c)' and 'index('goodbye',c)' cannot both be legal, since the strings have different lengths." Consider that he referenced Java, in which it is, in fact, quite easy to write functions that operate on string literals of different lengths. – Casey Apr 30 '14 at 13:37
  • 1
    Well, C was pretty much meant to be a portable assembler replacement, wasn't it? There's been a lot of tricks to get the most out of the HW of the time (like the famous REPNEZ that gave us null terminated strings). Doing something that wasn't necessary most of the time wasn't an option back then, and the code was much shorter. The silly thing isn't that C didn't have length-prefixed arrays and bound checking - the silly thing was not switching to a different language (or improving C) when the performance hit no longer mattered (removing bounds checking only for performance critical spots). – Luaan Apr 30 '14 at 14:12
  • You have a statement in your question that isn't true. The heap manager does NOT have to track the length of each allocation. A slab allocator only has to know what slab the allocation is in. A 48 byte allocation goes in a 64 byte slab, but it could be 33 bytes or 51 bytes, the manager doesn't care. – Zan Lynx Apr 30 '14 at 23:03
  • 1
    @ZanLynx Good point. I was simplifying the situation to get my point across. The point is that no space would be wasted. Either the allocator already tracks the size, or it has room to do so. – VF1 May 01 '14 at 00:27

10 Answers

109

C arrays do keep track of their length, as the array length is a static property:

int xs[42];  /* a 42-element array */

You can't usually query this length, but you don't need to because it's static anyway – just declare a macro XS_LENGTH for the length, and you're done.

The more important issue is that C arrays implicitly degrade into pointers, e.g. when passed to a function. This does make some sense, and allows for some nice low-level tricks, but it loses the information about the length of the array. So a better question would be why C was designed with this implicit degradation to pointers.
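A minimal sketch of that loss of information, using made-up names (ARRAY_LENGTH, bytes_seen_by_callee, length_seen_by_owner) that are not part of any standard header. While xs is still in scope as an array, sizeof can recover its length; once it is passed to a function, only a pointer arrives:

```c
#include <stddef.h>

/* Element count of a true array. Only meaningful where `a` still has
   array type, i.e. in the scope of its declaration. */
#define ARRAY_LENGTH(a) (sizeof(a) / sizeof((a)[0]))

/* A parameter written as `int xs[]` is adjusted to `int *xs`, so it is
   spelled as a pointer here. sizeof reports the pointer's size. */
size_t bytes_seen_by_callee(int *xs)
{
    return sizeof(xs);
}

/* Returns the length of a local 42-element array as seen in its own scope. */
size_t length_seen_by_owner(void)
{
    int xs[42];
    return ARRAY_LENGTH(xs);
}
```

Calling bytes_seen_by_callee(xs) on a 42-element array yields sizeof(int *), whatever the array's real length was.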

Another matter is that pointers need no storage except the memory address itself. C allows us to cast integers to pointers, pointers to other pointers, and to treat pointers as if they were arrays. While doing this, C is not insane enough to fabricate some array length into existence, but seems to trust in the Spiderman motto: with great power the programmer will hopefully fulfill the great responsibility of keeping track of lengths and overflows.

amon
  • 13
    I think you mean to say, if I am not mistaken, that C compilers keep track of static array lengths. But this does no good for functions which just get a pointer. – VF1 Apr 28 '14 at 15:59
  • 25
    @VF1 yes. But the important thing is that arrays and pointers are different things in C. Assuming you're not using any compiler extensions, you can't generally pass an array itself to a function, but you can pass a pointer, and index a pointer as if it were an array. You're effectively complaining that pointers have no length attached. You should be complaining that arrays can't be passed as function arguments, or that arrays degrade to pointers implicitly. – amon Apr 28 '14 at 16:03
  • 1
    Fair enough - makes sense. – VF1 Apr 28 '14 at 16:08
  • 1
    I think this limitation (arrays are passed as pointers) stems from the fact that the original C only allowed function arguments and return values that fit in a processor register, i.e. simple types and pointers. If I remember correctly, in earlier versions of C one couldn't even pass a struct by value, but had to pass a pointer to a struct. This limitation saved stack space and saved the time needed to copy the arguments onto the stack. – Giorgio Apr 28 '14 at 18:23
  • 37
    "You can't usually query this length" -- actually you can, it's the sizeof operator -- sizeof(xs) would return 168 assuming int's are four bytes long. To get the 42, do: sizeof(xs) / sizeof(int) – tcrosley Apr 28 '14 at 19:23
  • @tcrosley "usually" means every case where one can't use sizeof, I'm guessing, which is only useful if the array variable in question was created within the compiler's scope at compile-time. – VF1 Apr 28 '14 at 20:21
  • 15
    @tcrosley That only works within the scope of the array declaration - try passing xs as a param to another function then see what sizeof(xs) gives you... – Gwyn Evans Apr 28 '14 at 20:23
  • 1
    @GwynEvans then pass sizeof(xs) as another parameter if the called function needs the information. – tcrosley Apr 28 '14 at 20:35
  • 26
    @GwynEvans again: pointers are not arrays. So if you “pass an array as param to another function”, you aren't passing an array but a pointer. Claiming that sizeof(xs) where xs is an array would be something different in another scope is blatantly false, because the design of C does not allow arrays to leave their scope. If sizeof(xs) where xs is an array is different from sizeof(xs) where xs is a pointer, that comes as no surprise because you are comparing apples with oranges. – amon Apr 28 '14 at 20:47
  • 11
    @amon I understand what you mean, and why, but I think you missed that I was trying to highlight the 'unexpected' behaviour that will be seen in situations such as "int getSize(char a[]) { return sizeof(a); }" – Gwyn Evans Apr 28 '14 at 21:54
  • 4
    @tcrosley - I like sizeof(x)/sizeof(x[0]) -- explaining how that works was one of my favorite interview questions for C programmers. Surprisingly few understood how sizeof worked. – TomG Apr 29 '14 at 02:12
  • 1
    It would have been nice if they had made a difference between int getSize(char * a) { return sizeof(a); } returning the size of a pointer and int getSize(char a[4]) { return sizeof(a); } returning the size of the array the pointer is claimed to point to, reverting the decay into a pointer. – glglgl Apr 29 '14 at 05:56
  • @TomG yes, your construction is nicer, you don't have to know the type of the array in your case. I'll have to remember that. – tcrosley Apr 29 '14 at 07:56
  • 4
    “array length is a static property”— that depends on how you define “static property”. Since C99, int array[variable]; works as well. Still, sizeof(array) works within the same function as expected. – Holger Apr 30 '14 at 15:35
  • @amon I was under the impression that array length was stored one element prior the beginning (ie. @ arr[-1]). – KeyC0de Nov 04 '18 at 15:28
  • 1
    @Nik-Lz A compiler might do that as an implementation detail. In particular, if you malloc() some memory, the runtime might somewhere store the size of the allocated memory. But all of that is an implementation detail, and unnecessary except for VLAs (where we need to keep track of the stack frame size, not necessarily the size of the array). Accessing an out of bounds element like arr[-1] is UB. – amon Nov 04 '18 at 15:38
39

A lot of this had to do with the computers available at the time. Not only did the compiled program have to run on a limited-resource computer, but, perhaps more importantly, the compiler itself had to run on these machines. When Thompson developed B, C's direct predecessor, he was using a PDP-7 with 8K words of memory. Complex language features that didn't have an immediate analog in the actual machine code were simply not included in the language.

A careful read through the history of C gives more insight into the above, but it wasn't entirely a result of the machine limitations they had:

Moreover, the language (C) shows considerable power to describe important concepts, for example, vectors whose length varies at run time, with only a few basic rules and conventions. ... It is interesting to compare C's approach with that of two nearly contemporaneous languages, Algol 68 and Pascal [Jensen 74]. Arrays in Algol 68 either have fixed bounds, or are `flexible:' considerable mechanism is required both in the language definition, and in compilers, to accommodate flexible arrays (and not all compilers fully implement them.) Original Pascal had only fixed-sized arrays and strings, and this proved confining [Kernighan 81].

C arrays are inherently more powerful. Adding bounds to them restricts what the programmer can use them for. Such restrictions may be useful for programmers, but necessarily are also limiting.

Adam Davis
  • 4
    This pretty much nails the original question. That and the fact that C was being kept deliberately "light touch" when it came to checking what the programmer was doing, as part of making it attractive for writing operating systems. – ClickRick Apr 28 '14 at 21:37
  • 5
    Great link, they also explicitly changed storing the length of strings to use a delimiter to avoid the limitation on the length of a string caused by holding the count in an 8- or 9-bit slot, and partly because maintaining the count seemed, in our experience, less convenient than using a terminator - well so much for that :-) – Voo Apr 29 '14 at 12:46
  • 5
    The unterminated arrays also fit with the bare metal approach of C. Remember that the K&R C book is less than 300 pages with a language tutorial, reference and a list of the standard calls. My O'Reilly Regex book is nearly twice as long as K&R C. – Michael Shopsin Apr 29 '14 at 15:39
22

Back in the day when C was created, an extra 4 bytes of space for every string, no matter how short, would have been quite a waste!

There's another issue - remember that C is not object-oriented, so if you do length-prefix all strings, it would have to be defined as a compiler intrinsic type, not a char*. If it was a special type, then you would not be able to compare a string to a constant string, i.e.:

String x = "hello";            /* hypothetical length-prefixed string type */
if (strcmp(x, "hello") == 0)   /* "hello" is a plain char array here */
  exit(0);

would have to have special compiler details to either convert that static string to a String, or have different string functions to take account of the length prefix.
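To make the conversion problem concrete, here is a rough sketch of what a length-prefixed string might look like as an ordinary struct. The names lp_string, lp_from_cstr and lp_equal are invented for illustration; nothing like them exists in standard C. Note that a literal such as "hello" still has to be converted, which costs a strlen, before it can be compared:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-prefixed string: the length travels with the data. */
typedef struct {
    size_t len;
    const char *data;   /* need not be NUL-terminated */
} lp_string;

/* Wrap a NUL-terminated C string; the length has to be counted once here. */
lp_string lp_from_cstr(const char *s)
{
    lp_string r;
    r.len = strlen(s);
    r.data = s;
    return r;
}

/* Equality compares the lengths first, then the bytes. */
int lp_equal(lp_string a, lp_string b)
{
    return a.len == b.len && memcmp(a.data, b.data, a.len) == 0;
}
```

A comparison would then read lp_equal(x, lp_from_cstr("hello")), which is the kind of separate function family the answer describes.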

I think ultimately, though, they just didn't choose the length-prefix way, unlike, say, Pascal.

Govind Parmar
gbjbaanb
  • Well, it would have been only 4 "back in the day," and as I mentioned this would only apply to auto storage (or I suppose rodata in this case). But does anyone really have that many const char*s? – VF1 Apr 28 '14 at 15:53
  • I think there are many instances of static character arrays in C programs - think of all those macros and constant string compares. If all strings had to be length-prefixed, you wouldn't be able to compare one to a constant string... hmm, I feel an edit! – gbjbaanb Apr 28 '14 at 16:01
  • Ok, perhaps modern standards make me appreciate space less. – VF1 Apr 28 '14 at 16:03
  • @VF1 my first computer had 12k of RAM. (not Megabytes, kilobytes). It was enough to run space invaders, if you wrote it in assembly. I later got a 10k RAM expansion and could run it in "high" resolution. Today I have 4Gb RAM on my work laptop and it's not quite enough to run VS :-( – gbjbaanb Apr 28 '14 at 16:10
  • Well, that still wouldn't be a problem - one could simply have strcmp skip the length prefix, which should be possible with overloads. – VF1 Apr 28 '14 at 16:10
  • @VF1 ah, but if you don't have the length prefix on your static strings, how long are they? After all, to a compiler, "hello" is not much different to "hellohsg$d*.pfj;hgf" – gbjbaanb Apr 28 '14 at 16:13
  • 10
    Bounds checking also takes time. Trivial in today's terms, but something people paid attention to when they cared about 4 bytes. – Gort the Robot Apr 28 '14 at 19:44
  • 18
    @StevenBurnap: it's not that trivial even today if you are in an inner loop that goes over every pixel of a 200 MB image. In general, if you are writing C you want to go fast, and you don't want to waste time in a useless bound check at every iteration when your for loop was already set up to respect the boundaries. – Matteo Italia Apr 28 '14 at 20:59
  • Yes. In C, you push whatever type checking you can to the compile stage. – Gort the Robot Apr 28 '14 at 21:09
  • 4
    @VF1 "back in the day" it could well have been two bytes (DEC PDP/11 anyone?) – ClickRick Apr 28 '14 at 21:26
  • @StevenBurnap nails it. C is, in a lot of ways, a portable shorthand for assembly language. You get very fast code out of it, but the tradeoff is that you have to understand exactly what you're doing. The language isn't going to prevent your mistakes beyond what the compiler complains about. – Blrfl Apr 28 '14 at 22:46
  • 1
    @MatteoItalia If you have a for loop that respects the boundaries, it's trivial to eliminate the bounds checking in that loop for the loop counter. Today if you're not using a compiler that can do that, you don't actually care if it goes fast. – prosfilaes Apr 29 '14 at 08:12
  • 1
    @prosfilaes: I wished that was so trivial, but it's not. Just the other week I had some code that had a bound-checked [] operator, which threw an std::out_of_range in case it failed. It slowed down everything (about 2x) because, due to bound checking and exception preparation, the method got bigger than some threshold, so the compiler refused to inline it. The result was one function call for each array access and bound checking always in place, even in already correct loops. IIRC both clang 3.4 and g++ 4.8 exhibited this problem. – Matteo Italia Apr 29 '14 at 08:26
  • @MatteoItalia Which is not really a problem that an array using a compiler for a language that had bounds-checking built-in would have. I highly suspect that you could not get GNAT to fail in the same way, because array indexing is built into Ada and wouldn't be emitted as an out-of-line function call in any case. – prosfilaes Apr 29 '14 at 09:20
  • 1
    @MatteoItalia or you add a compiler flag that turns them off wholesale (and/or #pragmas) – ratchet freak Apr 29 '14 at 09:37
  • 7
    It's not just "back in the day". For the software that C is targeted at as a "portable assembly language", such as OS kernels, device drivers, embedded real-time software, etc., wasting half a dozen instructions on bounds checking does matter, and in many cases you need to be "out of bounds" (how could you write a debugger if you could not randomly access another program's storage?). – James Anderson Apr 29 '14 at 09:57
  • 3
    This is actually a rather weak argument considering that BCPL had length-counted arguments. Just as in Pascal, though, that was limited to 1 word, so generally 8 or 9 bits only, which was a bit limiting (it also precludes the possibility to share parts of strings, although that optimization was probably way too advanced for the time). And declaring a string as a struct with a length followed by the array really wouldn't need special compiler support. – Voo Apr 29 '14 at 13:21
  • Algol arrays likewise predate C and had the length encoded. This answer simply ignores history. – Pete Kirkham Jun 20 '14 at 20:37
11

In C, any contiguous subset of an array is also an array and can be operated on as such. This applies both to read and write operations. This property would not hold if the size was stored explicitly.
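As a small illustration of that property (the function name sum is just an example), a routine that takes a pointer and a count works unchanged on a whole array or on any contiguous slice of it, with no copying:

```c
#include <stddef.h>

/* Sums n ints starting at p. It cannot tell, and does not care, whether p
   points at the start of an array or into the middle of one. */
int sum(const int *p, size_t n)
{
    int total = 0;
    size_t i;
    for (i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

Given int xs[5] = {1, 2, 3, 4, 5}, the call sum(xs, 5) covers the whole array and sum(xs + 1, 3) covers only the middle three elements. If a length header had to sit in front of every array, xs + 1 would have no header of its own to point at.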

MSalters
  • 6
    "The design would be different" is not a reason against the design being different. – VF1 Apr 28 '14 at 20:25
  • 7
    @VF1: Have you ever programmed in Standard Pascal? C's ability to be reasonably flexible with arrays was a huge improvement over assembly (no safety whatsoever) and the first generation of typesafe languages (overkill typesafety, including exact array bounds) – MSalters Apr 28 '14 at 20:30
  • 5
    This ability to slice an array is indeed a massive argument for the C89 design. –  Apr 28 '14 at 20:37
  • Old school Fortran hackers also ma[dk]e good use of this property (albeit, it requires passing the slice to an array in Fortran). Confusing and painful to program or debug, but fast and elegant when working. – dmckee --- ex-moderator kitten Apr 28 '14 at 21:22
  • 5
    There is one interesting design alternative that allows slicing: Don't store the length alongside the arrays. For any pointer to an array, store the length with the pointer. (When you just have a real C array, the size is a compile time constant and available to the compiler.) It takes more space, but allows slicing while maintaining the length. Rust does this for the &[T] types, for example. –  Apr 29 '14 at 10:43
  • @delnan: Separating out things that can accept slices from things that can't, and requiring things that can accept slices to receive two pieces of information would slightly impair the efficiency of things that can accept slices, but would improve the efficiency of things that can't since a compiler would know that array1[4] and array2[3] couldn't alias. – supercat Jan 29 '16 at 23:20
8

The biggest problem with having arrays tagged with their length is not so much the space required to store that length, nor the question of how it should be stored (using one extra byte for short arrays generally wouldn't be objectionable, nor would using four extra bytes for long arrays, but using four bytes even for short arrays might be). A much bigger problem is that given code like:

void ClearTwoElements(int *ptr)
{
  ptr[-2] = 0;
  ptr[2] = 0;
}
void blah(void)
{
  static int foo[10] = {1,2,3,4,5,6,7,8,9,10};
  ClearTwoElements(foo+2);
  ClearTwoElements(foo+7);
  ClearTwoElements(foo+1);
  ClearTwoElements(foo+8);
}

the only way that code would be able to accept the first two calls to ClearTwoElements but reject the last two would be for ClearTwoElements to receive enough information to know that in each case it was receiving a reference to part of the array foo, in addition to knowing which part. That would typically double the cost of passing pointer parameters. Further, if each array was preceded by a pointer to an address just past the end (the most efficient format for validation), optimized code for ClearTwoElements would likely become something like:

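void ClearTwoElements(int *ptr)
{
  /* Pseudocode: ARRAY_BASE, ARRAY_END and ADDRESS stand for whatever the
     compiler would use to recover bounds; offsets are in elements. */
  int *array_end = ARRAY_END(ptr);
  if ((array_end - ARRAY_BASE(ptr)) < 5 ||     /* array too small to check */
      ADDRESS(ptr) < (ARRAY_BASE(ptr) + 2) ||  /* ptr[-2] precedes the array */
      (array_end - 2) <= ADDRESS(ptr))         /* ptr[2] is past the end */
    trap();
  *(ADDRESS(ptr) - 2) = 0;
  *(ADDRESS(ptr) + 2) = 0;
}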

Note that a method caller could, in general, perfectly legitimately pass a pointer to the start of the array or the last element to a method; only if the method tries to access elements which go outside the passed-in array would such pointers cause any trouble. Consequently, a called method would have to first ensure the array was large enough that the pointer arithmetic to validate its arguments won't itself go out of bounds, and then do some pointer calculations to validate the arguments. The time spent in such validation would likely exceed the cost spent doing any real work. Further, the method could likely be more efficient if it was written and called:

void ClearTwoElements(int arr[], int index)
{
  arr[index-2] = 0;
  arr[index+2] = 0;
}
void blah(void)
{
  static int foo[10] = {1,2,3,4,5,6,7,8,9,10};
  ClearTwoElements(foo,2);
  ClearTwoElements(foo,7);
  ClearTwoElements(foo,1);
  ClearTwoElements(foo,8);
}

The concept of a type which combines something to identify an object with something to identify a piece thereof is a good one. A C-style pointer is faster, however, if it's not necessary to perform validation.
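One way such a combined type could be sketched in today's C is shown below. The names checked_ptr and checked_store are hypothetical, and this is only an illustration of the idea, not something the language provides. Each pointer carries three words instead of one, and every access pays for two comparisons:

```c
#include <stddef.h>

/* Hypothetical "fat" pointer: the array's bounds plus the current position. */
typedef struct {
    int *base;   /* first element of the underlying array */
    int *end;    /* one past the last element */
    int *cur;    /* element this pointer currently designates */
} checked_ptr;

/* Stores value at cur[offset] if that element lies inside [base, end).
   Returns 0 on success and -1 if the access would be out of bounds. */
int checked_store(checked_ptr p, ptrdiff_t offset, int value)
{
    ptrdiff_t index = (p.cur - p.base) + offset;
    ptrdiff_t length = p.end - p.base;
    if (index < 0 || index >= length)
        return -1;
    p.base[index] = value;
    return 0;
}
```

With a ten-element foo, a checked_ptr positioned at foo + 2 accepts offsets -2 and 2, while one positioned at foo + 1 rejects offset -2 and one at foo + 8 rejects offset 2.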

supercat
  • If arrays had runtime size, then pointer to array would be fundamentally different from pointer to an element of array. Latter might not be directly convertible to former at all (without creating new array). [] syntax might still exist for pointers, but it would be different than for these hypothetical "real" arrays, and the problem you describe would probably not exist. – hyde Apr 28 '14 at 19:49
  • @hyde: The question is whether arithmetic should be allowed on pointers whose object base address is unknown. Also, I forgot another difficulty: arrays within structures. Thinking about it, I'm not sure there would be any way to have a pointer type which could point to an array stored within a structure, without requiring each pointer to include not only the address of the pointer itself, but also upper and lower legal ranges it can access. – supercat Apr 28 '14 at 20:18
  • Interesting point. I think that this still reduces to amon's answer, though. – VF1 Apr 28 '14 at 20:30
  • The question asks about arrays. A pointer is a memory address and would not change with the question's premise, as far as I understand the intention. Arrays would get length, pointers would be unchanged (except pointer to array would need to be a new, distinct, unique type, much like pointer to struct). – hyde Apr 28 '14 at 20:36
  • @hyde: If one sufficiently changed the semantics of the language, it might be possible to have arrays include an associated length, though arrays stored within structures would pose some difficulties. With semantics as they are, array bounds-checking would only be useful if that same checking applied to pointers to array elements. – supercat Apr 28 '14 at 20:40
  • The way I see this could have gone is, first strip arrays out of the language (leave pointers and memory blocks allocated with malloc and standardized alloca, and maybe the [] syntactic sugar for *(ptr+ind)). This does not really change semantics much, I think. Then add arrays back with new array semantics which include length, and resembling structs more than raw pointers. – hyde Apr 28 '14 at 20:49
  • I guess another approach would be making all pointers boundary aware, essentially storing 3 values (start, current, end), and this is what you are probably thinking. Yeah, I agree that would be pretty big change to the language, and unfeasible (except maybe as extra type qualifier, checked char *ptr or something like that). – hyde Apr 28 '14 at 20:59
  • @hyde: A pointer-triple type would be a useful language feature, but those in control of the language seem hostile to innovation (and also to embedded or systems programming, it seems). – supercat Feb 10 '17 at 18:23
  • @supercat, instead of ptr[-2] ptr[2], why not do it with ptr[0] ptr[4]? – Pacerier Jul 22 '21 at 01:12
  • @Pacerier: My point was that the rules of the language would allow the former in cases where the function receives a pointer to the third or subsequent element, and accurately validating pointer index computations would require keeping track, for each pointer, of how many elements precede and follow the identified address. – supercat Jul 22 '21 at 14:06
7

Short answer:

Because C is a low-level programming language, it expects you to take care of these issues yourself, but this adds greater flexibility in exactly how you implement it.

C has a compile-time concept of an array that is initialised with a length but at runtime the whole thing is simply stored as a single pointer to the start of the data. If you want to pass the array length to a function along with the array, you do it yourself:

retval = my_func(my_array, my_array_length);

Or you could use a struct with a pointer and length, or any other solution.
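A minimal sketch of the struct variant, with the hypothetical names int_span and span_fill; the language does nothing to verify that len matches the memory behind data, so that remains the caller's promise:

```c
#include <stddef.h>

/* Hypothetical pointer-plus-length pair; the caller promises len is right. */
typedef struct {
    int *data;
    size_t len;
} int_span;

/* Sets every element of the span to value and returns how many were written. */
size_t span_fill(int_span s, int value)
{
    size_t i;
    for (i = 0; i < s.len; i++)
        s.data[i] = value;
    return s.len;
}
```

A caller that owns int a[4] could build the pair as int_span s = { a, sizeof a / sizeof a[0] } and pass s around as a single value.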

A higher level language would do this for you as part of its array type. In C you're given the responsibility of doing this yourself, but also the flexibility to choose how to do it. And if all the code you're writing already knows the length of the array, you don't need to pass the length around as a variable at all.

The obvious drawback is that with no inherent bounds checking on arrays passed around as pointers you can create some dangerous code but that is the nature of low level/systems languages and the trade-off they give.

thomasrutter
  • 1
    +1 "And if all the code you're writing already knows the length of the array, you don't need to pass the length around as a variable at all." – 林果皞 Sep 10 '15 at 16:40
  • If only the pointer+length struct had been baked into the language and standard library. So many security holes could have been avoided. – CodesInChaos Apr 08 '16 at 13:42
  • Then it wouldn't really be C. There are other languages that do that. C gets you low level. – thomasrutter Apr 12 '16 at 00:38
  • C was invented as a low-level programming language, and many dialects still support low-level programming, but many compiler writers favor dialects which can't really be called low-level languages. They allow and even require low-level syntax, but then try to infer higher-level constructs whose behavior may not match the semantics implied by the syntax. – supercat Feb 10 '17 at 18:20
7

One of the fundamental differences between C and most other 3rd generation languages, and all more recent languages that I am aware of, is that C was not designed to make life easier or safer for the programmer. It was designed with the expectation that the programmer knew what they were doing and wanted to do exactly and only that. It does not do anything 'behind the scenes' so you do not get any surprises. Even compiler level optimisation is optional (unless you use a Microsoft compiler).

If a programmer wants to write bounds checking in their code, C makes it simple enough to do it, but the programmer must choose to pay the corresponding price in terms of space, complexity and performance. Even though I haven't used it in anger for many years, I still use it when teaching programming to get across the concept of constraint based decision making. Basically, that means you can choose to do anything you want, but every decision you make has a price that you need to be aware of. This becomes even more important when you start telling others what you want their programs to do.

  • 3
    C wasn't so much "designed" as it evolved. Originally, a declaration like int f[5]; wouldn't create f as a five-item array; instead, it was equivalent to int CANT_ACCESS_BY_NAME[5]; int *f = CANT_ACCESS_BY_NAME;. The former declaration could be processed without the compiler having to really "understand" array types; it simply had to output an assembler directive to allocate space and could then forget that f ever had anything to do with an array. The inconsistent behaviors of array types stem from this. – supercat Apr 29 '14 at 18:20
  • 1
    Turns out that no programmers know what they're doing to the degree that C requires. – CodesInChaos Apr 08 '16 at 13:41
5

The problem of the extra storage is an issue, but in my opinion a minor one. After all, most of the time you are going to need to track the length anyway, although amon made a good point that it can often be tracked statically.

A bigger problem is where to store the length and how long to make it. There isn't one place that works in all situations. You might say just store the length in the memory just before the data. What if the array isn't pointing to memory, but something like a UART buffer?

Leaving the length out allows the programmer to create his own abstractions for the appropriate situation, and there are plenty of ready made libraries available for the general purpose case. The real question is why aren't those abstractions being used in security-sensitive applications?

Karl Bielefeldt
  • 1
    "You might say just store the length in the memory just before the data. What if the array isn't pointing to memory, but something like a UART buffer?" Could you please explain this a little bit more? Also, is that something that happens often, or is it just a rare case? – Mahdi May 02 '14 at 10:45
  • If I had designed it, a function argument written as T[] wouldn't be equivalent to T* but rather pass a tuple of pointer and size to the function. Fixed size arrays could decay to such an array slice, instead of decaying to pointers as they do in C. The main advantage of this approach isn't that it's safe by itself, but that it's a convention on which everything, including the standard library, can build. – CodesInChaos Apr 08 '16 at 13:50
1

From The Development of the C Language:

Structures, it seemed, should map in an intuitive way onto memory in the machine, but in a structure containing an array, there was no good place to stash the pointer containing the base of the array, nor any convenient way to arrange that it be initialized. For example, the directory entries of early Unix systems might be described in C as
struct {
    int inumber;
    char    name[14];
};
I wanted the structure not merely to characterize an abstract object but also to describe a collection of bits that might be read from a directory. Where could the compiler hide the pointer to name that the semantics demanded? Even if structures were thought of more abstractly, and the space for pointers could be hidden somehow, how could I handle the technical problem of properly initializing these pointers when allocating a complicated object, perhaps one that specified structures containing arrays containing structures to arbitrary depth?

The solution constituted the crucial jump in the evolutionary chain between typeless BCPL and typed C. It eliminated the materialization of the pointer in storage, and instead caused the creation of the pointer when the array name is mentioned in an expression. The rule, which survives in today's C, is that values of array type are converted, when they appear in expressions, into pointers to the first of the objects making up the array.

That passage addresses why array expressions decay to pointers in most circumstances, but the same reasoning applies to why the array length isn't stored with the array itself; if you want a one-to-one mapping between the type definition and its representation in memory (as Ritchie did), then there's no good place to store that metadata.
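
The one-to-one mapping can be checked directly. The sketch below reuses Ritchie's directory entry (the tag dir_entry and both helper names are added here purely for illustration) and suggests two things: the array member occupies exactly its 14 bytes, with no room for a hidden pointer or length, and the pointer to name is computed from the structure's address when the name is used in an expression rather than being stored in the object.

```c
#include <assert.h>
#include <stddef.h>

/* Ritchie's example, with a tag added so it can be referred to. */
struct dir_entry {
    int  inumber;
    char name[14];
};

/* Size of the array member itself: the 14 chars and nothing else.
   sizeof does not evaluate its operand, so the null pointer is never
   dereferenced. */
static size_t name_size(void)
{
    return sizeof(((struct dir_entry *)0)->name);
}

/* Returns 1 if the pointer produced by array decay equals the
   struct's address plus the member offset, i.e. it is derived on
   demand and not read from storage. */
static int name_pointer_is_computed(const struct dir_entry *e)
{
    assert(e != NULL);
    const char *p = e->name;   /* the array decays to a pointer here */
    return (const char *)e + offsetof(struct dir_entry, name) == p;
}
```

Any stored length or base pointer would have to show up either in sizeof or as extra bytes in the on-disk record, which is exactly what Ritchie wanted to avoid.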

Also, think about multidimensional arrays; where would you store the length metadata for each dimension such that you could still walk through the array with something like

T *p = &a[0][0];

for ( size_t i = 0; i < rows; i++ )
  for ( size_t j = 0; j < cols; j++ )
    do_something_with( *p++ );
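
A small check of the same point, assuming a 2x3 array of int (ROWS, COLS, and the two helper names are made up for this sketch): the object is exactly rows times cols elements wide and consecutive rows are adjacent, so there is no slot where per-dimension lengths could live without breaking that layout.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 2, COLS = 3 };

/* Byte distance between the start of row 0 and the start of row 1.
   The parameter must point at the first row of an array with at
   least two rows. */
static size_t row_stride_bytes(int (*a)[COLS])
{
    assert(a != NULL);
    return (size_t)((char *)&a[1][0] - (char *)&a[0][0]);
}

/* Total size of a ROWS x COLS array object. */
static size_t grid_bytes(void)
{
    return sizeof(int[ROWS][COLS]);
}
```

With int a[ROWS][COLS], row_stride_bytes(a) comes out as COLS * sizeof(int) and sizeof a as ROWS * COLS * sizeof(int): the elements account for every byte.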
John Bode
-2

The question assumes that there are arrays in C. There aren't. Things that are called arrays are just syntactic sugar for operations on continuous sequences of data and pointer arithmetic.

The following code copies some data from src to dst in int-sized chunks, without knowing that it is actually a character string.

char src[] = "Hello, world";
char dst[1024];
int *my_array = src; /* What? Most compilers only warn, but strictly this needs a cast. */
int *other_array = dst;
int i;
for (i = 0; i <= sizeof(src)/sizeof(int); i++)
    other_array[i] = my_array[i]; /* Oh well, we've copied some extra bytes */
printf("%s\n", dst);

Why is C so simplified that it doesn't have proper arrays? I don't know the correct answer to this new question. But people often say that C is just a (somewhat) more readable and portable assembler.

aragaer
  • 2
    I don't think you've answered the question. – Robert Harvey Apr 28 '14 at 15:47
  • 2
    What you said is true, but the person asking wants to know why this is the case. –  Apr 28 '14 at 15:53
  • IMHO the "why" is that there is just no such thing as an "array" other than as a language construct. – aragaer Apr 28 '14 at 15:54
  • I would like to clarify that I am indeed interested in the reasoning behind the C design decision as Robert Harvey implied. I am aware of how C arrays work presently. – VF1 Apr 28 '14 at 15:56
  • As far as I remember, the C runtime does only one thing: it sets all static variables to 0. After that everything is just pointers and bytes; there is no runtime construct called "array". Even dynamic allocation is implemented in a library. – aragaer Apr 28 '14 at 16:02
  • In other words: C is so simple it doesn't have arrays. The thing that is called "array" is just some syntactic sugar for pointers. – aragaer Apr 28 '14 at 16:12
  • 1
    But why? The question is why C is designed this way and not like Java/C# (or another language with the string length encoded). – Petter Nordlander Apr 28 '14 at 16:13
  • So the question has changed from "Why do C arrays have no length property?" to "Why is C so simplified that it doesn't even have proper arrays?". Unfortunately I don't know the correct answer to that new question. – aragaer Apr 28 '14 at 16:28
  • 9
    Remember, one of the nicknames for C is "portable assembly." While newer versions of the standard have added higher-level concepts, at its core, it consists of simple low-level constructs and instructions that are common across most non-trivial machines. This drives most of the design decisions made in the language. The only variables that exist at runtime are integers, floats, and pointers. Instructions include arithmetic, comparisons, and jumps. Pretty much everything else is a thin layer built on top of that. –  Apr 28 '14 at 16:47
  • 8
    It's wrong to say C has no arrays, considering that you really can't generate the same binary with other constructs (well, at least not if you consider the use of #defines for determining array sizes). Arrays in C are "continuous sequences of data", nothing sugary about it. Using pointers as if they were arrays is the syntactic sugar here (instead of explicit pointer arithmetic), not arrays themselves. – hyde Apr 28 '14 at 19:28
  • 2
    Yes, consider this code: struct Foo { int arr[10]; }. arr is an array, not a pointer. – Gort the Robot Apr 28 '14 at 21:31
  • "syntactic sugar for operations on continuous sequences of data". How would you call these continuous sequences of data, if not "arrays"? – glglgl Apr 29 '14 at 05:49
  • char buf[80*sizeof(int)]; int *array = buf; char (*lines)[80] = buf; - How many arrays are here? What are their element types? What are their lengths? – aragaer Apr 29 '14 at 07:05
  • @aragaer There is one array, buf, with element type char and length 80*sizeof(int). array, despite the name, is a pointer-to-int. lines is a pointer-to-array-of-80-char (which is not an array for the same reason float *data; is not a float). –  Apr 29 '14 at 10:45
  • For me there's one chunk of 80*sizeof(int) bytes and I'm free to do anything with it. I do have two separate pointers, but I can use array notation to write code that looks like I'm working with either an array of 80 ints or an array of sizeof(int) arrays of char. I could have used malloc instead of declaring an initial array. The only thing that would change is the location of that chunk of data. My point is: there are arrays in code, but once it's compiled there is no such thing as an array anymore. – aragaer Apr 29 '14 at 11:33