24

I am learning programming on my own following standard programming language textbooks. I come from a math background. I learned C on my own, but never really got to the more advanced portions.

I tried taking a coding course in college for prospective CS majors who have no coding experience. They kept talking about the need for commenting and documenting your code. I never really appreciated or understood why, since in math, we are taught to produce proofs on our own and read proofs written by others. If the proofs are written badly or unclear, we usually go ask someone or the proof's author for clarifications.

In the case of programming, shouldn't programmers be able to read and understand a program's code syntax even without any kind of handholding, such as commenting in any kind of written form?

I know I am asking a very naive question, and many of you will think I am speaking from zero experience. I accept such criticism. When I was taking the programming course, there were never examples of how to write documentation. Like, how detailed does it have to be? Does there need to be an English explanation for every line of code? How do I know what is good documentation versus bad documentation? We were never really shown examples of well-documented code or of how to write good code comments.

5anya
  • 3
  • 1
Seth
  • 381
  • 1
  • 7
  • 2
  • Math is an art (or philosophy, or area of knowledge, if you prefer). In its pure form, it's not concerned with tangibles at all. It's not a science, and it's not scientific; it's not engineering also. Writing software is an engineering process. It's about physical things, even if those are just pixels on a screen or data on a drive. That's the biggest difference here, I think. – user213769 Sep 05 '23 at 15:12
  • 7
    Being able to contact the person who wrote the code is not a guarantee. And even if you can contact them, they may not even be able to legally help you with it (exclusivity clauses are very common in contracts for software engineers), or may not even remember why they did things that way at the time. – Austin Hemmelgarn Sep 05 '23 at 17:28
  • 14
    "since in math, we are taught to produce proofs on our own and read proofs written by others" - look into the Wikipedia proofs of the Pythagorean theorem, Count the number of lines which are written in prose in each proof compared to the number of lines written as formulas. In computer program's, to give prosaic explanations, you add comments to the code. – Doc Brown Sep 05 '23 at 19:33
  • Sometimes, you might forget your own code or the motivation behind it. It happened to me recently, where I was going through a library I created for my own use, and I stumbled upon a function I have no memory of writing or what it could be used for. – Pierre Paquette Sep 05 '23 at 22:55
  • 7
    You ask "should programmers be able to read and understand a program's code syntax", but comments should almost never address syntax. Comments should explain the purpose of the code, not the syntax details. A comment on a function should explain what the function does, not the bare fact that it returns a value of type T. A non value returning function (aka a sub routine), should not explain that it actually "returns" a value by modifying one of the input parameters, it should explain why it modifies that input instead of returning a value. – jmoreno Sep 06 '23 at 01:14
  • Somewhat off-topic, but: When you like mathematics and programming, you might want to look at the Eiffel language, designed by a mathematician, following many principles of mathematics. – U. Windl Sep 06 '23 at 08:38
  • @U.Windl wait, I thought languages like Haskell are the first choice amongst all those academic math genius who like their code to feel like they are writing math proofs. I am not including the scientific computing or the numerical analyst crowd who are speed freaks. – Seth Sep 06 '23 at 08:46
  • 1
    @Seth All I wanted to say is that there are better languages than C to start with... – U. Windl Sep 06 '23 at 09:17
  • @U.Windl are there languages like C that are not scripting languages? In C, there is a beginning, a middle and an end to every program you write. At least it doesn't feel like some sort of video game. – Seth Sep 06 '23 at 09:19
  • 2
  • 2
    One big difference I see between math and coding is side-effects. Usually a proof in math is self-contained (it relies on other theorems, true, but that's not really relevant here; you don't see as many changes/improvements in existing proofs as in existing code). In coding you can have a lot of files that depend on each other. A comment like "Don't disable this option except if you do X in file Y" can be very useful in code, as a change can have side-effects in other files – Kaddath Sep 06 '23 at 13:26
  • @Kaddath Perfect example of why development is an engineering exercise. In theory that is a code smell, if you have hidden dependencies that rely on comments. You should try to write so that they are more explicit, possibly breaking compilation when the dependency is broken. In practice, sometimes the effort needed for that is way too high and not really worth it, so the best option is a comment. That is a very different mindset from what a mathematician would have. – bracco23 Sep 06 '23 at 14:37
  • @bracco23 yes ideally it would be side-effects in the same file, I exaggerated a little bit for the sake of the example, but we know it happens in real life ^^ – Kaddath Sep 06 '23 at 14:58
  • 2
    This is a good question, and a good answer could be quite long. You are not wrong to ask the question, as I would say the majority of experienced programmers are not going to have good answers (or good practice). – Preston L. Bannister Sep 06 '23 at 15:01
  • 4
    @Seth: let me add a note on your sentence here: "should programmers be able to read and understand a program's code syntax". For this, the answer is yes, but understanding the syntax of a language does not mean to understand the semantics of a program, especially when the program has a certain complexity. – Doc Brown Sep 06 '23 at 19:31
  • Seminal Hoare papers: Proof of a Program: FIND & Proof of correctness of data representations. Knowing how the concrete implementing state changes doesn't tell you how the abstract public state changes & vice versa. – philipxy Sep 07 '23 at 04:43
  • "shouldn't programmers be able to read and understand a program's code syntax even without any kind of handholding" How much of your mortality and attention are you prepared to sacrifice on that altar? That argument is a non sequitur: just because we can, it does not automatically follow that we should have to. – Jared Smith Sep 07 '23 at 14:27
  • You can't ask a programmer about their code after they have left the company. – Thomas Matthews Sep 07 '23 at 15:51
  • "in math, we are taught to produce proofs on our own and read proofs written by others." Note that this is taught to students because they need to learn math. If you are talking about working on code, then re-implementing everything every time takes far more time than just using what other people have implemented! Basically you suggest an O(n+m) algorithm with n=total size of codebase and m=size of changes, while normal programmers use an O(log n + m) algorithm where the "log n" is the time it takes to read some comments/documentation instead of rewriting the code base – Bakuriu Sep 07 '23 at 16:43
  • @Bakuriu: not sure what you are trying to say - one can either write some text (which then is a product of the author) or read some text (where usually someone else is the author), That's pretty self-evident and doesn't say anything about reproducing someone else's work - quite the opposite. – Doc Brown Sep 08 '23 at 16:54
  • This is why "code review" is becoming common. Having another person look at your work and tell you exactly what is not crystal clear from what you have done so far, is invaluable because you cannot do this yourself (until you get a LOT of experience and then it is still hard). Commentaries tell future readers why the code is as it is. – Thorbjørn Ravn Andersen Sep 08 '23 at 16:57
  • @DocBrown I don't understand what you are saying. OP is saying that documentation/comments are useless because you can just either rewrite the whole code base or re-read the whole code base and make the change. I argue that documentation and good comments allow you sublinear development time relative to the size of the code base, i.e. they can act as some form of lookup table/compressed representation of knowledge to avoid reading the whole thing. OP's argument about proofs in math is that a student needs to learn to prove, and doing so is way better than just reading proofs; same in development – Bakuriu Sep 08 '23 at 18:53
  • @Bakuriu: not sure if that is what the OP meant, maybe they did. But actually I think does not matter much, their whole line of thinking looks like a pretty huge misunderstanding. – Doc Brown Sep 08 '23 at 20:07

11 Answers

50

If the proofs are written badly or unclear, we usually go ask someone or the proof author for clarifications. in the case of programming, should programmers be able to read and understand a program's code syntax even without any kind of handholding in any kind of written form?

If I give you 10 or 100 pages of mathematical formulae without any explanation and I tell you it is a proof, can you then reliably tell me what it proves and if it is correct? Or would you rather have an accompanying text from the author explaining what they are trying to prove and what their main reasonings are in coming up with this proof?

Documentation in a software project is similar. Programmers are expected to know the syntax of the programming language well enough that you don't have to repeat the exact code again in English. Documentation should be on a higher level, explaining the things that are less obvious from the code itself, such as why a particular solution is preferred over another possible solution, or to point out the forest that hides itself amongst all the trees.

Part of good documentation in software is how you name your variables, classes and functions. Comments can easily go "out of sync" with the code they once described; names are less prone to this, because they are part of the code itself. I know that mathematicians love single-letter variables, but programmers find that a good descriptive name is way better than having to look elsewhere (even if it is just a few lines up) to see what "s" means in this function. If useful, you can start out writing the code with short variables and rename them at the end (your development environment should support this).
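
To make the naming point concrete, here is a minimal Python sketch. The functions and their names are invented purely for illustration; they are not taken from any real project. Both compute the same thing, but only the second one tells the reader what the numbers mean.

```python
# Hypothetical example: the same simple-interest calculation written twice.

def f(p, r, t):
    # The reader has to guess what p, r and t stand for.
    return p * r * t


def simple_interest(principal, annual_rate, years):
    # The names carry the meaning, so no explanatory comment is needed.
    return principal * annual_rate * years
```

Nothing about the behavior changed between the two versions; the second one simply moved the explanation out of a would-be comment and into the names.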

TAR86
  • 105
  • Are there places online or books I can consult to see how to write good documentation for code, along with examples of good and bad documentation? – Seth Sep 05 '23 at 09:53
  • 34
    @Seth: Explain in your code documentation why, not how. And then write your code clearly enough so that, most of the time, you don't have to explain why. For the most part, code should be self-describing. – Robert Harvey Sep 05 '23 at 11:21
  • 11
    Regarding the single-letter variables, the important thing is that names should always be chosen for their meaning and suggestiveness, and terminology should be systematic. That is the main lesson for mathematicians, who may be accustomed to assigning letters arbitrarily. The rule is not specifically against single-letter variables, or any specific minimum length. – Steve Sep 05 '23 at 12:51
  • 6
    In regards to single-letter variables, there are certain conventions for using them. For example, "i", "j", and "k" are very often used as loop indices, while "x", "y", and "z" are coordinates in 3D space. – Mark Sep 05 '23 at 21:57
  • 8
    I think another difference is when a programmer works on existing code, they do not need to know how the entire system works, only the bit within the scope of their work. Some software programs are very large. Sometimes, the comments aren't even for other people - try writing complex code without comments, then put it away and come back a year later and try to figure out what you originally intended ... :D – Steve Sep 05 '23 at 22:07
  • 5
    Hmm, how inconvenient for there to be two different Steves commenting on one answer. – Steve Sep 06 '23 at 11:35
  • 2
    @Seth: get a copy of "Code Complete" by Steve McConnell, chapter 32 in that book is about self-documenting code and "good comments". – Doc Brown Sep 06 '23 at 19:40
  • @DocBrown thank you for the reference. I really appreciate it. – Seth Sep 06 '23 at 21:29
  • 2
    @Steve If I might be so bold, I believe mathematicians are actually in the wrong here, and they should get used to using whole-word variable names, for exactly the same reasons we programmers use them! How much of my confusion could have been solved in Statistics class around all the different "μ"s... – Nacht Sep 07 '23 at 00:14
  • 3
    @Nacht, I agree. Mathematical practice has developed in front of the blackboard (where economy of written reproduction is desirable), in a context where there are usually a limited number of variables involved, and where juxtaposition often represents multiplication (precluding unambiguous multi-letter names). Programming is a different world. The crucial thing is not to judge things by length alone - programmers are not just wordier (in fact, reasonable shortness of names is still desirable), some substantial amount of our time and effort is spent on devising terminology and mnemonics. – Steve Sep 07 '23 at 07:22
  • @Seth Developer documentation is, unfortunately, often a side note in most books. I'd recommend Code Complete: 2nd Edition, Writing Solid Code, The Practice of Programming, and The Pragmatic Programmer: 20th Anniversary Edition. These are all a bit out of date, but still very relevant, especially for learning C; the details may have changed, but the basics are solid. They don't have a lot about documentation, a few pages each, but they will answer many many many questions you don't even yet know to ask. – Schwern Sep 08 '23 at 00:38
  • 1
    I think it is actually very similar to a mathematical paper. All steps which should be obvious to a member of the field don't need any explanation. But when you do something unexpected or extraordinary you give a short hint what you are doing. (this can be a function-name or a code comment) – Falco Sep 08 '23 at 08:53
  • @Steve: so you clearly disagree to what Nacht wrote, since you are saying mathematicians are not wrong, and programming is just a different world. And to that, I fully agree (disclaimer: I am both, a mathematician and a programmer ;-). – Doc Brown Sep 08 '23 at 16:47
  • @DocBrown, I'm saying I agree that "mathematicians are wrong", and I agree that "they should get used to using whole-word variable names" (if "whole-word" is to be contrasted with single-letter, but including acronyms and abbreviations which are generally regarded as non-words or word-parts). My comment about the "different world" is that things which are reasonable in mathematics - or were reasonable when its norms and practices were first established - are definitely not reasonable in programming. The conflict includes variable naming, but many other things also differ - the worlds differ. – Steve Sep 08 '23 at 18:02
26

There are two types of code documentation: documenting the interface, and documenting the code itself. One is for people who use your functions, the other is for people who maintain your functions.

When documenting an interface of published functions, classes, and methods, you're...

  • explaining what it's for
  • explaining how to use it
  • making promises about its behavior
  • pointing out any caveats
  • saving the user a lot of time

That last one is about scale. While you're learning, you're probably working on maybe a few hundred lines of code. A typical project is tens of thousands or hundreds of thousands of lines, and it relies on millions and millions of lines of dependent code. Programmers have better things to do than read every line of code they rely on, and at that scale it is simply impossible.

For example, here's my documentation for a function which centers strings. It is written in perldoc, Perl's standard documentation language. It can then be rendered in a variety of formats; here it is in HTML. Most modern languages have a standard way of embedding documentation in the code (C does not; it is 50 years old).

=head3 center

    my $centered_string = $string->center($length);
    my $centered_string = $string->center($length, $character);

Centers $string between $character. $centered_string will be of length $length.

C<$character> defaults to " ".

    say "Hello"->center(10);        # "   Hello  ";
    say "Hello"->center(10, '-');   # "---Hello--";

C<center()> will never truncate C<$string>. If $length is less than C<< $string->length >> it will just return C<$string>.

    say "Hello"->center(4);        # "Hello";

It contains examples of use, an explanation of what the function does, and a caveat about what happens if the length is shorter than the string.

Code which calls center will be relying on it behaving consistently. Without the documentation, callers would have to guess what it does based on reading the code. This can lead to misunderstanding its purpose, and it can also lead to relying on implementation details which may change in the future. By writing documentation you make a contract with the user: you promise the function will always behave this way, and the user will only rely on the documented features. With this contract in hand, you are free to change the implementation without worrying that you're going to break someone else's code ("backwards compatibility"), and the user is free to use the function without worrying that center will change and break their code.

Ideally, the users of your functions never have to look at the code at all; the documentation explains everything.
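
The same idea of documentation as a contract can be sketched in Python, where the convention is a docstring attached to the function. This is only an illustration: `center_text` is a hypothetical wrapper name, and it simply delegates to Python's built-in `str.center`, which is documented to return the original string when the requested width is not larger than the string.

```python
def center_text(text, width, fill=" "):
    """Return text centered in a string of length width, padded with fill.

    fill defaults to a single space.

    Caveat: the text is never truncated. If width is less than len(text),
    the text is returned unchanged.
    """
    # Implementation detail (not part of the promise): delegate to str.center.
    return text.center(width, fill)
```

A caller only needs the docstring: the result has the requested length, and short widths never cut the text. How the padding is computed is an implementation detail that the author remains free to change.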


When documenting the code itself, you generally don't document what the code is doing, but why it is doing it. The why is not always obvious; that knowledge often lives only in somebody's head (and maybe a commit log message). You cannot rely on being able to speak with the author: they might not be available, and asking in person does not scale.

For example, the comment here explains that the bulk of this function is working around a bug in another library, and that the workaround is also faster.

sub _get_datetime_timezone {
    state $local_tzfile = "/etc/localtime";

    # Always be sure to honor the TZ environment var
    return "local" if $ENV{TZ};

    # Work around a bug in DateTime::TimeZone on FreeBSD where it
    # can't determine the time zone if /etc/localtime is not a link.
    # Tzfile is also faster to do localtime calculations.
    if( -e $local_tzfile ) {
        # Could go through more effort to figure it out.  Meh.
        my $tzname = "Local";
        if( -l $local_tzfile ) {
            if( my $real_tzfile = eval { readlink $local_tzfile } ) {
                $tzname = $real_tzfile;
            }
        }
        require DateTime::TimeZone::Tzfile;
        my $tz = DateTime::TimeZone::Tzfile->new(
            name     => $tzname,
            filename => $local_tzfile
        );
        return $tz if $tz;
    }

    return "local";
}

Without that comment, every reader would have to puzzle it out for themselves, wasting time and maybe getting it wrong. They might even conclude the code isn't necessary and delete it, reopening an old bug.


Sometimes you do comment on what the code is doing if it is not immediately obvious. For example... (this is Perl).

    time => sub {
        my ($class, $caller) = @_;

        require perl5i::2::DateTime;

        # Export our gmtime() and localtime() and time()
        (\&perl5i::2::DateTime::dt_gmtime)->alias($caller, 'gmtime');
        (\&perl5i::2::DateTime::dt_localtime)->alias($caller, 'localtime');
        (\&perl5i::2::DateTime::dt_time)->alias($caller, 'time');
    },

(\&perl5i::2::DateTime::dt_gmtime)->alias($caller, 'gmtime'); is a mouthful, even for Perl. The comment explains what this code is doing.

However, comments have a way of falling out of date and becoming wrong. It's better to rework the code so it can be plainly understood, perhaps by writing a wrapper function that reads more clearly.

    time => sub {
        my ($class, $caller) = @_;

        require perl5i::2::DateTime;

        alias(\&perl5i::2::DateTime::dt_gmtime, as => 'gmtime', in => $caller);
        alias(\&perl5i::2::DateTime::dt_localtime, as => 'localtime', in => $caller);
        alias(\&perl5i::2::DateTime::dt_time, as => 'time', in => $caller);
    },

To a Perl programmer that means it's making the function perl5i::2::DateTime::dt_time available as the function time in the namespace $caller. And they can look up the details in the documentation for alias.

Schwern
  • 1,188
  • 7
  • 11
  • I think you are putting in comments alongside your code. Is that also considered documentation, or does the term mean I have to write a separate thing in plain English about what my code does? – Seth Sep 05 '23 at 20:40
  • 3
    @Seth There are two audiences, "users" who want to call your functions, and "maintainers" who want to change your functions (fix bugs, add features, optimize, etc); you need different documentation for each. Modern languages have ways of embedding user documentation in the code and rendering it. For example, here is the rendered documentation of center. If your user docs are good, users never have to look at the code. Comments are for maintainers, they help the maintainer understand the intentions of the code. Each has a different audience. – Schwern Sep 06 '23 at 17:49
  • @Seth Sometimes it makes sense to write the user documentation in a separate file. Example and rendered. Sometimes it makes sense to write it next to the code itself, called "embedded". Example and rendered. Note that the code also has a comment explaining the purpose of a block of code. – Schwern Sep 06 '23 at 17:58
  • IME, generally speaking, the closer the doc comments are to their code, the more likely they are to be accurate and up-to-date. (Not a hard-and-fast rule, of course; but if you can't help seeing the comments when you work on the code, then you're more likely to remember to check and update them too.) That's probably why embedded comments have become so popular! Documentation in a separate file is best kept for things like tutorials or other stuff that cuts across lots of functions/files. – gidds Sep 09 '23 at 21:29
23

A lot of your question is more of a rant than an actual question but I am somewhat sympathetic. In my decades of experience with development, I have found documentation that is useful but I've also seen a lot that was useless at best and often misleading.

If the proofs are written badly or unclear, we usually go ask someone or the proof author for clarifications.

What if they are not available? Much of my coding career has been spent working with code written by people who moved on years ago. Even if they are around, they might not remember. A standard programmer joke/experience is "Who wrote this garbage? [git blame], Oh, right. It was me."

Does there need to be an English explanation for every line of code etc.

No. This is stupid. If someone tells you this, you can safely ignore their advice on coding. This might make some sense if you were coding in assembly or some other really low-level language but there's no real good reason to do this in real production code using any popular contemporary language. I do see this in things like tutorials where the main point is to teach. This might give some people the impression that it's what they should always be doing, perhaps.

I actually think this is very bad practice. Commenting every bit of code is not only costly but such comments tend to be misleading if not outright wrong. They clutter up the code with a bunch of noise IMO. I tend to ignore trivial comments and delete them when I can.

How do I know what is good documentation versus bad documentation, etc.

To start, understand that there are many types of documentation for systems. The primary distinction I would make is that there are comments or inline documentation and there are separate documents which explain various things about a system including (but not limited to) data models, networking diagrams, tutorials, design descriptions, change notes, API specs, and requirements. I agree with the view that unit tests are a form of documentation. Developers are usually mostly responsible for inline documentation and unit tests but may be involved in some of the other forms of documentation. A person who spends a good part of their time creating designs is typically referred to as an 'architect'.
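
As a rough illustration of unit tests acting as documentation, here is a small Python sketch using the standard `unittest` module. The `clamp` function and the test names are invented for this example; the point is that each test name reads as a statement about how the function is supposed to behave.

```python
import unittest


def clamp(value, low, high):
    """Return value limited to the closed range [low, high]."""
    return max(low, min(value, high))


class ClampBehaviour(unittest.TestCase):
    # Each test name states one promise about clamp().

    def test_value_inside_range_is_unchanged(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_value_below_range_becomes_low(self):
        self.assertEqual(clamp(-3, 0, 10), 0)

    def test_value_above_range_becomes_high(self):
        self.assertEqual(clamp(42, 0, 10), 10)
```

Saved in a file, these can be run with `python -m unittest`. Unlike a prose description, a test that no longer matches the code fails loudly, which is why some people trust tests more than comments.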

Good documentation explains non-obvious things about a system. Usually, it's at a fairly high-level. For example, it can be very hard, if not impossible to understand how various subsystems work together just from looking at code and configuration artifacts. It's often helpful to have comments at the class/module/namespace level which briefly explain their purpose and usage.

APIs or libraries that are intended for use by multiple applications should have extensive documentation. I shouldn't have to read the code to figure out how Python's re module interprets patterns, for example. This is also important for calling out what behaviors are intended and will be supported in updated versions. For example, if you notice that the order of your inputs to some function is retained in outputs somewhere else, you should be careful about depending on that behavior if there's nothing stated about it in the documentation. If you are writing such a library, you should document what you intend to support over the long-term.
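
The point about ordering can be sketched in Python. Both functions below are hypothetical and exist only to show the difference between a documented promise and an accident of implementation. The first relies on the fact that Python dictionaries preserve insertion order (a language guarantee since Python 3.7); the second returns a set, for which no ordering is promised at all.

```python
def unique_items(items):
    """Return the distinct items as a list, in order of first appearance.

    The ordering is part of the documented behavior, so callers may rely on it.
    Items must be hashable.
    """
    # dict keys keep insertion order, so duplicates drop out but order stays.
    return list(dict.fromkeys(items))


def distinct_items(items):
    """Return the distinct items as a set. No ordering is promised."""
    return set(items)
```

A caller who needs a stable order should use the function that documents it, even if iterating over the set happens to produce a convenient order in a quick experiment.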

There are two kinds of things that are often referred to as comments because they are written into the code, but I would argue they are significantly different. There are comments placed near method declarations, classes, etc. which are often recognized by various tools. These, in my mind, should be classified as 'documentation'. If you are working in VSCode and hover over a method call, you will often see a popup with some text. That generally comes from these kinds of comments. They can be very useful as long as they explain things like what the method does, how the parameters are treated, and special cases. An example of a useless way of doing this is the classic JavaDoc: getFoo: gets the foo pattern. There's no good reason to simply restate the obvious in these types of comments.

The other kind of inline comments, written alongside the executable statements of code, are meant for someone who is reading the code in detail. IMO comments at this level should be rare and reserved for special scenarios. Sometimes I've run into strange things in APIs where you need to call some seemingly unrelated method first to get it to work. Someone (including myself) might come along later and think it's a mistake or some sort of cruft and remove it. In that case I usually put a warning about how it needs to be there and often a link to something explaining why. If the code is very hard to understand, that can be a reason as well, but refactoring is a better solution.
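
A sketch of that "seemingly unrelated call" situation, in Python. Everything here is made up for illustration: `LegacyReportClient` stands in for a third-party API with a hidden precondition, and the method names are hypothetical.

```python
class LegacyReportClient:
    """Hypothetical stand-in for a third-party API with a hidden precondition."""

    def __init__(self):
        self._locale_loaded = False

    def load_locale(self):
        self._locale_loaded = True

    def render_total(self, amount):
        if not self._locale_loaded:
            raise RuntimeError("locale tables not loaded")
        return f"{amount:.2f}"


def format_invoice_total(amount):
    client = LegacyReportClient()
    # WARNING: do not remove this call. render_total() fails unless
    # load_locale() has been called first, even though the two look
    # unrelated. (In real code, link to the bug report or vendor note here.)
    client.load_locale()
    return client.render_total(amount)
```

Without the warning, `client.load_locale()` looks like dead code, and a well-meaning cleanup would break `format_invoice_total`.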

JimmyJames
  • 27,287
  • Sorry if I come across as ranting. When I learn programming, I spend a lot of time learning all the syntax and commands. I thought that after months of coding in a specific language, I should be able to read another person's code fluently. I am told that people who have degrees in CS can pick up a new programming language in a few weeks and code something up that actually works. I am not sure if that means they are fluent in reading a program written in that language without the help of code comments. – Seth Sep 05 '23 at 20:32
  • 2
    @Seth No need to be sorry. I'm a bit of a ranter myself but I've had to try to control that (with varying levels of success.) It can get your question closed, though. And no, a degree in CS doesn't mean you can pick up a language in a few weeks. I have a degree in CS and I can assure you that CS is not really about programming. It's actually more akin to abstract mathematics. Once you learn one language, it becomes easier to learn another. Once you learn 2 or 3 more, you'll see the patterns ... – JimmyJames Sep 05 '23 at 20:54
  • 3
    @Seth ... Don't get me wrong. CS is really good knowledge. But it isn't really about programming per se. I'll try an analogy: programming is to CS as carpentry is to structural engineering. – JimmyJames Sep 05 '23 at 20:56
  • I edited my post to be more specific about what I am asking. Basically I did not know using English for explaining your code and alongside it is called commenting, and if the program is small, that is all one needs. I keep hearing about documentation and code commenting in that intro to coding class. I never heard that there are distinctions to be made between the two. Also what are good examples and practice of good commenting? – Seth Sep 05 '23 at 21:41
  • I made some edits to try to address your comment. – JimmyJames Sep 06 '23 at 14:38
  • thank you for the editing. I still make mistakes when composing in English even though I have been speaking and writing in it forever. – Seth Sep 06 '23 at 15:49
  • @Seth As do I ;) – JimmyJames Sep 06 '23 at 15:50
  • 3
    Even in assembly, people eventually realized that a comment like LDA #10 ;load 10 into the accumulator was a waste of space and a waste of a reader’s time. – VGR Sep 06 '23 at 17:08
  • @VGR Sure I can see that. I don't really know assembly. I was thinking of something like commenting a 'jump not zero' explaining the condition being checked. But that's machine code, right? It's been a while. – JimmyJames Sep 06 '23 at 19:24
  • Yes. Even in assembly, comments that explain why something is being done are valuable, and comments which merely parrot the obvious are noise. – VGR Sep 06 '23 at 19:30
  • 1
    @VGR Sure, LDA #10 ;load 10 into the accumulator is asinine. But LDA #10 ;10 widgets available or CMP eax, ebx ; Check if more foos than bars might be useful. – A. R. Sep 06 '23 at 19:30
  • 2
    @Seth Learning a language in a few weeks is probably doable with a CS degree. But the language is the easy part. Learning the standard library takes a lot longer. Learning the ecosystem of commonly used libraries and tools takes longer again. (And different programmers & projects will use different versions and subsets of the standard library and ecosystem, so even within the same language there can be a lot to learn when you shift contexts). Then there's the ins and outs of a specific project, which might be easy to learn or might be more knowledge than all of the above put together! – Ben Sep 06 '23 at 23:29
  • 1

    An example of a useless way of doing this is the classic JavaDoc: getFoo: gets the foo pattern

    A rule of thumb I generally stick to with coding is: "If the comment could have been auto-generated by your IDE then it has no value". In one of my previous jobs the code was littered with inanities like var c = db.Customers.Where(x => x.Country == "UK"); // Retrieve all customers in the UK

    – Richiban Sep 08 '23 at 11:17
8

The proof is in the pudding

Why are you asking this? Can't you figure it out for yourself?

Mathematics as an example

We can observe mathematical papers as an example. Their content heavily skews towards a written explanation rather than a "result dump" of the conclusion, specifically because it would otherwise require readers to redo the legwork that the paper should be proving has already been done.

I could rephrase your question about mathematics and ask why we bother teaching anything other than 1+1=2, can't these children think for themselves and figure it out? The rest is just working out, isn't it?

What are code comments?

It seems you've fallen into a very common trap. Not every comment is equally useful or relevant. An example of a bad code comment would be:

// Adds one and returns the value
public int AddOneToInput(int input)
{
    // Add 1 to the input value
    var result = input + 1;
    // Return the result
    return result;
}

These are bad comments, because they explain something that was already trivially obvious from the code itself. Return the result is not meaningfully more informative than return result; already was.

Newcomers to development often write trivial comments. Partly, it's understandable. They aren't intuitively aware of simple syntax yet, so they write down their thoughts in a comment, and then they translate these lines to actual code.
That's perfectly okay, we all have to learn and get familiar with how to express ourselves in a new programming language, but the comments should not be kept around afterwards - definitely not in a professional context where it is assumed that readers understand the programming language already.

In this sense, yes, you are correct that we shouldn't be writing these kinds of handholding comments, because we wouldn't do that in mathematics either. But the comparison to mathematics is unnecessary: it distracts with the ways in which the analogy fails, even though the specific point you wanted to make is not itself wrong.

However, let's consider what a good comment would be. First, without comment:

public bool IsEven(int n)
{
    return (n ^ 1) == (n + 1);
}

The method name tells you that we're checking if the number is even or odd, but have you understood how we are doing so?

public bool IsEven(int n)
{
    // XOR with 1 flips the rightmost bit.
    // If that increased the number by one, the rightmost bit was
    // initially 0 and therefore the number was even.
    return (n ^ 1) == (n + 1);
}

It's easy to pick at this example and tell me that I could've done n % 2 == 0, which required no real comment. I know that.

The goal here was to give you a fixed piece of logic that is not trivial to understand, thus showcasing how a comment can help make a difficult piece of logic more digestible (as opposed to the previous example where the comment added nothing of value).
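To make the contrast concrete, here is a minimal C++ sketch of the same idea. The names is_even_xor, is_even_mod and demo are made up for illustration, and the switch to unsigned arithmetic is an assumption of this sketch (it keeps n + 1 well defined at the top of the range); it is not part of the original example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C++ port of the XOR-based check. Unsigned arithmetic is an
// assumption made here so that n + 1 wraps around instead of overflowing.
constexpr bool is_even_xor(std::uint32_t n) {
    // XOR with 1 flips the lowest bit. For an even n that turns n into
    // n + 1; for an odd n it turns n into n - 1, so the comparison fails.
    return (n ^ 1u) == (n + 1u);
}

// The straightforward version: the code already says what it does,
// so it carries no comment inside the body.
constexpr bool is_even_mod(std::uint32_t n) {
    return n % 2u == 0u;
}

// Usage sketch: both versions agree on a range of sample values.
inline void demo() {
    for (std::uint32_t n = 0; n < 1000; ++n) {
        assert(is_even_xor(n) == is_even_mod(n));
    }
}
```

The comment in the first function explains why the comparison works, which the code alone does not make obvious. The second function needs no such help.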

The problem with showcasing the benefit of having comments is that comments become more necessary when the complexity increases, but the more complex an example is, the more time it takes me to explain to you all the intricacies that justify making this example so complex in the first place.

If you won't take my word for it and want to see a genuinely complex situation where comments would've been helpful, start working in a professional environment where you have to deal with deadlines and with code written by others, and where readability by other developers is low on the list of priorities.

I'm sure you can approximate this by browsing GitHub (or similar sites) and trying to read codebases that tackle non-trivial problems.

Flater
  • If a program contains multiple thousands of lines of code, isn't it better to create some sort of flowchart, something along the lines of MIT Scratch? I mean, that should be easier to digest for other people who are unfamiliar with your code. – Seth Sep 06 '23 at 07:43
  • 1
    @Seth Scratch is a programming language. Are you suggesting having two implementations of each complex program? – Caleth Sep 06 '23 at 08:15
  • @Caleth I am suggesting laying out your code using flowcharts in blocks. – Seth Sep 06 '23 at 08:18
  • 5
    @Seth What generally happens if you try to do that is that eventually the flowchart will no longer be updated, or it will be updated incorrectly, leading to deviations between the actual program and the flowchart, negating any of its benefits. – Mark Rotteveel Sep 06 '23 at 11:17
  • @Seth, it may be useful to document flows, but it has to be at a level that couldn't be easily inferred or followed in the code itself. It's never effective to simply replicate the code itself, and as Mark says, detailed flows usually go out of date and aren't trusted. The documentation should either be at a clearly different level of generality than the code itself - for example, representing a hundred pages of code as a one-page map - or it should be structured differently than the code - for example, covering the orchestration of a high-level procedure that involves scattered code. – Steve Sep 06 '23 at 13:20
  • @Seth: There's two ways to respond to this: (a) That would be documentation. Now we're really just talking about what form you want to see your documentation in, but the initial question was about the necessity of documenting in the first place. Those are two different questions. (b) Code is a flow chart. It starts at a defined entry point and from that point on you could technically read it in order (the only thing you can't account for are what values you receive from an external source). – Flater Sep 06 '23 at 22:48
  • 1
    @Seth: Be very careful with your last comment here because you're treading on "hey, [this suggestion] makes sense to me, why aren't all of us doing it then?" which is a way to naively think that you've reinvented the wheel. It's not uncommon for newcomers/juniors to stumble on this train of thought. It's human. But there is an entire world of nuance and historical experience that you are not yet aware of which is a significant driving force as to why things are the way they are. The drive for innovation is good, and don't lose that spark, but temper yourself when still learning the ropes. – Flater Sep 06 '23 at 22:51
  • @Flater, I guess the reason we have new practitioners always reinventing the wheel, is because we don't have a decent professional bible or teaching curriculum that either provides coherent ready-made answers or pre-empts common mistakes in reasoning. Too many trains of thought only hit the buffers a thousand miles down the road, way too far ahead to realise when first boarding, for anyone not equipped with knowledge by the previous generation of crash survivors! – Steve Sep 07 '23 at 08:52
  • @Steve: Not everything can be taught, some things must be learned/experienced. This touches on my closing comments in the answer about the difficulty of giving a simple example of necessary complexity (without it being trivially reducible). It also very much depends on the person - personally I learn much better from experience than from someone telling me something I don't already understand the need for. Thirdly, teaching the minutiae (a) is really boring and (b) has diminishing returns relative to curriculum time, so leaving some things to be learned via experience is okay in my book. – Flater Sep 07 '23 at 22:35
  • @Flater, I'm the same in that I prefer to direct my own studies or learn from experience, but I still think the subject is greatly neglected and not taught well. After all, it's the responsibility of the educator to show why there is a need for something - to create and set out the relevant scenario, explain the important features of the scenario, and demonstrate the point in that context. Many other professions, including law and accountancy, intermingle theoretical training and practical experience. – Steve Sep 08 '23 at 09:34
5

Here are two corkscrews. One comes with documentation.

Sure you could disassemble the complex one, or figure it out by trial and error. But you aren't trying to understand the corkscrew. You are trying to open more wine for your project manager.

[Image: the first corkscrew]

[Image: the second corkscrew]

Ewan
  • 3
    “One comes with documentation” Which one? I'm not sure I understand the analogy between code and corkscrews. – A.L Sep 05 '23 at 19:34
  • 2
    Context matters. Suppose I've just invented the corkscrew, but I die immediately after. A random passerby finds this in a field. Are they going to figure out what it was designed to do? Probably not. Unless they find it in a room filled with opened wine bottles, then they might put two and two together; but I'd argue that that is a form of documentation as well. You're not wrong that complexity of the thing correlates with the need for documenting the thing, but the general advice is to err towards avoiding complexity rather than documenting it, so this answer is a bit too oversimplified. – Flater Sep 05 '23 at 23:23
  • 1
    @A.L the point is that the second corkscrew is so simple that it doesn't need to be documented. Anyone should be able to understand how to use it even if it's the first time they see it. If code is well structured and well written, it requires less documentation, or none, and another person (or yourself 6 months in the future) will be able to work on it and understand it. – user985366 Sep 06 '23 at 12:04
  • 1
    @user985366 right, I think that this should be added in the answer, otherwise readers have to guess the intent of this analogy. – A.L Sep 07 '23 at 13:36
  • It's kinda the point of an analogy to make you think. – Ewan Sep 07 '23 at 16:39
  • Going by current software industry trends, I bet the wooden one comes with a manual :) – S.D. Sep 08 '23 at 08:22
  • Is the first corkscrew broken? I can see two different pieces of it; a link to the manual would be helpful. – pippo1980 Jan 29 '24 at 12:29
  • http://fantes.net/manuals/rabbit_corkscrew_instr.pdf No second little piece is mentioned, maybe just a bug. – pippo1980 Jan 29 '24 at 12:33
4

You seem not to appreciate the sheer size of software development compared with maths. Take the proof of Fermat's last theorem, maybe a hundred pages. Take the kind of software I’m working on, easily a million lines of code, 20,000 pages, and there is plenty of stuff of that size around. How many proofs have you worked through with even 200 lines?

The sheer amount means you don't have a chance to read the code for even a medium-sized project and understand it all. That’s why you need documentation: to give anyone a chance to know what the code does within their lifetime.

gnasher729
  • 6
    I definitely agree with this, but I'd also add, the reality is that much of the information useful to a software developer is not usually incorporated in the code at all. So sometimes it doesn't matter how long you stare at it. This may come as a surprise to a mathematician like the OP - business applications are never self-contained proofs. – Steve Sep 05 '23 at 12:37
  • 2
    To be fair, this is also not an apples to apples comparison. To understand an end to end proof of Fermat’s last theorem, you need not only Wiles’ proof but also the work leading up to it. I would also wager the density is different even for skilled professionals in both fields, so comparing page count is not necessarily the best measure. Rather than size, I would argue the more important metric is complexity, which is to be expected from the objectives of each. – yanjunk Sep 05 '23 at 15:48
  • @gnasher729 I have never built software. I think the most complex thing I will ever do is creating algorithms in the context of scientific computing or numerical analysis. I don't know the difference between code commenting and code documentation. – Seth Sep 05 '23 at 20:38
  • @Seth, Code commenting refers to the comment lines/blocks you write within the source code. Code documentation is broader. It include code commenting, but also any externally written documentation and/or graphs that describe the code, how it is structured, why it is designed this way, etc. – Bart van Ingen Schenau Sep 06 '23 at 07:24
  • @BartvanIngenSchenau When is it necessary to have a separate, more elaborate document explaining your code? Even in a professional setting, would a program of a few hundred lines, such as a new algorithm, be better served by code comments alone? And if we are designing an entire piece of software, including a user interface, that offers several algorithms to be used in sequence or individually, would separate documentation be the better choice? – Seth Sep 06 '23 at 07:41
  • @Seth, when to use external documentation depends to a large extent on the size of the code base and agreements within the team. As a rule-of-thumb, start with comment blocks within the sources and start looking at external documentation when you can't find a good place within the code where it makes sense to write the information you want to write down. – Bart van Ingen Schenau Sep 06 '23 at 08:36
  • 1
    @Seth, it's difficult to be comprehensive about all possible cases. You'd use code comments when the remarks concern something local to that piece of code. Some reasons to fall back to external documentation are (a) when the remarks aren't fully local to any piece of code (either because it concerns scattered code, or because it says something general about the design which has no proper locality anywhere in the source code), (b) when you need graphical facilities not text, (c) when the volume of information would clutter the code and justifies a separate article, or... (1/2) – Steve Sep 07 '23 at 11:45
  • ...(d) when the overall complexity handled by the documentation begets a large volume of documentation, and this volume of documentation requires structure, indexing, change control, publication, and general management and curation in its own right, in a way that doesn't necessarily align with how the source code is organised and probably isn't handled by the same staff. (2/2) – Steve Sep 07 '23 at 11:50
3

The need for documentation starts becoming obvious once you start producing software that's tens of thousands of lines of code, or more, with a team of people working on it.

Code should be well laid out, split into meaningful modules, with appropriate names for things, and comments where useful.

But once a project gets to a certain size, it's very difficult for a newcomer to work out even where to start. At that point a good software design document is a major help.

Once a project gets to a sufficient size and level of formality, you could end up with a library of documents. For example: software requirements, software coding standards, software design, software test plan, software test procedure, software build document, and so on.

Simon B
2

One core tenet about comments:

If you can express it in code instead of in a comment, do that.

This aligns with the CppCoreGuidelines.

Useful ways to make comments unnecessary:

  • Proper and informative naming of identifiers for classes, functions, variables etc. This cannot be overestimated.
  • Custom types instead of generic types. Examples are Point instead of an array with two or three elements, Foot and Meter instead of float for both (plus it keeps your space probes from crashing (see below)), enums instead of bools. For the latter, compare the signatures of SomeFunction(bool, bool) with SomeFunction(LogRequested_E, ExitOnError_E) with the enums having members like { DONT_LOG, LOG } and { DONT_EXIT, EXIT }. If your coding rules allow that, such parameters can actually be used as bools in if (logRequested) so that the using code stays concise.
  • Establishing invariants in the constructor. Stroustrup mentions that a comment like // Always call init() before first use!! has become obsolete in C++.
  • Avoiding code that is gratuitously hard to understand. If code gets optimized later and therefore becomes harder to understand, leave the original, slow and straightforward code as a reference in a comment.
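As a rough illustration of the first three points, here is a minimal C++ sketch. All names in it (Meter, Foot, to_meters, Logging, OnError, RunOptions, make_options, Line, demo) are hypothetical and chosen only for this example; they are not taken from the CppCoreGuidelines or from any real code base.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical unit types: distinct structs, so feet cannot be passed
// where meters are expected.
struct Meter { double value; };
struct Foot  { double value; };

// The only route from feet to meters is this explicit, named conversion.
constexpr Meter to_meters(Foot f) { return Meter{f.value * 0.3048}; }

// Enums instead of bools: the call site names what each argument means.
enum class Logging { Off, On };
enum class OnError { Continue, Exit };

struct RunOptions {
    bool log;
    bool exit_on_error;
};

constexpr RunOptions make_options(Logging logging, OnError on_error) {
    return RunOptions{logging == Logging::On, on_error == OnError::Exit};
}

class Line {
public:
    // The constructor establishes the invariant (length is never negative),
    // so no "call init() first" comment is needed.
    explicit Line(Meter length) : length_(length) {
        if (length.value < 0.0) {
            throw std::invalid_argument("length must not be negative");
        }
    }
    Meter length() const { return length_; }

private:
    Meter length_;
};

// Usage sketch.
inline void demo() {
    Line line(to_meters(Foot{10.0}));  // Line line(Foot{10.0}); would not compile
    assert(line.length().value > 3.047 && line.length().value < 3.049);

    RunOptions options = make_options(Logging::On, OnError::Continue);
    assert(options.log && !options.exit_on_error);
}
```

A call such as make_options(Logging::On, OnError::Continue) documents itself in a way that make_options(true, false) would not, and the compiler rejects a length given in the wrong unit instead of relying on a comment to warn about it.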

There are a number of reasons for that:

  • Comments tend to repeat information already present in code, creating redundancy. This implies that they can be or become wrong. Code maintenance must address the comments as well, creating more work.
  • Custom types and initialization make correctness automatic, or at least enforced by the compiler. For example, directly assigning feet to meters, as in the infamous Mars probe, becomes impossible with custom types.
  • Comments can never convey all information present in the code. The source code, when available, is the ultimate reference, which makes it important that it is intelligible. I have heard that TCP/IP became a success partly because BSD 4.2 had a working implementation whose source code could be inspected (even if it was not in the public domain at the time). But you could see how the system actually worked without re-engineering it.

By implication, this guideline also tells you that you should comment on what cannot be expressed in code.

For me, that's often

  • where a function is used (although modern IDEs help with finding usages, and like all comments these can age and become wrong as well);
  • non-obvious interactions with other code, if present;
  • non-obvious and unavoidable restrictions for parameters, when it can be called, etc.;
  • why a certain approach was chosen over another, any research which informed the decisions (benchmarking, papers);
  • what a maintainer should pay attention to later, if anything;
  • for longer functions or modules: an introduction to the design and the strategies chosen, which helps a maintainer get an overall understanding of and mental framework for dealing with your code. While this information is in principle ideally expressed in the code (by proper modularization, naming, clear coding etc.), it is hard for a newcomer to extract from it.
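A small sketch of what such a comment might look like in C++. The function contains_id, the scenario and the stated rationale are all invented for illustration; in particular, the "why" paragraph is an assumed design argument, not a measured result.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns true if `id` occurs in `sorted_ids`.
//
// Restriction (not expressible in the signature): `sorted_ids` must be
// sorted in ascending order, because std::binary_search relies on it.
//
// Why this approach (illustrative rationale, not a measured result): the
// list is assumed to be built once and queried many times, so keeping it
// sorted and using a logarithmic lookup avoids a second data structure.
//
// Maintainers: any code that appends to the list has to re-sort it before
// calling this function.
inline bool contains_id(const std::vector<int>& sorted_ids, int id) {
    return std::binary_search(sorted_ids.begin(), sorted_ids.end(), id);
}

// Usage sketch.
inline void demo() {
    std::vector<int> ids = {42, 7, 19};
    std::sort(ids.begin(), ids.end());  // establish the precondition
    assert(contains_id(ids, 19));
    assert(!contains_id(ids, 20));
}
```

None of the three comment paragraphs repeats what the one-line body already says; each records something a reader could not recover from the code alone.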
  • Indeed! I came to say something very similar. It's not really about how many or few comments you write, or how long they are. The goal is to make your code as clear as possible. The best way of doing that is to use clear, simple, consistent naming, organisation, formatting, and (most importantly, and least commonly) thought processes. Then use comments for anything else readers might find helpful. – gidds Sep 09 '23 at 21:36
1

You should add comments which will be useful to the readers. The difficult part is defining "useful", and, to some extent, "readers".

Defining the audience can help work out what's useful. First, the readers always include you yourself. Make sure comments are useful for your future self, and you're probably 50% of the way there. Document (not necessarily in the code, but also in the project readme, issue tracker, or other knowledge repositories) what you think you will need to know when re-reading the code a year from now. Then, is it going to be read by other people you know? What do you think will be useful to them? And, finally, is it going to be read by complete strangers, possibly from other cultural backgrounds, fields, etc? This last one is really important. You need to be clear, factual, and concise, or your writing is guaranteed to be misunderstood. And a misunderstood comment is worse than no comment, because readers will have to eventually notice their confusion, correct for it, and by then will be annoyed at having lost time and effort for no gain.

l0b0
1

Things that are obvious do not need a comment.

However it's not always clear what's obvious: For example for someone reading C language code without knowing the C language, almost nothing is obvious.

Unfortunately, there are sometimes stupid "company coding guidelines", so you might find code like

x = x + 1; /* add one to variable x */

(maybe also known as the COBOL mistake)

In contrast the line if (!((ch+1) & ch) || !*s) would benefit from some comment as only experienced programmers do recognize the pattern immediately.
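As a sketch of what such a comment could say, here is one possible reading of that line in C++. The answer does not state what ch and s are, so this assumes ch is an unsigned value and s points to a NUL-terminated string; the names is_all_low_bits_set, matches and demo are made up for the example.

```cpp
#include <cassert>

// True when x is 0 or has the form 2^k - 1 (binary 1, 11, 111, ...).
// Adding 1 to such a value carries through every set bit, so x + 1 and x
// share no bits and the AND is zero.
constexpr bool is_all_low_bits_set(unsigned x) {
    return ((x + 1u) & x) == 0u;
}

// Hypothetical wrapper around the original condition, with `ch` assumed to
// be an unsigned value and `s` a pointer to a NUL-terminated string.
inline bool matches(unsigned ch, const char* s) {
    const bool at_end_of_string = (*s == '\0');  // same test as !*s
    return is_all_low_bits_set(ch) || at_end_of_string;
}

// Usage sketch.
inline void demo() {
    assert(is_all_low_bits_set(7u));   // binary 111
    assert(!is_all_low_bits_set(6u));  // binary 110
    assert(matches(6u, ""));           // empty string satisfies the second test
    assert(!matches(6u, "x"));
}
```

Giving the bit trick a name and a two-line explanation is one way to make it readable; whether that matches the intent of the original line depends on context the answer does not show.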

IMHO it does not make any sense to add comments into the code, enabling (or trying to enable) people who do not understand the language (the basics at least) to understand the code. Maybe just add a high-level comment for each routine (following the classic top-down design principle), explaining what it does at an abstract level (as the details should be "obvious" from the actual code). It's named procedural abstraction.

Probably it's preferable to make the code readable (for humans, not for the compiler only) instead of commenting unreadable code. One of my instructors once said (AFAIR): "If you have a line of code you are especially proud of, that line will cause most trouble in the long run; re-write it to make it simple and understandable."

(Today you don't have to write ugly code for performance reasons as most compilers can do that for you.)

Also, some company guidelines require every function to have a comment explaining who changed that function last, and when he/she did that (possibly with a log of changes at the function level). I think with current source code management systems such comments are outdated, specifically when considering that the comments might exceed the actual code in size, effectively making it harder to read as a whole.

Maybe consider the drastic approach ("an exercise in corporate downsizing") found in the Eiffel Style Guidelines by Bertrand Meyer.

The other question is: What is the purpose of the comments?

  • Comments can help the original author to understand his/her own code when revising it after a long time
  • Comments can help programmers having to understand or update foreign code
  • If people different from programmers have to write program or user documentation, they might base their work on what the programmers "had left" for them.
  • Comments add details that cannot be seen from the code directly (such as the algorithm being implemented, or references to literature). This could be important, specifically as there have been books with incorrect algorithms.
  • Some languages also know about formal comments that are part of the language specification. Typically the compiler or other development tools handle such comments in a special way, like generating interface descriptions from the code automatically.
U. Windl
  • The first sentence is the core. That's even true for API documentation: Unless you work in an environment with formal documentation requirements I'd rather not comment what Line::SetLength(float mm) is doing. Note the naming of the variable: If we had called it len or the like we'd have had to comment on it, informing the user about the units. The old adage, emphasized by Bjarne Stroustrup: "If you can express the comment in code, do that and eliminate the (need for a) comment." I'll write an answer to that effect. – Peter - Reinstate Monica Sep 07 '23 at 09:22
0

Schwern’s answer is quite good, explaining that there are 2 kinds of comments/ documentation, but leaves a bit out.

API/interface documentation is formal, done in a prescribed manner and is generally concerned with letting the reader understand how to interact with the API. This is generally not intended for someone implementing the API, but for someone utilizing it. In fact it is quite common for an API to be written/implemented in an entirely different language from the language where it is most commonly used. It is distinct from and generally accessed entirely without reference to the implementing code. This documentation is read whenever one wants to learn if and how an API can be leveraged to do a task.

Code comments, although frequently discussed as “documenting intent”, are NOT formal and are never stored or read separately from the code. Good code comments make the code easier to read by providing context for what it does and why it does it in a particular way. It’s possible that some syntax may need to be explained, but that is extremely rare. I recently had a bug-fix that could have been done by deleting one line of code, but which (IMO) was best done by making changes in 7-8 files, adding comments and renaming variables and functions so that my bug would not be reimplemented. I made those changes so that anyone reading the code and comments would understand how things worked and would thus never even think to add that line of code. And if someone else (or future me) needs to make some changes, they will understand the process.

As for how to write either kind of documentation…

Formal documentation, as for an API, will have a defined format; choosing or being required to write it will lead you to the format. Mainly what people will be looking for isn’t so much “good” as “thorough” and understandable. Designing the API will be harder than documenting it. It’s relatively easy to do: just cover everything that is publicly shared about how it works.

Informal documentation, aka code comments, is another matter entirely. Large swaths of code need nothing more than good naming conventions and standard patterns. The hard part is stepping back from what you know right now, and envisioning what you might need six months from now if you made some kind of mistake or need to tweak how things work. Frequently you won’t find out what comment you need to make until you do come back to the code and need to understand it or figure out how to change it.

jmoreno