
How did we arrive at the sigmoid function for calculating probabilities?

Why not use some other function that "squashes" the values to lie in [0, 1]? Maybe even just normalise the values so that they all add up to one.


1 Answer


I think a really nice explanation for the popularity of the sigmoid function is in these lecture notes (http://www.stat.cmu.edu/~cshalizi/uADA/12/lectures/ch12.pdf)

  1. The most obvious idea is to let $p(x)$ be a linear function of $x$. Every increment of a component of $x$ would add or subtract so much to the probability. The conceptual problem here is that $p$ must be between $0$ and $1$, and linear functions are unbounded. Moreover, in many situations we empirically see “diminishing returns” — changing $p$ by the same amount requires a bigger change in $x$ when $p$ is already large (or small) than when $p$ is close to $1/2$. Linear models can’t do this.
  2. The next most obvious idea is to let $\log p(x)$ be a linear function of $x$, so that changing an input variable multiplies the probability by a fixed amount. The problem is that logarithms are unbounded in only one direction (for $p \in (0,1)$, $\log p$ takes values in $(-\infty, 0)$, so it is bounded above), and linear functions are not.
  3. Finally, the easiest modification of $\log p$ which has an unbounded range is the logistic (or logit) transformation, $\log \frac{p}{1-p}$. We can make this a linear function of $x$ without fear of nonsensical results. (Of course the results could still happen to be wrong, but they’re not guaranteed to be wrong.)
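
To make point 3 concrete, here is a minimal Python sketch (my own illustration, not from the notes): inverting the logit, i.e. solving $\log \frac{p}{1-p} = z$ for $p$, gives exactly the sigmoid $p = 1/(1 + e^{-z})$, which stays strictly inside $(0, 1)$ and shows the "diminishing returns" behaviour mentioned in point 1.

```python
import numpy as np

def sigmoid(z):
    """Inverse of the logit: solve log(p / (1 - p)) = z for p."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """The logit transform: maps (0, 1) onto the whole real line."""
    return np.log(p / (1.0 - p))

z = np.linspace(-6.0, 6.0, 7)
p = sigmoid(z)
print(p)                            # every value stays strictly inside (0, 1)
print(np.allclose(logit(p), z))     # True: sigmoid and logit are inverses

# "Diminishing returns": the same unit step in z moves p far less
# when p is already close to 0 or 1 than when p is near 1/2.
print(sigmoid(1.0) - sigmoid(0.0))  # ~0.231
print(sigmoid(5.0) - sigmoid(4.0))  # ~0.011
```

As for the question's other suggestion, simply normalising raw values so they sum to one fails when values can be negative; exponentiating first (the softmax) fixes this, and in the two-class case the softmax reduces to exactly this sigmoid.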
Comment: "The problem is that logarithms are unbounded in only one direction, and linear functions are not." Aren't logarithms actually unbounded in both directions? – pX0r Jun 08 '18 at 06:42