I'm one year through both an undergraduate CS and undergraduate math degree, currently studying Big-O analysis, and am trying to develop a "mental checklist" as it were, or a kind of procedural algorithm, for performing Big-O analysis of mathematical functions.
In this post, I'm merely asking some kind soul to point out any obvious flaws in my understanding of Asymptotic Analysis. A detailed summary of Asymptotic Analysis based up my understanding of the subject follows ( a tl;dr of the summary is at the end ):
The First Step: Organize The Methods of Asymptotic Analysis
In mathematics, there are functions that return numbers; there are also functions that return sets of functions.
Using Java-like notation, we can express the Big-O functions like this:
interface AsymptoticAnalysis { public static Function littleOh(Function f); public static Function bigOh(Function f); public static Function littleOmega(Function f); public static Function bigOmega(Function f); public static TwoTuple littleTheta(Function f); public static TwoTuple bigTheta(Function f); }
Establish a Sorted List of Generic Mathematical Functions
Discussions of Big-O Analysis in Computer Science usually provide the following list of general, mathematical functions, sorted by resource requirements:
- $ f(n) = c $
- $ f(n) = log(n) $
- $ f(n) = n $
- $ f(n) = n log(n) $
- $ f(n) = n^c $
- $ f(n) = c^n $
- $ f(n) = n! $
- $ f(n) = n^n $
Let each number in the list above be the formal name for a category of functions. For example, let us say that functions whose asymptotic behavior can be generalized as $f(n) = c$ are of Category One.
For example, $f(n) = 3^{14} + 4^2 × 3 log_4(62) = c$. So, we could say this particular $f(n)$ is of Category One.
$f(n) = 3n^{0.5} = c n^{0.5}$, where $c = 3$. Therefore this $f(n)$ is of Category Five.
And so on.
Here are a few more examples of functions often taken as objects of asymptotic analysis in Computer Science:
While they can not all be reduced to polynomials, they are essentially sums of terms. Each term may or may not have a constant in its base or exponent. Each term may or may not have a base or constant whose value is determined by $n$.
Categorize Each Term in a Given Function
Let us analyze $f_a(n)$, above.
$$f_a(n) = 7 log_{10}(6n^3+3n^5)+n^{0.2}$$
$f_a(n)$ has two terms. The first term is:
$$7 log_{10}(6n^3+3n^5)$$
The second term is:
$$n^{0.2}$$
The first term is analogous to $log (n)$, and is therefore Category Four. The second term is analogous to $n^c$, and is therefore Category Five.
Category Five terms contribute more to the value of functions than Category Four terms as $n$ approaches infinity; they dominate the asymptotic behavior of the function. As a result, $f_a(n)$ itself behaves very much like a Category Five function at sufficiently large values, and is therefore also Category Five.
In general we can say, if every term in a function has a different categorical, asymptotic behavior as its input approaches infinity, the behavior of the highest order category determines the asymptotic behavior of the function.
Take $f_g(n)$ as an example:
$$f_g(n) = 3^n + 8^n + n$$
Of the function's three terms, two are Category Six and one Category Three. Therefore, $f_g(n)$ is Category Six; it's asymptotic behavior will be described using a function from the family of functions with the form $f(n) = c^n$
Revisit the Java Interface Described Above
Let us revisit the rough Java interface described above.
interface AsymptoticAnalysis { public static Function littleOh(Function f); public static Function bigOh(Function f); public static Function littleOmega(Function f); public static Function bigOmega(Function f); public static TwoTuple littleTheta(Function f); public static TwoTuple bigTheta(Function f); }
Each method in the interface takes an instance of some Function
class and returns either an instance of the Function
class, or a two-tuple of instances of the Function
class.
AsymptoticAnalysis#littleOh
and AsymptoticAnalysis#bigOh
both take in a specific function and produce another, specific function. The function produced is guaranteed to have an equal or larger value than the function passed in for all values of $n$ greater than some, specific value of $n$.
AsymptoticAnalysis#littleOmega
and AsymptoticAnalysis#bigOmega
work similarly, except the function produced is guaranteed to have an equal or smaller value than the function passed in for all values of $n$ greater than some, specific value of $n$.
AsymptoticAnalysis#littleTheta
and AsymptoticAnalysis#bigTheta
return a two-tuple of functions. The first function in the tuple is guaranteed to have an equal or smaller value than the function passed in for all values of $n$ greater than some, specific value of $n$. The second function in the tuple is guaranteed to have an equal or greater value than the function passed in for all values of $n$ greater than some, specific value of $n$. Both instances of the Function
class in this tuple may or may not have the same state; that is, they can represent equivalent functions.
Implementing the AsymptoticAnalysis
Class
The AsymptoticAnalysis#littleOmega
and AsymptoticAnalysis#bigOmega
methods both return a function that describes a lower bound for the function passed in as their first parameter. ( These methods differ in how tight the returned lower bound is known to be. )
The generic, hand-wavy way of establishing a lower-bound for a given function in Computer Science is to merely remove terms from the input function's mathematical expression.
For example, if the given function has terms of Category Seven, Three and One, return the same function with at least one of the terms removed:
$$f(n) = log_7(4) n! + 7n + 5^{0.4} \geq log_7(4) n! = g(n)$$
Above, I arbitrarily removed the lower-order terms, but any term with a non-negative value in the domain of $n$ could have been removed. Key is to arbitrarily make the value of $g(n)$ less than or equal to the value of the value of $f(n)$ in some subset of $n$ by any mathematically accurate means.
That subset of $n$ is defined by the largest $n$ at which $f(n)$ and $g(n)$ intersect. Namely, that subset of $n$ is all values of $n$ greater than or equal to the largest intersection of $f$ and $g$.
Next, use the same analysis used on the terms of $f$ to find the categorical type of the function $g(n)$:
$$g(n) = log_7(4) n! \cong c n! \implies g(n) \in \text{Category Seven}$$
In Java pseudocode:
Function inputFunction = new Function("log_7(4) n! + 7n + 5^{0.4}"); inputFunction.getCategory(); // 7 Function outputFunction = AsymptoticAnalysis.littleOh(inputFunction); outputFunction.toString().equals("n!"); // true outputFunction.getCategory(); // 7 outputFunction.getIntersections(inputFunction).max(); // Some number in [0, Infinity )
Another way of solving the above problem is to:
- Identify the category of $f$
- Return a function from a lower category
Function inputFunction = new Function("log_7(4) n! + 7n + 5^{0.4}"); inputFunction.getCategory(); // 7 Function outputFunction = AsymptoticAnalysisCheat.littleOh(inputFunction); outputFunction.toString().equals("5^n"); // true outputFunction.getCategory(); // 6 outputFunction.getIntersections(inputFunction).max(); // Some number in [0, Infinity ) outputFunction = AsymptoticAnalysisCheatMore.littleOh(inputFunction); outputFunction.toString().equals("n log(n)"); // true outputFunction.getCategory(); // 4 outputFunction.getIntersections(inputFunction).max(); // Some number in [0, Infinity ) outputFunction = AsymptoticAnalysisCheatALot.littleOh(inputFunction); outputFunction.toString().equals("1"); // true outputFunction.getCategory(); // 1 outputFunction.getIntersections(inputFunction).max(); // Some number in [0, Infinity )
If the input function has a lot of terms of the same category, it can be expressed using an indexed notation ( vis. $\sum$, $\prod$ ). To find a lower bound of such a function, simply make the number of indexes fewer.
For example:
$$f(n) = \sum_{i=1}^{45} log_i(5) \geq \sum_{i=1}^{44} log_i(5) = g(n)$$
$f(n)$ is indexed from one to forty five; $g(n)$ is indexed from one to zero to forty four. Otherwise, $f(n)$ and $g(n)$ are the same function. $g(n)$ is less than or equal to $f(n)$ because $g(n)$ has fewer terms than $f(n)$ at every value of $n$.
Make sure the terms omitted from your lower bound have a positive value: if your input function has a term like $-log_5(\pi)$, excluding this term from your output function will only serve to make your output function larger.
Example:
$$ f(n) = \sum_{i=1}^2 (-1)^i log_5^i(\pi) = log_5^2(\pi) - log_5(\pi) \ngeq log_5^2(\pi) = \sum_{i=2}^2 (-1)^i log_5^i(\pi) = g(n)$$
AsymptoticAnalysis#littleOh
and AsymptoticAnalysis#bigOh
both return an upper bound for the given function. They work similarly to AsymptoticAnalysis#littleOmega
and AsymptoticAnalysis#bigOmega
, except these methods are implemented by either increasing the value of at least one term, or increasing the category of the highest-order term(s) in the resulting function.
Finally, AsymptoticAnalysis#littleTheta
and AsymptoticAnalysis#bigTheta
are implemented by calling one of the omega methods for a lower bound, one of the oh methods for the upper bound, and returns the resulting functions in a two-tuple.
public static TwoTuple bigTheta(Function f) { Function lowerBound = AsymptoticAnalysis.bigOmega(f); Function upperBound = AsymptoticAnalysis.bigOh(f); return new TwoTuple(lowerBound,upperBound); }
If both functions in that tuple happen to be mathematically equivalent, we can say something like:
$$ f(n) = \theta ( g(n) )$$
Proofs
Lastly, in many computer science classes on Asymptotic Analysis, charts like the following show up:
In the chart above, $f$ is our input function and $g$ is our output function.
The limit notation is merely introduced in computer science classes on Asymptotic Analysis as a means of proof: if our AsymptoticAnalysis#littleOmega
method, given a function $f$, returns a function $g$--we can prove the AsymptoticAnalysis#littleOmega
method was correct by showing:
$$\lim_{n \rightarrow \infty} \frac{f(n)}{g(n)} = \infty$$
In other words, limit notation and L'Hopital are introduced in undergraduate CS courses on Asymptotic Analysis to serve as unit tests of our understanding of Big-O, and the other methods of Asymptotic Analysis.
It can also be used to demonstrate that the various categories of functions are sorted properly.
Tl;dr
- Given a mathematical function, categorize its terms by asymptotic behavior
- Note that the least efficient term dominates the asymptotic behavior of the function
- Use a lowering technique to find a lower bound of the given function
- Use a raising technique to find an upper bound of the given function
- If these two bounds are mathematically equivalent, we can say $f(n) = \theta(g(n))$. Note, however that $f(n) = \theta(g(n))$ is making a statement about three functions: $f(n)$, a lower bound, and an upper bound.