Questions tagged [machine-learning]

How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?

From The Discipline of Machine Learning by Tom Mitchell:

The field of Machine Learning seeks to answer the question "How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes?" This question covers a broad range of learning tasks, such as how to design autonomous mobile robots that learn to navigate from their own experience, how to data mine historical medical records to learn which future patients will respond best to which treatments, and how to build search engines that automatically customize to their user's interests. To be more precise, we say that a machine learns with respect to a particular task T, performance metric P, and type of experience E, if the system reliably improves its performance P at task T, following experience E. Depending on how we specify T, P, and E, the learning task might also be called by names such as data mining, autonomous discovery, database updating, programming by example, etc.

3322 questions
18
votes
1 answer

Why does a radial basis function kernel imply an infinite dimension map?

I understand that each kernel implies a particular feature map. For instance, for $x,z \in \mathbb{R}^2$ the kernel $K(x,z)=(\textrm{dot}(x,z))^2$ implies a feature map $$\langle\phi(x), \phi(z)\rangle=\langle [x_1^2 , x_1 x_2 , x_1 x_2, x_2^2], [z_1^2 ,…
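A sketch of the standard argument, assuming the Gaussian RBF kernel $K(x,z)=\exp(-\|x-z\|^2/2\sigma^2)$: factor out the norms and Taylor-expand the cross term,

$$\exp\!\Big(\frac{x\cdot z}{\sigma^2}\Big)=\sum_{k=0}^{\infty}\frac{(x\cdot z)^k}{\sigma^{2k}\,k!},$$

so the kernel is an infinite weighted sum of polynomial kernels $(x\cdot z)^k$. Each degree-$k$ term has a finite feature map of monomials, but the full sum requires one coordinate per monomial of every degree, hence an infinite-dimensional $\phi$.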
17
votes
3 answers

The median distance from the origin to the closest data point and the curse of dimensionality

I'm reading The Elements of Statistical Learning. I have a question about the curse of dimensionality. In section 2.5, p. 22: Consider $N$ data points uniformly distributed in a $p$-dimensional unit ball centered at the origin. Suppose we consider a…
chyojn
  • 503
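For reference, ESL's closed form for this median is $d(p,N)=\big(1-\tfrac{1}{2}^{1/N}\big)^{1/p}$: the nearest radius exceeds $r$ with probability $(1-r^p)^N$, and setting that equal to $\tfrac12$ gives the median. A minimal Python check (function names are my own); since only distances to the origin matter, it is enough to sample radii, and a uniform point in the unit $p$-ball has radius distributed as $U^{1/p}$:

```python
import random

def median_nearest_distance(p, n):
    """Closed-form median distance from the origin to the nearest of
    n points uniform in the unit p-ball: (1 - (1/2)**(1/n))**(1/p)."""
    return (1 - 0.5 ** (1 / n)) ** (1 / p)

def simulate(p, n, trials=2000, seed=0):
    """Monte Carlo estimate: draw n radii as U**(1/p) (the radius of a
    uniform point in the unit p-ball), keep the minimum, and return the
    median of that minimum over many trials."""
    rng = random.Random(seed)
    mins = []
    for _ in range(trials):
        radii = [rng.random() ** (1 / p) for _ in range(n)]
        mins.append(min(radii))
    mins.sort()
    return mins[len(mins) // 2]
```

With $p=10$ and $N=500$ the median is already above $0.5$, i.e. more than halfway to the boundary, which is the point of the ESL example.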
16
votes
4 answers

How to calculate Vapnik-Chervonenkis dimension

It's my first post here, so I apologize if I broke a rule! I'm reading Introduction to Machine Learning and got stuck on VC dimension. Here's a quote from the book: "...we see that an axis-aligned rectangle can shatter four points in two dimensions.…
16
votes
3 answers

Gradients of marginal likelihood of Gaussian Process with squared exponential covariance, for learning hyper-parameters

The derivation of gradient of the marginal likelihood is given in http://www.gaussianprocess.org/gpml/chapters/RW5.pdf But the gradient for the most commonly used covariance function, squared exponential covariance, is not explicitly given. I am…
aaronqli
  • 527
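For anyone landing here: the general form is in the linked chapter (Rasmussen & Williams, eq. 5.9), and the squared-exponential case follows by differentiating the covariance entries directly. With $\alpha = K^{-1}y$,

$$\frac{\partial}{\partial\theta_j}\log p(y\mid X,\theta)=\frac12\,\mathrm{tr}\!\Big((\alpha\alpha^{\top}-K^{-1})\,\frac{\partial K}{\partial\theta_j}\Big),$$

and for $k(x,x')=\sigma_f^2\exp\!\big(-\tfrac{\|x-x'\|^2}{2\ell^2}\big)$ the entries of $\partial K/\partial\theta_j$ are

$$\frac{\partial k}{\partial \sigma_f}=\frac{2k}{\sigma_f},\qquad \frac{\partial k}{\partial \ell}=k\,\frac{\|x-x'\|^2}{\ell^3}.$$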
12
votes
8 answers

Where to start Machine Learning?

I've recently stumbled upon machine learning: generating user recommendations based on user data, generating text teasers based on an article. Although there are tons of frameworks that do this (Apache Mahout, the Duine framework, and more), I wanted to…
12
votes
3 answers

Understanding “the mean minimizes the mean squared error”

I am trying to understand the sentence "the mean minimizes the mean squared error" from Wikipedia: https://en.wikipedia.org/wiki/Average_absolute_deviation. From a previous post, "Formal proof that mean minimize squared error function", I can see a…
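The usual one-line argument: for a random variable $X$ with mean $\mu$ and any constant $c$, add and subtract $\mu$ inside the square; the cross term $2E[X-\mu](\mu-c)$ vanishes, leaving

$$E\big[(X-c)^2\big]=E\big[(X-\mu)^2\big]+(\mu-c)^2\;\ge\;E\big[(X-\mu)^2\big],$$

with equality iff $c=\mu$, so the mean is the unique minimizer of the mean squared error. (The analogous quantity on that Wikipedia page, mean absolute deviation, is minimized by the median instead.)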
12
votes
3 answers

Could somebody elaborate "dimensional space" and "hyperplane"?

I am reading a text related to SVMs, and the mathematical language is giving me a bit of a hard time. Here training vectors $x_i$ are mapped into a higher (maybe infinite) dimensional space by the function $\theta$. SVM finds a linear separating…
Karl
  • 875
10
votes
1 answer

Are there conjectures in deep learning theory?

I often read that deep learning suffers from a lack of theory compared to classical machine learning. I mean that deep learning has been shown to be a powerful tool in practice, but there is no proof of this effect in theory. Which leads to my question:…
9
votes
3 answers

Derivation of gradient of SVM loss

Here is the loss function for SVM: I can't understand how the gradient w.r.t. w(y(i)) is: Can anyone provide the derivation? Thanks
Nikhil Verma
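Assuming the question refers to the multiclass hinge loss as it is usually written (the images in the question are not reproduced here),

$$L_i=\sum_{j\neq y_i}\max\big(0,\;w_j^{\top}x_i-w_{y_i}^{\top}x_i+\Delta\big),$$

each active (positive-margin) term contributes $-x_i$ to the gradient with respect to $w_{y_i}$:

$$\nabla_{w_{y_i}}L_i=-\Big(\sum_{j\neq y_i}\mathbf{1}\big[w_j^{\top}x_i-w_{y_i}^{\top}x_i+\Delta>0\big]\Big)\,x_i,$$

since $\max(0,\cdot)$ has derivative $0$ where it is inactive and passes the inner derivative through where it is active.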
6
votes
3 answers

Properties of VC dimension

I have some difficulties with understanding the notion of VC dimension. The following is one of the questions I want to answer. Q: If there is a set $|S|=k$ such that the hypothesis space $H$ doesn't shatter it, does it mean that $VC(H)$…
user16168
  • 697
6
votes
1 answer

Boltzmann "soft max" distribution

The formula is: $$ p(i)=\frac{e^\frac{f(i)}{T}}{\displaystyle \sum_j e^\frac{f(j)}{T}} $$ Prove: 1) each $p(i)$ is a number between $0$ and $1$, no matter what the fitness is (positive or negative). This scheme does not require that fitness has…
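Both claims follow because every exponential is strictly positive and each numerator also appears in the denominator's sum. A small Python sketch (the shift by the maximum cancels between numerator and denominator, so it changes nothing mathematically while avoiding overflow):

```python
import math

def boltzmann(fitness, T):
    """Boltzmann ("softmax") selection probabilities
    p(i) = e^{f(i)/T} / sum_j e^{f(j)/T}.
    Subtracting max(f) before exponentiating scales numerator and
    denominator by the same factor e^{-max/T}, so the ratios are
    unchanged but large fitnesses no longer overflow."""
    m = max(fitness)
    exps = [math.exp((f - m) / T) for f in fitness]
    total = sum(exps)
    return [e / total for e in exps]
```

Negative fitnesses are handled for free: $e^{f/T}>0$ for any real $f$, so each $p(i)$ lies in $(0,1)$ and the list sums to $1$.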
6
votes
1 answer

How to weight Jaccard Similarity

I'd like to calculate the similarity between two sets using Jaccard but temper the results using the relative frequency of each item within a corpus. Jaccard is defined as the magnitude of the intersection of the two sets divided by the magnitude of…
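One common generalization (the function names and the IDF-style weighting below are my own illustration, not a standard API): replace set cardinalities with per-item weights, scoring $\sum_{x\in A\cap B} w(x)\,/\,\sum_{x\in A\cup B} w(x)$, which reduces to plain Jaccard when all weights are equal.

```python
def weighted_jaccard(a, b, weight):
    """Weighted Jaccard similarity of two sets: sum of weights over the
    intersection divided by sum of weights over the union. `weight`
    maps an item to a nonnegative score (e.g. an IDF-style weight so
    that agreement on corpus-rare items counts more)."""
    union = set(a) | set(b)
    num = sum(weight(x) for x in set(a) & set(b))
    den = sum(weight(x) for x in union)
    return num / den if den else 0.0
```

Using inverse document frequency as the weight tempers the score exactly as asked: sharing a common item barely moves the similarity, while sharing a rare one moves it a lot.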
6
votes
2 answers

Linear Function: f(ax + by) = af(x) + bf(y) vs y = ax+b.

I studied a linear function definition in our Machine Learning course: $f(ax + by) = af(x) + bf(y)$. Using this definition, we can prove that $y = ax + b$ is NOT a linear function. This is counter-intuitive to me. For example: $f(x) = 2x + 3$ is NOT…
Di Wang
  • 511
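The confusion is terminological: in the linear-algebra sense, "linear" requires additivity and homogeneity, and an affine map $y = ax + b$ with $b \neq 0$ fails both; the calculus usage ("the graph is a straight line") is broader. A quick numeric check:

```python
def f(x):
    # Affine, not linear: a linear part 2*x plus a constant offset 3.
    return 2 * x + 3

def g(x):
    # Linear in the algebraic sense: no constant offset.
    return 2 * x

# Additivity f(x + y) == f(x) + f(y) fails for f because the
# offset 3 is counted once on the left but twice on the right:
lhs = f(1 + 1)     # f(2) = 2*2 + 3 = 7
rhs = f(1) + f(1)  # 5 + 5 = 10

# ...but additivity holds for g:
assert g(1 + 1) == g(1) + g(1)
```

Affine maps are "linear plus a shift"; only the $b = 0$ case satisfies the course definition.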
5
votes
1 answer

Clustering with Nearest Neighbours algorithm

I have to make a clustering using the Nearest Neighbours algorithm (I know that it is actually used for classification, but anyway). Could you please point me to good scientific articles on this kind of clustering, especially for big data? I cannot find…
5
votes
1 answer

SVM maximum-margin distance

I am studying SVMs. It's known that the distance between the two hyperplanes is $\frac{2}{\left \| w \right \|}$. The problem is I cannot prove this. Let's start. We have two hyperplanes, $w \cdot x +b = 1$ and $w \cdot x +b = -1$. A plane is defined by $n$ and…
com
  • 5,612
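A standard derivation: take any $x_1$ on $w\cdot x+b=1$ and any $x_2$ on $w\cdot x+b=-1$. Subtracting the two plane equations gives $w\cdot(x_1-x_2)=2$, and the distance between parallel hyperplanes is the projection of $x_1-x_2$ onto the unit normal $w/\|w\|$:

$$\frac{w}{\|w\|}\cdot(x_1-x_2)=\frac{2}{\|w\|}.$$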