import numpy as np
from sklearn import preprocessing, cross_validation, neighbors
import pandas as pd
from sklearn.linear_model import LinearRegression
df = pd.read_csv('Downloads/breast-cancer-wisconsin.data.txt',skiprows=1)
df.replace('?', -99999, inplace=True)
df.drop('id', 1, inplace=True)
X = np.array(df.drop(['class'], 1))
y = np.array(df['class'])
X_train, X_test, y_train, y_test = cross_validation.train_test_split(X,y,test_size=0.2)
#clf = neighbors.KNeighborsClassifier()
clf = LinearRegression(normalize=True)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(accuracy)
example_measures = np.array([[4,2,1,1,1,2,3,2,1],[4,2,1,2,2,2,3,2,1]])
example_measures = example_measures.reshape(1,-1)
prediction = clf.predict(example_measures)
print(prediction)
When I run the code above on Ubuntu or in Anaconda, I get this error:
ValueError: query data dimension must match training data dimension
How can I solve this problem? By running the statements one at a time, I found that the error is raised at this line:
prediction = clf.predict(example_measures)
If I use this instead:
prediction = clf.predict(X_test)
it works. But I really want to predict the examples I created myself. How should I change the code?
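
One likely cause, sketched here under the assumption that the training data has nine feature columns once 'id' and 'class' are dropped: example_measures holds two samples, and reshape(1, -1) flattens them into a single row of 18 values, so the number of features no longer matches what the model was trained on. The snippet below uses only NumPy to compare the two shapes; the names flattened and fixed are introduced purely for illustration.

```python
import numpy as np

# Two samples with nine features each (the values from the question).
example_measures = np.array([[4, 2, 1, 1, 1, 2, 3, 2, 1],
                             [4, 2, 1, 2, 2, 2, 3, 2, 1]])

# reshape(1, -1) merges both samples into one row of 18 values,
# which would not match a model trained on 9 feature columns.
flattened = example_measures.reshape(1, -1)
print(flattened.shape)

# Keep one row per sample instead: (number of samples, number of features).
fixed = example_measures.reshape(len(example_measures), -1)
print(fixed.shape)

# With the fitted model clf from the code above, the call would then be:
# prediction = clf.predict(fixed)
```

Using len(example_measures) as the first dimension keeps the reshape correct if more samples are added later. For a single sample, reshape(1, -1) would still be appropriate.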