My Imports:
# Importing modules
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
import cv2
from keras.utils import to_categorical
from keras.layers import Dense,Conv2D,Flatten,MaxPool2D,Dropout
from keras.models import Sequential
from sklearn.model_selection import train_test_split
My Code:
np.random.seed(1)
train_images = []
train_labels = []
shape = (108,108)
label_path = 'Beer/ModelSet/'
train_labels.append('miller_lite')
train_labels.append('stella_artois')
train_labels.append('michelob_ultra')
train_labels.append('belgian_blue')
for folder in os.listdir(label_path):
    for files in os.listdir(label_path+folder):
        img = cv2.imread(os.path.join(label_path,folder,files))
        train_images.append(img)
train_labels = np.asarray(pd.get_dummies(train_labels).values)
train_images = np.asarray(train_images)
x_train, x_val, y_train, y_val = train_test_split(train_images, train_labels, random_state=1)
The part that fails:
x_train, x_val, y_train, y_val = train_test_split(train_images, train_labels,
random_state=1)
The reason it fails:
ValueError: Found input variables with inconsistent numbers of samples: [20000, 4]
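The two numbers in the message are the lengths of the two arrays along their first axis. A minimal sketch of that mismatch, using stand-in arrays (tiny 2x2 placeholder images rather than the real 108x108 ones, and the names `images` and `labels` are only for illustration):

```python
import numpy as np

# Stand-in arrays with the same first-axis lengths as in the script above.
# The 2x2 image size is an assumption made only to keep the example small.
images = np.zeros((20000, 2, 2, 3), dtype=np.uint8)  # one entry per image
labels = np.eye(4, dtype=np.uint8)                   # one one-hot row per brand

n_images = len(images)
n_labels = len(labels)
lengths_match = n_images == n_labels

print(n_images, n_labels, lengths_match)
```

The split needs one label row per image, so the two lengths have to be equal before the arrays are passed to train_test_split.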
My Error:
2020-08-03 23:47:11.117431: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
File "/Attempt-2.py", line 40, in <module>
x_train, x_val, y_train, y_val = train_test_split(train_images, train_labels, random_state=1)
File "python3.8/site-packages/sklearn/model_selection/_split.py", line 2127, in train_test_split
arrays = indexable(*arrays)
File "python3.8/site-packages/sklearn/utils/validation.py", line 293, in indexable
check_consistent_length(*result)
File "python3.8/site-packages/sklearn/utils/validation.py", line 256, in check_consistent_length
raise ValueError("Found input variables with inconsistent numbers of"
ValueError: Found input variables with inconsistent numbers of samples: [20000, 4]
The Goal:
Create a classification model that distinguishes between 4 different brands of beer bottles: Miller Lite, Stella Artois, Michelob Ultra, and Belgian Blue.
Background Note:
I have no practical or work experience in software engineering, data science, software development, or machine learning. I am just a student experimenting for my own amusement.
The Question:
How exactly do I fix this? I understand that the problem is that x and y do not have the same number of samples. x is 20000 images of size 108x108 with 3 RGB channels. y is the label matrix [[1 0 0 0] [0 1 0 0] [0 0 1 0] [0 0 0 1]], which has only 4 rows, one per brand rather than one per image. To split the data into train/test sets, the error says that x and y must have the same length.
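One possible direction, sketched under assumptions: append a label inside the same loop that appends each image, so the label list grows in step with the image list. The sketch below replaces the folder scan with a hypothetical dictionary `files_per_class` (class name mapped to a file count) and uses blank placeholder arrays instead of `cv2.imread`, so it runs without the dataset. It also assumes that each sub-folder of `label_path` is named after its brand, which the script above does not confirm.

```python
import numpy as np

# Hypothetical stand-in for the folder scan: class name -> number of files.
# In the real script these would come from os.listdir(label_path + folder).
files_per_class = {
    "belgian_blue": 3,
    "michelob_ultra": 2,
    "miller_lite": 4,
    "stella_artois": 1,
}

class_names = sorted(files_per_class)  # fixed class order for the one-hot columns
train_images = []
train_labels = []

for class_index, name in enumerate(class_names):
    for _ in range(files_per_class[name]):
        # Placeholder for reading (and resizing) one image from disk.
        train_images.append(np.zeros((108, 108, 3), dtype=np.uint8))
        # One label per image, appended in the same iteration.
        train_labels.append(class_index)

train_images = np.asarray(train_images)
# Turn integer class indices into one-hot rows: shape (n_images, n_classes).
train_labels = np.eye(len(class_names), dtype=np.uint8)[train_labels]

print(train_images.shape, train_labels.shape)
```

With both arrays sharing the same first dimension, the train_test_split call from the script receives inputs of equal length. Note that `np.asarray` only produces a regular 4-D array if every image has the same size, so real images of different dimensions would need to be resized to `shape` first.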