
My Imports:

# Importing modules 
import numpy as np 
import pandas as pd 
import os
import matplotlib.pyplot as plt
import cv2

from keras.utils import to_categorical
from keras.layers import Dense,Conv2D,Flatten,MaxPool2D,Dropout
from keras.models import Sequential

from sklearn.model_selection import train_test_split

My Code:

np.random.seed(1)

train_images = []
train_labels = []

shape = (108,108)

label_path = 'Beer/ModelSet/'

train_labels.append('miller_lite')
train_labels.append('stella_artois')
train_labels.append('michelob_ultra')
train_labels.append('belgian_blue')

for folder in os.listdir(label_path):
    for files in os.listdir(label_path+folder):
        img = cv2.imread(os.path.join(label_path,folder,files))
        train_images.append(img)

train_labels = np.asarray(pd.get_dummies(train_labels).values)

train_images = np.asarray(train_images)

x_train, x_val, y_train, y_val = train_test_split(train_images, train_labels, random_state=1)

The part that fails

    x_train, x_val, y_train, y_val = train_test_split(train_images, train_labels, random_state=1)

The reason it fails

ValueError: Found input variables with inconsistent numbers of samples: [20000, 4]

My Error:

2020-08-03 23:47:11.117431: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "/Attempt-2.py", line 40, in <module>
    x_train, x_val, y_train, y_val = train_test_split(train_images, train_labels, random_state=1)
  File "python3.8/site-packages/sklearn/model_selection/_split.py", line 2127, in train_test_split
    arrays = indexable(*arrays)
  File "python3.8/site-packages/sklearn/utils/validation.py", line 293, in indexable
    check_consistent_length(*result)
  File "python3.8/site-packages/sklearn/utils/validation.py", line 256, in check_consistent_length
    raise ValueError("Found input variables with inconsistent numbers of"
ValueError: Found input variables with inconsistent numbers of samples: [20000, 4]

The Goal:

Create a classification model that distinguishes between 4 different brands of beer bottles: Miller Lite, Stella Artois, Michelob Ultra, and Belgian Blue.

Background Note:

I have no practical or work experience in software engineering, data science, software development, or machine learning. I am just a student messing around for my own amusement.

The Question:

How exactly do I fix this? I understand the problem is that matrix x is not the same size as matrix y. X is 20000 images of size 108x108 with 3 RGB channels. Y is the label matrix [[1 0 0 0][0 1 0 0][0 0 1 0][0 0 0 1]]. According to the error, both x and y need to have the same length before they can be split into train/test sets.
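
The mismatch described above can be reproduced on a much smaller scale. This is only a sketch: the 20 blank arrays stand in for the real 20000 images, and the variable names `images`, `labels`, and `error_message` are made up for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): 20 blank "images" instead of the real 20000.
images = np.zeros((20, 108, 108, 3), dtype=np.uint8)

# Only 4 label rows, one per class, as in the question.
labels = np.eye(4, dtype=np.uint8)

try:
    train_test_split(images, labels, random_state=1)
    error_message = None
except ValueError as exc:
    # train_test_split requires every input to have the same first dimension.
    error_message = str(exc)

print(images.shape[0], labels.shape[0])
print(error_message)
```

The first dimension of each array is the number of samples, so 20 images against 4 label rows fails in the same way as 20000 against 4.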

Ben
jmk98

You need 20K values for train_labels, i.e. one for each data point (image). Then create the dummies and do the split. – 10xAI Aug 04 '20 at 07:51
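
A minimal sketch of the order suggested in this comment: one label per sample first, then the dummies, then the split. It uses 20 blank arrays instead of real images and assumes 5 images per class purely for illustration. The names `class_names`, `images`, `labels`, and `one_hot` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

class_names = ['miller_lite', 'stella_artois', 'michelob_ultra', 'belgian_blue']

# Stand-in data (assumption): 5 blank "images" per class, 20 in total.
images = np.zeros((20, 108, 108, 3), dtype=np.uint8)

# One label per image: each class name repeated 5 times, 20 entries in total.
labels = np.repeat(class_names, 5)

# One-hot encode after the labels are per-sample; result has shape (20, 4).
# Note that get_dummies orders its columns alphabetically by class name.
one_hot = pd.get_dummies(labels).to_numpy()

# Both inputs now have 20 rows, so the split succeeds.
x_train, x_val, y_train, y_val = train_test_split(images, one_hot, random_state=1)

print(x_train.shape, y_train.shape)
print(x_val.shape, y_val.shape)
```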

1 Answer


You have to create a label for each image and then split the data into train and test sets. I believe you have 20,000 images, so you also need one label per image, not just an array of the 4 categories. One of the most important steps in training a DNN model for image classification (as in your case), or for any image-related task, is creating labels (or annotations, for other problem statements). I think you have only created an array of what your labels are and haven't associated them with the actual images.
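
One way to associate each image with its label is to record the label at the same moment the file is found. The sketch below assumes that each subfolder is named after its brand, which the question does not state explicitly. `collect_paths_and_labels` is a hypothetical helper, and the demo runs on a throwaway directory of empty files rather than real images.

```python
import os
import tempfile


def collect_paths_and_labels(root):
    """Return parallel lists: one file path and one folder-name label per file."""
    paths, labels = [], []
    for folder in sorted(os.listdir(root)):
        folder_path = os.path.join(root, folder)
        if not os.path.isdir(folder_path):
            continue  # skip stray files sitting next to the class folders
        for name in sorted(os.listdir(folder_path)):
            paths.append(os.path.join(folder_path, name))
            labels.append(folder)  # label appended together with its image
    return paths, labels


# Demo on a temporary directory (assumption: folder name == class label).
with tempfile.TemporaryDirectory() as root:
    for brand in ['miller_lite', 'stella_artois']:
        os.makedirs(os.path.join(root, brand))
        for i in range(3):
            open(os.path.join(root, brand, f'img_{i}.jpg'), 'w').close()
    paths, labels = collect_paths_and_labels(root)

print(len(paths), len(labels))  # 6 6
print(labels)
```

In the real script, the image would be read and appended to the image list inside the same inner loop, so the two lists always grow together and end up with the same length.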

Vivek