I have a CSV file with 150 columns belonging to 7 categories, but I only want the correlation between 2 of those categories: movies and music, which have 12 and 19 columns respectively.

Is there a way to plot a correlation matrix or a correlation graph between two of the categories and selected columns?

For example, 19 columns on the x-axis and 12 columns on the y-axis. Alternatively, combining the 12 and 19 columns and computing the correlation among only those 31 columns instead of all 150.

I'm using Python. Which packages could help me?

Brian Spiering
Maha Kamal

1 Answer

I recommend starting from the following example and adjusting its arguments to fit your data:

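import pandas as pd
import matplotlib.pyplot as plt

cmap = plt.get_cmap('gnuplot')
# scatter_matrix lives in pandas.plotting; the old top-level pd.scatter_matrix is no longer available
scatter = pd.plotting.scatter_matrix(YOUR_TRAINING_DATA, c=YOUR_LABELS_OF_TRAINING, marker='o', s=40, hist_kwds={'bins': 15}, figsize=(12, 12), cmap=cmap)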

The code and image are taken from a Coursera data science course.
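A scatter matrix gets crowded with 31 columns, so a correlation heatmap restricted to the two categories may be closer to what the question asks for. The following is only a minimal sketch under stated assumptions: the column names (`music_0`, `movie_0`, ...) and the random data are hypothetical stand-ins for the real CSV, which you would load with `pd.read_csv` and index with your own lists of column names. It covers both options from the question: the 31 x 31 matrix over the combined columns, and the 12 x 19 block with music on x and movies on y.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the real file (assumption); in practice something like:
#   df = pd.read_csv("your_file.csv")
rng = np.random.default_rng(0)
music_cols = [f"music_{i}" for i in range(19)]  # hypothetical column names
movie_cols = [f"movie_{i}" for i in range(12)]  # hypothetical column names
other_cols = [f"other_{i}" for i in range(5)]   # stands in for the remaining columns
df = pd.DataFrame(
    rng.normal(size=(200, len(music_cols) + len(movie_cols) + len(other_cols))),
    columns=music_cols + movie_cols + other_cols,
)

# Option 1: Pearson correlation among the 31 selected columns only
corr_all = df[music_cols + movie_cols].corr()

# Option 2: keep just the cross-category block (12 movie rows x 19 music columns)
cross_corr = corr_all.loc[movie_cols, music_cols]

# Heatmap of the cross block: music on the x-axis, movies on the y-axis
fig, ax = plt.subplots(figsize=(10, 6))
im = ax.imshow(cross_corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(music_cols)))
ax.set_xticklabels(music_cols, rotation=90)
ax.set_yticks(range(len(movie_cols)))
ax.set_yticklabels(movie_cols)
fig.colorbar(im, ax=ax, label="Pearson correlation")
fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the figure. If seaborn is installed, `seaborn.heatmap(cross_corr)` should give a similar plot with less code.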

Green Falcon