
What I want is to include zero-count categories when generating frequencies for categorical variables.

example:

df = pd.DataFrame({'col1': ['a', 'c', 'a', 'e'],
                   'col2': ['x', 'y', 'y', 'z']})

col1 col2
   a    x
   c    y
   a    y
   e    z

What I want is to generate a frequency table with counts and percentages, including the zero-count categories.

Result:

   Counts  Percentage
a       2       50.0%
b       0        0.0%
c       1       25.0%
d       0        0.0%
e       1       25.0%

What I have done is generate the frequency table with counts and percentages, but I also need to include the zero-count categories, like b and d, as illustrated above.

Here's what I have tried:

pd.concat([df.col1.value_counts(dropna=False),
           df.col1.value_counts(normalize=True,
                                dropna=False).mul(100).round(1).astype(str) + '%'],
          axis=1,
          keys=('Counts','Percentage'))

Any help, please?

cbare
Espoir
  • Hello there, could you please try to use Code blocks so your tables look cleaner, and give a clear example of what you try to do ? I can't understand the question... – Adept Sep 23 '20 at 09:50

4 Answers


I got the answer: use Series.reindex with a list of all categories; any category that doesn't match is filled with 0.

  categories = ['a', 'b', 'c', 'd', 'e']
  pd.concat([df.col1.value_counts().reindex(categories, fill_value=0),
             df.col1.value_counts(normalize=True)
                    .reindex(categories, fill_value=0)
                    .mul(100).round(1).astype(str) + '%'],
            axis=1,
            keys=('Counts', 'Percentage'))

This worked for me
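A variant of the same idea, in case it's useful: count once, reindex once, and derive the percentage column from those counts rather than calling value_counts twice. This is just a sketch built on the example data from the question:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'c', 'a', 'e'],
                   'col2': ['x', 'y', 'y', 'z']})
categories = ['a', 'b', 'c', 'd', 'e']

# Count once, fill the missing categories with 0, then compute
# percentages from the same counts Series.
counts = df['col1'].value_counts().reindex(categories, fill_value=0)
freq = pd.DataFrame({
    'Counts': counts,
    'Percentage': (counts / counts.sum() * 100).round(1).astype(str) + '%',
})
print(freq)
```

This avoids scanning the column twice and guarantees the two columns stay consistent with each other.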

Ben
Espoir
  • Welcome to DS SE! If you answered your own question, would you please mark it as such? Click the checkmark under the vote buttons on your answer. – Ben Sep 23 '20 at 14:30

One option is to use Pandas Categoricals. Categorical types define all the values the variables might take on, even if none of those values are observed in the data.

abcde_categorical = pd.CategoricalDtype(['a', 'b', 'c', 'd', 'e'], ordered=True)
xyz_categorical = pd.CategoricalDtype(['x', 'y', 'z'], ordered=True)

df = pd.DataFrame({
    'col1': pd.Categorical(['a', 'c', 'a', 'e'], dtype=abcde_categorical),
    'col2': pd.Categorical(['x', 'y', 'y', 'z'], dtype=xyz_categorical),
})

  col1 col2
0    a    x
1    c    y
2    a    y
3    e    z

Then, the frequency table can be computed as follows:

pd.concat([
        df.col1.value_counts(sort=False),
        df.col1.value_counts(sort=False, normalize=True),
    ],
    axis=1
)
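As a self-contained sanity check of the claim above (the DataFrame here is just the example data rebuilt inline): once the dtype is categorical, Series.value_counts reports the unobserved categories with a count of zero.

```python
import pandas as pd

abcde_categorical = pd.CategoricalDtype(['a', 'b', 'c', 'd', 'e'], ordered=True)
s = pd.Series(pd.Categorical(['a', 'c', 'a', 'e'], dtype=abcde_categorical))

# All five categories appear in the result, including the
# unobserved 'b' and 'd', which get a count of 0.
counts = s.value_counts(sort=False)
print(counts)
```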

Sadly, as of Pandas 2.1.1, DataFrame.value_counts doesn't respect categoricals the way Series.value_counts does. You'd think this would work:

pd.concat([
        df.value_counts(sort=False),
        df.value_counts(sort=False, normalize=True),
    ],
    axis=1
)

...but you'd get:

           count  proportion
col1 col2
a    x         1        0.25
     y         1        0.25
c    y         1        0.25
e    z         1        0.25

The reindex trick is one way to fix that:

index_values = [
    (col1, col2)
    for col1 in abcde_categorical.categories
        for col2 in xyz_categorical.categories
]
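An equivalent way to build that full index is pd.MultiIndex.from_product, which takes the Cartesian product of the category lists directly; the result can be passed to reindex the same way. A sketch:

```python
import pandas as pd

abcde = ['a', 'b', 'c', 'd', 'e']
xyz = ['x', 'y', 'z']

# Cartesian product of the two category lists as a 2-level index:
# ('a', 'x'), ('a', 'y'), ..., ('e', 'z') — 5 * 3 = 15 entries.
full_index = pd.MultiIndex.from_product([abcde, xyz], names=['col1', 'col2'])
print(full_index)
```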

pd.concat([
        df.value_counts(sort=False).reindex(index_values, fill_value=0),
        df.value_counts(sort=False, normalize=True).reindex(index_values, fill_value=0),
    ],
    axis=1
)

Alternatively, DataFrame.groupby does respect categoricals. The argument observed=False is saying, "I want the zeros!"

df_freq = pd.DataFrame(df.groupby(["col1", "col2"], observed=False).size(), columns=['count'])
df_freq['proportion'] = df_freq['count'] / df_freq['count'].sum()

Maybe there's a nicer way to make groupby compute proportions, I don't know.
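One candidate for that nicer way (a sketch, not necessarily the canonical approach): divide the size() result by its own sum in the same chain, so counts and proportions come from a single groupby.

```python
import pandas as pd

abcde_categorical = pd.CategoricalDtype(['a', 'b', 'c', 'd', 'e'], ordered=True)
xyz_categorical = pd.CategoricalDtype(['x', 'y', 'z'], ordered=True)
df = pd.DataFrame({
    'col1': pd.Categorical(['a', 'c', 'a', 'e'], dtype=abcde_categorical),
    'col2': pd.Categorical(['x', 'y', 'y', 'z'], dtype=xyz_categorical),
})

# size() yields one count per (col1, col2) combination, zeros included
# thanks to observed=False; dividing by its own sum gives proportions.
counts = df.groupby(['col1', 'col2'], observed=False).size()
df_freq = counts.to_frame('count').assign(proportion=counts / counts.sum())
print(df_freq)
```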

cbare

I didn't read this whole thing, but I stumbled upon this same question. You can just convert the column you're running value_counts() on to a string, and that will include NaNs and 0s.

max

I couldn't understand the question, but if I only consider the title, "including empty categories with pandas value_counts()", here's the way to do it:

df['COLUMN'].value_counts(dropna=False)
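For what it's worth, a minimal self-contained illustration of what dropna=False does: it keeps NaN as its own row in the counts; it does not, by itself, add zero-count categories.

```python
import pandas as pd

s = pd.Series(['a', 'c', 'a', None])

# With dropna=False the missing value gets its own row,
# so the counts sum to the full length of the Series.
counts = s.value_counts(dropna=False)
print(counts)
```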
Adept