
I'm trying to find the same number of occurrences in both data frames

This is a follow-up to my previous question. I have 2 data frames:

import pandas as pd

df1=pd.DataFrame([[1,None],[1,None],[1,None],[1,'item_a'],[2,'item_a'],[2,'item_b'],[2,'item_f'],[3,'item_e'],[3,'item_e'],[3,'item_g'],[3,'item_h']],columns=['id','A'])
df2=pd.DataFrame([[1,'item_a'],[1,'item_b'],[1,'item_c'],[1,'item_d'],[2,'item_a'],[2,'item_b'],[2,'item_c'],[2,'item_d'],[3,'item_e'],[3,'item_f'],[3,'item_g'],[3,'item_h']],columns=['id','A'])

 df1
        id  A
    0   1   None
    1   1   None
    2   1   None
    3   1   item_a # id 1 has 1 non-null occurrence in total in df1
    4   2   item_a
    5   2   item_b
    6   2   item_f # id 2 has 3 occurrences in total in df1
    7   3   item_e
    8   3   item_e
    9   3   item_g
    10  3   item_h # id 3 has 4 occurrences in total in df1

 df2
        id  A
    0   1   item_a
    1   1   item_b
    2   1   item_c
    3   1   item_d
    4   2   item_a
    5   2   item_b
    6   2   item_c
    7   2   item_d
    8   3   item_e
    9   3   item_f
    10  3   item_g
    11  3   item_h


I got an answer on how to find the rows that both data frames have in common, using an inner merge.

Previous result:
d=pd.merge(df1,df2,how='inner')
        id  A
3   1   item_a # id 1 has 1 occurrence in total in d
4   2   item_a
5   2   item_b # id 2 has 2 occurrences in total in d, which does not match its 3 occurrences in df1
7   3   item_e
8   3   item_e
9   3   item_g
10  3   item_h # id 3 has 4 occurrences in total in d

What I've tried in order to find the ids with the same number of occurrences in both data frames:
d[d['id'].value_counts()==df1['id'].value_counts()]
This gave me an error: Can only compare identically-labeled Series objects
I've also tried using rename to give the value_counts result a column name and then merging the two, but that failed as well.

Match: the count of occurrences of an id in df1 equals its count in the result data frame d.

            cnt_in_df1 | cnt_in_d
for id 1:       1      |    1      # match    => id 1 should be in the desired output
for id 2:       3      |    2      # mismatch => id 2 should not be in the desired output
for id 3:       4      |    4      # match    => id 3 should be in the desired output

My desired output for this question:

    id  count 
0   1    1
1   3    4

1 Answer


EDIT: Thanks for clarifying the question. So the problem now is to check whether the count of each id is the same in the two data frames.

Here is how you could go about it:

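# Count the non-null 'A' rows per id in df1 and in the merged frame d
d1 = pd.DataFrame(df1[~df1['A'].isnull()].groupby("id").size())
d2 = pd.DataFrame(d[~d['A'].isnull()].groupby("id").size())

# Join the two count frames on id; the unnamed count columns become "0_x" (df1) and "0_y" (d)
counts = pd.merge(d1,d2,on="id")

# Keep the ids whose counts are equal in both frames
ids_ = counts[counts["0_x"] == counts["0_y"]].index.values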

RETURN: array([1, 3])

This will now give an array of ids where the counts in both df1 and d are the same.
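
If the id/count table from the question is wanted rather than a bare array, the same idea can be sketched with two per-id count Series aligned on their index. This is only a sketch under a few assumptions: the names cnt_df1, cnt_d, matched and result are made up for illustration, and it relies on groupby(...)['A'].count() skipping null values, which is why the None rows for id 1 are not counted. Aligning the index explicitly with reindex is also what avoids the "Can only compare identically-labeled Series objects" error from the question.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[1, None], [1, None], [1, None], [1, 'item_a'],
     [2, 'item_a'], [2, 'item_b'], [2, 'item_f'],
     [3, 'item_e'], [3, 'item_e'], [3, 'item_g'], [3, 'item_h']],
    columns=['id', 'A'])
df2 = pd.DataFrame(
    [[1, 'item_a'], [1, 'item_b'], [1, 'item_c'], [1, 'item_d'],
     [2, 'item_a'], [2, 'item_b'], [2, 'item_c'], [2, 'item_d'],
     [3, 'item_e'], [3, 'item_f'], [3, 'item_g'], [3, 'item_h']],
    columns=['id', 'A'])

# Rows present in both frames (merged on the shared columns id and A)
d = pd.merge(df1, df2, how='inner')

# Per-id counts of non-null 'A' values; both Series are indexed by id
cnt_df1 = df1.groupby('id')['A'].count()
cnt_d = d.groupby('id')['A'].count()

# Align cnt_d to cnt_df1's index so the comparison is label-by-label;
# ids missing from d get a count of 0
cnt_d = cnt_d.reindex(cnt_df1.index, fill_value=0)

# Keep only the ids whose counts agree, then shape them as an id/count table
matched = cnt_df1[cnt_df1 == cnt_d]
result = matched.rename('count').reset_index()
print(result)
```

For the sample data this keeps ids 1 and 3 with counts 1 and 4, which is the desired output shown in the question.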

shepan6
  • I'm not looking for the size or value_counts() of data frame 'd'. I'm trying to find the ids that have the same number of occurrences in df1 and d. My desired output is a data frame with the columns id and count and the rows (1, 1) and (3, 4). – I know nothing jon snow Jun 29 '20 at 12:06
  • Right, ok. One query: you state in your original question that id 2 has 2 occurrences in total in d (which does not match df1), but those rows do match df1 (the indices, ids and A values are the same in both d and df1). Could you make more explicit in your question what is meant by a match? – shepan6 Jun 29 '20 at 13:20
  • Sure! I'll edit the question. – I know nothing jon snow Jun 29 '20 at 14:07
  • Thank you @Iknownothingjonsnow for clarifying the question, I have edited my answer accordingly. – shepan6 Jun 30 '20 at 10:05