How do I answer this metric question with the data I have?

Question

I want to see what is contributing to the increase in appointments being made by patients. I have two datasets I'm analyzing and I'm not sure if I can compare a column from one with a column from the other. This is a (simplified) example of my data:

Patients
ID   | month_joined
A110 | jan 2013
A111 | feb 2013
A112 | april 2013

Appointments
ID   | month_of_appt | number_of_appts
A110 | jan 2013      | 2
A110 | feb 2013      | 1
A111 | april 2013    | 5
A112 | dec 2013      | 7

Does it make sense to try and compare the number of appointments per month with the month joined? I want to see if I can answer the question "does the number of patients joining have anything to do with the increase in appointments made?" Is that a bad way of doing the analysis? Should I be looking at something else?

B McMinn · Answer 1 · 2022-10-10T17:35:51.823

I am new to data analysis, but here are my thoughts.

I'm not sure why this information needs to be in two tables.

If there isn't a stated reason, then I would look at the length of each DataFrame and check whether the primary key (ID) is ever repeated. If they are the same length and ID is never duplicated, then I'd look at merge (pandas, in Python) or JOIN (SQL) to combine them on ID.
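As a rough sketch of those checks in pandas, using the sample rows from the question (the variable names patients, appointments and merged are only illustrative): in this sample the ID repeats in Appointments, so the sketch assumes a many-to-one merge onto Patients rather than a one-to-one join.

```python
import pandas as pd

# Sample rows copied from the question.
patients = pd.DataFrame({
    "ID": ["A110", "A111", "A112"],
    "month_joined": ["jan 2013", "feb 2013", "april 2013"],
})
appointments = pd.DataFrame({
    "ID": ["A110", "A110", "A111", "A112"],
    "month_of_appt": ["jan 2013", "feb 2013", "april 2013", "dec 2013"],
    "number_of_appts": [2, 1, 5, 7],
})

# Compare the table lengths and check whether ID is unique in each table.
same_length = len(patients) == len(appointments)
patients_id_unique = patients["ID"].is_unique
appointments_id_unique = appointments["ID"].is_unique

# Attach month_joined to every appointment row.
# validate="many_to_one" raises an error if ID is duplicated in patients.
merged = appointments.merge(patients, on="ID", how="left", validate="many_to_one")
print(merged)
```

If ID turned out to be duplicated in Patients as well, the validate argument would make the merge fail loudly instead of silently multiplying rows.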

Then add a column with the difference between the two time variables (month_of_appt minus month_joined) and see whether that correlates with number_of_appts.
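A minimal sketch of that step, assuming the two tables have already been combined into one frame with the columns from the question. The helper month_index and the column months_since_joining are hypothetical names, and the month labels are parsed by hand because the sample mixes short and long month names ("jan", "april").

```python
import pandas as pd

# Map the first three letters of a month name to its number.
MONTHS = {"jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12}

def month_index(label):
    """Convert a label such as 'april 2013' to a running month count."""
    name, year = label.split()
    return int(year) * 12 + MONTHS[name[:3].lower()]

# Combined rows from the question (one row per patient and appointment month).
merged = pd.DataFrame({
    "ID": ["A110", "A110", "A111", "A112"],
    "month_joined": ["jan 2013", "jan 2013", "feb 2013", "april 2013"],
    "month_of_appt": ["jan 2013", "feb 2013", "april 2013", "dec 2013"],
    "number_of_appts": [2, 1, 5, 7],
})

# New column: months elapsed between joining and the appointment month.
merged["months_since_joining"] = (
    merged["month_of_appt"].map(month_index)
    - merged["month_joined"].map(month_index)
)

# Pearson correlation between time since joining and appointment count.
correlation = merged["months_since_joining"].corr(merged["number_of_appts"])
print(merged[["ID", "months_since_joining", "number_of_appts"]])
print(correlation)
```

With only four sample rows the coefficient is not meaningful on its own; the point is just the shape of the calculation, which would be run on the full data.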
