
The question is whether we can find a correlation between two sets of grades (categorical data).

Let’s say we have a dog competition and there are 1000 dogs participating.

There are two rounds of assessment:

In the first round, dog owners give their assessments on a scale from A to C, where A is excellent and C is bad. There are four assessment criteria used in both rounds (behaviour, etc.).

In the second round, a single judge assesses each dog on the same criteria as in round 1; however, the grades are E - exceeding expectations, M - meeting expectations, and B - below expectations.

We understand the two scales to correspond as follows: E is A, M is B, and B is C.

After two rounds our table would look like:


| dog             | round one | round two |
| --------------- | --------- | --------- |
| Dog1_criteria1  | A         | B         |
| Dog1_criteria2  | A         | E         |
| Dog1_criteria3  | A         | E         |
| Dog1_criteria4  | B         | M         |
| Dog2_criteria1  | A         | E         |
| Dog2_criteria2  | B         | M         |
| Dog2_criteria3  | A         | E         |
| Dog2_criteria4  | C         | B         |
| ...             | ...       | ...       |
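To make the two columns directly comparable, the round-two grades can first be mapped onto the round-one scale (a small sketch; the variable names are just for illustration):

```python
# Mapping taken from the description above: E -> A, M -> B, B -> C.
to_round_one = {"E": "A", "M": "B", "B": "C"}

# Round-two grades from the table above (Dog1 rows, then Dog2 rows).
round_two = ["B", "E", "E", "M", "E", "M", "E", "B"]
converted = [to_round_one[g] for g in round_two]
# converted == ["C", "A", "A", "B", "A", "B", "A", "C"]
```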

How do we find a correlation between the two sets of answers? Thank you!

JohnM
  • How about this: https://datascience.stackexchange.com/questions/893/how-to-get-correlation-between-two-categorical-variable-and-a-categorical-variab?rq=1? – TwinPenguins Jan 30 '21 at 14:01
    Remark: these variables are not really categorical, they are ordinal because there is an order: A>B>C, E>M>B. – Erwan Jan 30 '21 at 15:19
  • I am afraid previous posts do not answer the question, as they offer an analysis of contingency tables so that we can see whether there is a relationship say between gender and drinking behaviour (drinks usually, always, etc.) and I need to find correlation between two sets of answers for multiple variables... – Толя Поднебесный Jan 30 '21 at 15:37
  • This isn't a duplicate of https://datascience.stackexchange.com/questions/893/how-to-get-correlation-between-two-categorical-variable-and-a-categorical-variab. That question is broader and asks for a correlation measure of two different variables. This question asks specifically about ratings from two raters using similar scales. – JohnM Feb 08 '21 at 17:47

1 Answer


You can treat this as an inter-rater agreement problem and use Cohen's weighted kappa or a similar measure. Weighted kappa takes into account both the distribution of ratings in each round and the degree of disagreement between grades.

Three matrices are involved: the matrix of observed scores, the matrix of expected scores based on chance agreement, and the weight matrix. Weight-matrix cells on the diagonal (upper left to lower right) represent agreement and thus contain zeros. Off-diagonal cells contain weights indicating the degree of separation between ratings. Often, cells one step off the diagonal are weighted 1, those two steps off are weighted 2, and so on (source: Wikipedia).

The equation for weighted κ is:

$$\kappa = 1 - \frac{{\sum_{i=1}^k}{\sum_{j=1}^k}w_{ij}x_{ij}}{{\sum_{i=1}^k}{\sum_{j=1}^k}w_{ij}m_{ij}}$$

where k is the number of codes and $w_{ij}, x_{ij}, m_{ij}$ are elements in the weight, observed, and expected matrices, respectively. When the diagonal cells contain weights of 0 and all off-diagonal cells contain weights of 1, this formula produces the same value as unweighted Cohen's kappa.

In practice, you may want to use an implementation in Python, R, or a statistical software package rather than manual calculations.
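As a rough sketch of the formula above in plain Python (my own function name and linear weights $w_{ij} = |i - j|$; library implementations offer more options):

```python
from itertools import product

def weighted_kappa(ratings_a, ratings_b, categories):
    """Weighted Cohen's kappa with linear weights w_ij = |i - j|.

    `categories` lists the grades in order, e.g. ["A", "B", "C"].
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed matrix x: counts of each (rater A grade, rater B grade) pair.
    x = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        x[index[a]][index[b]] += 1

    # Expected matrix m: outer product of the marginal counts, scaled by n.
    row = [sum(x[i][j] for j in range(k)) for i in range(k)]
    col = [sum(x[i][j] for i in range(k)) for j in range(k)]
    m = [[row[i] * col[j] / n for j in range(k)] for i in range(k)]

    # Numerator and denominator of the kappa formula, with weights |i - j|.
    num = sum(abs(i - j) * x[i][j] for i, j in product(range(k), repeat=2))
    den = sum(abs(i - j) * m[i][j] for i, j in product(range(k), repeat=2))
    return 1 - num / den
```

Applied to the eight converted rows in the example table below, this gives roughly 1 − 3/6.5 ≈ 0.54.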

Here is some intuition, using a similar example, for why a chance-corrected measure like weighted kappa helps (round-2 grades have been converted to the round-1 scale):

| dog             | round one | round two |
| --------------- | --------- | --------- |
| Dog1_criteria1  | A         | C         |
| Dog1_criteria2  | A         | B         |
| Dog1_criteria3  | A         | A         |
| Dog1_criteria4  | B         | B         |
| Dog2_criteria1  | A         | A         |
| Dog2_criteria2  | A         | A         |
| Dog2_criteria3  | A         | A         |
| Dog2_criteria4  | C         | C         |

You could look at percent agreement and give a score of 6 out of 8, or 0.75. That seems good, but suppose the judge had simply given every criterion an A: that would also score 0.75. So we need to factor in the frequency of the grades and the probability of agreement for each combination. That's where the expected matrix comes in.
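A quick sanity check of that claim, using a hypothetical all-A round two (the grade strings here are made up for illustration):

```python
from collections import Counter

round_one = list("AAABAAAC")
round_two = list("AAAAAAAA")  # judge gives everyone an A
n = len(round_one)

# Raw percent agreement looks fine...
p_o = sum(a == b for a, b in zip(round_one, round_two)) / n   # 6/8 = 0.75

# ...but the chance agreement p_e (sum of products of the two raters'
# marginal frequencies) is exactly the same,
c1, c2 = Counter(round_one), Counter(round_two)
p_e = sum(c1[g] * c2[g] for g in "ABC") / n**2                # also 0.75

# so unweighted Cohen's kappa is 0: no agreement beyond chance.
kappa = (p_o - p_e) / (1 - p_e)
```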

Then there is the degree of disagreement between two ratings: you usually want to give more credit to an A/B or B/C pair than to an A/C pair, and the differences may not be linear at all. The weight matrix lets you account for this.

The observed matrix is simply the count of observations for each possible pair of ratings.

A final note: there are several variations of kappa, such as quadratic weighted kappa. Most of them should work well for comparing grades.

JohnM
  • Hi John! Thank you a lot for this post, it helps a lot. I am using R for data analysis, but did not seem to find any relevant code walk through using R. Do you by any chance know of Kappa usage using R book or page? Regardless, thank you a lot for the help!! – Толя Поднебесный Feb 01 '21 at 08:59
  • It looks like the irr package has a kappa function with various options. https://cran.r-project.org/web/packages/irr/irr.pdf – JohnM Feb 08 '21 at 17:37