60

So, I'm reviewing a medical study (an open-label trial) that compares the efficacy of different doses of a drug in a patient population (heart failure). The study is arguably of low quality compared to the landmark trials that established the benefit of this drug in this population: small sample size (~100 compared to 2000), short follow-up (~3 months compared to 16 months), and an open-label design with no placebo arm.

The results the authors report are too good to be true - the study arm that can be directly compared with previous trials on this topic (same population, same drug dose) showed a reduction in NT-proBNP (a heart failure biomarker) 4 times greater than in the original study! And remember - over only 3 months of follow-up compared to 16!

Furthermore, during a previous round of peer review, another reviewer noted it as a study limitation that no marker of functional capacity (such as the 6-minute walk test) was available - and in the revision the authors simply included such data, seemingly out of thin air! (It had not previously been mentioned in the methodology of the study.)

Another indication that their data is falsified is that they report that all their measured variables were normally distributed - in my experience, similar variables are typically log-normally distributed, not outright normal! (Although there is no proof in the literature that this is always the case.)

The editor is seemingly hell-bent on publishing this paper, as it has undergone 4 rounds of review, and reviewers that reject it are being replaced one-by-one.

I know I can't reject this paper on the strong suspicion of foul play - what is the right way to tackle this?

EDIT: Thank you for your responses and comments. The authors ascribe the discrepancy to a different sample make-up, but it is (subjectively) too great to be due simply to the sample composition. As for anonymity, it didn't occur to me initially - I'll try to maintain enough ambiguity to prevent a breach of blinding.

  • 24
    In case it gets published, you should raise your concern about the then-published paper on PubPeer. – anpami Feb 20 '22 at 12:59
  • Do the authors try to explain the discrepancy between their results and the older ones? If so, is their explanation reasonable? If not, shouldn't they? – Andreas Blass Feb 20 '22 at 13:06
  • Maybe share this with a senior colleague and see how/what they think/feel too... – The Guy Feb 20 '22 at 13:14
  • 27
    Since you are (presumably) reviewing the paper anonymously, you should also anonymize your identity here. – Dan Romik Feb 20 '22 at 15:26
  • "Furthermore, during previous peer review, another reviewer suggested that it is a study limitation that no markers of functional capacity were available (6-minute walk test), and the authors just included it, seemingly out of thin air! (It wasn't previously mentioned in the methodology of the study)." This just seems like poor writing, the other issues aside? – Azor Ahai -him- Feb 20 '22 at 20:10
  • 29
    As it's been said many times on here, you do not reject or accept the paper, you only make a recommendation to the editor. – Kimball Feb 20 '22 at 20:29
  • 8
    "reviewers that reject it are being replaced one-by-one." Are you sure that this is a reputable journal? Sounds to me like one of these "author-friendly" scam journals (MDPI, IEEE Access etc.) – lighthouse keeper Feb 21 '22 at 11:51
  • 1
    You do not have to conclude that the data has been falsified to conclude that the data is unreliable and unconvincing. "Falsified" data implies conscious intent, which is something that you cannot determine from your position. However, you can make a judgement as to whether the data seems good or not. IN SHORT, JUDGE THE PAPER AND ITS CONTENTS, NOT THE AUTHORS. – RBarryYoung Feb 21 '22 at 14:50
  • Is this journal a serious and fairly prestigious (i.e. major national or international) one? Or is it the medical equivalent of some U of SomeState Law School Local Cases Review? If the latter, just state your doubts and step aside. – Trunk Feb 21 '22 at 15:33
  • 5
    If you can see the authors' names, do some investigation into their financial ties. I once reviewed an intervention trial where the lead author did not disclose that they were the lead science officer for the company manufacturing the intervention, and a little sleuthing revealed that... which, along with a bunch of sketchy stuff in the manuscript itself, was enough to recommend rejection. – Alexis Feb 21 '22 at 18:10
  • 1
    "Too good to be true" Do you mean this with respect to the conclusions of the paper, or with respect to the data? In a sample of coin flips size 100, it is not entirely unlikely to get fewer than 40 heads, even if the coin is fair. On this note, if you'd like to bolster your argument, you should take a CI from this n=2000 study, and compute the probability of getting a sample "far" from the range of the "better" study. If the value for this new study is "far" from the other study in this sense, then you can support a claim that "one or the other of these studies is broken". – Him Feb 21 '22 at 18:13
  • Thanks everyone for your input. I will not go into further detail as to the journal so as to maintain the anonymity of the peer review process. That said, rest assured that if it were a predatory-style minor journal I would not have given the matter this much thought. @Him The paper has a number of troubling points - for one, the data is indeed very positive. Furthermore, the authors were able to procure data from diagnostic methods that were not mentioned in the methodology when asked for them by a previous reviewer. This is highly irregular for a prospective randomized study. – Anastasios Tsarouchas Feb 21 '22 at 18:48
  • "Another indication that their data is falsified is that they report that all their measured variables were normally distributed" - are they? Did authors provided the raw data (open science) in supplementary materials, or just some plots? Had been the study preregistered somewhere? Mind that there is an alternative (to data falsification) explanantion of a marker "included of thin air": authors measured many parameters and published only the significant ones. HARKing and p-value fishing is still a scientific misconduct. – abukaj Feb 21 '22 at 19:03
  • @abukaj In the Methods section, the authors state that all normally distributed variables are reported as mean ± SD, while non-normally distributed ones are reported as median (IQR) - this is standard in the medical literature. Even though the authors were specifically asked by previous reviewers to double-check whether ALL parameters are normally distributed, they kept the reporting the same. As for the newly procured data, this may be a plausible - although still unethical - explanation! – Anastasios Tsarouchas Feb 21 '22 at 19:12
  • If the authors provided neither the raw data nor the criterion used to determine distribution normality, the claim may be just their arbitrary impression... Is the reported mean less than one or two times the SD? For values that are by definition nonnegative, that may be an indicator of a right-skewed, and thus possibly log-normal, distribution. I have just run 1000 numerical simulations of sampling 100 log-normally distributed values. The maximal ratio of mean to sd was 1.33 (the minimal was 0.35). – abukaj Feb 21 '22 at 19:51
  • "The editor is seemingly hell-bent on publishing this paper, as it has undergone 4 rounds of review, and reviewers that reject it are being replaced one-by-one." Replaced as in fired, or as in taken off this particular review? – Acccumulation Feb 22 '22 at 07:56
  • @Acccumulation: Presumably they're being taken off the review, as reviewers are generally asked to volunteer to review a paper. For example, if you were recognized as an authority on X, then an editor considering a paper on X might email you to ask you to review that paper. – Nat Feb 23 '22 at 07:46
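
A rough R sketch of the check Him suggests above, using a simple normal approximation. Every number in it (the landmark trial's effect estimate and 95% CI, the assumed per-patient SD, the group size) is a hypothetical placeholder, not a value from either study; a real check would plug in the actually reported statistics.

    # Hypothetical landmark-trial figures (n ~ 2000): mean NT-proBNP reduction and 95% CI
    old_effect <- 800
    old_ci     <- c(700, 900)
    old_se     <- (old_ci[2] - old_ci[1]) / (2 * 1.96)  # back out the standard error from the CI

    # The new study reports a reduction about 4 times greater, in ~100 patients
    new_effect <- 4 * old_effect
    new_n      <- 100
    assumed_sd <- 1500                 # assumed per-patient SD of the change score (made up)
    new_se     <- assumed_sd / sqrt(new_n)

    # z-statistic for the discrepancy between the two estimates, assuming both
    # studies estimate the same true effect (normal approximation)
    z <- (new_effect - old_effect) / sqrt(old_se^2 + new_se^2)
    p <- 2 * pnorm(-abs(z))            # two-sided probability of a discrepancy at least this large
    cat("z =", round(z, 1), " two-sided p =", signif(p, 2), "\n")

A vanishingly small p here supports the claim that "one or the other of these studies is broken" - which is all such a check can show.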

7 Answers

91

In a comment on @Buffy’s answer, you wrote:

[…] my thinking is that it would be wrong to reject these researchers purely on the suspicion of falsified data - after all, I could be wrong.

There is a misconception here that is worth pointing out. The key thing to remember is that it is fully the authors’ burden to convince you that the research is correct in order for you to recommend acceptance of their paper; it is not your burden to prove that some suspected flaw you are perceiving is real before you are allowed to recommend rejection. In other words, rejection should be the default decision for any research that doesn’t meet high standards of rigor and address any reasonable criticism that might occur to a referee. So for example, if you thought there was a 25% probability the research results were unreliable and a 75% probability they were correct, then recommending rejection (or at least a revise and resubmit to allow the authors to fix the flaws you are pointing out) would be the correct decision, even though it would still be the case that “after all, you could be wrong”.

Following this logic, if the results really seem too good to be true, then it’s up to the authors to convince you that they aren’t too good to be true, by increasing their sample size, shoring up any methodological deficiencies you point out, and/or rebutting your critique about normal versus log-normal distributions. Consider giving them the chance to do so.

Being skeptical does not mean you are saying the results definitely aren’t correct, you are simply saying you’re unconvinced and that you don’t think the paper should be published until it can more rigorously defend the claims it is making. As @Buffy said, simply state your honest opinion — that is precisely your duty as a reviewer.

Dan Romik
  • 33
    +1 The fact that this seems to be medical research makes the application of this logic really important. There could be someone's life at the end of this chain. – StephenG - Help Ukraine Feb 21 '22 at 00:12
  • @StephenG Though in this case, any competent medical decision would be through a meta analysis (formal or informal) that would consider previously published results as well, and weight them more heavily according to sample size and study design quality. – Bryan Krause Feb 21 '22 at 00:47
  • 1
    @BryanKrause Medical research (or practice) is not my field however you'll forgive my cynicism that (IMO) there are medical decisions made without the proper or even reasonable oversight, often driven by wishful thinking or management goals. But I take your point. – StephenG - Help Ukraine Feb 21 '22 at 02:26
  • 14
    +1 all research is like this. The authors failed to convince you they weren't making shit up. Reject. – obscurans Feb 21 '22 at 03:07
  • 1
    @StephenG Oh of course, I just meant to indicate the stakes are a bit lower for OP's situation than they might be feeling. Still definitely good to raise any concerns and be cautious with suspicious results, but even if the paper gets through that isn't the final "check" before patient care. – Bryan Krause Feb 21 '22 at 15:43
  • Wouldn't increasing their sample size after the fact be p-hacking? – ScottishTapWater Feb 21 '22 at 15:47
  • 2
    @BryanKrause: StephenG is right. Numerous if not the vast majority of doctors make medical decisions based on very little actual scientific evidence. They rely on medical or pharmaceutical companies to make the decisions for them. Those in turn make decisions influenced by money. For instance, one doctor who was the head cardiologist in an established public hospital did not know that MRI could give superior cardiological imaging with less risks than a CT scan, and said so himself when I asked for MRI instead of CT scan, and yet he baldly asserted his 'expertise' to have the final say. – user21820 Feb 21 '22 at 15:57
  • @Persistence: If you greatly increase the sample size (and your new sample does not include the old one), you minimize the risk of (accidental) p-hacking. In the first place, the very notion of confidence intervals is that you must expect ≈ 5% of scientific studies that find significance in some claim at the 95% confidence level to be wrong, even if they are all scrupulously honest and scientifically careful. – user21820 Feb 21 '22 at 16:09
  • @Persistence I don’t do experimental work so I’m not sure. But, to add to user21820’s answer, I think whether it can be interpreted as p-hacking depends on how you report on your results. If you increase the sample size and report that as a new experiment separate from the first one (which is still reported as before), with a completely new data analysis, then you are basically reproducing the experiment - that’s good and not p-hacking. By contrast, if you just add more data points but mix them in with the old ones, and replace the original analysis with a new one, then yes, that’s p-hacking. – Dan Romik Feb 21 '22 at 17:13
  • 1
    Very insightful - I chose this answer because I found the way our duties as peer reviewers are described most helpful. – Anastasios Tsarouchas Feb 21 '22 at 18:36
  • 3
    @user21820 I like to use a horrifying truth disguised as a joke to illustrate this. There is a small branch of medicine called "Evidence-based medicine," which, by its very existence, implies that all other medicine is NOT evidence-based – thegreatemu Feb 21 '22 at 22:07
  • @DanRomik - Aye, if you treat it as a new data set then sure, if you just increment your sample size until you see something significant then you've got an issue – ScottishTapWater Feb 22 '22 at 11:14
79

Say to the editor what you say here. You can't prevent the publication, but you can be honest. Even if there is no foul play, if the results are anomalous then there are probably methodological problems, such as sample size.

You can recommend rejection. If the editor publishes anyway and takes you off the list of reviewers, you are probably better off. Tell it like you see it. The responsibility is with the editor.

Greg Martin
Buffy
  • 1
    I was writing an answer very similar to this, but you have beaten me to it. +1 – Louic Feb 20 '22 at 12:57
  • This was my first thought too, but my thinking is that it would be wrong to reject these researchers purely on the suspicion of falsified data - after all, I could be wrong. On the other hand, if this paper is published and is indeed falsified, it would hurt the medical community as it could be included in meta-analyses that are then skewed in a false direction... – Anastasios Tsarouchas Feb 20 '22 at 16:09
  • 15
    @AnastasiosTsarouchas Say what you know, do what you must, come what may. – Lodinn Feb 20 '22 at 17:48
  • 7
    @AnastasiosTsarouchas If the way the results of the paper are presented makes you doubt their correctness that seems to be enough reason to reject. But the entire point of reviewing is sharing your honest evaluation, even (or especially) if you suspect dishonesty. The rest is up to the editor. – Louic Feb 20 '22 at 19:05
  • I agree with this answer, just commenting to add that a good way to communicate this to the editor is to make use of the "confidential comments to editor" field that the review forms usually provide. – silvado Feb 22 '22 at 14:39
  • @AnastasiosTsarouchas Metaanalyses that include reports from shitty original papers inherit their shittiness. – Karl Feb 22 '22 at 20:40
  • 1
    @Karl "We found 42 peer-reviewed original papers on this topic but 18 were rated as shitty so we do not include them in this meta-analysis." – silvado Feb 23 '22 at 16:01
  • @silvado made my day :D – Karl Feb 23 '22 at 21:11
  • @silvado The problem is that this is almost never the case - the main results of most meta-analyses in medicine include even studies with a very high risk of bias, and then (usually) add a sub-analysis of only low-risk-of-bias studies. – Anastasios Tsarouchas Feb 25 '22 at 15:23
31

''The editor is seemingly hell-bent on publishing this paper, as it has undergone 4 rounds of review, and reviewers that reject it are being replaced one-by-one.''

That's not your problem. Recommend rejection for the reasons which you give here, and then it's the decision of the editor if they still wish to publish it.

Edit: I should add, try to be tactful with your comments and don't outright accuse the authors of misconduct, at least not in such a blunt way. Even if you suspect foul play, you could be wrong.

Tom
  • 1
    And you might decide not to do further reviews for this editor or this journal. – usr1234567 Feb 21 '22 at 14:01
  • 3
    I'm extremely uncomfortable with this answer. It's precisely because of this kind of response (just give your recommendations and then it's not your problem) that we keep seeing this repeat again and again, and according to the asker even for just this one paper. If instead it was made publicly known which journal this is, then reviewers will stop getting duped by the journal! After all, it looks like the journal knows the paper is problematic but has ulterior motive (sponsored?) to publish it, and so is trying to find scapegoats (i.e. those reviewers who accept it)! – user21820 Feb 21 '22 at 16:17
  • 1
    Sorry, I don't really understand your comment. What do you mean by reviewers ''getting duped'' by the journal? How does ''the journal'' know that a paper is problematic? It is only the editor who accepts or rejects. It can be made public knowledge that the editor is doing stuff which is not moral or has some strange ulterior motive, academia is a small world. – Tom Feb 21 '22 at 18:41
  • 1
    ''It can be made public knowledge that the editor is doing stuff which is not moral or has some strange ulterior motive, academia is a small world.'' Although this might be problematic, as it seems to breach the theoretical anonymity of being a reviewer. – Tom Feb 21 '22 at 19:14
9

You can indeed

reject this paper on the strong suspicion of foul play

Tell the editors what you suspect and why. You need not prove that what you suspect is true.

If the paper is published you can comment publicly on it in any way you like without violating reviewer anonymity. You might tell the editor that you may (or will) do that, in hopes that they will take your critique seriously.

Unfortunately, once published it may be cited forever even if debunked.

Andrew Gelman's blog has much to say on this subject.

He responded to an email saying:

Interesting. One thing that I didn't see in the thread is that every paper will get published somewhere, if the authors want to get it published. So getting it rejected at journal A is no big deal; it will still appear in journal B. Or maybe it could make a difference, if journal A is an attention-getter such as Jama, but otherwise not.

Ethan Bolker
8

I try never to attribute to malice that which may be explainable through other mechanisms, and I don't think you need to say "I don't believe the data is real" here, especially in the context of the point Trunk made.

However, you are asserting that there is a well-established literature, using solid methodology showing findings quite different from what the current authors are finding. This places an extremely high burden on the authors to convince the referees that their methodology (statistical and otherwise) is correct, and that they've done all appropriate controls.

There are always serendipitous findings. Perhaps the authors have really found something substantial about the difference between biomarkers at 3 months vs longer term, and that would merit publication.

For me, the discussion section would be key here. How have the authors tried to explain the difference between their findings and earlier studies?? Are there other controls that need to be done? Do they need to extend their findings out to 16 months to demonstrate that the biomarkers return to where the published lit would predict? Why haven't they done that??

As I said, the required level of rigor that should be used when the data doesn't match the expectations of the literature is very, very high. I can't tell you how many times I've seen authors run to the community with surprising findings, rather than trying to hunt down the artifact or confounder that really explains their results, and this may be one such case.

If the authors failed to convince you that they've employed the level of rigor required to confirm a result that disagrees with a well established literature, I'd recommend that you recommend rejection solely on that basis. I'd most certainly avoid saying "I think they just made up the data" unless you're very sure that this is the case. That's a charge of academic misconduct, which is very different from saying "I think your science sucks".

Scott Seidman
  • 1
    This to me seems like the best answer. The suspicion of misconduct doesn't even matter. The paper isn't up to scientific standards, even if the authors had only the truth at heart. More needn't be said. – Nearoo Feb 22 '22 at 11:25
2

Medical research is not my field as I am from a physical sciences background.

But I do know a bit about statistics. The dangers of drawing conclusions from small samples are well documented. For the initiated, just run up a simple R (or Matlab) program that generates n values randomly from some normal distribution with some stated mean and standard deviation. Then let the program calculate the mean and standard deviation based on this randomly generated sample. Print the data table plus its mean and std deviation.

Now run this whole program several times each for n = 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10,000.

The results should be sobering when you compare the sample-based estimates of the mean and standard deviation for small n: they can change a lot from one generated sample to another.

Given the OP's observation that a log-normal distribution is generally regarded as the most appropriate for the phenomenon in question, it might also be useful to re-run the above simulation with a log-normal distribution.
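
A minimal R sketch of this exercise follows; the normal mean and SD, and the log-normal meanlog/sdlog, are arbitrary illustrative values rather than parameters taken from any study.

    # For each sample size, draw one random sample and report its mean and SD.
    # Re-run the script (or change the seed) to see how much the small-n
    # estimates jump around between runs.
    set.seed(1)
    sizes <- c(5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000)
    for (n in sizes) {
      x <- rnorm(n, mean = 100, sd = 20)
      cat("normal:     n =", n, " mean =", round(mean(x), 2), " sd =", round(sd(x), 2), "\n")
    }

    # The same exercise with a right-skewed log-normal distribution, as suggested
    # above for NT-proBNP-like variables.
    for (n in sizes) {
      x <- rlnorm(n, meanlog = log(100), sdlog = 0.5)
      cat("log-normal: n =", n, " mean =", round(mean(x), 2), " sd =", round(sd(x), 2), "\n")
    }

The small-n rows fluctuate noticeably from run to run, which is exactly the point of the exercise.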

If uninitiated in statistics, please get some assistance from a colleague in the statistics function of your organization.

You might append your tables to your review comments for your editor.

Trunk
  • What you are recommending is basically to calculate the standard error of the estimates. One does not need simulations to do this for normally distributed variables. I think a much better approach would be to ask the authors for their a priori sample size / power calculation, without which any interpretation of the results would be limited to being purely exploratory. – LuckyPal Feb 21 '22 at 16:22
  • What I recommend is exploring the effect of sample size on data spread. Depending on distribution, mean and variance, the size of a sample of adequate precision can be estimated from this exploration. BTW, as OP says that this phenomenon is widely regarded as log-normal, it might be more thorough to use this for data generation. – Trunk Feb 21 '22 at 20:48
  • 1
    Yeah, that's called a power analysis or sample size calculation :) There are dozens of software packages available, so one does not need to perform one's own simulations (a minimal sketch follows this comment). In fact, in any reasonable journal, a description of the a priori sample size calculation will be mandatory. – LuckyPal Feb 22 '22 at 11:09
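
To illustrate the kind of a priori calculation referred to in the comment above, here is a minimal sketch using base R's power.t.test; the effect size and SD are hypothetical placeholders, not values from the manuscript under review.

    # Group size needed for a two-sample t-test to detect an assumed clinically
    # meaningful between-group difference in NT-proBNP reduction (all numbers made up)
    power.t.test(delta = 400,      # assumed between-group difference (pg/mL)
                 sd = 1000,        # assumed SD of the change from baseline
                 sig.level = 0.05,
                 power = 0.80)
    # The output lists the required n per group; a trial enrolling far fewer
    # patients than this is underpowered for an effect of the assumed size.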
-2

Instead of just reviewing the paper - you can reject it, which it seems you should do - you could also try publishing a note somewhere debunking the study, or at least pointing out the issues it has. Scientific criticism and responses are not as 'in' today, but there are ways to at least leave a trace - the easiest one being ResearchGate. Another issue here is that the publisher seems to have the wrong incentives, which is a generator of problems such as the one you described here.

MrVocabulary
  • 2
    Sorry but reviewers should never do that. – Buffy Feb 20 '22 at 20:41
  • 13
    @Buffy Once the paper is published the reviewer can do that without saying they were a reviewer. – Ethan Bolker Feb 20 '22 at 21:06
  • 1
    One of the advantages of open peer review is that reviewer comments get published alongside the paper for everyone to see. Probably ideal for this sort of problem. – rhialto Feb 20 '22 at 21:28
  • @EthanBolker, seems like it would upset the editor, though, with possible blowback to the reviewer. – Buffy Feb 21 '22 at 00:07
  • 1
    @Buffy An option is to resign from being a reviewer if negative reviews are getting sidelined. – MrVocabulary Feb 21 '22 at 07:39
  • But OP seems to think it unworthy of publication and wants to avoid being a reviewer of it if it is published at the editor's insistence. – Trunk Feb 21 '22 at 12:34
  • 2
    Why is this answer so heavily downvoted when other answers with a few upvotes also suggest that the OP can criticize the work publicly if it is published? @Buffy, why can't reviewers criticize a work publicly after it is published? There is no need for the OP to reveal they were a reviewer. – WaterMolecule Feb 21 '22 at 15:34
  • 1
    @WaterMolecule it's not an appropriate recommendation to do right now - the qualifier "after it is published" is quite important here, and this answer does not mention it. – Peteris Feb 22 '22 at 03:50
  • @Peteris To be absolutely fair, right now is also not mentioned by OP. – MrVocabulary Feb 22 '22 at 06:36