23

I am trying to reproduce some of the results in a paper from 2018, where the results were presented in a graph but no exact values were given. There is no results table to go with the graph, so I can't do an exact comparison between what I produce and my guesses at what the values in the graph are. They also did not specify one of the parameters used to produce the results.

I have emailed the authors multiple times and they keep ignoring my question about sharing these values with me and apparently do not have the parameter values as they don't have the logs any longer. I am unsure how to proceed with this. It's a paper published in a high-impact well respected journal.

Any help or guidance would be appreciated. I am not being rude or brash about the way I've asked for these results. I just explained I want to compare to them and can't use the graph because I need some level of accuracy.

Tommi
Espore
  • Why can you not evaluate your findings by plotting them in the same way? Are they close enough to be convincing that your two experiments find the same outcome? Are you trying to perform a statistical test to see whether you have the same outcomes? – Azor Ahai -him- Nov 01 '20 at 20:24
  • 10
    It's unclear how you know that they don't have the logs any longer if they are ignoring you? – Ian Sudbery Nov 01 '20 at 21:27
  • 1
    @Mark It was first published in 2018 – Espore Nov 01 '20 at 21:56
  • @AzorAhai--hehim Because I don't have the numerical values, so it's hard to say how similar or different the results are. Their graph is quite low resolution. I could try to plot my results on the same graph, but it will be hard to infer much that's useful. I can replot it by guessing the values they've used, but that seems like the best I can do – Espore Nov 01 '20 at 21:56
  • 1
    @IanSudbery They don't have the parameter I asked for and said that's because they don't have the logs but haven't responded on the rest of the results – Espore Nov 01 '20 at 21:57
  • 14
    "they don't have the logs" - does anybody honestly believe this? It's excusing malice by pleading incompetence. If the journal truly is respectable they'd have open data rules, can you force these people to properly show the work they're happy to reap publication prestige from? Also be aware of course, that by asking uncomfortable questions about peoples' data you burn bridges and make enemies. Science talks a game of openness and scepticism; the reality is more like @Buffy's first couple of paragraphs. Get used to that kind of post-hoc excuse – benxyzzy Nov 02 '20 at 06:42
  • 10
    What is your question? This is a question-and-answer site, so we require you to articulate a specific answerable question. I don't see a question in your post. "Any help or guidance would be appreciated" is too open-ended to be a good fit here. See our [help/how-to-ask]. – D.W. Nov 02 '20 at 08:13
  • 1
    In some circumstances in some jurisdictions you might be able to take them to court. – Bob says reinstate Monica Nov 02 '20 at 17:01
  • 14
    @benxyzzy “does anybody honestly believe this?” — Yes. Anybody who has ever worked in science should believe this, because it’s depressingly common. It’s not malice, it’s just standard incompetence, and — to a close approximation — every starting scientist has done this at least once. I know I have, and I know many of those I did my PhD with have as well. I’m not excusing this practice, but I can confidently state that it doesn’t require malice, or even negligence beyond what’s explained by lack of experience. – Konrad Rudolph Nov 03 '20 at 14:21
  • 5
    Just to second Konrad's comment - it's stupendously easy to lose data. Just because the paper was published in 2018 doesn't mean the data was that recent. It could have closed in 2016 and taken ages to review. Could have been based on a student's project who didn't organize very well. I've seen far more poorly organized labs than well-organized ones. – Azor Ahai -him- Nov 03 '20 at 15:44
  • 1
    @D.W. There are multiple questions, but it's more concise as a single post. "How best to respond to this?" "What is the consensus on this from a scientific POV?" "Is this normal?" "Is there any normal approach to get around this?" etc. I figured it wouldn't make sense to split this into multiple different Qs – Espore Nov 04 '20 at 00:11
  • @KonradRudolph That's a fair approach. I would have hoped they could have asked the other authors on the paper or given some approximate ranges for the parameters. It's frustrating they couldn't at least try to encourage someone to reproduce their work – Espore Nov 04 '20 at 00:13
  • 2
    @Espore Please don't leave the questions in the comments. Instead, [edit] your post to state the question explicitly. Comments exist to help you improve your post - we don't want people to have to read the comments to understand what you're asking. Asking multiple questions in a single post doesn't always go well; some of those are a matter of opinion and may not be a good fit here. – D.W. Nov 04 '20 at 01:27
  • 1
    @KonradRudolph It's a "hot" question. Every time a question here makes the hot questions list, it gets flooded with visitors who vote, or comment, or vote on comments, without having much of an understanding of the situation. Hence the ridiculous suggestions such as "take them to court". – Szabolcs Nov 04 '20 at 12:06
  • 1
    @KonradRudolph Also, I wouldn't even call it incompetence. We cannot even tell from the OP's question that the information they're asking for is actually relevant, let alone crucial. Most people assume it is, but students get lost in details all the time. – Szabolcs Nov 04 '20 at 12:16
  • Here's a possibility: For all we know, some numerical work was done by a student who has now graduated, and is no longer in academia. The code they wrote is somewhat sloppy and not well documented (after all, they were a student, not an experienced scientist). Their supervisor can't start using it again, either because they don't know the tools well enough, or even if they do, it may take a significant amount of time to understand all its details again. It would not be surprising at all that they are not willing to dig up all the details, especially if they are not crucial to the main result – Szabolcs Nov 04 '20 at 12:19
  • 1
    The matter of data and the ethical issues you impute to the article author(s) are given some coverage here: problematic datasets. Paragraph starting after the "Broader Impact Length" chart. It does not directly address your concern, but it provides a fleeting insight into NeurIPS concerns regarding datasets. – pot3mkin Nov 04 '20 at 02:24
  • 1
    Is the reproduction of their work critical to your work, or is it a case of it would be emotionally nice—though scientifically worthless—if your work gave results the same as something which cannot be reliably reproduced? – Andrew Morton Nov 04 '20 at 18:42
  • @AndrewMorton I'm trying to analyse how the methods they present behave depending upon the choice of parameters. I don't have the numbers for that graph so I can only include an approximation of their results compared to mine. I also can't discuss how the parameters used for their results affect the behaviour. I've reproduced their exact model but the exact results they've produced seem not to match up exactly with the theory in their paper. Essentially their theory doesn't match the graph and I am trying to determine why. – Espore Nov 06 '20 at 15:30
  • 1
    @Szabolcs I'm genuinely unsure what your point is. I'm basically trying to understand how their results match up to their claims and their theory. I don't think they do match. I can't seem to reproduce their exact results even though I have reproduced the exact model which behaves exactly as my understanding of the theory suggests. I wanted their exact values so I can show this nicely on a graph, giving a quantitative measure of how different the results are. Also the parameters used to produce their results, or what they say they are, would also help me to understand the differences. – Espore Nov 06 '20 at 15:35

6 Answers

41

There is no results table to go with the graph.

Use Datathief: https://datathief.org/

Despite the name, in most countries using Datathief is explicitly allowed by copyright law. Data cannot be copyrighted.
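Under the hood, graph digitization tools like this apply a linear map from pixel coordinates to data coordinates, calibrated from two known reference points per axis. A minimal sketch in Python (the calibration pixel values below are hypothetical; for a log-scaled axis you would interpolate in log space instead):

```python
def pixel_to_data(px, py, x_cal, y_cal):
    """Map a pixel coordinate to data coordinates.

    x_cal and y_cal each hold two calibration points as
    ((pixel_position, data_value), (pixel_position, data_value)),
    read off known axis ticks in the image.
    """
    (px0, x0), (px1, x1) = x_cal
    (py0, y0), (py1, y1) = y_cal
    x = x0 + (px - px0) * (x1 - x0) / (px1 - px0)
    y = y0 + (py - py0) * (y1 - y0) / (py1 - py0)
    return x, y

# Hypothetical calibration: x-axis pixels 100..500 span data 0..10;
# y-axis pixels 400..50 span data 0..1 (pixel y grows downward).
x_cal = ((100, 0.0), (500, 10.0))
y_cal = ((400, 0.0), (50, 1.0))
print(pixel_to_data(300, 225, x_cal, y_cal))  # (5.0, 0.5)
```

Reading points off the curve this way also tells you your digitization error: roughly one pixel's worth of data range, which you can quote when comparing against your own results.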

They also did not specify one of the parameters used to produce the results.

Two possibilities: either it does not actually matter, or peer review has failed. When it is your turn to peer review, check for this type of mistake. Sorry I cannot help in this situation.

Paper authors can choose not to answer their email if they wish.

Anonymous Physicist
  • 4
    https://academia.stackexchange.com/questions/7671/software-for-extracting-data-from-a-graph-without-having-to-click-on-every-singl/7745#7745 – Anonymous Physicist Nov 01 '20 at 23:07
  • 1
    Datathief works well; in the old days we would have manually read the data off the graph at key points for comparison. It may be that the authors aren't used to open data or sharing their underlying data - and if the journal has a page limit there's no room for a table, unless supplemental information is an option – Chris H Nov 02 '20 at 08:50
  • 9
    A good open-source alternative is WebPlotDigitizer. – Anyon Nov 02 '20 at 14:08
  • 10
    'Despite the name, in most countries using Datathief is explicity allowed by copyright law. Data cannot be copyrighted.' Although for the avoidance of doubt, this doesn't remove the ethical requirement to cite the data source when you use the data. – Daniel Hatton Nov 02 '20 at 16:40
19

Maybe not very nice, but hardly "misconduct". You might need to reproduce more of their "experiment" to get new data. Their data is their own, I think.

It may be that the reviewers saw more than the authors are willing to share publicly. And it is possible that they have future plans for the data that would make release at this time unwise.

If you want to reproduce the research you will need to do more than re-use their data. In fact, the results you would gain by looking at fresh data would likely be more valid than reusing theirs. You might find, in fact, that their conclusions aren't supported, assuming it is a statistical argument. Or, you might give it additional credence from using different data.

Buffy
  • 1
    I don't think it is misconduct in any way - I didn't say that, or I hope it didn't come across that way.

    Is it not the same as the results they've already shared in the graph, just in numerical form? My thought was that they've already 'shared' those results, only in such a form that they're not useful for numerical comparison.

    I'm not after any of the actual data, just the numbers to match the graph and if possible the parameters to reproduce those numbers.

    – Espore Nov 01 '20 at 21:53
  • 10
    You had it tagged misconduct originally. I was responding to that. – Buffy Nov 01 '20 at 22:28
  • 7
    Not sharing the raw data associated with a plot in a published paper maybe isn't misconduct, but I find it definitely shady. I think that journals ought to require that the raw data used for a plot, and all the analysis pipeline and generation code used to create it, should be submitted to the journal and made public along with the paper, in the name of reproducibility. Sadly, most journals don't currently bother with this. – a3nm Nov 02 '20 at 10:48
  • 7
    I'm not sure it's even "shady"; it's probably just lazy, and it is definitely poor practice. Requiring all raw data behind every figure is becoming more common. For example, Nature journals require this now. – Ian Sudbery Nov 02 '20 at 14:52
  • @a3nm The journal is NeurIPS – Espore Nov 04 '20 at 00:14
12

Authors are not in any way obligated to comply with random requests for additional information about their paper, nor do they even need to respond at all.

If they provide what you ask for, it would be them doing you a favour.

Their obligation is to the journal and the reviewers. This obligation mostly ends after they've made the reviewers happy and their paper is approved for publishing.

Some egregious problems might cause a paper to get retracted, but some missing data probably doesn't meet the threshold for that.

If contacting them has not worked for you, then you're probably out of luck. Contacting them multiple times is likely to only annoy them.

This applies in general. Specific journals or jurisdictions may have different policies regarding the above.

NotThatGuy
  • 7
    In some circumstances, in some jurisdictions, authors/institutions are obliged to respond to Freedom Of Information requests. – Bob says reinstate Monica Nov 02 '20 at 17:04
  • 3
    @BobsaysreinstateMonica indeed, some funders make data sharing a condition of funding. At the research funder I work for we will if necessary take an appropriately anonymised copy of study data and hand it over to other reputable researchers if the original data owner won’t do it. Luckily we’ve never had to - once the original researcher realises what’s in the contract they produce the data, on those rare occasions they didn’t want to share. – rhialto Nov 02 '20 at 19:58
  • 7
    “If they provide what you ask for, it would be them doing you a favour.” — Nah, that’s nonsense. It’s not a favour, it’s good scientific practice. Even if they’re not legally obliged, they’re ethically and professionally obliged to respond to earnest requests by peers. They only don’t need to respond to crackpots. – Konrad Rudolph Nov 03 '20 at 14:24
  • 4
    @KonradRudolph It just seems frustrating when I'm trying to spend my time reproducing their work - something which should be helpful to the research community. I'm only asking for the numbers for the graph which was already presented. It's almost just like asking for a higher resolution graph in a sense. But also the parameters which they describe, and provide a formula to calculate, without saying what values they used to produce these results - which seems odd to me. – Espore Nov 04 '20 at 00:17
  • Sorry but no, this is not the right view, definitely not in the era when so many research papers were shown to be non-reproducible. If you don't respond to requests (and don't publish the data on your website), you can now by default be assumed to be a fraud. – JonathanReez Dec 10 '21 at 20:06
8

Check the journal's guidelines and/or contact the editor. Sometimes journals require data and code to be made available (which is obviously the right thing to do) and authors that refuse to do so are breaching the agreement under which they published.

If the journal does not, then you are out of luck. Sadly there are many researchers out there who think science is a race and will do everything in their power to be the ones ahead. This is reprehensible but unavoidable.

I'd refrain from using their results and actively call them out if you publish anything similar, i.e.: attempts were made to reproduce the results from XXX et al., but the authors refused to share their data/code/methodology.

Gabriel
  • 1
    The journal is NeurIPS. I can't see anything on sharing data from my checks.

    If I intended to publish something similar, how is it best to approach this? I wouldn't put it quite so bluntly, but perhaps "we were unable to obtain results from XXX and so we reproduced them.."

    – Espore Nov 04 '20 at 00:20
  • 1
    You could contact the editor and ask directly. What you propose looks good to me. – Gabriel Nov 04 '20 at 12:38
  • 2
    "call them out" still gives them cite points and hence maybe a wage increase etc. – Ian Nov 04 '20 at 16:32
  • 1
    Ian, that's a valid point, but there's no way to get around it unfortunately. Not mentioning the article (really, the authors) is the same as giving them a free pass to do it again. – Gabriel Nov 04 '20 at 17:41
1

Usually research communities are small and you will see each other at the next conference, or the one after that. Then you can ask them in person, either informally between sessions or when they present new research based on their paper. It is much harder to brush someone off in a room packed with respected peers who will be the referees for their next paper.
But act carefully: you don't want to affront them and turn them into an enemy!

usr1234567
-4

So do your own experiment and make your own graph, then compare the graphs. Why do you need the specific data anyway? There will be noise (i.e. experimental error) in all the data, so the results won't be exactly the same every time, which makes needing the original data irrelevant.
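If you do end up comparing your reproduction against values read off their graph, a tolerance-based comparison is more defensible than expecting exact agreement. A rough sketch, where the tolerance values are placeholders you would justify from your own noise and digitization-error estimates:

```python
def within_tolerance(mine, theirs, rtol=0.05, atol=0.01):
    """Return True if every paired value agrees within relative
    tolerance rtol or absolute tolerance atol, whichever is larger."""
    return all(abs(a - b) <= max(rtol * abs(b), atol)
               for a, b in zip(mine, theirs))

print(within_tolerance([1.00, 2.01, 2.98], [1.0, 2.0, 3.0]))  # True
print(within_tolerance([1.50, 2.00, 3.00], [1.0, 2.0, 3.0]))  # False
```

Reporting the tolerances alongside the comparison makes it clear how close "matching" actually was, instead of an unqualified claim of agreement or disagreement.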