I have a huge collection of PDFs of research papers, many of which carry valuable annotations. I also have a huge .bib file containing citations for these and many other works. Is there reference manager software into which I could import both the .bib file and the collection of PDFs, so that the entries in the .bib file are automatically linked to the corresponding PDFs? I would then like to use that tool to access my PDFs (of research papers). I think this was requested as a Mendeley feature long ago: http://feedback.mendeley.com/forums/4941-general/suggestions/80946-automatically-find-pdfs-link-them-to-imported-me . As of today, I don't think it has been implemented. I tried Qiqqa (http://www.qiqqa.com/), but had no luck.
4 Answers
Here is one nearly automatic way to do it using Zotero (https://www.zotero.org/):
1) Import the PDFs into Zotero. One way is to select multiple PDFs and drag them into a collection in Zotero's left-hand pane.
2) Select the PDF items (Ctrl+click on Windows for multiple selections), right-click and select "Retrieve metadata from PDF". Note that this step searches online databases for missing information and seems fairly robust.
3) Import the .bib file into Zotero.
4) Go to the duplicates collection in the left-hand pane and merge all the duplicates.
Issues:
1) In step 4, there may be false negatives (duplicates that are not detected) if the metadata retrieved automatically in step 2 differs too much from the corresponding entry in the .bib file (step 3).
2) Step 2 might fail on old scanned PDFs.
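To get a feel for the first issue, here is a minimal sketch of how one might compare a retrieved title with a title from the .bib file. It is not how Zotero detects duplicates; the function names `normalize_title` and `title_similarity` are made up for illustration, and it only uses the Python standard library.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase a title and strip BibTeX braces, accents, and punctuation."""
    title = title.replace("{", "").replace("}", "")
    # Decompose accented characters, then drop the combining marks.
    title = unicodedata.normalize("NFKD", title)
    title = "".join(ch for ch in title if not unicodedata.combining(ch))
    # Collapse every run of non-alphanumeric characters into one space.
    title = re.sub(r"[^a-z0-9]+", " ", title.lower())
    return title.strip()


def title_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()
```

Pairs that score high here but were not offered for merging are candidates for a manual merge. Any cut-off value is a judgment call and would need tuning on your own library.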
One option is BibDesk (OS X), which can track links between files and associated citations.
Personally, I am not a fan of what it does to the .bib file, but it could suit your purpose.
Thanks. I upvoted this answer, but I would have preferred a tool that works on Windows/Linux/Android. Do you think I could use BibDesk to do the association and then export the library to some other (cross-platform) tool in a way that the new tool would import the file associations too? – Abhishek Anand Mar 07 '14 at 15:06
@Abhishek IIRC BibDesk stores file-system references as new fields inside citations, so yes. It's a pretty tight coupling, though. I'm really not the best person to ask; I have largely given up on automated citation tools until I can get around to building my dream tool :P – Matthew G. Mar 07 '14 at 15:18
Try Tellico
A collection manager for Linux which "provides default templates for books, bibliographies, videos, music, video games, coins, stamps, trading cards, comic books, and wines."
The reference manual states the following:
"If Tellico was compiled with exempi or poppler support, metadata from PDF files can be imported. Metadata may include title, author, and date information, as well as bibliographic identifiers which are then used to update other information."
Is that useful? If so, you can check the site for reviews of Tellico. It works on the following:
- Debian
- Ubuntu
- Gentoo
- FreeBSD
- openSUSE
- PC-BSD
- Fink (Mac OS X)
- Fedora
- Linux Mint
- Pardus
- ArchLinux
EndNote X7 has this feature, known as "PDF auto import."
I tried it and it got 0/3 of my sample PDFs correct, all from IEEE conferences and originally downloaded from IEEE Xplore.
One of the three articles came closer to a correct reference (the other two were useless), but that article had PDF metadata visible in Acrobat Reader (Title, Author, Subject). EndNote got the page numbers (somehow) and the DOI right, but failed on the conference name and the reference type (EndNote mistakenly thought it was a journal article).
[...] sqlite, you could import your bib files into Zotero and attach the files to your references by editing the relevant parts of the sqlite database. – Mar 07 '14 at 00:30
JabRef has this feature. – cbeleites unhappy with SX Mar 07 '14 at 00:43
[...] .bib file from scratch from the pdf metadata, they want to match an existing pdf with an existing bibtex entry. This is an easier task and leaves room for some heuristics. – Federico Poloni Apr 13 '14 at 16:43
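As a rough illustration of the kind of heuristic the last comment hints at, the sketch below matches a PDF file name against existing BibTeX entries by token overlap (Jaccard similarity). It is only one possible approach, not something any of the tools above are known to do. It assumes the PDFs have somewhat descriptive file names and that the entries have already been reduced to a mapping from citation key to a string such as "author year title". The names `tokens`, `best_match`, and the 0.5 threshold are hypothetical choices.

```python
import re

# Very common words carry little signal for matching.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "in", "on", "to", "with"}


def tokens(text: str) -> set[str]:
    """Split text into lowercase runs of letters or digits, minus stopwords."""
    found = re.findall(r"[a-z]+|[0-9]+", text.lower())
    return {t for t in found if t not in STOPWORDS}


def best_match(pdf_name: str, entries: dict[str, str], threshold: float = 0.5):
    """Return the citation key whose text best overlaps the PDF file name.

    `entries` maps citation key -> descriptive text (e.g. author, year, title).
    Returns None when no entry reaches `threshold` (Jaccard similarity).
    """
    # Drop the file extension before tokenizing.
    name_tokens = tokens(pdf_name.rsplit(".", 1)[0])
    best_key, best_score = None, 0.0
    for key, text in entries.items():
        entry_tokens = tokens(text)
        union = name_tokens | entry_tokens
        score = len(name_tokens & entry_tokens) / len(union) if union else 0.0
        if score > best_score:
            best_key, best_score = key, score
    return best_key if best_score >= threshold else None
```

For example, with entries `{"smith2010": "Smith 2010 Fast Fourier transforms"}`, the file `smith2010_fast_fourier.pdf` would be matched to `smith2010`, while a file with an uninformative name would return `None` and need manual linking.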