by Kareem Carr
Bottom line: This is a strategy for rapidly aggregating and analyzing a large number of research papers on a particular topic using online resources.
Beginning the research process can seem daunting, and for this reason we often put it off. It can seem as if it ought to take months. Frequently, when we do apply ourselves, we approach it in a more awkward and disorganized way than we should. However, this need not be the case. The 100-paper strategy is simply a method I have found for aggregating and summarizing 100 papers in a week or less.
Organization is key. The strategy draws almost all of its power from being carried out in an organized way. To this end, we must be able to act quickly, and to act quickly we must have clear, simple goals at each stage and do one thing at a time. We must be ruthless in cutting away the little inefficiencies that cost us seconds or stretch into minutes or hours of indecision.
Stage I
If possible, do this stage in one sitting. This will help you gain an overall understanding of the work being done in your field more easily than if you spread this stage over several days or weeks.
1. Do this on a connection where you can easily download the papers without needing to enter access information for each journal. Even small delays like entering passwords can be costly. Use Google Scholar or something similar. Increase the number of search results per page to the maximum.
2. Spend 10 minutes or so coming up with appropriate search terms. Try them out and get a sense of whether you are getting the types of results you want. If you’re having trouble, look at the keywords listed in articles on your topic.
3. Read each page of search results as quickly as you can. Ask yourself only one question: is this relevant to my topic? If it is, open it in a separate window and move on immediately. Since this is a 100-paper strategy, we stop when we have approximately 100 papers. (If you want more, you can, of course, continue.) DO NOT read a paper any more than you need to decide whether it is relevant. Ideally, you should be able to determine this from the title and abstract alone. If you can’t, it’s probably not relevant.
4. In a second pass, check whether you have access to each paper. If you do, save it with a file name that includes the author and the year. If you do not, decide quickly whether it is worth gaining access. If it is, save the citation and move on. Gaining access may require an email to the right authorities or identifying which library you must visit.
5. You may fail to get enough results. If so, reconsider your search terms, or explore both the papers that cite the ones you have found and the papers they cite.
In my experience, this stage takes 8 or 9 hours, allowing about 5 minutes per paper.
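Step 5's fallback of following citations in both directions can be sketched as a breadth-first search over a citation graph. This is a minimal sketch under stated assumptions: the citation links sit in two hypothetical dictionaries (`cites` for a paper's references, `cited_by` for the papers that cite it), gathered by hand or from a search engine's "Cited by" links, and `is_relevant` stands in for your quick title-and-abstract judgment. The `target=100` default mirrors the 100-paper stopping rule.

```python
from collections import deque


def snowball(seeds, cites, cited_by, is_relevant, target=100):
    """Expand a seed list by following references and citations, breadth first.

    cites[p]    -> papers that p cites (looking backward)
    cited_by[p] -> papers that cite p (looking forward)
    Stops once `target` relevant papers have been collected.
    """
    found = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(found) < target:
        paper = queue.popleft()
        if not is_relevant(paper):
            continue  # irrelevant papers are not expanded further
        found.append(paper)
        for neighbour in cites.get(paper, []) + cited_by.get(paper, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return found


# Hypothetical toy data: paper keys are "Author Year" strings.
cites = {"A 2001": ["B 1995", "C 1998"], "B 1995": ["D 1980"]}
cited_by = {"A 2001": ["E 2010"]}
print(snowball(["A 2001"], cites, cited_by, lambda p: p != "C 1998"))
# prints ['A 2001', 'B 1995', 'E 2010', 'D 1980']
```

Breadth-first order keeps the papers closest to your seeds at the front of the list, which matches the advice to stop as soon as you have enough.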
Stage II
6. Develop a few key questions to ask of each paper, based on the types of papers you have seen and your topic of interest. The strength of the theorems, any limitations, relevance to the overall question and the types of assumptions required are a good starting point.
7. Using a spreadsheet program, set up a spreadsheet with column headings for the citation and for each of the questions you identified above. Enter the titles of the papers, the authors and any journal numbers that you might find useful.
8. Skim each paper quickly until you find the information for the columns in your spreadsheet, and answer only these questions. Simple as it seems, just filling in the boxes speeds up the analysis of each paper dramatically. A column for short comments about any anomalies can be useful. Ideally, this should take only 10 to 15 minutes per paper.
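The spreadsheet from step 7 can also be generated as a CSV file, which any spreadsheet program opens. In this minimal sketch the question columns and the sample paper ("Smith 2004") are hypothetical placeholders; substitute the questions you developed in step 6.

```python
import csv
import io

# Hypothetical question columns; replace with the questions from step 6.
QUESTIONS = ["Main result", "Strength of theorems", "Assumptions",
             "Limitations", "Relevance"]
COLUMNS = ["Citation", "Authors", "Year", "Journal"] + QUESTIONS + ["Anomalies"]


def new_sheet(papers, out):
    """Write one row per paper, leaving every question cell blank to fill in."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for paper in papers:
        row = dict.fromkeys(COLUMNS, "")
        row.update(paper)
        # DictWriter rejects keys not in COLUMNS (ValueError), catching typos.
        writer.writerow(row)


buf = io.StringIO()
new_sheet([{"Citation": "Smith 2004", "Authors": "Smith", "Year": "2004"}], buf)
print(buf.getvalue())
```

To write a real file instead of a string buffer, open it with `open("papers.csv", "w", newline="")`, as the `csv` module documentation recommends. Keeping the citation as "Author Year" matches the file-naming convention from step 4.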
Stage III
It’s important not to just let your collection of information sit there. Make it work for you. Group papers by subtopic. Make diagrams. Don’t be afraid to add new categories as your understanding of your topic increases. Use this spreadsheet to identify the key papers that you want to go back and read in detail.
If you follow this procedure to completion, you will have a corpus of knowledge in a spreadsheet that is easily searched and analyzed. At 100 papers a week, you could attain a working knowledge of 1000 papers on your topic in ten weeks.
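Once the sheet has a column for subtopic, grouping papers and pulling out a detailed-reading list takes only a few lines. This sketch assumes rows exported from the sheet as dictionaries with hypothetical `Citation`, `Subtopic` and `Read in detail` columns; `csv.DictReader` on the exported file produces rows of this shape.

```python
from collections import defaultdict


def group_by_subtopic(rows):
    """Map each subtopic to the citations filed under it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Subtopic"]].append(row["Citation"])
    return dict(groups)


def reading_list(rows):
    """Citations flagged 'yes' in the (hypothetical) 'Read in detail' column."""
    return [row["Citation"] for row in rows if row.get("Read in detail") == "yes"]


# Hypothetical rows standing in for an exported spreadsheet.
rows = [
    {"Citation": "A 2001", "Subtopic": "bounds", "Read in detail": "yes"},
    {"Citation": "B 1999", "Subtopic": "algorithms"},
    {"Citation": "C 2005", "Subtopic": "bounds", "Read in detail": "no"},
]
print(group_by_subtopic(rows))  # {'bounds': ['A 2001', 'C 2005'], 'algorithms': ['B 1999']}
print(reading_list(rows))       # ['A 2001']
```

Adding a new category is just a matter of adding a column or editing a cell and rerunning the grouping.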
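A rough budget behind the ten-week figure, using only the per-paper times given for Stages I and II. Stage III and any delays in gaining access are left out, so treat these as lower bounds:

```latex
\begin{aligned}
\text{Stage I:}\quad & 100 \times 5\ \text{min} = 500\ \text{min} \approx 8.3\ \text{h}\\
\text{Stage II:}\quad & 100 \times (10 \text{ to } 15)\ \text{min} = 1000 \text{ to } 1500\ \text{min} \approx 16.7 \text{ to } 25\ \text{h}\\
\text{Per week:}\quad & 8.3 + (16.7 \text{ to } 25) \approx 25 \text{ to } 33\ \text{h for 100 papers}\\
\text{Ten weeks:}\quad & 10 \times 100 = 1000\ \text{papers}
\end{aligned}
```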
Holy cow! This is amazing!!!!
Great post!
This sounds like a very effective way to start gathering and making sense of resources about a topic. Here’s a variation I’ve tried: instead of using a spreadsheet, set yourself up with a private blog. Write a short blog post about each paper you find, perhaps using a template for your key questions (from step 6 above).
Why a blog? One reason is that it’s easily searchable. An even better reason is that if you start tagging each post with keywords as you go, you’ll end up with an inductively created organizational scheme for your resources. You might start to see useful patterns in the literature.
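The "inductively created organizational scheme" from tagging can be sketched as an inverted index from tags to posts; tags that recur across several posts are candidate categories. The post titles and tags below are hypothetical, and `min_posts=2` is an arbitrary threshold chosen for illustration.

```python
from collections import defaultdict


def build_tag_index(posts):
    """Invert {post title: [tags]} into {tag: set of post titles}."""
    index = defaultdict(set)
    for title, tags in posts.items():
        for tag in tags:
            index[tag.lower()].add(title)  # normalise case so "Bounds" == "bounds"
    return dict(index)


def recurring_tags(index, min_posts=2):
    """Tags shared by at least `min_posts` posts: candidate categories."""
    return sorted(tag for tag, titles in index.items() if len(titles) >= min_posts)


# Hypothetical posts, one per paper.
posts = {
    "Smith 2004": ["Bounds", "Markov chains"],
    "Jones 2010": ["bounds"],
    "Lee 2012": ["simulation", "Markov chains"],
}
index = build_tag_index(posts)
print(recurring_tags(index))  # ['bounds', 'markov chains']
```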
I have to admit I’m extremely skeptical of the value of this approach. Obtaining a “working knowledge of 1000 papers” in ten weeks is ludicrous, unless “working knowledge” means something so superficial as to be almost meaningless.
Overall, this method is aimed at the wrong problem entirely. There are many obstacles to doing research, but they can’t be overcome by acquiring 15-20 minutes’ worth of knowledge about each of a hundred papers downloaded off the internet.
In fact, I’d bet this approach would actually be a net negative for many people. It’s natural to feel overwhelmed by the amount of mathematics there is out there, and this can be a big psychological barrier to doing research: it can be easier to spend time seeking background information than actually coming to terms with a research problem. This is essentially a form of procrastination, and it’s an especially seductive form since it feels so virtuous. After all, you are learning something worthwhile, even if you aren’t making progress on your research problem. In practice, lots of grad students spend too much time trying to learn more and more background, while few spend too little.
Dear Anonymous,
The points you raise are of course quite valid. As you identified, my article rests on an unstated premise: that there is value in understanding in detail what has been written about your topic. Is this always true? I think it depends. If the answer for you is yes, then this strategy will help. It is not a magic bullet, but in my experience it is extremely effective. By ‘working knowledge’ I mean that you have delineated what work has been done, who did it and when, as it pertains to your problem, and that you have identified the important aspects of your problem in terms of what other people have focused on. You will also be able to identify which parts of your problem have been solved and have some idea of the techniques that have been tried. Most importantly, this information should be at your fingertips because it is catalogued according to your problem. The idea is to understand how each paper relates to other papers through the lens of your problem. If your problem or your approach is novel, the research area is very new, or your problem is multidisciplinary, it is quite likely that no one has summarized the work done in a way that is useful to you. But even if someone has, you might notice something they have not, and your spreadsheet becomes a personalized index of papers.
As for being a form of procrastination, each person has their own challenges in this area. Ideally, this is something you do once, at the beginning of your investigation, and only occasionally thereafter for each important paper you discover. In my experience, very few topics have 1000 papers without a large amount of redundant material, and while new papers keep turning up, they rarely add new information. As you mentioned in your closing sentence, graduate students sometimes spend too much time gathering information. In part, my article aims to avoid this problem by providing an efficient and systematic method for gaining the information necessary to decide when to stop. This has the benefit of bringing the information-gathering process to a close without the sense that one is missing something.
Finally, this method allows you to identify the articles on your topic that are most important and require a more in-depth reading. The default behavior is often to give each paper an equally rigorous reading, which is wasteful, often produces much less useful information than it ought to, and can lead to premature abandonment of the research effort.
Any reference manager worth its salt, including some very good open-source products, will do most of these tasks for you: it will build citations from almost every citable piece of information, and download and archive what you have access to in formats including pdf, ps, doc, ppt, mp3 and jpeg, all while integrating into whatever browser and office product you prefer. Most reference managers will also let you create hierarchical collections of references and tag references with important terms and topics.
As for finding relevant articles, the best advice I was ever given was: first figure out the jargon (surprisingly, Wikipedia can be helpful), then find the recent review articles that summarize the literature, and finally filter through the references in those review articles.
1. Thank you, Akmal and ‘Successful Researcher’; and thank you for the suggestion, Derek.
2. Dear Aaron,
I am particularly intrigued by the idea of automating the downloading of sources using a reference manager. This could speed up the technique tremendously. Can you name particular software packages that can do this?
Wikipedia and review articles are great places to find information, and I use them myself in certain situations. I use this longer method only for information-gathering tasks that I think are worth several days or weeks of solid effort. What is important here is saving time, and if there is already a good enough summary of the topic you are interested in, it makes sense to use it. I am therefore talking about a small subset of situations where the available sources do not quite suit your needs. This could happen in many ways, but in research the most common would seem to be when you are working on a new question that no one has addressed in the literature in an organized way.
This discussion may seem abstract, so bear with me. I look at non-fiction texts as an attempt by the author to solve a problem: not my problem, but his. My problem and his may overlap significantly, but quite possibly they won’t, and only a small part of what he has written will be directly applicable to my problem. My aim is to find the parts of his paper that are relevant to my cause and, in particular, to figure out whether they are useful to me in ways the author might not have intended. Trusting a review article to adequately inform me is the reverse of this process: I am trusting others to have anticipated my needs and fulfilled them. This may be reasonable in some cases and unreasonable in others. My goal is not to understand an author’s answer fully but to characterize it. For instance, if I identify a dominant theme in the methods of solution which is exemplified by a paper written by a person named Sorensen in 1970, I might label all authors who use a similar solution as Sorensen/1970. If I can do this for all my papers, then simple questions, such as what types of methods are used to solve my problem and how commonly each is used, can be given quantitative estimates based on my literature search. I can typically answer questions concerning when Sorensen/1970 is used, whether authors differ on when it should be used, which authors differ on when it is used, how effective it has been in various situations and which authors I might need to read in depth if I would like to become expert in a particular subcategory of its applications. This is typically a level of facility with the literature that is not attainable through Wikipedia or review articles. In my mind, it’s the difference between off-the-rack clothing and a tailor-made suit. Both have a place in a person’s wardrobe, but they are not interchangeable.
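The quantitative estimates described above amount to counting labels. In this minimal sketch, each paper maps to the list of method labels you assigned it; the paper keys and the "other" label are hypothetical, and "Sorensen/1970" is the kind of label from the example above. Shares are fractions of all papers, so they can sum to more than 1 when a paper uses several methods.

```python
from collections import Counter


def method_shares(labels):
    """Count how many papers use each method label, and the share of all papers."""
    counts = Counter(m for methods in labels.values() for m in set(methods))
    n_papers = len(labels)
    return [(m, n, n / n_papers) for m, n in counts.most_common()]


def papers_using(labels, method):
    """Sorted keys of the papers tagged with a given method label."""
    return sorted(p for p, methods in labels.items() if method in methods)


# Hypothetical papers; "Sorensen/1970" is the kind of label described above.
labels = {
    "A 1975": ["Sorensen/1970"],
    "B 1982": ["Sorensen/1970", "other"],
    "C 1990": ["other"],
    "D 2001": ["Sorensen/1970"],
}
print(method_shares(labels))  # [('Sorensen/1970', 3, 0.75), ('other', 2, 0.5)]
print(papers_using(labels, "Sorensen/1970"))  # ['A 1975', 'B 1982', 'D 2001']
```

`papers_using` gives the starting point for the in-depth reading list on any one subcategory.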
Thank you for asking which suite I use; I don’t like to recommend software unsolicited. I’ve been using the Zotero plugin for Mozilla and OpenOffice and find it quite useful. There are a couple of provisos: first, because MS Office is moving away from ActiveX, the Zotero plugin only works in older versions of MS Office; second, the citation styles are still limited (I think Zotero has about 20 styles).
As for accessing files, my institution has subscriptions to a large number of electronic journals; I find most of my downloads in JSTOR, arXiv.org, Project Euclid, ScienceDirect, Blackwell Synergy and Elsevier.
I think you mentioned earlier the tendency to do too deep a literature search; a cutoff threshold is key. Even in the mathematics literature I have found that there is a point of diminishing returns where the articles become far too low-ranking and speculative to consider as solid references. I also watch out for reference circles, where a group of authors are clearly reviewing, approving and referencing each other’s work, to the exclusion of others’ contributions. In those situations it is relatively easy for the peer review process to become unhinged, especially as the proofs and arguments grow in complexity.
Organization is the key. I think that is the ingredient I was lacking. Thank you for this insightful article.