Researching in Digitized Libraries

(Editors’ Note: This is the first of a series of essays that explore different methods and techniques for conducting research and that delve deeper into the histories and ethics of the archives themselves.)

How Digitization Has Transformed Manuscript Research: New Methods for Early Modern Islamic Intellectual History

Written by Nir Shafir

Scholars often treat manuscript libraries merely as repositories of unpublished primary sources. We show up at a library, request a manuscript or two, and leave shortly thereafter with a digital or paper copy in hand, or we sit at a desk for hours each day, transcribing a manuscript word by word. In most traditional manuscript libraries, this method made sense. Librarians might pull manuscripts only once a day, or even once a week, and bring only a couple at a time. Under such conditions, the most efficient course of action is to peruse a library’s catalog, request a few key manuscripts, and read them closely.

Today, however, the mass digitization of manuscripts is blurring the long-held boundaries between manuscript libraries and archives, and altering the act of research in the process. Scholars often view the changes that digitization entails in a negative light, as the physical document is increasingly removed from the hands of the researcher. Here, though, I would like to take a different approach and explore the real possibilities that digitization offers: scholars can ask new questions, discover unknown texts, and gain a different understanding of intellectual life, particularly in the early modern Islamic world. I believe that a fundamental shift has occurred now that researchers can view twenty, fifty, or even one hundred manuscripts a day rather than two or three. In what follows, I examine some of the techniques we can use and the insights we can gain when we are able to look at thousands of manuscripts during a research period. Others, of course, have written about the new possibilities that the digitization of archival material offers historical scholarship, often focusing on the chance for group projects among geographically dispersed researchers. Research with digital manuscripts, though, is still largely an individual affair that requires many long hours laboring away in a dimly lit library, one’s face illuminated only by the glow of a computer monitor. The conclusions below might seem obvious to researchers already at work in digitized manuscript libraries, but I think it is worth discussing openly the impact of these technologies on the way we research. I hope that my remarks will not only open a discussion among researchers but also inform librarians and archivists as they continue to digitize their collections.

Bookcase by Manolo Valdes

Medieval Precedents and Early Modern Challenges

Our current model of manuscript research is largely the result of earlier generations of scholars’ preoccupation with the medieval Islamic period (c. 800-1200). Until recently, scholars saw this period as an idealized golden age, a time when Islamic thought reached its intellectual climax in all fields. The number of surviving manuscripts from this period is relatively small, and the texts that have survived are often found only in copies made in the early modern period (1400-1800). For scholars who studied the medieval period with a “golden-age” mindset, the task at hand was to take the few remaining copies of a medieval text and prepare a critical edition that would rid the text of the corrupting accretions of the ensuing centuries. The desired result was an ur-text in the form of a printed book, reflecting the original intentions of the properly ascribed author, which scholars could then use for further analysis. We bear the legacy of this model today, whether we use the fruits of these scholars’ labor in research libraries or continue to create critical editions or catalogs ourselves.

When we attempt to study the relatively neglected early modern period (1400-1800), a new set of challenges emerges. The sheer quantity of material overwhelms scholars. There is simply more: more authors, more manuscripts, more copyists, more readers, more marginal notes. Librarians estimate that two to three million Arabic-script manuscripts exist in the world today, the vast majority copied between the seventeenth and nineteenth centuries and now stowed away in public or private collections. On top of this, many of the authors and titles are unknown to most scholars. The texts and authors traditionally well known to scholars of the Ottoman Empire, such as Evliya Çelebi or Mustafa Ali, comprise only a tiny sliver of this vast corpus. In reality, I would estimate, albeit unscientifically, that we know of only 10-15% of the works and authors of the early modern period, and even these we often know only superficially. What little secondary literature exists can likewise mislead us as to which treatises and authors were actually popular and widely read at the time. I believe that this relative surfeit of material stems from a gradual expansion of manuscript production and a transformation in reading practices, although such claims remain under-researched.

Using a Library as an Archive

By turning the manuscript library into an archive, digitization provides us with one set of tools to tackle this vast corpus of material and to explore this altered world of early modern readership. To explain what I mean by this phrase, let me briefly generalize about the traditional manner of working in manuscript libraries (although I readily recognize that the line separating manuscript libraries and archives is rather artificial). In a traditional manuscript library, you are limited to requesting only a few volumes a day. Often you are allowed to look at only one volume at a time. Since it is tedious to request the same manuscript repeatedly, and it might take a few days to arrive, you take careful notes on it before returning it. The process as a whole takes quite a bit of time, so you limit yourself to manuscripts that are already listed in the catalog and directly relevant to your research project rather than discovering the plethora of new material. A digitized library, on the other hand, allows you to view numerous manuscripts, each connected to other authors or works, and to jump from one to another within seconds. In this sense, the manuscript library becomes a sort of archive: researchers can quickly begin to dredge up numerous unknown authors and works from the depths of the library, in the same way that researchers working with documents slowly piece them together to create a larger picture.

The key to such research is a good electronic catalog that keeps texts organized by their original volume. Most works written in the Islamic world before the twentieth century, save extremely long ones, were not bound as individual volumes or codices. Instead, they were grouped together into miscellanies called mecmua (tr.)/majmu‘a (ar.). Even early printed works from the nineteenth century often follow this format. The main value of a mecmua is that it is a collection of texts, meaning that each text often has some sort of association with the others. Mecmuas were compiled in different ways. Sometimes a scribe copied the texts as a series. Other times a mecmua began as one person’s personal notes, with additions by later readers. Alternatively, a later reader might take a number of unbound works and bind them into a single volume. On rare occasions, the texts were simply assembled at random. A mecmua can be the collected essays of one author, a collection on a theme such as a particular legal question, or a group of texts by like-minded authors. By looking at mecmuas, even simply through a catalog that lists their contents together, you can start to understand which texts were read with one another; that is, you begin to discover the intertextuality of a scholarly world and thus enter the minds of early modern people. In this fashion, you can begin to break out of the straitjacket of well-known texts and discover those thousands of (relatively) unknown authors.

My personal method, which is only one of many possibilities, is to start my research with the names of a few authors or treatises. Even a few keywords will do. Let us take dreams as an example. You type “rüya,” “rü’ya,” or “ruya” into the computer catalog, and fifty or so results are returned. To gain more results, you type in the truncated forms “rü’y” or “rüy.” You start examining the search results one by one, noting authors and titles. You look at the works in mecmuas, paying attention to the other texts compiled alongside them. Often the process brings up other texts on dreams that do not have the word “dream” in the title. This then gives you more titles and author names to search. You can then take each of these authors and search for them by name. Some are minor figures with only a few other treatises, others are famous authors with hundreds of treatises, and yet others are false attributions. You can then look at the other treatises by each author to see if they also deal with dreams and to get a sense of the other issues that were important to them. Slowly you develop a sense of which genres dealt with dreams and visions and of the important personalities commonly cited. You find that there are dream interpretation manuals, treatises on the veracity of dreams, and a whole line of debate on visions of the Prophet Muhammad. You can gauge which are medieval copies of old treatises, new copies of medieval treatises, or relatively new works composed in the early modern period.
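For researchers who keep their working lists in a script or spreadsheet, the search loop described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the `Record` fields (title, author, mecmua shelfmark) are hypothetical and do not reflect any real library's catalog export, and the fixed number of rounds (`max_rounds`) is a crude stand-in for the researcher's judgment about which leads are worth following. The hypothetical `fold` helper strips diacritics and apostrophes so that “rüya,” “rü’ya,” and “ruya” all match the same fragment.

```python
import unicodedata
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One catalog entry: a single text bound inside a volume (mecmua)."""
    title: str
    author: str
    mecmua: str  # hypothetical shelfmark of the bound volume


def fold(text: str) -> str:
    """Lowercase and strip diacritics and apostrophes: 'Rü’ya' -> 'ruya'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    for mark in ("'", "\u2018", "\u2019", "\u02bc", "\u02bf"):
        plain = plain.replace(mark, "")
    return plain


def snowball(catalog, fragments, max_rounds=3):
    """Find records whose titles contain a fragment, then follow leads.

    A lead is another work by the same author or another text bound in
    the same mecmua. Expansion stops `max_rounds` steps away from the
    keyword hits. Unknown authors (empty strings) are not treated as
    the same person.
    """
    keys = [fold(f) for f in fragments]
    found = {r for r in catalog if any(k in fold(r.title) for k in keys)}
    queue = deque((r, 0) for r in found)
    while queue:
        record, depth = queue.popleft()
        if depth >= max_rounds:
            continue
        for lead in catalog:
            same_author = bool(record.author) and lead.author == record.author
            same_volume = lead.mecmua == record.mecmua
            if (same_author or same_volume) and lead not in found:
                found.add(lead)
                queue.append((lead, depth + 1))
    return found
```

In practice such a loop only gathers candidates; deciding which of them actually concern dreams still means opening the manuscripts themselves.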

Even works that are titled incorrectly or vaguely, like “a treatise on dreams,” can be valuable. A false attribution is helpful in and of itself, as it is often the result of a mental connection made by a reader centuries ago and picked up by an unsuspecting cataloger. Vague titles that refer to a work generically or topically rather than by its actual name can often point to a better-known treatise whose title never contained the word “dream.” Alternatively, the work could be a piece that circulated anonymously and that readers or scribes attributed to various famous figures. After surveying the texts in this fashion, you can start to establish the correct titles and authors, which catalogers often simply overlooked, either from the texts themselves or by comparing them to other copies.

Once you find an author of interest, start by listing all of his works and every copy of each work. Then, as you scan through them, look again at the mecmua in which each text is located and take note of recurring treatises or those that pique your interest. When you look at a treatise, make sure to check the colophon and note the copy date and the copyist, as well as any marginal notes and their authors. If the author or a later reader has written a table of contents, see what it emphasizes and how it organizes the material. Then look at the mecmua as a whole, trying to determine whether it was copied by a single scribe or sewn together at a later date. (If a single scribe wrote a mecmua, you can use the neighboring works in it that bear copy dates to estimate the copy dates of the other treatises.) If the digital copy is of sufficient quality, examine the paper type, the binding, and the sewing to gauge the overall value of the book, that is, whether it was an expensive or cheap volume. Look at ownership statements and library endowment stamps and compare them against reference lists. Each offers a valuable piece of information. Then you can search for the names of the copyists and owners, sometimes turning up their own works or other copies. Each time you find an intriguing treatise or author, follow that lead to see what associations you can build up. For authors with relatively modest oeuvres, perhaps five to fifteen copies of their works in a library, you can complete this process fairly quickly. Authors with hundreds of copies of their works will need days of scrutiny.
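The parenthetical estimate above can be made explicit. The sketch below is a hypothetical helper, not a standard tool: it takes the copy years of the texts in a mecmua in the order they are bound, with `None` for undated texts, and brackets each undated text between the nearest dated neighbors on either side. It assumes that a single scribe copied the texts in binding order, which has to be checked against the volume itself; the years can be in any one calendar (hijri or common era) as long as they are used consistently.

```python
from typing import Optional


def bracket_dates(dates: list[Optional[int]]) -> list[tuple[Optional[int], Optional[int]]]:
    """Bracket undated texts in a mecmua by their nearest dated neighbors.

    `dates` lists copy years in binding order, with None for undated texts.
    Returns (earliest, latest) for each text; None means no bound on that
    side. Assumes one scribe copied the texts in binding order.
    """
    n = len(dates)
    earliest: list[Optional[int]] = [None] * n
    latest: list[Optional[int]] = [None] * n

    last_seen = None  # nearest dated text at or before position i
    for i in range(n):
        if dates[i] is not None:
            last_seen = dates[i]
        earliest[i] = last_seen

    next_seen = None  # nearest dated text at or after position i
    for i in reversed(range(n)):
        if dates[i] is not None:
            next_seen = dates[i]
        latest[i] = next_seen

    return list(zip(earliest, latest))
```

For example, `bracket_dates([None, 1105, None, None, 1110, None])` places the third and fourth texts between 1105 and 1110, while the first and last texts are bounded on one side only. A reversed bracket (earliest later than latest) is itself a sign that the texts were not copied in binding order.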

In the process of all this surveying you not only gain a sense of a field of literature and its authors, you also come across a great deal of minor but important minutiae hidden away in the pages of the manuscripts. You encounter favorite poems, rants, announcements of births, descriptions of historical events, legal rulings, medicinal recipes, lists of books and more. You can use these seemingly trivial asides to find new figures or to contextualize a text, assuming you can pin this material to the correct period, as any later reader could have added these bits. Catalogers often skip the personal notes and thoughts of readers and copyists since they do not necessarily have a discrete author or title, though they are often some of the most valuable sections of manuscripts. You also find many cataloging mistakes, whole treatises skipped over in haste or simply ignored because they did not appear to be worthwhile and “complete” texts. Often the most obscurely or generically labeled treatise is the most interesting, something that a cataloger overlooked because it was too hard to properly identify and describe.

Digitization as Opportunity

In short, the method I outlined above starts with a few figures and slowly establishes a network of people, places, and titles. Each new discovery becomes a new node in this world of early modern thought that can lead us to even more authors and titles. In some sense, you are creating a personal catalog or map, but rather than organizing material alphabetically by author or by general topic, this catalog connects the writers, readers, and books of a period. Once you have grasped a period as a whole, you can then focus on particular works and read them closely. The intention of such research is never to replace the close examination of a text but rather to chart the relatively unknown intellectual world of early modern Islamic societies, so that you can choose the most relevant texts to read.
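Such a personal catalog is, in effect, a graph. As a minimal sketch, assuming hypothetical (volume, title, author, copyist, owner) records of the kind gathered during the survey, the following Python builds an undirected network in which each title is linked to the people attached to it and to the other texts bound in the same mecmua. The function name and record layout are illustrative only, and the sketch deliberately ignores the kind of relationship (author, copyist, or owner), which a fuller version would record on each link.

```python
from collections import defaultdict
from itertools import combinations


def build_network(entries):
    """Build an undirected graph of titles and people from catalog notes.

    Each entry is a hypothetical (volume, title, author, copyist, owner)
    tuple; an empty string marks an unknown person. Node names carry a
    "title:" or "person:" prefix so a person and a work never collide.
    """
    graph = defaultdict(set)
    by_volume = defaultdict(list)

    def link(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for volume, title, author, copyist, owner in entries:
        work = f"title:{title}"
        graph.setdefault(work, set())  # keep titles with no known people
        by_volume[volume].append(work)
        for person in (author, copyist, owner):
            if person:
                link(work, f"person:{person}")

    # Texts bound together in one mecmua are linked to each other.
    for works in by_volume.values():
        for a, b in combinations(works, 2):
            if a != b:
                link(a, b)
    return graph
```

Looking up a person then returns every work they wrote, copied, or owned, and looking up a title returns its people and its companions in the volume, which are the next leads to follow. Because nodes are keyed by title, two copies of the same work merge into a single node, so texts bound alongside either copy become its neighbors.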

Of course, you can do such work with the physical manuscripts, but digitization makes it practical and efficient. When you can look at twenty, fifty, or one hundred manuscripts in a single day, side by side, following whatever lead you come across, research that might have taken five years can be done in one. Moreover, a good digital catalog lets you search across many manuscripts for fragments of titles or author names with a few keystrokes rather than flipping through the indices of multiple catalog volumes.

There are downsides to the digitization of manuscripts. Scholars often lament, and rightfully so, the inability to handle a physical copy, to sense its dimensions and quality with more than just a doubly distant pair of eyes. Employees digitizing the manuscripts often forget to photograph the bindings and covers. The best manuscript libraries allow researchers to access the originals if necessary, though many do not. Some libraries combine the worst of both worlds, forcing researchers to wait for days to read a few digital copies at a time while also refusing them the privilege of viewing the actual manuscript. Furthermore, a library is only as good as its catalog. If catalogs, whether paper or electronic, do not accurately list basic information or do not display the mecmua as a whole, and instead treat every treatise as an independent work, research becomes even more difficult and inefficient. Finally, the true benefits of working with digital manuscripts only become apparent when you have tens of thousands of manuscripts to browse. Only then can you easily track down all the different copies of a treatise and see, within a few seconds, what else the author may have written. For the moment, I believe there is only one place where such research is possible, the Süleymaniye Library in Istanbul, although its catalog leaves much to be desired. The other major libraries, like Dar al-Kutub in Cairo, are a long way from complete digitization.

Despite these frustrations, I still think that the digitization of manuscripts provides unique solutions to the problem of studying early modern intellectual history in particular. We can discover many of the poorly known authors and treatises of a period (that is, poorly known to us) efficiently, without having to rely on sheer chance. In this respect, it might have less to offer researchers studying medieval Islamic societies, as the vast majority of mecmuas date from the early modern period. Perhaps most importantly, it allows us to address that most elusive question of readership and reception. Only when we can quickly go through twenty or thirty manuscripts in a few hours, looking at comments, ownership marks, and more, can we start making sense of the circulation and reception of these texts. We can pay attention to the short, sundry, pamphlet-like literature that was so prevalent in the early modern period, rather than focusing on one grand, though seldom-read, text. Digitization allows us to access the expanded world of early modern readership. No longer chained to one ur-text, we can compare the many variants and changes of a text. By paying attention to this material world of manuscript reception, we might find a new path between treating these texts purely as repositories of facts and treating them as representations. In this sense, although digitization has distanced researchers from the material text itself, it has simultaneously refocused our attention on the manuscript as a medium worthy of study and respect.

(Many thanks to the friends and colleagues who commented on an earlier draft. Readers’ comments and thoughts are welcome and encouraged.)


Nir Shafir is a doctoral candidate at UCLA working on early modern intellectual history and the history of science in the Ottoman Empire.

8 November 2013

Cite this: Nir Shafir, “How digitization has transformed manuscript research: new methods for early modern Islamic intellectual history,” HAZİNE, 8 November 2013,
