Shining a light on dark data

Tucked into converted laboratory space on the second floor of the Coker Life Sciences building is the A.C. Moore Herbarium, one of many scientific repositories scattered across the country.

The University of South Carolina has stored and cataloged plant specimens in the herbarium since it was founded by former university president Andrew Charles Moore in 1907, and now Carolina is leading an effort to make the biological information contained there and in similar institutions readily available globally.

This year, the National Science Foundation awarded a $2.5 million grant, called "The Key to the Cabinets: Building and Sustaining a Research Database for a Global Biodiversity Hotspot," to a collaboration spanning 13 states in the Southeast. Over the next four years, the A.C. Moore Herbarium will work closely with the Clemson University Herbarium to digitize specimens from their own collections as well as from those of seven other herbaria, all members of the Consortium of South Carolina Herbaria.

The multi-state cooperative is part of a larger nationwide effort called “Advancing the Digitization of Biodiversity Collections.”

“A biodiversity collection means basically anything in a natural history museum or something similar,” says Herrick Brown, the assistant curator of the A.C. Moore Herbarium. “It could be anything from pickled frogs to pinned insects to fossils to, in our case, herbarium specimens.”

Carolina’s herbarium houses more than 120,000 plant specimens, each meticulously documented with the time, place and conditions of its collection. Carefully dried over about a week under controlled conditions, the samples are meant to last. They are mounted on heavy-stock paper and stored in dozens of metal cabinets. Some of the specimens are more than 200 years old.

The A.C. Moore Herbarium jumped onto the image digitization bandwagon early. Last year it purchased the requisite camera and equipment with funding from the Wade T. Batson endowment fund, which was established for the herbarium in 2002. Some 4,500 images of specimens collected by H.W. Ravenel, a Civil War-era botanist at the university, have been digitized already.

The endowment has enabled the herbarium to hire students to assemble machine-readable metadata for more than 80,000 of the facility’s specimens. Attached metadata is a crucial element that will make the regional and nationwide collection of digitized images an important scientific resource.

“Essentially what we’re dealing with is an archival biodiversity collection that contains a huge amount of what people refer to as dark data,” Brown says. “There’s a lot—a lot—of information available in there, but it’s not easily accessible.”

That will change as the digitized images are cataloged and work their way through a system of data aggregators available on the Web. The National Science Foundation established a national hub, Integrated Digitized Biocollections, several years ago, and the data will eventually be available on an international network, the Global Biodiversity Information Facility.

“The idea is to unlock and unleash all that dark data, and expose it to a wider audience,” says Brown. “The data can be mined in a variety of new ways. There are biogeography questions, changes in plant phenology, looking at whether blooming times for species have shifted over time or vary over a latitude or an altitude gradient. All of this is possible with these geo-referenced image and data systems.”

