DACER21. Digitization to Adapt an Entomology Collection to the Environmental Challenges of the 21st Century

DACER21 is a research project from the 2021 call for "Proyectos Estratégicos Orientados a la Transición Ecológica y a la Transición Digital" under the "Plan Estatal de Investigación Científica y Técnica y de Innovación 2021-2023".

Project Title: Digitization to Adapt an Entomology Collection to the Environmental Challenges of the 21st Century

Code: TED2021-130795B-I00

 

1.- The Proposal.

The acceleration of the species extinction rate, due to human activity, is one of the most serious environmental problems known. It has been denounced since the late 20th century (Wilson, 1988), and in the 21st century, it continues to worsen and could potentially endanger the survival of the human species itself (Rockström, 2009; Steffen et al., 2011; Steffen et al., 2015).

There is an urgent need for precise data on biodiversity responses to long-term climate change in large geographic areas and across a wide variety of taxa (IPBES, 2019). In the scientific literature (Wagner, 2020; Wagner et al., 2021) and recent popular discourse (McCarthy, 2015; Kolbert, 2020; Goulson, 2021), alarm has spread regarding the decline of insects, the most diverse and abundant taxon on the planet, which play essential roles in ecosystem functioning and services such as pollination, nutrient cycling, and pest control (Wilson, 1987; Losey & Vaughan, 2006; Gallai, 2009; Yang & Gratton, 2014). However, published evidence regarding recent responses of insects to global change is heavily biased geographically and taxonomically and rarely extends beyond a few decades (Collen et al., 2012; Sánchez-Bayo & Wyckhuys, 2019; Wilson & Fox, 2021). Additionally, communicating the decline of insects and their importance for ecosystem function and human well-being is hindered by negative public perceptions of insects (Saunders, 2020).

Natural history museums house millions of insect specimens collected over the past two centuries and play a key role in communicating and educating the public. Museum entomological collections are therefore vital for expanding the evidence of the geographic, taxonomic, and temporal scope of insect responses to climate change (Meineke et al., 2018; Kharouba et al., 2019; Montgomery et al., 2020) and for conveying to the public the importance of insects and their fate (Alberch, 1993; Didham et al., 2020; Saunders et al., 2020). However, in practice, museum collections have not yet fulfilled this function due to limitations in infrastructure, analytical techniques, and time and personnel resources (Suarez & Tsutsui, 2004; Ries et al., 2019). New infrastructure and technologies, combined with advances in statistical modeling and big data processing, allow us to overcome these limitations (Short et al., 2018; Jönsson et al., 2021).

The aim is to use new techniques for digitization and for digital storage, processing, and analysis to demonstrate how the accessibility and exploitation of the scientific information generated by the Entomology Collection of the National Museum of Natural Sciences (MNCN) can be improved. With its 5 million specimens, this is Spain's largest insect collection.

 

2.- The project's objectives

The project will demonstrate how a natural history collection can effectively address and communicate biodiversity responses to global change through four main objectives:

1. Establish (a) a workflow for cataloguing and digitizing collections, (b) the computer infrastructure for storing, processing, and analyzing digital data, and (c) novel means of communicating to the public the effects of climate change on organisms, with the support of the Communication Department of the MNCN.

2. Prioritize and expedite the digitization of type specimens from the MNCN Entomology Collection (more than 33,000 specimens representing over 7,000 taxa), using 2D focus stacking techniques to obtain high-resolution extended depth of field images, creating a unique resource to share with the national and global scientific and environmental community. The digitization, dissemination, accessibility, and traceability of the data generated from the type specimens will transform them into Digital Extended Specimens (Hardisty et al., 2022): interconnected data complexes that serve as FAIR representations (Wilkinson et al., 2016) of collection specimens.

Figure xx. Digital Extended Specimen.

3. Significantly advance the georeferencing (Chapman & Wieczorek, 2020) and digitization of the Iberian diurnal butterfly collection at the MNCN (approximately 80,000 specimens). These new data will be analyzed using cutting-edge techniques to observe the effects of climate change on butterflies over the last century. This involves a pre-digitization process that includes the essential steps of identification-validation, cataloguing, and georeferencing. As in Objective 2, 2D digitization will be used for specimens of the family Papilionidae (approximately 3,000). As a test of research applications, novel analyses of changes in the distribution and seasonality of butterflies will be conducted, taking into account collection biases across space and time.

4. Develop and apply new 3D digitization methods for insect specimens (including 3D images of type specimens). A case study will demonstrate how they can be used to assess ecomorphological responses to climate change in a mountain butterfly threatened by global warming (the Apollo butterfly, Parnassius apollo). 3D photogrammetry methods will be implemented for medium-sized entomological specimens (20-100 mm) and applied to test a priori hypotheses about the effects of global warming and mountain peak isolation on the morphology of this iconic conservation species. Finally, the aim is to develop a scanner that is unique in Spain and southern Europe, suitable for high-resolution 3D digitization of insects and other small objects or organisms (<20 mm), following three pioneering international groups (Nguyen et al., 2014; Ströbel et al., 2018; Plum & Labonte, 2021). Only these three non-commercialized models exist worldwide, which makes this final part of Objective 4 high-reward but high-risk, as it depends on optical engineering, computing, and construction.

 

3.- Work plan.

To achieve the proposed objectives, a work plan has been designed that comprises five related tasks, which together describe the project workflow.

Figure xx. Work plan in DACER21.

 

Task 1. IT Infrastructure (Objective 1)

Creation of the workflow and the technological infrastructure required for cataloguing, digitizing, and disseminating natural history collections.

Technological infrastructure (hardware and software) alone is not enough: policies and protocols must also be designed to define a workflow that achieves Objective 1. This infrastructure should enable:

1.- The storage of digitized material following established policies and protocols. This involves the proper naming of digital objects and their correct placement in storage units created in the IT infrastructure for this purpose.

2.- The recording and documentation of digital objects in the Digital Collection database of the MNCN using the Digital Object Management Application (AGOD in Spanish), which is being specifically developed for the management of this collection.

3.- The interrelation and traceability of digital objects (digitizations) with the collections and physical collection specimens that have been digitized. The system must allow for bidirectional traceability, so that from the record of a specimen in the collection management system, the digitizations of that specimen can be located and previewed if they exist. Conversely, from the record of the digital object, it should be possible to locate and consult the digitized specimen.
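
The naming and bidirectional traceability requirements above can be illustrated with a minimal Python sketch. It is not the AGOD application or the MNCN protocol: the file-name pattern, the class names (DigitalObject, TraceabilityIndex), and the function build_object_name are all hypothetical, chosen only to show how one index can answer both "which digitizations exist for this specimen?" and "which specimen does this digital object come from?".

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


def build_object_name(catalog_number: str, view: str, sequence: int, ext: str = "tif") -> str:
    """Build a file name from the catalog number, the view and a running number.

    The pattern is a hypothetical convention used for illustration only.
    """
    safe_id = re.sub(r"[^A-Za-z0-9]+", "_", catalog_number).strip("_")
    safe_view = re.sub(r"[^A-Za-z0-9]+", "_", view).strip("_").lower()
    return f"{safe_id}_{safe_view}_{sequence:03d}.{ext}"


@dataclass
class DigitalObject:
    object_id: str        # identifier of the digitization (hypothetical format)
    catalog_number: str   # link back to the physical specimen
    storage_path: str     # location of the file in the storage unit


@dataclass
class TraceabilityIndex:
    """Keeps both directions of the specimen <-> digital object link."""
    by_object: Dict[str, DigitalObject] = field(default_factory=dict)
    by_specimen: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, obj: DigitalObject) -> None:
        if obj.object_id in self.by_object:
            raise ValueError(f"duplicate digital object id: {obj.object_id}")
        self.by_object[obj.object_id] = obj
        self.by_specimen.setdefault(obj.catalog_number, []).append(obj.object_id)

    def digitizations_of(self, catalog_number: str) -> List[DigitalObject]:
        """Specimen -> digital objects (empty list if none exist yet)."""
        return [self.by_object[i] for i in self.by_specimen.get(catalog_number, [])]

    def specimen_of(self, object_id: str) -> str:
        """Digital object -> catalog number of the digitized specimen."""
        return self.by_object[object_id].catalog_number
```

In a production system both lookups would normally be database queries; the point of the sketch is only that registering a digital object updates both directions at once, so neither can drift out of sync.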

Figure xx. Outline of IT Infrastructures, Workflow, and Protocols.

 

Task 2. Cataloguing and Georeferencing: Pre-digitization (Objectives 2 and 3)

Cataloguing and documentation of type specimens, and cataloguing and georeferencing of Iberian diurnal butterflies, as a necessary preliminary step for digitization.

The first essential step in creating a digital specimen is the proper cataloguing and documentation of the physical specimen. There are three basic requirements that must be met for proper cataloguing:

1.- Identification. Each specimen must be uniquely and unambiguously identified so that it can be recognized and distinguished from other specimens. For this purpose, a Catalog Number will be assigned and associated with it: a globally unique identifier around which the Digital Extended Specimen will revolve.

2.- Description. A proper description of the specimens is essential according to a pre-established data model, following a standard, which will later allow them to be found using different search criteria and shared in a coordinated manner with other infrastructures.

3.- Location. A detailed record of the precise location of each specimen must be maintained so that it can be quickly located at any time.

Objective 2 includes completing the location and, if necessary, cataloguing of any remaining type specimens.

Similarly, progress will continue in cataloguing Iberian diurnal butterflies from the collection, with special attention to recording geographic coordinates when available and conducting retrospective georeferencing when only descriptive locality data is available (Objective 3).
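
As an illustration of the three cataloguing requirements and of the georeferencing step, the following Python sketch checks a batch of records before digitization. The record fields and the function names (CatalogueRecord, check_catalogue, needs_georeferencing) are assumptions made for this example; they do not reproduce the MNCN data model or any particular standard.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CatalogueRecord:
    catalog_number: str                # 1. identification
    scientific_name: str               # 2. description (minimal example field)
    locality: str                      # 2. description: locality text from the label
    storage_location: str              # 3. location, e.g. cabinet and drawer
    latitude: Optional[float] = None   # decimal degrees, if recorded
    longitude: Optional[float] = None


def check_catalogue(records: List[CatalogueRecord]) -> List[str]:
    """Return a list of human-readable problems found in the batch."""
    problems = []
    counts = Counter(r.catalog_number for r in records)
    for number, n in counts.items():
        if n > 1:
            problems.append(f"{number}: catalog number used {n} times")
    for r in records:
        if not r.scientific_name.strip():
            problems.append(f"{r.catalog_number}: missing scientific name")
        if not r.storage_location.strip():
            problems.append(f"{r.catalog_number}: missing storage location")
    return problems


def needs_georeferencing(records: List[CatalogueRecord]) -> List[str]:
    """Catalog numbers with a locality description but no coordinates."""
    return [
        r.catalog_number
        for r in records
        if (r.latitude is None or r.longitude is None) and r.locality.strip()
    ]
```

Records returned by needs_georeferencing are the candidates for retrospective georeferencing; records without coordinates and without any locality text cannot be georeferenced from the label alone and are deliberately left out of that list.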

Task 3. 2D Digitization (Objectives 2 and 3)

2D Digitization of Type Specimens from the Entomology Collection and Iberian Diurnal Butterflies.

Type Specimens (Objective 2):

Once properly catalogued (pre-digitization), the specimen can be digitized, with priority given to the name-bearing types (holotype, lectotype, neotype, or syntypes). In cases of syntypic series with numerous specimens (i.e., multiple individuals on which a name is based), only some will be digitized based on their original localities, sexual dimorphism, morphological variability, etc.

For each specimen, at least two images will be taken: one with all of its labels (two images if any label has writing on both sides) and another showing the overall appearance of the specimen (dorsal habitus). We anticipate that, in most cases, an additional lateral view will also be needed. Most specimens will be digitized using a focus stacking technique, which involves taking multiple images at different focus distances covering all planes of the specimen. When these images are combined using appropriate software, the entire specimen is in focus (extended depth of field, EDOF).
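
The principle behind focus stacking can be sketched in a few lines of Python with NumPy. This is a simplified illustration, not the software used in the project: it assumes the images are already aligned, grayscale, and of equal size, and it uses the absolute value of a discrete Laplacian as the sharpness measure, which is one common choice among several. The function names are hypothetical.

```python
import numpy as np


def laplacian_abs(img: np.ndarray) -> np.ndarray:
    """Per-pixel sharpness: absolute 4-neighbour Laplacian with edge padding."""
    p = np.pad(img.astype(float), 1, mode="edge")
    lap = (
        p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
        - 4.0 * p[1:-1, 1:-1]
    )
    return np.abs(lap)


def focus_stack(images):
    """Fuse aligned grayscale images by taking each pixel from the sharpest frame.

    Returns the fused image and an index map saying which frame each pixel came from.
    """
    stack = np.stack([np.asarray(im, dtype=float) for im in images])  # (n, h, w)
    sharpness = np.stack([laplacian_abs(im) for im in stack])         # (n, h, w)
    best = np.argmax(sharpness, axis=0)                               # (h, w)
    fused = np.take_along_axis(stack, best[None, ...], axis=0)[0]
    return fused, best
```

Dedicated stacking software additionally registers the frames, smooths the index map, and blends across seams; the sketch leaves these steps out to keep the core idea visible.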

Figure xx. Extended Depth of Field technique.

Iberian Diurnal Butterflies (Objective 3):

We will capture 2D images of specimens from the Papilionidae family at the MNCN. We have already properly catalogued (pre-digitization) approximately 3,000 specimens of the five species found on the Iberian Peninsula. These species include charismatic, large-sized species with different biogeographical associations (uncommon mountain species [Parnassius apollo, P. mnemosyne], Mediterranean/Iberian endemics [Iphiclides feisthamelii, Zerynthia rumina], and a widely distributed species [Papilio machaon]). These species are of interest both for scientific questions about the influence of the biological cycle on responses to global change and for knowledge exchange activities with the public.

Task 4. 3D Photogrammetry and Development of a High-Resolution 3D Scanner for Small Specimens (Objective 4)

The purpose is to implement a new digitization technique, 3D photogrammetry, in the Virtual Morphology Laboratory of the MNCN capable of creating 3D models from small and complex natural history specimens. In DACER21, these are the specimens from the Entomology Collection.

This task has two levels: Level 1 is straightforward and aims to create a semi-automatic short-range 3D photogrammetry system to produce 3D virtual models of medium-sized Entomology Collection type specimens (20-80 mm) and a sample of Parnassius apollo butterfly specimens for scientific analysis. Level 2 is more complex and aims to build an open-source high-resolution 3D scanner (Plum & Labonte, 2021; Nguyen et al., 2014) based on fully automated high-resolution photogrammetry for entomological specimens of small size (<20 mm).

Short-range photogrammetry (Level 1) is based on high-resolution digital images obtained by a digital camera (Luhmann et al., 2019). Color images are captured from different angles and heights with constant focal depths to produce a series of images that exhibit partial overlaps between pairs. This can be turned into a semi-automatic process using a tripod for a specific camera position height and a motorized turntable with remote-controlled camera trigger.
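
A capture session of this kind can be described as a list of camera elevations and turntable angles. The sketch below, in Python, generates such a plan; the function name capture_plan and its parameters are hypothetical, and the number of shots per turn and the elevations are left to the operator because the source does not specify them.

```python
from typing import List, Tuple


def capture_plan(shots_per_turn: int, elevations_deg: List[float]) -> List[Tuple[float, float]]:
    """List (camera elevation, turntable angle) pairs for one capture session.

    For each camera elevation, the turntable is rotated in equal steps through
    a full revolution, so consecutive images overlap partially.
    """
    if shots_per_turn < 1:
        raise ValueError("shots_per_turn must be at least 1")
    step = 360.0 / shots_per_turn
    return [
        (elevation, i * step)
        for elevation in elevations_deg
        for i in range(shots_per_turn)
    ]
```

A smaller angular step gives more overlap between neighbouring images at the cost of more exposures per specimen; the right balance depends on the specimen and the reconstruction software.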

We will implement standardized short-range photogrammetry as a 3D digitization method for medium-sized entomological objects, ranging from 20 to 80 mm (prepared in Objectives 2 and 3), in the scanning facilities of the Virtual Morphology Laboratory of the MNCN.

However, to digitize specimens smaller than 20 mm (Level 2), we will build and use an open-source high-resolution 3D photogrammetric scanner system that is fully automated, based on technology described in two models currently used for 3D digitization of insects (Plum & Labonte, 2021; Ströbel et al., 2018):

The first scanner system is the Darmstadt Insect SCanner (DISC3D) (Ströbel et al., 2018).

The second scanner system is scAnt, designed for arthropods (Plum & Labonte, 2021), and it is the one that will be implemented in DACER21. It consists of an open-source platform for creating 3D digital models of small objects. It includes a scanner and a graphical user interface (GUI) that allows for the automatic generation of Extended Depth of Field (EDOF) multiview images, similar to those produced during Task 3. These images are then masked by combining random forest-based edge detection, adaptive boundary detection, and connected component labeling for final processing using photogrammetry software like Agisoft Metashape.
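
To make the last masking step more concrete, the following Python sketch shows connected component labeling on a binary mask, keeping only the largest component (assumed here to be the specimen). It is not scAnt's own code and does not cover the edge detection or thresholding steps; the function name largest_component and the choice of 4-connectivity are assumptions made for this example.

```python
from collections import deque
from typing import List


def largest_component(mask: List[List[int]]) -> List[List[int]]:
    """Return a mask containing only the largest 4-connected foreground region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    cleaned = [[0] * w for _ in range(h)]
    for y, x in best:
        cleaned[y][x] = 1
    return cleaned
```

Small isolated regions such as dust or background noise are discarded because they form components of their own, which is why this step is useful before photogrammetric reconstruction.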

Task 5. Scientific Analysis and Impact (Objectives 3 and 4)

Task 5 includes the analysis of data derived from Objectives 3 (Iberian diurnal butterflies) and 4 (3D images of Parnassius apollo), dissemination of results, public outreach, and knowledge exchange.

Analysis of Historical Effects of Climate Change on Butterflies (Objective 3).

Natural history specimens have been collected haphazardly rather than following a systematic sampling design, with possible biases in taxa, date, and geographic location, which complicates efforts to analyze responses in species abundance, seasonality (phenology), or geographic distribution to climate change. However, a sufficiently large number of specimens provides an opportunity to control for these biases. Using metadata obtained during the cataloguing (pre-digitization) of Iberian butterflies, we will investigate whether there are indications of changes in distribution (e.g., towards higher latitudes or elevations) or phenology (e.g., earlier first capture or median capture dates) as conditions have warmed since the 1970s.
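
One of the phenology indicators mentioned above, the median capture date, can be tracked over time with a simple trend estimate. The Python sketch below is only a starting point under strong assumptions: records are given as (year, day of year) pairs, the function names are hypothetical, and no correction for collection bias is applied, although the project's analyses are meant to account for it.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable, Tuple


def median_day_by_year(records: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Median capture day of year for each collection year."""
    by_year = defaultdict(list)
    for year, day_of_year in records:
        by_year[year].append(day_of_year)
    return {year: median(days) for year, days in sorted(by_year.items())}


def linear_trend(points: Dict[int, float]) -> float:
    """Ordinary least-squares slope, in days per year.

    A negative slope means captures have tended to occur earlier over time.
    """
    xs = list(points.keys())
    ys = list(points.values())
    n = len(xs)
    if n < 2:
        raise ValueError("at least two years are needed to estimate a trend")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

The same two functions can be reused for elevation or latitude by replacing the day of year with the elevation or latitude of each capture.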

Ecomorphological Effects of Global Change on Parnassius apollo (Objective 4).

The Apollo butterfly (Fig. 1) has retreated to high mountain refuges as the climate has warmed, and many populations have declined or disappeared in recent years (Van Swaay et al., 2010; Wilson et al., 2015; Kebaïli et al., 2022). Population declines, habitat fragmentation, and climate change are known to alter wing dimensions, symmetry, and sizes of the thorax and abdomen of butterflies (Wilson et al., 2022), including the Apollo (Pierzynowska et al., 2019). We will test the ability of our novel 3D digitization technique to detect evidence of ecomorphological changes in the Apollo, using the unique collection of 1,600 specimens of the species from the Iberian Peninsula held at the MNCN, which have already been catalogued and will be digitized in 2D (Task 3). We will compare the results of analyses using 2D data from Task 3 (e.g., wing aspect ratio) with metrics derived from 3D scanning (Task 4) to determine the enhanced understanding provided by 3D data.

Three-dimensional geometric morphometrics (3DGM) uses landmarks to quantify biological shape (Bookstein, 1991; Zelditch et al., 2012). These landmarks carry biological hypotheses, are homologous, and correspond across samples of different individuals. Recently perfected methods of 'sliding semilandmarks' (Mitteroecker & Gunz, 2009) extend the use of 3DGM to complex shapes composed of surfaces and three-dimensional curves, which are important for DACER21, to quantify the three-dimensional size and shape of the thorax, abdomen, and wings of butterflies.

It is important to note that at each step of the 3DGM analysis, there is a direct quantitative link between statistical spaces and the space of landmark configurations, enabling powerful and interactive 3D visualization of the statistical results. DACER21 will apply these methods to analyze the 3D configurations of landmark points from the Apollo sample obtained through photogrammetry (Task 4).
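
The separation of size and shape that underlies landmark-based analysis can be illustrated with a short Python sketch using NumPy. It computes centroid size and performs an ordinary Procrustes superimposition of one landmark configuration onto a reference (translation, scaling to unit centroid size, and rotation). This is a generic textbook procedure shown under simplifying assumptions, not the project's analysis pipeline: the function names are hypothetical, only two configurations are aligned, and sliding semilandmarks are not handled.

```python
import numpy as np


def centroid_size(landmarks) -> float:
    """Square root of the summed squared distances of landmarks to their centroid."""
    X = np.asarray(landmarks, dtype=float)
    return float(np.sqrt(((X - X.mean(axis=0)) ** 2).sum()))


def procrustes_align(landmarks, reference):
    """Superimpose a (k, 3) landmark configuration onto a reference.

    Both configurations are centred and scaled to unit centroid size; the
    first is then rotated (no reflection) to best match the second.
    Returns the aligned configuration and the residual distance between them.
    """
    A = np.asarray(landmarks, dtype=float)
    B = np.asarray(reference, dtype=float)
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    A = A / np.linalg.norm(A)
    B = B / np.linalg.norm(B)
    # Optimal rotation from the singular value decomposition of A^T B.
    U, _, Vt = np.linalg.svd(A.T @ B)
    d = 1.0 if np.linalg.det(U @ Vt) > 0 else -1.0
    R = U @ np.diag([1.0, 1.0, d]) @ Vt
    aligned = A @ R
    distance = float(np.sqrt(((aligned - B) ** 2).sum()))
    return aligned, distance
```

After superimposition, differences between configurations reflect shape only, while centroid size retains the size information; analyses of many specimens typically iterate this alignment against a mean shape rather than a single reference.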