Although state-of-the-art text recognition systems perform well on printed text, accurately recognizing handwritten text is still a challenge. Older handwritten text may be written in idiosyncratic styles and may have deteriorated. Even when it cannot be transcribed, such text can still provide valuable information through its writing style. Text written by the same author could be clustered automatically based on visual similarity and used to identify the collection and reduce manual validation.
Besides the text itself, secondary data contained in the handwriting, ink colour, mounting paper, label shape and printed label decorations (Fig. \ref{312487}, \ref{478509} & \ref{295237}) can be used to determine a specimen's origin and history. Image analysis alone may be sufficient to cluster specimens for particular purposes, for example, to group the specimens from a particular expedition. Such clusters can also support further analysis of images that share common characteristics.
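As a minimal sketch of such clustering, the Python snippet below groups label images by a small feature vector per image using k-means. It assumes the features have already been extracted; here they are, hypothetically, the mean ink colour of each label as an RGB triple. A real pipeline would more likely use richer features, such as embeddings from a pretrained network, and k-means is only one of several possible clustering methods, not necessarily the one used in practice. The function \texttt{kmeans} and the example values are illustrative.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features: List[Vector], k: int, iterations: int = 20) -> List[int]:
    """Group feature vectors into k clusters and return one cluster index per vector.

    Centroids are seeded deterministically (farthest-point seeding), so the
    same input always yields the same clustering.
    """
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of feature vectors")

    # Seed with the first vector, then repeatedly add the vector that is
    # farthest from every centroid chosen so far.
    centroids = [list(features[0])]
    while len(centroids) < k:
        farthest = max(features, key=lambda f: min(_dist2(f, c) for c in centroids))
        centroids.append(list(farthest))

    labels: List[int] = []
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _dist2(f, centroids[j])) for f in features]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [f for f, label in zip(features, labels) if label == j]
            if members:
                centroids[j] = [sum(values) / len(members) for values in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical mean ink colours (R, G, B) of five label crops: three blue, two black.
    inks = [(20, 20, 120), (25, 22, 110), (30, 30, 30), (28, 26, 35), (22, 18, 125)]
    print(kmeans(inks, k=2))  # [0, 0, 1, 1, 0]
```

Deterministic seeding is chosen here so that reruns on the same collection give the same groups, which makes the clusters easier to validate manually.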

Rulers and colour checkers

Other elements often seen on digitised images of collection objects are rulers, scale bars and colour checkers. These come in many types and sizes, as institutions often customise them to the requirements of the imaging campaign. Colour checkers are used to validate the colour fidelity of the specimen image, while a ruler provides a reference for the actual size of the specimen relative to the image size. Especially when digitising with a digital camera, calculating the real-world dimensions captured in an image can be complex, as they depend on the lens and the individual camera settings. Because measuring each specimen manually is time-consuming, specimen dimensions are often not included in the metadata. Detecting rulers and colour checkers in digital images can therefore help to estimate the actual specimen size and to correct the colour balance.
A generic object detection or instance segmentation model can be trained to detect these common objects. If all the rulers in a collection are of a fixed size, the length of the detected ruler can be used to calculate a transformation from pixels to the ruler's unit of measurement (e.g. cm, mm). This transformation can then be combined with specimen segmentation models to automatically extract the dimensions and other traits of the specimen \citep{triki_deep_2021-1}. However, when rulers are not of a uniform size, the pixel-to-unit scale must instead be estimated from the pixel distance between the measurement stripes or bars on the ruler \citep*{bhalerao_ruler_2014}. To determine the unit of measurement itself, the unit label printed on the ruler can be recognized with OCR, or the unit can be inferred from additional metadata about the specimen.
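The arithmetic behind the pixel-to-unit transformation is simple once a ruler has been detected. The sketch below assumes that an upstream detector has already measured the ruler's length in pixels, or the pixel positions of its graduation marks; the function names (\texttt{scale\_from\_ruler\_length}, \texttt{scale\_from\_tick\_positions} and the others) are hypothetical. It also assumes the ruler lies in the same plane as the specimen and that perspective and lens distortion are negligible. The tick-based estimate takes the median gap between consecutive marks; it is a simplified illustration, not a reproduction of the approach of \citet{bhalerao_ruler_2014}. A per-channel gain computed from a neutral grey patch of a colour checker is included as the simplest form of colour-balance correction; full calibration against all checker patches is more involved.

```python
from statistics import median
from typing import Sequence, Tuple


def scale_from_ruler_length(ruler_length_px: float, ruler_length_units: float) -> float:
    """Units per pixel for a detected ruler whose physical length is known."""
    if ruler_length_px <= 0:
        raise ValueError("ruler length in pixels must be positive")
    return ruler_length_units / ruler_length_px


def scale_from_tick_positions(tick_positions_px: Sequence[float],
                              tick_spacing_units: float = 1.0) -> float:
    """Units per pixel estimated from the pixel positions of evenly spaced ticks.

    The median gap between consecutive ticks is used so that a few missed or
    spurious ticks do not dominate the estimate.
    """
    ticks = sorted(tick_positions_px)
    if len(ticks) < 2:
        raise ValueError("need at least two tick positions")
    step = median(b - a for a, b in zip(ticks, ticks[1:]))
    if step <= 0:
        raise ValueError("tick positions must be distinct")
    return tick_spacing_units / step


def pixels_to_units(length_px: float, units_per_px: float) -> float:
    """Convert a length measured in pixels (e.g. from a segmentation mask)."""
    return length_px * units_per_px


def grey_patch_gains(measured_rgb: Sequence[float],
                     reference_rgb: Sequence[float]) -> Tuple[float, ...]:
    """Per-channel gains that map a neutral checker patch onto its reference value."""
    return tuple(ref / meas for ref, meas in zip(reference_rgb, measured_rgb))


def apply_gains(pixel_rgb: Sequence[int], gains: Sequence[float]) -> Tuple[int, ...]:
    """Apply per-channel gains to an 8-bit RGB pixel, clipping to [0, 255]."""
    return tuple(min(255, max(0, round(v * g))) for v, g in zip(pixel_rgb, gains))
```

For example, if a 10 cm ruler spans 500 pixels, the scale is 0.02 cm per pixel, so a specimen measured as 1,200 pixels long would be about 24 cm.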

Finding stamps and signatures

Specimens are often labelled with rubber stamps and occasionally printed or embossed with crests that indicate provenance or ownership (Fig. \ref{295237}). Examples include the stamps of botanical exchange clubs (Fig. \ref{295237}C, \ref{295237}E), which operated in Europe, particularly in the United Kingdom, from the middle of the 19th century into the 1930s \citep{groom_herbarium_2014}. Tens of thousands of specimens were exchanged in this way and ended up in collections around the world. If a specimen was distributed through a botanical exchange club, this implies that duplicates of it existed, and it narrows down the dates within which the specimen was collected. Although stamps usually contain some text, it is often laid out along a circular or oval path, which standard OCR engines, built around straight lines of text, struggle to read.
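One common way to make circular stamp text readable by a conventional OCR engine, offered here as an assumption rather than a method from the cited work, is to unwrap the ring of text into a straight strip with a polar transform. The sketch below does this with nearest-neighbour sampling on a greyscale image stored as a list of rows. It assumes the stamp's centre and the inner and outer radii of the text band are already known, for example from a circle fit on a detected stamp; the function \texttt{unwrap\_circular\_band} and its parameters are hypothetical. Oval stamps would need an elliptical variant.

```python
import math
from typing import List

Image = List[List[int]]  # greyscale image as rows of pixel values


def unwrap_circular_band(image: Image, cx: float, cy: float,
                         r_inner: float, r_outer: float,
                         n_angles: int = 360, n_radii: int = 32,
                         fill: int = 255) -> Image:
    """Resample the ring between r_inner and r_outer into a rectangular strip.

    Row 0 holds the outermost radius, so letters whose tops point away from
    the centre come out upright. Column j holds the angle 2*pi*j/n_angles,
    measured clockwise from the top of the stamp (image y axis points down).
    Samples falling outside the image are set to `fill` (white by default).
    """
    height, width = len(image), len(image[0])
    strip: Image = []
    for i in range(n_radii):
        # Step inwards from the outer to the inner radius.
        r = r_outer - (r_outer - r_inner) * i / max(n_radii - 1, 1)
        row = []
        for j in range(n_angles):
            theta = 2 * math.pi * j / n_angles
            # Nearest-neighbour lookup of the source pixel for (r, theta).
            x = round(cx + r * math.sin(theta))
            y = round(cy - r * math.cos(theta))
            row.append(image[y][x] if 0 <= x < width and 0 <= y < height else fill)
        strip.append(row)
    return strip
```

Text along the top of a stamp, which typically runs clockwise, comes out upright in the strip; text along the bottom usually runs counter-clockwise and would appear rotated by 180 degrees, so that part of the strip may need to be rotated before OCR.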