
Enhancing Cultural Materials: CSU

Overview

The Arizona State University Artificial Intelligence Cloud Innovation Center (AI CIC), powered by Amazon Web Services (AWS), is collaborating with Colorado State University (CSU) Libraries to enhance access to digitized cultural heritage materials. The project focuses on geolocating archival maps from the Water Resources Archive, a nationally recognized collection of historical documents on water resource development in Colorado and the West. So far these maps have been processed only at a basic level, which limits their accessibility and usability for researchers and the public.

Problem

CSU’s collection includes maps that depict planned and existing water projects, irrigation systems, and urban development across Colorado. However, these materials lack robust metadata, making them difficult to search, analyze, or integrate into modern geospatial tools. To unlock the full value of these maps, the project aims to automate metadata generation, geolocation, and authoritative place name matching.

The primary challenges include accurately identifying geographic locations on scanned maps, linking place names to authoritative sources, and defining precise boundary coordinates. Given the large file sizes and varied visual quality of the maps, batch processing at scale is essential.

Student Spotlight

The AI CIC is powered by ASU Student Workers. The following students were assigned to this project to develop this open-source solution in partnership with the AWS and ASU mentor team. 

Approach

The ASU AI CIC team, working with Colorado State University, simplified the process of converting historical map files into modern geospatial formats. First, the .tiff map images are uploaded to Amazon S3 for secure, centralized storage. Each upload triggers an AWS Lambda function that processes the image and extracts its text using Amazon Textract. Amazon Bedrock then analyzes the extracted text to identify key metadata, such as the names of water bodies.
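The upload-to-extraction step can be sketched as follows. This is a minimal illustration and not the project's actual code: the function names, the 80 percent confidence threshold, and the choice of the synchronous Textract call are all assumptions. Large scans may exceed the limits of the synchronous API, in which case an asynchronous Textract job would be the more likely fit.

```python
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Return (bucket, key) pairs for every record in an S3 notification event."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 events (spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def lines_from_textract(response, min_confidence=80.0):
    """Keep the text of LINE blocks at or above a confidence threshold.

    The threshold is an assumed value chosen for illustration.
    """
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


def handler(event, context):
    """Hypothetical Lambda entry point: run Textract on each uploaded map."""
    import boto3  # imported lazily so the helpers above run without AWS access

    textract = boto3.client("textract")
    results = {}
    for bucket, key in s3_objects_from_event(event):
        response = textract.detect_document_text(
            Document={"S3Object": {"Bucket": bucket, "Name": key}}
        )
        results[key] = lines_from_textract(response)
    return results
```

Keeping the event parsing and the response filtering in separate pure functions means both can be checked locally with sample dictionaries before anything is deployed.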

For accurate mapping, the team uses an open-source geolocation API to look up the coordinates of the water resource names found on each map, tying the historical spatial data to modern maps. The results are converted to GeoJSON so that the map details can be placed onto contemporary digital maps. The GeoJSON files are organized and uploaded to GitHub for easy access and long-term preservation.
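As a rough illustration of the GeoJSON step, the sketch below turns already-geocoded place names into a FeatureCollection with a bounding box. The helper names, the property keys, and the sample coordinates are hypothetical placeholders, and the geocoding lookup itself is left out because the source does not name the API. GeoJSON stores positions as longitude first, then latitude, and a bounding box as west, south, east, north.

```python
import json


def point_feature(name, lon, lat, source="geocoder"):
    """Build a GeoJSON Point feature; coordinates are [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name, "source": source},
    }


def bounding_box(features):
    """Return [west, south, east, north] covering all point features."""
    lons = [f["geometry"]["coordinates"][0] for f in features]
    lats = [f["geometry"]["coordinates"][1] for f in features]
    return [min(lons), min(lats), max(lons), max(lats)]


def feature_collection(features):
    """Wrap features in a FeatureCollection, adding a bbox when non-empty."""
    features = list(features)
    collection = {"type": "FeatureCollection", "features": features}
    if features:
        collection["bbox"] = bounding_box(features)
    return collection


if __name__ == "__main__":
    # Placeholder names and coordinates, for illustration only.
    places = [
        point_feature("Example Reservoir", -105.2, 40.6),
        point_feature("Example Ditch", -105.0, 40.4),
    ]
    print(json.dumps(feature_collection(places), indent=2))
```

The bounding box here is simply the envelope of the matched points, which is one plausible way to approximate a map's extent when the sheet corners are not known.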

Additionally, the team automates the creation of Dublin Core metadata in CSV format to improve the discoverability and usability of the datasets. The pipeline relies on Amazon Bedrock for enrichment services and on Docker-based Lambda layers for smooth execution. By employing these cloud-native technologies and machine learning techniques, the team ensures the historical maps are modernized effectively while preserving their essential geospatial integrity.
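A Dublin Core CSV export might look something like the sketch below. The column subset, the "dc:" header prefix, and the "; " separator for multi-valued fields are assumptions made for illustration; the project's actual column layout is not described in the source.

```python
import csv
import io

# An assumed subset of the 15 Dublin Core elements, used as CSV columns.
DC_FIELDS = [
    "dc:title",
    "dc:identifier",
    "dc:type",
    "dc:format",
    "dc:subject",
    "dc:coverage",
]


def dublin_core_csv(records):
    """Serialize metadata dictionaries to CSV text with Dublin Core headers."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=DC_FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = {}
        for field in DC_FIELDS:
            value = record.get(field, "")
            # Multi-valued elements (e.g. several subjects) are joined with "; ".
            if isinstance(value, (list, tuple)):
                value = "; ".join(value)
            row[field] = value
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder record, for illustration only.
    sample = [{"dc:title": "Example map", "dc:subject": ["Reservoirs", "Irrigation"]}]
    print(dublin_core_csv(sample), end="")
```

Missing elements are written as empty cells, so every row has the same columns and the file can be loaded into a spreadsheet or a repository ingest tool without further cleanup.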

Industry Impact

This project will significantly improve accessibility to historical maps by enhancing searchability and usability within CSU’s digital library system. Researchers, historians, and policymakers can quickly locate maps by region, analyze historical water infrastructure, and integrate this data into modern geographic information systems (GIS). 

The project will also provide CSU with a scalable AI/ML-based approach for future archival digitization projects. By streamlining map indexing and metadata creation, CSU Libraries could reduce manual workload, allowing archivists to focus on higher-value research tasks. This model could serve as a blueprint for other academic institutions and libraries seeking to modernize access to historical geographic materials.

"Working with the CIC on this project was a great experience in deepening our knowledge of the AWS environment and the tools available.  We have a better understanding of the strengths and weaknesses of large language models to automatically process map and archival materials.  The CIC team members were incredibly responsive to our requirements and creative problem solvers.  We’ve already been able to build upon the underlying structure for other projects and will explore other applications of the technologies in the future."

Suzi White, Director of Library Technology Services, CSU

Wider Application

Beyond CSU Libraries, this AI-driven geospatial metadata framework can benefit various industries and organizations. Government agencies and water management authorities can leverage AI to process historical maps for infrastructure planning and environmental analysis. Museums and cultural heritage institutions can apply similar methods to improve access to historical cartographic collections. Universities and digital libraries can use AI-powered metadata extraction to streamline archival processing and enhance user access.

By harnessing AWS AI/ML capabilities, this initiative demonstrates how automation can unlock the hidden value of historical documents, making them more accessible, searchable, and useful for a wide range of academic, governmental, and public research applications.

Next Steps

CSU Libraries will continue to explore the use of large language models and AWS infrastructure to process archival materials. Future work can build on this project to further refine the accuracy and completeness of its results.

About the ASU CIC

The ASU Artificial Intelligence Cloud Innovation Center (AI CIC), powered by AWS, is a no-cost design thinking and rapid prototyping shop dedicated to bridging the digital divide and driving innovation in the nonprofit, healthcare, education, and government sectors.

Our expert team harnesses Amazon’s pioneering approach to dive deep into high-priority pain points, meticulously define challenges, and craft strategic solutions. We collaborate with AWS solutions architects and talented student workers to develop tailored prototypes showcasing how advanced technology can tackle a wide range of operational and mission-related challenges.

Discover how we use technology to drive innovation. Visit our website at ASU AI CIC or contact us directly at [email protected].