PDF Accessibility: The Ohio State University
The Arizona State University Artificial Intelligence Cloud Innovation Center (AI CIC), powered by Amazon Web Services (AWS), partnered with The Ohio State University Libraries (Ohio State) to tackle a significant challenge in the digital era: improving the accessibility of the library’s digital collections. The Ohio State University Libraries’ collection contains hundreds of thousands of PDF documents, many of which did not meet the Web Content Accessibility Guidelines (WCAG) 2.1 Level AA standards, making it difficult or impossible for individuals relying on assistive technologies to utilize those documents. To address this issue, the AI CIC collaborated with Ohio State to develop an innovative, artificial intelligence-driven solution designed to remediate these PDFs, improving accessibility for everyone, regardless of their abilities.
Problem
Ohio State, like many institutions, faces challenges in updating documents to meet modern accessibility guidelines. With the Department of Justice’s April 2024 updates to how the Americans with Disabilities Act (ADA) will be regulated, and Ohio State’s increased emphasis on digital accessibility, the university needed a scalable solution that could quickly and efficiently bring parts of its collection into compliance with WCAG 2.1 Level AA standards. Many of the PDF documents were created before hierarchical content metadata tagging was implemented. The sheer volume of documents made manual remediation an impractical solution as it took $3 - $4 per page, necessitating an advanced, automated approach to address document structure and metadata deficiencies, such as incorrect header formatting and missing alternative text for images.
Approach
The AI CIC team employed a systematic approach, leveraging AWS services to create a robust, scalable solution that can quickly remediate PDF documents. The project started with an in-depth analysis of requirements to be compliant with WCAG 2.1 Level AA standards and a review of the accessibility issues present in Ohio State's documents. This formed the basis for developing a remediation process that used AI and machine learning to automatically identify and correct accessibility gaps. Our solution is designed to provide both efficiency and cost-effectiveness, reducing per-page remediation expense to a fraction of a penny. Below are the key AWS services used to develop this solution:
- Amazon S3: Used to securely store and manage the documents being remediated
- AWS Lambda: Automates the file processing workflows
- ECS (Fargate): Handles document processing efficiently
- AWS Step Functions: Coordinates the various processes involved in splitting, processing, and merging documents
- Amazon Bedrock: Generates alt text for images and charts using advanced LLM capabilities
For bulk processing, 10 pages would cost approximately $0.013130164 + Adobe API costs.
This approach ensures that your document remediation processes remain efficient and cost-effective, even when managing large volumes.
Optimized for scalability, the solution takes 3.5 minutes to remediate a typical 17-page PDF, making it ideal for large institutions managing high volumes of documents. Leveraging AWS services and Adobe Auto-Tag APIs, this solution encourages good accessibility compliance at minimal cost. Institutions can scale remediation efforts efficiently while keeping costs low, perfect for large document repositories needing both speed and compliance.
A key element of the solution was the use of Adobe Auto-Tag APIs which was designed to automatically clean metadata, apply appropriate tags, and further enable document remediation. The project involved continuous iterations and testing, allowing the team to refine the AI model and achieve high compliance rates efficiently.
Industry Impact and Problem Solving
The challenges faced by Ohio State in making its documents accessible are not unique. Many educational institutions, cultural heritage institutions, government agencies, and businesses struggle with complying with accessibility standards, especially given the vast number of legacy documents that exist. The solution developed in partnership with the AI CIC addresses these challenges by providing an automated, scalable approach to document remediation. This not only helps Ohio State meet regulatory expectations but also increases access to the Libraries’ digital assets for all students, faculty, and researchers.
“Our scale is massive, and we are committed to doing what's best for our patrons. With the introduction of new accessibility standards, AI and machine learning offer what may be the most viable path to success, given our resources and scope. I'm excited about the potential of using AI to enhance the experience for those we serve.”
Cory Tressler, Assistant Dean for Technology and Digital Programs, The OSU Libraries
Potential for Wider Application
The success of this project demonstrates the potential for wider application across other educational institutions, cultural heritage institutions, government agencies, and businesses or any organization facing similar challenges with document accessibility. The AI-driven solution can be customized to meet the needs of different organizations, working to achieve accessibility at scale for large document collections.
Supporting Artifacts
GitHub Link | Click Here |
Next Steps
The Ohio State University Libraries are moving toward implementing this solution as an integrated API service within existing systems, so that this service could be an on-demand solution for any patron to utilize. University Libraries are also working on implementation strategies with quality checkpoints built in to ensure an optimal user experience and that the tool is meeting end user accessibility needs.
About the ASU CIC
The ASU Artificial Intelligence Cloud Innovation Center (AI CIC), powered by AWS is a no-cost design thinking and rapid prototyping shop dedicated to bridging the digital divide and driving innovation in the nonprofit, healthcare, education, and government sectors.
Our expert team harnesses Amazon’s pioneering approach to dive deep into high-priority pain points, meticulously define challenges, and craft strategic solutions. We collaborate with AWS solutions architects and talented student workers to develop tailored prototypes showcasing how advanced technology can tackle a wide range of operational and mission-related challenges.
Discover how we use technology to drive innovation. Visit our website at ASU AI CIC or contact us directly at ai-cic@amazon.com.