Hands-on digitizing texts with machine learning and AI

Are you interested in extracting text from scanned images, even poor-quality ones, and learning about recent advances in optical character recognition (OCR)? Join us for a three-hour workshop on using machine learning and large language models to programmatically OCR images of text. The workshop will take participants through running Python code in collaborative notebooks to access a variety of OCR tools, including ones suited to texts that are poorly scanned or otherwise difficult to read.

This is a participatory workshop: you will have the opportunity to practice along with the instructors and to apply your skills in exercises on your own. Our goal is that you walk away with the confidence and skills to use the software and address challenges as they arise.

The workshop is open to all VT community members. Some experience with Python is recommended, and you will need access to a Windows, Mac, or Linux computer. Instructions for setting up accounts with Kaggle, Hugging Face, and Llama will be provided before the workshop.

If you are an individual with a disability and desire an accommodation, welcome! Please email library-event-accessibility@groups.office365.vt.edu at least 10 days prior to the event.

Date: Wednesday, May 21, 2025
Time: 9:00am - 12:00pm
Location: Newman 207A
Campus: Blacksburg Campus
Audience: Alumni, Faculty/Staff, Graduate Students, Postdoc, Public, Researchers, Undergraduates
Categories: Workshop, Workshop > Data Science

Registration has closed.

Presenters

Bipasha Banerjee, Chreston Miller & Jesse Sadler

Event Contact

Bipasha Banerjee

AI Research Scientist

Chreston Miller

Data & Informatics Consultant

Jesse Sadler

Digital Humanities Trainer and Project Consultant