IRAZ – Automated Easy Read Text Adaptation
Project Objectives
IRAZ focuses on the research and development of a technological solution to support the creation of Easy Read adapted content. Within the project, we develop methods and systems that adapt and simplify texts using Artificial Intelligence and language models. One of the main objectives is to facilitate the creation of Easy Read content and thereby improve access to information for people with reading disabilities.
Research and Development
IRAZ addresses Easy Read text adaptation in Basque and Spanish. As a Language Technologies project, it relies notably on generative language models of different types and sizes. So far, we have investigated adaptation via zero-shot and few-shot prompting, as well as efficient fine-tuning techniques. We also prepare and validate Easy Read corpora to support research and development in the field.
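The difference between zero-shot and few-shot prompting can be illustrated with a minimal Python sketch. It is not the project's actual prompt design: the instruction wording, the `Example` class, the `build_prompt` function and the sample sentences are all hypothetical, chosen only to show how in-context examples are added to a prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One hypothetical (original, adapted) pair used as an in-context example."""
    source: str
    adapted: str


# Hypothetical instruction; the prompts actually used in the project are not given here.
INSTRUCTION = "Rewrite the text below following Easy Read guidelines."


def build_prompt(text: str, examples: tuple[Example, ...] = ()) -> str:
    """Return a zero-shot prompt when no examples are given, a few-shot prompt otherwise."""
    parts = [INSTRUCTION]
    for ex in examples:
        # Each in-context example shows the model one completed adaptation.
        parts.append(f"Text: {ex.source}\nEasy Read: {ex.adapted}")
    # The final block leaves the adaptation empty for the model to complete.
    parts.append(f"Text: {text}\nEasy Read:")
    return "\n\n".join(parts)


# Zero-shot: instruction plus the text to adapt, with no examples.
zero_shot = build_prompt("The municipality will implement new regulations.")

# Few-shot: the same request, preceded by one illustrative example pair.
few_shot = build_prompt(
    "The municipality will implement new regulations.",
    examples=(Example("Attendance is mandatory.", "You must come."),),
)
```

In both cases the resulting string would be sent to a generative language model. Fine-tuning, by contrast, updates model parameters on such pairs instead of placing them in the prompt.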
Technologies
- Large Language Models
- Prompting & fine-tuning
- Corpus alignment & filtering
Use Cases
The project addresses different sectors, based on the needs and use cases of the consortium partners. It notably involves experts in Easy Read text adaptation, to support the creation of new adapted content.
Project Partners
- Merkatu Interactive
- Gureak Marketing
- Lantegi Batuak
- Lectura Fácil Euskadi
- Merkatu Digital
- Vicomtech (Scientific and Technological Coordinator)
Project Funding
- Partially supported by the Basque Government via the Hazitek programme - SPRI Group (2022-2024)
Relevant Publications
- Thierry Etchegoyhen, Jesús Calleja-Perez, and David Ponce. 2023. IRAZ: Easy-to-Read Content Generation via Automated Text Simplification. In SEPLN (Projects and Demonstrations), pages 60–65. https://ceur-ws.org/Vol-3516/paper13.pdf
- Jesús Calleja, Thierry Etchegoyhen, and David Ponce. 2024. Automating Easy Read Text Segmentation. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 11876–11894, Miami, Florida, USA. Association for Computational Linguistics. https://aclanthology.org/2024.findings-emnlp.694/
- David Ponce, Thierry Etchegoyhen, Jesús Calleja, and Harritxu Gete. 2024. Split and Rephrase with Large Language Models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 11588–11607, Bangkok, Thailand. Association for Computational Linguistics. https://aclanthology.org/2024.acl-long.622/
- Jesús Calleja and Thierry Etchegoyhen. 2024. IRLF: A Corpus for Easy Read Text Adaptation. In preparation.
Contact
Thierry Etchegoyhen – Principal Researcher, Vicomtech – tetchegoyhen@vicomtech.org