Julian Spravil

I’m a PhD student in the Autonomous Intelligent Systems Group at the University of Bonn and a research engineer at the Fraunhofer Institute for Intelligent Analysis and Information Systems. My research interests include multimodal and multilingual machine learning as well as human-centered applications.

LinkedIn Google Scholar GitHub Hugging Face

Research

My research in multimodal machine learning integrates vision, language, and audio to build assistive systems. Representative papers are highlighted.

Scaling Laws for Conditional Emergence of Multilingual Image Captioning via Generalization from Translation
Julian Spravil, Sebastian Houben, Sven Behnke

40th AAAI Conference on Artificial Intelligence (AAAI) 2026, Singapore

Scaling model size, training samples, and multilinguality enables image captioning in unseen languages via translation as an auxiliary task; however, fine-tuning with full task-language coverage remains essential.

Project Page arXiv PDF Poster HuggingFace Code

HyenaPixel: Global Image Context with Convolutions
Julian Spravil, Sebastian Houben, Sven Behnke

27th European Conference on Artificial Intelligence (ECAI) 2024, Santiago de Compostela, Spain

HyenaPixel builds on the Hyena operator, extending it to bidirectional and 2D processing to capture global image context with large convolutions, enabling transformer-level accuracy without attention.

arXiv IOS Press PDF HuggingFace Code