Multimodal Research Lead at Hugging Face.
- Hugging Face
- Bern, Switzerland
I’m an engineer working at the intersection of research and real-world systems. I build and ship multimodal AI systems, focusing on vision-language models, speech, and efficient on-device inference.
At Hugging Face, I lead multimodal research and contribute to:
- Vision-Language Models (VLMs)
- Speech-to-speech and conversational systems
- Multimodal research with an emphasis on efficiency and real-world deployment
- Robotics-facing AI systems
I enjoy building things that are both technically solid and actually usable, from research code to demos and production-ready tools.
- Research prototypes and experimental ideas
- Open-source tools and demos
- Work around multimodal models, audio, and vision
- Occasional side projects
- PhD in applied machine learning (speech and generative models)
- Former senior ML engineer at Unity
- Interested in small, fast, and well-engineered models
Feel free to explore, fork, or reach out.
Pinned
- huggingface/speech-to-speech: Build local voice agents with open-source models
- huggingface/nanoVLM: The simplest, fastest repository for training/finetuning small-sized VLMs.
- florence2-finetuning: Quick exploration into fine tuning florence 2
- tifgan/stftGAN: TiFGAN: Time Frequency Generative Adversarial Networks





