OpenGVLab
We are a research group from Shanghai AI Lab focused on vision-centric AI research. The "GV" in our name, OpenGVLab, stands for general vision: a general understanding of vision that requires little effort to adapt to new vision-based tasks.
We develop model architectures and release pre-trained foundation models to the community to motivate further research in this area. We have made promising progress in general vision AI, with 109 state-of-the-art results 🚀. In 2022, our open-source foundation models reached 65.5 mAP on the COCO object detection benchmark and 91.1% Top-1 accuracy on Kinetics-400, setting landmarks for AI vision 👀 tasks in image 🖼️ and video 📹 understanding. In 2023, we created VideoChat 🦜, llama-adapter 🦙, the 3D foundation model Ponder V2 🧊, and many more wonderful works! At CVPR 2023, our vision foundation model InternImage was listed as one of the most influential papers, and together with our partner OpenDriveLab we won the Best Paper award 🎉.
In 2024, we released the best open-source VLM, InternVL, and the video understanding foundation model InternVideo2, which won 7 championships in the EgoVis challenges 🥇. To date, our brilliant team has open-sourced more than 70 works; please find them in our repositories 😃
Based on solid vision foundations, we have expanded to multi-modality models. We aim to empower individuals and businesses by offering a higher starting point for developing vision-based AI products and by lessening the burden of building an AI model from scratch.
Branches: Alpha (explores the latest advances in vision+language research), uni-medical (focuses on medical AI), Vchitect (generative AI)
Follow us: Twitter, 🤗 Hugging Face, Medium, WeChat, Zhihu
Repositories
- Future-L1 Public
  Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction
- STM-Evaluation Public
- EfficientQAT Public
  [ACL 2025 Main] EfficientQAT: Efficient Quantization-Aware Training for Large Language Models
- MMT-Bench Public
  [ICML 2024] MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
- V2PE Public
  [ICCV 2025] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
- InternVL-U Public
  InternVL-U is a 4B-parameter unified multimodal model (UMM) that brings multimodal understanding, reasoning, image generation, image editing into a single framework.