captioning

Here are 112 public repositories matching this topic...

facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

deep-learning dialog pytorch vqa pretrained-models captioning multimodal multi-tasking textvqa hateful-memes

Updated Jun 17, 2026
Python

roboflow / maestro

Streamline the fine-tuning process for multimodal models: PaliGemma 2, Florence-2, and Qwen2.5-VL

transformers vqa objectdetection captioning fine-tuning multimodal vision-and-language phi-3-vision paligemma florence-2 qwen2-vl

Updated Jun 29, 2026
Python

fpgaminer / joycaption

JoyCaption is an image captioning Visual Language Model (VLM) being built from the ground up as a free, open, and uncensored model for the community to use in training Diffusion models.

vlm captioning joycaption

Updated Feb 24, 2026
Jupyter Notebook

ltguo19 / VSUA-Captioning

Code for "Aligning Linguistic Words and Visual Semantic Units for Image Captioning", ACM MM 2019

nlp deep-learning pytorch captioning language-generation

Updated Oct 18, 2019
Python

DavidHuji / CapDec

CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)

clip zero-shot-learning captioning multimodal-deep-learning gpt-2 clipcap

Updated Jan 28, 2024
Python

HaydenFaulkner / Tennis

A Tennis dataset and models for event detection & commentary generation

machine-learning video computer-vision mxnet dataset tennis gluon sportsanalytics fine-grained captioning eventdetection

Updated Jun 20, 2025
Python

Labbeti / aac-datasets

Audio Captioning datasets for PyTorch.

audio deep-learning pytorch dataset caption datasets captioning audio-captioning

Updated Mar 25, 2026
Python

Mauville / MedCLIP

Medical image captioning using OpenAI's CLIP

machine-learning deep-learning medical-imaging clip captioning what-a-challenge-this-was

Updated Mar 7, 2023
Jupyter Notebook

mitvis / vistext

VisText is a benchmark dataset for semantically rich chart captioning.

charts dataset captioning-images captioning t5

Updated Aug 10, 2025
Jupyter Notebook

Brekel / VisionCaptioner

Automated image & video captioning using Qwen-VL, Gemma4 and SAM3.

image video ai captioning sam3 qwen2-5-vl qwen3-vl gemma4

Updated Apr 27, 2026
Python

drethage / fully-convolutional-point-network

Fully-Convolutional Point Networks for Large-Scale Point Clouds

deep-neural-networks computer-vision deep-learning point-cloud point-clouds semantic-segmentation meshes 3d captioning

Updated Mar 22, 2019
Python

audio-captioning / clotho-dataset

Python code for handling the Clotho dataset.

audio natural-language-processing deep-learning audio-signal-processing captioning audio-captioning clotho-dataset

Updated Nov 24, 2020
Python

ParitoshParmar / MTL-AQA

What and How Well You Performed? A Multitask Learning Approach to Action Quality Assessment [CVPR 2019]

pytorch video-processing lstm representation-learning action-recognition video-understanding c3d video-captioning captioning fine-grained-classification multitask-learning dilated-convolution action-quality-assessment mtl-aqa fine-grained-action-recognition dilated-c3d

Updated May 5, 2025
Python

Labbeti / aac-metrics

Metrics for evaluating Automated Audio Captioning systems, designed for PyTorch.

audio metrics text captioning audio-captioning

Updated Mar 22, 2026
Python

wangleihitcs / MedicalReportGeneration

A Base Tensorflow Project for Medical Report Generation

tensorflow-models captioning medical-report-generate

Updated Jun 16, 2019
Python

aimagelab / pacscore

[CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation

computer-vision cvpr captioning-images captioning captioning-videos vision-and-language cvpr2023

Updated Jul 29, 2025
Python

TheShadow29 / VidSitu

[CVPR21] Visual Semantic Role Labeling for Video Understanding (https://arxiv.org/abs/2104.00990)

nlp video vision srl captioning captioning-videos vision-and-language grounding video-language event-relations semantic-roles

Updated Aug 17, 2021
Python

lucidrains / AoA-pytorch

A Pytorch implementation of Attention on Attention module (both self and guided variants), for Visual Question Answering

vqa attention attention-mechanism captioning visual-question-answering

Updated Nov 8, 2020
Python

DavidMChan / caption-by-committee

Using LLMs and pre-trained caption models for super-human performance on image captioning.

python machine-learning image ai deep-learning captioning chatgpt

Updated Oct 13, 2023
Python

audio-captioning / dcase-2020-baseline

Audio captioning baseline system for DCASE 2020 challenge.

machine-learning deep-neural-networks deep-learning signal-processing audio-signal-processing captioning dcase machine-listening audio-captioning dcase2020

Updated Aug 22, 2023
Python
