moe
Here are 418 public repositories matching this topic...
A high-throughput and memory-efficient inference and serving engine for LLMs
Updated Jun 18, 2026 - Python
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Updated Jun 17, 2026 - Python
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 600+ LLMs (Qwen3.6, DeepSeek-V4, GLM-5.1, InternLM3, Llama4, ...) and 300+ MLLMs (Qwen3-VL, Qwen3-Omni, InternVL3.5, Ovis2.5, GLM4.5v, Gemma4, Llava, Phi4, ...) (AAAI 2025).
Updated Jun 18, 2026 - Python
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
Updated Jun 18, 2026 - Python
FlashInfer: Kernel Library for LLM Serving
Updated Jun 18, 2026 - Python
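An unofficial, UI-first https://bgm.tv app client for Android and iOS, built with React Native. An ad-free, hobby-driven, non-profit third-party bgm.tv client: a Douban-like anime-tracking app dedicated to ACG. Redesigned for mobile, with many enhanced features that are hard to implement on the web version and extensive customization options. Currently supports iOS and Android.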
Updated Jun 17, 2026 - TypeScript
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Updated Feb 1, 2026 - Python
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
Updated Jul 15, 2025 - Python
MoBA: Mixture of Block Attention for Long-Context LLMs
Updated Apr 3, 2025 - Python
PyTorch Re-Implementation of "The Sparsely-Gated Mixture-of-Experts Layer" by Noam Shazeer et al. https://arxiv.org/abs/1701.06538
Updated Apr 19, 2024 - Python
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
Updated Dec 6, 2024 - Python
Tutel MoE: Optimized Mixture-of-Experts Library, with support for GptOss/DeepSeek/Kimi-K2/Qwen3 using FP8/NVFP4/MXFP4
Updated Jun 18, 2026 - C
cuDNN Frontend is NVIDIA's modern, open-source entry point to the cuDNN library and a growing collection of high-performance open-source kernels.
Updated Jun 17, 2026 - Python
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Updated Jun 8, 2025 - Python
An open-source solution for full-parameter fine-tuning of DeepSeek-V3/R1 671B, including complete code and scripts from training to inference, as well as practical experience and conclusions.
Updated Mar 13, 2025 - Python
⚡ Native MLX Swift LLM inference server for Apple Silicon. OpenAI-compatible API, SSD streaming for 100B+ MoE models, TurboQuant KV cache compression, macOS and iOS (iPhone) app.
Updated May 19, 2026 - Swift
Chinese Mixtral Mixture-of-Experts (MoE) LLMs
Updated Apr 19, 2026 - Python