# GLM-5
Developed by Zhipu AI (Z.ai)
## Key Capabilities
- 744B parameters with only 40B active per token (MoE efficiency)
- 202K token context window with sparse attention
- Best-in-class performance on agentic engineering and systems tasks
- Native tool calling with optimized parsers
- SWE-bench Verified 77.8%, AIME 2026 92.7%, GPQA-Diamond 86.0%
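The native tool calling noted above is typically exercised through the OpenAI-compatible function-calling schema that common serving stacks accept. A minimal sketch of such a request payload; the tool definition, model id, and field values here are illustrative placeholders, not confirmed API details for GLM-5:

```python
import json

# Hypothetical tool definition in the OpenAI-compatible function-calling
# schema (an assumption about the serving stack, not a GLM-5 specific API).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# Request body as it would be POSTed to a chat-completions endpoint.
payload = {
    "model": "glm-5",  # placeholder model id
    "messages": [{"role": "user", "content": "Weather in Berlin?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
print(json.dumps(payload, indent=2))
```

The model's reply would then carry a `tool_calls` entry whose arguments your application parses and executes before returning the result in a follow-up message.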
## VRAM Requirements by Quantization
Choose the right GPU based on your performance and quality needs.
| Quantization | VRAM Required |
|---|---|
| FP16 (active 40B weights only) | 80 GB |
| FP16 (full 744B weights) | 1.5 TB |
| Q4 | 120 GB |
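The FP16 figures above follow from weights-only arithmetic: parameter count times bits per parameter. A minimal sketch in decimal GB, ignoring the KV cache and activation memory that add to these totals in practice (quantized footprints also vary with the scheme, since some tensors usually stay in higher precision):

```python
def weight_footprint_gb(params_billion: float, bits_per_param: float) -> float:
    """Weights-only memory footprint in decimal GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 params * (bits / 8) bytes, then / 1e9 to get GB
    return params_billion * bits_per_param / 8

# 40B active weights at FP16 (16 bits) -> 80 GB, matching the table.
print(weight_footprint_gb(40, 16))    # 80.0
# Full 744B weights at FP16 -> 1488 GB, i.e. roughly 1.5 TB.
print(weight_footprint_gb(744, 16))   # 1488.0
```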
## Use Cases
GLM-5 (744B total parameters, 40B active via MoE) can be deployed for enterprise AI applications including document processing, code generation, data analysis, and conversational AI. License: MIT.
## Run GLM-5 with Petronella
PTG deploys GLM-5 for enterprises that need frontier-class agentic AI under an MIT license. Its 744B-parameter MoE design, with only 40B active per token, delivers cost-effective inference for complex software engineering, security analysis, and systems automation tasks.
## Recommended Hardware
| Quantization | Recommended GPU |
|---|---|
| Q4 | DGX Spark (128GB) or 2x RTX PRO 6000 (192GB) |
| FP16 | DGX Station GB300 (384GB) or 4x RTX PRO 6000 (384GB) |
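As a rough capacity check for the pairings above, compare a model's footprint against aggregate GPU memory while holding back headroom for the KV cache and fragmentation. The 10% reserve below is an assumption for illustration, not a measured figure:

```python
def fits(required_gb: float, gpu_gb: list[float], reserve_frac: float = 0.10) -> bool:
    """True if the footprint fits in aggregate GPU memory with headroom.

    reserve_frac is an assumed per-GPU reserve for KV cache and fragmentation.
    """
    usable = sum(g * (1.0 - reserve_frac) for g in gpu_gb)
    return usable >= required_gb

# Q4 footprint from the table (120 GB) across 2x RTX PRO 6000 (96 GB each):
print(fits(120, [96, 96]))   # True  (172.8 GB usable)
# The same footprint on a single 96 GB card:
print(fits(120, [96]))       # False (86.4 GB usable)
```

Note that aggregate-memory checks assume the serving stack can shard the weights across GPUs (tensor or expert parallelism); a single replica still cannot exceed any one card's memory for unsharded tensors.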
## Deploy GLM-5 On-Premises
Our team builds GPU-accelerated systems configured and optimized for GLM-5. Private, secure, and fully under your control.