Releases: dougeeai/llama-cpp-python-wheels
llama-cpp-python 0.3.16 + CUDA 13.0 sm75 Turing - Python 3.13 - Windows x64
Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support
Skip the build process entirely. This wheel is compiled and ready to install.
Requirements
- OS: Windows 10/11 64-bit
- Python: 3.13.x (exact minor version required)
- CUDA: 13.0 (Toolkit not needed, just driver)
- Driver: NVIDIA Driver 580 or higher
- GPU: NVIDIA GeForce RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060 12GB, RTX 2060, TITAN RTX, GeForce RTX 2080 Super Laptop, RTX 2080 Super Max-Q, RTX 2080 Laptop, RTX 2080 Max-Q, RTX 2070 Super Laptop, RTX 2070 Super Max-Q, RTX 2070 Laptop, RTX 2070 Max-Q, RTX 2060 Laptop, RTX 2060 Max-Q, GeForce GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, GTX 1660 Ti Laptop, GTX 1660 Ti Max-Q, GTX 1650 Ti Laptop, GTX 1650 Ti Max-Q, GTX 1650 Laptop, GTX 1650 Max-Q, GTX 1630 Laptop, Quadro RTX 8000, RTX 6000, RTX 5000, RTX 4000, RTX 3000, Quadro T2000, T1200, T1000, T600, T550, T500, T400, Tesla T40, T10, T4
- Architecture: Turing (sm_75)
- VRAM: 4GB+ recommended
Installation
pip install llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp313-cp313-win_amd64.whl
What This Solves
- No Visual Studio required
- No CUDA Toolkit installation needed
- No compilation errors
- No "No CUDA toolset found" issues
- Works immediately with GGUF models
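Because the interpreter tag in the wheel filename (`cp313`) must match your Python exactly, it can be worth checking before downloading. A minimal stdlib-only sketch (the filename below is this release's wheel; the helper name is illustrative):

```python
import sys

def wheel_python_tag(wheel_name: str) -> str:
    """Extract the Python tag (e.g. 'cp313') from a wheel filename.

    Wheel filenames follow: name-version-pythontag-abitag-platformtag.whl
    """
    stem = wheel_name[:-len(".whl")]
    return stem.split("-")[-3]

wheel = "llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp313-cp313-win_amd64.whl"
tag = wheel_python_tag(wheel)
expected = f"cp{sys.version_info.major}{sys.version_info.minor}"
print(tag, "matches" if tag == expected else "does not match", expected)
```

If the tags differ, pip will refuse the wheel with a "not a supported wheel on this platform" error rather than installing a broken package.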
Keywords
llama-cpp-python, CUDA 13.0, Python 3.13, Windows, RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060, TITAN RTX, GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, Quadro RTX 8000, Quadro RTX 6000, Quadro RTX 5000, Quadro RTX 4000, Tesla T4, Turing, sm75, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10
llama-cpp-python 0.3.16 + CUDA 13.0 sm75 Turing - Python 3.12 - Windows x64
Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support
Skip the build process entirely. This wheel is compiled and ready to install.
Requirements
- OS: Windows 10/11 64-bit
- Python: 3.12.x (exact minor version required)
- CUDA: 13.0 (Toolkit not needed, just driver)
- Driver: NVIDIA Driver 580 or higher
- GPU: NVIDIA GeForce RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060 12GB, RTX 2060, TITAN RTX, GeForce RTX 2080 Super Laptop, RTX 2080 Super Max-Q, RTX 2080 Laptop, RTX 2080 Max-Q, RTX 2070 Super Laptop, RTX 2070 Super Max-Q, RTX 2070 Laptop, RTX 2070 Max-Q, RTX 2060 Laptop, RTX 2060 Max-Q, GeForce GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, GTX 1660 Ti Laptop, GTX 1660 Ti Max-Q, GTX 1650 Ti Laptop, GTX 1650 Ti Max-Q, GTX 1650 Laptop, GTX 1650 Max-Q, GTX 1630 Laptop, Quadro RTX 8000, RTX 6000, RTX 5000, RTX 4000, RTX 3000, Quadro T2000, T1200, T1000, T600, T550, T500, T400, Tesla T40, T10, T4
- Architecture: Turing (sm_75)
- VRAM: 4GB+ recommended
Installation
pip install llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp312-cp312-win_amd64.whl
What This Solves
- No Visual Studio required
- No CUDA Toolkit installation needed
- No compilation errors
- No "No CUDA toolset found" issues
- Works immediately with GGUF models
Keywords
llama-cpp-python, CUDA 13.0, Python 3.12, Windows, RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060, TITAN RTX, GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, Quadro RTX 8000, Quadro RTX 6000, Quadro RTX 5000, Quadro RTX 4000, Tesla T4, Turing, sm75, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10
llama-cpp-python 0.3.16 + CUDA 13.0 sm75 Turing - Python 3.11 - Windows x64
Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support
Skip the build process entirely. This wheel is compiled and ready to install.
Requirements
- OS: Windows 10/11 64-bit
- Python: 3.11.x (exact minor version required)
- CUDA: 13.0 (Toolkit not needed, just driver)
- Driver: NVIDIA Driver 580 or higher
- GPU: NVIDIA GeForce RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060 12GB, RTX 2060, TITAN RTX, GeForce RTX 2080 Super Laptop, RTX 2080 Super Max-Q, RTX 2080 Laptop, RTX 2080 Max-Q, RTX 2070 Super Laptop, RTX 2070 Super Max-Q, RTX 2070 Laptop, RTX 2070 Max-Q, RTX 2060 Laptop, RTX 2060 Max-Q, GeForce GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, GTX 1660 Ti Laptop, GTX 1660 Ti Max-Q, GTX 1650 Ti Laptop, GTX 1650 Ti Max-Q, GTX 1650 Laptop, GTX 1650 Max-Q, GTX 1630 Laptop, Quadro RTX 8000, RTX 6000, RTX 5000, RTX 4000, RTX 3000, Quadro T2000, T1200, T1000, T600, T550, T500, T400, Tesla T40, T10, T4
- Architecture: Turing (sm_75)
- VRAM: 4GB+ recommended
Installation
pip install llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp311-cp311-win_amd64.whl
What This Solves
- No Visual Studio required
- No CUDA Toolkit installation needed
- No compilation errors
- No "No CUDA toolset found" issues
- Works immediately with GGUF models
Keywords
llama-cpp-python, CUDA 13.0, Python 3.11, Windows, RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060, TITAN RTX, GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, Quadro RTX 8000, Quadro RTX 6000, Quadro RTX 5000, Quadro RTX 4000, Tesla T4, Turing, sm75, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10
llama-cpp-python 0.3.16 + CUDA 13.0 sm75 Turing - Python 3.10 - Windows x64
Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support
Skip the build process entirely. This wheel is compiled and ready to install.
Requirements
- OS: Windows 10/11 64-bit
- Python: 3.10.x (exact minor version required)
- CUDA: 13.0 (Toolkit not needed, just driver)
- Driver: NVIDIA Driver 580 or higher
- GPU: NVIDIA GeForce RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060 12GB, RTX 2060, TITAN RTX, GeForce RTX 2080 Super Laptop, RTX 2080 Super Max-Q, RTX 2080 Laptop, RTX 2080 Max-Q, RTX 2070 Super Laptop, RTX 2070 Super Max-Q, RTX 2070 Laptop, RTX 2070 Max-Q, RTX 2060 Laptop, RTX 2060 Max-Q, GeForce GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, GTX 1660 Ti Laptop, GTX 1660 Ti Max-Q, GTX 1650 Ti Laptop, GTX 1650 Ti Max-Q, GTX 1650 Laptop, GTX 1650 Max-Q, GTX 1630 Laptop, Quadro RTX 8000, RTX 6000, RTX 5000, RTX 4000, RTX 3000, Quadro T2000, T1200, T1000, T600, T550, T500, T400, Tesla T40, T10, T4
- Architecture: Turing (sm_75)
- VRAM: 4GB+ recommended
Installation
pip install llama_cpp_python-0.3.16+cuda13.0.sm75.turing-cp310-cp310-win_amd64.whl
What This Solves
- No Visual Studio required
- No CUDA Toolkit installation needed
- No compilation errors
- No "No CUDA toolset found" issues
- Works immediately with GGUF models
Keywords
llama-cpp-python, CUDA 13.0, Python 3.10, Windows, RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060, TITAN RTX, GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, Quadro RTX 8000, Quadro RTX 6000, Quadro RTX 5000, Quadro RTX 4000, Tesla T4, Turing, sm75, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10
llama-cpp-python 0.3.16 + CUDA 13.0 sm100 Blackwell - Python 3.13 - Windows x64
Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support
Skip the build process entirely. This wheel is compiled and ready to install.
Requirements
- OS: Windows 10/11 64-bit
- Python: 3.13.x (exact minor version required)
- CUDA: 13.0 (Toolkit not needed, just driver)
- Driver: NVIDIA Driver 580 or higher
- GPU: NVIDIA RTX 5090, 5080, 5070 Ti, 5070, 5060 Ti, 5060, 5050, RTX 5090 Laptop, RTX 5080 Laptop, RTX 5070 Ti Laptop, RTX 5070 Laptop, RTX 5060 Laptop, RTX 5050 Laptop, RTX PRO 6000 Blackwell Workstation Edition, RTX PRO 6000 Blackwell Max-Q, RTX PRO 6000 Blackwell Server Edition, RTX PRO 5000 Blackwell, RTX PRO 4500 Blackwell, RTX PRO 4000 Blackwell, RTX PRO 4000 SFF Blackwell, RTX PRO 2000 Blackwell, RTX PRO 5000 Blackwell Laptop, RTX PRO 4000 Blackwell Laptop, RTX PRO 3000 Blackwell Laptop, RTX PRO 2000 Blackwell Laptop, RTX PRO 1000 Blackwell Laptop, RTX PRO 500 Blackwell Laptop, B100, B200, B300 (Blackwell Ultra), GB200, GB300
- Architecture: Blackwell (sm_100)
- VRAM: 8GB+ recommended
Installation
pip install llama_cpp_python-0.3.16+cuda13.0.sm100.blackwell-cp313-cp313-win_amd64.whl
What This Solves
- No Visual Studio required
- No CUDA Toolkit installation needed
- No compilation errors
- No "No CUDA toolset found" issues
- Works immediately with GGUF models
Keywords
llama-cpp-python, CUDA 13.0, Python 3.13, Windows, RTX 5090, RTX 5080, RTX 5070 Ti, RTX 5070, RTX 5060, RTX 5050, RTX PRO 6000 Blackwell, RTX PRO 5000 Blackwell, RTX PRO 4500 Blackwell, RTX PRO 4000 Blackwell, B100, B200, B300, Blackwell Ultra, GB200, GB300, Blackwell, sm100, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10
llama-cpp-python 0.3.16 + CUDA 13.0 sm100 Blackwell - Python 3.12 - Windows x64
Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support
Skip the build process entirely. This wheel is compiled and ready to install.
Requirements
- OS: Windows 10/11 64-bit
- Python: 3.12.x (exact minor version required)
- CUDA: 13.0 (Toolkit not needed, just driver)
- Driver: NVIDIA Driver 580 or higher
- GPU: NVIDIA RTX 5090, 5080, 5070 Ti, 5070, 5060 Ti, 5060, 5050, RTX 5090 Laptop, RTX 5080 Laptop, RTX 5070 Ti Laptop, RTX 5070 Laptop, RTX 5060 Laptop, RTX 5050 Laptop, RTX PRO 6000 Blackwell Workstation Edition, RTX PRO 6000 Blackwell Max-Q, RTX PRO 6000 Blackwell Server Edition, RTX PRO 5000 Blackwell, RTX PRO 4500 Blackwell, RTX PRO 4000 Blackwell, RTX PRO 4000 SFF Blackwell, RTX PRO 2000 Blackwell, RTX PRO 5000 Blackwell Laptop, RTX PRO 4000 Blackwell Laptop, RTX PRO 3000 Blackwell Laptop, RTX PRO 2000 Blackwell Laptop, RTX PRO 1000 Blackwell Laptop, RTX PRO 500 Blackwell Laptop, B100, B200, B300 (Blackwell Ultra), GB200, GB300
- Architecture: Blackwell (sm_100)
- VRAM: 8GB+ recommended
Installation
pip install llama_cpp_python-0.3.16+cuda13.0.sm100.blackwell-cp312-cp312-win_amd64.whl
What This Solves
- No Visual Studio required
- No CUDA Toolkit installation needed
- No compilation errors
- No "No CUDA toolset found" issues
- Works immediately with GGUF models
Keywords
llama-cpp-python, CUDA 13.0, Python 3.12, Windows, RTX 5090, RTX 5080, RTX 5070 Ti, RTX 5070, RTX 5060, RTX 5050, RTX PRO 6000 Blackwell, RTX PRO 5000 Blackwell, RTX PRO 4500 Blackwell, RTX PRO 4000 Blackwell, B100, B200, B300, Blackwell Ultra, GB200, GB300, Blackwell, sm100, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10
llama-cpp-python 0.3.16 + CUDA 13.0 sm100 Blackwell - Python 3.11 - Windows x64
Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support
Skip the build process entirely. This wheel is compiled and ready to install.
Requirements
- OS: Windows 10/11 64-bit
- Python: 3.11.x (exact minor version required)
- CUDA: 13.0 (Toolkit not needed, just driver)
- Driver: NVIDIA Driver 580 or higher
- GPU: NVIDIA RTX 5090, 5080, 5070 Ti, 5070, 5060 Ti, 5060, 5050, RTX 5090 Laptop, RTX 5080 Laptop, RTX 5070 Ti Laptop, RTX 5070 Laptop, RTX 5060 Laptop, RTX 5050 Laptop, RTX PRO 6000 Blackwell Workstation Edition, RTX PRO 6000 Blackwell Max-Q, RTX PRO 6000 Blackwell Server Edition, RTX PRO 5000 Blackwell, RTX PRO 4500 Blackwell, RTX PRO 4000 Blackwell, RTX PRO 4000 SFF Blackwell, RTX PRO 2000 Blackwell, RTX PRO 5000 Blackwell Laptop, RTX PRO 4000 Blackwell Laptop, RTX PRO 3000 Blackwell Laptop, RTX PRO 2000 Blackwell Laptop, RTX PRO 1000 Blackwell Laptop, RTX PRO 500 Blackwell Laptop, B100, B200, B300 (Blackwell Ultra), GB200, GB300
- Architecture: Blackwell (sm_100)
- VRAM: 8GB+ recommended
Installation
pip install llama_cpp_python-0.3.16+cuda13.0.sm100.blackwell-cp311-cp311-win_amd64.whl
What This Solves
- No Visual Studio required
- No CUDA Toolkit installation needed
- No compilation errors
- No "No CUDA toolset found" issues
- Works immediately with GGUF models
Keywords
llama-cpp-python, CUDA 13.0, Python 3.11, Windows, RTX 5090, RTX 5080, RTX 5070 Ti, RTX 5070, RTX 5060, RTX 5050, RTX PRO 6000 Blackwell, RTX PRO 5000 Blackwell, RTX PRO 4500 Blackwell, RTX PRO 4000 Blackwell, B100, B200, B300, Blackwell Ultra, GB200, GB300, Blackwell, sm100, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10
llama-cpp-python 0.3.16 + CUDA 13.0 sm100 Blackwell - Python 3.10 - Windows x64
Pre-built llama-cpp-python wheel for Windows with CUDA 13.0 support
Skip the build process entirely. This wheel is compiled and ready to install.
Requirements
- OS: Windows 10/11 64-bit
- Python: 3.10.x (exact minor version required)
- CUDA: 13.0 (Toolkit not needed, just driver)
- Driver: NVIDIA Driver 580 or higher
- GPU: NVIDIA RTX 5090, 5080, 5070 Ti, 5070, 5060 Ti, 5060, 5050, RTX 5090 Laptop, RTX 5080 Laptop, RTX 5070 Ti Laptop, RTX 5070 Laptop, RTX 5060 Laptop, RTX 5050 Laptop, RTX PRO 6000 Blackwell Workstation Edition, RTX PRO 6000 Blackwell Max-Q, RTX PRO 6000 Blackwell Server Edition, RTX PRO 5000 Blackwell, RTX PRO 4500 Blackwell, RTX PRO 4000 Blackwell, RTX PRO 4000 SFF Blackwell, RTX PRO 2000 Blackwell, RTX PRO 5000 Blackwell Laptop, RTX PRO 4000 Blackwell Laptop, RTX PRO 3000 Blackwell Laptop, RTX PRO 2000 Blackwell Laptop, RTX PRO 1000 Blackwell Laptop, RTX PRO 500 Blackwell Laptop, B100, B200, B300 (Blackwell Ultra), GB200, GB300
- Architecture: Blackwell (sm_100)
- VRAM: 8GB+ recommended
Installation
pip install llama_cpp_python-0.3.16+cuda13.0.sm100.blackwell-cp310-cp310-win_amd64.whl
What This Solves
- No Visual Studio required
- No CUDA Toolkit installation needed
- No compilation errors
- No "No CUDA toolset found" issues
- Works immediately with GGUF models
Tested Configuration
- Built on: Windows 11
- Build date: November 9, 2025
- llama-cpp-python version: 0.3.16
- Build flags: CMAKE_CUDA_ARCHITECTURES=100
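For anyone who needs a variant not covered by these releases, the tested configuration above implies a source build along these lines. This is a hedged sketch, not the exact build script used here: it requires Visual Studio and the CUDA Toolkit installed locally, which is precisely what the prebuilt wheel lets you skip. `CMAKE_ARGS` with `-DGGML_CUDA=on` is llama-cpp-python's documented mechanism for enabling CUDA during a pip source build.

```shell
:: Windows cmd: build llama-cpp-python 0.3.16 from source for Blackwell (sm_100)
set CMAKE_ARGS=-DGGML_CUDA=on -DCMAKE_CUDA_ARCHITECTURES=100
pip install llama-cpp-python==0.3.16 --force-reinstall --no-cache-dir --no-binary llama-cpp-python
```

For other architectures, change `CMAKE_CUDA_ARCHITECTURES` to the matching compute capability (for example `75` for the Turing wheels in this repository).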
Keywords
llama-cpp-python, CUDA 13.0, Python 3.10, Windows, RTX 5090, RTX 5080, RTX 5070 Ti, RTX 5070, RTX 5060, RTX 5050, RTX PRO 6000 Blackwell, RTX PRO 5000 Blackwell, RTX PRO 4500 Blackwell, RTX PRO 4000 Blackwell, B100, B200, B300, Blackwell Ultra, GB200, GB300, Blackwell, sm100, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10
llama-cpp-python 0.3.16 + CUDA 12.1 sm75 Turing - Python 3.13 - Windows x64
Pre-built llama-cpp-python wheel for Windows with CUDA 12.1 support
Skip the build process entirely. This wheel is compiled and ready to install.
Requirements
- OS: Windows 10/11 64-bit
- Python: 3.13.x (exact minor version required)
- CUDA: 12.1 (Toolkit not needed, just driver)
- Driver: NVIDIA Driver 525.60.13 or higher
- GPU: NVIDIA GeForce RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060 12GB, RTX 2060, TITAN RTX, GeForce RTX 2080 Super Laptop, RTX 2080 Super Max-Q, RTX 2080 Laptop, RTX 2080 Max-Q, RTX 2070 Super Laptop, RTX 2070 Super Max-Q, RTX 2070 Laptop, RTX 2070 Max-Q, RTX 2060 Laptop, RTX 2060 Max-Q, GeForce GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, GTX 1660 Ti Laptop, GTX 1660 Ti Max-Q, GTX 1650 Ti Laptop, GTX 1650 Ti Max-Q, GTX 1650 Laptop, GTX 1650 Max-Q, GTX 1630 Laptop, Quadro RTX 8000, RTX 6000, RTX 5000, RTX 4000, RTX 3000, Quadro T2000, T1200, T1000, T600, T550, T500, T400, Tesla T40, T10, T4
- Architecture: Turing (sm_75)
- VRAM: 4GB+ recommended
Installation
pip install llama_cpp_python-0.3.16+cuda12.1.sm75.turing-cp313-cp313-win_amd64.whl
What This Solves
- No Visual Studio required
- No CUDA Toolkit installation needed
- No compilation errors
- No "No CUDA toolset found" issues
- Works immediately with GGUF models
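Note the different driver floors: the CUDA 12.1 wheels need driver 525.60.13 or newer, while the CUDA 13.0 wheels above need 580 or newer. Dotted driver versions must be compared numerically, not as strings. A minimal sketch (read your installed version from `nvidia-smi`; the function name is illustrative):

```python
def driver_at_least(installed: str, minimum: str) -> bool:
    """Compare dotted NVIDIA driver versions numerically,
    so that '528.2' correctly exceeds '525.60.13'."""
    def as_tuple(version: str):
        return tuple(int(part) for part in version.split("."))
    return as_tuple(installed) >= as_tuple(minimum)

print(driver_at_least("581.12", "525.60.13"))  # True
print(driver_at_least("522.06", "525.60.13"))  # False
```

A plain string comparison would get this wrong (e.g. "9" > "525" lexicographically), which is why the components are converted to integers first.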
Keywords
llama-cpp-python, CUDA 12.1, Python 3.13, Windows, RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060, TITAN RTX, GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, Quadro RTX 8000, Quadro RTX 6000, Quadro RTX 5000, Quadro RTX 4000, Tesla T4, Turing, sm75, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10
llama-cpp-python 0.3.16 + CUDA 12.1 sm75 Turing - Python 3.12 - Windows x64
Pre-built llama-cpp-python wheel for Windows with CUDA 12.1 support
Skip the build process entirely. This wheel is compiled and ready to install.
Requirements
- OS: Windows 10/11 64-bit
- Python: 3.12.x (exact minor version required)
- CUDA: 12.1 (Toolkit not needed, just driver)
- Driver: NVIDIA Driver 525.60.13 or higher
- GPU: NVIDIA GeForce RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060 12GB, RTX 2060, TITAN RTX, GeForce RTX 2080 Super Laptop, RTX 2080 Super Max-Q, RTX 2080 Laptop, RTX 2080 Max-Q, RTX 2070 Super Laptop, RTX 2070 Super Max-Q, RTX 2070 Laptop, RTX 2070 Max-Q, RTX 2060 Laptop, RTX 2060 Max-Q, GeForce GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, GTX 1660 Ti Laptop, GTX 1660 Ti Max-Q, GTX 1650 Ti Laptop, GTX 1650 Ti Max-Q, GTX 1650 Laptop, GTX 1650 Max-Q, GTX 1630 Laptop, Quadro RTX 8000, RTX 6000, RTX 5000, RTX 4000, RTX 3000, Quadro T2000, T1200, T1000, T600, T550, T500, T400, Tesla T40, T10, T4
- Architecture: Turing (sm_75)
- VRAM: 4GB+ recommended
Installation
pip install llama_cpp_python-0.3.16+cuda12.1.sm75.turing-cp312-cp312-win_amd64.whl
What This Solves
- No Visual Studio required
- No CUDA Toolkit installation needed
- No compilation errors
- No "No CUDA toolset found" issues
- Works immediately with GGUF models
Keywords
llama-cpp-python, CUDA 12.1, Python 3.12, Windows, RTX 2080 Ti, RTX 2080 Super, RTX 2080, RTX 2070 Super, RTX 2070, RTX 2060 Super, RTX 2060, TITAN RTX, GTX 1660 Ti, GTX 1660 Super, GTX 1660, GTX 1650 Super, GTX 1650, GTX 1630, Quadro RTX 8000, Quadro RTX 6000, Quadro RTX 5000, Quadro RTX 4000, Tesla T4, Turing, sm75, GGUF, llama.cpp, no compilation, prebuilt wheel, Windows 11, Windows 10