InferBench's MCP server lets coding agents run, serve, and benchmark local LLMs (text and image, llama.cpp and Stable Diffusion) on your own hardware, on demand. It measures real tokens/sec, picks the optimal quantization for your GPU, and exposes a 124-model catalog. Local-first; no cloud required.
Server Config
{
  "mcpServers": {
    "inferbench": {
      "command": "C:\\Users\\<user>\\AppData\\Local\\Programs\\InferBench\\resources\\sidecar\\inferbench-backend.exe",
      "args": [
        "--mcp"
      ]
    }
  }
}
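If your MCP client already has a config file listing other servers, the `inferbench` entry should be merged into it rather than pasted over it. The following is a minimal sketch, assuming the client stores its config as JSON with a top-level `mcpServers` key as in the snippet above. The helper names `add_inferbench_server` and `update_config_file` are hypothetical, the config file location depends on your client, and `<user>` remains a placeholder you must replace.

```python
import json
from pathlib import Path

# Executable path from the server config above; "<user>" is a placeholder
# that must be replaced with your Windows user name.
INFERBENCH_EXE = (
    r"C:\Users\<user>\AppData\Local\Programs\InferBench"
    r"\resources\sidecar\inferbench-backend.exe"
)


def add_inferbench_server(config: dict, command: str = INFERBENCH_EXE) -> dict:
    """Return a copy of an MCP client config with the inferbench entry added.

    Hypothetical helper: existing servers under "mcpServers" are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["inferbench"] = {"command": command, "args": ["--mcp"]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Merge the entry into a JSON config file (location is an assumption)."""
    existing = {}
    if path.exists():
        existing = json.loads(path.read_text(encoding="utf-8"))
    updated = add_inferbench_server(existing)
    path.write_text(json.dumps(updated, indent=2), encoding="utf-8")
```

Call `update_config_file(Path(...))` with the config path your MCP client documents. `json.dumps` takes care of doubling the backslashes in the Windows path, so the written file matches the escaped form shown in the server config.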