InferBench

@JoniMartin27

InferBench's MCP server lets coding agents run, serve, and benchmark local LLMs (text and image, via llama.cpp and Stable Diffusion) on your own hardware, on demand. It measures real tokens/sec, picks the optimal quant for your GPU, and exposes a 124-model catalog. It is local-first; no cloud is required.

Server Config

{
  "mcpServers": {
    "inferbench": {
      "command": "C:\\Users\\<user>\\AppData\\Local\\Programs\\InferBench\\resources\\sidecar\\inferbench-backend.exe",
      "args": [
        "--mcp"
      ]
    }
  }
}
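
The entry above can also be generated and merged into an existing client configuration programmatically. The following is a minimal Python sketch, not part of InferBench itself: the helper names `inferbench_entry` and `merge_server` and the user name `alice` are hypothetical, and the install path is simply the one shown above with `<user>` filled in. Where your MCP client stores its configuration file is not stated here, so the sketch only builds the dictionary and prints it.

```python
import json
from pathlib import PureWindowsPath


def inferbench_entry(user: str) -> dict:
    """Build the server entry for a given Windows user name (hypothetical helper)."""
    # Assumption: InferBench is installed at the default path shown in the config above.
    exe = PureWindowsPath(
        "C:/Users", user, "AppData", "Local", "Programs",
        "InferBench", "resources", "sidecar", "inferbench-backend.exe",
    )
    return {"command": str(exe), "args": ["--mcp"]}


def merge_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with the named server added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# Example: an existing config that already lists another (made-up) server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}
updated = merge_server(existing, "inferbench", inferbench_entry("alice"))

# json.dumps escapes the backslashes, matching the form used in the config above.
print(json.dumps(updated, indent=2))
```

Merging into a copy rather than overwriting the file contents keeps any servers already configured, and leaves the original dictionary untouched if you decide not to save the result.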