Conkurrence measures whether multiple AI models produce consistent outputs on your evaluation tasks. It tells you which items the models agree on and which need human review, using Fleiss' κ, Kendall's W, and bootstrap confidence intervals: the same psychometric methods trusted in clinical research.
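To make the agreement statistic concrete, here is a minimal sketch of Fleiss' κ in plain Python. It is not Conkurrence's implementation; the function name `fleiss_kappa`, the label values, and the "flag anything short of unanimous" rule are assumptions chosen for illustration. It assumes every item was labeled by the same number of models.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Return (kappa, per_item_agreement) for categorical labels.

    ratings: list of items; each item is a list with one label per rater.
    Hypothetical helper for illustration, not part of Conkurrence.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]
    categories = sorted({label for item in ratings for label in item})

    # P_i: share of rater pairs that agree on item i.
    per_item = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(per_item) / n_items

    # P_e: agreement expected by chance from the overall label proportions.
    total = n_items * n_raters
    p_e = sum((sum(c[cat] for c in counts) / total) ** 2 for cat in categories)

    if p_e == 1.0:
        # Only one label was ever used, so kappa is undefined.
        return float("nan"), per_item
    return (p_bar - p_e) / (1 - p_e), per_item


# Three models labeling four evaluation items (made-up data).
labels = [
    ["pass", "pass", "pass"],
    ["fail", "fail", "fail"],
    ["pass", "fail", "pass"],
    ["fail", "fail", "pass"],
]
kappa, per_item = fleiss_kappa(labels)

# Assumed review rule: flag any item the models did not label unanimously.
needs_review = [i for i, p in enumerate(per_item) if p < 1.0]
print(kappa, needs_review)
```

The per-item agreement values are what separate "items the models agree on" from "items that need human review"; κ summarizes the whole set after correcting for chance agreement.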
Server Config
{
  "mcpServers": {
    "conkurrence": {
      "command": "npx",
      "args": [
        "-y",
        "conkurrence",
        "mcp"
      ]
    }
  }
}