Staff Applied Researcher, AI Quality
Kubelt • Santiago, Chile
Role Description
Locations
Remote, United States

Overview
At GitHub, we’re building the next generation of AI‑powered developer experiences. We’re looking for a Staff Applied Researcher with deep expertise in Large Language Model (LLM) evaluation, LLM agents, strong engineering instincts, and a bias for action to help shape the future of GitHub Copilot and our AI platform. This is a high‑impact role where you will design evaluation systems that directly influence how millions of developers experience AI every day.

Responsibilities

Lead Model Quality & Evaluation
Design next‑generation evaluation frameworks for code generation, reasoning, safety, multimodal tasks, and agentic workflows.
Develop scalable automatic metrics, LLM‑judge systems, reward models, and human‑in‑the‑loop evaluation pipelines.
Establish high‑signal, repeatable methodologies that influence product decisions across GitHub AI.

Drive Applied Research & Engineering
Build and optimize evaluation tooling, datasets, benchmarking systems, a...