Utkarsh Kamthankar

AI Benchmark Task Engineer

Remote, India · 3+ yrs exp · 82 · Excellent

About

Analytical AI Task Engineer blending advanced academic Data Science foundations with 3.5+ years of rigorous professional experience. Expert in designing and authoring high-quality multi-agent benchmark tasks that evaluate the analytical reasoning, coordination, and execution capabilities of advanced AI systems. Strong proficiency in SQL and Python (pandas, NumPy) for deep data analysis, scripting, and writing precise oracle logic. Proven ability to curate real-world datasets and create realistic synthetic datasets from messy multi-source files (CSV, JSON, logs, vendor assessments). Highly comfortable working with Docker to create reproducible evaluation environments similar to SWE-bench and Terminal-Bench.

Skills & Expertise (10)

Python · Advanced · 8.4/10 · 3.5 years exp
Pandas · Advanced · 8.2/10 · 3.5 years exp
NumPy · Advanced · 8.0/10 · 3.5 years exp
Advanced SQL · Advanced · 8.0/10 · 3.5 years exp
Docker · Intermediate · 7.6/10 · 3.5 years exp
Statistical concepts · Intermediate · 7.4/10 · 3.5 years exp
Anomaly Detection · Intermediate · 7.2/10 · 3.5 years exp
Debugging · Intermediate · 6.8/10 · 3.5 years exp
Analytical Reasoning · Intermediate · 6.5/10 · 3.5 years exp
Dockerfiles

Work Experience

AI Benchmark Task Engineer (Multi-Agent Systems)

Xlairs

Oct 2025 - Present

Design and author multi-agent benchmark tasks centered on complex data analysis workflows, testing how effectively AI systems cross-reference data and execute statistical computation. Write precise oracle logic and Python verification scripts that validate specific, verifiable analytical conclusions rather than generic summaries. Review task performance signals to ensure strong separation between weaker and stronger agentic systems across evaluation suites. Refine benchmark tasks continuously to improve determinism, clarity, difficulty, and scoring quality for leading foundation model companies.

Data Analyst & Python Specialist

Micro1

Aug 2025 - Oct 2025

Analyzed large, messy, multi-source datasets (CSVs, JSON files, survey results, and financial documents) to formulate non-trivial analytical questions with clear, specific answers. Created realistic synthetic datasets and curated real-world style datasets across domains such as finance, operations, and security analysis. Leveraged strong proficiency in SQL and Python (pandas, NumPy) to build workflows demanding contradiction detection and anomaly identification.

Data Scientist & AI Evaluator

Senquire Analytics

Jan 2025 - Oct 2025

Developed detailed decomposition guides that effectively split analytical work across specialist sub-agents (e.g., financial, technical, security, or operations analysts). Created highly reproducible evaluation environments using Python and Docker, including writing Dockerfiles, building container images, and debugging secure execution sandboxes. Applied a solid understanding of statistical concepts—averages, distributions, outliers, and correlations—to benchmark LLM data analysis capabilities.

AI Data Engineer

Wipro LineCraft.AI

Mar 2024 - Oct 2024

Extracted and structured data from messy logs and vendor assessments, standardizing inputs to measure how effectively AI models perform complex analytical workflows. Maintained deep familiarity with AI coding benchmark environments (SWE-bench, Terminal-Bench) to align internal testing methodologies with frontier LLM evaluation standards. Cross-referenced multiple sources and applied statistical reasoning to establish baseline ground truths for internal machine learning algorithms.

Junior Data Analyst

Shri Samarth Tools

Dec 2022 - Mar 2024

Extracted and analyzed operational reports using SQL and Python to identify statistical anomalies and cross-reference messy transactional data. Authored reproducible data workflows that verified specific analytical conclusions for senior management in place of generic, unactionable summaries. Built a solid foundation in data analysis by tackling unstructured real-world datasets, consistently delivering verifiable outcomes under strict operational deadlines.

Education

MSc. Data Science - Department of Technology, Pune University

2023 - 2025 · Afghanistan

BSc. Computer Science - MGM University

2020 - 2023 · Afghanistan

Certifications

No certifications added yet

Profile Score Breakdown

📷 Photo 10/10
📄 Resume 10/10
💼 Job Title 10/10
✍️ Bio 10/10
🛠️ Skills 15/20
🎓 Education 10/10
⏱️ Experience 12/15
💰 Rate 0/5
🏆 Certs 0/5
Verified 5/5
Total Score 82/100

Profile Overview

Member since Jun 2026