About
Analytical AI Task Engineer blending advanced academic Data Science foundations with 3.5+ years of rigorous professional experience. Expert in designing and authoring high-quality multi-agent benchmark tasks that evaluate the analytical reasoning, coordination, and execution capabilities of advanced AI systems. Strong proficiency in SQL and Python (pandas, NumPy) for deep data analysis, scripting, and writing precise oracle logic. Proven ability to curate real-world datasets and create realistic synthetic datasets from messy multi-source files (CSV, JSON, logs, vendor assessments). Highly comfortable working with Docker to create reproducible evaluation environments similar to SWE-bench and Terminal-Bench.
Work Experience
AI Benchmark Task Engineer (Multi-Agent Systems)
Xlairs
Oct 2025 - Present
Design and author multi-agent benchmark tasks centered on complex data analysis workflows, testing how effectively AI systems cross-reference data and execute statistical computation. Write precise oracle logic and Python verification scripts that validate specific, verifiable analytical conclusions rather than generic summaries. Review task performance signals to ensure strong separation between weaker and stronger agentic systems across evaluation suites. Refine benchmark tasks continuously to improve determinism, clarity, difficulty, and scoring quality for leading foundation model companies.
Data Analyst & Python Specialist
Micro1
Aug 2025 - Oct 2025
Analyzed large, messy, multi-source datasets (CSVs, JSON files, survey results, and financial documents) to formulate non-trivial analytical questions with clear, specific answers. Created realistic synthetic datasets and curated real-world-style datasets across domains such as finance, operations, and security analysis. Used SQL and Python (pandas, NumPy) to build workflows that require contradiction detection and anomaly identification.
Data Scientist & AI Evaluator
Senquire Analytics
Jan 2025 - Oct 2025
Developed detailed decomposition guides that split analytical work across specialist sub-agents (e.g., financial, technical, security, or operations analysts). Created reproducible evaluation environments using Python and Docker, including writing Dockerfiles, building container images, and debugging secure execution sandboxes. Applied statistical concepts (averages, distributions, outliers, and correlations) to benchmark LLM data analysis capabilities.
AI Data Engineer
Wipro LineCraft.AI
Mar 2024 - Oct 2024
Extracted and structured data from messy logs and vendor assessments, standardizing inputs to measure how effectively AI models perform complex analytical workflows. Maintained deep familiarity with AI coding benchmark environments (SWE-bench, Terminal-Bench) to align internal testing methodologies with frontier LLM evaluation standards. Executed cross-referencing and statistical reasoning across multiple sources to establish baseline ground truths for internal machine learning algorithms.
Junior Data Analyst
Shri Samarth Tools
Dec 2022 - Mar 2024
Extracted and analyzed operational reports using SQL and Python to identify statistical anomalies and cross-reference messy transactional data. Authored reproducible data workflows that verified specific analytical conclusions for senior management, eliminating generic, unactionable summaries. Built a solid foundation in data analysis by tackling unstructured real-world datasets, consistently delivering verifiable outcomes under strict operational deadlines.
Education
MSc. Data Science - Department of Technology, Pune University
2023 - 2025 · Afghanistan
BSc. Computer Science - MGM University
2020 - 2023 · Afghanistan