Projects

Things I've built.

Price Negotiation RL

RL environment where an LLM agent negotiates prices against an LLM-powered seller using real marketplace listings.

  • Three difficulty levels (Easy / Medium / Hard) with varying zones of possible agreement — from a $480 ZOPA down to no ZOPA, testing walk-away discipline.
  • Six-component reward system on a [−1, 1] scale: surplus capture, walk-away correctness, output compliance, closing speed, opening offer quality, and concession smoothness.
  • Walk-away penalty weighted 5× — trains the agent to value discipline over forcing a bad deal.
  • Pure Python arithmetic grader ensures reproducibility; stochastic LLM sampling on both sides creates diverse training signal across episodes.
  • Deployable on Hugging Face Spaces, Docker, or locally via uv; ships with a browser playground showing real-time reward breakdowns.

Ice-sliding multi-player maze environment on OpenEnv for benchmarking planning and coordination in RL agents.

  • Ice-sliding mechanics — agents slide until hitting a wall or another player, requiring multi-step lookahead over reactive movement.
  • Simultaneous multi-player movement; solved only when every player reaches an exit cell in the same phase.
  • Reward shaping penalises repeated actions (−1), reversals, and revisited board states scaled by prior visit count.
  • Standard OpenEnv reset()/step() API — compatible with any RL training loop without environment-specific wrappers.
  • Includes dataset validation tooling and GIF rendering from recorded rollouts for debugging agent trajectories.

MakeMyDocsBot

CrewAI-powered bot that auto-syncs multilingual documentation across feature branches via a Git pre-push hook.

  • 2nd Runner-Up at the CrewAI Fall Agentic AI Challenge.
  • Detects English documentation changes and auto-generates synchronised translations (Korean, Brazilian Portuguese) with no manual intervention.
  • Runs as a Git pre-push hook — fires before any feature branch push, prompting the developer to sync docs before code lands.
  • Agent-based architecture with specialised CrewAI agents for change detection, translation, and cross-branch synchronisation.
  • Minimal setup: Python 3.12+, uv, and an OpenAI key — clone + uv sync + one hook registration.

Full-stack interactive console for demonstrating and chatting with Reinforcement Learning models and agentic systems.

  • Chat interface for real-time interaction with RL models, backed by smolagents for agent-based reasoning and step-by-step trace display.
  • FastAPI backend (Python 3.12+) paired with a Next.js 15 / React 19 frontend — clean separation of model logic and UI.
  • Integrates Hugging Face datasets directly; response caching via JSON files keeps repeated queries fast.
  • Docker-first production deployment — single container exposes the full stack on port 7860, ready for Hugging Face Spaces.
  • Environment-driven config (HF token, model name) makes swapping the underlying model a one-line change.