RL environment where an LLM agent negotiates prices against an LLM-powered seller using real marketplace listings.
- Three difficulty levels (Easy / Medium / Hard) with varying zones of possible agreement — from a $480 ZOPA down to no ZOPA, testing walk-away discipline.
- Six-component reward system on a [−1, 1] scale: surplus capture, walk-away correctness, output compliance, closing speed, opening offer quality, and concession smoothness.
- Walk-away penalty weighted 5× — trains the agent to value discipline over forcing a bad deal.
- Pure Python arithmetic grader ensures reproducibility; stochastic LLM sampling on both sides creates diverse training signal across episodes.
- Deployable on Hugging Face Spaces, Docker, or locally via uv; ships with a browser playground showing real-time reward breakdowns.
Ice-sliding multi-player maze environment on OpenEnv for benchmarking planning and coordination in RL agents.
- Ice-sliding mechanics — agents slide until hitting a wall or another player, requiring multi-step lookahead over reactive movement.
- Simultaneous multi-player movement; solved only when every player reaches an exit cell in the same phase.
- Reward shaping penalises repeated actions (−1), reversals, and revisited board states scaled by prior visit count.
- Standard OpenEnv reset()/step() API — compatible with any RL training loop without environment-specific wrappers.
- Includes dataset validation tooling and GIF rendering from recorded rollouts for debugging agent trajectories.
CrewAI-powered bot that auto-syncs multilingual documentation across feature branches via a Git pre-push hook.
- 2nd Runner-Up at the CrewAI Fall Agentic AI Challenge.
- Detects English documentation changes and auto-generates synchronised translations (Korean, Brazilian Portuguese) with no manual intervention.
- Runs as a Git pre-push hook — fires before any feature branch push, prompting the developer to sync docs before code lands.
- Agent-based architecture with specialised CrewAI agents for change detection, translation, and cross-branch synchronisation.
- Minimal setup: Python 3.12+, uv, and an OpenAI key — clone + uv sync + one hook registration.
Full-stack interactive console for demonstrating and chatting with Reinforcement Learning models and agentic systems.
- Chat interface for real-time interaction with RL models, backed by smolagents for agent-based reasoning and step-by-step trace display.
- FastAPI backend (Python 3.12+) paired with a Next.js 15 / React 19 frontend — clean separation of model logic and UI.
- Integrates Hugging Face datasets directly; response caching via JSON files keeps repeated queries fast.
- Docker-first production deployment — single container exposes the full stack on port 7860, ready for Hugging Face Spaces.
- Environment-driven config (HF token, model name) makes swapping the underlying model a one-line change.