.. Karta Context Engine documentation master file, created by
   sphinx-quickstart on Tue Feb 25 14:43:59 2025.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to the Karta Context Engine
------------------------------------

Building robust AI agents is challenging. At Karta, we found that datasets and evaluation tools suited to *iterating quickly* on agent architectures did not exist.
Capturing the subtlety and variety of real-world usage during testing is essential to creating a viable AI experience for customers.
Off-the-shelf benchmarks rarely captured the complexities our agents faced in production, leaving critical gaps in their assessment. 

Determined to bridge this gap, we set out to create **gold standard tasks** coupled with **deeply representative contexts** for agent evaluation: meticulously designed datasets, well-defined tasks, and comprehensive toolsets tailored to real-world domains.
We started with **e-commerce**, an industry where we have deep expertise, ensuring our evaluations reflect the nuanced interactions that AI systems encounter in practical deployments.

.. image:: _static/flow.png
   :alt: Karta Logo
   :width: 600px
   :align: center

The result is the **Karta Context Engine** — a comprehensive suite of structured datasets, domain-specific tasks, and knowledge documents crafted to set a new benchmark for AI testing. 
Designed for seamless integration into existing codebases, it empowers teams to automate testing workflows, run sophisticated simulations, and rigorously validate agent performance. 
With the **Karta Context Engine**, AI teams can confidently measure, iterate, and improve, knowing they are testing against some of the most complex and subtle real-world scenarios available.

Our vision is a future in which agent evaluation is **effortless, rigorous, and representative of real-world challenges**.
Our journey begins with **easily importable datasets** that provide developers with quick access to high-quality test cases. 
From there, we will **expand the breadth of supported domains**, ensuring that AI agents across industries are held to the highest standard.

The culmination of this effort will be a **fully integrated interface** for running and comparing multiple evaluation metrics, allowing AI teams to benchmark performance with **precision and clarity**. 
We believe that **representative task simulations are the key to a safe AI experience**, helping developers build models that behave predictably in high-stakes environments. 
By pushing the boundaries of AI evaluation, we aim to set the foundation for a future where AI is **trustworthy, adaptable, and aligned with real-world needs**.

Understand the Basics
^^^^^^^^^^^^^^^^^^^^^
.. toctree::
   :maxdepth: 3
   :caption: Basics:

   whoisitfor
   problemspace
   howthisworks
   vision
   developerguide

Domain Data
^^^^^^^^^^^
.. toctree::
   :maxdepth: 2
   :caption: Domains:

   ecommerce


Indices and tables
^^^^^^^^^^^^^^^^^^

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`