There are two types of tests you may want to run on your AI functions:

  • Unit Tests: Tests a single AI function
  • Integration Tests: Tests a pipeline of AI functions and potentially business logic

We support both types of tests using BAML.

Using the playground

Use the playground to run tests against individual function impls.

BAML CLI

You can run the tests defined in your __tests__ folder directly from the BAML CLI.
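
A minimal sketch of invoking it from the command line (the exact subcommand and flags may differ between CLI versions; run baml --help to confirm):

# From your project root
baml test run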

From BAML Studio

Coming soon
You can also create tests from production logs in BAML Studio. Any unusual or atypical user input can be turned into a test case with a single click.

JSON Files (__tests__ folder)

Unit tests created by the playground are stored in the __tests__ folder.

The project structure should look like this:

.
├── baml_client/
└── baml_src/
    ├── __tests__/
    │   ├── YourAIFunction/
    │   │   ├── test_name_monkey.json
    │   │   └── test_name_cricket.json
    │   └── YourAIFunction2/
    │       └── test_name_jellyfish.json
    ├── main.baml
    └── foo.baml

You can manually create tests by creating a folder for each function you want to test. Inside each folder, create a json file for each test case you want to run. The json file should be named test_name.json where test_name is the name of the test case.

To see the structure of the JSON file, you can create a test using the playground and then copy the JSON file into your project.
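
For illustration, a hypothetical baml_src/__tests__/ClassifySentiment/test_happy_user.json might contain something like the following (the exact schema is an assumption on our part; always confirm against a file generated by the playground):

{
  "input": "I am ecstatic"
}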

The BAML compiler reads the __tests__ folder and generates a pytest file for you so you don’t have to manually write test boilerplate code.

Programmatic Testing (using pytest)

For Python, you can leverage pytest to run tests. All you need to do is add a @baml_test decorator to your test functions to get your test data visualized on the BAML dashboard.

Running tests

Make sure you are running these commands from your Python virtual environment (or poetry shell if you use Poetry).
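
For example (assuming a virtual environment located at .venv; adjust the path to your setup):

# Activate a standard virtual environment
source .venv/bin/activate
# Or, if you use Poetry
poetry shell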

# From your project root
# Lists all tests
pytest -m baml_test --collect-only
# From your project root
# Runs all tests
# For every function, for every impl
pytest -m baml_test

To run tests for a subdirectory

# From your project root
# Pass the path to the folder containing the tests you want to run
pytest -m baml_test ./your-tests-folder/

To run tests that have a specific name or group name

# From your project root
pytest -m baml_test -k test_group_name

You can read more about pytest's -k argument here (PyTest Docs).

-k will match any test whose name contains the given string.

To run a specific test case in a test group

# From your project root
pytest -m baml_test -k 'test_group_name and test_case_name'

Unit Test an AI Function

TypeScript support for testing is still in closed alpha - please contact us if you would like to use it!

# Import your baml-generated functions
from baml_client import baml as b
# Import any custom types defined in .baml files
from baml_client.baml_types import Sentiment, IClassifySentiment

# This automatically generates a test case for each impl
# of ClassifySentiment.
@b.ClassifySentiment.test
async def test_happy_user(ClassifySentimentImpl: IClassifySentiment):
    # Note that the parameter name "ClassifySentimentImpl"
    # must match the name of the function you're testing
    response = await ClassifySentimentImpl("I am ecstatic")
    assert response == Sentiment.POSITIVE

Integration Tests (test a pipeline calling multiple functions)

TypeScript support for testing is still in closed alpha - please contact us if you would like to use it!

# Import your baml-generated LLM functions
from baml_client import baml as b
# Import any custom types defined in .baml files
from baml_client.baml_types import Sentiment

# Import testing library
from baml_client.testing import baml_test

# Mark this as a baml test (recorded on dashboard and does some setup)
@baml_test
async def test_pipeline():
    message = "I am ecstatic"
    response = await b.ClassifySentiment(message)
    assert response == Sentiment.POSITIVE
    response = await b.GetHappyResponse(message)

Make sure your test file and test functions are prefixed with test, and any test classes are prefixed with Test. Otherwise, pytest will not pick up your tests.
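
For illustration, a hypothetical layout that pytest will discover (file and symbol names here are made up):

# test_sentiment.py  <- file name starts with "test"

class TestSentiment:               # class name starts with "Test"
    async def test_positive(self):
        ...

async def test_negative():         # function name starts with "test"
    ...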

Parameterized Tests

Parameterized tests allow you to declare a list of inputs and expected outputs for a test case. BAML will run the test for each input and compare the output to the expected output.

import pytest

from baml_client.testing import baml_test
# Import your baml-generated LLM functions
from baml_client import baml
# Import any custom types defined in .baml files
from baml_client.baml_types import Sentiment

@baml_test
@pytest.mark.parametrize(
    "input, expected_output",
    [
        ("I am ecstatic", Sentiment.POSITIVE),
        ("I am sad", Sentiment.NEGATIVE),
        ("I am angry", Sentiment.NEGATIVE),
    ],
)
async def test_sentiments(input, expected_output):
    response = await baml.ClassifySentiment(input)
    assert response == expected_output

This will generate 3 test cases on the dashboard, one for each input.

Using custom names for each test

The parametrize decorator also allows you to specify a custom name for each test case. See below for how we name each test case using the ids parameter.

import pytest

from baml_client import baml as b
from baml_client.baml_types import Sentiment, IClassifySentiment

test_cases = [
    {"input": "I am ecstatic", "expected_output": Sentiment.POSITIVE, "id": "ecstatic-test"},
    {"input": "I am sad", "expected_output": Sentiment.NEGATIVE, "id": "sad-test"},
    {"input": "I am angry", "expected_output": Sentiment.NEGATIVE, "id": "angry-test"},
]

@b.ClassifySentiment.test
@pytest.mark.parametrize(
    "test_case",
    test_cases,
    ids=[case['id'] for case in test_cases]
)
# Note the argument name "test_case" is set by the first parameter in the parametrize() decorator
async def test_sentiments(ClassifySentimentImpl: IClassifySentiment, test_case):
    response = await ClassifySentimentImpl(test_case["input"])
    assert response == test_case["expected_output"]

Grouping Tests by Input Type

Alternatively, you can group tests logically by defining one test case or test class per input type you're testing. In our case, we'll split all positive sentiments into their own group.

import pytest

from baml_client.testing import baml_test
# Import your baml-generated LLM functions
from baml_client import baml
# Import any custom types defined in .baml files
from baml_client.baml_types import Sentiment

@baml_test
@pytest.mark.asyncio
@pytest.mark.parametrize(
  # Note we only need to pass in one variable to the test, the "input".
  "input",
  [
      "I am ecstatic",
      "I am super happy!"
  ],
)
class TestHappySentiments:
    async def test_sentiments(self, input):
        response = await baml.ClassifySentiment(input)
        assert response == Sentiment.POSITIVE

@baml_test
@pytest.mark.asyncio
@pytest.mark.parametrize(
  # Note we only need to pass in one variable to the test, the "input".
  "input",
  [
      "I am sad",
      "I am angry"
  ],
)
class TestSadSentiments:
    async def test_sentiments(self, input):
        response = await baml.ClassifySentiment(input)
        assert response == Sentiment.NEGATIVE

Alternatively, you can write a separate test function for each input type.

import pytest

from baml_client.testing import baml_test
from baml_client import baml
from baml_client.baml_types import Sentiment

@baml_test
@pytest.mark.asyncio
@pytest.mark.parametrize(
    "input",
    [
        "I am ecstatic",
        "I am super happy!",
        "I am thrilled",
        "I am overjoyed",
    ],
)
async def test_happy_sentiments(input):
    response = await baml.ClassifySentiment(input)
    assert response == Sentiment.POSITIVE

@baml_test
@pytest.mark.asyncio
@pytest.mark.parametrize(
    "input",
    [
        "I am sad",
        "I am angry",
        "I am upset",
        "I am frustrated",
    ],
)
async def test_sad_sentiments(input):
    response = await baml.ClassifySentiment(input)
    assert response == Sentiment.NEGATIVE