TypeScript support for testing is still in closed alpha. Please contact us if you would like to use it!

Since Playground BAML tests don’t currently support assertions, use pytest to test your functions when you want to add assertions or build more complex testing scenarios.

To view each pytest run in Boundary Studio (which lets you comment on and label test results, score them, and track them over time), use the baml_test decorator from baml_client.testing.

Example

See the full example, including the .baml files for GetOrderInfo.

test_file.py
from baml_client import baml as b
from baml_client.baml_types import Email
from baml_client.testing import baml_test

# Run `poetry run pytest -m baml_test` in this directory.
# Setup Boundary Studio to see test details!
@baml_test
async def test_get_order_info():
  order_info = await b.GetOrderInfo(Email(
      subject="Order #1234",
      body="Your order has been shipped. It will arrive on 1st Jan 2022. Product: iPhone 13. Cost: $999.99"
  ))

  assert order_info.cost == 999.99

Make sure your test file is prefixed with test, your test class with Test, and/or your test function with test; otherwise pytest will not pick up your tests. E.g. test_foo.py, TestFoo, test_foo.
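
For instance, here is a minimal sketch that follows all three conventions at once (the names are the placeholders from above):

# test_foo.py: the file name starts with "test"
from baml_client.testing import baml_test

class TestFoo:  # the class name starts with "Test"
  @baml_test
  async def test_foo(self):  # the function name starts with "test"
    ...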

Run pytest -m baml_test -k 'order_info' to run this test. To have pytest show print statements, add the -s flag.

The -m flag tells pytest to run only the tests decorated with “baml_test”.

Make sure you run these commands from your Python virtual environment (or poetry shell if you use poetry).
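
For reference, the commands from this section in one place:

poetry run pytest -m baml_test       # run every test decorated with @baml_test (via poetry)
pytest -m baml_test -k 'order_info'  # run only tests whose names match "order_info"
pytest -m baml_test -s               # also show print statements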

For more advanced testing scenarios, helpful commands, and gotchas, check out the Advanced Guide.

View test results in Boundary Studio

Add the @baml_test decorator from baml_client.testing to your test functions to view their results in Boundary Studio.

Run a specific impl

When you call a function that has multiple impls, the client will automatically call your defined default_impl.

You can always test a specific impl by calling it explicitly in the test function: b.GetOrderInfo.get_impl("version1").run(...)

from baml_client import baml as b
from baml_client.baml_types import Email
from baml_client.testing import baml_test

@baml_test
async def test_get_order_info():
  order_info = await b.GetOrderInfo.get_impl("version1").run(Email(
      subject="Order #1234",
      body="Your order has been shipped. It will arrive on 1st Jan 2022. Product: iPhone 13. Cost: $999.99"
  ))

  assert order_info.cost == 999.99

BAML includes helper pytest fixtures that automatically generate tests for each impl you define. See the advanced pytest guide.
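
Until you adopt those fixtures, plain pytest parametrization over impl names gets you most of the way there. A minimal sketch, assuming your impls are named "version1" and "version2" and that @baml_test composes with @pytest.mark.parametrize:

import pytest

from baml_client import baml as b
from baml_client.baml_types import Email
from baml_client.testing import baml_test

# "version1" and "version2" are assumed impl names; use the ones declared in your .baml files.
@pytest.mark.parametrize("impl_name", ["version1", "version2"])
@baml_test
async def test_get_order_info_every_impl(impl_name):
  order_info = await b.GetOrderInfo.get_impl(impl_name).run(Email(
      subject="Order #1234",
      body="Your order has been shipped. It will arrive on 1st Jan 2022. Product: iPhone 13. Cost: $999.99"
  ))

  assert order_info.cost == 999.99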

Using an LLM eval

You can also declare a new BAML function that you can use in your tests to validate results.

This is helpful for testing more ambiguous, free-form LLM text generations. You can measure anything from sentiment to the tone of the text.

For example, the following GPT-4-powered function can be used in your tests to assert that a given generated sentence is professional-sounding:

enum ProfessionalismRating {
  GREAT
  OK
  BAD
}

function ValidateProfessionalism {
  // The string to validate
  input string
  output ProfessionalismRating
}

impl<llm, ValidateProfessionalism> v1 {
  client GPT4
  prompt #"
    Is this text professional-sounding?

    Use the following scale:
    {#print_enum(ProfessionalismRating)}

    Sentence: {#input}

    ProfessionalismRating:
  "#
}

from baml_client import baml as b
from baml_client.baml_types import Email, ProfessionalismRating
from baml_client.testing import baml_test

@baml_test
async def test_message_professionalism():
  order_info = await b.GetOrderInfo(Email(
      subject="Order #1234",
      body="Your order has been shipped. It will arrive on 1st Jan 2022. Product: iPhone 13. Cost: $999.99"
  ))

  assert order_info.cost == 999.99

  professionalism_rating = await b.ValidateProfessionalism(order_info.body)
  assert professionalism_rating == ProfessionalismRating.GREAT