Building a Spam Classifier with BAML
In this tutorial, you’ll learn how to create a simple but effective spam classifier using BAML and OpenAI’s GPT models. By the end, you’ll have a working classifier that can distinguish between spam and legitimate messages.
Prerequisites
- Basic understanding of BAML syntax
- Access to the OpenAI API (you’ll need an API key)
Step 1: Define the Classification Schema
First, let’s define what our classification output should look like. Create a new file called spam_classifier.baml and add a schema for the output.
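Here is a minimal sketch of that schema. It assumes the output is modeled as a BAML enum named MessageType, the return type mentioned in Step 2. The exact definition in your project may differ.

```baml
// Sketch (assumption): the classifier's output is exactly one of two enum values.
// SPAM covers unsolicited, promotional, or scam-like messages;
// NOT_SPAM covers ordinary, legitimate messages.
enum MessageType {
  SPAM
  NOT_SPAM
}
```

With an enum as the return type, BAML parses the model’s reply into one of these two values. Downstream code therefore works with a fixed label rather than free-form text.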
This schema defines a simple classification with two possible labels: SPAM or NOT_SPAM.
Step 2: Create the Classification Function
Next, we’ll create a function that uses OpenAI’s gpt-4o-mini model to classify text. Add it to your spam_classifier.baml file.
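Here is a minimal sketch of such a function. The name ClassifyText and the prompt wording are assumptions. The "openai/gpt-4o-mini" string is BAML’s shorthand client syntax, which reads the API key from the OPENAI_API_KEY environment variable.

```baml
// Sketch (assumption): ClassifyText is a hypothetical name, and the MessageType
// enum from Step 1 must be defined in the same BAML project.
// The shorthand client "openai/gpt-4o-mini" selects the OpenAI provider and model.
function ClassifyText(input: string) -> MessageType {
  client "openai/gpt-4o-mini"
  prompt #"
    Classify the following message as spam or not spam.

    Treat a message as SPAM if it shows common spam indicators, such as:
    - unsolicited offers or prize announcements
    - pressure to act urgently
    - excessive capitalization or punctuation

    Otherwise, treat it as NOT_SPAM.

    {{ ctx.output_format }}

    {{ _.role("user") }}
    {{ input }}
  "#
}
```

{{ ctx.output_format }} lets BAML insert instructions describing the allowed MessageType values. {{ _.role("user") }} sends the message being classified as the user turn.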
Let’s break down what this function does:
- Takes an input as a string
- Uses the gpt-4o-mini model
- Provides clear guidelines for classification in the prompt
- Returns a MessageType
Step 3: Test the Classifier
To check that the classifier works correctly, let’s add some test cases.
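Here is a sketch of two test cases. It assumes a hypothetical function named ClassifyText(input: string) -> MessageType. The test names and messages are illustrative. Each test block can be run from the BAML Playground.

```baml
// Sketch (assumption): ClassifyText is a hypothetical function name.
// A message with obvious spam indicators; a correct classifier returns SPAM.
test ObviousSpam {
  functions [ClassifyText]
  args {
    input "Buy cheap watches now! Limited time offer!!!"
  }
}

// An ordinary work message; a correct classifier returns NOT_SPAM.
test LegitimateMessage {
  functions [ClassifyText]
  args {
    input "Hi Sarah, can we meet at 3 PM tomorrow to discuss the project?"
  }
}
```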
[Figure: the spam classifier in the BAML Playground]
Try it yourself in the Interactive Playground!
Now that you have your classifier set up, try it with your own examples. Here are some messages you can test:
- “Meeting at 2 PM in the conference room”
- “CONGRATULATIONS! You’ve won $1,000,000!!!”
- “Can you review the document I sent yesterday?”
- “Make money fast! Work from home!!!”
Next Steps
- Experiment with different prompt templates to improve accuracy
- Add more spam indicators to the classification criteria
- Create a more complex classification schema with confidence scores
- Try using different GPT models to compare performance
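Two of these next steps, confidence scores and model comparison, can be sketched in BAML. The block below assumes the MessageType enum from Step 1. The class SpamClassification, the client GPT4o, and the function ClassifyTextWithConfidence are hypothetical names. The confidence value is the model’s own estimate, not a calibrated probability.

```baml
// Sketch (assumption): a richer output schema with a self-reported confidence score.
class SpamClassification {
  label MessageType
  confidence float @description("Model's own confidence in the label, from 0.0 to 1.0")
  reason string @description("One-sentence explanation of the decision")
}

// Sketch (assumption): a named client, so the model can be swapped in one place.
client<llm> GPT4o {
  provider openai
  options {
    model "gpt-4o"
    api_key env.OPENAI_API_KEY
  }
}

function ClassifyTextWithConfidence(input: string) -> SpamClassification {
  client GPT4o
  prompt #"
    Classify the following message as SPAM or NOT_SPAM, estimate how
    confident you are in that label, and briefly explain why.

    {{ ctx.output_format }}

    {{ _.role("user") }}
    {{ input }}
  "#
}
```

To compare models, change the model option in the client (or use the shorthand "openai/gpt-4o-mini" instead) and rerun the same test cases.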
Multi-Label Classification
While the spam classifier demonstrates single-label classification (where each input belongs to exactly one category), many real-world problems require multiple labels. Let’s build a support ticket classifier that can assign multiple relevant categories to each ticket.
Step 1: Define the Label Enum and Schema
Create a new file called ticket_classifier.baml. Define the possible ticket categories as an enum, then add a schema with a labels field that holds the categories assigned to a ticket.
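Here is a minimal sketch. It assumes three hypothetical categories (ACCOUNT, BILLING, GENERAL_QUERY) and a hypothetical result class named TicketClassification. The @description annotations give the model a short explanation of each label.

```baml
// Sketch (assumption): the category names and descriptions are hypothetical.
enum TicketLabel {
  ACCOUNT @description("Login, password, or other account-access issues")
  BILLING @description("Payments, invoices, charges, or subscription plans")
  GENERAL_QUERY @description("Product questions, features, or how-to requests")
}

// The labels field is a list, so a ticket can carry zero, one, or several labels.
class TicketClassification {
  labels TicketLabel[]
}
```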
Notice how this schema differs from our spam classifier:
- We use an enum to define valid labels
- The labels field is an array (TicketLabel[]), allowing multiple labels per ticket
Step 2: Create the Multi-Label Classification Function
Add the classification function to your ticket_classifier.baml file.
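Here is a minimal sketch. It assumes the hypothetical TicketLabel values ACCOUNT, BILLING and GENERAL_QUERY, a result class TicketClassification with a labels TicketLabel[] field, and a function named ClassifyTicket. The in-prompt examples are illustrative and mix single-label and overlapping multi-label cases.

```baml
// Sketch (assumption): ClassifyTicket, TicketClassification, and the label
// names are hypothetical and must match the schema in ticket_classifier.baml.
function ClassifyTicket(ticket: string) -> TicketClassification {
  client "openai/gpt-4o-mini"
  prompt #"
    You are a support agent. Read the support ticket and choose every
    label that applies. A ticket can have one label, several labels,
    or none at all.

    Examples:
    - "I can't log in to my account." -> ACCOUNT
    - "Why was I charged twice this month?" -> BILLING
    - "I changed my email address and now my invoices go to the old one." -> ACCOUNT, BILLING

    {{ ctx.output_format }}

    {{ _.role("user") }}
    {{ ticket }}
  "#
}
```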
Key differences from the spam classifier:
- The prompt includes examples showing both single and multiple labels
- Examples demonstrate how labels can overlap
- The model is instructed to consider all applicable labels
Step 3: Test Multi-Label Classification
Add test cases that cover both single-label and multi-label scenarios.
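Here is a sketch of one test for each scenario. It assumes a hypothetical ClassifyTicket function and the hypothetical labels ACCOUNT, BILLING and GENERAL_QUERY. The ticket texts are illustrative.

```baml
// Sketch (assumption): ClassifyTicket is a hypothetical function name.
// Single-label case: a correct result is labels [BILLING].
test SingleLabelTicket {
  functions [ClassifyTicket]
  args {
    ticket "Why was I charged twice for my subscription this month?"
  }
}

// Multi-label case: a correct result is labels [ACCOUNT, BILLING].
test MultiLabelTicket {
  functions [ClassifyTicket]
  args {
    ticket "I can't log in to my account, and my next invoice is due tomorrow."
  }
}
```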
[Figure: the multi-label ticket classifier in the BAML Playground]
Try it yourself!
Test the multi-label classifier with these examples:
- “How do I upgrade my subscription plan?”
- “I forgot my password and need to update my payment method”
- “What are the features included in the premium plan?”
- “My account is showing incorrect billing history”
Tips for Multi-Label Classification
- Balanced Examples: Include examples in your prompt that show both single and multiple labels
- Clear Descriptions: Add descriptive annotations to help the model understand each label
- Test Edge Cases: Include test cases that verify the model can handle:
  - Single-label cases
  - Multi-label cases
  - Edge cases where no labels apply
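For the last case, here is a sketch of a test in which no label should apply. It assumes a hypothetical ClassifyTicket function that returns an object with a labels TicketLabel[] field. The ticket text is illustrative.

```baml
// Sketch (assumption): ClassifyTicket is a hypothetical function name.
// Edge case: nothing to classify, so a correct result is an empty labels list.
test NoLabelTicket {
  functions [ClassifyTicket]
  args {
    ticket "Thanks, everything is working now!"
  }
}
```

If the model keeps assigning a label to tickets like this, start by stating explicitly in the prompt that an empty list is a valid answer.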