Here is how to get structured data from a paragraph of text or an image. One function handles both because its input is declared as a union type, image | string:

BAML
class CharacterDescription {
  name string
  clothingItems string[]
  hairColor string? @description(#"
    The color of the character's hair.
  "#)
  spells Spells[]
}

class Spells {
  name string
  description string
}

function DescribeCharacter(image_or_paragraph: image | string) -> CharacterDescription {
  client GPT4o
  prompt #"
    {{ _.role("user") }}

    Describe this character according to the schema provided:
    {{ image_or_paragraph }}

    {{ ctx.output_format }}

    Before you answer, explain your reasoning in 3 sentences.
  "#
}
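To make the return type concrete, here is a rough Python sketch of the shape that CharacterDescription describes. It uses standard-library dataclasses as stand-ins, so the types actually generated into baml_client.types may look different, and the sample values are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


# Stand-in for the BAML class Spells (illustrative, not the generated type).
@dataclass
class Spells:
    name: str
    description: str


# Stand-in for the BAML class CharacterDescription.
@dataclass
class CharacterDescription:
    name: str
    clothingItems: List[str]
    spells: List[Spells]
    # string? in BAML: the field may be absent. It is listed last here only
    # because dataclass fields with defaults must follow those without.
    hairColor: Optional[str] = None


# Invented sample values, only to show how the fields nest.
example = CharacterDescription(
    name="Example Wizard",
    clothingItems=["cloak", "pointed hat"],
    spells=[Spells(name="Light", description="Illuminates a dark room.")],
)

print(example.spells[0].name)
print(example.hairColor)
```

The optional hairColor field is the one to handle defensively: the model may not find a hair color in the text or image, in which case the value is simply missing.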

If you open up the VSCode Playground you will be able to test this function instantly.

Usage

See image docs

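from baml_client import b
from baml_client.types import CharacterDescription
from baml_py import Image


# Hypothetical wrapper: the awaited calls must run inside an async function.
async def describe_character_examples() -> None:
    # Pass a paragraph of text...
    result = await b.DescribeCharacter("...")
    assert isinstance(result, CharacterDescription)

    # ...or an image. The same function accepts either input.
    result_from_image = await b.DescribeCharacter(Image.from_url("http://..."))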
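Once the call returns, the result can be read like any other object. The following is a minimal sketch, assuming the returned object exposes the schema's fields as attributes. The summarize helper is hypothetical, and SimpleNamespace stands in for a real result so the snippet runs without calling a model.

```python
from types import SimpleNamespace


def summarize(character) -> str:
    """Build a one-line summary from a CharacterDescription-like object."""
    # hairColor is optional in the schema, so fall back when it is missing.
    hair = character.hairColor or "unknown"
    spell_names = ", ".join(spell.name for spell in character.spells) or "none"
    return f"{character.name} (hair: {hair}; spells: {spell_names})"


# Stand-in for a result returned by b.DescribeCharacter(...); values are invented.
fake_result = SimpleNamespace(
    name="Example Wizard",
    clothingItems=["cloak"],
    hairColor=None,
    spells=[SimpleNamespace(name="Light", description="Illuminates a dark room.")],
)

print(summarize(fake_result))
```

Because the helper only reads attributes, it should work the same way whether the description came from a paragraph or from an image.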