Wednesday, August 20, 2025

What is LangExtract?


That’s where LangExtract comes in. It’s a free, open-source Python library from Google that does the grunt work of pulling structured details out of messy text. It lives on GitHub, works with cloud models such as Gemini or with local models through Ollama, and honestly feels like a friend who’s really good at highlighting exactly what matters.


You hand LangExtract a chunk of text, tell it what to look for, and it hands back a neat list of details — all linked to where they came from in the original.
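To make "linked to where they came from" concrete, here is a minimal sketch of the idea in plain Python. It does not call LangExtract at all: the `Span` class and the `ground` function are hypothetical names made up for illustration, and the real library's alignment is likely more forgiving than the exact string match used here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """A detail plus the character offsets where it appears in the source."""
    label: str
    text: str
    start: int
    end: int


def ground(source: str, label: str, snippet: str) -> Optional[Span]:
    """Locate snippet in source; return None if it is not there verbatim."""
    start = source.find(snippet)
    if start == -1:
        return None
    return Span(label, snippet, start, start + len(snippet))


source = "Lady Juliet looked up at the stars, her heart racing for Romeo"
span = ground(source, "character", "Lady Juliet")
print(span)
# Slicing the source with the stored offsets gives back the exact words.
print(source[span.start:span.end])
```

The point of keeping offsets is that every extracted detail can be checked against the original text, and anything that cannot be located is easy to flag.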


Shows you exactly where it found each detail

Lets you guide it with a quick example so it knows your style

Handles giant documents without choking

Even makes a clickable webpage so you can explore your results

Works on pretty much anything — fiction, medical notes, contracts, whatever

I’ve been writing about tech for a decade, and this is one of those tools you try once and instantly get hooked on.




Why bother?

Because text is everywhere — emails, reports, books — and it’s rarely tidy. Picking through it manually is slow, boring, and easy to mess up. LangExtract is the shortcut. It’s tweakable, lightweight, and built by people who actually get that not everyone wants to wrestle with overcomplicated software.


pip install langextract


echo 'LANGEXTRACT_API_KEY=your-key' > .env
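The .env line above is just a KEY=value pair. If the key does not seem to be picked up, a quick check like the one below can help. This is a sketch in plain Python, not part of LangExtract: `parse_env` is a hypothetical helper, and the string stands in for the file's contents. Whether your LangExtract version reads .env on its own is worth confirming in its README; exporting the variable in your shell works either way.

```python
import os


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


# Stand-in for the contents of the .env file created above.
env_text = "LANGEXTRACT_API_KEY=your-key\n"
settings = parse_env(env_text)

# Export the key only if the shell has not already set one.
os.environ.setdefault("LANGEXTRACT_API_KEY", settings["LANGEXTRACT_API_KEY"])
print("key configured:", "LANGEXTRACT_API_KEY" in os.environ)
```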


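import langextract as lx

# Describe what to pull out and ask for verbatim spans.
prompt = "Find characters, emotions, and relationships. Use exact words from the text."

# One worked example that shows the model the output you expect.
examples = [
    lx.data.ExampleData(
        text="ROMEO. But soft! What light through yonder window breaks? It is the east, and Juliet is the sun.",
        extractions=[
            lx.data.Extraction(extraction_class="character", extraction_text="ROMEO", attributes={"mood": "amazed"}),
            lx.data.Extraction(extraction_class="emotion", extraction_text="But soft!", attributes={"feeling": "soft wonder"}),
            lx.data.Extraction(extraction_class="relationship", extraction_text="Juliet is the sun", attributes={"type": "poetry"}),
        ],
    )
]

text = "Lady Juliet looked up at the stars, her heart racing for Romeo"

result = lx.extract(
    text_or_documents=text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
)

# Save the annotated result as JSONL in the current directory.
lx.io.save_annotated_documents([result], output_name="juliet_stuff.jsonl", output_dir=".")

# Build the interactive HTML view from the saved file.
html = lx.visualize("juliet_stuff.jsonl")
with open("juliet_viz.html", "w") as f:
    # In notebooks, visualize() may return a display object with the markup in .data.
    f.write(html.data if hasattr(html, "data") else html)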


Want to run it on the entire Romeo and Juliet text from Project Gutenberg?


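result = lx.extract(
    text_or_documents="https://www.gutenberg.org/files/1513/1513-0.txt",
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    extraction_passes=3,   # several passes over the text to improve recall
    max_workers=20,        # process chunks in parallel for speed
    max_char_buffer=1000,  # smaller chunks of context for better accuracy
)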
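Those extra arguments are easier to reason about with a toy model. The sketch below is not LangExtract's implementation: `split_into_chunks` and `process_chunks` are hypothetical helpers, and the per-chunk word count is a stand-in for a model call. It assumes that max_char_buffer caps the size of each chunk sent to the model and that max_workers caps how many chunks are handled at once.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Greedily pack whole words into chunks of at most max_chars characters.

    A single word longer than max_chars becomes its own oversized chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def process_chunks(chunks, worker, max_workers):
    """Run worker on every chunk in parallel, keeping the original order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, chunks))


text = "Two households, both alike in dignity, in fair Verona, where we lay our scene"
chunks = split_into_chunks(text, max_chars=30)
# Stand-in for a model call: count the words in each chunk.
word_counts = process_chunks(chunks, lambda c: len(c.split()), max_workers=4)
print(chunks)
print(word_counts)
```

Smaller chunks mean more calls but less text for the model to juggle per call, which is the trade-off the max_char_buffer setting exposes.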



One caveat: even though it comes from Google, LangExtract is not an officially supported Google product. It is released under the Apache 2.0 license.

