That’s where LangExtract comes in. It’s a free, open-source Python library from Google that does the grunt work for you. It lives on GitHub, works with cloud models like Gemini or local ones through Ollama, and honestly feels like a friend who’s really good at highlighting exactly what matters.
You hand LangExtract a chunk of text, tell it what to look for, and it hands back a neat list of details — all linked to where they came from in the original.
- Shows you exactly where it found each detail
- Lets you guide it with a quick example so it knows your style
- Handles giant documents without choking
- Even makes a clickable webpage so you can explore your results
- Works on pretty much anything: fiction, medical notes, contracts, whatever
I’ve been writing about tech for a decade, and this is one of those tools you try once and instantly get hooked on.
Why bother?
Because text is everywhere — emails, reports, books — and it’s rarely tidy. Picking through it manually is slow, boring, and easy to mess up. LangExtract is the shortcut. It’s tweakable, lightweight, and built by people who actually get that not everyone wants to wrestle with overcomplicated software.
pip install langextract
echo 'LANGEXTRACT_API_KEY=your-key' > .env
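If you'd rather confirm the key is actually visible to Python before making a paid API call, here's a minimal stdlib-only sketch. The helper names `load_env_file` and `get_api_key` are made up for this post and are not part of LangExtract; the sketch simply assumes the one-line `KEY=value` format written by the `echo` command above.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env-style file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and lines without an '=' separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_api_key(path=".env", name="LANGEXTRACT_API_KEY"):
    """Prefer a real environment variable, then fall back to the .env file."""
    return os.environ.get(name) or load_env_file(path).get(name)
```

If `get_api_key()` comes back as `None`, the key never made it into `.env` or your environment, and that's worth fixing before you go any further.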
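import langextract as lx

# Say what to extract; asking for exact wording keeps results tied to the source.
prompt = "Find characters, emotions, and relationships. Use exact words from the text."

# One worked example that shows the model the output style you want.
examples = [
    lx.data.ExampleData(
        text="ROMEO. But soft! What light through yonder window breaks? It is the east, and Juliet is the sun.",
        extractions=[
            lx.data.Extraction(extraction_class="character", extraction_text="ROMEO", attributes={"mood": "amazed"}),
            lx.data.Extraction(extraction_class="emotion", extraction_text="But soft!", attributes={"feeling": "soft wonder"}),
            lx.data.Extraction(extraction_class="relationship", extraction_text="Juliet is the sun", attributes={"type": "poetry"}),
        ],
    )
]

text = "Lady Juliet looked up at the stars, her heart racing for Romeo"

result = lx.extract(
    text_or_documents=text,
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
)

# Save the results as JSONL in the current directory, then build the HTML view.
lx.io.save_annotated_documents([result], output_name="juliet_stuff.jsonl", output_dir=".")
html = lx.visualize("juliet_stuff.jsonl")
with open("juliet_viz.html", "w") as f:
    # In a notebook, visualize() may return a display object; plain scripts get a string.
    f.write(html.data if hasattr(html, "data") else html)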
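Once you have a `result`, it helps to peek at it before opening the HTML page. Here's a small sketch of how that might look. It assumes each extraction exposes `extraction_class`, `extraction_text`, and a `char_interval` with `start_pos` and `end_pos` (which can be missing when the text couldn't be aligned). The helper names `group_by_class` and `is_grounded` are my own, not LangExtract functions.

```python
from collections import defaultdict


def group_by_class(result):
    """Group extraction texts by class, keeping their source offsets."""
    grouped = defaultdict(list)
    for ext in result.extractions:
        interval = ext.char_interval  # may be None if no alignment was found
        span = (interval.start_pos, interval.end_pos) if interval else None
        grouped[ext.extraction_class].append((ext.extraction_text, span))
    return dict(grouped)


def is_grounded(source_text, extraction_text, span):
    """True when the slice at `span` matches the extracted text (ignoring case)."""
    if span is None:
        return False
    start, end = span
    return source_text[start:end].lower() == extraction_text.lower()
```

Calling `group_by_class(result)` gives you a plain dict you can print or loop over, and `is_grounded(text, extraction_text, span)` is a quick sanity check that a detail really does point back at the words it claims to come from.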
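Want to run it on the entire Romeo and Juliet text from Project Gutenberg? Pass the URL instead of a string and turn up a few settings for long documents:

result = lx.extract(
    text_or_documents="https://www.gutenberg.org/files/1513/1513-0.txt",
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
    extraction_passes=3,   # several passes over the text to catch more details
    max_workers=20,        # process chunks in parallel for speed
    max_char_buffer=1000,  # smaller chunks give the model less to juggle at once
)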
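To get a feel for what those three settings are doing, here's a rough, stdlib-only sketch of the general idea: cut the text into small chunks, run an extractor over the chunks in parallel, repeat for a few passes, and merge what comes back. This is not LangExtract's actual implementation. `split_into_chunks`, `process_chunks`, and the `extract_fn` callback are hypothetical names invented for illustration, and the parameters only loosely mirror `max_char_buffer`, `max_workers`, and `extraction_passes`.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(text, max_chars):
    """Split text into (offset, chunk) pairs of at most max_chars each."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        if end < len(text):
            # Back up to the last space so words are not cut in half.
            space = text.rfind(" ", start, end)
            if space > start:
                end = space
        chunks.append((start, text[start:end]))
        start = end
    return chunks


def process_chunks(text, extract_fn, max_chars=1000, workers=4, passes=1):
    """Run extract_fn over every chunk in parallel for `passes` rounds.

    extract_fn(chunk) must return (local_start, local_end, label) tuples.
    Offsets are shifted back into full-text coordinates, and duplicates
    found in more than one pass are merged.
    """
    chunks = split_into_chunks(text, max_chars)
    found = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(passes):
            results = pool.map(lambda c: (c[0], extract_fn(c[1])), chunks)
            for offset, hits in results:
                for s, e, label in hits:
                    found.add((offset + s, offset + e, label))
    return sorted(found)
```

With a deterministic `extract_fn`, extra passes find nothing new. With a language model, each pass can surface details an earlier one missed, which is presumably why more passes help on long texts.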
It’s not an officially supported Google product, and it’s released under the Apache 2.0 license.