extraction_prompt_template = """
You are an expert at extracting structured information from technical documentation to build a knowledge graph.
Your task is to identify entities and their relationships based on the provided text chunk.
**Entities to Identify (and their properties):**
- **Document**: The overall document. Properties: `title` (from metadata).
- **Section**: Main sections (e.g., "1. Introduction", "2. Device Discovery Feature Enhancement"). Properties: `title`, `order`.
- **SubSection**: Subsections (e.g., "2.1 Automated Gateway Discovery"). Properties: `title`, `order`.
- **Item**: Specific elements within sections/subsections like tables, code blocks, or figures. Properties: `type` (e.g., "Table", "Figure", "CodeBlock"), `title` (if present), `content` (extract relevant text content if short).
- **Concept**: Key technical terms, features, or ideas (e.g., "Device Discovery", "DUAP API", "Gateway Discovery"). Properties: `name`.
- **Person**: Named individuals (e.g., "Mr. David Chen", "Dr. Evelyn Reed", "Sujay"). Properties: `name`.
- **Team**: Departments or named groups (e.g., "Engineering Team", "Bank Team", "Infrastructure Operations"). Properties: `name`.
- **Vendor**: Third-party companies (e.g., "TechSolutions Inc."). Properties: `name`.
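- **Project**: Named projects or initiatives (e.g., "Project Aurora", "Quantum Leap", "Project Zenith"). Properties: `name`.
- **Module**: Named modules or components of a project (referenced by the organizational relationships below). Properties: `name`.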
- **Platform**: Specific software/hardware platforms (e.g., "Core Services"). Properties: `name`.
- **Role**: Job titles or specific roles (e.g., "Chief Technology Officer", "Sponsor"). Properties: `title`.
**Relationship Types (all in CAPS, directional, with example properties if applicable):**
- **Hierarchical/Structural:**
  - `HAS_SECTION`: (Document)-[:HAS_SECTION {{order: 1}}]->(Section) - *Generate the `order` integer.*
  - `HAS_SUBSECTION`: (Section)-[:HAS_SUBSECTION {{order: 1}}]->(SubSection) - *Generate the `order` integer.*
  - `CONTAINS_ITEM`: (SubSection)-[:CONTAINS_ITEM {{type: "Table", title: "Example Title"}}]->(Item) - *Generate the `type` and `title` strings.*
  - `NEXT_SECTION`: (Section)-[:NEXT_SECTION]->(Section) (for the sequential flow of main sections)
  - `NEXT_SUBSECTION`: (SubSection)-[:NEXT_SUBSECTION]->(SubSection) (for the sequential flow of subsections)
- **Conceptual/Semantic:**
  - `DISCUSSES`: (Section/SubSection/Item)-[:DISCUSSES]->(Concept)
  - `IMPACTS`: (Concept)-[:IMPACTS]->(Concept)
  - `UTILIZES`: (Concept/Project)-[:UTILIZES]->(Concept/Platform)
- **Organizational/Responsibility:**
  - `LED_BY`: (Project/Team)-[:LED_BY]->(Person) OR (Project)-[:LED_BY]->(Team)
  - `REPORTS_TO`: (Person)-[:REPORTS_TO]->(Person)
  - `SPONSORS`: (Person)-[:SPONSORS]->(Project)
  - `INCLUDES_MODULE`: (Project)-[:INCLUDES_MODULE]->(Module)
  - `CRITICAL_FOR`: (Module)-[:CRITICAL_FOR]->(Project)
  - `DEVELOPED_BY`: (Module)-[:DEVELOPED_BY]->(Team/Vendor)
  - `PROVIDES_SUPPORT_FOR`: (Vendor)-[:PROVIDES_SUPPORT_FOR]->(Project)
  - `MAINTAINED_BY`: (Platform)-[:MAINTAINED_BY]->(Team)
  - `UNDER_DEVELOPMENT_BY`: (Module)-[:UNDER_DEVELOPMENT_BY]->(Team)
  - `WILL_INTEGRATE_WITH`: (Module)-[:WILL_INTEGRATE_WITH]->(Project) (for future plans)
**Output Format:**
Return a single JSON object with two keys: "nodes" and "relationships".
```json
{{
"nodes": [
{{"id": "unique_id_or_name", "label": "NodeLabel", "properties": {{"prop1": "value1", "prop2": "value2"}}}},
{{"id": "another_id", "label": "AnotherLabel", "properties": {{"prop_a": "value_a", "content": "extracted content"}}}}
],
"relationships": [
{{"source_id": "source_node_id", "target_id": "target_node_id", "type": "REL_TYPE", "properties": {{"order": 1}}}},
{{"source_id": "another_source_id", "target_id": "another_target_id", "type": "ANOTHER_REL_TYPE", "properties": {{}}}}
]
}}
```
"""
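The prompt above asks the model for a single JSON object with `nodes` and `relationships`, so the reply still has to be parsed and checked before it is loaded into a graph. The following is a minimal sketch of that step, not part of the original template: `parse_graph_response`, `ALLOWED_REL_TYPES`, and `FENCE_RE` are hypothetical names introduced here, and the sketch assumes the model may wrap its reply in a Markdown `json` fence. It also assumes the template is passed through `str.format` or a similar formatter, which is why the prompt doubles its literal braces.

```python
import json
import re

# Relationship types copied from the prompt's "Relationship Types" list.
ALLOWED_REL_TYPES = {
    "HAS_SECTION", "HAS_SUBSECTION", "CONTAINS_ITEM", "NEXT_SECTION",
    "NEXT_SUBSECTION", "DISCUSSES", "IMPACTS", "UTILIZES", "LED_BY",
    "REPORTS_TO", "SPONSORS", "INCLUDES_MODULE", "CRITICAL_FOR",
    "DEVELOPED_BY", "PROVIDES_SUPPORT_FOR", "MAINTAINED_BY",
    "UNDER_DEVELOPMENT_BY", "WILL_INTEGRATE_WITH",
}

# Matches a reply that is wrapped in an optional ```json ... ``` fence.
FENCE_RE = re.compile(r"^```(?:json)?\s*\n(.*?)\n?```\s*$", re.DOTALL)

# Doubled braces collapse to single braces once a template is formatted.
assert '{{"order": 1}}'.format() == '{"order": 1}'


def parse_graph_response(raw: str) -> dict:
    """Parse a model reply into {"nodes": [...], "relationships": [...]}.

    Hypothetical helper: strips an optional Markdown fence, then drops
    relationships with an unknown type or an endpoint that is not a node.
    """
    text = raw.strip()
    match = FENCE_RE.match(text)
    if match:
        text = match.group(1)
    data = json.loads(text)  # raises json.JSONDecodeError on malformed JSON

    nodes = data.get("nodes", [])
    node_ids = {node["id"] for node in nodes}
    relationships = [
        rel
        for rel in data.get("relationships", [])
        if rel.get("type") in ALLOWED_REL_TYPES
        and rel.get("source_id") in node_ids
        and rel.get("target_id") in node_ids
    ]
    return {"nodes": nodes, "relationships": relationships}


if __name__ == "__main__":
    # Made-up reply used only to exercise the helper.
    sample = (
        '```json\n'
        '{"nodes": [{"id": "doc", "label": "Document", "properties": {}},'
        ' {"id": "s1", "label": "Section", "properties": {"order": 1}}],'
        ' "relationships": [{"source_id": "doc", "target_id": "s1",'
        ' "type": "HAS_SECTION", "properties": {"order": 1}}]}\n'
        '```'
    )
    graph = parse_graph_response(sample)
    print(len(graph["nodes"]), "nodes,", len(graph["relationships"]), "relationships")
```

Silently dropping invalid relationships keeps the sketch short; a stricter pipeline might log them or re-prompt the model instead.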