Saturday, June 21, 2025

extraction_prompt_template = """

You are an expert at extracting structured information from technical documentation to build a knowledge graph.

Your task is to identify entities and their relationships based on the provided text chunk.


**Entities to Identify (and their properties):**

- **Document**: The overall document. Properties: `title` (from metadata).

- **Section**: Main sections (e.g., "1. Introduction", "2. Device Discovery Feature Enhancement"). Properties: `title`, `order`.

- **SubSection**: Subsections (e.g., "2.1 Automated Gateway Discovery"). Properties: `title`, `order`.

- **Item**: Specific elements within sections/subsections like tables, code blocks, or figures. Properties: `type` (e.g., "Table", "Figure", "CodeBlock"), `title` (if present), `content` (extract relevant text content if short).

- **Concept**: Key technical terms, features, or ideas (e.g., "Device Discovery", "DUAP API", "Gateway Discovery"). Properties: `name`.

- **Person**: Named individuals (e.g., "Mr. David Chen", "Dr. Evelyn Reed", "Sujay"). Properties: `name`.

- **Team**: Departments or named groups (e.g., "Engineering Team", "Bank Team", "Infrastructure Operations"). Properties: `name`.

- **Vendor**: Third-party companies (e.g., "TechSolutions Inc."). Properties: `name`.

- **Project**: Named projects or initiatives (e.g., "Project Aurora", "Quantum Leap", "Project Zenith"). Properties: `name`.

- **Platform**: Specific software/hardware platforms (e.g., "Core Services"). Properties: `name`.

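- **Role**: Job titles or specific roles (e.g., "Chief Technology Officer", "Sponsor"). Properties: `title`.

- **Module**: Named components or modules within a project (referenced by the relationship types below). Properties: `name`.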


**Relationship Types (all in CAPS, directional, with example properties if applicable):**

- **Hierarchical/Structural:**

    - `HAS_SECTION`: (Document)-[:HAS_SECTION {{order: 1}}]->(Section) - *Generate the `order` integer from the section's position in the document.*

    - `HAS_SUBSECTION`: (Section)-[:HAS_SUBSECTION {{order: 1}}]->(SubSection) - *Generate the `order` integer from the subsection's position in its section.*

    - `CONTAINS_ITEM`: (Section/SubSection)-[:CONTAINS_ITEM {{type: "Table", title: "Example Title"}}]->(Item) - *Generate the `type` and `title` strings from the item itself.*

    - `NEXT_SECTION`: (Section)-[:NEXT_SECTION]->(Section) (for sequential flow of main sections)

    - `NEXT_SUBSECTION`: (SubSection)-[:NEXT_SUBSECTION]->(SubSection)

- **Conceptual/Semantic:**

    - `DISCUSSES`: (Section/SubSection/Item)-[:DISCUSSES]->(Concept)

    - `IMPACTS`: (Concept)-[:IMPACTS]->(Concept)

    - `UTILIZES`: (Concept/Project)-[:UTILIZES]->(Concept/Platform)

- **Organizational/Responsibility:**

    - `LED_BY`: (Project/Team)-[:LED_BY]->(Person) OR (Project)-[:LED_BY]->(Team)

    - `REPORTS_TO`: (Person)-[:REPORTS_TO]->(Person)

    - `SPONSORS`: (Person)-[:SPONSORS]->(Project)

    - `INCLUDES_MODULE`: (Project)-[:INCLUDES_MODULE]->(Module)

    - `CRITICAL_FOR`: (Module)-[:CRITICAL_FOR]->(Project)

    - `DEVELOPED_BY`: (Module)-[:DEVELOPED_BY]->(Team/Vendor)

    - `PROVIDES_SUPPORT_FOR`: (Vendor)-[:PROVIDES_SUPPORT_FOR]->(Project)

    - `MAINTAINED_BY`: (Platform)-[:MAINTAINED_BY]->(Team)

    - `UNDER_DEVELOPMENT_BY`: (Module)-[:UNDER_DEVELOPMENT_BY]->(Team)

    - `WILL_INTEGRATE_WITH`: (Module)-[:WILL_INTEGRATE_WITH]->(Project) (for future plans)


**Output Format:**

Return a single JSON object with two keys: "nodes" and "relationships".


```json

{{

  "nodes": [

    {{"id": "unique_id_or_name", "label": "NodeLabel", "properties": {{"prop1": "value1", "prop2": "value2"}}}},

    {{"id": "another_id", "label": "AnotherLabel", "properties": {{"prop_a": "value_a", "content": "extracted content"}}}}

  ],

  "relationships": [

    {{"source_id": "source_node_id", "target_id": "target_node_id", "type": "REL_TYPE", "properties": {{"order": 1}}}},

    {{"source_id": "another_source_id", "target_id": "another_target_id", "type": "ANOTHER_REL_TYPE", "properties": {{}}}}

  ]

}}

```
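
Because the prompt asks for a fixed JSON shape, the model's reply can be checked before anything is written to the graph. The following is a minimal sketch, not part of the original post: `validate_graph`, `ALLOWED_LABELS`, and `REL_SCHEMA` are hypothetical names, and only a handful of the relationship signatures listed in the prompt are encoded.

```python
import json

# Node labels named in the prompt, plus Module, which the relationship list references.
ALLOWED_LABELS = {
    "Document", "Section", "SubSection", "Item", "Concept", "Person",
    "Team", "Vendor", "Project", "Platform", "Role", "Module",
}

# A few relationship signatures taken from the prompt:
# relationship type -> (allowed source labels, allowed target labels).
REL_SCHEMA = {
    "HAS_SECTION": ({"Document"}, {"Section"}),
    "HAS_SUBSECTION": ({"Section"}, {"SubSection"}),
    "DISCUSSES": ({"Section", "SubSection", "Item"}, {"Concept"}),
    "SPONSORS": ({"Person"}, {"Project"}),
    "MAINTAINED_BY": ({"Platform"}, {"Team"}),
}


def validate_graph(raw: str) -> list[str]:
    """Return a list of problems found in the reply; an empty list means it passed."""
    data = json.loads(raw)
    problems = []
    labels = {}  # node id -> node label

    for node in data.get("nodes", []):
        label = node.get("label")
        if label not in ALLOWED_LABELS:
            problems.append(f"unknown label: {label!r}")
        labels[node.get("id")] = label

    for rel in data.get("relationships", []):
        rel_type = rel.get("type")
        src = labels.get(rel.get("source_id"))
        tgt = labels.get(rel.get("target_id"))
        if src is None or tgt is None:
            problems.append(f"{rel_type}: endpoint not found among nodes")
            continue
        expected = REL_SCHEMA.get(rel_type)
        # Relationship types not encoded in REL_SCHEMA are passed through unchecked.
        if expected and (src not in expected[0] or tgt not in expected[1]):
            problems.append(f"{rel_type}: {src} -> {tgt} not allowed")

    return problems
```

A reply whose `HAS_SECTION` edge points from a Section to a Document, for instance, would come back with one problem string instead of silently creating a reversed edge.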


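
The doubled braces (`{{` and `}}`) throughout the template suggest it is filled in with `str.format` or a format-style prompt class, where a doubled brace renders as a single literal brace. The sketch below illustrates that behaviour on a shortened stand-in for the template. The placeholder name `text_chunk` is an assumption, since this excerpt ends before any placeholder appears.

```python
# Shortened stand-in for the template above. `text_chunk` is a hypothetical
# placeholder name; the original's placeholder is not visible in this excerpt.
template = """Return a single JSON object with two keys: "nodes" and "relationships".
{{"nodes": [], "relationships": [{{"properties": {{"order": 1}}}}]}}

Text chunk:
{text_chunk}
"""

# str.format turns each {{ or }} into a single literal brace and
# substitutes the named placeholder with the supplied text.
prompt = template.format(text_chunk="2.1 Automated Gateway Discovery")
print(prompt)
```

If the braces in the JSON example were left single, `str.format` would try to treat them as replacement fields and raise an error instead of producing the prompt.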