Hi everyone,
I'm working on a project using Microsoft Copilot Studio to build an agent that extracts technical components and their properties from Word documents and transfers them into an Excel template.
The goal is to identify physical, electrical, and media-related specifications (e.g. pressure, voltage, medium type) and structure them correctly in the Excel sheet.
However, I'm facing a major issue:
The extracted data is often completely incorrect or missing.
Components are misidentified or skipped.
Key values are either wrong or not captured at all.
The Excel output is poorly structured and doesn't match the intended format.
Has anyone experienced similar problems or found ways to improve extraction accuracy? I'm open to suggestions on how to refine the logic, improve document parsing or use better tools.
Thanks in advance!
(This is my instruction:
Extract all relevant technical components from offer documents or similar technical documentation. For each component, identify and extract its physical, electrical, and media-related properties.
The extracted data should be entered row by row into the Excel template named "Component_Utility_Matrix_Template.xlsx", following the structure and level of detail defined in the reference template "Reference_Utility_Matrix_Sample.xlsx".
Approach and Rules
Document Analysis
- Process continuous text, tables, and lists flexibly and completely.
- Analyze the entire document for relevant technical information.
Component Identification
- Identify all described technical components, regardless of whether they are typical for a test bench.
- Use keywords such as “installed,” “consists of,” “includes,” “system,” “module,” “cabinet,” “unit,” “component” as indicators.
- Always use the actual names from the document, not generic labels like “Component 1.”
Property Extraction
For each component, extract the following properties using the original units and labels:
- Weight [kg]
- Dimensions [mm]: Depth, Width, Height
- Connection voltage
- Power consumption [kW]
- Cold water 6/12 [kW]
- Cooling water 30/50 [kW]
- DI water [l/h]
- Compressed air [l/min]
- Waste heat to ambient [kW]
Also consider alternative and indirect formulations, such as:
- “Weight: approx. 250 kg”, “Mass: 6940 kg”
- “Dimensions: 800 x 1200 x 2100 mm” (order: Depth/Width/Height)
- “Requires compressed air: 120 l/min”, “Water demand: 150 l/h”
- “Cooling capacity: 45 kW at 6/12°C”
- “Voltage: 400 VAC”, “Power: 2 x 83 kW”
- “Waste heat: max. 12 kW”
Properties must be assigned to the correct component even if they are not located directly next to it in the document.
Table and List Processing
- Analyze all tables and lists and assign values to the correct components.
- Also extract values from continuous text if they match the required properties.
Excel Column Mapping
Use the following fixed column structure in the Excel file:
| Column | Content |
| A | Description (Component Name) |
| B | Weight [kg] |
| C | Depth [mm] |
| D | Width [mm] |
| E | Height [mm] |
| F | Connection Voltage |
| G | Power Consumption [kW] |
| H | Cold Water 6/12 [kW] |
| I | Cooling Water 30/50 [kW] |
| J | DI Water [l/h] |
| K | Compressed Air [l/min] |
| L | Waste Heat to Ambient [kW] |
Missing or Unclear Values
- If values are missing or unclear, enter “?” or “–” in the respective column.
Options and Alternatives
- Enter options or alternatives either as a separate row or with a note in the “Special Notes” column.
Utility Matrix Structure
- Each component is entered as a separate row.
- Properties are filled into the corresponding columns.
Reference and Consistency
- Follow the structure, level of detail, and naming logic of the reference template.
- Use consistent naming for identical components.
Quality Assurance
- Ensure completeness and consistency: all components and properties must be included and uniformly labeled.
- Pay special attention to media consumption and physical values (cold water, cooling water, DI water, compressed air, waste heat, dimensions).)
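For illustration, the property rules and the fixed column order from the instruction can also be written as a small deterministic script, which makes it easier to see where the agent's output deviates. This is only a minimal sketch in plain Python, not something Copilot Studio provides: the regex patterns, the `extract_row` helper, and the sample component name are hypothetical and cover only the example phrasings listed in the instruction. "Water demand: 150 l/h" is deliberately left out, because the text does not say which water column it belongs to.

```python
import re

# Fixed column order A to L, taken from the instruction's column mapping.
COLUMNS = [
    "Description", "Weight [kg]", "Depth [mm]", "Width [mm]", "Height [mm]",
    "Connection Voltage", "Power Consumption [kW]", "Cold Water 6/12 [kW]",
    "Cooling Water 30/50 [kW]", "DI Water [l/h]", "Compressed Air [l/min]",
    "Waste Heat to Ambient [kW]",
]

NUM = r"\d+(?:[.,]\d+)?"  # integer or decimal number

# Hypothetical patterns (assumption): they only cover the example
# phrasings from the instruction, not arbitrary documents.
PATTERNS = {
    "Weight [kg]": re.compile(
        rf"(?:weight|mass)\s*:?\s*(?:approx\.\s*)?({NUM})\s*kg", re.I),
    "Connection Voltage": re.compile(
        r"voltage\s*:?\s*(\d+\s*V(?:AC|DC)?)", re.I),
    # Keeps "2 x 83" as written instead of multiplying it out.
    "Power Consumption [kW]": re.compile(
        rf"power(?: consumption)?\s*:?\s*((?:\d+\s*x\s*)?{NUM})\s*kW", re.I),
    "Cold Water 6/12 [kW]": re.compile(
        rf"({NUM})\s*kW\s*at\s*6/12", re.I),
    "Compressed Air [l/min]": re.compile(
        rf"compressed air\s*:?\s*({NUM})\s*l/min", re.I),
    "Waste Heat to Ambient [kW]": re.compile(
        rf"waste heat\s*:?\s*(?:max\.\s*)?({NUM})\s*kW", re.I),
}

# Dimensions are given as Depth x Width x Height, per the instruction.
DIMENSIONS = re.compile(
    rf"dimensions\s*:?\s*({NUM})\s*x\s*({NUM})\s*x\s*({NUM})\s*mm", re.I)


def extract_row(name, text):
    """Return one row (columns A to L) for a component; "?" marks missing values."""
    values = {"Description": name}
    for column, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            values[column] = match.group(1)
    match = DIMENSIONS.search(text)
    if match:
        depth, width, height = match.groups()
        values["Depth [mm]"] = depth
        values["Width [mm]"] = width
        values["Height [mm]"] = height
    return [values.get(column, "?") for column in COLUMNS]


# Sample text assembled from the example phrasings in the instruction.
sample = (
    "Weight: approx. 250 kg. Dimensions: 800 x 1200 x 2100 mm. "
    "Voltage: 400 VAC, Power: 2 x 83 kW. "
    "Requires compressed air: 120 l/min. Waste heat: max. 12 kW."
)
row = extract_row("Test cabinet", sample)
print(row)
```

Each returned list maps directly to columns A to L, so it could be written into the template with any spreadsheet library. Comparing rows like this against the agent's output may help narrow down whether the problem lies in component identification, value parsing, or the Excel mapping.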