Hi @DW-21032103-0,
First, according to Microsoft's documentation, the tax models use OCR plus layout analysis to extract structured fields, but not every field is guaranteed to produce a numeric value, especially across different tax years and IRS form revisions.
Second, a field can be recognized structurally (the box is detected on the form) while the OCR/field extraction still fails to capture a usable value.
Third (related to the first point), if the IRS form is scanned or flattened, numeric fields may not be extracted properly.
Fourth, when we analyze a document with Azure Document Intelligence, it's always good to verify what it has actually extracted. Running OCR preprocessing (e.g., Azure Cognitive Services OCR) before feeding the document into Document Intelligence may improve results. Alternatively, validating the 1040 model output against a schema (see the final option below) at least acts as a safeguard that something was returned.
Fifth, before moving on to the third step you described, it would help to look at what Azure Document Intelligence actually returned. The analysis can be viewed either in the Document Intelligence Studio interface or programmatically via the REST API/SDKs, where it is returned as a structured JSON object. For batch operations, the results are written to an Azure Blob Storage container.
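If you save that JSON locally (for example, copied from the Studio's result view or from the REST response) and load it with json.load, a small script can list every field in the first document along with whether it carries any value. This is a minimal sketch that assumes you pass in the "analyzeResult" object; the function name is hypothetical.

```python
from typing import Any, Optional


def summarize_fields(
    analyze_result: dict[str, Any],
) -> list[tuple[str, str, bool, Optional[float]]]:
    """Return (field name, field type, has a value, confidence) for the first document."""
    documents = analyze_result.get("documents") or []
    if not documents:
        return []
    fields = documents[0].get("fields") or {}
    rows = []
    for name, field in sorted(fields.items()):
        field = field or {}
        # A field "has a value" if any value* property (valueNumber, valueString, ...)
        # is populated, or if at least the raw OCR content is present.
        has_value = any(
            key.startswith("value") and field[key] is not None for key in field
        ) or bool(field.get("content"))
        rows.append((name, field.get("type", "?"), has_value, field.get("confidence")))
    return rows
```

Running it over your real output, e.g. for row in summarize_fields(result["analyzeResult"]): print(row), quickly shows which boxes were detected but came back empty.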
Let's start with two steps:
Step-1: Add a fallback to valueString or content, so that we still capture numbers even when they are returned as strings:
coalesce(
  body('Parse_JSON')?['analyzeResult']?['documents']?[0]?['fields']?['Box11']?['valueNumber'],
  body('Parse_JSON')?['analyzeResult']?['documents']?[0]?['fields']?['Box11']?['valueString'],
  body('Parse_JSON')?['analyzeResult']?['documents']?[0]?['fields']?['Box11']?['content'],
  0
)
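To test the same fallback outside the flow against a saved JSON, here is a Python sketch of the logic: valueNumber first, then valueString, then content, then a default of 0. One deliberate difference: coalesce() returns the first non-null value even when it is a non-numeric string (e.g. "52,000."), while this sketch strips "$", "," and other non-numeric characters and skips values it cannot parse. It assumes US number formatting; the function name is hypothetical.

```python
import re
from typing import Any


def field_number(fields: dict[str, Any], name: str, default: float = 0.0) -> float:
    """Fallback chain: valueNumber, then valueString, then content, then default."""
    field = fields.get(name) or {}
    if isinstance(field.get("valueNumber"), (int, float)):
        return float(field["valueNumber"])
    for key in ("valueString", "content"):
        text = field.get(key)
        if not text:
            continue
        # Keep only digits, the decimal point and a minus sign, e.g. "$52,000." -> "52000."
        cleaned = re.sub(r"[^0-9.\-]", "", text)
        try:
            return float(cleaned)
        except ValueError:
            continue  # not numeric (e.g. "N/A"), try the next source
    return default
```

In the flow itself, the equivalent of the cleaning step would be wrapping the string result in float() after removing those characters, before writing it to a Number column.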
Step-2: Store the JSON for auditing. It's better to keep the full JSON output in a SharePoint column (e.g., "RawJSONExtraction"), so we can troubleshoot later when values don't appear.
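As an illustration of what such an item could hold, here is a sketch that combines the extracted values with the serialized raw JSON. The row layout is hypothetical; only the "RawJSONExtraction" column name comes from the step above. In Power Automate the serialized text would typically come from string(body('Parse_JSON')), and the column should be "Multiple lines of text", since a single line of text column is limited to 255 characters.

```python
import json
from typing import Any


def build_audit_row(analyze_result: dict[str, Any], extracted: dict[str, float]) -> dict[str, Any]:
    """Hypothetical SharePoint item payload: extracted values plus the raw JSON."""
    row: dict[str, Any] = dict(extracted)  # copy so the caller's dict is not modified
    # Compact separators keep the stored text smaller; sort_keys makes it stable to diff.
    row["RawJSONExtraction"] = json.dumps(analyze_result, separators=(",", ":"), sort_keys=True)
    return row
```

Because the raw text round-trips through json.loads, a later troubleshooting script can reload exactly what the model returned for that document.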
Finally, I am not sure whether you have employed a JSON schema for validating the 1040 model output; if not, I suggest doing so. In the Parse JSON action, paste the following schema (you can tune it for your required fields):
{
  "type": "object",
  "properties": {
    "analyzeResult": {
      "type": "object",
      "properties": {
        "documents": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "fields": {
                "type": "object",
                "properties": {
                  "TaxpayerName": {
                    "type": "object",
                    "properties": {
                      "type": { "type": "string" },
                      "content": { "type": "string" },
                      "confidence": { "type": "number" }
                    }
                  },
                  "SSN": {
                    "type": "object",
                    "properties": {
                      "type": { "type": "string" },
                      "content": { "type": "string" },
                      "confidence": { "type": "number" }
                    }
                  },
                  "Wages": {
                    "type": "object",
                    "properties": {
                      "type": { "type": "string" },
                      "valueNumber": { "type": "number" },
                      "content": { "type": "string" },
                      "confidence": { "type": "number" }
                    }
                  },
                  "AGI": {
                    "type": "object",
                    "properties": {
                      "type": { "type": "string" },
                      "valueNumber": { "type": "number" },
                      "content": { "type": "string" },
                      "confidence": { "type": "number" }
                    }
                  },
                  "RefundAmount": {
                    "type": "object",
                    "properties": {
                      "type": { "type": "string" },
                      "valueNumber": { "type": "number" },
                      "content": { "type": "string" },
                      "confidence": { "type": "number" }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
Feed the output of the Analyze Document action (1040 model) into Parse JSON. We can then reference fields directly, e.g.:
body('Parse_JSON')?['analyzeResult']?['documents']?[0]?['fields']?['Wages']?['valueNumber']
body('Parse_JSON')?['analyzeResult']?['documents']?[0]?['fields']?['TaxpayerName']?['content']
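The schema covers a few common fields (TaxpayerName, SSN, Wages, AGI and RefundAmount); you can expand it depending on which boxes you want to capture. The property names under "fields" must match the keys the model actually returns (for example, Box11 as used in Step-1), so check them against your own JSON output first. Also note that the schema declares no "required" properties, so Parse JSON will not fail when a field is missing entirely; add a "required" array if you want the flow to stop in that case.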
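To confirm that the fields you expect actually came back with a value, here is a minimal stdlib-only sketch that reports every expected field that is absent or has neither a value* property nor content. The function name and the expected list are illustrative; pass the field names your model really returns.

```python
from typing import Any


def fields_without_value(analyze_result: dict[str, Any], expected: list[str]) -> list[str]:
    """Return expected field names that are missing or empty in the first document."""
    documents = analyze_result.get("documents") or []
    fields = (documents[0].get("fields") or {}) if documents else {}
    problems = []
    for name in expected:
        field = fields.get(name) or {}
        # Any populated valueNumber/valueString/... counts as a value.
        has_value = any(k.startswith("value") and v is not None for k, v in field.items())
        if not has_value and not field.get("content"):
            problems.append(name)
    return problems
```

For example, fields_without_value(result["analyzeResult"], ["Wages", "AGI", "RefundAmount"]) returns the names to investigate; an empty list means every expected field produced something.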
I hope these clues help. If they resolve the issue that brought you here, please don't forget to check the box "Does this answer your question?", and if you liked the response, please give it an upvote!