Can Claude Perform Good Estimates? A First-Principles Accuracy Test

Key Findings

Concrete volumes (footings and piers) were within 2% on all five CY line items — essentially exact.
Structural steel element counts (columns, base plates) were perfect. Secondary framing members were underestimated by 50–96% on six of eight beam types.
Building envelope was systematically underestimated — Claude used plan footprint area instead of full 3D enclosure surface.
Some items were significantly overestimated: footing rebar (bar-grid vs bar-schedule), guardrail, handrail, and mineral wool.
Applied against industry unit rates, Claude's quantities produce a total bid of ~$443K vs the consultant's ~$907K — a 51% underestimate overall.

1. Approach

The experiment was designed as a controlled accuracy test: supply Claude with only the IFC drawing package and ask it to populate two blank quantity takeoff templates — no hints, no reference quantities, and no human guidance on what to count.

Source Materials

The project package comprised:

IFC Drawings Structural — 147 pages, 52 MB PDF. Structural plans, elevations, sections, and schedules for the Bridge Connection and Substation Addition.
IFC Drawings Architectural — 156 pages, 284 MB PDF. Architectural plans, elevations, and schedules including door schedules and wall panel layouts.
Drawing List (IFC).xlsx — Index of all drawing numbers and their PDF page locations.
Bridge Connection and Substation Addition empty takeoff templates — CSI-formatted spreadsheets with all line items defined but the quantity column left blank.

How Claude Worked — Cowork Mode

This session ran in Claude's Cowork mode, inside a sandboxed Ubuntu 22 Linux VM on the user's desktop. Claude had access to a Bash shell, file read/write, and web search. For this task it:

Read the drawing list Excel to build a page-number index
Used pdftotext to extract machine-readable text from specific structural pages (footing schedules, pier schedules, framing notes)
Used pdftoppm to render plan and elevation pages as 200 DPI PNG images for visual scale reading
Wrote a Python script using openpyxl to populate both takeoff templates while preserving all existing formulas and formatting
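The extraction steps above can be sketched as two command builders for the poppler tools named in the list. This is a minimal illustration, not the script Claude actually wrote: the function names, file names, and page numbers are hypothetical, and only the standard `-f`, `-l`, `-layout`, `-r`, and `-png` options are used.

```python
from pathlib import Path


def pdftotext_cmd(pdf: Path, page: int, out_txt: Path) -> list[str]:
    """Build a pdftotext call that extracts one page, keeping column layout."""
    return ["pdftotext", "-f", str(page), "-l", str(page),
            "-layout", str(pdf), str(out_txt)]


def pdftoppm_cmd(pdf: Path, page: int, out_prefix: Path, dpi: int = 200) -> list[str]:
    """Build a pdftoppm call that renders one page as a PNG at the given DPI."""
    return ["pdftoppm", "-f", str(page), "-l", str(page),
            "-r", str(dpi), "-png", str(pdf), str(out_prefix)]


# Hypothetical file name and page numbers, for illustration only.
structural = Path("structural.pdf")
print(pdftotext_cmd(structural, 12, Path("page_012.txt")))
print(pdftoppm_cmd(structural, 40, Path("page_040")))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine with poppler-utils installed. Extracting one page at a time keeps schedule text and rendered plans tied to the page index built from the drawing list.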

The entire workflow ran in a single coherent session with one context-length interruption that was cleanly resumed. Estimated API cost: ~$3 at Claude Sonnet 4.5 pricing.

2. Ground Truth — Third-Party Cost Estimator

After Claude completed both takeoffs, a third-party professional cost estimator reviewed the Bridge Connection scope and produced their own quantities from the same drawing package. Their results were entered alongside Claude's, creating a direct line-by-line comparison across 45 non-lump-sum line items.

3. Results

Accuracy Distribution

The charts below show how % errors are distributed across all 45 non-LS line items. Items within ±20% are considered essentially correct; errors beyond that range are meaningful misses.
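As a rough sketch of how each line item could be scored, the snippet below computes a signed % error against the consultant's quantity and assigns it to one of the three bands used in Figure 1. The error definition (Claude minus reference, divided by reference) is an assumption, since the write-up does not state the formula, and the function names are hypothetical.

```python
def percent_error(claude_qty: float, reference_qty: float) -> float:
    """Signed % error relative to the reference; negative means underestimate."""
    if reference_qty == 0:
        raise ValueError("reference quantity must be non-zero")
    return (claude_qty - reference_qty) / reference_qty * 100.0


def error_bucket(pct: float) -> str:
    """Map a signed % error to the three bands used in the breakdown chart."""
    magnitude = abs(pct)
    if magnitude <= 5.0:
        return "within 5%"
    if magnitude <= 20.0:
        return "5-20%"
    return "beyond 20%"


# Hypothetical quantities, for illustration only.
print(error_bucket(percent_error(98.0, 100.0)))
print(error_bucket(percent_error(50.0, 100.0)))
```

Under this definition an item counted at half the reference quantity scores −50%, and an item counted at twelve times the reference scores +1100%, so overestimates are unbounded while underestimates bottom out at −100%.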

Figure 1. Absolute % error breakdown across 45 line items, all units (donut chart): 8 items at ≤5% error, 2 items at 5–20%, and 35 items (78%) beyond ±20%.

Figure 2. Overall distribution of % error, all units (n=45; bar chart). The distribution is bimodal: concrete volumes cluster near zero, while secondary steel and envelope sit in the extreme tails.

Figure 3. % error distribution by unit of measure (five bar charts: CY, LF, SF, EA, and LBS). CY (concrete) is most accurate; LF and SF show the widest spread.

Cost Impact

Indicative mid-range industry rates (RSMeans / Canadian market, CAD) were applied to both sets of quantities, including a standard 30% markup (contingency, overhead & profit, tax, bond).
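The costing step amounts to quantity × unit rate × (1 + markup), summed over line items. A minimal sketch follows, assuming a flat 30% markup applied per line; the item names, rates, and quantities are hypothetical placeholders, not the rates used in the study.

```python
MARKUP = 0.30  # 30% markup stated in the text (contingency, O&P, tax, bond)


def line_cost(quantity: float, unit_rate: float, markup: float = MARKUP) -> float:
    """Cost of one line item after markup."""
    return quantity * unit_rate * (1.0 + markup)


def total_cost(quantities: dict[str, float], unit_rates: dict[str, float],
               markup: float = MARKUP) -> float:
    """Sum marked-up cost over all line items present in `quantities`."""
    return sum(line_cost(qty, unit_rates[item], markup)
               for item, qty in quantities.items())


def shortfall_pct(estimate: float, reference: float) -> float:
    """How far the estimate falls below the reference, as a % of the reference."""
    return (reference - estimate) / reference * 100.0


# Hypothetical rates and quantities, for illustration only.
rates = {"footing concrete (CY)": 300.0, "metal wall panels (SF)": 40.0}
claude = {"footing concrete (CY)": 20.0, "metal wall panels (SF)": 1000.0}
consultant = {"footing concrete (CY)": 20.0, "metal wall panels (SF)": 4000.0}
print(total_cost(claude, rates), total_cost(consultant, rates))

# Headline totals quoted in the text (~$443K vs ~$907K).
print(round(shortfall_pct(443_000, 907_000)))
```

Because the same rates and markup are applied to both quantity sets, any difference in the totals comes purely from the quantities, which is what the comparison is meant to isolate.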

Figure 4. Cost by CSI division, Claude vs. consultant. Divisions 05 (Metals) and 07 (Envelope) account for the entire gap; Divisions 01 and 03 are essentially tied.

Figure 5. Waterfall from Claude's baseline ($443K) to ground truth ($907K) through Div 03, Div 05, and Div 07 additions (+$403K gap).

Figure 6. Top 12 individual cost drivers of the $403K gap. Metal wall panels alone account for $212K, more than half the total miss.

4. Takeaways

Where AI Quantity Estimation Works Well

Concrete volumes from schedules. When a footing or pier schedule explicitly lists dimensions, Claude computes volumes with high precision — footings within 0.04%, piers within 2%.
Element counts from drawings. Structural column counts, base plate counts, and door assembly counts were exact.
Simple geometric items with clear drawing callouts. Engineered fill, pier waterproofing, and formwork areas were near-exact because they derived directly from schedule dimensions.
General conditions. Lump-sum items (permits, mobilization, supervision) default to a quantity of one each, so they are correct by construction.
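The schedule-driven concrete takeoff in the list above is plain arithmetic, which helps explain why it was so accurate. A minimal sketch, assuming rectangular footings dimensioned in feet; the dimensions and count are hypothetical, not taken from the project schedule.

```python
CUBIC_FEET_PER_CUBIC_YARD = 27.0


def rectangular_volume_cy(length_ft: float, width_ft: float,
                          depth_ft: float, count: int = 1) -> float:
    """Volume in cubic yards of `count` identical rectangular elements."""
    cubic_feet = length_ft * width_ft * depth_ft * count
    return cubic_feet / CUBIC_FEET_PER_CUBIC_YARD


# Hypothetical schedule row: four footings, 6 ft x 6 ft x 1.5 ft deep.
print(rectangular_volume_cy(6.0, 6.0, 1.5, count=4))
```

Every input comes straight from a schedule row, so there is no scale reading or scope interpretation involved, unlike the envelope and secondary steel items.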

Where AI Quantity Estimation Struggles Today

Secondary structural framing. Claude read primary members from framing elevations but missed secondary bays, intermediate beams, and connection hardware — six of eight beam types were 45–96% underestimated.
3D enclosure area vs plan area. Wall panels, spray fireproofing, and insulation were calculated from the plan footprint. The consultant measured from elevations showing the full 3D surface — typically 3–5× larger for an enclosed bridge structure.
Bar-schedule vs bar-grid rebar. Claude computed footing rebar as a full grid of bars at 12" spacing. Engineers use a fixed bar count from the schedule, which is optimised and much lighter — producing a 12× overestimate.
Specialty secondary items. Guardrail, handrail, and coping quantities depend on knowing exact scope boundaries. Without that context, Claude defaulted to full-perimeter counts — 3–12× overestimates.
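To illustrate the plan-versus-enclosure gap described above, the sketch below compares a footprint area with the surface of a simple box-shaped enclosed bridge. The geometry is an assumption made for illustration (two long walls plus roof and soffit, with the end walls omitted because the bridge abuts buildings at both ends), and the dimensions are hypothetical.

```python
def plan_area_sf(length_ft: float, width_ft: float) -> float:
    """Footprint area as read from a plan view."""
    return length_ft * width_ft


def enclosure_area_sf(length_ft: float, width_ft: float, height_ft: float,
                      include_soffit: bool = True) -> float:
    """Surface of a box-shaped enclosure: two long walls, roof, optional soffit."""
    walls = 2.0 * length_ft * height_ft
    roof = length_ft * width_ft
    soffit = length_ft * width_ft if include_soffit else 0.0
    return walls + roof + soffit


# Hypothetical bridge: 100 ft long, 12 ft wide, 14 ft high.
plan = plan_area_sf(100.0, 12.0)
enclosure = enclosure_area_sf(100.0, 12.0, 14.0)
print(plan, enclosure, enclosure / plan)
```

For a long, narrow structure the wall term dominates, so the ratio grows with the height-to-width ratio. That is why a footprint-based takeoff can miss most of the panel area.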

Economic Assessment

At roughly $3 in API costs, Claude produced a complete set of quantities for two construction scopes from a 303-page drawing package. Even with a 51% underestimate at the bid level, this has clear value:

As a first-pass sanity check. A 45-minute AI-generated takeoff that identifies all line items, gets most counts right, and flags the scopes needing careful measurement is a meaningful productivity multiplier.
As a starting point for human review. The most productive use is likely "Claude does the initial takeoff, a human reviews the secondary steel and envelope areas." That correction takes hours, not days.

Recommended Improvements

Explicit secondary framing pass. After reading primary framing elevations, explicitly search for secondary framing plans and count all members systematically.
Use elevation drawings for envelope area. For enclosed structures, area takeoffs should come from elevations, not plans.
Use bar schedule counts, not grid calculations. When a rebar schedule is present, use the listed bar count directly.
Scope boundary prompting. Ask whether items like guardrail should be counted on one or both sides. Ambiguity defaults should be conservative (lower quantity).
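The rebar recommendation above can be sketched as a simple rule: use the scheduled bar count when one is given, and fall back to a spacing-based grid only when it is not. This is an illustrative sketch assuming a square footing with bars running the full footing width each way (concrete cover ignored); the footing size and scheduled count are hypothetical, while the weights per foot are the standard nominal values for those bar sizes.

```python
import math
from typing import Optional

# Nominal weight of standard deformed bars, in pounds per linear foot.
LB_PER_FT = {"#4": 0.668, "#5": 1.043, "#6": 1.502}


def grid_bar_count(footing_width_ft: float, spacing_in: float) -> int:
    """Bars in one direction if laid as a full grid at the given spacing."""
    return math.floor(footing_width_ft * 12.0 / spacing_in) + 1


def footing_rebar_lbs(footing_width_ft: float, bar_size: str,
                      spacing_in: float = 12.0,
                      schedule_count: Optional[int] = None) -> float:
    """Rebar weight for a square footing with the same bars each way.

    Prefers the scheduled bar count; falls back to a grid calculation.
    """
    if schedule_count is not None:
        bars_each_way = schedule_count
    else:
        bars_each_way = grid_bar_count(footing_width_ft, spacing_in)
    total_length_ft = 2 * bars_each_way * footing_width_ft
    return total_length_ft * LB_PER_FT[bar_size]


# Hypothetical 8 ft square footing with #5 bars.
print(footing_rebar_lbs(8.0, "#5"))                    # grid fallback
print(footing_rebar_lbs(8.0, "#5", schedule_count=6))  # scheduled count
```

Keeping the schedule count as an explicit, optional input makes the fallback visible, so a reviewer can see which footings were estimated from spacing rather than from the schedule.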

By Luigi La Corte
