Case Study

Can Claude Perform Good Estimates?

A first-principles accuracy test: Claude reads IFC construction drawings and generates two complete quantity takeoffs — then we compare line-by-line against a professional estimator's ground truth.

By Luigi La Corte · March 6, 2026

Key Findings

  • Concrete volumes (footings and piers) were within 2% on all five CY line items — essentially exact.
  • Structural steel element counts (columns, base plates) were perfect. Secondary framing members were underestimated by 50–96% on six of eight beam types.
  • Building envelope was systematically underestimated — Claude used plan footprint area instead of full 3D enclosure surface.
  • Some items were significantly overestimated: footing rebar (bar-grid vs bar-schedule), guardrail, handrail, and mineral wool.
  • Applied against industry unit rates, Claude's quantities produce a total bid of ~$443K vs the consultant's ~$907K — a 51% underestimate overall.

1. Approach

The experiment was designed as a controlled accuracy test: supply Claude with only the IFC drawing package and ask it to populate two blank quantity takeoff templates — no hints, no reference quantities, and no human guidance on what to count.

Source Materials

The project package comprised:

  • IFC Drawings Structural — 147 pages, 52 MB PDF. Structural plans, elevations, sections, and schedules for the Bridge Connection and Substation Addition.
  • IFC Drawings Architectural — 156 pages, 284 MB PDF. Architectural plans, elevations, and schedules including door schedules and wall panel layouts.
  • Drawing List (IFC).xlsx — Index of all drawing numbers and their PDF page locations.
  • Bridge Connection and Substation Addition empty takeoff templates — CSI-formatted spreadsheets with all line items defined but the quantity column left blank.

How Claude Worked — Cowork Mode

This session ran in Claude's Cowork mode, inside a sandboxed Ubuntu 22 Linux VM on the user's desktop. Claude had access to a Bash shell, file read/write, and web search. For this task it:

  • Read the drawing list Excel to build a page-number index
  • Used pdftotext to extract machine-readable text from specific structural pages (footing schedules, pier schedules, framing notes)
  • Used pdftoppm to render plan and elevation pages as 200 DPI PNG images for visual scale reading
  • Wrote a Python script using openpyxl to populate both takeoff templates while preserving all existing formulas and formatting
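The extraction steps above can be sketched roughly as follows. This is a minimal illustration, not the script Claude actually wrote: the file names, page number, and helper names (`pdftotext_cmd`, `pdftoppm_cmd`) are hypothetical, and the commands are only built here, not executed. The flags used (`-f`, `-l`, `-layout`, `-r`, `-png`) are standard poppler-utils options.

```python
from pathlib import Path


def pdftotext_cmd(pdf: Path, page: int, out_txt: Path) -> list[str]:
    """Build a command that extracts the text layer of a single page."""
    return ["pdftotext", "-f", str(page), "-l", str(page),
            "-layout", str(pdf), str(out_txt)]


def pdftoppm_cmd(pdf: Path, page: int, out_prefix: Path, dpi: int = 200) -> list[str]:
    """Build a command that renders a single page to PNG at the given DPI."""
    return ["pdftoppm", "-f", str(page), "-l", str(page),
            "-r", str(dpi), "-png", str(pdf), str(out_prefix)]


# Hypothetical file name and page number, for illustration only.
structural = Path("structural.pdf")
text_cmd = pdftotext_cmd(structural, 12, Path("page_012.txt"))
image_cmd = pdftoppm_cmd(structural, 12, Path("page_012"))

# To run for real (requires poppler-utils on PATH):
#   import subprocess
#   subprocess.run(text_cmd, check=True)
#   subprocess.run(image_cmd, check=True)
```

The template-writing step is not shown. With openpyxl it would typically load the workbook with `load_workbook`, assign values to the quantity cells, and save, which leaves existing formulas in place as long as the workbook is not opened with `data_only=True`.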

The entire workflow ran in a single coherent session with one context-length interruption that was cleanly resumed. Estimated API cost: ~$3 at Claude Sonnet 4.5 pricing.

2. Ground Truth — Third-Party Cost Estimator

After Claude completed both takeoffs, a third-party professional cost estimator reviewed the Bridge Connection scope and produced their own quantities from the same drawing package. Their results were entered alongside Claude's, creating a direct line-by-line comparison across 45 non-lump-sum line items.

3. Results

Accuracy Distribution

The charts below show how percentage errors are distributed across all 45 non-lump-sum (non-LS) line items. Items within ±20% are considered essentially correct; anything beyond that is a meaningful miss.

Figure 1. Absolute % error breakdown across 45 line items, all units (donut chart): 8 items at ≤5% error, 2 items at 5–20%, and 35 items (78%) beyond ±20%.
Figure 2. Overall distribution of % error, all units (n=45, bar chart). The distribution is bimodal: concrete volumes cluster near zero; secondary steel and envelope sit in the extreme tails.
Figure 3. % error distribution by unit of measure (five bar charts: CY, LF, SF, EA, LBS). CY (concrete) is most accurate; LF and SF show the widest spread.
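The error bands used in these charts can be expressed as a short sketch. The function names and the sample rows are hypothetical; only the band thresholds (5% and 20%) and the 35-of-45 count come from the results above.

```python
def percent_error(estimate: float, truth: float) -> float:
    """Signed % error of an estimate relative to the ground-truth quantity."""
    return (estimate - truth) / truth * 100.0


def error_band(pct: float) -> str:
    """Bucket a signed % error by its absolute size."""
    size = abs(pct)
    if size <= 5:
        return "<=5%"
    if size <= 20:
        return "5-20%"
    return ">20%"


# Hypothetical (estimate, truth) pairs, for illustration only.
sample = {
    "Item A": (101.0, 100.0),  # slight overestimate
    "Item B": (88.0, 100.0),   # moderate underestimate
    "Item C": (4.0, 100.0),    # severe underestimate
}
bands = {name: error_band(percent_error(est, truth))
         for name, (est, truth) in sample.items()}

# Share of the 45 line items that fell outside the 20% band.
share_over_20 = round(35 / 45 * 100)
```

A signed error is kept until the last step so that over- and underestimates remain distinguishable; only the banding uses the absolute value.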

Cost Impact

Indicative mid-range industry rates (RSMeans / Canadian market, CAD) were applied to both sets of quantities, including a standard 30% markup (contingency, overhead & profit, tax, bond).
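As a rough sketch of that rollup, each line item is quantity times unit rate, scaled by the 30% markup, and the headline miss is the shortfall relative to the consultant's total. The line-item quantity and rate below are hypothetical; the two totals are the rounded figures quoted in this case study.

```python
MARKUP = 0.30  # contingency, overhead & profit, tax, bond (combined)


def marked_up_cost(quantity: float, unit_rate: float, markup: float = MARKUP) -> float:
    """Cost of one line item after applying the combined markup."""
    return quantity * unit_rate * (1 + markup)


def underestimate_pct(estimate_total: float, truth_total: float) -> float:
    """How far the estimate falls short of the ground truth, in percent."""
    return (truth_total - estimate_total) / truth_total * 100.0


# Hypothetical line item: 50 CY at $400/CY (not a rate from the study).
line_cost = marked_up_cost(50, 400.0)

# Rounded bid totals reported in this case study.
overall = underestimate_pct(443_000, 907_000)
```

With the rounded totals, `overall` comes out to roughly 51%, matching the headline figure.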

Figure 4. Cost by CSI division, Claude vs. consultant. Divisions 05 (Metals) and 07 (Envelope) account for the entire gap; Divisions 01 and 03 are essentially tied.
Figure 5. Waterfall from Claude baseline ($443K) to ground truth ($907K) through Div 03, Div 05, and Div 07 additions (chart title: +$403K gap).
Figure 6. Top 12 individual cost drivers of the $403K gap. Metal wall panels alone account for $212K, more than half the total miss.

4. Takeaways

Where AI Quantity Estimation Works Well

  • Concrete volumes from schedules. When a footing or pier schedule explicitly lists dimensions, Claude computes volumes with high precision — footings within 0.04%, piers within 2%.
  • Element counts from drawings. Structural column counts, base plate counts, and door assembly counts were exact.
  • Simple geometric items with clear drawing callouts. Engineered fill, pier waterproofing, and formwork areas were near-exact because they derived directly from schedule dimensions.
  • General conditions. Lump-sum items (permits, mobilization, supervision) default to 1-each and are always correct.
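The schedule-driven concrete calculation is simple enough to sketch. The schedule rows below are hypothetical and are not taken from the project drawings; the only fixed fact is the conversion of 27 cubic feet per cubic yard.

```python
CUBIC_FEET_PER_CUBIC_YARD = 27.0


def footing_volume_cy(length_ft: float, width_ft: float,
                      depth_ft: float, count: int = 1) -> float:
    """Volume in cubic yards for `count` identical rectangular footings."""
    cubic_feet = length_ft * width_ft * depth_ft * count
    return cubic_feet / CUBIC_FEET_PER_CUBIC_YARD


# Hypothetical schedule rows: (mark, length, width, depth, count).
schedule = [
    ("F1", 6.0, 6.0, 1.5, 4),  # 54 cu ft each, 4 footings
    ("F2", 3.0, 3.0, 1.5, 2),  # 13.5 cu ft each, 2 footings
]

total_cy = sum(footing_volume_cy(l, w, d, n) for _, l, w, d, n in schedule)
```

Because every input is read straight from a schedule row, there is no scale reading or scope judgment involved, which is consistent with these items showing the smallest errors.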

Where AI Quantity Estimation Struggles Today

  • Secondary structural framing. Claude read primary members from framing elevations but missed secondary bays, intermediate beams, and connection hardware — six of eight beam types were 45–96% underestimated.
  • 3D enclosure area vs plan area. Wall panels, spray fireproofing, and insulation were calculated from the plan footprint. The consultant measured from elevations showing the full 3D surface — typically 3–5× larger for an enclosed bridge structure.
  • Bar-schedule vs bar-grid rebar. Claude computed footing rebar as a full grid of bars at 12" spacing. Engineers use a fixed bar count from the schedule, which is optimised and much lighter — producing a 12× overestimate.
  • Specialty secondary items. Guardrail, handrail, and coping quantities depend on knowing exact scope boundaries. Without that context, Claude defaulted to full-perimeter counts — 3–12× overestimates.
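To illustrate the plan-area versus enclosure-area gap, here is a simplified sketch for a rectangular enclosed bridge. The dimensions are hypothetical, and the surface model (two long walls plus roof plus soffit, with open ends where the bridge meets the buildings) is an assumption for illustration, not the consultant's measurement method.

```python
def plan_area_sf(length_ft: float, width_ft: float) -> float:
    """Footprint area as read from a plan view."""
    return length_ft * width_ft


def enclosure_area_sf(length_ft: float, width_ft: float, height_ft: float) -> float:
    """Simplified 3D surface: two long walls, roof, and soffit (ends open)."""
    walls = 2 * length_ft * height_ft
    roof = length_ft * width_ft
    soffit = length_ft * width_ft
    return walls + roof + soffit


# Hypothetical bridge: 100 ft long, 10 ft wide, 12 ft tall.
plan = plan_area_sf(100.0, 10.0)
enclosure = enclosure_area_sf(100.0, 10.0, 12.0)
ratio = enclosure / plan
```

For these assumed dimensions the enclosure surface is 4.4 times the footprint, which falls inside the 3–5× range described above. Narrow, tall structures push the ratio higher because wall area dominates.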

Economic Assessment

At roughly $3 in API costs, Claude produced a complete set of quantities for two construction scopes from a 303-page drawing package. Even with a 51% underestimate at the bid level, this has clear value:

  • As a first-pass sanity check. A 45-minute AI-generated takeoff that identifies all line items, gets most counts right, and flags the scopes needing careful measurement is a meaningful productivity multiplier.
  • As a starting point for human review. The most productive use is likely "Claude does the initial takeoff, a human reviews the secondary steel and envelope areas." That correction takes hours, not days.

Recommended Improvements

  • Explicit secondary framing pass. After reading primary framing elevations, explicitly search for secondary framing plans and count all members systematically.
  • Use elevation drawings for envelope area. For enclosed structures, area takeoffs should come from elevations, not plans.
  • Use bar schedule counts, not grid calculations. When a rebar schedule is present, use the listed bar count directly.
  • Scope boundary prompting. Ask whether items like guardrail should be counted on one or both sides. When scope is ambiguous, default to the conservative (lower) quantity.
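The rebar recommendation amounts to a simple precedence rule, sketched below. The function names and the example footing are hypothetical, and the grid formula ignores concrete cover and edge distances for brevity.

```python
import math
from typing import Optional


def grid_bar_count(side_ft: float, spacing_in: float = 12.0) -> int:
    """Bars in one direction if the footing were filled at uniform spacing."""
    return math.floor(side_ft * 12.0 / spacing_in) + 1


def bars_each_way(side_ft: float, scheduled_count: Optional[int] = None,
                  spacing_in: float = 12.0) -> int:
    """Prefer the bar count from the rebar schedule; fall back to a grid."""
    if scheduled_count is not None:
        return scheduled_count
    return grid_bar_count(side_ft, spacing_in)


# Hypothetical 6 ft square footing.
from_grid = bars_each_way(6.0)                         # no schedule available
from_schedule = bars_each_way(6.0, scheduled_count=4)  # schedule lists 4 bars
```

The grid fallback stays available for drawings that lack a schedule, but it never overrides a listed count.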

