Key Findings
- Concrete volumes (footings and piers) were within 2% on all five CY line items — essentially exact.
- Structural steel element counts (columns, base plates) were perfect. Secondary framing members were underestimated by 50–96% on six of eight beam types.
- Building envelope was systematically underestimated — Claude used plan footprint area instead of full 3D enclosure surface.
- Some items were significantly overestimated: footing rebar (bar-grid vs bar-schedule), guardrail, handrail, and mineral wool.
- Applied against industry unit rates, Claude's quantities produce a total bid of ~$443K vs the consultant's ~$907K — a 51% underestimate overall.
1. Approach
The experiment was designed as a controlled accuracy test: supply Claude with only the IFC drawing package and ask it to populate two blank quantity takeoff templates — no hints, no reference quantities, and no human guidance on what to count.
Source Materials
The project package comprised:
- IFC Drawings Structural — 147 pages, 52 MB PDF. Structural plans, elevations, sections, and schedules for the Bridge Connection and Substation Addition.
- IFC Drawings Architectural — 156 pages, 284 MB PDF. Architectural plans, elevations, and schedules including door schedules and wall panel layouts.
- Drawing List (IFC).xlsx — Index of all drawing numbers and their PDF page locations.
- Bridge Connection and Substation Addition empty takeoff templates — CSI-formatted spreadsheets with all line items defined but the quantity column left blank.
How Claude Worked — Cowork Mode
This session ran in Claude's Cowork mode, inside a sandboxed Ubuntu 22 Linux VM on the user's desktop. Claude had access to a Bash shell, file read/write, and web search. For this task it:
- Read the drawing list Excel to build a page-number index
- Used pdftotext to extract machine-readable text from specific structural pages (footing schedules, pier schedules, framing notes)
- Used pdftoppm to render plan and elevation pages as 200 DPI PNG images for visual scale reading
- Wrote a Python script using openpyxl to populate both takeoff templates while preserving all existing formulas and formatting
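The extraction and template-filling steps listed above could look roughly like the sketch below. It is not the script Claude actually wrote: the helper names, file names, and the column positions of the takeoff template are assumptions made for illustration. The pdftotext and pdftoppm flags used (-f, -l, -layout, -r, -png) are standard Poppler options, and openpyxl is a third-party package that must be installed separately.

```python
import subprocess


def pdftotext_cmd(pdf: str, page: int, out_txt: str) -> list[str]:
    """Build a pdftotext command for one page, keeping the table layout."""
    return ["pdftotext", "-f", str(page), "-l", str(page), "-layout", pdf, out_txt]


def pdftoppm_cmd(pdf: str, page: int, out_prefix: str, dpi: int = 200) -> list[str]:
    """Build a pdftoppm command that renders one page to PNG at the given DPI."""
    return ["pdftoppm", "-f", str(page), "-l", str(page),
            "-r", str(dpi), "-png", pdf, out_prefix]


def run(cmd: list[str]) -> None:
    """Run a Poppler command and raise if it exits with an error."""
    subprocess.run(cmd, check=True)


def fill_quantities(template: str, out_path: str, quantities: dict[str, float],
                    item_col: int = 1, qty_col: int = 3) -> None:
    """Write quantities into the blank quantity column of the first sheet.

    Assumed layout: item names in column `item_col`, quantities in column
    `qty_col`, one header row. Only matching quantity cells are written, so
    formulas and formatting in other cells are left as loaded.
    """
    from openpyxl import load_workbook  # third-party; imported lazily

    wb = load_workbook(template)  # default mode keeps formulas as formulas
    ws = wb.active
    for r in range(2, ws.max_row + 1):
        item = ws.cell(row=r, column=item_col).value
        if item in quantities:
            ws.cell(row=r, column=qty_col).value = quantities[item]
    wb.save(out_path)
```

For example, run(pdftotext_cmd("structural.pdf", 12, "page12.txt")) would extract a single page to text, where the file name and page number are placeholders for values taken from the drawing list index.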
The entire workflow ran in a single coherent session with one context-length interruption that was cleanly resumed. Estimated API cost: ~$3 at Claude Sonnet 4.5 pricing.
2. Ground Truth — Third-Party Cost Estimator
After Claude completed both takeoffs, a third-party professional cost estimator reviewed the Bridge Connection scope and produced their own quantities from the same drawing package. Their results were entered alongside Claude's, creating a direct line-by-line comparison across 45 non-lump-sum line items.
3. Results
Accuracy Distribution
The charts below show how % errors are distributed across all 45 non-LS line items. Items within −20 to +20% are considered essentially correct; beyond that represents meaningful misses.
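As a minimal sketch of how each line item could be scored, the snippet below computes a signed percent error against the consultant's quantity and applies the ±20% band described above. The function names and labels are illustrative assumptions, not part of the original analysis.

```python
def pct_error(estimate: float, reference: float) -> float:
    """Signed percent error of the estimate relative to the reference quantity."""
    if reference == 0:
        raise ValueError("reference quantity must be non-zero")
    return (estimate - reference) / reference * 100.0


def classify(err_pct: float, band: float = 20.0) -> str:
    """Label an error as within the tolerance band, or as an under/overestimate."""
    if abs(err_pct) <= band:
        return "essentially correct"
    return "underestimate" if err_pct < 0 else "overestimate"
```

A negative result means Claude's quantity was lower than the consultant's, and a positive result means it was higher.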
Cost Impact
Indicative mid-range industry rates (RSMeans / Canadian market, CAD) were applied to both sets of quantities, including a standard 30% markup (contingency, overhead & profit, tax, bond).
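The cost comparison can be sketched as quantity times unit rate, summed over line items, with the 30% markup applied to the subtotal. The function names below are assumptions for illustration, and any item names or unit rates passed in are the caller's own; only the 30% markup and the two headline totals come from this report.

```python
MARKUP = 0.30  # contingency, overhead & profit, tax, bond (from the text)


def bid_total(quantities: dict[str, float], unit_rates: dict[str, float],
              markup: float = MARKUP) -> float:
    """Sum quantity x unit rate over all line items, then apply the markup."""
    subtotal = sum(qty * unit_rates[item] for item, qty in quantities.items())
    return subtotal * (1.0 + markup)


def underestimate_pct(estimate_total: float, reference_total: float) -> float:
    """Percent by which the estimate falls short of the reference total."""
    return (1.0 - estimate_total / reference_total) * 100.0


# Headline totals reported above: ~$443K (Claude) vs ~$907K (consultant).
shortfall = underestimate_pct(443_000, 907_000)  # about 51%
```

Because the same rates and markup are applied to both sets of quantities, the gap between the two totals reflects quantity differences only.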
4. Takeaways
Where AI Quantity Estimation Works Well
- Concrete volumes from schedules. When a footing or pier schedule explicitly lists dimensions, Claude computes volumes with high precision — footings within 0.04%, piers within 2%.
- Element counts from drawings. Structural column counts, base plate counts, and door assembly counts were exact.
- Simple geometric items with clear drawing callouts. Engineered fill, pier waterproofing, and formwork areas were near-exact because they derived directly from schedule dimensions.
- General conditions. Lump-sum items (permits, mobilization, supervision) default to 1-each and are always correct.
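The schedule-driven concrete calculation is simple arithmetic, which may be why it held up so well. A minimal sketch, assuming rectangular footings with schedule dimensions in feet (the dimensions and count in the example are hypothetical, not taken from the project schedules):

```python
CUBIC_FEET_PER_CUBIC_YARD = 27.0


def footing_volume_cy(length_ft: float, width_ft: float, depth_ft: float,
                      count: int = 1) -> float:
    """Total volume in cubic yards for `count` identical rectangular footings."""
    volume_cf = length_ft * width_ft * depth_ft * count
    return volume_cf / CUBIC_FEET_PER_CUBIC_YARD


# Hypothetical schedule entry: four footings, 6 ft x 6 ft x 1.5 ft each.
example_cy = footing_volume_cy(6.0, 6.0, 1.5, count=4)  # 216 cf / 27 = 8.0 CY
```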
Where AI Quantity Estimation Struggles Today
- Secondary structural framing. Claude read primary members from framing elevations but missed secondary bays, intermediate beams, and connection hardware — six of eight beam types were 45–96% underestimated.
- 3D enclosure area vs plan area. Wall panels, spray fireproofing, and insulation were calculated from the plan footprint. The consultant measured from elevations showing the full 3D surface — typically 3–5× larger for an enclosed bridge structure.
- Bar-schedule vs bar-grid rebar. Claude computed footing rebar as a full grid of bars at 12" spacing. Engineers use a fixed bar count from the schedule, which is optimised and much lighter — producing a 12× overestimate.
- Specialty secondary items. Guardrail, handrail, and coping quantities depend on knowing exact scope boundaries. Without that context, Claude defaulted to full-perimeter counts — 3–12× overestimates.
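To illustrate the plan-area versus enclosure-area gap, the sketch below treats an enclosed bridge as a simple rectangular box. This is a deliberate simplification, and the dimensions are hypothetical rather than taken from the drawings, but it shows how a ratio in the 3–5× range can arise once walls, roof, and soffit are all counted.

```python
def plan_area(length: float, width: float) -> float:
    """Footprint area as seen on a plan drawing."""
    return length * width


def enclosure_area(length: float, width: float, height: float,
                   include_soffit: bool = True, include_ends: bool = False) -> float:
    """Surface area of a box-shaped enclosure: two side walls, roof, and
    optionally the soffit (underside) and the two end walls."""
    area = 2 * length * height + length * width  # side walls + roof
    if include_soffit:
        area += length * width
    if include_ends:
        area += 2 * width * height
    return area


# Hypothetical bridge: 100 ft long, 10 ft wide, 12 ft high.
ratio = enclosure_area(100.0, 10.0, 12.0) / plan_area(100.0, 10.0)  # 4400 / 1000 = 4.4
```

The end walls are excluded by default on the assumption that a bridge connection is open to the buildings it joins; set include_ends=True if that does not hold.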
Economic Assessment
At roughly $3 in API costs, Claude produced a complete set of quantities for two construction scopes from a 303-page drawing package. Even with a 51% underestimate at the bid level, this has clear value:
- As a first-pass sanity check. A 45-minute AI-generated takeoff that identifies all line items, gets most counts right, and flags the scopes needing careful measurement is a meaningful productivity multiplier.
- As a starting point for human review. The most productive use is likely "Claude does the initial takeoff, a human reviews the secondary steel and envelope areas." That correction takes hours, not days.
Recommended Improvements
- Explicit secondary framing pass. After reading primary framing elevations, explicitly search for secondary framing plans and count all members systematically.
- Use elevation drawings for envelope area. For enclosed structures, area takeoffs should come from elevations, not plans.
- Use bar schedule counts, not grid calculations. When a rebar schedule is present, use the listed bar count directly.
- Scope boundary prompting. Ask whether items like guardrail should be counted on one or both sides. Where scope remains ambiguous, default to the conservative (lower) quantity.