Data Center Stress Index

Who bears the cost when a data center moves in?

Every new data center consumes water, draws power from the grid, and occupies land. But the communities hosting them rarely have the data to evaluate whether they are getting a fair deal. This tool changes that.

What questions does this dashboard answer?

1. Which counties bear the heaviest resource burden from data centers — in water, energy, and land?
2. Are communities getting fair economic returns — jobs and wages — relative to the resources consumed?
3. How transparent are states about permitting, environmental review, and tax incentives for data centers?
4. Who actually owns these facilities — and how much of the industry is controlled by foreign entities?
5. What would happen to county stress scores if proposed federal or state legislation were enacted?
6. Where are the geographic hot spots — regions where high-burden data center counties cluster together?

What we calculate

Every U.S. county hosting a data center receives four scores, each normalized on a 0–1 scale:

Resource Burden Score (RBS)

Measures the combined strain on local water, energy, and land. Higher = more burden on community resources. Weighted: 40% water, 35% energy, 25% land.

Regulatory Opacity Score (ROS)

Measures how transparent the state regulatory environment is. Higher = less accountability in permitting, tax incentives, and environmental review.

Economic Return Score (ERS)

Measures what the community gets back in local jobs and wages per megawatt of installed capacity. Higher = better return. Uses Monte Carlo simulation for confidence intervals.

Composite Stress Index (CSI)

Blends all three scores: 50% burden, 30% opacity, 20% inverted return. Counties receive letter grades A through F. This is the single number that ranks communities by overall data center stress.

How we got the data

The entire index is built from 19 public data sources, all free. No proprietary datasets. No gated APIs. Total cost: under $500.

Facilities
PNNL DC Atlas, FracTracker Alliance, Epoch AI
Water Stress
WRI Aqueduct 4.0, spatially joined to counties
Energy
EIA-923 generation + company sustainability reports
Land Use
PNNL sqft, Census TIGER ALAND, MW-to-acreage estimates
Employment
BLS QCEW (NAICS 518210), imputed where suppressed
Regulatory
State Regulatory Index (42 jurisdictions), EPA FLIGHT
Boundaries
Census TIGER shapefiles, congressional districts
Ownership
V8 facility traits: 191 foreign-owned across 14 countries
Policy
12 legislative proposals decomposed into formula parameters

What we know we don’t know

Transparency requires honesty about gaps. Here is what we cannot yet measure:

Energy is estimated, not metered. No public dataset tracks facility-level electricity consumption. V9 estimates consumption using LBNL’s 50% utilization methodology (down from V8’s 100% assumption). Operator-reported data from Meta (18 facilities) and Google/AWS PUE is used where available. For most facilities, consumption is modeled from nameplate MW.
247 counties lack acreage data. Not all facility sources report square footage or lot size. We estimate from MW capacity using industry heuristics (50 acres per 100 MW).
~65% of employment data is suppressed. BLS withholds county-level QCEW data to protect employer confidentiality. We impute from state totals proportional to MW capacity.
Behind-the-meter gas is invisible. An estimated 56 GW of on-site natural gas generation is not captured by any federal energy dataset.
Water consumption is now estimated from cooling type. V9 uses WUE benchmarks (Siddik et al. 2021/2022) to estimate gallons per MWh by cooling method. Meta’s reported per-facility water data validates the approach for 15 US facilities. However, cooling method is unknown for most facilities, and the national weighted average (0.3 L/kWh per Siddik et al. 2021) is used as default.
419 of 670 counties lack precise facility coordinates. Most facilities plot at county centroids. Individual addresses are not in public datasets.
425 facilities appear as both operating and proposed. FracTracker and V8 source data overlap. Deduplication is ongoing.
The State Regulatory Index covers 42 of 50 states. Eight states lack sufficient data for all six transparency variables.
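The water-consumption default described in the list above can be sketched as follows. This is a minimal illustration, assuming annual energy in MWh is already estimated; the function name and the unit conversion step are assumptions, and only the 0.3 L/kWh default comes from the text.

```python
# Hypothetical sketch: annual water use from estimated energy and a WUE value.
LITERS_PER_GALLON = 3.785411784   # US liquid gallon
DEFAULT_WUE_L_PER_KWH = 0.3       # national weighted average cited in the text


def annual_water_gallons(annual_mwh, wue_l_per_kwh=DEFAULT_WUE_L_PER_KWH):
    """Convert estimated annual energy (MWh) to estimated water use (gallons)."""
    liters = annual_mwh * 1000 * wue_l_per_kwh  # 1 MWh = 1,000 kWh
    return liters / LITERS_PER_GALLON


gallons = annual_water_gallons(1000)  # 1,000 MWh at the default WUE
```

A facility with a known cooling method would pass its own WUE benchmark instead of the default.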

Methodological limitations

Peer review identified the following structural limitations. We disclose them here because transparency is a core value of this project.

Composite weights are policy judgments. The weights that combine water, energy, and land into the Resource Burden Score (0.4/0.35/0.25) and the weights that combine RBS, ROS, and ERS into the composite (0.5/0.3/0.2) reflect the analyst’s judgment about relative importance. They are not derived from the data. PCA validation shows the resulting rankings are robust (ρ > 0.96 vs. data-driven weights), but users should explore the entropy-weighted toggle for a purely data-driven alternative.
Water stress is not water consumption. WRI Aqueduct measures basin-level water stress — the ratio of withdrawals to available supply — not facility-specific water use. A data center with closed-loop cooling in a stressed basin scores the same as an evaporative facility. Facility-level gallons-per-MWh estimates by cooling type are planned for V9.
Economic data is heavily imputed. BLS suppresses ~65% of county-level employment data to protect employer confidentiality. The Economic Return Score for most counties is modeled from state totals, not observed. Counties with high imputation are flagged as Tier C data quality — interpret their ERS with caution.
Regulatory scoring has a single rater. The State Regulatory Index was scored by one analyst-AI team. An inter-rater reliability pilot (3 raters, 10 states, Cohen’s Weighted Kappa target ≥ 0.60) is planned but not yet complete. Until validated, SRI scores should be treated as preliminary.
Letter grades are relative rankings. The A-through-F grades use percentile thresholds, which means exactly 20% of counties fall into each tier by definition. An “A” means better than 80% of DC-hosting counties, not that a county has no data center impact. There are no absolute “safe” thresholds.
No external validation yet. The DCSI has not been compared against established environmental justice indices (EJScreen, SVI). This comparison is planned for V9 to demonstrate whether the DCSI captures impacts that existing tools miss.
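The employment imputation noted above (state totals allocated in proportion to MW capacity) might look like the sketch below. It assumes the state residual is the state total minus all disclosed county values; that residual rule and every name here are illustrative assumptions, not the pipeline's code.

```python
# Hypothetical sketch: spread a state's undisclosed employment across
# suppressed counties in proportion to their data center MW capacity.
def impute_employment(state_total, disclosed, mw_by_county):
    """Return employment per county; suppressed counties get an MW-weighted share."""
    residual = max(state_total - sum(disclosed.values()), 0)
    suppressed = {c: mw for c, mw in mw_by_county.items() if c not in disclosed}
    total_mw = sum(suppressed.values())
    result = dict(disclosed)  # observed values are kept as-is
    for county, mw in suppressed.items():
        result[county] = residual * mw / total_mw if total_mw else 0.0
    return result


jobs = impute_employment(
    state_total=1000,
    disclosed={"A": 400},
    mw_by_county={"A": 100, "B": 150, "C": 50},
)
```

Counties filled in this way are the ones that would carry the low-confidence data quality flag.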

Explore the dashboard

Each tab focuses on a different dimension of the data center landscape:

DCSI National
Interactive national map with county-level stress scores, filtering by ownership, classification, and capacity type. Includes Sankey ownership flows, burden breakdowns, and scatter analysis.
DCSI State
A one-stop shop for any state: governor, research summary, county-level map, facility table, congressional districts, and ownership breakdown. Toggle between plain language, technical, and policy brief.
Policy Impact
Simulate how 12 real legislative proposals would change county stress scores. See proposed data centers on the map and explore what each would mean for local communities.
Methodology
Every formula, every weight, every assumption — fully disclosed. Includes PCA validation, entropy weighting sensitivity analysis, and Monte Carlo error propagation methodology.
Data Sources
Complete inventory of all 19 public data sources with file details, coverage statistics, and version information.
AI Accountability
Every AI-introduced error caught by the analyst during development. Severity, category, fix status, and the lesson: the analyst never abdicated to the machine.
Built in public. Under $500. All data free.
A project by Anna R. Dudley
DCSI National Overview

32 counties show high resource burden with below-average economic return

The Data Center Stress Index (DCSI) is a county-level composite ranking of U.S. counties hosting data centers. It measures three dimensions: the resource burden each facility places on local water, energy, and land; the regulatory opacity surrounding permitting and operations; and the economic return the community receives in jobs and wages per megawatt of installed capacity. Use this dashboard to explore which communities bear the heaviest costs — and which receive the least in return.

670
Counties with facilities
3,954
Facilities tracked
--
Est. TWh/yr consumed
--
Est. billion gal/yr water
754
Proposed / Under Review
45
Foreign HQ flagged
Priority
View
Filter
Resource Burden ranks counties by the combined strain data centers place on water supply (WRI Aqueduct), energy grid (EIA-923), and land use (PNNL/USGS). Economic Return flips the lens to rank by local jobs and wages generated per MW of capacity (BLS QCEW).
Scores shows the raw composite index for each county. Hot-Spot Clusters highlights geographic concentrations of high-stress counties using spatial autocorrelation (Moran's I). Entropy Weights lets the data determine component weights instead of fixed formulas.
2026
Facilities: 4,502  |  Avg CSI: --  |  Proposed: 658
Composite Stress Index
Low concern / High concern
Layers
Commercial Government Academic
Zoom in to see individual data centers

Interactive choropleth showing the Composite Stress Index for U.S. counties with data center facilities. Green = low stress, red = high stress. Sourced from PNNL, FracTracker, Epoch AI, WRI Aqueduct, EIA-923, and BLS. Scroll to zoom.

We have provided the most accurate information we had access to. See the Data Sources tab.
Ownership to Stress Flow
Click any node to filter the dashboard

This Sankey diagram traces who owns data center capacity, where that capacity is headquartered, and how it maps to resource stress tiers. Each flow line represents MW capacity moving from a corporate operator through a headquarters country to a stress grade tier. Click any corporation or country node to cross-filter every chart on the dashboard.

Ownership data sourced from PNNL IM3 Atlas and Epoch AI. Facility-to-operator mapping may be incomplete for smaller or privately held operators. See the Data Sources tab for details.
Showing US-headquartered flows only
Facility Ownership
Click a segment to filter

Market share of U.S. data center capacity by parent company headquarters country or corporate operator. Foreign-owned facilities may face additional CFIUS scrutiny. Data sourced from PNNL IM3 Atlas and Epoch AI ownership records.

Ownership attribution reflects the best available public records. Some facilities are owned through subsidiaries or holding companies that may obscure the ultimate parent. See the Data Sources tab.
Resource Burden Breakdown
Click a state to see the math and filter

Each bar decomposes a state's aggregate resource burden into water stress, grid load, and land use components, the three pillars of the RBS formula (40/35/25 weighting). Water stress comes from WRI Aqueduct 4.0, grid load from EIA-923 plant-level generation, and land use from PNNL facility footprints over USGS NLCD developed area. Click any state to see the raw data and calculation.

Scores are normalized to a 0-to-1 scale. Underlying data quality varies by county and source. See the Data Sources tab for provider details.
Stress by Congressional District
Ranked by composite stress score

Districts ranked by composite stress index. This view connects resource burden to political accountability. Every district bar maps to a specific representative who can champion or block data center policy. District boundaries from Census TIGER/Line shapefiles, legislators from the @unitedstates project and Open States.

District-to-county mapping uses spatial overlay with a 1% area threshold to filter boundary slivers. See the Data Sources tab.
Economic Return Leaders
Counties with highest Jobs + Wages per MW

Counties delivering the most local economic value per megawatt of data center capacity. High ERS counties demonstrate that data centers can generate meaningful employment. Employment and wage data sourced from BLS Quarterly Census of Employment and Wages (NAICS 518210). MW capacity from PNNL, Epoch AI, FracTracker, and hand-researched overrides.

BLS suppresses employment data in counties with fewer than three reporting establishments to protect confidentiality. See the Data Sources tab.
Burden vs. Return: Which Counties Are Getting a Raw Deal?
Counties in the top-left quadrant bear the highest resource burden with the lowest economic return
Scroll to zoom / drag to pan / double-click to reset

Reading this chart: Each bubble is a county hosting data center facilities. Bubble size reflects total MW capacity.

The top-left quadrant is the danger zone: high resource burden (water, energy, land) paired with low economic return (few jobs, low wages per MW). These counties subsidize the digital economy without proportional benefit.

The bottom-right quadrant is the sweet spot: modest resource consumption alongside meaningful local employment and wages.

Click any county bubble to see its full scorecard, elected officials, and facility-level breakdown.

Scatter plot of Economic Return Score (x-axis) vs. Resource Burden Score (y-axis) for all counties with data center facilities, so that high-burden, low-return counties fall in the top-left quadrant. RBS is derived from WRI Aqueduct water stress, EIA-923 energy data, and USGS land cover. ERS is derived from BLS QCEW employment and wage data. Bubble color reflects the Composite Stress Index. Use your scroll wheel to zoom into dense clusters.

Scores reflect the best available public data. Some values are estimated where direct measurements are unavailable. See the Data Sources tab for complete sourcing.
DCSI State

Explore data center stress at the state level

Select a state to see its counties, facilities, ownership breakdown, resource burden, and economic return. All charts below filter to the selected state.

--
Counties
--
Facilities
--
Total MW
--
Proposed / Planned
--
Foreign HQ
--
Avg Stress Score
Governor
--
--
State Research Summary
Select a state to see a research summary.
Audience
Composite Stress Index
Low concern / High concern
Facility Ownership
Within selected state
County Resource Burden
Water / Grid / Land breakdown per county
County Economic Return
Jobs + Wages per MW
Congressional Districts
Stress by district within state
Burden vs. Return
County-level scatter for selected state
Scroll to zoom / drag to pan / double-click to reset

Reading this chart: Each bubble is a county. Size reflects MW capacity.

The top-left quadrant = high burden, low return. The bottom-right = low burden, high return.

All Facilities
Select a state to see facilities

Policy Impact Lens

Select a policy proposal to see how it would change county stress scores. The map adjusts to show before/after impact. Proposed data centers are shown with dashed outlines. Use your scroll wheel to zoom into any region of the map.

Select a policy to see impact
Composite Stress Index
Low concern / High concern
Existing Proposed
Policy impact scores are modeled estimates based on bill text and regulatory intent. They are not predictions. See the Data Sources tab for underlying data.
Score Impact Summary
How the selected policy changes county scores
Select a policy above to see a before/after analysis of how it would affect county stress scores, which counties move between tiers, and which proposed data centers would be blocked or modified.
Counties Most Affected
Top counties where scores change the most
State Regulatory Transparency (ROS)
How transparent is each state's data center permitting, environmental review, and tax incentive process?

The Regulatory Opacity Score (ROS) measures how much information states disclose about data center permitting, environmental review, utility rates, tax incentives, and ownership. Higher opacity means less public transparency. Scores are derived from a 6-variable State Regulatory Index across 42 DC-hosting jurisdictions. Click any bar to filter the dashboard to that state.

Proposed Data Centers
Facilities announced or under construction, not yet operational
Cancelled & Withdrawn Projects
Facilities that were cancelled, withdrawn, denied, or rejected — evidence that democratic process and community voice can shape outcomes
Follow the Money

Public Subsidies to Data Center Operators

$16.2 billion in disclosed tax incentives, abatements, and megadeals tracked by Good Jobs First’s Subsidy Tracker. Five companies — Amazon, Apple, Meta, Google, and Microsoft — account for 86% of all disclosed subsidy value. An additional 299 records have undisclosed amounts, meaning the true total is significantly higher.

$16.2B
Disclosed value
706
Subsidy records
86%
To top 5 firms
299
Undisclosed amounts
Top Recipients by Disclosed Value
Top States by Disclosed Value
Subsidy Awards by Year
Context: Indiana’s $8.5B total is dominated by a single Amazon megadeal ($8.3B in 2024). Washington’s 258 records reflect the state’s long-running data center tax incentive program. Virginia, despite hosting more data centers than any other state, shows only $141M in disclosed subsidies, likely reflecting nondisclosure agreements (NDAs) that prevent public reporting. The 299 undisclosed-value records suggest significant subsidy activity that is invisible to the public.
Source: Good Jobs First Subsidy Tracker. Full-text search for “data center” across 722,000 records. Some records may include non-DC companies with data center mentions in program notes. State-by-state comparisons are limited by uneven disclosure practices. As noted by Good Jobs First: “Due to uneven disclosure, it is NOT appropriate to make state-by-state comparisons.” Data current through November 2025.

Methodology

How the DCSI is calculated, what data it uses, and where human judgment enters.

The Analytical Question

For every U.S. county hosting a data center: what resource burden is that facility placing on the community, and what economic return is the community receiving?

Score Architecture

Resource Burden Score (RBS)

RBS = 0.4 x Water + 0.35 x Energy + 0.25 x Land

V8: Percentile rank normalization (replaces z-score+minmax, which was distorted by outliers in the energy and land distributions). Expert weights (primary) validated against PCA-derived, entropy, and equal-weight variants. All four variants show Spearman ρ > 0.96.
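A minimal sketch of percentile-rank normalization followed by the RBS weighting. The tie rule (mid-rank) and the scaling by county count are assumptions made for illustration, as are the function names; only the 0.4/0.35/0.25 weights come from the formula above.

```python
# Hypothetical sketch: percentile-rank normalization, then the RBS weighted sum.
def percentile_rank(values):
    """Map each value to (count below + half the count equal) / n, in (0, 1)."""
    n = len(values)
    ranks = []
    for v in values:
        below = sum(1 for other in values if other < v)
        equal = sum(1 for other in values if other == v)  # includes v itself
        ranks.append((below + 0.5 * equal) / n)
    return ranks


def rbs(water, energy, land):
    """Resource Burden Score from three components already scaled to 0-1."""
    return 0.4 * water + 0.35 * energy + 0.25 * land


# An extreme outlier (1000) no longer compresses the other counties' scores.
energy_pct = percentile_rank([10, 20, 30, 1000])
```

Because ranks depend only on ordering, a single very large facility shifts its own county to the top without flattening the differences among the rest.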

Transparency note: The primary weights (0.4/0.35/0.25 for RBS; 0.5/0.3/0.2 for CSI) are policy-weighted, reflecting the analyst’s judgment about relative importance — not empirically derived from the data. PCA and entropy validation confirm these weights produce rankings highly correlated (ρ > 0.96) with data-driven alternatives, but the choice of weights is ultimately a value judgment. Users should consult the entropy-weighted toggle for a purely data-driven ranking.
V9 Energy: Estimated facility-level consumption using LBNL methodology (MW × utilization factor × 8,760 hours). Utilization defaults to 50% per LBNL 2024 national average, adjusted by facility type (hyperscale 58%, colocation 50%, enterprise 43%). PUE sourced from operator reports where available (Google per-campus, Meta fleet 1.08, AWS per-region, Microsoft 1.16) or estimated from facility type. Previous V8 approach used 100% utilization with flat PUE 1.3. EIA-923 generation data supplemented with MW-based demand proxy (MW × PUE × 8,760 hours) for 89 counties with data centers but no local power plants. Renewable energy credit reduces energy burden for facilities reporting above-median renewable sourcing (up to −10%).
V8 Land: MW-to-acreage estimation for 247 counties missing facility footprint data. Industry heuristic: ~20 acres per 100 MW. Source acreage used where available from PNNL and FracTracker.
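The V9 energy estimate and the MW-based demand proxy described in the notes above can be sketched as below. The utilization factors and both formulas are taken from the text; the function names and the lookup structure are illustrative assumptions.

```python
# Hypothetical sketch of the two energy formulas quoted in the notes above.
HOURS_PER_YEAR = 8760
UTILIZATION_BY_TYPE = {"hyperscale": 0.58, "colocation": 0.50, "enterprise": 0.43}
DEFAULT_UTILIZATION = 0.50  # national average default cited in the text


def annual_consumption_mwh(nameplate_mw, facility_type=None):
    """MW x utilization factor x 8,760 hours."""
    utilization = UTILIZATION_BY_TYPE.get(facility_type, DEFAULT_UTILIZATION)
    return nameplate_mw * utilization * HOURS_PER_YEAR


def demand_proxy_mwh(nameplate_mw, pue):
    """MW x PUE x 8,760 hours, used where a county has no local power plants."""
    return nameplate_mw * pue * HOURS_PER_YEAR


hyperscale_mwh = annual_consumption_mwh(100, "hyperscale")
unknown_type_mwh = annual_consumption_mwh(100)
```

An unknown facility type falls back to the 50% default, which matches the colocation figure.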

Regulatory Opacity Score (ROS)

County ROS = State Regulatory Index x (1 + Pushback Modifier)
ROS v8 (active): 6-variable State Regulatory Index across 42 DC-hosting jurisdictions. Variables: environmental review, resource disclosure, tax incentive accountability, permitting openness, utility rate transparency, ownership disclosure. Each scored 0–3 with statute citations. Community pushback applies −10% modifier. V8 addition: Company Opacity Index (COI) now acts as a transparency modifier — counties where data center operators are more transparent (low COI) receive up to 5% reduction in ROS. This recognizes that corporate transparency partially compensates for regulatory opacity.
Validation note: Single-rater scores (anna_claude). Inter-rater reliability pilot (Cohen’s Weighted Kappa > 0.60 per variable) deferred to V8. Cronbach’s Alpha dropped as gate — six variables are formative (independent dimensions), not reflective.

Economic Return Score (ERS)

ERS = 0.5 x Jobs_per_MW + 0.5 x Wage_Premium

V7: Wage Premium (DC avg pay / county all-industry avg pay) replaces raw wages_per_mw. Monte Carlo error propagation (1,000 iterations) produces 90% credible intervals. Counties rated Tier A (high confidence), B (moderate), or C (low — imputed inputs). Mean wage premium: ~1.9x county average.
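A minimal sketch of the ERS formula with Monte Carlo error propagation. It assumes both inputs are already normalized to 0-1 and that their uncertainty is Gaussian with a relative standard deviation supplied by the caller; the noise model, the clipping, and the empirical 5th/95th percentile interval are assumptions for illustration, not the documented method.

```python
import random


def ers(jobs_per_mw, wage_premium):
    """Economic Return Score from two components already scaled to 0-1."""
    return 0.5 * jobs_per_mw + 0.5 * wage_premium


def ers_interval(jobs_per_mw, wage_premium, rel_sd, n_iter=1000, seed=0):
    """Empirical 90% interval for ERS under assumed Gaussian input noise."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    draws = []
    for _ in range(n_iter):
        j = min(1.0, max(0.0, rng.gauss(jobs_per_mw, rel_sd * jobs_per_mw)))
        w = min(1.0, max(0.0, rng.gauss(wage_premium, rel_sd * wage_premium)))
        draws.append(ers(j, w))
    draws.sort()
    low = draws[int(0.05 * n_iter)]
    high = draws[int(0.95 * n_iter) - 1]
    return low, high


low, high = ers_interval(0.4, 0.6, rel_sd=0.2)
```

A county with imputed inputs would be given a larger `rel_sd`, which widens the interval and corresponds to a lower confidence tier.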

Composites

Burden Mode: 0.5 x RBS + 0.3 x ROS + 0.2 x (1 - ERS)
Return Mode: 0.5 x (1 - ERS) + 0.3 x RBS + 0.2 x ROS
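The two composite modes translate directly into code. This sketch assumes all three scores are already on the 0-1 scale; the function names are illustrative.

```python
# Sketch of the two composite formulas above; ERS is inverted so that
# a lower economic return raises the stress score.
def csi_burden_mode(rbs, ros, ers):
    return 0.5 * rbs + 0.3 * ros + 0.2 * (1 - ers)


def csi_return_mode(rbs, ros, ers):
    return 0.5 * (1 - ers) + 0.3 * rbs + 0.2 * ros


burden = csi_burden_mode(rbs=0.8, ros=0.5, ers=0.25)
ret = csi_return_mode(rbs=0.8, ros=0.5, ers=0.25)
```

For this example county, Burden Mode gives 0.4 + 0.15 + 0.15 = 0.70 and Return Mode gives 0.375 + 0.24 + 0.10 = 0.715.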

Company Opacity Index (COI) — V8

12-indicator binary scoring of corporate transparency across 142 data center companies. Indicators span environmental reporting, energy disclosure, water disclosure, PUE reporting, renewable targets, cooling technology, community engagement, tax incentive disclosure, beneficial ownership, supply chain transparency, third-party audits, and incident reporting.

COI = 1.0 − (indicators_disclosed / 12)

Scale: 0 = fully transparent, 1 = fully opaque. Tiers: Transparent (≤0.25), Open (0.26–0.50), Partial (0.51–0.65), Opaque (0.66–0.80), Dark (>0.80). County-level COI is capacity-weighted (larger facilities contribute more).

V8 integration: COI now acts as a small modifier to the Regulatory Opacity Score (ROS). In counties where operators demonstrate higher transparency (lower COI), ROS is reduced by up to 5%. Formula: ROS × (1 + pushback_mod + COI_mod) where COI_mod = −0.05 × (1 − county_avg_coi). Fully transparent companies (COI=0) provide maximum credit; fully opaque (COI=1) provide none.
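The COI formula, its tiers, and the ROS modifier can be sketched together. The formulas and thresholds are taken from the text above; treating each tier's upper bound as inclusive, and all function names, are assumptions for illustration.

```python
# Hypothetical sketch of COI scoring and its effect on county ROS.
def coi(indicators_disclosed, total_indicators=12):
    """0 = fully transparent, 1 = fully opaque."""
    return 1.0 - indicators_disclosed / total_indicators


def coi_tier(score):
    """Tier labels; upper bounds are treated as inclusive (an assumption)."""
    for upper, label in [(0.25, "Transparent"), (0.50, "Open"),
                         (0.65, "Partial"), (0.80, "Opaque")]:
        if score <= upper:
            return label
    return "Dark"


def capacity_weighted_coi(facilities):
    """County COI from (mw, coi) pairs, weighted by MW capacity."""
    total_mw = sum(mw for mw, _ in facilities)
    return sum(mw * score for mw, score in facilities) / total_mw


def county_ros(state_index, has_pushback, county_avg_coi):
    """ROS x (1 + pushback_mod + COI_mod), with the modifiers from the text."""
    pushback_mod = -0.10 if has_pushback else 0.0
    coi_mod = -0.05 * (1 - county_avg_coi)
    return state_index * (1 + pushback_mod + coi_mod)


example_ros = county_ros(0.5, has_pushback=True, county_avg_coi=0.0)
```

With documented pushback and fully transparent operators, the state index of 0.5 is reduced by 15% to 0.425.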

Energy Efficiency Score (EES) — V8

Facility-level composite derived from PUE, grid dependency, cooling impact, backup emissions, and a transparency penalty. Quality flags: measured (real data for 3+ components), partial (1–2 real data points), insufficient (fully imputed — EES of 0.593 is default).

EES = f(PUE_efficiency, grid_dependency, cooling_impact, backup_emissions) × (1 − transparency_penalty)

Higher EES = more efficient. County-level EES is capacity-weighted.

Note: EES is an informational overlay. It is not currently factored into the composite CSI score. Integration into the RBS Energy sub-score is under evaluation for V9.

Data Quality Tiers — V8

Every county receives a data quality tier reflecting how much of its scoring input comes from observed source data versus estimates or imputations.

Tier A: ≥3 of 4 dimensions observed  |  Tier B: 2 of 4 observed  |  Tier C: ≤1 observed

The four dimensions assessed are: (1) water stress (observed via Aqueduct vs. state-median imputed), (2) energy (EIA-923 supply data vs. MW demand proxy), (3) land (source acreage vs. MW-estimated), and (4) employment (BLS QCEW observed vs. state-residual imputed). Counties with Tier C data have the majority of their score driven by estimates and should be interpreted with caution. The tier is displayed in the county tooltip, scorecard, and sidebar.
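The tier rule can be expressed as a small function. The dimension keys below are hypothetical labels for the four dimensions listed above.

```python
# Sketch of the data quality tier rule: count observed dimensions out of four.
def data_quality_tier(observed):
    """observed maps each of the four dimensions to True if source data exists."""
    count = sum(1 for is_observed in observed.values() if is_observed)
    if count >= 3:
        return "A"
    if count == 2:
        return "B"
    return "C"


tier = data_quality_tier(
    {"water": True, "energy": False, "land": False, "employment": True}
)
```

The example county has two observed dimensions (water and employment), so it is assigned Tier B.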

CUSUM (Removed — V8)

CUSUM change-point detection was removed from the dashboard in V8. The available Aqueduct data covers a single year of seasonal variation, not a multi-year time series. Running CUSUM on 12 monthly values detects seasonal patterns (summer vs. winter), not genuine acceleration of water stress. The 38% flag rate confirmed this was noise, not signal. CUSUM has been replaced by the data quality tier system and facility-level mapping.

Spatial Autocorrelation (Moran's I)

Measures whether data center stress clusters geographically or distributes randomly. A positive Moran's I indicates clustering: counties near high-stress counties tend to also be high-stress, suggesting regional infrastructure strain rather than isolated incidents. Computed on the CSI values using queen contiguity weights from the Census TIGER county shapefile.

I = (N / W) x (Sum_ij w_ij(x_i - x_bar)(x_j - x_bar)) / (Sum_i(x_i - x_bar)^2)

Local Moran's I (LISA) identifies specific hot-spot and cold-spot clusters. Counties flagged as hot spots (High-High) are surrounded by other high-stress counties. These are the regional pressure zones where infrastructure strain compounds.
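A minimal pure-Python sketch of the global Moran's I formula above, assuming binary contiguity weights (w_ij = 1 for neighbors, 0 otherwise) supplied as a neighbor dictionary. The data structure and names are illustrative; a production pipeline would more likely derive the weights from the county shapefile with a spatial statistics library.

```python
# Hypothetical sketch: global Moran's I with binary neighbor weights.
def morans_i(values, neighbors):
    """values: county -> CSI; neighbors: county -> list of adjacent counties."""
    n = len(values)
    mean = sum(values.values()) / n
    dev = {county: v - mean for county, v in values.items()}
    weight_total = 0.0   # W: sum of all w_ij
    cross_product = 0.0  # sum of w_ij * dev_i * dev_j
    for i, adjacent in neighbors.items():
        for j in adjacent:
            weight_total += 1.0
            cross_product += dev[i] * dev[j]
    variance_sum = sum(d * d for d in dev.values())
    return (n / weight_total) * (cross_product / variance_sum)


# Four counties in a row with steadily rising stress: positive clustering.
csi = {"a": 1.0, "b": 2.0, "c": 3.0, "d": 4.0}
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
moran = morans_i(csi, adjacency)
```

Here the result is 1/3, positive because neighboring counties have similar scores.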

Entropy-Weighted Scoring

Rather than fixed weights (0.4/0.35/0.25), entropy weighting lets the data determine how much each component contributes to the composite score. Components with more variation across counties receive higher weights; components where all counties score similarly receive lower weights. This prevents a uniformly high-stress component from dominating the index without adding discriminatory power.

w_j = (1 - E_j) / Sum(1 - E_k), where E_j = -Sum p_ij ln(p_ij) / ln(n)

Both fixed and entropy-weighted composites are computed. The dashboard defaults to fixed weights (which are more interpretable for policy audiences) but allows toggling to entropy-weighted for analytical rigor.
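A minimal sketch of the entropy-weight formula above. It assumes each component column holds non-negative scores with a positive sum and that at least one column varies across counties; the names are illustrative.

```python
import math


def entropy_weights(columns):
    """columns: component name -> list of non-negative scores, one per county."""
    n = len(next(iter(columns.values())))
    divergence = {}
    for name, scores in columns.items():
        total = sum(scores)
        shares = [s / total for s in scores]  # p_ij
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        divergence[name] = 1 - entropy        # 1 - E_j
    divergence_sum = sum(divergence.values())
    return {name: d / divergence_sum for name, d in divergence.items()}


weights = entropy_weights({
    "water": [0.5, 0.5, 0.5, 0.5],   # identical everywhere: no discriminating power
    "energy": [0.1, 0.2, 0.3, 0.4],  # varies across counties
})
```

The uniform water column has maximum entropy and receives essentially zero weight, so all of the weight goes to energy.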

Granger Causality Testing

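Tests whether data center facility announcements (from FracTracker timeline data) Granger-cause changes in county-level water stress scores (from Aqueduct monthly data). If past facility announcements help predict future water stress beyond what water stress's own history predicts, this is statistical evidence that announcements precede changes in water stress. Granger causality establishes predictive precedence, not proof of a causal mechanism, so a significant result is consistent with, but does not by itself demonstrate, a link between data center expansion and resource degradation.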

F-test: H0: DC announcements do not Granger-cause water stress changes

Applied county-by-county where sufficient time-series data exists (2015 to 2026). Results reported as significant/not significant with lag selection via AIC.

Grading Methodology — Percentile Thresholds

Counties receive letter grades A through F based on their percentile rank within all 670 data-center-hosting counties:

A = 0–20th percentile  |  B = 20–40th  |  C = 40–60th  |  D = 60–80th  |  F = 80–100th
Transparency note: Percentile grading means exactly 20% of counties will always fall into each tier. This is a relative ranking, not an absolute threshold. An “A” grade means a county has lower composite stress than 80% of DC-hosting counties — it does not mean the county experiences no resource burden. There are no empirically validated “safe” or “critical” stress levels for data center hosting. Natural-breaks (Jenks) classification is under evaluation for V9 as a potential alternative.

Tier C data quality counties (where ≤1 of 4 scoring dimensions uses observed data) should be interpreted with particular caution. Their grades are driven primarily by estimates and imputations.
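The percentile grading can be sketched with integer arithmetic on each county's rank. Ordering by ascending CSI and breaking ties by sort order are assumptions for illustration.

```python
# Hypothetical sketch: letter grades from percentile rank among hosting counties.
def assign_grades(csi_by_county):
    """Lowest-stress fifth gets A, highest-stress fifth gets F."""
    ordered = sorted(csi_by_county, key=csi_by_county.get)
    n = len(ordered)
    # rank * 5 // n maps ranks 0..n-1 onto the five bands 0..4
    return {county: "ABCDF"[rank * 5 // n] for rank, county in enumerate(ordered)}


scores = {f"county_{i}": i / 10 for i in range(10)}
grades = assign_grades(scores)
```

With ten counties, exactly two land in each grade, which illustrates why the grades are relative rather than absolute.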

What This Tool Does Not Do

Does not rank counties without facilities. Does not make causal claims. Does not generate regulatory findings. Informs judgment; it does not replace it.

Reference

Data Sources

PNNL IM3 Data Center Atlas
Free
1,479 U.S. data center facility locations with county FIPS, coordinates, operator, and square footage. Includes high-growth and moderate-growth projection GeoJSON.
Source: MSD Live / PNNL  |  Join key: state_id + county_id to FIPS  |  Files: 5
Epoch AI Data Center Dataset
Free  |  Updated Apr 2, 2026
26 frontier AI data centers (24 US) with verified MW capacity, H100-equivalent GPU counts (2.5M total), capital cost, and construction timelines. Tracks the largest AI compute installations including xAI Colossus (425 MW), OpenAI Stargate Abilene (590 MW), Anthropic-Amazon New Carlisle (1,092 MW), and Meta Prometheus (695 MW). Tier 1 source for MW corrections.
Source: epoch.ai (CC-BY licensed)  |  Join key: Facility name + address geocoding  |  Files: 5 (centers, timelines, chillers, cooling towers, ZIP)
EIA-923 Power Plant Operations
Free
Plant-level electricity generation, fuel consumption, and environmental data. Covers 2015 to 2026 across three schedule types for time-series energy analysis.
Source: eia.gov  |  Join key: Plant ID to EIA-860 to County  |  Files: 32
EIA-860 Plant Location Crosswalk
Free
Maps every power plant to its county, latitude, longitude, and utility. Essential join table connecting EIA-923 generation data to county-level geography.
Source: eia.gov  |  Join key: Plant Code to County  |  Files: 13
WRI Aqueduct 4.0 Water Risk Atlas
Free
Global water stress scores: baseline annual, baseline monthly (for CUSUM velocity detection), and future projections (2030/2050/2080 scenarios).
Source: World Resources Institute  |  Join key: Spatial overlay to County FIPS  |  Files: 3 CSV + GDB
BLS QCEW: NAICS 518210 & 5182
Free
County-level employment and wage data for data processing/hosting (518210) and broader parent code (5182). Covers 2020 to 2025 with all ownership types.
Source: bls.gov/cew  |  Join key: area_fips  |  Files: 12
EPA FLIGHT: Emissions by Unit
Free
Unit-level greenhouse gas emissions including diesel generator hours from Subparts C, D, and AA. Used for Regulatory Opacity Score diesel component.
Source: EPA GHGRP  |  Join key: Facility to County  |  Files: 1 (.xlsb)
FracTracker National DC Tracker
Free  |  Updated Apr 2026
1,446 facilities with status (657 proposed, 529 operating, 119 approved/under construction, 52 expanding, 46 suspended, 43 cancelled). Includes community pushback documentation (172 facilities flagged), NDA tracking, MW capacity, cooling type, and power source. April 2026 overhaul adds 44 new fields including resistance status, advocacy information, and source URLs.
Source: FracTracker Alliance / ArcGIS  |  Join key: County + lat/lon  |  Files: 4 (main CSV + PEC Virginia + Sci4GA layers)
USGS NLCD Land Cover
Free
National Land Cover Database raster (1985 to 2023) for land cover change analysis. Requires zonal statistics processing with county shapefile to extract developed-land area per county.
Source: USGS / ScienceBase  |  Join key: Raster to County polygon overlay  |  Files: 5 (992 MB TIF)
Census TIGER County Shapefile
Free
2025 county boundary polygons for all 3,200+ U.S. counties. Used for spatial joins (Aqueduct to County, NLCD to County).
Source: Census Bureau  |  Join key: GEOID (FIPS)  |  Files: 7
Congressional & State Legislators
Free
Federal legislators from @unitedstates project. State legislators from Open States (51 state files). Used for "Contact Your Rep" feature in county modal.
Source: github.com/unitedstates + Open States  |  Files: 52
MW Capacity Overrides (Web Research)
Curated
Hand-researched MW power capacity corrections for facilities where the default square-footage estimate diverges significantly from publicly reported values. Sources include utility interconnection filings, operator press releases, engineering reports (PASE, DPR), and data center tracking databases (Baxtel, Aterio, interconnection.fyi). This override layer exists because campus-level square footage often includes non-IT space (offices, cooling infrastructure, parking), causing the standard 6 MW/100K sqft conversion to overestimate by 3 to 20 times for hyperscale facilities.
Source: Multiple (utility filings, operator sites, trade press)  |  Join key: Facility name + State + County  |  Files: mw_overrides_v2.csv (97 entries) + mw_web_research_v2.csv (452 entries)  |  Priority: Overrides sqft-derived estimates in pipeline
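The override priority described in this entry can be sketched as follows, assuming the default estimate is a straight 6 MW per 100,000 sqft conversion and that a hand-researched value, when present, always wins. The function and parameter names are hypothetical.

```python
# Hypothetical sketch: sqft-derived MW estimate with a curated override layer.
MW_PER_100K_SQFT = 6.0  # standard conversion cited in the text


def facility_mw(sqft, override_mw=None):
    """Use the hand-researched MW value if one exists, else estimate from sqft."""
    if override_mw is not None:
        return override_mw
    return sqft / 100_000 * MW_PER_100K_SQFT


default_estimate = facility_mw(250_000)
overridden = facility_mw(2_000_000, override_mw=30.0)
```

In the second call, the sqft-based figure would have been 120 MW, four times the researched value, which is the kind of overestimate the override layer is meant to correct.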
Ownership Affiliation Overrides (Web Research)
Curated
Hand-researched ownership corrections for 228 data center facilities where the original source listed operators as "Unknown" or "Other." Identifies the actual corporate operator, headquarters country, and flags foreign-owned entities. Sources include company press releases, SEC filings, utility interconnection records, and industry databases. Foreign-HQ operators identified include NTT (Japan), Cologix (Canada), Nebius/ex-Yandex (Netherlands), EdgeConneX/EQT (Sweden), Eneus Energy (UK), and MineOne (China).
Source: Multiple (company filings, press releases, industry databases) · Join key: Facility name + State + County · Files: 1 (ownership_overrides.csv) · Priority: Overrides "Unknown" operators in pipeline
Proposed Data Centers Database (Curated)
Curated
Comprehensive dataset of 754 proposed, approved, and under-construction data center facilities in the United States. Includes power source identification (grid, solar, nuclear, natural gas), expected completion dates, MW capacity, operator, and community impact notes. Nuclear-powered facilities are flagged for the Policy Impact Lens toggle. Derived from FracTracker Alliance data enriched with web research.
Source: FracTracker Alliance + web research · Join key: Facility name + State · Files: 1 (proposed_data_centers.csv) · Nuclear flagged: 3 facilities
PNNL Sqft Supplement
Curated
Fills missing square footage values for 57 PNNL facilities using public filings, satellite imagery measurements, and operator disclosures. Enables MW estimation where only building footprints were available.
Files: pnnl_sqft_supplement.csv (57 entries) · Priority: Fills gaps in PNNL Atlas
PNNL Exclusion List
Curated
Flags 9 PNNL entries that are not commercial data centers (government HPC, university research computing, or misidentified facilities). Government/academic facilities are excluded from ROS and ERS scoring but retained in RBS.
Files: pnnl_exclusion_list.csv (9 entries) · Impact: Classification-based scoring exclusions
State Regulatory Index (V7)
Curated
6-variable state-level transparency index for 42 DC-hosting jurisdictions. Variables: permit_transparency, environmental_review, energy_disclosure, water_disclosure, tax_incentive_accountability, ownership_disclosure. Scored 0–3 per variable with statute citations. Mean index 0.220, range 0.056–0.556. Pipeline integration pending inter-rater reliability pilot.
Source: State statutes, administrative codes, regulatory dockets · Files: state_regulatory_index.csv (42 entries) · Documentation: ROS_SCORING_SUMMARY_TIER1.md (10 states), ROS_SCORING_SUMMARY_TIER2.md (32 states)
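One way to read the index arithmetic is sketched below. It assumes the index is the sum of the six 0–3 variable scores divided by the maximum of 18. The entry does not state this formula, but the reported range endpoints (0.056 and 0.556) match 1/18 and 10/18. The function name is hypothetical, and how the index maps onto the ROS is not shown here.

```python
from typing import Dict

VARIABLES = (
    "permit_transparency",
    "environmental_review",
    "energy_disclosure",
    "water_disclosure",
    "tax_incentive_accountability",
    "ownership_disclosure",
)
MAX_PER_VARIABLE = 3  # each variable is scored 0-3

def regulatory_index(scores: Dict[str, int]) -> float:
    """Assumed formula: total points divided by the 18 points available."""
    for name in VARIABLES:
        if not 0 <= scores[name] <= MAX_PER_VARIABLE:
            raise ValueError(f"{name} must be between 0 and {MAX_PER_VARIABLE}")
    total = sum(scores[name] for name in VARIABLES)
    return total / (MAX_PER_VARIABLE * len(VARIABLES))

# A jurisdiction earning a single point overall sits at the reported minimum.
lowest = {name: 0 for name in VARIABLES}
lowest["permit_transparency"] = 1
print(round(regulatory_index(lowest), 3))  # 0.056
```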
MW Cross-Validation Audit
Audit
18-facility cross-validation comparing our researched MW values against independent sources (utility filings, industry databases). Tracks match rates, discrepancies, and items requiring human review for data quality assurance.
Files: mw_cross_validation.csv (18 entries) · Purpose: Data quality verification
V8 Unified Facility Dataset
V8 Primary
5,151 facilities with 43 columns. Merges PNNL Atlas + FracTracker with Company Opacity Index (142 companies), Energy Efficiency Scores, sustainability traits (power_source, renewable_pct, cooling_method), and data quality tiers (T1/T2/T3). Enriched with company sustainability reports, utility filings, and industry databases. 14 facility misattributions and 288 county-state mismatches corrected.
Files: v8/facility_traits_merged.csv (5,151 × 43) · Coverage: 670 counties, 142 companies scored · Key metrics: Mean COI 0.744, Mean EES 0.547
V9 Energy Estimation Sources
V9.1
Facility-level electricity consumption estimates using LBNL 2024 utilization factors (50% national average, adjusted by type: hyperscale 58%, colocation 50%, enterprise 43%). PUE sourced from operator reports: Google per-campus (16 US sites, TTM PUE 1.04–1.14), Meta fleet (1.08, plus 15 facilities with real 2024 MWh data), AWS per-region (3 US regions, PUE 1.12–1.15), Microsoft (1.16). EIA-861 county commercial demand used for validation.
Source: LBNL 2024, Google/Meta/AWS sustainability reports, EIA-861 · Files: 6 CSV (google_pue, aws_pue_wue, meta_facility, utilization_factors, water_benchmarks, eia861_county)
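A rough sketch of how these factors could combine into an annual estimate follows. It assumes the MW figure is IT capacity, so that facility energy is capacity × utilization × PUE × hours per year. The entry does not spell out the formula, so treat this as one plausible reading. The utilization factors are the ones listed above; the function name is hypothetical.

```python
HOURS_PER_YEAR = 8760

# Utilization factors by facility type, as listed in the entry (LBNL 2024).
UTILIZATION = {"hyperscale": 0.58, "colocation": 0.50, "enterprise": 0.43}
DEFAULT_UTILIZATION = 0.50  # national average, used for unknown types

def annual_mwh(capacity_mw: float, facility_type: str, pue: float) -> float:
    """Estimated annual facility electricity use in MWh.

    Assumption: capacity_mw is IT capacity, so PUE scales it up to
    whole-facility consumption.
    """
    utilization = UTILIZATION.get(facility_type, DEFAULT_UTILIZATION)
    return capacity_mw * utilization * pue * HOURS_PER_YEAR

# Example: a 100 MW hyperscale site at the Meta fleet PUE of 1.08.
print(round(annual_mwh(100, "hyperscale", 1.08)))  # 548726
```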
V9 Water Estimation Sources
V9.1
Facility-level water consumption estimates using WUE benchmarks by cooling type: evaporative towers 1.8 L/kWh (475 gal/MWh), hybrid 0.9 L/kWh, direct evaporative 0.2 L/kWh, air-cooled 0. Based on Siddik et al. 2021/2022 and Uptime Institute benchmarks. 10 Meta facilities use real 2024 reported water data instead of estimates.
Source: Siddik et al. 2021, Uptime Institute, Meta 2024 Sustainability Report · Files: water_consumption_benchmarks.csv
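The benchmark arithmetic can be illustrated as below, including the unit conversion behind the "475 gal/MWh" figure. The cooling-type keys and function names are hypothetical. Whether the pipeline applies WUE to IT energy or to total facility energy is not stated, so the energy input is left generic.

```python
LITERS_PER_US_GALLON = 3.785411784

# WUE benchmarks from the entry, in liters per kWh.
WUE_L_PER_KWH = {
    "evaporative_tower": 1.8,
    "hybrid": 0.9,
    "direct_evaporative": 0.2,
    "air_cooled": 0.0,
}

def annual_water_liters(energy_mwh: float, cooling_type: str) -> float:
    """Water use in liters for a given energy total and cooling type."""
    return energy_mwh * 1000 * WUE_L_PER_KWH[cooling_type]  # 1 MWh = 1000 kWh

def liters_to_gallons(liters: float) -> float:
    return liters / LITERS_PER_US_GALLON

# 1 MWh with evaporative towers: 1,800 L, or roughly 475 US gallons.
per_mwh_gallons = liters_to_gallons(annual_water_liters(1, "evaporative_tower"))
```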
Good Jobs First Subsidy Tracker
Free · New
706 data center subsidy records across all U.S. states. $16.2 billion in disclosed tax credits, abatements, and megadeals. Top recipients: Amazon ($9.0B), Apple ($1.5B), Meta ($1.5B), Google ($1.4B), Microsoft ($684M). Covers 2000–2025 subsidy awards including property tax abatements, sales tax exemptions, and infrastructure grants.
Source: Good Jobs First / Subsidy Tracker · Join key: State + Parent Company · Records: 706 (407 with disclosed values)
Total: 22 sources · All free and public

About This Project

Intelligence-grade analysis. Built in public. Under $500. The analyst never abdicated to the machine.

The Data Center Stress Index answers a question that should be simple: when a data center moves into a county, what does that county actually get in return?

Every county in the United States that hosts a data center is scored across three dimensions: the resource burden it bears (water consumption, grid load, and land use), the transparency of its regulatory environment (permitting visibility and diesel generator reliance), and the economic return it receives (local jobs and wages per megawatt of installed capacity). These three scores combine into a single composite index that ranks counties on a letter-grade scale from A (low burden, high return) to F (high burden, low return).

How It Works

The index draws from 22 public data sources, all free, totaling more than 1 GB of raw data. The V9.1 facility dataset (4,670 facilities across 50+ variables) merges PNNL, FracTracker (April 2026), and Epoch AI (April 2026) records with company-level transparency, efficiency metrics, and estimated energy and water consumption. Water stress scores come from the World Resources Institute's Aqueduct 4.0 atlas, spatially joined to county boundaries using Census TIGER shapefiles. Energy burden comes from the EIA-923 plant-level generation reports, crosswalked to counties through EIA-860 plant locations. Land use intensity is calculated from PNNL facility footprints divided by county total land area (Census TIGER ALAND). Employment and wage data come from the Bureau of Labor Statistics Quarterly Census of Employment and Wages (NAICS 518210 and all-industry), and regulatory opacity is derived from a 6-variable State Regulatory Index covering 42 jurisdictions. Foreign ownership attribution uses a structured field from the V8 facility dataset, covering 191 foreign-owned facilities across 14 countries.

Each component is normalized to a 0-to-1 scale and combined using expert weights validated against three alternative weighting schemes (PCA-derived, entropy, and equal-weight). The Resource Burden Score uses 40% water, 35% energy, and 25% land; PCA validation confirms water and energy dominate variance. The Regulatory Opacity Score uses a 6-variable State Regulatory Index covering environmental review, resource disclosure, tax incentive accountability, permitting openness, utility rate transparency, and ownership disclosure across 42 jurisdictions, modulated by a community pushback modifier. The Economic Return Score combines jobs per MW and a wage premium ratio (data center pay vs. county all-industry average), with Monte Carlo error propagation producing 90% credible intervals and confidence tiers. The composite index blends these three scores (50% burden, 30% opacity, 20% inverted return) to produce a single ranking. V8 adds a Company Opacity Index (12-indicator corporate transparency) and Energy Efficiency Score (PUE, grid dependency, cooling impact) at the facility level.
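The weighting described above reduces to two weighted sums. The sketch below uses the stated weights; the min-max normalization is an assumption (the text says only that components are scaled 0 to 1), and the function names are hypothetical.

```python
from typing import List

def min_max(values: List[float]) -> List[float]:
    """Assumed normalization: rescale a component to the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no variation across counties
    return [(v - lo) / (hi - lo) for v in values]

def resource_burden(water: float, energy: float, land: float) -> float:
    """RBS: 40% water, 35% energy, 25% land (inputs already normalized)."""
    return 0.40 * water + 0.35 * energy + 0.25 * land

def composite_stress(rbs: float, ros: float, ers: float) -> float:
    """CSI: 50% burden, 30% opacity, 20% inverted economic return."""
    return 0.50 * rbs + 0.30 * ros + 0.20 * (1.0 - ers)

# A county with maximum burden and opacity and zero return scores 1.0.
worst_case = composite_stress(resource_burden(1.0, 1.0, 1.0), 1.0, 0.0)
```

Inverting the return score (1 − ERS) keeps the direction consistent: in all three terms, a higher value means more stress.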

Advanced Techniques

Beyond the core index, the tool applies several analytical methods to surface patterns that a simple ranking would miss. CUSUM (cumulative sum) change-point detection flags counties where water stress is accelerating rather than stable. Moran's I spatial autocorrelation identifies geographic clusters of high-stress counties, revealing regional infrastructure strain rather than isolated incidents. Entropy-weighted scoring offers a data-driven alternative to fixed weights, giving more influence to components that vary most across counties. And Granger causality testing examines whether data center facility announcements statistically predict future changes in local water stress.
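Two of these methods are compact enough to sketch in plain Python. The version below uses the textbook definitions of global Moran's I and the entropy weight method. It is not the tool's own implementation; the function names and the toy inputs are hypothetical.

```python
import math
from typing import Dict, List

def morans_i(values: List[float], weights: List[List[float]]) -> float:
    """Global Moran's I for a spatial weight matrix (weights[i][j] > 0 for neighbors)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return (n / s0) * (cross / sum(d * d for d in dev))

def entropy_weights(columns: Dict[str, List[float]]) -> Dict[str, float]:
    """Entropy weight method: components that vary more across counties get more weight.

    Assumes non-negative values, a positive column total, and at least two counties.
    """
    diversity = {}
    for name, col in columns.items():
        total = sum(col)
        probs = [v / total for v in col]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(len(col))
        diversity[name] = 1.0 - entropy
    scale = sum(diversity.values())
    return {name: d / scale for name, d in diversity.items()}

# Toy example: four counties in a row, each adjacent to its immediate neighbors.
adjacency = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
clustered = morans_i([1.0, 1.0, 0.0, 0.0], adjacency)    # positive: similar values adjoin
alternating = morans_i([1.0, 0.0, 1.0, 0.0], adjacency)  # negative: neighbors differ
```

A positive Moran's I indicates that high-stress counties sit next to other high-stress counties, which is the clustering pattern the hot-spot analysis looks for.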

Policy Scenarios

The Policy Impact Lens lets users simulate the effect of 12 real legislative proposals on county stress scores. Each policy is decomposed into the specific formula parameters it would affect, with percentage adjustments derived from bill text and regulatory intent. The policies span the full spectrum from federal moratoriums that would halt all new construction to executive orders that accelerate permitting on federal land. This is not prediction; it is structured scenario analysis designed to inform advocacy, journalism, and legislative debate.
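The mechanics of such a scenario can be sketched as a percentage adjustment applied to the affected parameters, followed by a recomputation of the composite. Everything specific here is hypothetical: the county values, the policy, its −25% water adjustment, and the clamping to the 0–1 range. Only the composite weights come from the methodology.

```python
from typing import Dict

def composite(c: Dict[str, float]) -> float:
    """CSI from normalized components, using the stated weights."""
    rbs = 0.40 * c["water"] + 0.35 * c["energy"] + 0.25 * c["land"]
    return 0.50 * rbs + 0.30 * c["ros"] + 0.20 * (1.0 - c["ers"])

def apply_policy(components: Dict[str, float], pct_adjustments: Dict[str, float]) -> Dict[str, float]:
    """Return a copy with each listed component scaled by its percentage change.

    Results are clamped to [0, 1] so scores stay on the normalized scale (an assumption).
    """
    adjusted = dict(components)
    for name, pct in pct_adjustments.items():
        adjusted[name] = min(1.0, max(0.0, adjusted[name] * (1.0 + pct / 100.0)))
    return adjusted

# Hypothetical county and a hypothetical policy that cuts water burden by 25%.
county = {"water": 0.8, "energy": 0.6, "land": 0.2, "ros": 0.5, "ers": 0.4}
hypothetical_policy = {"water": -25.0}

baseline = composite(county)
scenario = composite(apply_policy(county, hypothetical_policy))
```

In this toy case the composite moves from 0.56 to 0.52. The original county record is left untouched, so several scenarios can be compared against the same baseline.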

What This Tool Does Not Do

The DCSI does not rank counties that have no data center facilities. It does not make causal claims about whether data centers caused resource degradation. It does not generate regulatory findings or legal conclusions. It informs judgment; it does not replace it. Every data source is public, every formula is disclosed, and every assumption is documented in the Methodology page.

About the Author

The Data Center Stress Index is a project by Anna R. Dudley. The entire tool was built using publicly available data for under $500. No proprietary datasets. No gated APIs. No corporate sponsorship. The purpose is to put the same analytical capability in the hands of local officials, community advocates, and journalists that the industry's own lobbyists already have.

For questions, speaking inquiries, or data requests, contact Anna R. Dudley. Power moves before policy does.

AI Accountability

This page is dedicated to my dear friend and mentor, J.R. Only four years ago, sitting in an old basement, you taught me how to set up my first virtual machine and how to install packages. In this new age of AI, you continue to teach the entire floor about the importance of never abdicating to the machine. Let AI work for you. Don’t let it think for you. This tab shows all the errors found during the analyst-in-the-loop phase.

Errors Caught by the Analyst

During the development of the DCSI pipeline and dashboard, the analyst (Anna) identified and corrected numerous AI-introduced errors. These ranged from critical scoring bugs that would have invalidated the entire index to visual rendering issues. Every error below was caught through manual review, not automated testing. This page exists as both a record and a reminder: the analyst never abdicated to the machine.

Total Errors Found: -- · Critical Severity: -- · High Severity: -- · Fixed: -- · Open: --
Charts: Errors by Severity · Errors by Category · Error Discovery Timeline