Every new data center consumes water, draws power from the grid, and occupies land. But the communities hosting them rarely have the data to evaluate whether they are getting a fair deal. This tool changes that.
Every U.S. county hosting a data center receives four scores, each normalized on a 0–1 scale:
Measures the combined strain on local water, energy, and land. Higher = more burden on community resources. Weighted: 40% water, 35% energy, 25% land.
Measures how opaque the state regulatory environment is. Higher = less accountability in permitting, tax incentives, and environmental review.
Measures what the community gets back in local jobs and wages per megawatt of installed capacity. Higher = better return. Uses Monte Carlo simulation for confidence intervals.
Blends all three scores: 50% burden, 30% opacity, 20% inverted return. Counties receive letter grades A through F. This is the single number that ranks communities by overall data center stress.
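As a minimal sketch of the blend (assuming each component score is already normalized to 0–1; the function name and example values are illustrative):

```python
def composite_stress_index(rbs: float, ros: float, ers: float) -> float:
    """Blend the three component scores into one 0-1 composite:
    50% resource burden, 30% regulatory opacity, 20% inverted
    economic return (a high return lowers stress)."""
    return 0.50 * rbs + 0.30 * ros + 0.20 * (1.0 - ers)

# Example: heavy burden, opaque state, weak return -> high stress.
print(composite_stress_index(0.9, 0.8, 0.1))  # ≈ 0.87
```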
The entire index is built from 19 public data sources, all free. No proprietary datasets. No gated APIs. Total cost: under $500.
Transparency requires honesty about gaps. Here is what we cannot yet measure:
Peer review identified the following structural limitations. We disclose them here because transparency is a core value of this project.
Each tab focuses on a different dimension of the data center landscape:
The Data Center Stress Index (DCSI) is a county-level composite ranking of U.S. counties hosting data centers. It measures three dimensions: the resource burden each facility places on local water, energy, and land; the regulatory opacity surrounding permitting and operations; and the economic return the community receives in jobs and wages per megawatt of installed capacity. Use this dashboard to explore which communities bear the heaviest costs — and which receive the least in return.
Interactive choropleth showing the Composite Stress Index for U.S. counties with data center facilities. Green = low stress, red = high stress. Sourced from PNNL, FracTracker, Epoch AI, WRI Aqueduct, EIA-923, and BLS. Scroll to zoom.
This Sankey diagram traces who owns data center capacity, where that capacity is headquartered, and how it maps to resource stress tiers. Each flow line represents MW capacity moving from a corporate operator through a headquarters country to a stress grade tier. Click any corporation or country node to cross-filter every chart on the dashboard.
Market share of U.S. data center capacity by parent company headquarters country or corporate operator. Foreign-owned facilities may face additional CFIUS scrutiny. Data sourced from PNNL IM3 Atlas and Epoch AI ownership records.
Each bar decomposes a state's aggregate resource burden into water stress, grid load, and land use components, the three pillars of the RBS formula (40/35/25 weighting). Water stress comes from WRI Aqueduct 4.0, grid load from EIA-923 plant-level generation, and land use from PNNL facility footprints over USGS NLCD developed area. Click any state to see the raw data and calculation.
Districts ranked by composite stress index. This view connects resource burden to political accountability. Every district bar maps to a specific representative who can champion or block data center policy. District boundaries from Census TIGER/Line shapefiles, legislators from the @unitedstates project and Open States.
Counties delivering the most local economic value per megawatt of data center capacity. High ERS counties demonstrate that data centers can generate meaningful employment. Employment and wage data sourced from BLS Quarterly Census of Employment and Wages (NAICS 518210). MW capacity from PNNL, Epoch AI, FracTracker, and hand-researched overrides.
Reading this chart: Each bubble is a county hosting data center facilities. Bubble size reflects total MW capacity.
The top-left quadrant is the danger zone: high resource burden (water, energy, land) paired with low economic return (few jobs, low wages per MW). These counties subsidize the digital economy without proportional benefit.
The bottom-right quadrant is the sweet spot: modest resource consumption alongside meaningful local employment and wages.
Click any county bubble to see its full scorecard, elected officials, and facility-level breakdown.
Scatter plot of Resource Burden Score (x-axis) vs. Economic Return Score (y-axis) for all counties with data center facilities. RBS is derived from WRI Aqueduct water stress, EIA-923 energy data, and USGS land cover. ERS is derived from BLS QCEW employment and wage data. Bubble color reflects the Composite Stress Index. Use your scroll wheel to zoom into dense clusters.
Select a state to see its counties, facilities, ownership breakdown, resource burden, and economic return. All charts below filter to the selected state.
Reading this chart: Each bubble is a county. Size reflects MW capacity.
The top-left quadrant = high burden, low return. The bottom-right = low burden, high return.
Select a policy proposal to see how it would change county stress scores. The map adjusts to show before/after impact. Proposed data centers are shown with dashed outlines. Use your scroll wheel to zoom into any region of the map.
The Regulatory Opacity Score (ROS) measures how much information states disclose about data center permitting, environmental review, utility rates, tax incentives, and ownership. Higher opacity means less public transparency. Scores are derived from a 6-variable State Regulatory Index across 42 DC-hosting jurisdictions. Click any bar to filter the dashboard to that state.
$16.2 billion in disclosed tax incentives, abatements, and megadeals tracked by Good Jobs First’s Subsidy Tracker. Five companies — Amazon, Apple, Meta, Google, and Microsoft — account for 86% of all disclosed subsidy value. An additional 299 records have undisclosed amounts, meaning the true total is significantly higher.
How the DCSI is calculated, what data it uses, and where human judgment enters.
For every U.S. county hosting a data center: what resource burden is that facility placing on the community, and what economic return is the community receiving?
V8: Percentile rank normalization (replaces z-score+minmax, which was destroyed by outliers in the energy and land distributions). Expert weights (primary) validated against PCA-derived, entropy, and equal-weight variants. All four variants show Spearman ρ > 0.96.
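The percentile-rank normalization can be sketched as follows; the mid-rank handling of ties is an assumption, since the changelog does not specify a tie convention:

```python
def percentile_rank(values):
    """Map each value to its percentile rank in [0, 1]. Unlike
    z-score or min-max scaling, ranks are unaffected by how
    extreme the outliers are."""
    n = len(values)
    ranks = []
    for v in values:
        below = sum(1 for x in values if x < v)
        ties = sum(1 for x in values if x == v)
        ranks.append((below + 0.5 * ties) / n)  # mid-rank for ties
    return ranks

# An extreme outlier lands at the top of the scale but cannot
# compress the other values toward zero, as min-max scaling would:
print(percentile_rank([2.0, 3.0, 1000.0]))
```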
V7: Wage Premium (DC avg pay / county all-industry avg pay) replaces raw wages_per_mw. Monte Carlo error propagation (1,000 iterations) produces 90% credible intervals. Counties rated Tier A (high confidence), B (moderate), or C (low — imputed inputs). Mean wage premium: ~1.9x county average.
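A sketch of the Monte Carlo error propagation. The 15% relative error, the Gaussian noise model, and the equal-weight combination of the two inputs are all assumptions for illustration; the real pipeline propagates uncertainty through the full ERS formula:

```python
import random

def ers_credible_interval(jobs_score, wage_score, rel_err=0.15,
                          n_iter=1000, seed=42):
    """Draw noisy versions of the two (normalized) ERS inputs,
    recombine them each iteration, and report a 90% credible
    interval from the empirical 5th and 95th percentiles."""
    random.seed(seed)
    draws = []
    for _ in range(n_iter):
        j = random.gauss(jobs_score, rel_err * jobs_score)
        w = random.gauss(wage_score, rel_err * wage_score)
        draws.append(0.5 * j + 0.5 * w)  # toy equal-weight blend
    draws.sort()
    lo = draws[n_iter // 20]        # 5th percentile
    hi = draws[19 * n_iter // 20]   # 95th percentile
    return lo, hi

print(ers_credible_interval(0.5, 0.6))
```

A wide interval relative to the point estimate is what pushes a county into a lower confidence tier.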
12-indicator binary scoring of corporate transparency across 142 data center companies. Indicators span environmental reporting, energy disclosure, water disclosure, PUE reporting, renewable targets, cooling technology, community engagement, tax incentive disclosure, beneficial ownership, supply chain transparency, third-party audits, and incident reporting.
Scale: 0 = fully transparent, 1 = fully opaque. Tiers: Transparent (≤0.25), Open (0.26–0.50), Partial (0.51–0.65), Opaque (0.66–0.80), Dark (>0.80). County-level COI is capacity-weighted (larger facilities contribute more).
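The capacity weighting can be sketched as follows (function name and input shape are illustrative):

```python
def capacity_weighted_coi(facilities):
    """County COI as the MW-weighted mean of facility-level COI.
    `facilities` is a list of (mw_capacity, coi) pairs."""
    total_mw = sum(mw for mw, _ in facilities)
    if total_mw == 0:
        return None  # no rated capacity; leave the county unscored
    return sum(mw * coi for mw, coi in facilities) / total_mw

# A 300 MW opaque facility outweighs a 50 MW transparent one:
print(capacity_weighted_coi([(300, 0.9), (50, 0.1)]))  # ≈ 0.79
```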
ROS × (1 + pushback_mod + COI_mod), where COI_mod = −0.05 × (1 − county_avg_coi). Fully transparent companies (COI = 0) receive the maximum credit; fully opaque companies (COI = 1) receive none.

Facility-level composite derived from PUE, grid dependency, cooling impact, backup emissions, and a transparency penalty. Quality flags: measured (real data for 3+ components), partial (1–2 real data points), insufficient (fully imputed — a default EES of 0.593 is assigned).
Higher EES = more efficient. County-level EES is capacity-weighted.
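The ROS modifier formula stated above translates directly into code (names are illustrative):

```python
def adjusted_ros(base_ros, pushback_mod, county_avg_coi):
    """Apply the community-pushback and company-transparency
    modifiers to the base Regulatory Opacity Score:
    ROS * (1 + pushback_mod + COI_mod),
    COI_mod = -0.05 * (1 - county_avg_coi)."""
    coi_mod = -0.05 * (1.0 - county_avg_coi)
    return base_ros * (1.0 + pushback_mod + coi_mod)

# Fully transparent companies (county_avg_coi = 0) earn the maximum
# 5% reduction; fully opaque ones (county_avg_coi = 1) earn none:
print(adjusted_ros(0.6, 0.0, 0.0))
print(adjusted_ros(0.6, 0.0, 1.0))
```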
Every county receives a data quality tier reflecting how much of its scoring input comes from observed source data versus estimates or imputations.
The four dimensions assessed are: (1) water stress (observed via Aqueduct vs. state-median imputed), (2) energy (EIA-923 supply data vs. MW demand proxy), (3) land (source acreage vs. MW-estimated), and (4) employment (BLS QCEW observed vs. state-residual imputed). Counties with Tier C data have the majority of their score driven by estimates and should be interpreted with caution. The tier is displayed in the county tooltip, scorecard, and sidebar.
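The tier assignment can be sketched as below. The Tier C cutoff (≤1 observed dimension) is stated in the methodology; the split between Tiers A and B is an illustrative assumption:

```python
def data_quality_tier(observed_dims: int) -> str:
    """Tier from the number of the four scoring dimensions (water,
    energy, land, employment) backed by observed source data.
    The A/B boundary here is a hypothetical cutoff."""
    if observed_dims <= 1:
        return "C"  # majority of score driven by estimates
    if observed_dims <= 3:
        return "B"
    return "A"      # all four dimensions observed
```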
CUSUM change-point detection was removed from the dashboard in V8. The available Aqueduct data covers a single year of seasonal variation, not a multi-year time series. Running CUSUM on 12 monthly values detects seasonal patterns (summer vs. winter), not genuine acceleration of water stress. The 38% flag rate confirmed this was noise, not signal. CUSUM has been replaced by the data quality tier system and facility-level mapping.
Measures whether data center stress clusters geographically or distributes randomly. A positive Moran's I indicates clustering: counties near high-stress counties tend to also be high-stress, suggesting regional infrastructure strain rather than isolated incidents. Computed on the CSI values using queen contiguity weights from the Census TIGER county shapefile.
Local Moran's I (LISA) identifies specific hot-spot and cold-spot clusters. Counties flagged as hot spots (High-High) are surrounded by other high-stress counties. These are the regional pressure zones where infrastructure strain compounds.
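A from-scratch global Moran's I over a toy adjacency structure. The dashboard uses queen contiguity weights from the TIGER shapefile; this sketch uses a hand-built four-county chain and skips row standardization for brevity:

```python
def morans_i(values, weights):
    """Global Moran's I: (n / W) * sum_ij w_ij d_i d_j / sum_i d_i^2,
    where d_i are deviations from the mean and w_ij is a symmetric
    0/1 adjacency matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    w_sum = sum(sum(row) for row in weights)
    return (n / w_sum) * (num / den)

# Four counties in a chain; stress clustered on one end vs. alternating:
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
print(morans_i([1, 1, 0, 0], chain))  # positive: clustered
print(morans_i([1, 0, 1, 0], chain))  # negative: dispersed
```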
Rather than fixed weights (0.4/0.35/0.25), entropy weighting lets the data determine how much each component contributes to the composite score. Components with more variation across counties receive higher weights; components where all counties score similarly receive lower weights. This prevents a uniformly high-stress component from dominating the index without adding discriminatory power.
Both fixed and entropy-weighted composites are computed. The dashboard defaults to fixed weights (which are more interpretable for policy audiences) but allows toggling to entropy-weighted for analytical rigor.
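One common formulation of entropy weighting is sketched below (Shannon entropy with log base n, weights proportional to 1 − entropy); the project's exact formulation may differ in normalization details:

```python
import math

def entropy_weights(columns):
    """Data-driven weights: components whose values vary more
    across counties receive larger weights. `columns` maps a
    component name to a list of non-negative county scores."""
    raw = {}
    for name, vals in columns.items():
        n = len(vals)
        total = sum(vals)
        probs = [v / total for v in vals if v > 0]
        # Shannon entropy, normalized to [0, 1] via log base n.
        entropy = -sum(p * math.log(p) for p in probs) / math.log(n)
        raw[name] = 1.0 - entropy  # degree of divergence
    s = sum(raw.values())
    return {k: v / s for k, v in raw.items()}

# A near-constant component contributes almost nothing:
print(entropy_weights({"water": [0.1, 0.9, 0.5, 0.3],
                       "land": [0.5, 0.5, 0.5, 0.5]}))
```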
Tests whether data center facility announcements (from FracTracker timeline data) Granger-cause changes in county-level water stress scores (from Aqueduct monthly data). If past facility announcements help predict future water stress beyond what water stress's own history predicts, this provides statistical evidence of a causal link between data center expansion and resource degradation.
Applied county-by-county where sufficient time-series data exists (2015 to 2026). Results reported as significant/not significant with lag selection via AIC.
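A toy lag-1 version of the test shows the core idea: compare the fit of a model that uses only water stress's own history against one that also uses lagged facility announcements. The real pipeline selects lags via AIC (statsmodels' `grangercausalitytests` is the usual tool); this sketch only illustrates the restricted-vs-unrestricted comparison, and all data below is synthetic:

```python
import random
import numpy as np

def granger_lag1(x, y):
    """F statistic for the restriction 'lagged x has no effect'
    in a lag-1 regression of y on its own past and x's past."""
    y_t = np.asarray(y[1:], dtype=float)
    y_lag = np.asarray(y[:-1], dtype=float)
    x_lag = np.asarray(x[:-1], dtype=float)
    n = len(y_t)

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, y_t, rcond=None)
        resid = y_t - design @ beta
        return float(resid @ resid)

    ones = np.ones(n)
    rss_restricted = rss(np.column_stack([ones, y_lag]))
    rss_full = rss(np.column_stack([ones, y_lag, x_lag]))
    # One restriction, n - 3 residual degrees of freedom.
    return (rss_restricted - rss_full) / (rss_full / (n - 3))

# Synthetic example where y is driven by x's past plus noise:
random.seed(0)
x = [random.random() for _ in range(60)]
y = [0.0]
for t in range(1, 60):
    y.append(0.9 * x[t - 1] + 0.1 * random.random())
print(granger_lag1(x, y))  # large F: x's past strongly predicts y
```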
Counties receive letter grades A through F based on their percentile rank within all 670 data-center-hosting counties:
Tier C data quality counties (where ≤1 of 4 scoring dimensions uses observed data) should be interpreted with particular caution. Their grades are driven primarily by estimates and imputations.
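The grade assignment can be sketched as a percentile lookup. The quintile breakpoints below are illustrative assumptions; the published index defines its own cutoffs:

```python
def letter_grade(stress_percentile: float) -> str:
    """Map a county's Composite Stress Index percentile
    (0 = least stressed among DC-hosting counties) to a grade.
    Hypothetical quintile breakpoints."""
    for cutoff, grade in [(0.2, "A"), (0.4, "B"),
                          (0.6, "C"), (0.8, "D")]:
        if stress_percentile <= cutoff:
            return grade
    return "F"
```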
Does not rank counties without facilities. Does not make causal claims. Does not generate regulatory findings. Informs judgment; it does not replace it.
Intelligence-grade analysis. Built in public. Under $500. The analyst never abdicated to the machine.
The Data Center Stress Index answers a question that should be simple: when a data center moves into a county, what does that county actually get in return?
Every county in the United States that hosts a data center is scored across three dimensions: the resource burden it bears (water consumption, grid load, and land use), the transparency of its regulatory environment (permitting visibility and diesel generator reliance), and the economic return it receives (local jobs and wages per megawatt of installed capacity). These three scores combine into a single composite index that ranks counties on a letter-grade scale from A (low burden, high return) to F (high burden, low return).
The index draws from 22 public data sources, all free, totaling more than 1 GB of raw data. The V9.1 facility dataset (4,670 facilities across 50+ variables) merges PNNL, FracTracker (April 2026), and Epoch AI (April 2026) records with company-level transparency, efficiency metrics, and estimated energy and water consumption. Water stress scores come from the World Resources Institute's Aqueduct 4.0 atlas, spatially joined to county boundaries using Census TIGER shapefiles. Energy burden comes from the EIA-923 plant-level generation reports, crosswalked to counties through EIA-860 plant locations. Land use intensity is calculated from PNNL facility footprints divided by county total land area (Census TIGER ALAND). Employment and wage data come from the Bureau of Labor Statistics Quarterly Census of Employment and Wages (NAICS 518210 and all-industry), and regulatory opacity is derived from a 6-variable State Regulatory Index covering 42 jurisdictions. Foreign ownership attribution uses a structured field from the V8 facility dataset, covering 191 foreign-owned facilities across 14 countries.
Each component is normalized to a 0-to-1 scale and combined using expert weights validated against three alternative weighting schemes (PCA-derived, entropy, and equal-weight). The Resource Burden Score uses 40% water, 35% energy, and 25% land; PCA validation confirms water and energy dominate variance. The Regulatory Opacity Score uses a 6-variable State Regulatory Index covering environmental review, resource disclosure, tax incentive accountability, permitting openness, utility rate transparency, and ownership disclosure across 42 jurisdictions, modulated by a community pushback modifier. The Economic Return Score combines jobs per MW and a wage premium ratio (data center pay vs. county all-industry average), with Monte Carlo error propagation producing 90% credible intervals and confidence tiers. The composite index blends these three scores (50% burden, 30% opacity, 20% inverted return) to produce a single ranking. V8 adds a Company Opacity Index (12-indicator corporate transparency) and Energy Efficiency Score (PUE, grid dependency, cooling impact) at the facility level.
Beyond the core index, the tool applies several analytical methods to surface patterns that a simple ranking would miss. CUSUM (cumulative sum) change-point detection was used in earlier versions to flag counties where water stress appeared to be accelerating, but was retired in V8 when review showed it detected seasonal variation rather than genuine trends. Moran's I spatial autocorrelation identifies geographic clusters of high-stress counties, revealing regional infrastructure strain rather than isolated incidents. Entropy-weighted scoring offers a data-driven alternative to fixed weights, giving more influence to components that vary most across counties. And Granger causality testing examines whether data center facility announcements statistically predict future changes in local water stress.
The Policy Impact Lens lets users simulate the effect of 12 real legislative proposals on county stress scores. Each policy is decomposed into the specific formula parameters it would affect, with percentage adjustments derived from bill text and regulatory intent. The policies span the full spectrum from federal moratoriums that would halt all new construction to executive orders that accelerate permitting on federal land. This is not prediction; it is structured scenario analysis designed to inform advocacy, journalism, and legislative debate.
The DCSI does not rank counties that have no data center facilities. It does not make causal claims about whether data centers caused resource degradation. It does not generate regulatory findings or legal conclusions. It informs judgment; it does not replace it. Every data source is public, every formula is disclosed, and every assumption is documented in the Methodology page.
The Data Center Stress Index is a project by Anna R. Dudley. The entire tool was built using publicly available data for under $500. No proprietary datasets. No gated APIs. No corporate sponsorship. The purpose is to put the same analytical capability in the hands of local officials, community advocates, and journalists that the industry's own lobbyists already have.
For questions, speaking inquiries, or data requests, contact Anna R. Dudley. Power moves before policy does.
This page is dedicated to my dear friend and mentor, J.R. Only four years ago, sitting in an old basement, you taught me how to set up my first virtual machine and how to install packages. In this new age of AI, you continue to teach the entire floor the importance of never abdicating to the machine. Let AI work for you. Don’t let it think for you. This tab shows all the errors found during the analyst-in-the-loop phase.
During the development of the DCSI pipeline and dashboard, the analyst (Anna) identified and corrected numerous AI-introduced errors. These ranged from critical scoring bugs that would have invalidated the entire index to visual rendering issues. Every error below was caught through manual review, not automated testing. This page exists as both a record and a reminder: the analyst never abdicated to the machine.