Cloudflare R2 — Bucket Setup & Access Key Management

Current Status (April 2026)

Item	Status	Detail
`urban-transparency-raw`	✅ Provisioned	Private bucket, WNAM region
`urban-transparency-processed`	✅ Provisioned + Public	r2.dev URL active (see below)
Custom domain (`data.civiscopio.com`)	📋 Backlog	Deferred — requires civiscopio.com zone in Cloudflare
R2 API keys (pipeline)	✅ N/A	Pipeline uses CF_API_TOKEN directly (no S3 creds needed)
GitHub Actions Secrets	⏳ Needed	CF_ACCOUNT_ID, CF_API_TOKEN, INEGI_API_TOKEN (see Step 2)

Public URL (current):

https://pub-892f495399ba478cbe1375809c9e3cdc.r2.dev

The r2.dev URL is rate-limited and not recommended for high-traffic production use. Connect a custom domain when ready (see Custom Domain (Backlog) below).

Overview

Two R2 buckets serve the Urban Transparency Platform pipeline:

Bucket	Purpose	Public Access
`urban-transparency-raw`	Raw ingested files (CSV, JSON, PDF, GeoJSON)	Private
`urban-transparency-processed`	Analysis-ready files served to site (Parquet, GeoJSON, chart JSON)	Public via r2.dev (custom domain TBD)

Remaining Board Actions

Step 1 — Already done ✅

Both buckets were provisioned by Terraform. Public access on urban-transparency-processed is enabled with the r2.dev URL above.

Step 2 — Add GitHub Actions Secrets

The pipeline now uses CF_ACCOUNT_ID + CF_API_TOKEN directly (the same token you already have). No separate R2 API token is needed — the ingest scripts were rewritten to use the Cloudflare REST API instead of the S3-compatible API.

Go to the repo → Settings → Secrets and variables → Actions → New repository secret. Add:

Secret Name	Value
`CF_ACCOUNT_ID`	Your Cloudflare Account ID (`24545b09d114aa250c46b2703991cd7e`)
`CF_API_TOKEN`	Your `CLOUDFLARE_API_TOKEN` — the same one used for Terraform
`INEGI_API_TOKEN`	API token from the INEGI developer portal

CF_ACCOUNT_ID and CF_API_TOKEN may already be set from the Cloudflare Pages deploy workflow. If so, you only need to add INEGI_API_TOKEN.
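
As a rough illustration of what an upload through the Cloudflare REST API (rather than the S3-compatible API) might look like, the sketch below builds an authenticated PUT request from the two Cloudflare secrets. The endpoint path, the helper name `build_upload_request`, and the example object key are assumptions made for illustration and are not taken from the ingest scripts. Check the path against the Cloudflare API reference before relying on it.

```python
import urllib.parse
import urllib.request

# Assumed base URL and path layout for R2 object uploads via the REST API.
API_BASE = "https://api.cloudflare.com/client/v4"


def build_upload_request(account_id: str, api_token: str, bucket: str,
                         key: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) a PUT request that uploads `body` to bucket/key."""
    # Percent-encode the key but keep "/" so the folder structure is preserved.
    encoded_key = urllib.parse.quote(key, safe="/")
    url = (
        f"{API_BASE}/accounts/{account_id}"
        f"/r2/buckets/{bucket}/objects/{encoded_key}"
    )
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        # The same token that Terraform uses, passed as a bearer token.
        headers={"Authorization": f"Bearer {api_token}"},
    )
```

In a workflow step, `account_id` and `api_token` would be read from the `CF_ACCOUNT_ID` and `CF_API_TOKEN` environment variables, and the returned request would be sent with `urllib.request.urlopen`.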

Folder Structure

`urban-transparency-raw` (private)

urban-transparency-raw/
├── inegi/
│   ├── indicators/{YYYY-MM-DD}/amm_municipalities.csv
│   └── denue/{YYYY-MM-DD}/amm_economic_units.json
├── conagua/
│   └── climate/{YYYY-MM-DD}/monterrey_climate.csv
├── osm/
│   └── boundaries/{YYYY-MM-DD}/amm_boundaries.geojson
├── scraped/{source-name}/{YYYY-MM-DD}/*.csv|*.json|*.pdf
└── _metadata/last_run.json
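
The date-partitioned layout above can be generated rather than typed by hand. The following is a minimal sketch. The helper name `raw_key` is hypothetical, and it assumes every raw file follows the `{source}/{dataset}/{YYYY-MM-DD}/{filename}` shape shown in the tree.

```python
from datetime import date


def raw_key(source: str, dataset: str, run_date: date, filename: str) -> str:
    """Return the object key for a raw file, partitioned by ingest date."""
    # date.isoformat() yields YYYY-MM-DD, matching the partition folders.
    return f"{source}/{dataset}/{run_date.isoformat()}/{filename}"


print(raw_key("inegi", "indicators", date(2026, 4, 7), "amm_municipalities.csv"))
# inegi/indicators/2026-04-07/amm_municipalities.csv
```

Scraped files fit the same shape with `source="scraped"` and the source name passed as `dataset`.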

`urban-transparency-processed` (public via r2.dev / future custom domain)

urban-transparency-processed/
├── parquet/
│   ├── inegi/{YYYY-MM-DD}/amm_census.parquet
│   └── conagua/{YYYY-MM-DD}/amm_climate.parquet
├── geojson/
│   ├── boundaries/amm_colonias.geojson
│   ├── boundaries/amm_municipalities.geojson
│   └── layers/heat_stress.geojson
├── charts/{article-slug}/*.json
└── _metadata/last_run.json

Public URL pattern (current):

https://pub-892f495399ba478cbe1375809c9e3cdc.r2.dev/geojson/boundaries/amm_colonias.geojson
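
Because the base URL is expected to change once a custom domain is connected, it may help to derive public URLs from a single constant. This is a sketch only. `public_url` and `R2_DEV_BASE` are hypothetical names, not part of the pipeline.

```python
from urllib.parse import quote

# Current r2.dev base; swap in the custom domain once it is connected.
R2_DEV_BASE = "https://pub-892f495399ba478cbe1375809c9e3cdc.r2.dev"


def public_url(key: str, base: str = R2_DEV_BASE) -> str:
    """Join a base URL and an object key from urban-transparency-processed."""
    # Normalize slashes so the base and key never produce "//" or a missing "/".
    return f"{base.rstrip('/')}/{quote(key.lstrip('/'), safe='/')}"


print(public_url("geojson/boundaries/amm_colonias.geojson"))
```

Passing `base="https://data.civiscopio.com"` yields the future custom-domain form of the same URL without touching the callers.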

Naming Conventions

Bucket names: lowercase, hyphenated, prefixed urban-transparency-
Date partitions: ISO 8601 YYYY-MM-DD
Parquet/GeoJSON: descriptive snake_case
Chart JSON: kebab-case matching article slug
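
These conventions can be checked mechanically before an upload. The sketch below is one possible reading of them. The function names are hypothetical, and the exact character set allowed in names (lowercase letters and digits) is an assumption.

```python
import re
from datetime import date

BUCKET_PREFIX = "urban-transparency-"
SNAKE_CASE = r"[a-z0-9]+(_[a-z0-9]+)*"
KEBAB_CASE = r"[a-z0-9]+(-[a-z0-9]+)*"


def is_snake_case(stem: str) -> bool:
    """Check a Parquet/GeoJSON file stem such as 'amm_census'."""
    return re.fullmatch(SNAKE_CASE, stem) is not None


def is_kebab_case(slug: str) -> bool:
    """Check a chart folder name against the article-slug style."""
    return re.fullmatch(KEBAB_CASE, slug) is not None


def is_valid_bucket(name: str) -> bool:
    """Lowercase, hyphenated, and prefixed with urban-transparency-."""
    return name.startswith(BUCKET_PREFIX) and is_kebab_case(name[len(BUCKET_PREFIX):])


def is_valid_partition(value: str) -> bool:
    """True only for a real calendar date written as YYYY-MM-DD."""
    # The regex rejects compact forms that some Python versions would parse.
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value) is None:
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True
```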

`_metadata/last_run.json` schema

{
  "timestamp": "2026-04-07T14:00:00Z",
  "run_id": "github-run-12345",
  "sources": {
    "inegi":   { "rows": 132, "status": "ok" },
    "conagua": { "rows": 365, "status": "ok" }
  }
}
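
A consumer of this file might validate it before trusting the run. The following is a minimal sketch based only on the fields shown in the example. `validate_last_run` is a hypothetical helper, and since the set of allowed `status` values other than `"ok"` is not documented here, the check only requires a string.

```python
import json
from datetime import datetime


def validate_last_run(raw: str) -> dict:
    """Parse last_run.json; raise ValueError if a shown field is missing or malformed."""
    doc = json.loads(raw)
    for field in ("timestamp", "run_id", "sources"):
        if field not in doc:
            raise ValueError(f"missing field: {field}")
    # Replace the trailing "Z" so older Python versions can parse the timestamp.
    datetime.fromisoformat(doc["timestamp"].replace("Z", "+00:00"))
    if not isinstance(doc["sources"], dict):
        raise ValueError("sources must be an object keyed by source name")
    for name, info in doc["sources"].items():
        if not isinstance(info.get("rows"), int) or info["rows"] < 0:
            raise ValueError(f"{name}: rows must be a non-negative integer")
        if not isinstance(info.get("status"), str):
            raise ValueError(f"{name}: status must be a string")
    return doc
```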

Validation (after Step 2 is complete)

# Trigger data pipeline manually
# GitHub → Actions → Data Ingest → Run workflow

# Verify files landed in raw bucket
wrangler r2 object get urban-transparency-raw/_metadata/last_run.json --pipe

# Verify CDN public access (use r2.dev URL until custom domain is set up)
curl -I https://pub-892f495399ba478cbe1375809c9e3cdc.r2.dev/_metadata/last_run.json

Custom Domain (Backlog)

When civiscopio.com is added as a zone in this Cloudflare account:

R2 → urban-transparency-processed → Settings → Public Access → Connect Domain
Enter data.civiscopio.com
Update the public_domain default in terraform/variables.tf to data.civiscopio.com
Replace all references to the r2.dev URL with https://data.civiscopio.com