AWS Architecture Documentation
Note: The architecture described below is a planned design for an AWS Free Tier analytics pipeline. It has not been deployed to AWS. This document serves as a deployment plan and demonstrates cloud architecture knowledge for portfolio purposes.
Overview
This project is designed around a five-stage AWS Free Tier analytics pipeline that transforms a raw CSV file into a queryable, dashboard-ready dataset.
Raw CSV → Amazon S3 → AWS Glue Data Catalog → Amazon Athena → Amazon QuickSight → GitHub Pages
Pipeline Stages
Stage 1: Amazon S3 — Raw Data Storage
Purpose: Store the raw complaints.csv file as the single source of truth.
Implementation plan:
- Create an S3 bucket: `your-bucket-name`
- Upload `complaints.csv` to: `s3://your-bucket-name/consumer-complaints/raw/complaints.csv`
- Enable versioning to preserve the original dataset
- Set bucket policy to restrict public access
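The steps above could be scripted roughly as follows. This is a minimal sketch using boto3, not a tested deployment script: the bucket name is the placeholder from the plan, the `us-east-1` default region is an assumption, and the function names are hypothetical. It also uses S3 Block Public Access settings rather than a hand-written bucket policy, which is one common way to restrict public access.

```python
"""Sketch of the Stage 1 plan with boto3 (never run against a real account)."""

BUCKET = "your-bucket-name"            # placeholder from the plan
RAW_PREFIX = "consumer-complaints/raw/"


def raw_object_key(filename: str, prefix: str = RAW_PREFIX) -> str:
    """Return the S3 key for a raw file under the planned prefix."""
    return prefix.rstrip("/") + "/" + filename


def raw_object_uri(bucket: str, filename: str) -> str:
    """Return the full s3:// URI for a raw file."""
    return f"s3://{bucket}/{raw_object_key(filename)}"


def provision_raw_bucket(bucket: str, local_csv: str, region: str = "us-east-1") -> None:
    """Create the bucket, enable versioning, block public access, upload the CSV."""
    import boto3  # imported lazily so the helpers above work without boto3

    s3 = boto3.client("s3", region_name=region)

    # us-east-1 rejects an explicit LocationConstraint; other regions require one.
    if region == "us-east-1":
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )

    # Versioning preserves the original dataset if the object is overwritten.
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Assumption: Block Public Access is used instead of a custom bucket policy.
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )

    s3.upload_file(local_csv, bucket, raw_object_key("complaints.csv"))
```

Calling `provision_raw_bucket(BUCKET, "complaints.csv")` would need AWS credentials with S3 permissions; the two key helpers are pure functions and can be checked locally.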
Free Tier limits:
- 5 GB of standard storage
- 20,000 GET requests and 2,000 PUT requests per month
- This dataset (540 KB) is well within Free Tier limits
Stage 2: AWS Glue Data Catalog — Schema Registration
Purpose: Register the CSV schema so Athena can query it without manual table creation.
Implementation plan:
- Create a Glue database: `consumer_complaints`
- Run a Glue Crawler pointed at `s3://your-bucket-name/consumer-complaints/raw/`
- The crawler auto-detects column names, data types, and row count
- Alternatively, use the manual `CREATE EXTERNAL TABLE` statement in `sql/01_create_athena_external_table.sql`
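For the crawler route, a boto3 sketch might look like the one below. The crawler name and the IAM role ARN are hypothetical (the plan does not name either), and the role would need read access to the raw S3 prefix plus Glue permissions.

```python
"""Sketch of the Stage 2 plan with boto3 (never run against a real account)."""

DATABASE = "consumer_complaints"
CRAWLER_NAME = "consumer-complaints-raw-crawler"   # hypothetical name
RAW_PATH = "s3://your-bucket-name/consumer-complaints/raw/"


def crawler_request(role_arn: str, name: str = CRAWLER_NAME,
                    database: str = DATABASE, s3_path: str = RAW_PATH) -> dict:
    """Build the keyword arguments for glue.create_crawler."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def register_schema(role_arn: str, region: str = "us-east-1") -> None:
    """Create the database, create the crawler, and start one crawl."""
    import boto3  # imported lazily so crawler_request works without boto3

    glue = boto3.client("glue", region_name=region)
    glue.create_database(DatabaseInput={"Name": DATABASE})
    glue.create_crawler(**crawler_request(role_arn))
    glue.start_crawler(Name=CRAWLER_NAME)
```

The crawl runs asynchronously, so the table appears in the `consumer_complaints` database only after the crawler finishes.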
Free Tier limits:
- 1 million objects stored in the Data Catalog per month
- Glue Crawlers: first 1 million DPU-seconds per month are free
Stage 3: Amazon Athena — SQL Querying
Purpose: Run SQL queries against the S3-stored CSV using standard Presto/Trino syntax.
Implementation plan:
- Point Athena to the Glue Data Catalog database `consumer_complaints`
- Set query result location: `s3://your-bucket-name/athena-results/`
- Run all queries from the `sql/` directory
- Use semantic views (`sql/06_semantic_views.sql`) as the governed query layer
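As an illustration of how the database and result location fit together, here is a sketch that submits one query through boto3 and waits for it to finish. The function names, the default region, and the one-second polling interval are assumptions; the database name and output location come from the plan.

```python
"""Sketch of running one Athena query with boto3 (never run against AWS)."""
import time

DATABASE = "consumer_complaints"
RESULTS = "s3://your-bucket-name/athena-results/"


def query_request(sql: str, database: str = DATABASE, output: str = RESULTS) -> dict:
    """Build the keyword arguments for athena.start_query_execution."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output},
    }


def run_query(sql: str, region: str = "us-east-1", poll_seconds: float = 1.0) -> str:
    """Submit a query, poll until it reaches a terminal state, return its ID."""
    import boto3  # imported lazily so query_request works without boto3

    athena = boto3.client("athena", region_name=region)
    query_id = athena.start_query_execution(**query_request(sql))["QueryExecutionId"]

    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason reported")
        raise RuntimeError(f"Query {query_id} ended as {status['State']}: {reason}")
    return query_id
```

For example, `run_query("SELECT * FROM vw_company_summary LIMIT 10")` would write its result file under the `athena-results/` prefix once the semantic views exist.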
Free Tier limits:
- Athena has no Free Tier allowance; every query is billed by the amount of data it scans
- Cost: $5 per TB of data scanned
- This dataset is 540 KB, so each query is billed at Athena's 10 MB minimum; estimated cost per query: less than $0.01
- Total estimated cost for all 6 SQL files: under $0.05
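The arithmetic behind these estimates can be checked with a few lines of Python. This sketch assumes the $5 per TB rate quoted above, Athena's 10 MB minimum charge per query, and binary units (1 TB = 1024^4 bytes); it ignores rounding to the nearest megabyte, which does not change the conclusion for a file this small.

```python
"""Back-of-the-envelope Athena cost estimate for the 540 KB dataset."""

PRICE_PER_TB_USD = 5.00
BYTES_PER_MB = 1024 ** 2
BYTES_PER_TB = 1024 ** 4
MIN_BILLED_BYTES = 10 * BYTES_PER_MB   # assumed 10 MB minimum per query


def athena_query_cost(bytes_scanned: int) -> float:
    """Return the estimated USD cost of one query scanning bytes_scanned."""
    billed = max(bytes_scanned, MIN_BILLED_BYTES)
    return billed / BYTES_PER_TB * PRICE_PER_TB_USD


dataset_bytes = 540 * 1024
per_query = athena_query_cost(dataset_bytes)
print(f"per query:  ${per_query:.6f}")
print(f"10 queries: ${10 * per_query:.6f}")
```

Under these assumptions a single full scan costs roughly $0.00005, so even a few hundred queries would stay below one cent.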
Cost optimization tips:
- Use columnar formats (Parquet) for larger datasets
- Partition data by year/month to reduce scan size
- Use `LIMIT` clauses during development
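To make the partitioning tip concrete, the sketch below builds Hive-style S3 keys (`year=YYYY/month=MM/`), a layout that Glue and Athena can treat as partition columns. The `curated/` prefix, the function name, and the Parquet file name are hypothetical; the current plan stores only a single raw CSV.

```python
"""Sketch of a Hive-style partitioned key layout for a larger dataset."""
from datetime import date


def partitioned_key(received: date, filename: str,
                    prefix: str = "consumer-complaints/curated/") -> str:
    """Return an S3 key partitioned by year and zero-padded month."""
    return f"{prefix}year={received.year}/month={received.month:02d}/{filename}"


print(partitioned_key(date(2024, 3, 15), "part-0000.parquet"))
```

A query that filters on `year` and `month` would then scan only the objects under the matching prefixes instead of the whole dataset.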
Stage 4: Amazon QuickSight — Dashboard
Purpose: Build an interactive dashboard from the Athena semantic views.
Implementation plan:
- Connect QuickSight to the Athena data source
- Import the four semantic views as QuickSight datasets:
  - `vw_company_summary` → Company leaderboard visual
  - `vw_monthly_trend` → Time-series line chart
  - `vw_response_summary` → Response outcome donut chart
  - `vw_subissue_detail` → Sub-issue breakdown table
- Publish as a shareable dashboard
Free Tier limits:
- QuickSight offers a 30-day free trial for new accounts
- Standard edition: $9/user/month after trial
- For portfolio purposes, exported Python/matplotlib charts (in `visuals/`) serve as the dashboard layer
Stage 5: GitHub Pages — Documentation
Purpose: Publish the project documentation as a public-facing website.
Implementation plan:
- Enable GitHub Pages on the repository (Settings → Pages → Deploy from branch)
- Set source branch to `main`, folder to `/` (root)
- Jekyll automatically builds the site from `_config.yml` and Markdown files
- The site is free and publicly accessible at `https://username.github.io/aws-consumer-complaint-intelligence/`
Free Tier: GitHub Pages is free for public repositories.
Architecture Diagram
See architecture/aws_architecture_diagram.md for the full Mermaid diagram.
Estimated Monthly Cost (AWS Free Tier)
| Service | Usage | Estimated Cost |
|---|---|---|
| Amazon S3 | 540 KB storage, ~50 requests/month | $0.00 (Free Tier) |
| AWS Glue | 1 crawler run | $0.00 (Free Tier) |
| Amazon Athena | ~10 queries × 540 KB scanned | < $0.01 |
| Amazon QuickSight | 30-day trial | $0.00 (trial) |
| GitHub Pages | Static site hosting | $0.00 (free) |
| Total | | < $0.01/month |
What Would Be Needed to Deploy
- An AWS account (free to create)
- AWS CLI configured with appropriate IAM permissions
- An S3 bucket created in your preferred region
- The `complaints.csv` file uploaded to the S3 path
- Athena query result bucket configured
- Glue crawler run once to register the schema
- QuickSight account connected to Athena
All SQL files in the sql/ directory are ready to run in Athena once the table is created.