AWS Architecture Documentation

Note: The architecture described below is a design for an AWS Free Tier analytics pipeline that has not been deployed to AWS. This document serves as a deployment plan and demonstrates cloud architecture knowledge for portfolio purposes.

Overview

This project is designed around a five-stage AWS Free Tier analytics pipeline that transforms a raw CSV file into a queryable, dashboard-ready dataset.

Raw CSV → Amazon S3 → AWS Glue Data Catalog → Amazon Athena → Amazon QuickSight → GitHub Pages

Pipeline Stages

Stage 1: Amazon S3 — Raw Data Storage

Purpose: Store the raw complaints.csv file as the single source of truth.

Implementation plan:

Create an S3 bucket: your-bucket-name
Upload complaints.csv to: s3://your-bucket-name/consumer-complaints/raw/complaints.csv
Enable versioning to preserve the original dataset
Enable S3 Block Public Access on the bucket so that no object can be exposed publicly
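
The steps above could be scripted roughly as follows. This is a minimal sketch, not a tested deployment script: it assumes a boto3 S3 client is passed in, the function name provision_raw_bucket is hypothetical, and public access is restricted with S3 Block Public Access, which is an assumption about the intended setup. Versioning is enabled before the upload so that the first copy of the file already receives a version ID.

```python
def provision_raw_bucket(s3, bucket, region, csv_path, key):
    """Create the bucket, block public access, enable versioning, upload the CSV.

    `s3` is expected to be a boto3 S3 client (assumption); any object with
    the same four methods works, which keeps the function easy to test.
    """
    if region == "us-east-1":
        # us-east-1 does not accept an explicit LocationConstraint.
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )

    # Block every form of public access at the bucket level.
    s3.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )

    # Keep every version of the raw file.
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )

    s3.upload_file(csv_path, bucket, key)
    return f"s3://{bucket}/{key}"


# Example call (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# s3 = boto3.client("s3", region_name="us-east-1")
# provision_raw_bucket(s3, "your-bucket-name", "us-east-1",
#                      "complaints.csv", "consumer-complaints/raw/complaints.csv")
```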

Free Tier limits:

5 GB of standard storage
20,000 GET requests and 2,000 PUT requests per month
This dataset (540 KB) is well within Free Tier limits

Stage 2: AWS Glue Data Catalog — Schema Registration

Purpose: Register the CSV schema so Athena can query it without manual table creation.

Implementation plan:

Create a Glue database: consumer_complaints
Run a Glue Crawler pointed at s3://your-bucket-name/consumer-complaints/raw/
The crawler auto-detects column names, data types, and row count
Alternatively, use the manual CREATE EXTERNAL TABLE statement in sql/01_create_athena_external_table.sql
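
The crawler route could be sketched with boto3 as shown here. It assumes a boto3 Glue client is passed in; the function name register_schema, the crawler name, and the IAM role ARN in the example call are hypothetical placeholders, and the role must already exist with read access to the bucket.

```python
def register_schema(glue, database, crawler_name, role_arn, s3_path):
    """Create the Glue database, define a crawler over the raw prefix, start it.

    `glue` is expected to be a boto3 Glue client (assumption).
    """
    glue.create_database(DatabaseInput={"Name": database})

    glue.create_crawler(
        Name=crawler_name,
        Role=role_arn,                 # IAM role the crawler assumes
        DatabaseName=database,         # where discovered tables are registered
        Targets={"S3Targets": [{"Path": s3_path}]},
    )

    # The crawler runs asynchronously; this call only starts it.
    glue.start_crawler(Name=crawler_name)


# Example call (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# glue = boto3.client("glue")
# register_schema(
#     glue,
#     database="consumer_complaints",
#     crawler_name="consumer-complaints-raw",                      # hypothetical
#     role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",   # hypothetical
#     s3_path="s3://your-bucket-name/consumer-complaints/raw/",
# )
```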

Free Tier limits:

1 million objects stored in the Data Catalog per month
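Glue Crawlers are not covered by the Free Tier: they are billed per DPU-hour with a 10-minute minimum per run, so the manual CREATE EXTERNAL TABLE route is the zero-cost option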

Stage 3: Amazon Athena — SQL Querying

Purpose: Run SQL queries against the S3-stored CSV using Athena's Trino/Presto-based SQL dialect.

Implementation plan:

Point Athena to the Glue Data Catalog database consumer_complaints
Set query result location: s3://your-bucket-name/athena-results/
Run all queries from the sql/ directory
Use semantic views (sql/06_semantic_views.sql) as the governed query layer
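
Running a query programmatically could look like the sketch below. It assumes a boto3 Athena client is passed in, and the function name run_query is hypothetical. Athena executes queries asynchronously, so the function starts the query and polls until it reaches a terminal state. Each call submits a single statement, so a file in sql/ that holds several statements would need to be split before being sent this way.

```python
import time

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Start one Athena query and wait for it to finish.

    `athena` is expected to be a boto3 Athena client (assumption).
    Returns (query_execution_id, final_state).
    """
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]

    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)  # still QUEUED or RUNNING


# Example call (requires boto3 and AWS credentials, so it is left commented out):
# import boto3
# athena = boto3.client("athena")
# run_query(
#     athena,
#     "SELECT * FROM vw_company_summary LIMIT 10",
#     database="consumer_complaints",
#     output_location="s3://your-bucket-name/athena-results/",
# )
```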

Free Tier limits:

Athena has no Free Tier allowance; every query is billed by the amount of data scanned
Cost: $5 per TB of data scanned
This dataset is 540 KB, so each query is billed at Athena's 10 MB per-query minimum: far less than $0.01 per query
Total estimated cost for all 6 SQL files: under $0.05
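
The arithmetic behind these estimates can be checked with a few lines of Python. This is a rough sketch: it uses the $5 per TB figure quoted above, applies Athena's 10 MB minimum charge per query, and assumes binary units (1 TB = 1024^4 bytes), which may differ slightly from how AWS meters usage.

```python
PRICE_PER_TB_USD = 5.00            # figure quoted in this document
MIN_BILLED_BYTES = 10 * 1024**2    # Athena bills at least 10 MB per query
BYTES_PER_TB = 1024**4             # assumption: binary terabyte


def query_cost_usd(scanned_bytes):
    """Estimated cost of one query that scans `scanned_bytes` bytes."""
    billed_bytes = max(scanned_bytes, MIN_BILLED_BYTES)
    return billed_bytes / BYTES_PER_TB * PRICE_PER_TB_USD


dataset_bytes = 540 * 1024         # the 540 KB complaints.csv
per_query = query_cost_usd(dataset_bytes)

print(f"per query:  ${per_query:.6f}")       # per query:  $0.000048
print(f"10 queries: ${10 * per_query:.6f}")  # 10 queries: $0.000477
```

Because the file is smaller than the 10 MB minimum, the cost per query is set by that minimum rather than by the file size.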

Cost optimization tips:

Use columnar formats (Parquet) for larger datasets
Partition data by year/month to reduce scan size
Use LIMIT clauses during development to keep result sets small; note that LIMIT alone does not guarantee that less data is scanned
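
Year/month partitioning usually relies on Hive-style key=value folder names, which Athena and Glue can map to partition columns. The helper below is a small sketch of how such a prefix could be built; the function name partitioned_prefix and the "curated" folder are hypothetical and not part of the current repository layout.

```python
from datetime import date


def partitioned_prefix(base_prefix, day):
    """Return a Hive-style year=/month= prefix for the given date."""
    base = base_prefix.rstrip("/")
    return f"{base}/year={day.year}/month={day.month:02d}/"


print(partitioned_prefix(
    "s3://your-bucket-name/consumer-complaints/curated",  # hypothetical prefix
    date(2024, 3, 15),
))
# s3://your-bucket-name/consumer-complaints/curated/year=2024/month=03/
```

A query that filters on year and month would then only read the objects under the matching prefixes.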

Stage 4: Amazon QuickSight — Dashboard

Purpose: Build an interactive dashboard from the Athena semantic views.

Implementation plan:

Connect QuickSight to the Athena data source
Import the four semantic views as QuickSight datasets:
- vw_company_summary → Company leaderboard visual
- vw_monthly_trend → Time-series line chart
- vw_response_summary → Response outcome donut chart
- vw_subissue_detail → Sub-issue breakdown table
Publish as a shareable dashboard

Free Tier limits:

QuickSight offers a 30-day free trial for new accounts
Standard edition: $9/user/month after trial
For portfolio purposes, exported Python/matplotlib charts (in visuals/) serve as the dashboard layer

Stage 5: GitHub Pages — Documentation

Purpose: Publish the project documentation as a public-facing website.

Implementation plan:

Enable GitHub Pages on the repository (Settings → Pages → Deploy from branch)
Set source branch to main, folder to / (root)
Jekyll automatically builds the site from _config.yml and Markdown files
The site is free and publicly accessible at https://username.github.io/aws-consumer-complaint-intelligence/

Free Tier: GitHub Pages is free for public repositories.

Architecture Diagram

See architecture/aws_architecture_diagram.md for the full Mermaid diagram.

Estimated Monthly Cost (AWS Free Tier)

Service	Usage	Estimated Cost
Amazon S3	540 KB storage, ~50 requests/month	$0.00 (Free Tier)
AWS Glue	Data Catalog table; optional 1 crawler run	$0.00 for the catalog (Free Tier); a crawler run is billed per DPU-hour
Amazon Athena	~10 queries × 540 KB scanned	< $0.01
Amazon QuickSight	30-day trial	$0.00 (trial)
GitHub Pages	Static site hosting	$0.00 (free)
Total		< $0.01/month, excluding any crawler run

What Would Be Needed to Deploy

An AWS account (free to create)
AWS CLI configured with appropriate IAM permissions
An S3 bucket created in your preferred region
The complaints.csv uploaded to the S3 path
Athena query result bucket configured
Glue crawler run once to register the schema
QuickSight account connected to Athena

All SQL files in the sql/ directory are ready to run in Athena once the table is created.