AWS Architecture Documentation

Note: The architecture described below is a design for an analytics pipeline sized to fit the AWS Free Tier; it has not been deployed to AWS. This document serves as a deployment plan and demonstrates cloud architecture knowledge for portfolio purposes.


Overview

This project is designed around a five-stage, low-cost analytics pipeline (four AWS services plus GitHub Pages) that transforms a raw CSV file into a queryable, dashboard-ready dataset and publishes its documentation.

Raw CSV → Amazon S3 → AWS Glue Data Catalog → Amazon Athena → Amazon QuickSight → GitHub Pages

Pipeline Stages

Stage 1: Amazon S3 — Raw Data Storage

Purpose: Store the raw complaints.csv file as the single source of truth.

Implementation plan:

  • Create an S3 bucket: your-bucket-name
  • Upload complaints.csv to: s3://your-bucket-name/consumer-complaints/raw/complaints.csv
  • Enable versioning to preserve the original dataset
  • Keep S3 Block Public Access enabled on the bucket (the default for new buckets) so the data cannot be exposed publicly
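
The bullets above can be sketched as an ordered list of boto3 S3 client calls. This is a minimal sketch, not a tested deployment script: the helper names `s3_stage_requests` and `apply` are hypothetical, the default region is an assumption, and public access is restricted here through S3 Block Public Access settings rather than a hand-written bucket policy.

```python
def s3_stage_requests(bucket: str, region: str = "us-east-1") -> list[tuple[str, dict]]:
    """Return (boto3 S3 client method, kwargs) pairs for Stage 1, in order."""
    create_kwargs = {"Bucket": bucket}
    if region != "us-east-1":
        # Outside us-east-1, S3 requires an explicit location constraint.
        create_kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return [
        ("create_bucket", create_kwargs),
        ("put_public_access_block", {
            "Bucket": bucket,
            "PublicAccessBlockConfiguration": {
                "BlockPublicAcls": True,
                "IgnorePublicAcls": True,
                "BlockPublicPolicy": True,
                "RestrictPublicBuckets": True,
            },
        }),
        ("put_bucket_versioning", {
            "Bucket": bucket,
            "VersioningConfiguration": {"Status": "Enabled"},
        }),
        ("upload_file", {
            "Filename": "complaints.csv",  # local path is an assumption
            "Bucket": bucket,
            "Key": "consumer-complaints/raw/complaints.csv",
        }),
    ]


def apply(requests: list[tuple[str, dict]], region: str = "us-east-1") -> None:
    """Send the requests to AWS. Requires boto3 and credentials; not run here."""
    import boto3  # imported lazily so the plan can be inspected offline

    s3 = boto3.client("s3", region_name=region)
    for method, kwargs in requests:
        getattr(s3, method)(**kwargs)
```

Keeping the request list separate from `apply` lets the plan be reviewed or printed without an AWS account, which fits a design that has not been deployed.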

Free Tier limits:

  • 5 GB of standard storage
  • 20,000 GET requests and 2,000 PUT requests per month
  • This dataset (540 KB) is well within Free Tier limits

Stage 2: AWS Glue Data Catalog — Schema Registration

Purpose: Register the CSV schema so Athena can query it without manual table creation.

Implementation plan:

  • Create a Glue database: consumer_complaints
  • Run a Glue Crawler pointed at s3://your-bucket-name/consumer-complaints/raw/
  • The crawler auto-detects column names, data types, and row count
  • Alternatively, use the manual CREATE EXTERNAL TABLE statement in sql/01_create_athena_external_table.sql
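
To illustrate the manual alternative, the sketch below builds a CREATE EXTERNAL TABLE statement from a CSV header row. It is not the content of sql/01_create_athena_external_table.sql: the function name, the table name `complaints`, and the sample header are assumptions, and every column is declared as `string`, which is the safe default when reading CSV through OpenCSVSerde.

```python
import csv
import io
import re


def build_external_table_ddl(csv_text: str, database: str, table: str, location: str) -> str:
    """Build an Athena CREATE EXTERNAL TABLE statement from a CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)))
    # Normalise header names to lowercase snake_case identifiers.
    names = [re.sub(r"[^0-9a-z]+", "_", h.strip().lower()).strip("_") for h in header]
    # Declare every column as string; cast to dates or numbers in views.
    columns = ",\n".join(f"  `{name}` string" for name in names)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{columns}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        "WITH SERDEPROPERTIES ('separatorChar' = ',', 'quoteChar' = '\"')\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('skip.header.line.count' = '1')"
    )


# Hypothetical two-column sample; the real header comes from complaints.csv.
sample = "Date received,Product\n2024-01-01,Loan\n"
print(build_external_table_ddl(
    sample,
    database="consumer_complaints",
    table="complaints",
    location="s3://your-bucket-name/consumer-complaints/raw/",
))
```

The LOCATION points at the folder, not the file, because Athena reads every object under the prefix as part of the table.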

Free Tier limits:

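  • Data Catalog: the first 1 million objects stored and the first 1 million requests per month are free
  • Glue Crawlers: billed per DPU-hour with a 10-minute minimum per run and not covered by the Data Catalog free tier; the manual CREATE EXTERNAL TABLE route avoids this charge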

Stage 3: Amazon Athena — SQL Querying

Purpose: Run SQL queries against the S3-stored CSV using standard Presto/Trino syntax.

Implementation plan:

  • Point Athena to the Glue Data Catalog database consumer_complaints
  • Set query result location: s3://your-bucket-name/athena-results/
  • Run all queries from the sql/ directory
  • Use semantic views (sql/06_semantic_views.sql) as the governed query layer
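
The first three bullets map onto a single Athena API request. The sketch below assembles the keyword arguments for boto3's `start_query_execution`; the helper names `athena_query_request` and `run_query` are hypothetical, and the example query against `vw_monthly_trend` is only an illustration.

```python
def athena_query_request(sql: str, bucket: str, database: str = "consumer_complaints") -> dict:
    """Keyword arguments for the boto3 Athena client's start_query_execution()."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": f"s3://{bucket}/athena-results/"},
    }


def run_query(sql: str, bucket: str) -> str:
    """Submit the query and return its execution ID. Requires boto3 and credentials."""
    import boto3  # imported lazily so the request can be built offline

    athena = boto3.client("athena")
    response = athena.start_query_execution(**athena_query_request(sql, bucket))
    return response["QueryExecutionId"]


request = athena_query_request("SELECT * FROM vw_monthly_trend LIMIT 10", "your-bucket-name")
print(request["ResultConfiguration"]["OutputLocation"])
```

Athena executes one statement per request, so a file in sql/ that contains several statements would need to be split before being submitted this way.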

Free Tier limits:

  • Athena has no free tier; every query is billed by the amount of data scanned
  • Cost: $5 per TB of data scanned, with a 10 MB minimum charge per query
  • This dataset is 540 KB, so each query is billed at the 10 MB minimum: well under $0.01
  • Total estimated cost for all 6 SQL files: under $0.05
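
The estimate above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the $5 per TB rate is the one quoted here, Athena's 10 MB per-query minimum is applied, and sizes are treated as binary units (1 TB = 1024^4 bytes), which may differ slightly from how AWS rounds.

```python
PRICE_PER_TB_USD = 5.00            # rate quoted in this document
MIN_BILLED_BYTES = 10 * 1024**2    # Athena bills at least 10 MB per query
BYTES_PER_TB = 1024**4             # assumption: binary terabyte


def athena_query_cost(bytes_scanned: int) -> float:
    """Estimated USD cost of one query that scans `bytes_scanned` bytes."""
    billed = max(bytes_scanned, MIN_BILLED_BYTES)
    return billed / BYTES_PER_TB * PRICE_PER_TB_USD


dataset_bytes = 540 * 1024                      # 540 KB CSV
per_query = athena_query_cost(dataset_bytes)    # billed at the 10 MB minimum
print(f"per query: ${per_query:.6f}")
print(f"100 queries: ${100 * per_query:.4f}")
```

Because the file is smaller than the minimum billable scan, each query costs roughly $0.00005, so even a hundred development queries stay far below one cent.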

Cost optimization tips:

  • Use columnar formats (Parquet) for larger datasets
  • Partition data by year/month to reduce scan size
  • Use LIMIT clauses during development
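
As an illustration of the partitioning tip, the sketch below derives a Hive-style `year=/month=` object key from a date. The `partitioned/` prefix and the function name are hypothetical; the current plan stores a single unpartitioned file, so this only matters if the dataset grows.

```python
from datetime import date


def partition_key(received: date, filename: str = "complaints.csv") -> str:
    """S3 object key for a file stored under Hive-style year/month partitions."""
    return (
        "consumer-complaints/partitioned/"
        f"year={received.year}/month={received.month:02d}/{filename}"
    )


print(partition_key(date(2024, 3, 15)))
# consumer-complaints/partitioned/year=2024/month=03/complaints.csv
```

With the table declared as partitioned by `year` and `month`, a query that filters on those columns reads only the matching prefixes instead of the whole dataset.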

Stage 4: Amazon QuickSight — Dashboard

Purpose: Build an interactive dashboard from the Athena semantic views.

Implementation plan:

  • Connect QuickSight to the Athena data source
  • Import the four semantic views as QuickSight datasets:
    • vw_company_summary → Company leaderboard visual
    • vw_monthly_trend → Time-series line chart
    • vw_response_summary → Response outcome donut chart
    • vw_subissue_detail → Sub-issue breakdown table
  • Publish as a shareable dashboard

Free Tier limits:

  • QuickSight offers a 30-day free trial for new accounts
  • Standard edition: $9/user/month after trial
  • For portfolio purposes, exported Python/matplotlib charts (in visuals/) serve as the dashboard layer

Stage 5: GitHub Pages — Documentation

Purpose: Publish the project documentation as a public-facing website.

Implementation plan:

  • Enable GitHub Pages on the repository (Settings → Pages → Deploy from branch)
  • Set source branch to main, folder to / (root)
  • Jekyll automatically builds the site from _config.yml and Markdown files
  • The site is free and publicly accessible at https://username.github.io/aws-consumer-complaint-intelligence/

Free Tier: GitHub Pages is free for public repositories.


Architecture Diagram

See architecture/aws_architecture_diagram.md for the full Mermaid diagram.


Estimated Monthly Cost (AWS Free Tier)

Service             Usage                                              Estimated Cost
Amazon S3           540 KB storage, ~50 requests/month                 $0.00 (Free Tier)
AWS Glue            1 Data Catalog table (manual DDL; crawler optional) $0.00 (Free Tier); a crawler run is billed separately
Amazon Athena       ~10 queries, each billed at the 10 MB minimum      < $0.01
Amazon QuickSight   30-day trial                                       $0.00 (trial)
GitHub Pages        Static site hosting                                $0.00 (free)
Total                                                                  < $0.01/month (excluding any crawler run)

What Would Be Needed to Deploy

  1. An AWS account (free to create)
  2. AWS CLI configured with appropriate IAM permissions
  3. An S3 bucket created in your preferred region
  4. The complaints.csv uploaded to the S3 path
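  5. An Athena query result location configured (s3://your-bucket-name/athena-results/)
  6. The table schema registered once, either by a Glue crawler run or by the manual CREATE EXTERNAL TABLE statement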
  7. QuickSight account connected to Athena
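
Before working through the list, the local prerequisites (items 2 and 4) can be checked without touching AWS. This is a minimal sketch: the function name `preflight` and the default file location are assumptions, and it only confirms that the `aws` executable is on the PATH, not that its credentials or IAM permissions are valid.

```python
import shutil
from pathlib import Path


def preflight(csv_path: str = "complaints.csv") -> dict[str, bool]:
    """Check local prerequisites before running any AWS command."""
    path = Path(csv_path)
    return {
        "aws_cli_on_path": shutil.which("aws") is not None,
        "csv_exists": path.is_file(),
        "csv_not_empty": path.is_file() and path.stat().st_size > 0,
    }


for check, ok in preflight().items():
    print(f"{'OK     ' if ok else 'MISSING'} {check}")
```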

All SQL files in the sql/ directory are ready to run in Athena once the table is created.