Limitations

This document describes the known limitations of the dataset, the analysis methodology, and the project scope. Stating them up front is standard practice in data analytics and helps readers judge how much weight each finding can bear.

Dataset Limitations

1. Complaint-Based Data Only — No Resolution Quality Measurement

The dataset records whether a complaint was closed and how (explanation, monetary relief, etc.), but does not capture whether the consumer was satisfied with the outcome. A complaint “closed with explanation” could represent a legitimate resolution or a dismissive response. The CFPB discontinued the consumer dispute tracking field in 2017, so the Consumer disputed? column is uniformly N/A in this dataset. This means we cannot measure consumer satisfaction or resolution quality.
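A quick way to confirm that a column carries no information is to count its distinct values. The sketch below assumes the records have been loaded as a list of dictionaries keyed by column name (for example with csv.DictReader) and that the column is named Consumer disputed?, as in the text above. The helper names are hypothetical.

```python
from collections import Counter


def value_counts(records, field="Consumer disputed?"):
    """Count each distinct value of a field; missing values count as ''."""
    return Counter((record.get(field) or "").strip() for record in records)


def is_uninformative(records, field="Consumer disputed?"):
    """True when the field takes at most one distinct value across all records."""
    return len(value_counts(records, field)) <= 1
```

If is_uninformative returns True for the loaded records, the column can be dropped from the analysis without losing information.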

2. Masked ZIP Codes Limit Geographic Analysis

106 of 542 records (19.6%) have partially or fully masked ZIP codes (e.g., 283XX, XXXXX). The CFPB applies this masking to protect consumer privacy. As a result, geographic analysis is incomplete: the ZIP-level complaint concentration analysis covers only the 436 records with unmasked ZIP codes and may not represent the true geographic distribution of complaints across North Carolina.
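The split between usable and masked records can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes records are dictionaries keyed by column name, that the column is called ZIP code, and that a value is usable only when it is exactly five digits. Under that last assumption, blank values are counted as masked too.

```python
import re

# Assumption: a usable ZIP is exactly five ASCII digits; anything else
# (283XX, XXXXX, blank, missing) is treated as masked.
_FULL_ZIP = re.compile(r"[0-9]{5}")


def is_usable_zip(zip_code):
    """Return True only for a complete, unmasked five-digit ZIP code."""
    if zip_code is None:
        return False
    return _FULL_ZIP.fullmatch(str(zip_code).strip()) is not None


def split_by_zip_mask(records, zip_field="ZIP code"):
    """Split records into (usable, masked) lists based on the ZIP field."""
    usable, masked = [], []
    for record in records:
        target = usable if is_usable_zip(record.get(zip_field)) else masked
        target.append(record)
    return usable, masked
```

Reporting len(masked) / len(records) next to any ZIP-level chart makes the coverage gap visible to the reader.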

3. Narrative Availability Bias

Only 337 of 542 complaints (62.2%) include a consumer narrative. The remaining 205 have no published narrative, either because the consumer did not consent to publication or because the consumer chose not to provide one. This means the keyword theme analysis (fee, interest, payment, etc.) is based on a subset of complaints and may not represent the full complaint population. Complaints without narratives may differ systematically from those with narratives.
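One way to gauge how much the two groups differ is to compare the distribution of a structured field (company, for instance) between complaints with and without a narrative. The sketch below assumes dictionary records and a narrative column named Consumer complaint narrative. Both the column name and the helper are illustrative assumptions.

```python
from collections import Counter


def share_by_group(records, field, narrative_field="Consumer complaint narrative"):
    """Return {group: {value: share}} for complaints with and without a narrative."""
    counts = {"with_narrative": Counter(), "without_narrative": Counter()}
    for record in records:
        has_text = bool((record.get(narrative_field) or "").strip())
        group = "with_narrative" if has_text else "without_narrative"
        counts[group][record.get(field, "")] += 1

    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        # An empty group yields an empty mapping rather than a division by zero.
        shares[group] = {k: v / total for k, v in counter.items()} if total else {}
    return shares
```

Large gaps between the two groups' shares would suggest that the keyword themes over-represent some kinds of complaints. Similar shares would not prove the subset is representative, but they would make the bias less of a concern.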

4. Single State, Single Issue Category

This dataset is filtered to North Carolina and the “Fees or interest” issue category only. Findings cannot be generalized to other states, other issue categories (e.g., billing disputes, fraud), or the national credit card complaint landscape. Synchrony Financial’s dominance in this dataset, for example, may reflect North Carolina-specific market share rather than a national pattern.

5. Temporal Incompleteness for April 2026

The dataset ends in April 2026, but the April 2026 data is incomplete (only 9 complaints recorded). This partial month should not be compared directly to full months when analyzing trends. The March 2026 spike (38 complaints) may appear more dramatic in context because it is followed by an incomplete month.
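A simple guard against comparing a partial month with full ones is to drop the month in which the data was pulled before computing trends. The sketch below assumes the received dates have already been parsed into datetime.date objects. The data_pulled_on parameter is hypothetical, since the exact pull date is not recorded here.

```python
from collections import Counter
from datetime import date


def monthly_counts(dates_received):
    """Count complaints per (year, month)."""
    return Counter((d.year, d.month) for d in dates_received)


def complete_months(counts, data_pulled_on):
    """Keep only months that ended before the month the data was pulled."""
    cutoff = (data_pulled_on.year, data_pulled_on.month)
    return {month: n for month, n in sorted(counts.items()) if month < cutoff}
```

With a pull date anywhere in April 2026, complete_months keeps March 2026 and earlier months and excludes April from trend lines and month-over-month comparisons.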

6. No Dollar Amount Data

While some complaint narratives mention specific dollar amounts, these are redacted by the CFPB (shown as {$X.XX}). The dataset does not include the actual financial amounts disputed, so we cannot quantify the total consumer financial harm or the average monetary relief amount.

7. Self-Reported Data

CFPB complaints are self-reported by consumers. The dataset reflects consumer perception of wrongdoing, not verified instances of regulatory violations. A company may have a high complaint count because it has a large customer base in North Carolina, not necessarily because it engages in worse practices than competitors.

8. AWS Architecture Not Deployed

The AWS pipeline described in this project (S3 → Glue → Athena → QuickSight) is a designed architecture and has not been deployed. The SQL files are written in Athena-compatible syntax and have been reviewed for correctness, but they have not been executed against a live Athena environment. Results shown in this project are from local Python analysis only.