AI-Assisted Analysis Validation Log
This document records AI-generated insights from the original Project 2 analysis alongside their validation status. Each insight was tested against the actual complaints.csv dataset using Python and/or SQL. This process demonstrates the importance of validating AI-generated claims before including them in stakeholder reports.
Dataset: 542 CFPB complaints, North Carolina, Credit Card Fees & Interest, Jan 2024 – Apr 2026
Validation method: Python (pandas) and Athena-compatible SQL queries
Validation Summary
| # | AI-Generated Insight | Status | Evidence |
|---|---|---|---|
| 1 | Synchrony Financial is the top complaint source | ✅ Accepted | 123 complaints (22.7%) |
| 2 | “Problem with fees” dominates sub-issues | ✅ Accepted | 355 complaints (65.5%) |
| 3 | Companies respond timely nearly 100% of the time | ✅ Accepted | 99.6% timely rate |
| 4 | Most complaints are closed without monetary relief | ✅ Accepted | 63.3% closed with explanation only |
| 5 | Complaint volume spikes in early calendar year months | ⚠️ Partially Accepted | Jan 2025 (27) and Jan 2026 (27) are elevated, but not consistently the highest months |
| 6 | “Promotional rate” is a common narrative theme | ❌ Rejected | 0 occurrences in 337 narratives |
| 7 | Consumers frequently mention “APR” in narratives | ❌ Rejected | Only 14 occurrences (4.2% of narratives) |
| 8 | Monetary relief is rare — under 10% of complaints | ❌ Rejected | 169 complaints (31.2%) received monetary relief |
Detailed Validation Records
Insight 1: Synchrony Financial is the top complaint source
AI-generated claim: “Synchrony Financial generates the most consumer complaints in this dataset.”
Validation method: Python value_counts() on the Company column.
Result:
SYNCHRONY FINANCIAL 123
CITIBANK, N.A. 71
CAPITAL ONE FINANCIAL CORPORATION 67
Bread Financial Holdings, Inc. 67
Status: ✅ Accepted
Synchrony Financial leads with 123 complaints, more than 1.7× the count of the next closest company (Citibank, N.A., with 71). The claim is accurate.
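A minimal sketch of this check, assuming the `Company` column named above. The full complaints.csv is not reproduced here, so the frame is rebuilt from the four reported counts only; it illustrates the method rather than re-running the original analysis.

```python
import pandas as pd

# Reported counts for the four companies listed above (not the full dataset).
reported = {
    "SYNCHRONY FINANCIAL": 123,
    "CITIBANK, N.A.": 71,
    "CAPITAL ONE FINANCIAL CORPORATION": 67,
    "Bread Financial Holdings, Inc.": 67,
}

# Expand to one row per complaint so value_counts() has rows to tally.
df = pd.DataFrame(
    {"Company": [name for name, n in reported.items() for _ in range(n)]}
)

counts = df["Company"].value_counts()          # sorted descending by count
top_company = counts.index[0]
lead_ratio = counts.iloc[0] / counts.iloc[1]   # leader vs. runner-up

print(counts)
print(f"{top_company} leads the runner-up by {lead_ratio:.2f}x")
```

Against the real file, the same two lines (`value_counts()` and the ratio) would run on the loaded DataFrame instead of the rebuilt one.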
Insight 2: “Problem with fees” dominates sub-issues
AI-generated claim: “Fee-related complaints are the most common sub-issue by a wide margin.”
Validation method: Python value_counts() on the Sub-issue column.
Result:
Problem with fees 355 (65.5%)
Charged too much interest 148 (27.3%)
Unexpected increase in interest rate 39 (7.2%)
Status: ✅ Accepted
“Problem with fees” accounts for nearly two-thirds of all complaints. The claim is accurate.
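The percentages above can be reproduced with `value_counts(normalize=True)`. The sketch below assumes the `Sub-issue` column named in the validation method and rebuilds the column from the three reported counts, since the source file is not included here.

```python
import pandas as pd

# Reported sub-issue counts; together they cover all 542 complaints.
reported = {
    "Problem with fees": 355,
    "Charged too much interest": 148,
    "Unexpected increase in interest rate": 39,
}
df = pd.DataFrame(
    {"Sub-issue": [label for label, n in reported.items() for _ in range(n)]}
)

counts = df["Sub-issue"].value_counts()
# normalize=True returns fractions of the total; scale to percent and round.
shares = (df["Sub-issue"].value_counts(normalize=True) * 100).round(1)

summary = pd.DataFrame({"count": counts, "percent": shares})
print(summary)
```

Putting counts and percentages side by side makes it easy to confirm that the shares are computed over the full 542 rows rather than a filtered subset.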
Insight 3: Companies respond timely nearly 100% of the time
AI-generated claim: “Financial institutions in this dataset are highly compliant with CFPB response deadlines.”
Validation method: Python boolean count on Timely response? column.
Result:
Yes: 540 (99.6%)
No: 2 (0.4%)
Status: ✅ Accepted
The 99.6% timely response rate confirms the claim. Note: timely response measures deadline compliance, not consumer satisfaction.
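One way to express the boolean count, sketched under the assumption that the `Timely response?` column stores the strings "Yes" and "No" as shown in the result. The frame is rebuilt from the reported counts.

```python
import pandas as pd

# Rebuilt from the reported Yes/No counts (assumes string values "Yes"/"No").
df = pd.DataFrame({"Timely response?": ["Yes"] * 540 + ["No"] * 2})

is_timely = df["Timely response?"].eq("Yes")   # boolean Series
timely_count = int(is_timely.sum())            # True counts as 1
timely_rate = is_timely.mean()                 # share of True values

print(f"Timely: {timely_count} of {len(df)} ({timely_rate:.1%})")
```

Taking the mean of a boolean Series gives the rate directly, which avoids a separate division step.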
Insight 4: Most complaints are closed without monetary relief
AI-generated claim: “The majority of complaints are resolved with an explanation rather than financial compensation.”
Validation method: Python value_counts() on Company response to consumer.
Result:
Closed with explanation 343 (63.3%)
Closed with monetary relief 169 (31.2%)
Closed with non-monetary relief 20 (3.7%)
In progress 10 (1.8%)
Status: ✅ Accepted
63.3% of complaints are closed with explanation only. The claim is accurate.
Insight 5: Complaint volume spikes in early calendar year months
AI-generated claim: “Complaints tend to increase at the start of the year, possibly tied to year-end billing cycles.”
Validation method: Python monthly groupby on Date received.
Result (January months):
2024-01: 18
2025-01: 27
2026-01: 27
Highest months overall:
2026-03: 38 (highest)
2025-01: 27
2026-01: 27
2025-06: 25
Status: ⚠️ Partially Accepted
January shows elevated volumes in 2025 and 2026 (27 complaints each), but January 2024 had only 18, and the single highest month is March 2026 (38 complaints). With only 28 months of data, the seasonal pattern is suggestive but not conclusive. The insight is directionally reasonable but should not be stated as a confirmed pattern.
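A minimal sketch of a monthly groupby on `Date received`. The dates below are toy values for illustration, not rows from complaints.csv, so the counts differ from the results above.

```python
import pandas as pd

# Toy dates for illustration only; they are not taken from complaints.csv.
df = pd.DataFrame({"Date received": [
    "2024-01-05", "2024-01-20", "2024-02-11",
    "2025-01-03", "2025-03-09", "2025-03-15", "2025-03-30",
]})

# Bucket each complaint into a "YYYY-MM" label, then count rows per bucket.
month = pd.to_datetime(df["Date received"]).dt.strftime("%Y-%m")
monthly = df.groupby(month).size()

january = monthly[monthly.index.str.endswith("-01")]   # January of each year
peak_month = monthly.idxmax()                          # label with the most rows

print(january)
print("Peak month:", peak_month, int(monthly.max()))
```

Comparing the January rows with the overall peak is the step that separates "January is elevated" from "January is the highest month", which is the distinction behind the partial acceptance.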
Insight 6: “Promotional rate” is a common narrative theme
AI-generated claim: “Consumers frequently mention promotional rates in their complaint narratives.”
Validation method: Python string search across 337 narratives for the term “promotional rate”.
Result:
"promotional rate": 0 occurrences
Status: ❌ Rejected
None of the 337 narratives mention “promotional rate.” The insight likely reflects general knowledge of credit card complaints rather than the contents of this dataset.
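A sketch of a literal phrase search over the narrative text. The column name `Consumer complaint narrative` is an assumption (this log does not name the column), and the rows are toy examples rather than real complaints.

```python
import pandas as pd

# Assumed column name; toy narratives, not rows from complaints.csv.
COL = "Consumer complaint narrative"
df = pd.DataFrame({COL: [
    "I was charged a late fee I did not expect.",
    "The interest on my card went up without notice.",
    None,  # a complaint with no narrative text appears as a missing value
    "They refused to refund the annual fee.",
]})

# Search only rows that actually have text, so the denominator is explicit.
narratives = df[COL].dropna()
hits = narratives.str.contains("promotional rate", case=False, regex=False)

print(f"{int(hits.sum())} of {len(narratives)} narratives mention the phrase")
```

Dropping missing narratives first keeps the denominator aligned with the "narratives with text" figure rather than the full complaint count.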
Insight 7: Consumers frequently mention “APR” in narratives
AI-generated claim: “APR is a commonly referenced term in consumer complaint narratives.”
Validation method: Python string search across 337 narratives for the term “apr” (case-insensitive).
Result:
"apr": 14 occurrences (4.2% of narratives with text)
Status: ❌ Rejected
“APR” appears in only 14 of 337 narratives (4.2%). Consumers in this dataset tend to use plain language (“fee”, “interest”, “late fee”) rather than financial terminology. The claim overstates the frequency of this term.
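This log does not say whether the search matched “apr” as a substring or as a whole word. The sketch below, using toy narratives, shows why the choice can matter: a case-insensitive substring match also catches words such as “April”. If the original search was a substring match, 14 would be an upper bound, which would not change the rejection.

```python
import re

# Toy narratives for illustration; not taken from complaints.csv.
narratives = [
    "My APR jumped from 19% to 29%.",
    "I paid the bill in April and still got a late fee.",
    "The apr was never disclosed to me.",
    "They charged interest after I paid in full.",
]

substring = re.compile(r"apr", re.IGNORECASE)        # matches inside "April"
whole_word = re.compile(r"\bapr\b", re.IGNORECASE)   # matches "APR"/"apr" only

# Count narratives (not occurrences) that contain at least one match.
substring_hits = sum(bool(substring.search(text)) for text in narratives)
whole_word_hits = sum(bool(whole_word.search(text)) for text in narratives)

print("substring:", substring_hits, "whole word:", whole_word_hits)
```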
Insight 8: Monetary relief is rare — under 10% of complaints
AI-generated claim: “Very few consumers receive monetary relief from their complaints.”
Validation method: Python value_counts() on Company response to consumer.
Result:
Closed with monetary relief: 169 (31.2%)
Status: ❌ Rejected
31.2% of complaints resulted in monetary relief, nearly one in three. That is more than three times the “under 10%” ceiling in the claim, so the AI-generated insight significantly underestimated the monetary relief rate. This is a meaningful finding for this dataset. Whether the rate is high relative to other complaint categories was not tested here, because the dataset contains only credit card fee and interest complaints.
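The comparison between the claimed ceiling and the observed rate is simple arithmetic. The helper below is a hypothetical illustration (the name `evaluate_share_claim` is not from the original analysis) using the counts reported above.

```python
def evaluate_share_claim(count, total, claimed_max_share):
    """Return the observed share and whether it falls below the claimed ceiling."""
    share = count / total
    return share, share < claimed_max_share


# 169 of 542 complaints closed with monetary relief; the claim was "under 10%".
share, claim_holds = evaluate_share_claim(169, 542, 0.10)

print(f"Observed share: {share:.1%}; claim holds: {claim_holds}")
```

Writing the threshold down as a number makes the rejection reproducible instead of depending on how “rare” is read.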
Key Takeaway
This validation exercise demonstrates that AI-generated insights can be directionally useful but require verification against actual data before inclusion in any report or dashboard. Three of eight insights were rejected, and one required qualification. The validation process is a core component of responsible data analytics practice.