AEO Red Flags: 12 Warning Signs That an Agency Will Underdeliver
Twelve concrete warning signs procurement teams can use to identify AEO agencies likely to underdeliver before contracts are signed and budgets committed.

Key Highlights
- Most AEO programs that fail in year one fail at procurement, not execution. The wrong agency was selected because evaluation questions never surfaced real capability.
- The twelve red flags below cluster into four categories: measurement opacity, content factory tells, technical illiteracy, and reporting theater.
- Any single red flag is survivable. Three or more in the same pitch is a near-certain underdelivery signal.
- Buyers who screen for these patterns reduce agency switching costs and protect twelve to eighteen months of citation building work.
Why Procurement Is Where AEO Programs Quietly Die
Procurement teams have learned how to evaluate SEO agencies over fifteen years of contracts, RFPs, and post-mortems. AEO is four years old as a discipline, and most of the evaluation muscle does not yet exist. As a result, bad agencies still win contracts on the strength of confident pitch decks. The buyer discovers the gap only six months in, when citation share has not moved and the quarterly business review keeps showing the same five vanity metrics.
The cost of a bad selection is not just the fee. It is the twelve to eighteen months of compounding citation work that did not happen, the schema markup that was deployed wrong and now has to be unwound, and the internal credibility hit that makes the next AEO investment a harder approval. For a Fortune 1000 brand, the implicit cost of a wrong agency choice typically runs three to five times the annual contract value.
The twelve warning signs below are drawn from procurement post-mortems where the agency was fired inside eighteen months. They are ordered roughly by how often they predicted underdelivery in those engagements. None of them requires an AEO specialist on your team to spot. They require asking the right questions in the right meetings and listening for which answers come back vague.
Red Flags One Through Four: Measurement Opacity
The warning signs in the first cluster all point to the same root problem. The agency does not have a real measurement system, or has one but cannot explain it to a nontechnical buyer. AEO measurement is hard. It is harder than SEO measurement because there is no equivalent of Google Search Console giving you a clean impression and click feed. Agencies that wave this away are usually hiding the fact that they do not measure rigorously at all.
Red flag one is a pitch that leads with traffic numbers from AI assistants without explaining how they were attributed. There is no public referrer header that reliably tells you a session came from ChatGPT, Claude, or Gemini. Any agency reporting clean AI traffic numbers either has a clever attribution model they should walk you through in detail, or is making up numbers. Ask them to draw the attribution flow on a whiteboard. The good agencies enjoy this conversation.
Red flag two is the absence of a defined citation share metric. Citation share, the percentage of relevant AI responses that mention your brand by name or link, is the single most important leading indicator in AEO. An agency that does not measure it, or measures it only inside their own platform with no transparency on the prompt set, cannot show you whether their work is actually moving the needle.
Red flag three is when the citation prompt set is hidden from the client. You should always know exactly which prompts the agency is benchmarking and you should be able to add or remove prompts each quarter.
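To make the citation share definition concrete, here is a minimal Python sketch of how the metric could be computed from a published prompt set. It is an illustration under stated assumptions, not any agency's actual methodology: the `PromptResult` record, the assistant labels, the brand terms, and the sample responses are all hypothetical, and a simple case-insensitive substring match stands in for whatever mention detection a real measurement system uses.

```python
from dataclasses import dataclass


@dataclass
class PromptResult:
    prompt: str         # benchmark prompt sent to the assistant
    assistant: str      # illustrative label, e.g. "chatgpt"
    response_text: str  # captured answer text


def mentions_brand(response_text: str, brand_terms: list[str]) -> bool:
    """True if any brand name or domain appears in the response (case-insensitive)."""
    lowered = response_text.lower()
    return any(term.lower() in lowered for term in brand_terms)


def citation_share(results: list[PromptResult], brand_terms: list[str]) -> float:
    """Fraction of captured responses that mention the brand by name or link."""
    if not results:
        raise ValueError("citation share is undefined for an empty prompt set")
    cited = sum(mentions_brand(r.response_text, brand_terms) for r in results)
    return cited / len(results)


# Hypothetical sample data: two prompts, each captured on two assistants.
results = [
    PromptResult("best crm for startups", "chatgpt", "Popular options include Acme CRM and others."),
    PromptResult("best crm for startups", "perplexity", "See acme.example for pricing."),
    PromptResult("crm with email sync", "chatgpt", "Consider OtherCo."),
    PromptResult("crm with email sync", "gemini", "OtherCo and ThirdCo are common picks."),
]

share = citation_share(results, ["Acme CRM", "acme.example"])
print(f"Citation share: {share:.0%}")  # 2 of 4 responses mention the brand
```

Because the prompt list is an explicit input, a buyer can audit it, add or remove prompts each quarter, and rerun the same calculation, which is the transparency that red flags two and three test for.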
Red flag four is reporting that only shows growth, never decline. AI rankings fluctuate, sometimes sharply, and any honest measurement system will show weeks when citation share dropped. An agency whose dashboards only ever trend up is either filtering out bad data or measuring something that cannot be falsified.
| Red Flag | What You Hear | What It Actually Means |
|---|---|---|
| Traffic without attribution | We saw 40K visits from ChatGPT last month | No real attribution model exists |
| No citation share metric | We track AI mentions across the web | No baseline, no benchmark, no progress |
| Hidden prompt set | Our proprietary benchmark of 500 prompts | You cannot validate or audit results |
| Only growth, never decline | Every chart goes up and to the right | Cherry picked or fabricated reporting |
Red Flags Five Through Eight: Content Factory Tells
The second cluster of red flags reveals an agency that has rebranded a content factory as an AEO service. These agencies will produce a lot of content. The content will not get cited at meaningful rates. You will end up with a bigger blog and the same AI visibility you started with.
Red flag five is a deliverable volume that sounds high relative to the fee. Eight to ten short articles per week for a flat retainer almost always means the work is being produced by junior writers or generative tools with light editorial review. Good AEO content requires research depth, semantic completeness, and structural choices that take real time. Two to three articles per week at the same fee usually indicates more rigor.
Red flag six is when the agency cannot explain how article topics are selected. The honest answer involves the client's existing citation gaps, competitor coverage analysis, persona prompt research, and the brand's current authority profile. The dishonest answer is some variation of "we have a proprietary topic engine that scores everything for you." If the topic engine is real, ask to see the scoring rubric. If it cannot be shown, it is probably a keyword tool.
Red flag seven is no mention of structured data. JSON-LD schema, FAQPage markup, HowTo schema, and Article schema are not optional in 2026 AEO. They are how AI assistants parse content reliably. An agency that talks only about writing and never about technical implementation is missing half the discipline.
Red flag eight is the absence of a humanization step in the content workflow. AI assistants increasingly de-prioritize content that reads as AI-generated. Agencies that publish raw model output, even from premium models, will see citation rates plateau within two quarters.
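As a reference point for the structured data conversation in red flag seven, the sketch below shows roughly what FAQPage markup in JSON-LD looks like when generated from question and answer pairs. The helper name `faq_page_jsonld` is hypothetical and the single pair is taken from this article; the `FAQPage`, `Question`, and `Answer` types and the `mainEntity` and `acceptedAnswer` properties are schema.org vocabulary. Treat it as a minimal illustration rather than a complete implementation.

```python
import json


def faq_page_jsonld(qa_pairs: list[tuple[str, str]]) -> str:
    """Serialize question/answer pairs as a schema.org FAQPage JSON-LD document."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }
    return json.dumps(data, indent=2)


markup = faq_page_jsonld([
    (
        "How many red flags is too many in a single agency pitch?",
        "Three or more in the same pitch is a near-certain underdelivery signal.",
    ),
])

# The JSON-LD is embedded in the page inside a script tag of this type.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

An agency that is fluent in the technical side should be able to explain which of its deliverables ship with markup like this and how that markup is validated before deployment.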
Red Flags Nine and Ten: Technical Illiteracy
AEO sits at the intersection of editorial and engineering. Agencies that lean only editorial cannot fix the technical reasons a site is invisible. Agencies that lean only engineering cannot build the authority surface that gets brands recommended. Both failure modes are common.
Red flag nine is the inability to discuss crawl behavior of AI assistants by name. ChatGPT, Perplexity, Claude, and Gemini all have distinct crawler behaviors, distinct freshness windows, and distinct preferences for how content is served. An agency that talks about AI as a single monolithic surface is not deep enough on the technical side to spot why your competitor is being cited and you are not.
Red flag ten is no point of view on robots.txt and AI bot access. This is a 2026 decision every enterprise has to make and the right answer is not the same for every brand. An agency that has not formed a position on which bots to allow, which to block, and how to think about content licensing for AI training versus AI retrieval is not paying attention to the conversation their best clients are having internally.
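One way to ground the robots.txt discussion is to check what a given policy actually permits for each AI crawler. The sketch below uses Python's standard library `urllib.robotparser` for that. The policy text, the example URLs, and the choice of which bots to block are hypothetical, and the user agent tokens listed are assumptions that should be verified against each vendor's current crawler documentation before they are relied on.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: block one crawler outright, explicitly allow another,
# and apply a default rule to everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Allow: /

User-agent: *
Disallow: /private/
"""

# Assumed user agent tokens; confirm against each vendor's published docs.
AI_USER_AGENTS = ["GPTBot", "OAI-SearchBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def audit_bot_access(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, per user agent, whether the robots.txt text allows fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


access = audit_bot_access(ROBOTS_TXT, "https://www.example.com/blog/post", AI_USER_AGENTS)
for agent, allowed in access.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running an audit like this against the live robots.txt makes the allow or block decision explicit for each bot, which gives the agency something specific to take a position on.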
Red Flags Eleven and Twelve: Reporting Theater
The final cluster shows up in the first few quarterly business reviews. Bad agencies put on a show. Good agencies show you the work.
Red flag eleven is a quarterly review deck that is mostly screenshots. Screenshots of AI responses that happened to mention the brand are flattering but not data. A real QBR walks through the measured citation share trajectory, the specific content pieces that moved metrics, the technical changes shipped, and the next quarter's targeted prompts with predicted lift.
Red flag twelve is the absence of a candid losses slide. Every honest AEO program has prompts where the brand lost ground or never gained it. An agency that cannot show you those, and cannot tell you what they are doing about them, is selling you a story, not a service.
This is the work OnlyAEO does as a default. Every monthly report includes both wins and losses against a published prompt set, the attribution model is documented in the onboarding deck, and clients see the exact citation share methodology before they sign. Buyers who are evaluating multiple agencies are welcome to test our team against the twelve red flags in a working session before any commitment.
| Category | Red Flags | Typical Outcome If Ignored |
|---|---|---|
| Measurement opacity | 1 through 4 | No way to prove or disprove progress |
| Content factory tells | 5 through 8 | High volume, low citation lift |
| Technical illiteracy | 9 and 10 | Persistent invisibility despite content investment |
| Reporting theater | 11 and 12 | Looks great in QBRs, fails on board review |
How to Use This List Without Slowing Down Procurement
The twelve flags are designed to be screened in a single ninety-minute capability meeting per agency. You do not need an RFP rewrite. You need a structured set of follow-up questions to ask after the pitch, and a willingness to walk away from agencies that get visibly uncomfortable answering them.
A practical sequence is to invite each shortlisted agency to a working session where you ask them to walk through a recent client engagement in detail. Ask for the citation prompt set, the measured trajectory, the technical changes deployed, and the specific content that drove the largest lift. Agencies that have done the work love this meeting. Agencies that have not will deflect. The deflection patterns map almost cleanly onto the twelve flags above.
Three flags in the same pitch is the threshold where procurement teams should remove the agency from consideration regardless of price. The cost of fixing a bad engagement always exceeds the fee delta to the next agency on the shortlist. OnlyAEO supports buyers through this evaluation by sharing our own scorecard against the same twelve flags, with documentation, before any contract conversation begins.
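For teams that want to record the screening consistently across agencies, the three-flag threshold can be captured in a small scorecard. The Python sketch below is one possible way to do that: the flag labels paraphrase this article, the category ranges follow the table above, and the function and variable names are hypothetical.

```python
RED_FLAGS = {
    1: "Traffic without attribution",
    2: "No citation share metric",
    3: "Hidden prompt set",
    4: "Only growth, never decline",
    5: "Deliverable volume high relative to fee",
    6: "Cannot explain topic selection",
    7: "No mention of structured data",
    8: "No humanization step",
    9: "Cannot discuss AI crawlers by name",
    10: "No point of view on robots.txt and AI bot access",
    11: "QBR deck is mostly screenshots",
    12: "No candid losses slide",
}

CATEGORIES = {
    "Measurement opacity": range(1, 5),
    "Content factory tells": range(5, 9),
    "Technical illiteracy": range(9, 11),
    "Reporting theater": range(11, 13),
}

REMOVAL_THRESHOLD = 3  # three flags in the same pitch, per the article


def screen_agency(observed: set[int]) -> dict:
    """Summarize observed red flags and apply the three-flag removal threshold."""
    unknown = observed - set(RED_FLAGS)
    if unknown:
        raise ValueError(f"unknown red flag numbers: {sorted(unknown)}")
    by_category = {
        name: sorted(observed & set(numbers)) for name, numbers in CATEGORIES.items()
    }
    return {
        "count": len(observed),
        "remove": len(observed) >= REMOVAL_THRESHOLD,
        "by_category": by_category,
    }


# Hypothetical pitch where flags 1, 3, and 7 were observed.
result = screen_agency({1, 3, 7})
print(result["count"], result["remove"])
print(result["by_category"])
```

Grouping the observed flags by category also shows whether an agency's weaknesses are concentrated in one area, such as measurement, or spread across the whole practice.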
Get your free AI visibility audit
OnlyAEO will benchmark your shortlisted agencies against the 12 red flags in a structured working session, with no obligation to engage us afterward.
Frequently Asked Questions
How many red flags is too many in a single agency pitch?
Is reporting only growth always a red flag, or can it just mean the program is working?
Should we drop an agency for not having a position on robots.txt and AI bot access?
What is the single most important red flag to screen for first?

OnlyAEO
Expert insights on Answer Engine Optimization and AI visibility strategy.