What Matters
- Ask for production deployments, not demos. Any vendor can build an impressive demo. The question is how many AI systems they've shipped to real users who depend on them daily.
- Pricing structure tells you everything. Fixed-scope, fixed-price vendors have skin in the game. Time-and-materials vendors can blame scope for overruns indefinitely.
- Data handling is non-negotiable. Understand exactly where your data goes, who can access it, and what happens to it after the engagement ends.
- The team question is critical. Who specifically builds your project? Junior developers working under a senior who reviewed the proposal once is not the same team that gave you the demo.
- Post-launch support is where the relationship either holds or breaks. Get support terms in writing before you sign.
Three months into an AI project with the wrong vendor is an expensive education. The vendor who gave the best demo isn't always the vendor who ships working software. The proposal that looked most thorough isn't always written by the team who builds your product.
Before you sign with an AI development vendor -- any vendor, including us -- ask these 20 questions. The answers reveal more than any sales presentation.
Category 1: Production Track Record
1. How many AI systems have you shipped to production?
Not demos. Not proof-of-concepts. Not internal tools. Systems that real users depend on daily, where a failure has business consequences.
Any agency can build an impressive demo in 48 hours. The hard part is getting from demo to production -- handling edge cases, building the monitoring infrastructure, surviving the first month of real usage, and maintaining the system when the underlying models update. Gartner predicted in 2024 that at least 30% of generative AI projects would be abandoned after proof of concept by the end of 2025 -- often because the vendor couldn't close that gap.
Ask for a number, then ask to see references.
2. Can you share three client references with similar AI projects in production for 6+ months?
References matter most when the project type matches yours. Ask specifically for:
- Similar industry or use case
- Projects that have been running for at least 6 months (enough time for early problems to surface)
- Contacts you can call, not just written testimonials
When you speak to references, ask: "What went wrong and how did they handle it?" The vendor's response to problems tells you more than their response to success.
3. What AI projects have you built in my industry?
Domain expertise matters in AI development. Medical records processing has different compliance requirements than e-commerce recommendation engines. Financial fraud detection has different model requirements than customer service chatbots.
A vendor who has built AI in your industry has already solved the domain-specific problems you'll hit. A vendor building in your industry for the first time will solve those problems on your dime.
4. What is the most complex AI failure you've had in production and how did you fix it?
This question reveals two things: their honesty (every real AI project has failures) and their operational depth (how they monitor, diagnose, and fix production problems).
A vendor who says "we've never had a failure" is either inexperienced or not being straight with you. AI systems fail in interesting and non-obvious ways -- model drift, edge-case inputs, upstream data changes, API changes from model providers. Good vendors have war stories.
Category 2: Team and Process
5. Who specifically will work on my project?
There's a common pattern in agencies: senior people give the sales presentation, junior people do the work. Ask for the actual team: names, roles, and AI-specific experience. Then verify them.
Check LinkedIn profiles for the engineers listed. How long have they been doing AI/ML work? What do their public projects and writing look like? Have they built the type of system you need?
6. Will the people who gave this presentation be working on my project?
Follow-up to the above. If the founders or senior engineers gave the demo but will "oversee" the project while others execute, understand what that means in practice. Weekly reviews? Full code review? Or quarterly check-ins?
There's nothing wrong with senior oversight plus junior execution -- it's how most agencies work. But you need to know which model you're buying.
7. How do you scope AI projects and what happens when scope changes?
Scope management is where AI projects go wrong more than anywhere else. AI development has genuine uncertainty -- model performance might require architectural changes, data quality issues might require additional processing steps, integration complexity might be higher than estimated.
How does the vendor handle this?
- Fixed-scope, fixed-price: They commit to a defined set of deliverables at a defined cost. If scope genuinely expands, it's a conversation with a new estimate. They have skin in the game to scope accurately upfront.
- Time-and-materials: You pay for hours. Scope can expand indefinitely. Every estimation miss is your problem.
Fixed-scope isn't always possible for research-heavy AI work. But for known deliverables (build a chatbot with these capabilities, integrate these data sources, deploy to this environment), fixed-scope is better for you.
8. How do you test AI systems before delivery?
Testing AI is fundamentally different from testing traditional software. There's no "expected output" you can hard-code for an LLM response. Model behavior is probabilistic.
Ask what their AI testing process looks like:
- How do they build and maintain evaluation datasets?
- How do they measure model performance against the metrics that matter for your use case?
- How do they test edge cases and adversarial inputs?
- What does their staging environment look like?
A vendor without a clear answer to these questions has not thought seriously about quality.
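To make the eval-dataset question concrete, here is a minimal sketch of what an evaluation harness can look like. Everything in it is an assumption for illustration: `run_model` stands in for whatever system is under test, and the dataset, matching rule, and pass criterion are placeholders you would replace with metrics that matter for your use case.

```python
# Minimal sketch of an LLM evaluation harness (illustrative only).
# `run_model`, the dataset, and the matching rule are all assumptions.

def run_model(question: str) -> str:
    # Placeholder for the system under test (e.g. a RAG pipeline or chatbot).
    return "Paris" if "France" in question else "unknown"

# A tiny evaluation dataset: (input, set of acceptable answers).
EVAL_SET = [
    ("What is the capital of France?", {"paris"}),
    ("What is the capital of Atlantis?", {"unknown", "i don't know"}),
]

def evaluate(dataset) -> float:
    """Return the fraction of cases whose output matches an accepted answer."""
    passed = 0
    for question, accepted in dataset:
        output = run_model(question).strip().lower()
        if output in accepted:
            passed += 1
    return passed / len(dataset)

score = evaluate(EVAL_SET)
print(f"accuracy: {score:.0%}")
```

A vendor with a real testing process will have something like this, but versioned, far larger, and wired into CI so that a model or prompt change cannot ship if the score drops.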
9. What does the handoff process look like?
At the end of the engagement, what do you receive? Source code (obviously), but also:
- Infrastructure-as-code (so you can rebuild the deployment)
- Documentation that your team can actually use
- Runbooks for common operational tasks (restarting services, responding to model drift)
- A knowledge transfer session with your internal team
The difference between "we handed over the code" and "your team can independently operate this system" is significant. Ask which one they deliver.
Category 3: Data and Security
10. Where does my data go during the project?
In AI development, your data is sent to model providers for training, fine-tuning, or inference. Each hop is a security boundary.
Get specific answers to:
- Which model providers will be used? (OpenAI, Anthropic, Google, Meta open-source, other?)
- Does your data go through the vendor's own infrastructure, or directly to model providers?
- What are the data retention policies at each model provider?
- Is there a data processing agreement?
For regulated industries (healthcare, finance, legal), this isn't a nice-to-have -- it's a compliance requirement. If the vendor can't answer clearly, they haven't done regulated industry work.
11. Who can access my data at the vendor side?
Which of their employees or contractors can see your data? Under what circumstances? What access controls are in place?
This matters especially for sensitive business data (customer records, financial data, proprietary formulas, competitive strategy). You're entitled to know exactly who has access and why.
12. Who owns the code and models after delivery?
This sounds obvious but often isn't. Get clarity on:
- Source code: You own it. Full stop. No licensing back to the vendor.
- Fine-tuned models: If the vendor fine-tunes a base model on your data, who owns that model? If they walk away, can you keep running the model?
- Proprietary components: Some vendors include proprietary frameworks or libraries. Understand if you're licensing those ongoing or receiving a full source code transfer.
- Training data: If your data was used to fine-tune a model, can the vendor use that model or those learnings on other client projects?
Get the IP ownership terms in writing, not just verbal assurance.
Category 4: Pricing and Commercial Terms
13. What is the full pricing model and what is and isn't included?
The proposal price is rarely the final cost. Ask specifically what's excluded:
- Infrastructure costs during development (who pays for GPU compute, model API costs, staging environments?)
- Third-party tool and API costs
- Post-launch support and maintenance
- Model API costs when the system goes to production
- Future model updates when the underlying LLM versions change
"$80K for the build" sounds different when you add $15K in model API costs during development and $3K/month in ongoing inference costs.
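Using the illustrative figures above (assumptions, not real quotes), the first-year arithmetic looks like this:

```python
# Illustrative first-year cost using the example figures above.
build_cost = 80_000          # fixed-scope build price
dev_api_cost = 15_000        # model API usage during development
monthly_inference = 3_000    # ongoing inference cost after launch

first_year_total = build_cost + dev_api_cost + 12 * monthly_inference
print(first_year_total)  # 131000
```

An $80K proposal is really a $131K first-year commitment once the excluded line items are counted -- which is exactly why you ask what's excluded.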
14. What triggers a change order?
Every project has changes. How the vendor handles them defines your commercial relationship.
Good answer: "We scope tightly upfront to minimize changes. Genuine scope expansion (features you ask for that weren't in the original spec) gets a new estimate and your approval before any work starts."
Warning sign: Vague answer, or "we'll handle it as they come up." This usually means informal scope creep billed at high hourly rates.
15. What does post-launch support cost?
AI systems require ongoing maintenance: monitoring for model drift, updating integrations when APIs change, handling edge cases that surface in production, and performance optimization.
Get specific numbers:
- What's included in the project price?
- What does ongoing support cost per month?
- What's the SLA for production issues?
- Is there a retainer model or break-fix pricing?
The vendor who ghosts you three weeks after launch is more common than you'd hope. Support terms in writing, before you sign.
Category 5: Red Flags
"The failure question is the one that separates vendors who've shipped real systems from those who've only shipped demos. Every production AI system breaks eventually - what matters is whether the team has a plan for when it does." - Ashit Vora, Captain at 1Raft
16. Can you describe how your AI handles failure cases?
Every AI system fails sometimes. LLMs hallucinate. Classifiers make wrong predictions. Models encounter input types they haven't seen.
Ask: "When your AI is wrong, what does the user experience? How does the system degrade gracefully?"
Good AI systems fail safely: falling back to a human workflow, showing confidence scores so users know when to double-check, or presenting an "I don't know" rather than a confidently wrong answer.
Vendors who haven't thought about this haven't shipped AI to production users.
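One common graceful-degradation pattern is a confidence threshold with a human fallback. The sketch below is illustrative only: `classify` stands in for a real model call, and the threshold value is an assumption you would tune against your own evaluation data.

```python
# Illustrative confidence-threshold fallback (names and values are assumptions).

CONFIDENCE_THRESHOLD = 0.75  # tune this against your evaluation data

def classify(text: str) -> tuple[str, float]:
    # Placeholder for a real model call returning (label, confidence).
    return ("refund_request", 0.62)

def handle(text: str) -> str:
    label, confidence = classify(text)
    if confidence >= CONFIDENCE_THRESHOLD:
        return f"auto:{label}"
    # Fail safely: below the threshold, route to a human instead of guessing.
    return "escalate:human_review"

print(handle("I want my money back"))
```

The design choice worth probing in the vendor's answer is not the threshold number but whether a fallback path exists at all, and what the user sees when it triggers.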
17. Why do you recommend [specific model/approach] for my use case?
Ask them to justify their technical recommendation. A vendor suggesting a custom fine-tuned model when a well-prompted base model would perform comparably is either over-engineering your project or running up the bill.
If they can't explain the trade-offs between their recommended approach and simpler alternatives, they may not understand the trade-offs themselves -- or they're optimizing for scope, not your outcome.
18. What happens if the AI doesn't meet the performance targets?
Before starting, you should agree on measurable success criteria (accuracy rate, task completion rate, response time, cost per query). Then ask: what happens if the system doesn't hit those targets by launch?
Does the vendor commit to iterating until it does? Refund a portion? Define "good enough" as whatever ships?
The answer to this question reveals the vendor's confidence in their approach and their commitment to your outcome.
19. Have you built this exact type of system before?
If your project is a RAG-based document search system, ask if they've built RAG systems before. If it's a real-time fraud detection model, ask how many real-time ML pipelines they've deployed.
Vendors who pivot from their actual experience ("we've built similar things") or talk around the question ("our team is very adaptable") are telling you something.
Adjacent experience is fine. No experience is a problem. Know which one you're getting.
20. What questions should I be asking that I'm not?
This one is the most revealing of all.
The best vendors know things about your problem that you don't. A vendor who deeply understands your use case should be able to identify risks or questions you haven't thought to ask.
If they answer with "no, you've covered everything" or a deflection, you're talking to someone in sales mode, not problem-solving mode. The vendors who push back with "actually, the bigger question is..." are the ones who've been in the trenches with this work.
What to Do With the Answers
Score each vendor on a simple 1-3 scale across the five categories:
- Production track record
- Team and process
- Data and security
- Commercial terms
- Red flags
Vendors who score poorly in any category don't just have a weakness in that area -- they're showing you something about how they operate. A vendor who can't answer data security questions hasn't built for regulated industries. A vendor with a vague scope change process has had ugly client disputes and learned nothing from them.
The best AI development partner is honest about what they don't know, specific about what they do know, and more focused on your outcome than closing the deal.
Related reading: How to Choose an AI Development Partner -- the full framework for evaluating AI vendors. AI Development Company vs. Freelancer -- when to use which.
Frequently asked questions
What are the most important questions to ask an AI development vendor?
The five most important: (1) How many AI systems have you shipped to production (not just demos)? (2) Who specifically will work on my project and what is their AI experience? (3) What is your pricing model and what happens if scope changes? (4) How do you handle my data and who can access it? (5) What does post-launch support look like and what is included? These questions reveal whether you're working with a proven vendor or an agency riding the AI hype wave.