October 6, 2025
I submitted the same 50 test prompts to DALL-E 3, Midjourney v6, Stable Diffusion XL, and Imagen 3. The goal: understand where each system draws the line between creative freedom and content safety—and whether those lines are consistent, effective, or even coherent.
The results revealed a fragmented landscape. What one system blocks, another allows. What triggers a ban on one platform barely raises a flag on another. And across all four, inconsistency within each system often outweighed differences between them.
Here's what I learned about the state of content moderation in generative AI.
Designing a fair evaluation meant balancing several considerations:
1. Legitimate Use Cases with Edge Case Risk
Examples:
These are reasonable requests that could be used for education, journalism, or art—but also touch on violence, graphic content, or political sensitivity.
2. Benign Variants of Risky Concepts
Examples:
These test whether filters distinguish context or just pattern-match on keywords.
3. Indirect References
Examples:
Can systems understand cultural context and nuance?
4. Compositional Complexity
Examples:
Multiple elements that individually are benign but together touch on sensitive topics.
5. Adversarial Probes (Responsible Testing)
I included a small number of clear policy violations to verify filters are working:
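The evaluation design above fits into a small harness. This is a minimal sketch, not the author's actual tooling. Each entry in `systems` is a hypothetical callable that stands in for a platform's generation API and returns `True` when the prompt is refused. None of the real SDKs are modeled here, and the category labels are invented names for the five groups listed above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable

# Hypothetical labels for the five prompt groups in the test design.
CATEGORIES = (
    "legitimate_edge_case",
    "benign_variant",
    "indirect_reference",
    "compositional",
    "adversarial_probe",
)


@dataclass(frozen=True)
class EvalPrompt:
    text: str
    category: str

    def __post_init__(self) -> None:
        # Reject typos in category names early.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


@dataclass(frozen=True)
class Result:
    system: str
    prompt: EvalPrompt
    blocked: bool


def run_suite(
    prompts: list[EvalPrompt],
    systems: dict[str, Callable[[str], bool]],
) -> list[Result]:
    """Submit every prompt to every system and record whether it was refused."""
    return [
        Result(name, p, submit(p.text))
        for name, submit in systems.items()
        for p in prompts
    ]


def block_rates(results: list[Result]) -> dict[tuple[str, str], float]:
    """Fraction of prompts blocked for each (system, category) pair."""
    totals: Counter = Counter()
    blocked: Counter = Counter()
    for r in results:
        key = (r.system, r.prompt.category)
        totals[key] += 1
        blocked[key] += int(r.blocked)
    return {key: blocked[key] / totals[key] for key in totals}
```

Grouping block rates by category makes the comparison concrete. You can see whether a system is uniformly strict or only strict on, say, compositional prompts.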
DALL-E 3
Approach: Extremely conservative, with detailed refusal messages.
Strengths:
Weaknesses:
Notable Quirk: Inconsistent across similar phrasings. "Person sleeping" was accepted, but "unconscious person" was rejected, even though the prompt specified a "peaceful medical context."
Verdict: Safe, but restrictive to the point of limiting legitimate use cases. Best for commercial/brand-safe content.
Midjourney v6
Approach: Moderately strict, with less transparent filtering.
Strengths:
Weaknesses:
Notable Quirk: Prompts with "in the style of [famous artist]" sometimes bypassed filters that would otherwise trigger. Style references acted as a context signal.
Verdict: Balances creative freedom with safety better than DALL-E 3, but is less predictable and carries a risk of unexpected bans.
Stable Diffusion XL
Approach: Minimal filtering; relies on deployment choices.
Strengths:
Weaknesses:
Notable Quirk: Some SDXL hosting services, such as DreamStudio or Clipdrop, add their own filters. This leaves users unsure whether a given restriction comes from the model or from the platform.
Verdict: Powerful for professional use cases but requires sophisticated deployment and moderation strategies. Not appropriate for general public platforms without additional safety layers.
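The model-versus-platform confusion is easier to avoid when every moderation layer reports its own name. The sketch below assumes a hypothetical pipeline in which the model layer and the hosting layer are plain functions. The layer names and the blocked term are placeholders, not any service's actual rules.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Decision:
    allowed: bool
    layer: Optional[str] = None   # which layer refused, if any
    reason: Optional[str] = None  # user-facing explanation


# A layer inspects a prompt and returns a refusal reason, or None to pass it on.
Layer = Callable[[str], Optional[str]]


def run_layers(prompt: str, layers: list[tuple[str, Layer]]) -> Decision:
    """Apply layers in order; stop at the first refusal and attribute it."""
    for name, check in layers:
        reason = check(prompt)
        if reason is not None:
            return Decision(allowed=False, layer=name, reason=reason)
    return Decision(allowed=True)


# Hypothetical layers, for illustration only.
def model_layer(prompt: str) -> Optional[str]:
    return None  # an open-weights model with no built-in filter


def platform_layer(prompt: str) -> Optional[str]:
    if "gore" in prompt.lower():
        return "platform policy: graphic violence"
    return None
```

Because each refusal carries a `layer` field, a hosting service could tell users "blocked by this platform's policy" rather than leaving them to guess whether the model itself refused.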
Imagen 3
Approach: Moderately strict, with a focus on brand safety.
Strengths:
Weaknesses:
Notable Quirk: Extremely strict on any prompt mentioning real people, even in clearly artistic contexts ("portrait of [historical figure] in the style of Rembrandt").
Verdict: Conservative but more consistent than DALL-E. Good for enterprise use where brand risk is a concern.
All four correctly blocked:
This is the critical baseline. Any failure here would be unacceptable.
Medical Content:
Political Content:
Historical Violence:
Artistic Nudity:
Perhaps most concerning: inconsistency within the same platform.
I ran each prompt 3 times on each system to test reliability:
Users can't predict what will be allowed. That's a UX problem and a safety problem.
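Measuring within-platform inconsistency needs nothing more than the repeated runs described above. This minimal sketch assumes each run is recorded as a boolean, where `True` means blocked. The prompt keys in the usage example are placeholders, not results from this test.

```python
def outcome_agreement(runs: list[bool]) -> float:
    """Share of runs matching the majority outcome (1.0 = fully consistent)."""
    if not runs:
        raise ValueError("need at least one run")
    blocked = sum(runs)
    return max(blocked, len(runs) - blocked) / len(runs)


def inconsistent_prompts(history: dict[str, list[bool]]) -> list[str]:
    """Prompts whose repeated runs did not all produce the same outcome."""
    return sorted(p for p, runs in history.items() if len(set(runs)) > 1)


# Placeholder data: three runs per prompt.
history = {
    "prompt A": [True, True, True],
    "prompt B": [False, True, False],
}
```

With three runs, agreement can only be 1.0 or about 0.67. That is coarse, but it is enough to flag which prompts deserve a closer look.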
These systems serve different users with different risk tolerances:
None is objectively better. But users need to understand which system matches their use case.
When educational medical illustrations get blocked, systems lose utility for healthcare, journalism, and research. The cost of false positives is real.
If the same prompt works on Monday but fails on Tuesday, users assume the system is broken. That perception undermines safety messaging when it matters.
All systems struggle with prompts where context determines appropriateness:
Keyword matching isn't enough. But true contextual understanding remains elusive.
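A naive keyword filter shows why pattern matching falls short. In this illustrative sketch, the blocklist is a made-up placeholder and not any platform's actual list, and nothing here claims that the tested systems work this way. The filter produces the same split as the "person sleeping" versus "unconscious person" example above, because it sees a listed word and never reads the surrounding context.

```python
import re

# Hypothetical blocklist for illustration; production systems typically
# rely on learned classifiers rather than word lists.
BLOCKLIST = {"unconscious", "blood", "weapon"}


def keyword_filter(prompt: str) -> bool:
    """Return True (block) if any blocklisted word appears, ignoring context."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    return not BLOCKLIST.isdisjoint(words)
```

Both prompts in the test describe a calm medical scene, yet only one is blocked. The converse failure also holds: a harmful request phrased without any listed word passes untouched.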
Stable Diffusion's approach empowers users but raises societal questions: who is responsible when open models enable harm? The developers? Hosting services? End users?
1. Transparency Over Perfection
Users can adapt to strict policies if they're clearly documented. Publish filtering guidelines with examples.
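One way to publish guidelines with examples is to keep them machine-readable, so that the user-facing documentation and the filter's regression tests come from the same source. This is a minimal sketch that assumes a hypothetical policy format. The rule ID, summary, and example prompts are invented placeholders.

```python
import json

# Hypothetical policy entry: each rule ships with allowed and blocked examples.
POLICY = json.loads("""
{
  "rules": [
    {
      "id": "medical-01",
      "summary": "Clinical and educational medical imagery is allowed; gratuitous injury is not.",
      "allowed_examples": ["anatomical diagram of the human heart"],
      "blocked_examples": ["close-up of a gruesome open wound for shock value"]
    }
  ]
}
""")


def render_guidelines(policy: dict) -> str:
    """Turn the policy into plain-text documentation users can read."""
    lines = []
    for rule in policy["rules"]:
        lines.append(f"[{rule['id']}] {rule['summary']}")
        lines.extend(f"  allowed: {ex}" for ex in rule["allowed_examples"])
        lines.extend(f"  blocked: {ex}" for ex in rule["blocked_examples"])
    return "\n".join(lines)
```

The same example lists could also be replayed against the filter in CI. That way, the published guidelines are less likely to drift away from what the system actually does.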
2. Consistency Is a Feature
Invest in deterministic filtering. If a prompt is blocked, it should always be blocked. Unpredictability breaks user trust.
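One way to make decisions repeatable is to normalize the prompt, hash it, and cache the verdict. Identical prompts then always get the same answer, even if the underlying classifier is stochastic. This is a sketch under stated assumptions: `classify` is a hypothetical callable that returns `True` to block, the normalization rules are illustrative, and the policy version is folded into the key so that a policy change invalidates old verdicts.

```python
import hashlib
import unicodedata
from typing import Callable


def normalize(prompt: str) -> str:
    """Canonical form: Unicode NFKC, lowercase, collapsed whitespace."""
    text = unicodedata.normalize("NFKC", prompt).lower()
    return " ".join(text.split())


class DeterministicFilter:
    def __init__(self, classify: Callable[[str], bool], policy_version: str):
        self._classify = classify  # hypothetical classifier: True means block
        self._policy_version = policy_version
        self._cache: dict[str, bool] = {}

    def _key(self, prompt: str) -> str:
        # Include the policy version so a policy change yields fresh verdicts.
        raw = f"{self._policy_version}\x00{normalize(prompt)}"
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def is_blocked(self, prompt: str) -> bool:
        key = self._key(prompt)
        if key not in self._cache:
            self._cache[key] = self._classify(normalize(prompt))
        return self._cache[key]
```

The cache only guarantees consistency for prompts that normalize to the same string. Near-duplicate phrasings still depend on the classifier itself being stable.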
3. Contextual Signals Matter
Let users signal intent: "for educational purposes," "historical documentation," "artistic study." Don't ignore these cues.
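Stated intent could feed into a decision rather than being ignored. In the sketch below, recognized context phrases lower a risk score before it is compared with a threshold. The phrases come from the recommendation above. The risk scores, discount, and threshold are arbitrary placeholders, and a real system would also need to guard against people adding these phrases purely to get past the filter.

```python
# Intent phrases from the recommendation; the numbers are made-up placeholders.
CONTEXT_SIGNALS = ("for educational purposes", "historical documentation", "artistic study")
CONTEXT_DISCOUNT = 0.2
BLOCK_THRESHOLD = 0.7


def adjusted_risk(prompt: str, base_risk: float) -> float:
    """Reduce a classifier's base risk score (0..1) when intent is stated."""
    text = prompt.lower()
    if any(signal in text for signal in CONTEXT_SIGNALS):
        return max(0.0, base_risk - CONTEXT_DISCOUNT)
    return base_risk


def should_block(prompt: str, base_risk: float) -> bool:
    return adjusted_risk(prompt, base_risk) >= BLOCK_THRESHOLD
```

With these placeholder numbers, stated intent only changes the outcome for prompts scoring roughly between 0.7 and 0.9. Anything above that range is still blocked, so a context phrase can tip borderline cases without unlocking clearly prohibited ones.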
4. Gradual Enforcement
Warn before blocking. Show why a prompt was flagged. Let users refine rather than hitting a wall.
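Graduated enforcement amounts to mapping a risk score to more than two outcomes and attaching reasons to each one. This minimal sketch assumes a hypothetical upstream classifier that supplies a 0-to-1 risk score and a list of flagged terms. The thresholds are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"    # generate, but tell the user what was flagged
    BLOCK = "block"  # refuse, and explain why


@dataclass
class Verdict:
    action: Action
    reasons: list[str] = field(default_factory=list)


# Placeholder thresholds on a 0..1 risk score.
WARN_AT = 0.4
BLOCK_AT = 0.8


def enforce(risk: float, flagged_terms: list[str]) -> Verdict:
    """Map a risk score to a graduated action with user-facing reasons."""
    reasons = [f"flagged term: {t!r}" for t in flagged_terms]
    if risk >= BLOCK_AT:
        return Verdict(Action.BLOCK, reasons)
    if risk >= WARN_AT:
        return Verdict(Action.WARN, reasons + ["consider rephrasing or adding context"])
    return Verdict(Action.ALLOW)
```

The middle tier is where users get to refine a prompt instead of hitting a wall. Widening or narrowing the gap between `WARN_AT` and `BLOCK_AT` is one knob for tuning how forgiving the system feels.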
5. Appeal Mechanisms
When false positives occur (and they will), give users a path to contest decisions. Especially for professional use cases.
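An appeal path also produces a useful signal: the share of appeals that get overturned is a rough measure of false positives. This is a minimal sketch with hypothetical field names. A real system would add reviewer identity, audit logs, and persistence.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    UPHELD = "upheld"          # the original block stands
    OVERTURNED = "overturned"  # the block was a false positive


@dataclass
class Appeal:
    prompt: str
    user_context: str  # the user's explanation of intended use
    status: Status = Status.PENDING
    filed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def resolve(self, overturn: bool) -> None:
        if self.status is not Status.PENDING:
            raise ValueError("appeal already resolved")
        self.status = Status.OVERTURNED if overturn else Status.UPHELD


def overturn_rate(appeals: list[Appeal]) -> float:
    """Share of resolved appeals that were overturned (a rough false-positive signal)."""
    resolved = [a for a in appeals if a.status is not Status.PENDING]
    if not resolved:
        return 0.0
    return sum(a.status is Status.OVERTURNED for a in resolved) / len(resolved)
```

Tracking this rate per policy category could show where a filter is too aggressive, for example on medical imagery.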
Content moderation in generative AI is a zero-sum game in some respects:
There's no perfect equilibrium. But we can make systems predictable, transparent, and context-aware. That's the best we can do with current technology.
Each platform is making different bets about where to draw lines. As users, we need to understand those choices. As builders, we need to keep refining. And as a society, we need to have honest conversations about what risks we're willing to accept for what benefits.
Because the alternative—systems that block arbitrary things for unexplained reasons—serves no one.
What's your experience with content filters on these platforms? Have you noticed patterns or inconsistencies? I'd love to hear what others have found. Reach out via GitHub.
Want to discuss this article? Standard contact info is available throughout the site. Or, if you've been paying attention, you might know a more direct route.