Nano Banana, Flux, and the AI Image Models Powering Personalised Art
If you've spent any time in AI circles recently, you've come across names like Flux, Stable Diffusion, DALL-E, Midjourney, Gemini, and a dozen other models competing for the title of "best AI image generator." And if you've dug a little deeper into the fal.ai ecosystem, you might have encountered something called nano banana.
This article explains what these models are, how they differ, and — more concretely — how they power products like MyComicGift to turn a written story into a personalised illustrated comic in under two minutes.
The AI Image Generation Explosion
In 2022, the release of Stable Diffusion changed the landscape of AI image generation by making a high-quality open-source model publicly available. What had been the domain of research labs became accessible to developers, hobbyists, and product builders overnight.
Since then, the field has moved extremely fast:
- DALL-E 3 (OpenAI) launched with much better prompt adherence than its predecessors and integrated directly into ChatGPT
- Midjourney iterated rapidly through versions, developing a distinctive aesthetic that made it popular for creative and commercial work
- Stable Diffusion XL, then Stable Diffusion 3, pushed the open-source baseline higher
- Flux (Black Forest Labs) — founded by core contributors to the Stable Diffusion project — released a family of models that benchmarked strongly on quality and prompt following
- Gemini (Google) brought multimodal image generation into the Google ecosystem, integrated with Google's broader AI infrastructure
- Adobe Firefly targeted the commercial creative market with a model trained on licensed content
- Ideogram developed notably strong text-rendering capabilities within generated images
Each of these models has different strengths, different licensing terms, different aesthetics, and different API accessibility. The model choice matters enormously for any specific application.
What These Models Actually Do
Most of the major current models are diffusion models — they work by learning to reverse a process of adding noise to images. During training, the model sees millions of image-text pairs and learns associations between visual concepts and language descriptions.
At generation time, the model starts from random noise and iteratively refines it toward an image that matches the text prompt. The result is an image that didn't exist before, synthesised from the patterns the model learned during training.
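The iterative refinement described above can be shown with a toy numeric sketch. This is an illustration of the idea only: in a real diffusion model, a neural network conditioned on the text prompt predicts the noise at each step, whereas here a fixed target array stands in for what that network would steer towards.

```python
import numpy as np

# Toy sketch of diffusion sampling: start from pure noise and take
# repeated small steps toward a target. In a real model, the
# "predicted noise" comes from a trained network conditioned on the
# prompt; here the known target stands in for that prediction.
rng = np.random.default_rng(0)
target = np.full((8, 8), 0.5)      # stand-in for "the image the prompt describes"
x = rng.normal(size=(8, 8))        # generation starts as random noise

for step in range(50):
    predicted_noise = x - target   # a real model *learns* this prediction
    x = x - 0.1 * predicted_noise  # refine: remove a fraction of the noise

# After enough steps, x has converged close to the target image.
```

Fifty small steps is in the same ballpark as real samplers, which typically run tens of denoising steps per image.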
What's changed in recent generations:
Prompt adherence. Early models would loosely interpret prompts. Modern models like Flux and DALL-E 3 follow specific instructions closely — generating a character in a specific setting doing a specific action, not just a vague interpretation.
Consistency. Generating the same character twice and having them look like the same person used to be nearly impossible. Newer techniques (IP-Adapter, ControlNet, and others) make this tractable.
Style control. Fine-tuning methods like LoRA allow a model to be specialised on a specific artistic style so consistently that every output reliably looks like it came from the same artist.
The Models Everyone Is Talking About
A brief, honest assessment of the major players:
Flux (Black Forest Labs / fal.ai): Currently one of the strongest models for illustration and non-photorealistic work. Excellent prompt adherence, good character consistency, handles flat-colour illustration styles particularly well. Available via the fal.ai API.
DALL-E 3 (OpenAI): Excellent at following specific compositional prompts. Strong for single images. Less suited to maintaining character consistency across a sequence of images.
Midjourney: Produces beautiful outputs, often with a distinctive aesthetic quality. Less controllable via API, less suited to production pipelines that need consistent, repeatable style application.
Stable Diffusion 3 / SDXL: Hugely flexible via the open-source ecosystem. Quality in the best fine-tuned models is excellent. More technically demanding to deploy at production quality.
Gemini (Google): Google's multimodal model handles image generation as part of a broader capability set. Strong general-purpose performance, integrated with Google's infrastructure. Less specialised for illustration use cases than some alternatives.
Adobe Firefly: Trained on licensed Adobe Stock content, making it legally clean for commercial use. Aesthetically conservative — good for product photography and commercial illustration, less interesting for creative/distinctive styles.
Ideogram: Strongest on text-within-image tasks. Less specialised for the character consistency demands of comic illustration.
What Is Nano Banana?
Nano banana is the nickname of Google's Gemini 2.5 Flash Image model, which is available through the fal.ai platform alongside models like Flux. The name began as the model's anonymous codename during public blind testing and stuck after Google confirmed which model it was.
It's not a different kind of AI. It's a production image model whose standout strength is keeping a subject's appearance consistent across generations and edits, which is precisely the capability a personalised comic needs.
Specialised models, whether built that way from the start or fine-tuned onto a base model, are interesting because of what they trade away in exchange for consistency. A general model can produce almost anything. A specialised one produces one thing very, very well. For applications where the style is the product, like a personalised comic that needs to look like a professional ligne claire illustration every time, this is exactly the right tradeoff.

Fine-Tuning: How Models Learn a Specific Style
Fine-tuning is the process of training an already-capable model on a specific dataset to specialise its outputs. The techniques used vary by approach:
LoRA (Low-Rank Adaptation): Trains a small set of additional weights that modify the model's behaviour without retraining the whole model. Efficient and widely used. Produces consistent style application when trained well.
DreamBooth: Specialises a model on a specific subject (a person's face, a specific character, a specific object) so it can reliably generate that subject in new contexts.
IP-Adapter: Allows an image to be used as a visual reference during generation — "generate a character who looks like this photo, in this style." The reference image guides the character's appearance without being copied directly.
These techniques combine to make possible what would have seemed like science fiction five years ago: provide a photo of a real person, describe a story, and receive a nine-panel illustrated comic featuring that person consistently across every panel, in a specific illustration style, in under two minutes.
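The LoRA technique described above can be sketched in a few lines of numpy. The core idea: instead of retraining a large weight matrix W, train two small matrices whose product is a low-rank correction added to W. The dimensions here are illustrative; real LoRA applies this inside a network's attention layers.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 512   # width of one base-model layer (illustrative size)
r = 8     # LoRA rank: the adapter is tiny compared to W

W = rng.normal(size=(d, d))          # frozen base-model weights
A = rng.normal(size=(r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                 # trainable up-projection, starts at zero

# Fine-tuned behaviour = frozen base weights + low-rank correction.
W_adapted = W + B @ A

# Because B starts at zero, an untrained adapter changes nothing,
# so fine-tuning begins exactly from the base model's behaviour.

# The adapter stores 2*d*r numbers instead of d*d:
adapter_params = A.size + B.size     # 8,192
base_params = W.size                 # 262,144
```

This is why LoRA files are megabytes rather than gigabytes: the adapter here is about 3% of the size of the single layer it modifies, and the full base model is never duplicated.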
The Hardest Problem: Consistent Characters
Any developer who has tried to build a product on top of AI image generation runs into the same wall: getting the same character to look like the same person across multiple images.
Single-image AI generation has been impressive for years. Multi-image character consistency is where products fail or succeed.
The approaches that work:
Image conditioning at generation time. Rather than just describing the character in text, providing the reference photo to every generation request via IP-Adapter or a similar technique. The model sees the reference and anchors the character appearance to it.
Character embedding. Creating a persistent representation of the character that's applied consistently across all panel generations, rather than re-deriving the character from scratch for each panel.
Iterative generation with feedback. Generating each panel with awareness of what came before — not fully independent generations, but a sequence where later panels respect the visual decisions made in earlier ones.
The best current implementations get character consistency right for most face types and most panel compositions. Complex scenes with unusual angles or heavy occlusion remain harder, which is why regenerating individual panels is a useful feature.
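The first approach above, sending the reference photo with every request, can be sketched as follows. The function and field names are hypothetical (this is not a real fal.ai endpoint shape); the point is simply that the reference image travels with every panel request rather than only the first.

```python
# Hypothetical sketch of image-conditioned panel generation.
# The request shape is illustrative, standing in for a real
# IP-Adapter-capable API; field names are assumptions.

def build_panel_requests(reference_image: str, panel_prompts: list) -> list:
    """Attach the same character reference to every panel request,
    so each generation is anchored to the same face."""
    return [
        {
            "prompt": prompt,
            "image_reference": reference_image,  # conditions the generation
            "reference_strength": 0.8,           # how strongly to follow the photo
        }
        for prompt in panel_prompts
    ]

requests = build_panel_requests(
    "grandad.jpg",
    ["Panel 1: a man waters his garden", "Panel 2: he finds a giant marrow"],
)
```

Passing the reference with every request, rather than deriving the character once and hoping the text prompt preserves it, is what keeps panel five's face matching panel one's.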
From Model to Personalised Gift
The gap between "AI model that generates images" and "personalised gift product" is larger than it looks from the outside.
The model handles the image generation. Everything else is product:
- A language model that expands a short story brief into detailed, panel-specific image prompts
- A story structure that gives the nine panels a narrative arc (not just nine similar images)
- A character pipeline that establishes visual consistency from the reference photo
- An editing interface that lets users regenerate specific panels without losing overall coherence
- An output pipeline that produces print-resolution files rather than web-resolution previews
- A style that's been chosen specifically because it produces beautiful, frameable results — not because it's the default output of any available model
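The first two pieces of that stack, expanding a brief into panel-specific prompts with a narrative arc, can be sketched like this. In production a language model writes the panel prompts; here a fixed list of story beats stands in for its output, and all names are illustrative.

```python
# Hypothetical sketch: turn a one-line story brief into nine
# panel prompts that follow a narrative arc, rather than nine
# near-identical images. In production an LLM generates these
# beats; this fixed template just stands in for its output.

STORY_BEATS = [
    "establishing shot introducing the main character",
    "the character in their everyday routine",
    "something unusual catches their attention",
    "they decide to act",
    "rising action, effort and obstacles",
    "the turning point of the story",
    "the climactic moment",
    "the immediate aftermath",
    "closing shot, the character changed by events",
]

def expand_brief(brief: str, style: str = "ligne claire illustration") -> list:
    """One prompt per panel: story beat + the user's brief + a shared style tag."""
    return [f"{beat}; story: {brief}; style: {style}" for beat in STORY_BEATS]

prompts = expand_brief("Grandad wins the village marrow contest")
```

Appending the same style tag to every prompt is one simple way to keep the nine panels visually coherent even before any character conditioning is applied.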

What This Means for You
You don't need to know what nano banana is, or understand the difference between LoRA and DreamBooth, to use MyComicGift or appreciate the output.
But it's worth knowing that behind the two-minute generation experience is a reasonably sophisticated stack: a fine-tuned image generation model, a character consistency pipeline, a language model for story structure, and a set of deliberate style choices that add up to something that looks and feels like professional illustration.
The tools are doing real work. The choice of which tools — and how they're configured — is what separates outputs that look like impressive AI art from outputs that look like something genuinely beautiful and gift-worthy.
For the best character likeness, upload a clear, well-lit face photo when creating your comic. The image conditioning technique works best with high-quality reference photos where the face is clearly visible and front-facing or at a gentle three-quarter angle.
See what the models produce from your story
A personalised comic in under 2 minutes. First preview is free.
Try it now.
Related reading: how AI image generators are transforming personalised gifts and the broader guide to AI-powered gifts.
