20 May 2026

For years, the hardest problem in AI image generation wasn't quality. It was consistency. You could generate a beautiful character in one prompt, then ask for "the same character, walking down a street," and the model would hand you a different person.
That problem is mostly solved now. Nano Banana (Google's Gemini 2.5 Flash Image) and the newer Nano Banana 2 (Gemini 3.1 Flash Image, released Feb 2026) can hold a character's face, outfit, and proportions across multiple edits and scenes. Per Google's launch announcement, Nano Banana 2 maintains the resemblance of up to five characters and the fidelity of up to 14 objects in a single workflow. That's more than a marketing line: in practice, it's the reason this model is becoming the default for comics, storyboards, product photography, and virtual try-on.
This post is the practical version. Twelve prompts you can copy, the anatomy of what works, and an honest list of where the model still misses. If you've been pulling your hair out over Midjourney character drift, this is the guide.
When prompters say "character consistency," they usually mean one of three different things:
Identity: same face, same hair, same age across scenes.
Wardrobe: same outfit, same accessories, same colors.
Style: same art style, same rendering, same vibe.
The hardest of the three is identity. Diffusion models (Stable Diffusion, Midjourney, DALL-E) re-roll the latent on every generation, which means subtle face features drift unless you train a LoRA, supply embeddings, or stack reference-image conditioning. That's a lot of setup for "draw my character one more time."
Nano Banana approaches the problem differently. Because it's a native multimodal model — image generation built into a language model rather than bolted on — it can take an existing image as input and edit it rather than redraw it. Same face, same outfit, new pose. That single architectural choice is why it wins on consistency.
Every prompt after the first assumes you've already uploaded a reference image of your character. The first prompt creates that reference; the rest hold the character consistent across scenes, outfits, and angles.
Use this first. It creates your reference image.
"Generate a portrait of a woman in her early 30s, with shoulder-length wavy dark brown hair, hazel eyes, a small scar above her left eyebrow, wearing a cream linen shirt and gold hoop earrings. Soft natural light from the left. Neutral grey background. Photorealistic, magazine portrait quality, 3:2 aspect ratio."
Save this image. Every prompt below starts with this image uploaded as a reference.
"Same character as the reference image. Now show her standing in front of a window, holding a cup of coffee, looking out. Same face, same hair, same earrings, same cream linen shirt. Soft natural morning light. Photorealistic."
"Same character as the reference. Now show her wearing a charcoal grey wool blazer over a white t-shirt, walking down a city street. Same face, same hair, same scar above the left eyebrow. Late afternoon light. Cinematic."
"Same character as the reference. Show her in profile, looking to the right, against a soft white background. Same face shape, same hair, same earrings. Portrait lighting. Photorealistic."
"Same character as the reference. Place her at a wooden kitchen counter, slicing an apple. Same cream linen shirt, same face, same hair, same earrings. Warm afternoon light from a window behind her. Lifestyle photography style."
"Same character as the reference, now aged five years older. Subtle laugh lines, slightly longer hair. Same hazel eyes, same scar above the left eyebrow, same general bone structure. Wearing a navy crewneck sweater. Same overall identity. Photorealistic portrait."
Establish a second reference (call it Character B) before running this.
"Show Character A from the first reference image and Character B from the second reference image standing next to each other, having a casual conversation in a sunlit café. Maintain both faces exactly as in the references. Same outfits as in the references. Natural ambient light. Photorealistic."
"Same character as the reference image, redrawn in the style of a Studio Ghibli animation cel. Maintain her recognizable features — wavy dark brown hair, hazel eyes, small scar above the left eyebrow, cream linen shirt, gold hoop earrings. 2D animation style."
"Same character as the reference image, holding a glass perfume bottle in her right hand at chest height. Soft beauty lighting, white seamless background. Same face, same hair, same earrings. Editorial product photography style."
"Generate a 3-panel storyboard with the same character from the reference image. Panel 1: she enters a coffee shop. Panel 2: she orders at the counter. Panel 3: she sits down at a window seat with her coffee. Same face, same outfit, same earrings in every panel. Cinematic style. 16:9 each panel."
"Same character as the reference. Show her wearing five different outfits in a 5-panel layout: (1) a black cocktail dress, (2) a casual white sundress, (3) a tailored grey suit, (4) a denim jacket over a white tee, (5) a cream cable-knit sweater. Same face in every panel. Same hair. Same earrings. Photorealistic."
"Same character as the reference, shown in two side-by-side images. Left: the original reference photo. Right: the same person in three-quarter view turning her head to look toward the camera. Same face, same hair, same lighting style. This is a continuity test — the two images should be unmistakably the same person."
That last one is the test. If prompt 12 returns two people who could plausibly be the same person across both panels, you've got a working consistency setup. If they look like sisters, your reference photo is too low-resolution or too ambiguous — go back to prompt 1 and add more specific identifying details (scar, freckles, mole, distinctive earrings, etc.).
Every prompt above has the same five anchors. Memorize these.
1. Always start with "Same character as the reference image." Not "the same character" or "this person." The model parses the explicit reference call more reliably.
2. Repeat the 3–4 most distinctive features in every prompt: hair color and length, eye color, a unique mark (scar, freckle, mole), and one accessory the character always wears. The repetition is what locks identity.
3. Either "same outfit as the reference" (locks wardrobe) or "now wearing X" (changes wardrobe while identity is held by anchor 2). Decide which one applies and be explicit.
4. What's happening, where, when. Be specific about lighting: "morning light from the left" beats "natural light."
5. Photorealistic, editorial, cinematic, 2D animation, magazine portrait: set the rendering style explicitly. Don't assume the model will match the reference's style by default.
A useful frame: the reference image carries identity, the prompt carries everything else. The more you let the reference do the work of identity, the more the prompt can change.
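The five anchors can be assembled mechanically. Below is a minimal Python sketch of that idea. The function name `build_prompt` and its parameters are hypothetical and chosen for illustration; they are not part of any Nano Banana or Gemini API. The code only builds the prompt text, so uploading the reference image still happens separately in whatever tool you use.

```python
def build_prompt(features, wardrobe, scene, style):
    """Assemble a consistency prompt from the five anchors.

    Illustrative helper only: it returns plain text and calls no API.
    """
    parts = [
        # Anchor 1: the explicit reference call.
        "Same character as the reference image.",
        # Anchor 4: scene, action, and lighting.
        scene.rstrip(".") + ".",
        # Anchor 2: the 3-4 distinctive features, each prefixed with "same".
        "Same " + ", same ".join(features) + ".",
        # Anchor 3: wardrobe lock ("same outfit ...") or change ("now wearing ...").
        wardrobe.rstrip(".") + ".",
        # Anchor 5: rendering style.
        style.rstrip(".") + ".",
    ]
    return " ".join(parts)


prompt = build_prompt(
    features=["face", "hair", "scar above the left eyebrow", "gold hoop earrings"],
    wardrobe="Now wearing a charcoal grey wool blazer over a white t-shirt",
    scene="Now show her walking down a city street in late afternoon light",
    style="Cinematic",
)
print(prompt)
```

Keeping the features in a list makes it harder to forget one between prompts, which is where drift tends to start.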
Three reasons, in order of importance.
Diffusion-based models add image conditioning on top of a generative process. Nano Banana operates over the reference image directly. The result is dramatically lower identity drift across iterations.
Because Nano Banana is built on Gemini, it understands what "the same person" means semantically, not just visually. Ask it for "the same character five years older" and it can draw on a language model's knowledge of what aging looks like, whereas a diffusion pipeline driven mainly by image conditioning has less of that semantic context to work with.
Nano Banana 2 holds up to 5 characters and 14 objects across a single workflow. That's enough budget to maintain a protagonist, a sidekick, and the entire wardrobe across 10+ panels.
The combination is why Nano Banana is rapidly becoming the default for virtual try-on workflows, product photography, and any task where the same subject needs to show up twice.
The Feb 2026 release of Nano Banana 2 (Gemini 3.1 Flash Image Preview) tightened several things that matter for character work. Compared with the original (v1):
Sequential edits: v1 is good for 3–4; v2 is stable across 8–10+.
Characters per scene: v1 reliably 2; v2 up to 5.
Objects tracked: v1 approximately 6; v2 up to 14.
Text rendering: v1 frequently garbled (book titles, name tags); v2 mostly legible.
Price per 1M tokens (input / output): v1 $0.30 / $2.50; v2 $0.50 / $3.00.
If you're already on the original Nano Banana and shipping work, you don't have to switch. The v1 model is cheaper and the quality jump for character consistency specifically is incremental, not transformational. If you're starting fresh in May 2026, default to v2.
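To put the price gap in concrete terms, here is a rough Python sketch of the arithmetic. The per-1M-token prices are the ones quoted above. The workload numbers (edit count and tokens per edit) are made-up placeholders, not measured values, so substitute the counts from your own usage reports.

```python
# USD per 1M tokens as (input, output), taken from the comparison above.
PRICES = {"v1": (0.30, 2.50), "v2": (0.50, 3.00)}


def job_cost(version, input_tokens, output_tokens):
    """Return the USD cost of a job for the given model version."""
    in_price, out_price = PRICES[version]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Placeholder workload: these token counts are assumptions for illustration.
EDITS = 10_000
INPUT_TOKENS_PER_EDIT = 1_000
OUTPUT_TOKENS_PER_EDIT = 1_000

for version in PRICES:
    total = job_cost(
        version,
        EDITS * INPUT_TOKENS_PER_EDIT,
        EDITS * OUTPUT_TOKENS_PER_EDIT,
    )
    print(f"{version}: ${total:.2f}")
```

Because output tokens cost several times more than input tokens on both versions, the output side dominates the total unless your prompts are unusually long.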
Even with v2, you'll hit these. They're the rough edges.
Across 5+ edits, hand shape and finger count are usually the first thing to degrade.
Workaround: when hands matter (close-ups, gesture-heavy scenes), describe them explicitly — "hands relaxed at her sides, five visible fingers on the right hand."
Hazel becomes light brown becomes amber over multiple edits.
Workaround: name the eye color in every prompt and use a specific shade ("hazel with green flecks") rather than a generic color.
"Five years older" might return someone who looks ten years older.
Workaround: bracket explicitly — "subtle aging, three to five years older, no dramatic changes."
Converting a photorealistic character to anime style sometimes generalizes the face into a generic anime style.
Workaround: name the unique features in the stylized prompt explicitly — "anime style, but maintain the small scar above the left eyebrow and the wavy dark hair length."
A cream linen shirt becomes a generic light shirt when the camera pulls back.
Workaround: name the garment specifics every time, even when zoomed out.
Place 4 of your established characters in a group scene and the model may swap features between two of them.
Workaround: scaffold one character at a time — start with two characters, lock the result, then add the third character to the locked image, then the fourth.
None of these are dealbreakers. They're things to design around.
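The group-scene workaround (start with two characters, lock the result, then add one character at a time) is easy to script as a prompt plan. This is a minimal Python sketch under stated assumptions: `scaffold_group_scene` is a hypothetical helper, the prompt wording is illustrative rather than a tested formula, and the code only produces text. You would still run each step by hand, attaching the locked image from the previous step plus the new character's reference.

```python
def scaffold_group_scene(characters, scene):
    """Return ordered prompts that build a group scene one character at a time.

    Step 1 places the first two characters; each later step adds one more
    character to the image locked in the previous step.
    """
    if len(characters) < 2:
        raise ValueError("a group scene needs at least two characters")

    first, second, *rest = characters
    steps = [
        f"Show {first} and {second} from their reference images {scene}. "
        "Maintain both faces exactly as in the references."
    ]
    for name in rest:
        steps.append(
            f"Same scene as the attached image. Add {name} from the second "
            "attached reference image. Keep every existing face unchanged."
        )
    return steps


steps = scaffold_group_scene(
    ["Character A", "Character B", "Character C", "Character D"],
    "standing together in a sunlit café",
)
for number, text in enumerate(steps, start=1):
    print(f"Step {number}: {text}")
```

Four characters produce three steps, so each generation introduces at most one new face for the model to hold.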
A non-exhaustive list of where this matters:
If your work involves the same recognizable subject more than once, Nano Banana's consistency is the feature you came for.
Honest comparison, ordered by how well each handles identity consistency in 2026:
Approach: Native multimodal image editing.
Best for: Most consistency-critical work.
Trade-off: Pricey for very high volume.
Approach: Same approach, smaller object budget.
Best for: High-volume consistency at lower cost.
Trade-off: Slightly weaker across 8+ edits.
Approach: Character reference flag.
Best for: Strong stylization, brand-aesthetic shoots.
Trade-off: More face drift than Nano Banana.
Approach: Reference conditioning.
Best for: Open ecosystem, China-region users.
Trade-off: Identity weaker than NB on long sequences.
Approach: Native editing.
Best for: Open-weights, on-prem use.
Trade-off: Looser face hold than NB.
Approach: Multi-turn editing.
Best for: Conversational workflows.
Trade-off: Generally lags NB v2 on consistency.
For most people in 2026, the right default is Nano Banana 2 for face and identity-critical work, and Midjourney v7 for visually driven hero shots where slight drift is acceptable.
Yes. Every image generated or edited by Nano Banana ships with an invisible SynthID watermark. It's imperceptible to the human eye and survives most common edits (cropping, compression, screenshots) per Google DeepMind's documentation.
For most commercial uses this is fine. The watermark doesn't appear in the image; it doesn't affect quality; it doesn't restrict commercial use. What it does mean is that AI-detection tools can verify your image as Google-generated. If your use case requires that to be ambiguous, Nano Banana isn't the right tool.
Across a single conversation / workflow, reliably for 8–10 sequential edits in v2. Beyond that, identity slowly drifts. The fix: re-anchor to your original reference image every 5–8 prompts.
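The re-anchoring rule can be written down as a simple schedule. This Python sketch is an illustration only: `reference_for_step` is a hypothetical name, and the default interval of 6 is just one value inside the 5–8 range suggested above. It decides which image to attach at each step and does not call any API.

```python
def reference_for_step(step, interval=6):
    """Pick which image to attach for a 1-indexed edit step.

    Returns "original" on step 1 and on every `interval`-th step after it,
    and "previous" (the last generated image) otherwise.
    """
    if step < 1:
        raise ValueError("step is 1-indexed")
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return "original" if (step - 1) % interval == 0 else "previous"


# Plan a 12-edit session: which image to upload at each step.
plan = [reference_for_step(step) for step in range(1, 13)]
for step, source in enumerate(plan, start=1):
    print(f"edit {step}: attach the {source} image")
```

A shorter interval trades some continuity of pose and scene for a tighter hold on the face, so it is worth tuning per project.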
No. That's a diffusion-model workflow. Nano Banana works from a single reference image — no LoRA, no embedding, no fine-tuning.
v2 holds identity across roughly twice as many sequential edits, handles up to 5 characters in one scene (vs 2 reliably in v1), and renders text on character props much more accurately.
Yes. Commercial use is allowed; outputs carry an invisible SynthID watermark that doesn't affect commercial rights.
Generate a single high-quality, well-lit reference portrait first (Prompt 1 above). Save it. Use it as the upload in every subsequent prompt. Include 3–4 specific identifying features in every prompt's description.
Identity drift. The model is making small interpretive choices each time. Re-anchor: in your next prompt, upload the original reference image again and explicitly reference both ("Same character as in the reference image attached, holding to the original look").
Yes. Nano Banana 2 is the default model in the Gemini app and free tier limits are usually enough for casual experimentation. For production volume, use Google AI Studio or OpenRouter.
Yes, the anatomy translates. Substitute "same character" with "same mascot" or "same dog as the reference" and identify 3–4 distinctive features (breed, coat color, ear shape, eye color).