Writing AI image prompts is the single most important skill in AI art. The same model that produces mediocre, generic output from a vague prompt can create stunning, gallery-worthy images when given precise, well-structured instructions. The difference is not the tool — it is the prompt.
This guide covers everything from the fundamental anatomy of a prompt to advanced techniques like weighted syntax, negative prompts, and multi-concept composition. Whether you are writing your first prompt or refining your thousandth, you will find actionable techniques you can use immediately.
1. Anatomy of a Great AI Image Prompt
Every effective AI image prompt consists of six layers. Not every prompt needs all six, but understanding them gives you a complete toolkit for controlling the output. Think of each layer as a dial you can turn up or down depending on what matters most for your particular image.
Layer 1: Subject
The subject is what is in the image. This is the foundation of every prompt. Be specific about what you want to see: "a woman" is weaker than "a young woman in her 20s with short silver hair and round glasses." The more precise your subject description, the less the AI has to guess — and guessing is where unwanted randomness enters.
Effective subject descriptions include:
- Physical characteristics (age, build, hair, clothing)
- Action or pose (running, sitting cross-legged, looking over shoulder)
- Environment (in a neon-lit alleyway, on a cliff overlooking the ocean)
- Objects and details (holding a vintage camera, surrounded by floating lanterns)
Layer 2: Style and Medium
The style layer tells the AI what the image should look like artistically. This is often the most impactful single element in a prompt. The same subject described in different styles produces completely different images:
- Photographic styles: portrait photography, street photography, fashion editorial, documentary photography
- Art mediums: oil painting, watercolor, charcoal drawing, digital illustration, vector art
- Aesthetic movements: art nouveau, cyberpunk, vaporwave, brutalism, cottagecore
- Artist references: "in the style of Studio Ghibli", "Wes Anderson color palette" (use carefully — some models handle this better than others)
Layer 3: Lighting
Lighting is what separates amateur prompts from professional ones. Most beginners never mention lighting. Specifying it gives you dramatic control over the mood and realism of the output:
- Natural light: golden hour, blue hour, overcast diffused light, harsh midday sun, dappled forest light
- Artificial light: neon glow, studio softbox, candlelight, fluorescent, RGB backlighting
- Direction: backlit, side-lit, rim lighting, Rembrandt lighting, split lighting
- Quality: soft diffused, hard dramatic shadows, volumetric rays, god rays through windows
Layer 4: Composition
Composition describes how the image is framed. This is the layer most AI art beginners overlook entirely, yet it has enormous impact:
- Camera angle: low angle, bird's eye view, eye level, Dutch angle, worm's eye view
- Shot type: extreme close-up, medium shot, wide establishing shot, over-the-shoulder
- Depth: shallow depth of field (blurred background), deep focus (everything sharp), tilt-shift miniature effect
- Framing: rule of thirds, centered symmetry, leading lines, frame within a frame
Layer 5: Mood and Atmosphere
The mood layer adds emotional context. It influences color grading, tonal range, and the overall feel of the image:
- Emotional tones: serene, melancholic, joyful, ominous, whimsical, nostalgic
- Color temperature: warm golden tones, cool blue palette, muted desaturated, vibrant saturated
- Atmosphere: foggy, misty, rainy, dusty, ethereal, dreamlike
Layer 6: Model-Specific Parameters
Each AI model has its own parameter syntax that controls technical aspects of the generation:
- Midjourney: --ar 16:9 (aspect ratio), --v 6.1 (version), --style raw (photorealistic), --stylize 250 (artistic intensity), --chaos 30 (variation)
- Stable Diffusion: CFG scale, sampling steps, sampler (Euler, DPM++), seed, model checkpoint
- Flux: Aspect ratio in natural language, guidance scale
- DALL-E 3: Size (1024x1024, 1792x1024, 1024x1792), quality (standard, hd)
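The six layers lend themselves to programmatic assembly. Here is a minimal sketch in Python; the `build_prompt` helper and its layer names are illustrative, not part of any model's official API:

```python
# Hypothetical sketch: join the six prompt layers into one
# comma-separated prompt, with model parameter flags appended last.

def build_prompt(subject, style="", lighting="", composition="",
                 mood="", params=""):
    """Join non-empty layers with commas; append parameter flags last."""
    layers = [subject, style, lighting, composition, mood]
    body = ", ".join(layer for layer in layers if layer)
    return f"{body} {params}".strip() if params else body

prompt = build_prompt(
    subject="a young woman with short silver hair and round glasses",
    style="editorial fashion photography",
    lighting="golden hour sunlight",
    composition="shallow depth of field",
    mood="warm nostalgic tones",
    params="--ar 2:3 --v 6.1",
)
print(prompt)
```

Because empty layers are skipped, the same helper works for a bare subject-only prompt or a fully specified six-layer one.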
2. Prompt Structure Differences Across Models
The same creative vision requires different prompt syntax depending on which model you use. Understanding these differences is essential — a prompt optimized for Midjourney will underperform in Stable Diffusion, and vice versa.
Midjourney Prompt Structure
Midjourney responds best to comma-separated descriptors followed by parameter flags. The model interprets natural language well but also responds to specific vocabulary that has been "trained in" by the community.
a lone samurai standing on a hilltop at sunset, cinematic lighting, volumetric fog, dramatic sky, wide angle shot, hyper detailed, 8k --ar 16:9 --v 6.1 --style raw
Key traits: Midjourney excels at artistic interpretation. It adds visual flair even to simple prompts. Use --style raw for more literal interpretation, or higher --stylize values for more creative liberty.
Stable Diffusion Prompt Structure
Stable Diffusion uses a weighted tag system. Important concepts can be emphasized with parenthetical weighting, and a separate negative prompt field tells the model what to avoid.
Positive: (lone samurai:1.2) standing on hilltop, (sunset:1.1), cinematic lighting, volumetric fog, (dramatic sky:1.3), wide angle, hyper detailed, 8k resolution, masterpiece, best quality
Negative: blurry, low quality, deformed, watermark, text, extra limbs, bad anatomy
Key traits: Stable Diffusion gives you the most granular control of any model. Weighted tags, negative prompts, LoRA models, and checkpoint selection let you dial in exactly the output you want — at the cost of a steeper learning curve.
Flux Prompt Structure
Flux prefers detailed, natural language descriptions. It handles long, descriptive prompts well and does not use special syntax or parameter flags within the prompt itself.
A lone samurai stands motionless on a grassy hilltop as the sun sets behind distant mountains. The scene is bathed in warm golden light with volumetric fog rolling through the valley below. The sky is dramatic with deep oranges and purples streaking across the clouds. Shot from a low angle with a wide-angle lens, the samurai's silhouette is sharp against the blazing sky. Every detail of the armor is visible, from individual lacquer plates to the silk cord wrapping on the katana hilt.
Key traits: Flux excels at understanding complex spatial relationships and following detailed descriptions faithfully. Write prompts as if you are describing the image to a skilled artist. More detail generally produces better results.
DALL-E 3 Prompt Structure
DALL-E 3 works best with complete, grammatically correct sentences. It interprets natural language extremely well and handles multi-sentence descriptions effectively.
Create a cinematic wide-angle photograph of a lone samurai standing on a grassy hilltop at sunset. The sky should be dramatic with deep orange and purple clouds. Volumetric fog rolls through the valley below. The samurai is backlit by the setting sun, creating a striking silhouette. The image should be hyper-detailed with the texture of the samurai's armor clearly visible.
Key traits: DALL-E 3 follows instructions most faithfully of the four models. It interprets prompts literally and handles compositional requests well. It also rewrites prompts automatically for safety, which means some subject matter may be adjusted.
3. 20 Power Words That Improve Any Prompt
These keywords consistently improve output quality across all major AI image generators. They have been tested extensively by the AI art community and reliably outperform prompts that omit them.
| Power Word | What It Does | Best For |
|---|---|---|
| cinematic | Adds film-quality lighting, color grading, and depth of field | Any scene, portraits, landscapes |
| ethereal | Creates soft, dreamlike, otherworldly atmosphere | Fantasy, portraits, nature |
| volumetric lighting | Adds visible light rays and atmospheric depth | Interiors, forests, moody scenes |
| hyperrealistic | Pushes output toward photographic realism | Portraits, products, architecture |
| moody | Deepens shadows, desaturates slightly, adds emotional weight | Portraits, landscapes, noir |
| intricate details | Adds fine texture and micro-detail across the image | Fantasy, sci-fi, ornamental designs |
| golden hour | Warm directional light with long shadows | Landscapes, portraits, outdoor scenes |
| dramatic | Increases contrast, adds tension to composition | Action scenes, weather, portraits |
| bokeh | Creates beautiful out-of-focus background blur | Portraits, close-ups, night scenes |
| masterpiece | General quality booster (especially in SD) | Any subject (strongest in SD) |
| concept art | Clean, professional illustration style | Characters, environments, vehicles |
| matte painting | Grand, sweeping environment with painterly quality | Landscapes, fantasy worlds, sci-fi |
| tilt-shift | Miniature effect with selective focus | Cityscapes, overhead views, dioramas |
| chiaroscuro | Strong contrast between light and dark | Portraits, still life, dramatic scenes |
| iridescent | Rainbow-like color shifts on surfaces | Sci-fi, fantasy, abstract, fashion |
| octane render | Clean, high-quality 3D rendering look | Product design, architecture, sci-fi |
| double exposure | Two images blended within one silhouette | Portraits, conceptual art, posters |
| macro photography | Extreme close-up with tiny subject detail | Insects, flowers, textures, jewelry |
| award-winning | Signals high production value and composition | Any professional-quality output |
| surreal | Dreamlike distortions, impossible geometry, Dali-esque | Fantasy, conceptual, artistic |
These words work because AI models have seen them associated with high-quality images in their training data. "Cinematic" is associated with professional film stills. "Volumetric lighting" is associated with high-end 3D renders and photography. The model draws on these associations to elevate the output.
4. Common Mistakes and How to Fix Them
Mistake 1: Being Too Vague
Bad: a cat
Better: a fluffy orange tabby cat sitting on a windowsill, afternoon sunlight, cozy interior, shallow depth of field, warm tones
Vague prompts give the AI maximum freedom to interpret — and that freedom usually means generic, forgettable output. The fix is always more specificity. Describe what you see in your mind's eye: what does the cat look like? Where is it? What is the light doing?
Mistake 2: Contradictory Descriptions
Bad: a dark moody scene, bright and cheerful, neon lights, natural sunlight
AI models average contradictory instructions rather than choosing one. The result is muddy, confused output that satisfies neither direction. Pick one mood, one lighting scheme, one style. If you want to explore alternatives, generate separate images with different prompts.
Mistake 3: Ignoring Negative Prompts (Stable Diffusion)
In Stable Diffusion, the negative prompt is not optional — it is half of your instruction set. Without negative prompts, you will consistently get watermarks, text overlays, deformed hands, blurry backgrounds, and low-quality artifacts. A solid baseline negative prompt:
blurry, low quality, watermark, text, signature, deformed, bad anatomy, extra fingers, mutated hands, poorly drawn face, jpeg artifacts
Mistake 4: Prompt Stuffing
Adding every quality keyword you can think of — "8k, ultra HD, hyper detailed, masterpiece, best quality, award-winning, professional, magazine quality" — has diminishing returns. One or two quality boosters are sufficient. Beyond that, the extra keywords compete for the model's attention and dilute your actual content descriptors.
Mistake 5: Wrong Prompt Format for the Model
Stable Diffusion's weighted tag syntax, such as (subject:1.3), does nothing in Midjourney. Midjourney's --ar 16:9 flag is ignored by DALL-E 3. Always format your prompt for the specific model you are using. This is one area where ImageToPrompt helps enormously: it automatically generates prompts in the correct syntax for your selected model.
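To make the difference concrete, here is a deliberately simplified sketch that renders the same descriptors in each model's expected shape, following the formatting rules from section 2. The `format_for_model` helper is hypothetical; real tools apply far more nuance:

```python
# Hypothetical sketch: render one set of descriptors in the syntax
# each model expects. Simplified for illustration only.

def format_for_model(descriptors, model, aspect="16:9"):
    if model == "midjourney":
        # comma-separated tags plus parameter flags
        return ", ".join(descriptors) + f" --ar {aspect} --v 6.1"
    if model == "stable-diffusion":
        # tag style; caller supplies a separate negative prompt
        return ", ".join(descriptors) + ", masterpiece, best quality"
    if model in ("flux", "dalle3"):
        # full natural-language sentence
        return "A scene showing " + ", ".join(descriptors) + "."
    raise ValueError(f"unknown model: {model}")

tags = ["lone samurai on a hilltop", "sunset", "volumetric fog"]
print(format_for_model(tags, "midjourney"))
```

The point is not the helper itself but the branching: one creative vision, four output formats.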
Mistake 6: Neglecting Composition
Many prompts describe the subject in detail but say nothing about how the image is composed. Adding camera angle, shot type, and framing gives the AI critical spatial information. "Wide establishing shot from a low angle" produces a fundamentally different image than "close-up portrait at eye level" — even with the same subject and style descriptors.
5. Advanced Techniques
Weighted Syntax (Stable Diffusion)
In Stable Diffusion, you can assign numerical weights to emphasize or de-emphasize specific concepts:
(beautiful sunset:1.4), mountain landscape, (snow-capped peaks:1.2), alpine meadow, (wildflowers:0.8), cinematic, masterpiece
Values above 1.0 increase emphasis. Values below 1.0 decrease it. The default weight for unweighted terms is 1.0. Keep weights between 0.5 and 1.5 — extreme values cause artifacts. Use emphasis on the 2–3 most important elements, not everything.
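The (term:weight) notation is standard in popular Stable Diffusion front ends such as AUTOMATIC1111 and ComfyUI. A small sketch that applies the clamping rule above; the `weighted` helper is illustrative:

```python
# Hypothetical sketch: emit Stable Diffusion parenthetical weights,
# clamping to the 0.5-1.5 range recommended above.

def weighted(term, weight):
    weight = max(0.5, min(1.5, weight))  # extreme weights cause artifacts
    if weight == 1.0:
        return term                       # 1.0 is the default; no syntax needed
    return f"({term}:{weight})"

parts = [weighted("beautiful sunset", 1.4),
         weighted("mountain landscape", 1.0),
         weighted("wildflowers", 0.8)]
print(", ".join(parts))
# -> (beautiful sunset:1.4), mountain landscape, (wildflowers:0.8)
```

Note that the 1.0 case emits a bare term: emphasizing everything is the same as emphasizing nothing, so only the two or three most important elements get explicit weights.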
Negative Prompts (Stable Diffusion & Midjourney)
Negative prompts tell the AI what to avoid. In Stable Diffusion, there is a dedicated negative prompt field. In Midjourney, use the --no flag:
Midjourney: beautiful forest landscape, misty morning --no people, text, watermark
SD Negative: blurry, low quality, deformed hands, watermark, text, ugly, duplicate
Effective negative prompts focus on common failure modes for your subject. Portrait negative prompts should target anatomical issues. Landscape negatives should target artifacts and unwanted elements.
Aspect Ratios
Aspect ratio has a surprising impact on composition. Wide ratios (16:9, 21:9) naturally produce cinematic landscapes. Tall ratios (9:16, 2:3) produce portrait-oriented compositions. Square (1:1) centers the subject symmetrically.
Midjourney: --ar 16:9 (cinematic), --ar 9:16 (portrait/mobile), --ar 3:2 (photography standard)
DALL-E 3: 1792x1024 (landscape), 1024x1792 (portrait), 1024x1024 (square)
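Since DALL-E 3 only accepts the three fixed sizes listed above, a lookup is enough to translate an orientation into a valid size string. A minimal sketch, with a hypothetical `dalle3_size` helper:

```python
# Hypothetical sketch: map an orientation to the fixed sizes DALL-E 3
# accepts, per the list above.

DALLE3_SIZES = {
    "landscape": "1792x1024",
    "portrait": "1024x1792",
    "square": "1024x1024",
}

def dalle3_size(orientation):
    try:
        return DALLE3_SIZES[orientation]
    except KeyError:
        raise ValueError(f"DALL-E 3 supports: {', '.join(DALLE3_SIZES)}")

print(dalle3_size("landscape"))
```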
Seed Control
Seeds let you reproduce or iterate on specific results. In Midjourney, use --seed 12345 to lock the random seed. In Stable Diffusion, set the seed value in the UI. Same prompt + same seed = same image. Change one element while keeping the seed to see how that specific change affects the output.
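The reason this works is that diffusion models draw their initial noise from a seeded random generator, so a fixed seed plus an unchanged prompt reproduces the same starting point. The same principle, illustrated with Python's standard library rather than any real model API:

```python
# Illustrative only: same seed -> same pseudo-random sequence, which is
# why locking the seed reproduces an image. The noise() helper stands
# in for a model's initial-noise generation and is purely hypothetical.
import random

def noise(seed, n=4):
    rng = random.Random(seed)  # independent generator, like a locked seed
    return [round(rng.random(), 3) for _ in range(n)]

print(noise(12345))
print(noise(12345))  # identical to the line above
print(noise(54321))  # different seed, different "noise"
```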
Multi-Prompt Syntax (Midjourney)
Midjourney supports multi-prompts via the :: separator, letting you weight different concepts independently:
cyberpunk city::2 rainy night::1.5 neon reflections::1 --ar 16:9
Higher numbers after :: give more weight to that concept. This is more precise than comma separation because each segment is interpreted independently.
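Building such a prompt from (concept, weight) pairs can be sketched as follows; the `multi_prompt` helper is hypothetical:

```python
# Hypothetical sketch: build a Midjourney multi-prompt from
# (concept, weight) pairs using the :: separator described above.

def multi_prompt(segments, flags=""):
    body = " ".join(f"{concept}::{weight}" for concept, weight in segments)
    return f"{body} {flags}".strip()

print(multi_prompt(
    [("cyberpunk city", 2), ("rainy night", 1.5), ("neon reflections", 1)],
    flags="--ar 16:9",
))
# -> cyberpunk city::2 rainy night::1.5 neon reflections::1 --ar 16:9
```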
Prompt Blending
Combine two distinct concepts to create hybrid results. This works in both Midjourney and Stable Diffusion:
Midjourney: a portrait of a woman --style raw --sref [url1] [url2] (multiple style-reference URLs follow a single --sref flag)
SD (prompt editing): [cat:dog:0.5] sitting in a garden, which renders "cat" for the first half of the sampling steps and "dog" for the second half
6. 10 Before/After Prompt Examples
These examples demonstrate how adding specificity, structure, and the right keywords transforms mediocre prompts into effective ones. Each "before" prompt is the kind of thing a beginner would write. Each "after" prompt applies the principles from this guide.
Example 1: Portrait
Before: a woman with flowers
After: a young woman with long auburn hair wearing a white linen dress, standing in a field of lavender, golden hour sunlight, soft bokeh background, warm color palette, shallow depth of field, editorial fashion photography --ar 2:3 --v 6.1
Example 2: Landscape
Before: a mountain scene
After: a vast alpine valley at dawn, snow-capped peaks reflecting pink and orange sunrise light, a winding river through emerald meadows, low morning fog, panoramic wide angle, Ansel Adams style black and white with subtle warm toning --ar 21:9 --v 6.1
Example 3: Product Shot
Before: a watch on a table
After: a luxury chronograph watch with a midnight blue dial on a dark slate surface, studio lighting with a single softbox from the upper left, reflections on the polished bezel, shallow depth of field, clean minimal background, commercial product photography, 4K --ar 1:1 --v 6.1 --style raw
Example 4: Fantasy Character
Before: a wizard
After: an elderly wizard with a long silver beard and weathered face, wearing dark robes embroidered with glowing arcane runes, holding a gnarled oak staff topped with a faintly glowing blue crystal, standing in the doorway of an ancient stone tower, dramatic backlight from a stormy sky, concept art, intricate details --ar 2:3 --v 6.1
Example 5: Interior Design
Before: a modern living room
After: a Scandinavian minimalist living room with floor-to-ceiling windows overlooking a pine forest, warm oak flooring, a low-profile beige linen sofa, a round marble coffee table, a single monstera plant in a ceramic pot, soft diffused natural light, neutral color palette with warm undertones, architectural photography --ar 16:9 --v 6.1 --style raw
Example 6: Food Photography
Before: pasta on a plate
After: a rustic ceramic bowl of handmade pappardelle with slow-braised lamb ragu, fresh basil leaves and shaved parmesan on top, olive oil glistening, on a weathered wooden table, warm side lighting from a window, shallow depth of field, overhead angle, food editorial photography, Bon Appetit style --ar 4:5 --v 6.1
Example 7: Sci-Fi Environment
Before: a space station
After: the interior of a massive orbital space station, curved glass walls revealing Earth below, bioluminescent plants growing in hydroponic columns, people in sleek jumpsuits walking along a wide promenade, warm artificial lighting mixed with blue earth-glow, volumetric haze, concept art, Chris Foss inspired architecture --ar 21:9 --v 6.1
Example 8: Animal Portrait
Before: a cat
After: a Bengal cat with vivid rosette markings, sitting regally on a velvet cushion, sharp green eyes staring directly at the camera, soft studio lighting with a dark background, shallow depth of field, every whisker visible, professional pet portrait photography --ar 1:1 --v 6.1 --style raw
Example 9: Abstract Art
Before: abstract colorful art
After: an abstract expressionist painting with bold sweeping brushstrokes of deep cobalt blue, cadmium red, and burnt sienna on a large canvas, thick impasto texture visible, paint drips and splatters, raw emotional energy, inspired by Franz Kline and Willem de Kooning, gallery-quality, high resolution --ar 3:4 --v 6.1
Example 10: Urban Photography
Before: a city at night
After: a rain-soaked Tokyo street at midnight, neon signs reflecting in puddles on the pavement, a lone figure with an umbrella walking away from camera, steam rising from a ramen shop entrance, shallow depth of field, cinematic color grading with teal and orange tones, street photography, 35mm lens --ar 9:16 --v 6.1
7. Using ImageToPrompt to Learn from Great Images
One of the fastest ways to improve your prompt writing is to study what makes existing great images work — at the prompt level. When you see an AI-generated image that impresses you, or a photograph with a style you want to replicate, the visual elements that make it work are encoded in describable attributes: the lighting direction, the color palette, the composition, the texture.
ImageToPrompt extracts these attributes automatically. Upload any image and the AI identifies every visual element that matters for prompt writing: subject details, artistic style, lighting setup, composition technique, color palette, mood, and medium. The output is a ready-to-use prompt formatted for your chosen model.
But beyond generating usable prompts, this is a powerful learning tool. Compare the generated prompt against the source image and ask yourself: which descriptors correspond to which visual elements? What keywords did the AI use to capture the lighting? How did it describe the composition? Over time, this builds your intuition for which words produce which visual effects.
You can also use our Describe Image tool for a detailed natural language analysis of any image, or visit the model-specific prompt generators for Midjourney and Stable Diffusion to see how the same image is described differently for each model.
Learn by Example: Generate Prompts from Any Image
Upload a reference image and see exactly how the AI describes it as a prompt for Midjourney, Stable Diffusion, Flux, or DALL-E 3. The fastest way to learn prompt writing is to study great examples.
Common Questions
What makes a good AI image prompt?
A good AI image prompt has five elements: a clear subject, a defined style or medium, lighting description, composition details, and mood or atmosphere. The most important factor is specificity — "a golden retriever running through autumn leaves in a forest, soft morning light, shallow depth of field, warm color palette" produces dramatically better results than "a dog in nature". Each detail reduces randomness and increases the chance of getting what you envision.
How long should an AI image prompt be?
It depends on the model. For Midjourney, 20–60 words is the sweet spot. Stable Diffusion has a 75-token limit (roughly 60 words) before truncation, so concise weighted tags work best. Flux thrives on detailed natural language and can effectively use 100+ words. DALL-E 3 prefers full sentences and handles up to several hundred words. Write enough to fully describe your vision, but do not pad with redundant terms.
Do I need to learn different prompt formats for each AI model?
Yes. Midjourney uses comma-separated descriptors with parameter flags. Stable Diffusion uses weighted tags with separate negative prompts. Flux prefers detailed natural language. DALL-E 3 works best with complete sentences. A prompt optimized for one model will produce mediocre results in another. Tools like ImageToPrompt automatically format prompts for your chosen model.
Can AI write image prompts for me?
Yes. Use an image-to-prompt tool like ImageToPrompt to upload a reference image and get a prompt optimized for your target model. You can also use text-to-prompt tools that expand simple descriptions into detailed, model-specific prompts. These tools are excellent for learning prompt structure — study the prompts they generate to understand what descriptors produce which effects.