Stable Diffusion is one of the most powerful and most demanding AI image generators available. Unlike Midjourney's managed cloud or DALL-E 3's conversational interface, SD gives you direct control over the generation process: token weights, samplers, CFG scale, negative prompts, and a vast ecosystem of fine-tuned models. With that power comes a steeper learning curve.
This guide covers everything: from the basic comma-separated token syntax to advanced weight manipulation, model selection, sampler choice, and the differences between SD 1.5, SDXL, and SD 3.5. You'll also learn how to use our Stable Diffusion prompt generator and tools like ImageToPrompt to generate SD-ready prompts from reference images.
Why Stable Diffusion Prompting Is the Most Complex — and Most Powerful
SD's complexity is a feature. Every parameter is exposed because the community demanded it. Here's what SD offers that most cloud-based generators don't:
- Negative prompts: A separate field for explicitly excluding unwanted elements — critical for avoiding common SD artifacts
- Token weighting: Assign numerical importance to any concept in your prompt
- CFG scale: Control how closely the model follows your prompt (vs. creative freedom)
- Sampler selection: Choose the denoising algorithm, each with different quality/speed/style tradeoffs
- Model ecosystem: Thousands of fine-tuned models specialized for anime, photorealism, architecture, product photography, etc.
- LoRA and embeddings: Inject specific styles, characters, or concepts with a few extra tokens
If you're coming from Midjourney or DALL-E 3, expect a 2–4 week adjustment period before SD results match your expectations. The ceiling is much higher, but even good baseline results take some work to reach.
Prompt Syntax Basics: Token Lists vs Natural Language
SD 1.x was trained on subsets of the LAION-5B dataset, whose captions are mostly short, descriptive alt-text. As a result, comma-separated short phrases (often called tokens) work more reliably than full sentences, especially on SD 1.5 and SDXL.
Token-based (recommended for SD 1.5 and SDXL)
masterpiece, best quality, 1girl, long silver hair, blue eyes, white dress, standing in a moonlit forest, ethereal lighting, bokeh background, photorealistic
Natural language (works better on SD 3.5)
A photorealistic portrait of a young woman with long silver hair and blue eyes, wearing a flowing white dress, standing in a moonlit forest with soft ethereal lighting and a blurred background.
SD 3.5 was specifically trained with natural language prompts and handles them much better than previous versions. For SD 1.5 and SDXL, stick to comma-separated tokens for best results.
The Token Weight System
Token weights let you tell the model which elements of your prompt matter most. Without explicit weights, every token gets the default weight of 1.0, and the model may de-emphasize something you care about in favor of a more generic element.
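Increasing weight (parentheses)
(important concept:1.3) ← 30% more emphasis
((very important)) ← each extra pair multiplies by 1.1 again: 1.1 × 1.1 = 1.21x
(slightly important) ← single parentheses = 1.1x boost
Decreasing weight (values below 1, or square brackets)
(less important:0.8) ← 20% less emphasis
(barely noticeable:0.5) ← 50% less emphasis
[less important] ← square brackets divide by 1.1 (~0.91x)
This is AUTOMATIC1111/Forge syntax. Note that [text:0.8] is not a weight there: it is prompt-editing syntax that adds the text to the prompt after 80% of the sampling steps.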
Practical weight examples
masterpiece, (photorealistic:1.4), portrait of a woman, (red hair:1.3), green eyes, (background:0.7), soft lighting, sharp focus
This prompt prioritizes photorealism and red hair, while slightly de-emphasizing the background so it doesn't compete with the subject.
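To make the weighting rules concrete, here is a simplified re-implementation of A1111-style emphasis parsing. Each ( ) pair multiplies the enclosed text's weight by 1.1, each [ ] pair divides it by 1.1, and (text:w) multiplies by w instead of 1.1. It is modeled on AUTOMATIC1111's documented behavior but is not its actual code: it skips backslash escapes and doesn't merge adjacent chunks. `parse_weights` is a hypothetical name. Other front ends differ; diffusers pipelines, for example, don't parse this syntax on their own.

```python
import re

ROUND = 1.1       # each ( ) pair multiplies the enclosed weight by 1.1
SQUARE = 1 / 1.1  # each [ ] pair divides the enclosed weight by 1.1

# Pieces: "(", "[", an explicit ":<number>)" closer, ")", "]", plain text, or a stray ":".
PATTERN = re.compile(r"\(|\[|:\s*([+-]?\d*\.?\d+)\s*\)|\)|\]|[^()\[\]:]+|:")


def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) chunks using simplified A1111-style rules."""
    chunks: list[list] = []      # mutable [text, weight] pairs so groups can be rescaled
    round_open: list[int] = []   # chunk index where each unclosed "(" began
    square_open: list[int] = []  # chunk index where each unclosed "[" began

    def scale_from(start: int, factor: float) -> None:
        for chunk in chunks[start:]:
            chunk[1] *= factor

    for m in PATTERN.finditer(prompt):
        piece, explicit = m.group(0), m.group(1)
        if piece == "(":
            round_open.append(len(chunks))
        elif piece == "[":
            square_open.append(len(chunks))
        elif explicit is not None and round_open:
            scale_from(round_open.pop(), float(explicit))  # (text:w) uses w
        elif piece == ")" and round_open:
            scale_from(round_open.pop(), ROUND)
        elif piece == "]" and square_open:
            scale_from(square_open.pop(), SQUARE)
        else:
            chunks.append([piece, 1.0])  # plain text starts at weight 1.0

    # Brackets left unclosed still apply their multiplier.
    for start in round_open:
        scale_from(start, ROUND)
    for start in square_open:
        scale_from(start, SQUARE)

    return [(text, round(weight, 4)) for text, weight in chunks]


print(parse_weights("masterpiece, (red hair:1.3), [background]"))
# [('masterpiece, ', 1.0), ('red hair', 1.3), (', ', 1.0), ('background', 0.9091)]
```

Nesting multiplies, so ((very important:1.5)) comes out at 1.5 × 1.1 = 1.65.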
Structuring Your Positive Prompt
Token position matters too. In practice, tokens near the start of the prompt tend to have more influence on the image than those near the end. The recommended structure for maximum control:
Recommended token order
- Quality tokens (most important)
- Medium/format (photography, painting, illustration, etc.)
- Subject (main focus of the image)
- Subject details (appearance, attributes)
- Action/pose
- Setting/environment
- Lighting
- Style and artist references
- Camera/technical details
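The ordering above can be sketched as a small helper that assembles prompt parts in a fixed sequence. The category keys and the `build_prompt` name are hypothetical and chosen to mirror the list. Nothing in SD itself requires them; the helper simply keeps your token order consistent from prompt to prompt.

```python
# Category keys in the recommended order (illustrative names, not an SD convention).
ORDER = ["quality", "medium", "subject", "details", "action",
         "setting", "lighting", "style", "camera"]


def build_prompt(parts: dict[str, list[str]]) -> str:
    """Join prompt parts into one comma-separated prompt, following ORDER."""
    unknown = set(parts) - set(ORDER)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    tokens: list[str] = []
    for key in ORDER:
        tokens.extend(parts.get(key, []))  # missing categories are simply skipped
    return ", ".join(tokens)


print(build_prompt({
    "subject": ["portrait of a woman"],
    "quality": ["masterpiece", "best quality"],
    "lighting": ["soft lighting"],
}))
# masterpiece, best quality, portrait of a woman, soft lighting
```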
Quality Tokens: The SD Vocabulary for "Make It Good"
Quality tokens are phrases that SD associates with high-quality training images. Including them at the start of your prompt biases the model toward better outputs. Not all of these work equally well on all models — test and remove any that don't improve your results.
Universal quality tokens
masterpiece, best quality, ultra-detailed, high resolution
Photography-specific quality tokens
photorealistic, hyperrealistic, photograph, RAW photo, 8k uhd, DSLR, sharp focus, high detail, professional photography
Illustration/art quality tokens
highly detailed illustration, intricate details, sharp lines, beautiful artwork, trending on artstation, detailed digital painting
Tokens to use cautiously
Some tokens (like "award winning" or "featured on artstation") have become diluted from overuse. Test their effect by running with and without them — on some models they add noise rather than quality.
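Testing a token "with and without" works best when every variant uses the same seed, sampler, step count and CFG, so that the token is the only thing that changes. This minimal sketch produces those prompt variants. `ablation_prompts` is a hypothetical helper, and you would still render each variant yourself in your UI or pipeline.

```python
def ablation_prompts(tokens: list[str], candidates: list[str]) -> dict[str, str]:
    """Return the full baseline prompt plus one variant per candidate token removed."""
    variants = {"baseline": ", ".join(tokens)}
    for cand in candidates:
        variants[f"without {cand}"] = ", ".join(t for t in tokens if t != cand)
    return variants


for label, prompt in ablation_prompts(
    ["masterpiece", "best quality", "award winning", "portrait"], ["award winning"]
).items():
    print(f"{label}: {prompt}")
```

If the "without" render looks as good or better at the same seed, drop the token.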
Negative Prompts: Your Most Powerful SD Tool
Negative prompts are what separate experienced SD users from beginners. While positive prompts tell the model what to include, negative prompts tell it what to exclude. The effect is dramatic: a well-crafted negative prompt can suppress many of the most common SD failure modes.
For a complete treatment of negative prompts, see our Negative Prompts in Stable Diffusion guide. Here's the essential starter negative prompt:
Universal negative prompt (copy and use)
lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name, deformed, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, bad proportions, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck
Photography-specific negative additions
illustration, painting, cartoon, anime, drawing, sketch, 3d render, CGI
Portrait-specific negative additions
bad eyes, asymmetrical eyes, crossed eyes, bad teeth, bad lips, uneven skin tone, skin blemishes, acne
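Stacking the universal negative prompt with the photography or portrait additions can easily leave you with repeated entries. This minimal sketch merges several comma-separated prompts and drops case-insensitive duplicates, keeping each entry where it first appears. `merge_prompts` is a hypothetical helper.

```python
def merge_prompts(*prompts: str) -> str:
    """Merge comma-separated prompts, dropping empty and duplicate entries."""
    seen: set[str] = set()
    merged: list[str] = []
    for prompt in prompts:
        for token in prompt.split(","):
            token = token.strip()
            key = token.lower()  # "Deformed" and "deformed" count as the same entry
            if token and key not in seen:
                seen.add(key)
                merged.append(token)
    return ", ".join(merged)


print(merge_prompts("lowres, blurry, deformed", "blurry, bad eyes", "Deformed"))
# lowres, blurry, deformed, bad eyes
```

Splitting on commas assumes that no weighted group contains a comma, as in (red hair, green eyes:1.2). Prompts that use that form would need a bracket-aware split.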
CFG Scale: Following Your Prompt vs Creative Freedom
CFG (Classifier-Free Guidance) scale controls how strictly the model adheres to your prompt. Low values give the model creative freedom but may drift from your description. High values force strict adherence but cause artifacts and oversaturation.
| CFG Value | Behavior | Best For |
|---|---|---|
| 1–4 | Very loose, model does its own thing | Artistic experimentation, happy accidents |
| 5–6 | Balanced, some creative interpretation | Most art styles, illustrations, creative images |
| 7–9 | Good prompt adherence, standard quality | Default for most use cases |
| 10–12 | Strong prompt adherence, some saturation | Precise compositions, complex scenes |
| 13–17 | Very strict, increasing artifacts | Rarely recommended |
| 18+ | Overcooked, flat, posterized | Avoid |
Recommended default: CFG 7 for most art styles, CFG 6 for photorealistic, CFG 8–9 for complex prompts with many elements.
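Under the hood, classifier-free guidance produces two noise predictions at each step: one conditioned on your prompt and one unconditioned. In A1111, the negative prompt takes the place of the unconditioned input. The two predictions are then combined as guided = uncond + scale × (cond − uncond). The toy sketch below shows only that arithmetic, with plain lists standing in for the model's noise tensors. `cfg_combine` is a hypothetical name, not a library function.

```python
def cfg_combine(eps_uncond: list[float], eps_cond: list[float], scale: float) -> list[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond), element by element."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same shape")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


uncond = [0.0, 1.0]  # toy prediction for the negative (or empty) prompt
cond = [1.0, 0.5]    # toy prediction for your positive prompt
print(cfg_combine(uncond, cond, 1.0))  # [1.0, 0.5]: identical to cond
print(cfg_combine(uncond, cond, 7.0))  # [7.0, -2.5]: pushed well past cond
```

At scale 1 the result equals the conditional prediction, so the negative prompt has no effect at CFG 1. Larger scales extrapolate further beyond it. This is one common explanation for the oversaturation seen at very high CFG.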
Sampler Selection: Which One to Use and When
A sampler is the algorithm that removes noise from your image step by step. Different samplers produce different results at the same step count, and some give the image a noticeably different character.
| Sampler | Speed | Quality | Best For |
|---|---|---|---|
| DPM++ 2M Karras | Fast | Excellent | Default recommendation, photorealism |
| DPM++ SDE Karras | Slow | Excellent, detailed | High-detail portraits, fine textures |
| Euler a | Fast | Good, varied | Exploration, getting diverse results |
| Euler | Fast | Good, consistent | Consistent iterations on the same prompt |
| DDIM | Fast | Good, smooth | Inpainting, img2img workflows |
| LMS Karras | Medium | Good | Artistic styles, illustrations |
| Heun | Slow (2 model evaluations per step) | High quality | Final renders when quality matters most |
Recommendation for beginners: Start with DPM++ 2M Karras at 20–30 steps. It produces excellent results efficiently and is the community default for good reason.
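The "Karras" suffix in names like DPM++ 2M Karras refers to the noise schedule, not the solver. The noise levels (sigmas) visited at each step are spaced using the formula from Karras et al. (2022), which concentrates steps at low noise, where fine detail is resolved. The sketch below follows k-diffusion's `get_sigmas_karras`. The default `sigma_min` and `sigma_max` values are approximate figures often quoted for SD 1.x and should be treated as assumptions. `karras_sigmas` itself is a hypothetical name.

```python
def karras_sigmas(n: int, sigma_min: float = 0.0292, sigma_max: float = 14.6146,
                  rho: float = 7.0) -> list[float]:
    """Return n noise levels from sigma_max down to sigma_min, followed by a final 0.0."""
    if n < 2:
        raise ValueError("need at least 2 steps")
    min_inv = sigma_min ** (1 / rho)
    max_inv = sigma_max ** (1 / rho)
    # Interpolate linearly in sigma^(1/rho) space, then raise back to the rho power.
    sigmas = [(max_inv + i / (n - 1) * (min_inv - max_inv)) ** rho for i in range(n)]
    return sigmas + [0.0]  # sampling ends at zero noise


for s in karras_sigmas(10):
    print(f"{s:.4f}")
```

The printout shows the gaps between consecutive sigmas shrinking toward the low-noise end. More steps means a finer version of the same curve, which is one reason returns diminish past a certain step count.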
Step Count Recommendations
More steps generally produce more detailed, refined images — up to a point. Beyond that point, you waste compute time with diminishing returns or slight degradation.
- 10–15 steps: Rough draft, quick experiments, exploring composition
- 20–25 steps: Good quality, efficient, covers 90% of use cases
- 30–40 steps: High quality, more detail, slower generation
- 50+ steps: Diminishing returns for most samplers. Ancestral samplers like Euler a are the exception: they add fresh noise at every step, so the image keeps changing (not necessarily improving) at high step counts
Rule of thumb: 25 steps with DPM++ 2M Karras is the standard. Increase to 35 for final renders; decrease to 15 for rapid exploration.
SD 1.5 vs SDXL vs SD 3.5: Key Prompt Differences
Stable Diffusion 1.5
The original widely-used version. Has the largest ecosystem of fine-tuned models, LoRAs, and embeddings. Prompt token limit: 75 CLIP tokens (77 including the start and end tokens); A1111 and similar UIs work around this by splitting longer prompts into 75-token chunks. Responds best to comma-separated tokens. Negative prompts are essential. Default resolution: 512×512 (upscale after).
masterpiece, best quality, photorealistic, (portrait of a young woman:1.2), brown hair, brown eyes, white blouse, sitting at a cafe, warm lighting, bokeh background, sharp focus, 8k
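The 75-token limit comes from the CLIP text encoder, which reads sequences of 77 token IDs: a start token, up to 75 prompt tokens, and an end token, with padding filling any remaining slots. The sketch below shows roughly how a front end like A1111 handles longer prompts: it splits the IDs into 75-token chunks and wraps each one. 49406 and 49407 are CLIP's start and end token IDs. Padding with the end token is an assumption that matches common SD 1.x setups. The input is a plain list of IDs, because real IDs would come from the CLIP tokenizer. A1111 also tries to break chunks at a comma, whereas this sketch splits blindly. `chunk_token_ids` is a hypothetical name.

```python
BOS, EOS = 49406, 49407  # CLIP's <|startoftext|> and <|endoftext|> token IDs
CHUNK = 75               # prompt tokens per chunk; 75 + BOS + EOS = 77


def chunk_token_ids(ids: list[int]) -> list[list[int]]:
    """Split prompt token IDs into 77-long sequences: BOS + up to 75 IDs + EOS padding."""
    chunks = []
    for start in range(0, max(len(ids), 1), CHUNK):  # always emit at least one chunk
        body = ids[start:start + CHUNK]
        # EOS closes the chunk and also pads it (padding choice is an assumption).
        chunks.append([BOS] + body + [EOS] * (CHUNK + 1 - len(body)))
    return chunks


ids = list(range(100))  # stand-in IDs; real ones come from the CLIP tokenizer
print([len(c) for c in chunk_token_ids(ids)])  # [77, 77]
```

Each chunk is encoded separately and the results are concatenated. Tokens past position 75 therefore still count, but a split can cut a phrase in half.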
Stable Diffusion XL (SDXL)
Larger model with significantly improved understanding of complex prompts and human anatomy. Default resolution: 1024×1024. Handles longer prompts better than SD 1.5. Still benefits from quality tokens at the start. Ships with a base model plus an optional refiner model that adds fine detail in a second pass.
professional portrait photography, young woman, brown hair pulled back, warm smile, sitting at a sunlit coffee shop, shallow depth of field, golden hour light from window, Canon 5D Mark IV, 85mm lens, f/1.8, photorealistic, high detail
Stable Diffusion 3.5
The latest generation. Understands natural language much better than previous versions. Produces more accurate text in images. Better prompt coherence for complex multi-element scenes. Uses a different architecture (Multimodal Diffusion Transformer). Natural language works well alongside or instead of token lists.
A professional portrait of a young woman with warm brown hair, sitting in a sunlit coffee shop with golden afternoon light coming through the window. She's wearing a white blouse and has a natural, warm smile. Shot with shallow depth of field, photorealistic, high detail.
LoRA and Embedding Integration in Prompts
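LoRAs (Low-Rank Adaptations) are small model add-ons that inject a specific style, character, or concept. In A1111/Forge, you load a LoRA by adding a <lora:filename:strength> tag to your prompt. Many LoRAs also expect one or more trigger words, listed by the LoRA's author, to activate their effect.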
Using a LoRA in A1111/Forge
masterpiece, best quality, [LoRA trigger word], [rest of your prompt] <lora:lora_filename:0.8>
The number after the filename (0.8 here) is the LoRA strength. Values between 0.6 and 1.0 work for most LoRAs. Higher values amplify the style but can cause artifacts. Lower values blend the style more subtly with the base model.
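Before encoding, the front end strips the LoRA tag from the prompt text, so the tag never reaches the text encoder, and then loads the named file at the given strength. The sketch below shows that extraction step. Its regex is an assumption that covers only the <lora:name> and <lora:name:weight> forms, with strength defaulting to 1.0 when omitted; A1111's real extra-networks parser accepts more variants. `extract_loras` and the file name `pixelart_v2` are hypothetical.

```python
import re

# Matches only <lora:name> or <lora:name:weight> (an assumption, not A1111's full grammar).
LORA_TAG = re.compile(r"<lora:([^:>]+)(?::([+-]?\d*\.?\d+))?>")


def extract_loras(prompt: str) -> tuple[str, list[tuple[str, float]]]:
    """Strip LoRA tags from a prompt and return (clean prompt, [(name, strength), ...])."""
    loras = [
        (m.group(1), float(m.group(2)) if m.group(2) else 1.0)  # default strength 1.0
        for m in LORA_TAG.finditer(prompt)
    ]
    # Remove the tags, collapse leftover whitespace, and trim commas left at either end.
    cleaned = " ".join(LORA_TAG.sub("", prompt).split()).strip(" ,")
    return cleaned, loras


print(extract_loras("masterpiece, pixel art, a castle <lora:pixelart_v2:0.8>"))
# ('masterpiece, pixel art, a castle', [('pixelart_v2', 0.8)])
```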
Textual Inversions (Embeddings)
Embeddings are activated simply by typing their name as a token in your prompt. Negative embeddings go in the negative prompt:
EasyNegative, bad_prompt_version2, [rest of negative prompt]
EasyNegative and bad_prompt_version2 are popular negative embeddings. Each is a learned embedding trained to represent common low-quality features, so one word in your negative prompt stands in for a long list of exclusion tokens. Both were made for SD 1.5-based models and won't work with SDXL or SD 3.5.
Anatomy of a Complete SD Prompt
Here's a fully annotated example showing every element in action:
POSITIVE:
masterpiece, best quality, ultra-detailed, ← quality tokens
photorealistic, RAW photo, ← medium/format
(1girl:1.1), ← subject (weighted)
long auburn hair, amber eyes, freckles, ← subject details
wearing a leather jacket, sitting on a ← action/setting
motorcycle on a rain-slicked city street,
night scene, neon reflections on wet asphalt, ← environment details
(volumetric lighting:1.2), (rim light:1.1), ← lighting (weighted)
cinematic, film grain, 35mm, ← style/technical
bokeh, f/2.8, sharp focus on face ← camera details
NEGATIVE:
lowres, bad anatomy, bad hands, extra fingers,
worst quality, low quality, blurry, watermark,
poorly drawn face, deformed, ugly, bad proportions
15 Complete Example Prompts Across Categories
1. Photorealistic Female Portrait
masterpiece, best quality, photorealistic, RAW photo, portrait of a beautiful woman in her 30s, dark red hair, green eyes, natural makeup, white silk blouse, soft studio lighting, rembrandt lighting, shallow depth of field, bokeh, 85mm lens, sharp focus, 8k
2. Fantasy Landscape
masterpiece, best quality, ultra-detailed, epic fantasy landscape, ancient ruined temple overgrown with vines, glowing magical stones, misty atmosphere, golden light through forest canopy, digital painting, concept art, artstation trending, cinematic composition, volumetric lighting
3. Cyberpunk City
masterpiece, best quality, cyberpunk city street at night, rain-slicked asphalt, neon signs in Japanese and English, holographic advertisements, crowded with pedestrians in futuristic clothing, (atmospheric fog:1.2), volumetric neon lighting, cinematic, ultra-detailed, 8k
4. Product Photography
masterpiece, best quality, product photography, perfume bottle on white marble surface, soft studio lighting, bokeh background, water droplets on glass, luxury editorial style, clean and minimal, sharp focus, commercial photography, white background
5. Anime Character
masterpiece, best quality, 1girl, anime style, silver hair, blue eyes, school uniform, cherry blossom background, sunlight filtering through petals, detailed illustration, artstation, beautiful detailed eyes, sharp focus, vibrant colors
6. Abstract Digital Art
masterpiece, best quality, abstract digital art, flowing liquid metal, iridescent colors, fractal patterns, dark background, (bioluminescent:1.3), surreal, cinematic lighting, ultra-detailed, 8k, trending on artstation
7. Architecture Exterior
masterpiece, best quality, architectural photography, modern minimalist house, large glass windows, concrete and wood exterior, surrounded by lush garden, golden hour lighting, (dramatic sky:1.2), architectural visualization, ultra-detailed, sharp focus
8. Food Photography
masterpiece, best quality, food photography, overhead shot of ramen bowl with rich broth, soft-boiled egg, chashu pork, green onions, nori, steam rising, (warm lighting:1.2), natural light, wooden table surface, editorial food styling, shallow depth of field
9. Historical Portrait
masterpiece, best quality, oil painting portrait, 17th century Dutch master style, merchant in dark clothes, fur collar, holding a globe, dark background, (Rembrandt lighting:1.3), detailed brushwork, museum quality, highly detailed
10. Sci-Fi Spaceship
masterpiece, best quality, ultra-detailed, sci-fi spacecraft approaching a ringed gas giant, (hard science fiction:1.2), NASA concept art style, photorealistic rendering, solar light from the right, stars and nebula in background, cinematic composition
11. Nature Macro
masterpiece, best quality, macro photography, dew drops on spider web at sunrise, (golden hour light:1.3), bokeh background, sharp focus on droplets, (prismatic light refraction:1.2), nature photography, canon 100mm macro lens
12. Character Concept Art
masterpiece, best quality, character concept art, female warrior in ornate dark plate armor, (battle-worn:1.1), scarred face, determined expression, full body shot, neutral background, detailed armor design, artstation trending, sharp focus, professional concept art
13. Cozy Interior
masterpiece, best quality, interior photography, cozy home library, floor-to-ceiling bookshelves, leather armchair, warm fireplace light, (golden hour window light:1.2), Persian rug, plants, candles, atmospheric, soft shadows, inviting atmosphere
14. Watercolor Illustration
masterpiece, best quality, watercolor illustration, charming street cafe in Paris, loose expressive brushwork, soft washes of color, warm afternoon light, people at outdoor tables, flowering window boxes, impressionist influence, detailed watercolor painting
15. Portrait with Environment
masterpiece, best quality, photorealistic, environmental portrait of an old fisherman, weathered face, kind eyes, yellow rain slicker, harbor in fog behind him, (moody overcast light:1.2), documentary photography style, Sebastião Salgado influence, deep focus, grain, black and white
Using ImageToPrompt to Generate SD Prompts from References
The fastest way to develop your SD prompting skills is to analyze images that already look the way you want. ImageToPrompt uses Claude Vision to extract detailed, SD-compatible prompts from any uploaded image.
When you upload an image, the tool identifies:
- Subject and composition elements
- Lighting type and direction
- Color palette and mood
- Art style and medium
- Technical characteristics (depth of field, grain, etc.)
You can select the Stable Diffusion output format to get a prompt already structured in the comma-separated token format SD expects. This eliminates guesswork when trying to replicate a specific visual style.
For comparison with other generators, see our Stable Diffusion vs Midjourney vs DALL-E 3 comparison. For negative prompt strategies, see the negative prompts deep dive. For general prompt engineering principles, see prompt engineering for AI art.