You have a photo — a landscape you shot on vacation, a portrait with perfect light, a product image from a competitor, a screenshot from a film. You want to generate something similar, or use that photo's aesthetic as the foundation for AI-generated images. But you don't know what prompt would produce that look.
This is one of the most common problems for anyone using AI image generation seriously, and there are three distinct methods for solving it. This guide covers all three — automatic extraction, manual analysis, and a hybrid approach — with specific guidance for different types of photos.
Why Real Photos Are Great References for AI Generation
Photographs contain information that's hard to specify from imagination alone:
- Precise lighting: A photo shows you the exact quality, direction, color temperature, and behavior of light in a scene. Describing it in words is much harder than seeing it.
- Specific composition: The exact relationship between elements, the focal length feel, the depth of field — all visible in a reference photo.
- Color grading: The particular color treatment of a photo — film stock emulation, mood grading, saturation decisions — is difficult to specify abstractly but easy to extract from the image itself.
- Style authenticity: Real photography has a truth to it. Using it as a reference tends to ground AI generation in visual reality rather than the generically "AI-looking" outputs that poorly prompted models produce.
The goal isn't to copy the photo — it's to extract the visual language so you can apply it to new subjects.
Method 1: Automatic Extraction Using ImageToPrompt
The fastest method: upload your photo to ImageToPrompt and let Claude Vision analyze it. The tool examines the image's subject, lighting, color, composition, and technical qualities and returns a structured prompt that captures its key characteristics.
Step-by-Step Walkthrough
- Prepare your photo. Any common image format works (JPEG, PNG, WebP), and the tool handles everything from phone snapshots to high-resolution DSLR files. Sharp, lightly compressed images produce better prompts than blurry or heavily compressed ones.
- Navigate to imagetoprompt.dev. The tool is free and requires no account.
- Upload your image. Drag and drop or use the file selector. The upload processes in a few seconds.
- Select your target model. Choose the AI generator you're planning to use — the prompt format differs between Midjourney, Stable Diffusion, DALL-E 3, and Flux. The tool adjusts its output accordingly.
- Review the generated prompt. The output is organized by visual element: subject, lighting, style, composition, and technical characteristics. You'll see what the model identified as the defining features of your photo.
- Copy and adapt. Use the prompt as-is for a similar image, or modify the subject while keeping the style, lighting, and composition elements to create something new.
What the Tool Extracts Well
- Lighting type, direction, and quality
- Color palette and tone (warm/cool, saturation level)
- Photographic style (editorial, documentary, commercial, etc.)
- Composition and framing
- Depth of field and focus characteristics
- Atmosphere and mood
- Subject description
What It Extracts Less Reliably
- Specific geographic locations (produces descriptions rather than place names)
- Very subtle film stock or grading characteristics
- Exact focal lengths (produces equivalent descriptions)
- Identities of specific people
Method 2: Manual Photo Analysis
Manual analysis takes longer but teaches you the skill of visual reading — which ultimately makes you better at both prompting and photography. It's especially valuable when you want to deeply understand what makes a particular photo work.
The Systematic Analysis Framework
Work through these six dimensions in order. Take notes as you go — your final prompt is built from these notes.
1. Subject
What is the primary subject? Describe it without assuming context:
- What type of thing is it? (person, animal, object, landscape, architecture)
- What are its specific visual characteristics? (age, build, expression, condition, size)
- What is it doing? (sitting, standing, in motion, static)
- What details are emphasized? (specific clothing, texture, color)
2. Environment
- Where is the subject? (interior/exterior, urban/rural/natural)
- What is visible in the background? (specific or blurred)
- What is the relationship between subject and environment? (isolated, embedded, contrasted)
- What season or time period does it suggest?
3. Lighting Analysis
This is the most important dimension. Train yourself to identify:
- Direction: Where is the main light coming from? Front-lit (flat, even), side-lit (shadows on one side), backlit (rim light or silhouette), top-lit (harsh overhead shadows), or Rembrandt (key light high and to one side, leaving a small triangle of light on the shadowed cheek)
- Quality: Is it hard (sharp shadows, high contrast) or soft (gradual transitions, low contrast)?
- Color temperature: Warm (orange/yellow) = afternoon/golden hour/tungsten. Cool (blue) = shade/overcast/blue hour/fluorescent. Neutral = flash/midday.
- Source: Natural (sun, overcast, window) or artificial (studio, neon, candle, lamp)
- Fill: Is the shadow side completely dark or gently filled? (fill light reduces contrast)
4. Color Analysis
- What is the dominant color family? (warm, cool, neutral)
- How saturated are the colors? (muted/desaturated, natural, vivid/oversaturated)
- What is the tonal range? (high key = bright, airy; low key = dark, shadowy)
- Are there color grading signatures? (cross-processing, film emulation, teal-and-orange)
- What 2–3 colors dominate the palette?
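If you want a second opinion on the palette questions above, a few lines of code can summarize an image's pixels. The sketch below is a rough heuristic, not how ImageToPrompt works: it groups colors into coarse bins to find the 2–3 dominant ones, uses average luma to label the key, and uses average HSV saturation to label saturation. `analyze_palette` is a hypothetical helper, and the bin size and every threshold are arbitrary assumptions you should tune. It expects a list of (r, g, b) tuples, which you could obtain by loading the photo with an imaging library such as Pillow.

```python
import colorsys
from collections import Counter

def coarse_bin(rgb, step=64):
    # Snap each 0-255 channel to the centre of a coarse bin so similar shades group together
    return tuple((c // step) * step + step // 2 for c in rgb)

def analyze_palette(pixels, top_n=3):
    """pixels: list of (r, g, b) tuples with channels in 0-255."""
    if not pixels:
        raise ValueError("need at least one pixel")
    n = len(pixels)
    counts = Counter(coarse_bin(p) for p in pixels)
    dominant = [color for color, _ in counts.most_common(top_n)]
    # Average luma (Rec. 601 weights), normalised to 0-1
    luma = sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels) / (255 * n)
    # Average HSV saturation, 0-1
    sat = sum(colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] for r, g, b in pixels) / n
    # Assumed thresholds; adjust them against photos you have already judged by eye
    key = "high key" if luma > 0.65 else "low key" if luma < 0.35 else "mid key"
    saturation = "muted" if sat < 0.25 else "vivid" if sat > 0.6 else "natural"
    return {"dominant": dominant, "key": key, "saturation": saturation}

# Synthetic example: mostly off-white with a warm orange accent
pixels = [(250, 250, 245)] * 80 + [(200, 120, 60)] * 20
print(analyze_palette(pixels))
```

Treat the output as vocabulary prompts ("high key, muted palette, off-white and burnt orange"), not as a verdict; your eye still decides which colors matter.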
5. Camera and Technical
- Focal length feel: Wide (lots of environment, distorted perspective) or telephoto (compressed, isolated subject, background blur)?
- Depth of field: Is the background sharp or blurred (bokeh)? How selective is the focus?
- Camera angle: Eye level, low angle (looking up), high angle (looking down), bird's eye (straight down)?
- Grain/noise: Clean and digital, or gritty film grain?
- Motion blur: Static/sharp, or motion captured?
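To build intuition for what "wide" and "telephoto" mean in practice, the standard angle-of-view formula relates focal length to how much of the scene fits in the frame. This sketch assumes a full-frame sensor 36 mm wide, and the wide/normal/telephoto bands in the hypothetical `focal_length_feel` helper are rough conventions chosen for illustration, not fixed standards.

```python
import math

SENSOR_WIDTH_MM = 36.0  # assumption: full-frame (36 x 24 mm) sensor

def horizontal_angle_of_view(focal_length_mm, sensor_width_mm=SENSOR_WIDTH_MM):
    """Horizontal angle of view in degrees for a rectilinear lens focused at infinity."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

def focal_length_feel(focal_length_mm):
    # Rough full-frame bands, chosen for illustration only
    if focal_length_mm <= 35:
        return "wide-angle"
    if focal_length_mm < 70:
        return "normal"
    return "telephoto"

for f in (24, 35, 50, 85):
    print(f"{f}mm: {horizontal_angle_of_view(f):.0f} degrees horizontal, {focal_length_feel(f)}")
```

Moving from 35mm to 85mm narrows the horizontal view from about 54 to about 24 degrees, which is why the longer lens shows so much less background and isolates the subject.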
6. Style and Medium Assessment
- What genre of photography does this most resemble? (documentary, editorial, commercial, fine art, street, portrait)
- Does it feel like a specific era? (film era, digital era, specific decade aesthetic)
- Are there artistic references? (cinematic influence, specific photographer's style)
- What is the overall mood? (melancholy, joyful, tense, peaceful, mysterious)
Building the Manual Prompt
Once you've worked through all six dimensions, assemble your notes into a prompt:
[STYLE/MEDIUM] + [SUBJECT DESCRIPTION] + [ENVIRONMENT] + [LIGHTING] + [CAMERA/TECHNICAL] + [MOOD]
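The template maps directly onto a small function. In this sketch the six dimensions are stored as a dictionary of note strings (the key names and the `build_prompt` helper are hypothetical); the function emits the notes in the template's order, skips any dimension left blank, and joins them with commas.

```python
# Template order: style/medium, subject, environment, lighting, camera/technical, mood
ORDER = ["style", "subject", "environment", "lighting", "camera", "mood"]

def build_prompt(notes):
    """Assemble analysis notes into a comma-separated prompt in template order."""
    parts = []
    for dimension in ORDER:
        value = notes.get(dimension, "").strip()
        if value:  # leave out dimensions you had nothing to say about
            parts.append(value)
    return ", ".join(parts)

notes = {
    "subject": "elderly man in his 60s, slight warm smile, direct eye contact",
    "style": "documentary portrait photography",
    "environment": "narrow cobblestone street in a European old town",
    "lighting": "late afternoon golden light",
    "camera": "85mm lens feel, shallow depth of field",
    "mood": "",
}
print(build_prompt(notes))
```

Because the order is fixed in one place, you can reorder the template (for example, putting the subject first for models that weight early tokens more) without touching your notes.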
Method 3: Hybrid — Auto-Extract Then Refine Manually
This method combines the speed of automatic extraction with the accuracy of manual analysis. It's the recommended approach for most use cases because it is fast and tends to produce the highest-quality prompts.
The Hybrid Workflow
- Upload your photo to ImageToPrompt and generate an initial prompt
- Read the extracted prompt carefully — identify what it got right and what it missed or mis-described
- Apply the manual analysis framework (above) to the same image, focusing on the dimensions the auto-extract seemed weakest on
- Merge: keep the auto-extracted elements that seem accurate, replace or supplement with your manual observations
- Test the combined prompt, evaluate the output, iterate
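If you keep the auto-extracted prompt split by visual element (the tool's output is already grouped that way), the merge step is essentially a dictionary update. This sketch assumes that per-dimension structure; `merge_notes` and `to_prompt` are hypothetical helpers in which a manual note replaces or adds a dimension, and an empty manual note drops a dimension the tool got wrong.

```python
def merge_notes(auto, manual):
    """Merge manual observations over auto-extracted notes, keyed by visual dimension."""
    merged = dict(auto)  # start from the auto-extracted notes, preserving their order
    for dimension, note in manual.items():
        if note:
            merged[dimension] = note  # your observation replaces or adds this dimension
        else:
            merged.pop(dimension, None)  # empty string: discard a mis-described dimension
    return merged

def to_prompt(notes):
    return ", ".join(notes.values())

auto = {
    "subject": "man standing on a street",
    "lighting": "soft natural light",
    "style": "portrait photography",
    "camera": "wide-angle lens",
}
manual = {
    "lighting": "late afternoon golden light from the left, warm",
    "camera": "",  # the auto-extract misread the focal length, so drop it
    "mood": "quiet, nostalgic",
}
merged = merge_notes(auto, manual)
print(to_prompt(merged))
```

Keeping the dimensions separate makes each iteration in the final step cheap: change one entry, regenerate, and compare.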
In practice, this hybrid approach usually produces more faithful prompts than auto-extraction alone, in a fraction of the time full manual analysis takes.
Photo Categories: Specific Tips for Each Type
Portrait Photos → Character Prompts
Portraits carry enormous amounts of information that AI can use. The key elements to extract:
- Lighting setup (this is the most valuable information in a portrait)
- Lens/focal length feel (85mm portrait compression vs 35mm environmental portrait)
- Color grading (this defines the "feel" more than almost anything else)
- Expression and emotional quality
What to adapt: Replace the specific person's description with your desired character description, but keep all the lighting, style, and technical information.
Landscape Photos → Environment Prompts
Landscapes yield rich environmental vocabulary. Focus on:
- Time of day and its specific light qualities
- Weather and atmosphere (mist, clarity, dramatic clouds, calm)
- Scale and framing (intimate valley vs vast panorama)
- Foreground/midground/background relationship
- Color palette at this specific time and weather condition
Architecture Photos → Building and Scene Prompts
- Architectural style and era
- Material and texture vocabulary (glass curtain wall, weathered brick, polished concrete)
- Light and shadow play on surfaces
- Human scale reference (tiny people in frame give scale)
- Perspective choice (straight on vs dynamic angle)
Food Photos → Product and Food Photography Prompts
- Shooting angle (overhead is most common; 45-degree is classic; straight-on is dramatic)
- Styling approach (rustic, clean/editorial, abundance, minimal)
- Light source quality and direction (window light vs studio light)
- Props and surface (the "environment" of the dish)
- Focus plane and depth of field treatment
Abstract and Detail Photos → Texture and Pattern Prompts
- Material identification (what is this made of?)
- Surface condition (new, aged, worn, wet, dry)
- Light behavior (specular highlights, subsurface scattering, matte diffusion)
- Scale and macro level
- Color and tonal variation
What Gets Lost in Translation
No prompt — manually written or auto-extracted — perfectly captures everything in a photograph. Understanding the limitations helps you compensate for them:
Emotion and Human Presence
A photograph of a real person carries the weight of that person's genuine emotion, history, and presence. AI prompts describe the visual surface. The "feel" of genuine emotion in a photo is extremely hard to prompt for and often results in AI-generated faces that look pleasant but hollow. Compensate by being very specific about expression and using mood language.
Specific People
Text prompts alone cannot reliably reproduce a specific individual; that requires LoRA fine-tuning or reference-image workflows. A prompt extracted from a photo of a specific person will produce a similar-feeling image with a different, AI-generated face.
Copyrighted or Trademarked Elements
Brand logos, trademarks, and copyrighted characters visible in a photo should be left out of your prompts. Remove them from the extracted prompt or substitute generic descriptions.
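Scrubbing can be partly automated once you know which terms to look for. This sketch assumes you supply the list yourself (nothing here detects logos or trademarks in the image); `scrub_terms` is a hypothetical helper that swaps each listed term for a generic substitute, or drops it when the substitute is empty, then tidies the comma-separated phrases. The brand names are only examples.

```python
import re

def scrub_terms(prompt, replacements):
    """Replace each listed term (case-insensitive, whole words) with a generic substitute.
    An empty substitute drops the term; phrases left empty are removed entirely."""
    cleaned = []
    for phrase in prompt.split(","):
        for term, generic in replacements.items():
            pattern = rf"\b{re.escape(term)}\b"
            # A function replacement avoids re.sub interpreting backslashes in `generic`
            phrase = re.sub(pattern, lambda _m: generic, phrase, flags=re.IGNORECASE)
        phrase = " ".join(phrase.split())  # collapse leftover double spaces
        if phrase:
            cleaned.append(phrase)
    return ", ".join(cleaned)

extracted = "red Coca-Cola can on a wooden table, Nike sneakers, soft window light"
print(scrub_terms(extracted, {"Coca-Cola": "soda", "Nike": ""}))
```

Read the result before using it: a generic substitute sometimes needs a little rewording to stay grammatical.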
Location-Specific Uniqueness
The specific character of a real place — the exact quality of the light at Santorini, the particular stone of Florence — can be approximated but not precisely replicated. Use the description as a guide and accept some variation.
Improving Photo-Derived Prompts for Each AI Model
| Model | Adaptation Tip |
|---|---|
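| Midjourney | Add --style raw for a more photographic, less stylized result that follows the prompt more literally. Add --ar to match the original photo's proportions. Consider also supplying the photo itself, either as an image prompt (weighted with --iw) or as a style reference with --sref, for visual guidance alongside the text prompt. |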
| Stable Diffusion | Convert extracted description to comma-separated tokens. Add quality tokens at the front. Move unwanted elements to negative prompt. Add photography-specific tokens: "RAW photo, DSLR, photorealistic." |
| DALL-E 3 | Convert to a descriptive paragraph rather than a list. DALL-E 3 handles natural language well. Add "photograph" or "photography" to anchor the output style. |
| Flux | Natural language works well. Be specific about technical elements — Flux handles "shot on Canon 5D at f/1.8, 85mm, golden hour" type descriptions effectively. See our Flux prompt guide. |
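Two of these adaptations are mechanical enough to script. This sketch computes the --ar value by reducing the photo's pixel dimensions, and splits phrases into Stable Diffusion positive and negative prompts with the table's photography tokens at the front. The function names are hypothetical, and the quality tokens are the table's suggestions rather than anything a model requires.

```python
from math import gcd

def aspect_ratio(width, height):
    """Reduce pixel dimensions to the smallest whole-number ratio, e.g. 6000x4000 -> '3:2'."""
    d = gcd(width, height)
    return f"{width // d}:{height // d}"

def for_midjourney(prompt, width, height, raw=True):
    """Append --ar matching the source photo, plus --style raw if requested."""
    params = [f"--ar {aspect_ratio(width, height)}"]
    if raw:
        params.append("--style raw")
    return f"{prompt} {' '.join(params)}"

def for_stable_diffusion(phrases, unwanted,
                         quality=("RAW photo", "DSLR", "photorealistic")):
    """Return (positive, negative) comma-separated prompts, quality tokens first."""
    positive = ", ".join([*quality, *phrases])
    negative = ", ".join(unwanted)
    return positive, negative

print(for_midjourney("landscape photography, snow-capped peaks, blue hour", 1920, 1080))
pos, neg = for_stable_diffusion(["glass serum bottle", "white marble surface"],
                                ["text", "watermark"])
print(pos)
print(neg)
```

A cropped photo with odd dimensions can reduce to an awkward ratio such as 1999:1333; in that case round to the nearest common ratio (3:2 here) before passing it to --ar.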
Real Examples: 5 Photos with Extracted Prompts
Example 1: Street Portrait
Photo: A man in his 60s photographed on a narrow European street, late afternoon, looking directly at camera, slight smile, shallow depth of field.
Extracted prompt:
environmental portrait, elderly man in his 60s, weathered kind face, slight warm smile, direct eye contact, standing on a narrow cobblestone street in a European old town, late afternoon golden light, warm golden hour, shallow depth of field, blurred buildings behind, documentary portrait photography, candid realism, 85mm lens feel, slight grain, film photography aesthetic
Adaptation (new subject): Replace "elderly man in his 60s, weathered kind face" with "young woman in her 30s, dark hair, dark eyes, confident expression" and keep everything else.
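The swap can be scripted as a plain string substitution on the extracted prompt. In this sketch, `swap_subject` is a hypothetical helper; it replaces the subject phrase, including the age-specific "weathered kind face" descriptor that would clash with a younger subject, and leaves the lighting, style, and camera tokens untouched. It fails loudly if the phrase isn't found, so a typo doesn't silently leave the old subject in place.

```python
def swap_subject(prompt, old_subject, new_subject):
    """Replace the first occurrence of the subject phrase; everything else is untouched."""
    if old_subject not in prompt:
        raise ValueError(f"subject phrase not found: {old_subject!r}")
    return prompt.replace(old_subject, new_subject, 1)

extracted = (
    "environmental portrait, elderly man in his 60s, weathered kind face, "
    "slight warm smile, direct eye contact, standing on a narrow cobblestone street "
    "in a European old town, late afternoon golden light, warm golden hour, "
    "shallow depth of field, blurred buildings behind, documentary portrait photography, "
    "candid realism, 85mm lens feel, slight grain, film photography aesthetic"
)

adapted = swap_subject(
    extracted,
    "elderly man in his 60s, weathered kind face",
    "young woman in her 30s, dark hair, dark eyes, confident expression",
)
print(adapted)
```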
Example 2: Mountain Landscape
Photo: Jagged mountain peaks at blue hour, snow on peaks, dark valley below, one star visible, deep blue-purple sky.
Extracted prompt:
landscape photography, dramatic mountain peaks with snow-capped summits, blue hour, deep blue-purple twilight sky, dark valley below, first stars appearing, silhouetted foreground rocks, cold and serene atmosphere, long exposure feel, no people, National Geographic quality --ar 16:9
Example 3: Product Shot
Photo: A skincare serum bottle on white marble, soft diffused light from left, water droplets on bottle, minimal white background.
Extracted prompt:
luxury product photography, glass serum bottle, water droplets on surface, white marble surface with subtle veining, soft diffused light from left, soft shadow to right, white background, minimal styling, editorial beauty photography, clean and premium, shallow depth of field, sharp focus on bottle, commercial quality --ar 1:1
Example 4: Food Overhead
Photo: Overhead shot of a bowl of ramen, steam rising, wooden surface, chopsticks, warm ambient light.
Extracted prompt:
overhead food photography, bird's eye view, bowl of ramen with rich golden broth, soft-boiled egg halved, chashu pork slices, green onion, nori, steam rising, chopsticks resting on bowl edge, dark wooden table surface, warm ambient lighting, no harsh shadows, food editorial styling, rustic Japanese restaurant aesthetic --ar 1:1
Example 5: Abstract Texture
Photo: Close-up of weathered concrete wall, peeling paint layers in teal and orange, cracks, aged texture.
Extracted prompt:
macro texture photography, weathered concrete wall surface, layers of peeling paint in teal and orange, exposed concrete underneath, cracks and imperfections, aged and worn surface, flat even lighting, no strong shadows, texture reference, close-up detail, abstract photography, film grain, muted palette --ar 1:1
Privacy and Ethics of Using Photos of Real People
Before uploading a photo containing identifiable people:
- Public figures: Using photos of public figures as style references is generally acceptable for generating fictional characters with similar aesthetics. Do not attempt to generate images that impersonate or falsely represent specific real people.
- Private individuals: Use photos of private individuals only with their knowledge and consent. This includes photos from social media — the fact that something is publicly accessible does not automatically make it appropriate to use as AI training material or reference.
- Your own photos: Using your own photographs as references is always appropriate.
- Uploaded data: ImageToPrompt processes images to extract prompts and does not store your images beyond the immediate session. Check our privacy policy for current data handling details.
- Deepfakes and misrepresentation: Generating realistic images that make specific real people appear to do or say things they didn't is unethical and increasingly illegal in many jurisdictions. Don't use extracted prompts for this purpose.
The most ethical and most effective use of photo-to-prompt tools is to extract aesthetic vocabulary — lighting, style, composition — from images and apply that vocabulary to new, fictional subjects.
For more on extracting prompts from images, see our complete image-to-prompt guide and reverse-engineering AI art prompts. For model-specific guidance on using extracted prompts, see our best image-to-prompt tools comparison.