How To Create Hyper-Realistic AI Images With Stable Diffusion

June 9, 2024

Are you ready to blur the line between reality and AI-generated art?

If you follow the generative AI space, and image generation in particular, you’re likely familiar with Stable Diffusion. This open-source AI platform has ignited a creative revolution, empowering artists and enthusiasts alike to explore the realms of human creativity—all on their own computers, for free.

With any simple prompt, you can get a picturesque landscape, a fantasy illustration, a 3D creature or a cartoon. But the real eye-popping capabilities are in the ability of these tools to create stunningly realistic imagery.

Doing so requires some finesse, however, and an attention to detail that general-purpose models sometimes lack. Some avid users can quickly tell when an image was generated with Midjourney or DALL-E just by looking at it. But when it comes to creating images that fool the human brain, Stable Diffusion’s versatility is unbeaten.

From the meticulous handling of color and composition to the uncanny ability to convey human emotion and expression, some custom models are redefining what’s possible in the world of generative AI. Here are some specialized models that we think are la crème de la crème of hyper-realistic image generation with Stable Diffusion.

We used the same prompt with all of our models and avoided using LoRAs (Low-Rank Adaptation add-on modifiers) to keep our comparisons fair. Our results rely only on prompting and text embeddings. We also made incremental changes to test small variations in our generations.
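
For readers who want to reproduce this kind of comparison, here is a minimal sketch of how such a test plan could be organized. It is not the exact procedure we followed: the `Run` class, the `plan_runs` helper, the shared seed and the specific step and CFG values are all illustrative assumptions. The idea is simply to hold the prompt and seed fixed while changing one setting at a time.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Run:
    """One planned generation (hypothetical structure for illustration)."""
    model: str
    steps: int
    cfg_scale: float
    seed: int


def plan_runs(models, steps_values, cfg_values, seed=1234):
    """Build one run per (model, steps, cfg) combination with a shared seed.

    Sharing the seed is an assumption: it keeps the starting noise identical,
    so differences between images come from the model or the changed setting.
    """
    return [
        Run(model=m, steps=s, cfg_scale=c, seed=seed)
        for m, s, c in product(models, steps_values, cfg_values)
    ]


if __name__ == "__main__":
    runs = plan_runs(
        models=["Juggernaut Rborn", "Realistic Vision v5.1"],
        steps_values=[30, 35],
        cfg_values=[6.0, 7.0],
    )
    for run in runs:
        print(run)
```

Each `Run` can then be handed to whatever interface you use to generate images, which keeps the comparison across models consistent.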

The prompts

Our positive prompt was: professional photo, closeup portrait photo of caucasian man, wearing a black sweater, serious face, dramatic lighting, nature, gloomy, cloudy weather, bokeh

Our negative prompt (instructing Stable Diffusion on what not to generate) was: embedding:BadDream, embedding:UnrealisticDream, embedding:FastNegativeV2, embedding:JuggernautNegative-neg, (deformed iris, deformed pupils, semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, anime:1.4), text, cropped, out of frame, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, embedding:negative_hand-neg.
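
The negative prompt mixes two kinds of special syntax: `embedding:Name` tokens, which reference text embedding files in the ComfyUI style, and a parenthesized group ending in `:1.4`, which raises the weight of everything inside it. The sketch below is a small helper, not part of any Stable Diffusion tool, that lists which embeddings a prompt expects and which groups carry an explicit weight. The function names and regular expressions are our own assumptions, and the prompt is shortened for readability.

```python
import re

# Shortened excerpt of the negative prompt used in this article.
NEGATIVE_PROMPT = (
    "embedding:BadDream, embedding:JuggernautNegative-neg, "
    "(deformed iris, cgi, anime:1.4), text, cropped, "
    "embedding:negative_hand-neg."
)

# "embedding:" followed by a name made of letters, digits, "_" or "-".
EMBEDDING_RE = re.compile(r"embedding:([\w\-]+)")
# "(some text:1.4)" style weighted groups, without nested parentheses.
WEIGHTED_RE = re.compile(r"\(([^()]+):([0-9]*\.?[0-9]+)\)")


def find_embeddings(prompt):
    """Return the embedding names referenced in the prompt, in order."""
    return EMBEDDING_RE.findall(prompt)


def find_weighted_groups(prompt):
    """Return (text, weight) pairs for each explicitly weighted group."""
    return [(text.strip(), float(w)) for text, w in WEIGHTED_RE.findall(prompt)]


if __name__ == "__main__":
    print(find_embeddings(NEGATIVE_PROMPT))
    print(find_weighted_groups(NEGATIVE_PROMPT))
```

A check like this is handy before a long session: if an embedding file listed here is missing from your install, the token is unlikely to have the intended effect.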

All of the resources used will be listed at the end of this article.

Stable Diffusion 1.5: the AI veteran that’s aging with grace

Stable Diffusion 1.5 is like a good old American muscle car that still beats fancier, latest-model cars in a drag race. Developers have been tinkering with SD1.5 for so long that it effectively buried Stable Diffusion 2.1. In fact, a lot of users today still prefer this version over SDXL, which is two generations newer.

When it comes to creating images that are virtually indistinguishable from real-life photos, these models are your new best friends.

1. Juggernaut Rborn

Juggernaut Rborn is a fan-favorite model known for its realistic color composition and impressive ability to differentiate between subjects and backgrounds. This model is particularly good at generating high-quality skin details, hair, and bokeh effects in portraits.

The latest version has been fine-tuned to deliver even more compelling results. Juggernaut has always offered color compositions that tend to be more realistic than the saturated, unnatural colors of many other Stable Diffusion models. Its generations tend to be warmer, more washed out, similar to an unedited RAW photo.

Getting the best results will still require some tweaking: use the DPM++ 2M Karras sampler, set to around 35 steps, and an average CFG scale of 7.
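
If you prefer scripting to a graphical interface, the settings above could be applied roughly as follows with the Hugging Face `diffusers` library. This is a sketch under several assumptions: `checkpoint_path` is a hypothetical path to a locally downloaded checkpoint file, a CUDA GPU is available, and "DPM++ 2M Karras" is mapped to `DPMSolverMultistepScheduler` with Karras sigmas enabled. Note that the `embedding:` tokens and `(…:1.4)` weights in our prompts are interface-specific and are not interpreted by this pipeline as written.

```python
def sampler_settings():
    """Steps and CFG scale quoted above for Juggernaut Rborn."""
    return {"num_inference_steps": 35, "guidance_scale": 7.0}


def generate(checkpoint_path, prompt, negative_prompt, seed=0):
    """Generate one image; requires torch, diffusers and a CUDA GPU."""
    # Heavy imports stay local so sampler_settings() works without them.
    import torch
    from diffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_single_file(
        checkpoint_path, torch_dtype=torch.float16
    ).to("cuda")

    # DPM++ 2M with the Karras noise schedule.
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config, use_karras_sigmas=True
    )

    # A fixed seed makes the run repeatable (the seed value is arbitrary).
    generator = torch.Generator(device="cuda").manual_seed(seed)
    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        generator=generator,
        **sampler_settings(),
    )
    return result.images[0]
```

Calling `generate("path/to/checkpoint.safetensors", "closeup portrait photo", "blurry")` would return a PIL image that you can save with `.save("out.png")`.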

2. Realistic Vision v5.1

A true trailblazer in the realm of photorealistic image generation, Realistic Vision v5.1 marked a pivotal moment in the evolution of Stable Diffusion, enabling it to compete against Midjourney and any other model in terms of photorealism. The v5.1 iteration excels at capturing facial expressions and imperfections, making it a top choice for portrait enthusiasts. It also handles emotions well and focuses more on the subject than the background, which helps keep the final result realistic. This model is a popular choice thanks to its impressive performance and versatility.

There is a newer version (v6.0), but we like v5.1 more because we feel it is still better in the little details that matter in realistic images. Things like skin, hair, or nails tend to be more convincing in v5.1, but other than that, results are similar, and the improvements seem incremental.

3. I Can’t Believe It’s Not Photography

With its versatility and impressive lighting effects, the cheekily named I Can’t Believe It’s Not Photography model is a great all-around option for hyper-realistic image generation. It is very creative, handles different angles well, and can be used for a variety of subjects, not just people.

This model is particularly good at 640×960 resolution, which is higher than the original SD1.5 resolution, but it can also deliver great results at 768×1152, a level of resolution native to SDXL.

For optimal results, use the DPM++ 3M SDE Karras or DPM++ 2M Karras sampler, 20-30 steps, and a 2.5-5 CFG scale (which is lower than usual).
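
To put those resolutions in perspective, the short sketch below compares them against the base training resolutions of each model family. The 512×512 figure for SD1.5, the 1024×1024 figure for SDXL and the factor-of-8 latent downscale are background facts not stated elsewhere in this article, and the `describe_resolution` helper is our own illustration.

```python
VAE_DOWNSCALE = 8  # the VAE works on latents 8x smaller on each side


def describe_resolution(width, height, base_side):
    """Summarize a resolution relative to a square base of base_side pixels."""
    if width % VAE_DOWNSCALE or height % VAE_DOWNSCALE:
        raise ValueError("width and height should be multiples of 8")
    return {
        "latent_size": (width // VAE_DOWNSCALE, height // VAE_DOWNSCALE),
        "pixels": width * height,
        "ratio_to_base": round(width * height / (base_side * base_side), 2),
    }


if __name__ == "__main__":
    # 640x960 compared with SD1.5's 512x512 base: ratio_to_base is 2.34
    print(describe_resolution(640, 960, base_side=512))
    # 768x1152 compared with SDXL's 1024x1024 base: ratio_to_base is 0.84
    print(describe_resolution(768, 1152, base_side=1024))
```

In other words, 640×960 asks an SD1.5 model to paint more than twice the pixels it was originally trained on, while 768×1152 is still a bit below SDXL's base pixel count.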

Honorable mentions:

Photon V1: This versatile model excels in producing realistic results for a wide range of subjects, including people.

Realistic Stock Photo: If you want to generate people with the polished and perfected look of stock photos, this model is an excellent choice. It creates convincing and accurate images without any skin imperfections.

aZovya Photoreal: Although not as well-known, this model produces impressive results and can enhance the performance of other models when it is merged into their recipes.

Stable Diffusion XL: the versatile visionaries

While Stable Diffusion 1.5 is our top pick for photorealistic images, Stable Diffusion XL offers more versatility and high-quality results without resorting to tricks like upscaling. It requires a little more power, but it can run on GPUs with 6GB of VRAM, about 2GB more than SD1.5 requires.

Here are the models that are leading the charge.

1. Juggernaut XL (Version x)

Building on the success of its predecessor, Juggernaut XL brings a cinematic look and impressive subject focus to Stable Diffusion XL. This model delivers the same characteristic color composition that steps away from saturation, along with good body proportions and the ability to understand long prompts. It focuses more on the subject and defines facial features very well, as well as any SDXL model can right now.

For the best results, use a resolution of 832×1216 (for portraits), the DPM++ 2M Karras sampler, 30-40 steps, and a low CFG scale of 3-7.

2. RealVisXL

Customized with realism in mind, RealVisXL is a top choice for capturing the subtle imperfections that make us human. It excels at generating skin lines, moles, tonal variations, and jawlines, which makes the final result convincing. It is probably the best model for generating realistic humans.

For optimal results, use 15-30+ sampling steps and the DPM++ 2M Karras sampling method.

3. HelloWorld XL v6.0

The general-purpose model HelloWorld XL v6.0 offers a unique approach to image generation, thanks to its use of GPT-4V tagging. While it may take some time to get used to, the results are well worth the effort.

This model is particularly good at delivering the analog aesthetic that is often missing in AI-generated images. It also handles body proportions, imperfections, and lighting well. However, it is different from other SDXL models at its core, which means that you may need to adjust your prompts and tags to achieve the best results.

For comparison, here is a similar generation using GPT-4V-style tagging, with the positive prompt: film aesthetic, professional photo, closeup portrait photo of caucasian man, wearing black sweater, serious face, in the nature, gloomy and cloudy weather, wearing a wool black sweater, deeply atmospheric, cinematic quality, hints of analog photography influence.

Honorable mentions for SDXL include: PhotoPedia XL, Realism Engine SDXL and the deprecated Fully Real XL.
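
The settings quoted in the sections above can be collected into a small lookup table so they are easy to reuse. The sketch below is one possible layout: the `Preset` class and `starting_point` helper are our own names, and picking the midpoint of each range is an arbitrary starting choice, not a recommendation from the model creators. Models for which we quoted no settings are left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Preset:
    """Settings quoted in this article for one model (ranges as min, max)."""
    sampler: str
    steps: Tuple[int, int]
    cfg: Optional[Tuple[float, float]] = None
    resolution: Optional[Tuple[int, int]] = None


PRESETS = {
    "Juggernaut Rborn": Preset("DPM++ 2M Karras", (35, 35), (7.0, 7.0)),
    "I Can't Believe It's Not Photography": Preset(
        "DPM++ 2M Karras", (20, 30), (2.5, 5.0), (640, 960)
    ),
    "Juggernaut XL": Preset("DPM++ 2M Karras", (30, 40), (3.0, 7.0), (832, 1216)),
    "RealVisXL": Preset("DPM++ 2M Karras", (15, 30)),  # no CFG scale quoted
}


def starting_point(name):
    """Return midpoint settings for a model as a plain dictionary."""
    preset = PRESETS[name]
    steps = (preset.steps[0] + preset.steps[1]) // 2
    cfg = None if preset.cfg is None else (preset.cfg[0] + preset.cfg[1]) / 2
    return {
        "sampler": preset.sampler,
        "steps": steps,
        "cfg": cfg,
        "resolution": preset.resolution,
    }


if __name__ == "__main__":
    for model_name in PRESETS:
        print(model_name, starting_point(model_name))
```

From there, adjust one value at a time and compare results, since the best settings often vary with the prompt and the subject.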

Pro tips for hyper-realistic images

No matter which model you choose, here are some expert tips to help you achieve impressive, lifelike results:

Experiment with embeddings: To enhance the aesthetics of your images, try using embeddings recommended by the model creator or use widely popular ones like BadDream, UnrealisticDream, FastNegativeV2, and JuggernautNegative-neg. There are also embeddings available for specific features, such as hands and eyes.
Embrace the power of LoRAs: While we left them out here, these handy tools can help you add details, adjust lighting, and enhance skin texture in your images. There are many LoRAs available, so don’t be afraid to experiment and find the ones that work best for you.
Use face detailing extension tools: These features can help you achieve excellent results in faces and hands, making your images even more convincing. The ADetailer extension is available for A1111, while the Face Detailer Pipe node can be used in ComfyUI.
Get creative with ControlNets: If you’re a perfectionist when it comes to hands, ControlNets can help you achieve flawless results. There are also ControlNets available for other features, such as faces and bodies, so don’t be afraid to experiment and find the ones that work best for you.

For help getting started, you can read our guide to Stable Diffusion.