How to test your models?

Alternative title: how to test what your model is good for

All credit goes to this Reddit post; read it for the full details: https://www.reddit.com/r/StableDiffusion/comments/12u6c76/can_we_identify_most_stable_diffusion_model/

What can we test for?

  1. Text encoder / UNet problems: over-training or corruption. Over-training means that instead of the image you asked for in the prompt, the model gives you something entirely different.

  2. Latent noise. This is the noise the image is generated from. Problems here tend to show up as busy, cluttered images.

  3. Human body/hand integrity. Do you get weirdly shaped bodies, extra limbs, and so on?

  4. Cropped heads/bodies. This was a huge problem with older models, which often generated only the torso.

  5. SFW/NSFW bias. Will your model generate NSFW images even when not asked to, or refuse to generate them when asked?

The prompts themselves are:

  1. "Photo of Jennifer Lawrence". The base Stable Diffusion model knows Jennifer Lawrence well, so if you get different people, the model is likely over-trained.

  2. "Photo of woman" and "Photo of a naked woman". Together, these show both NSFW/SFW bias and body integrity issues.

  3. "City streets". This is a great way to check for latent noise issues.

  4. "Illustration of a circle". This is another way to check for over-training and latent noise.

  5. "A person waving at the viewer". A great prompt for checking whether you get good hands.
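If you want to run the same suite against several models, it can help to keep the prompts and the problems they probe in one place. Below is a minimal Python sketch. It assumes you drive your image generator (a web UI API, a script, etc.) from a plain list of jobs. `TEST_PROMPTS` and `build_jobs` are hypothetical names. Reusing the same seeds for every model is an assumption of this sketch, not part of the original method. The default 768x768 size and five images per prompt mirror the settings used in this guide.

```python
from itertools import product

# Test prompts from this guide, mapped to the problems each one is meant to reveal.
TEST_PROMPTS = {
    "Photo of Jennifer Lawrence": ["over-training", "cropping"],
    "Photo of woman": ["NSFW bias", "body integrity"],
    "Photo of a naked woman": ["SFW bias", "body integrity"],
    "City streets": ["latent noise"],
    "Illustration of a circle": ["over-training", "latent noise"],
    "A person waving at the viewer": ["hands"],
}


def build_jobs(seeds, width=768, height=768):
    """Return one generation job per (prompt, seed) pair.

    Assumption (not from the guide): reusing the same seeds for every model
    makes side-by-side comparisons between models fairer.
    """
    return [
        {"prompt": prompt, "seed": seed, "width": width, "height": height}
        for prompt, seed in product(TEST_PROMPTS, seeds)
    ]


if __name__ == "__main__":
    # SD1.5 was tested at 512x512 in this guide; five seeds give five images per prompt.
    for job in build_jobs(range(5), width=512, height=512):
        print(job)
```

With six prompts and five seeds this produces 30 jobs per model. Pass `width=512, height=512` for SD1.5 and keep the defaults for models that handle 768x768.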

All of this is much easier to understand with samples, so let's start with the old SD1.5 model as our baseline:

Yes, this chart is from the model we all started with back in the day. Because SD1.5 does not support 768x768, the setting used for most of the other generations in this guide, I had to reduce the image size to 512x512. Let's go through the prompts from top to bottom.

Unsurprisingly, it did really well at generating images of Jennifer Lawrence, and it even did a fairly good job with the body. However, the cropping problem that plagued older models shows up here.

No naked women without being prompted, so the model has no NSFW bias, which is good. The bodies range from OK to awful, so body integrity is OK at best.

Nudity when prompted, so the model has no SFW bias either. The bodies here, particularly the first one, are really awful, so body integrity is awful.

These streets look fine: a single street, not too busy, so latent noise appears to be OK.

Three of the five images give us a circle, but two give us something completely different. This could be a problem with over-training, or with the text encoder not understanding what we mean by a circle.

Waving at the viewer gives us AWFUL hands, and two images show no hands at all.
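To compare models afterwards, it helps to write down counts like the ones in this walkthrough (a circle in 3 of 5 images, hands in only 3 of 5) instead of keeping them in your head. This minimal Python sketch records such per-prompt pass counts. `PromptTally`, `flag_weak`, and the 0.8 threshold are hypothetical and chosen only for illustration. The judgement of whether an image "passes" is still made by eye.

```python
from dataclasses import dataclass


@dataclass
class PromptTally:
    """How many of the images generated for one prompt passed a visual check."""
    prompt: str
    check: str      # what you looked for, e.g. "show a circle"
    passed: int     # images that passed, judged by eye
    total: int = 5  # five images per prompt, as in the counts in this guide

    def __post_init__(self):
        if not 0 <= self.passed <= self.total:
            raise ValueError("passed must be between 0 and total")

    def summary(self) -> str:
        return f"{self.prompt}: {self.passed}/{self.total} {self.check}"


def flag_weak(tallies, threshold=0.8):
    """Return the prompts whose pass rate is below a (hypothetical) threshold."""
    return [t.prompt for t in tallies if t.passed / t.total < threshold]


if __name__ == "__main__":
    # Counts taken from the SD1.5 walkthrough above.
    sd15 = [
        PromptTally("Illustration of a circle", "show a circle", 3),
        PromptTally("A person waving at the viewer", "show hands at all", 3),
    ]
    for tally in sd15:
        print(tally.summary())
    print("Weak prompts:", flag_weak(sd15))
```

Keeping one list of tallies per model makes it easy to see which failure modes a model has, rather than reducing it to a single "better or worse" score.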

Let's compare this to the UMI model, starting with the overview:

It already looks a lot better at first glance, so let's go through it from top to bottom.

We still get Jennifer Lawrence, but in a more anime-like style instead of realistic photos, which reflects this model's bias. There are no cropped images, so it is already much better in that regard, and body integrity is much better even at this point.

Good body proportions and no NSFW content: this model has no NSFW bias and makes great-looking bodies.

We ask for NSFW and get NSFW. This model has no SFW bias. Bodies here are great.

The streets are somewhat noisy, with lots of people and houses, so this model may have slightly more latent noise than default SD1.5.

We get circles in all images, which is nice. However, the inside of the circle is a bit noisy in 3 out of 5 of them, which could indicate some corruption.

Hands, and even five fingers in most of them, yay!

Conclusions: these tests are by no means a "THIS MODEL IS BETTER THAN THAT MODEL" verdict, but rather a study of what each model does well and what it does badly. If you want to generate a lot of fruit baskets, I would skip these tests and just generate fruit baskets.
