Clip Skip

Clip Skip is hard to explain and understand. The shortest explanation is that when you feel like the AI isn't following your prompts or is ignoring your Lora, try turning up Clip Skip gradually.

Going too high will break your images. Generally you shouldn't need more than Clip Skip 3, even though the maximum is 12.

Some Models and Loras are trained to work at Clip Skip 2, so they work better at that setting. When you add multiple Loras, higher values can help pull things together more consistently.

Some examples may make this easier to understand.

Examples

I'll use these settings in both tests, with and without the Lora, at Clip Skip 1 through 5.

Positive Prompt: best quality, highest detail, 8k, swamp, fireflies, Shrek, reeds

Negative Prompt: mutated, deformed, amateur drawing, lowres, worst quality, low quality, jpeg artifacts, text, error, signature, watermark, username, blurry, censorship, sketch, monochrome

Steps: 30, Sampler: Euler a, CFG scale: 7, Seed: 5, Size: 512x512, Model hash: 69528490df, Model: anything-v4.0-pruned

Shrek Lora: https://civitai.com/models/16615
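If you want to run the same sweep through a script instead of clicking through the UI, here is a rough sketch that builds one request per Clip Skip value. It assumes the AUTOMATIC1111 web UI API, where a txt2img request accepts `override_settings` and the Clip Skip setting is named `CLIP_stop_at_last_layers`. The Lora tag `<lora:shrek:1>` is a hypothetical file name and weight, so swap in whatever your downloaded file is called. The sketch only builds the payloads and doesn't send them, and the model still has to be selected separately.

```python
# Settings copied from the test above.
BASE_SETTINGS = {
    "prompt": "best quality, highest detail, 8k, swamp, fireflies, Shrek, reeds",
    "negative_prompt": (
        "mutated, deformed, amateur drawing, lowres, worst quality, "
        "low quality, jpeg artifacts, text, error, signature, watermark, "
        "username, blurry, censorship, sketch, monochrome"
    ),
    "steps": 30,
    "sampler_name": "Euler a",
    "cfg_scale": 7,
    "seed": 5,
    "width": 512,
    "height": 512,
}

# Hypothetical Lora tag: the file name and weight are assumptions.
LORA_TAG = "<lora:shrek:1>"


def build_payloads(use_lora, clip_skips=range(1, 6)):
    """Return one txt2img payload per Clip Skip value (1-5 by default)."""
    payloads = []
    for clip_skip in clip_skips:
        payload = dict(BASE_SETTINGS)
        if use_lora:
            payload["prompt"] = f"{payload['prompt']}, {LORA_TAG}"
        # Per-request override of the Clip Skip setting.
        payload["override_settings"] = {"CLIP_stop_at_last_layers": clip_skip}
        payloads.append(payload)
    return payloads


without_lora = build_payloads(use_lora=False)
with_lora = build_payloads(use_lora=True)
```

Each payload could then be posted to the web UI's `/sdapi/v1/txt2img` endpoint. Keeping the seed fixed means the only things changing between images are Clip Skip and the Lora.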

You can see how Clip Skip varies the results a bit without Loras enabled. With the Lora activated, you don't really get the Lora result until Clip Skip 2-3, but from there it makes a huge difference in getting a desirable result.

More Detailed Explanation of Clip Skip

The Clip part of your base model is a neural network made up of 12 layers, with 768 nodes per layer.
This neural network takes your text prompt and turns it into a numerical description that steers the image generation. It was trained by comparing text against a huge number of images, so the AI can say, "These are what red sneakers should be." Or, "This is what you mean by 'a warm summer night.'"
Clip Skip 1 uses all 12 layers. Every number you go up from there takes one of those layers off the end, working back towards the input. So the higher you go, the less it listens to you, because it has less context to understand your prompts. What each layer contributes is not easy to figure out, because every layer has 768 text encoder 'nodes'.
Higher values can mean less conflict between prompts or embeddings trying to do opposing things, but Clip Skip 1 will be far more obedient to what you ask for.
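To make the layer counting concrete, here is a toy sketch. It is not the real text encoder: the "layers" just record that they ran. It assumes the common convention that Clip Skip 1 means nothing is skipped and that each step up drops one layer from the end. The names `layers_used` and `run_text_encoder` are made up for illustration.

```python
NUM_LAYERS = 12  # depth of the text encoder described above


def layers_used(clip_skip, num_layers=NUM_LAYERS):
    """How many layers still process the prompt at a given Clip Skip."""
    if not 1 <= clip_skip <= num_layers:
        raise ValueError(f"Clip Skip must be between 1 and {num_layers}")
    # Clip Skip 1 skips nothing; each step above 1 drops one layer.
    return num_layers - (clip_skip - 1)


def run_text_encoder(clip_skip):
    """Toy stand-in: return the numbers of the layers that ran, in order."""
    trace = []
    for layer_number in range(1, layers_used(clip_skip) + 1):
        trace.append(layer_number)  # a real layer would transform the prompt here
    return trace


for clip_skip in (1, 2, 3, 12):
    print(f"Clip Skip {clip_skip}: {layers_used(clip_skip)} layers run")
```

At Clip Skip 12 only the first layer is left, which is why 12 is the ceiling and why images fall apart long before you get there.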


Last updated 2 years ago