August 9, 2024

A Quick Comparison of Text-to-Image Models: Flux, Stable Diffusion 3, DALL·E 3, and Kling

Last week, Black Forest Labs (the original creators of Stable Diffusion) released Flux, a new state-of-the-art text-to-image model that is open-sourced and offers capabilities comparable to Midjourney. Curious how its quality compares with other models, I ran a quick one-shot generation test on the following models (prices are estimates based on official pricing pages and replicate.com):

| Model | Company | Type | Price per Image |
| --- | --- | --- | --- |
| Flux Schnell | Black Forest Labs | Open Source | $0.003 |
| Flux Pro | Black Forest Labs | Closed Source (API only) | $0.055 |
| Stable Diffusion 3 | Stability AI | Open Source | $0.035 |
| DALL·E 3 | OpenAI | Closed Source | $0.040 |
| Kling | Kuaishou | Closed Source | $0.002 |
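
To put the per-image prices in perspective, here is a minimal sketch that scales them to a batch and ranks the models by cost. The prices are copied from the table above and are estimates; the function names `batch_cost` and `cheapest_first` are made up for this illustration.

```python
# Estimated per-image prices in USD, copied from the table above.
PRICE_PER_IMAGE_USD = {
    "Flux Schnell": 0.003,
    "Flux Pro": 0.055,
    "Stable Diffusion 3": 0.035,
    "DALL·E 3": 0.040,
    "Kling": 0.002,
}


def batch_cost(model: str, num_images: int) -> float:
    """Return the estimated cost in USD of generating num_images with model."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    return PRICE_PER_IMAGE_USD[model] * num_images


def cheapest_first() -> list[str]:
    """Return the model names sorted from cheapest to most expensive."""
    return sorted(PRICE_PER_IMAGE_USD, key=PRICE_PER_IMAGE_USD.get)


if __name__ == "__main__":
    # Print the estimated cost of a 1,000-image batch for each model.
    for model in cheapest_first():
        print(f"{model:<20} ${batch_cost(model, 1000):>7.2f} per 1,000 images")
```

At these estimates, a batch of 1,000 images costs roughly $2 with Kling and roughly $55 with Flux Pro.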

I used the following prompt to test general image generation in a specific artist's style:

a surreal landscape with floating islands and a giant glowing moon in the style of Hayao Miyazaki

and a second prompt to test how well each model renders text inside the image:

gateau cake spelling out the words “Takin.AI”, tasty, food photography, dynamic shot
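
If you want to repeat a one-shot test like this and record how long each generation takes, a small timing harness could look like the sketch below. It is a hypothetical illustration, not the script used for this post: `fake_backend` is a placeholder standing in for a real API call, and you would swap in your own function for each model.

```python
import time
from typing import Callable, Dict, Tuple

# The two prompts used in this post.
PROMPTS = {
    "style": "a surreal landscape with floating islands and a giant glowing moon "
             "in the style of Hayao Miyazaki",
    "text": "gateau cake spelling out the words “Takin.AI”, tasty, "
            "food photography, dynamic shot",
}


def time_one_shot(generate: Callable[[str], bytes], prompt: str) -> Tuple[bytes, float]:
    """Call generate once and return (image bytes, elapsed seconds)."""
    start = time.perf_counter()
    image = generate(prompt)
    elapsed = time.perf_counter() - start
    return image, elapsed


def run_comparison(
    backends: Dict[str, Callable[[str], bytes]]
) -> Dict[Tuple[str, str], float]:
    """Run every prompt once per backend and collect the timings."""
    results = {}
    for model, generate in backends.items():
        for label, prompt in PROMPTS.items():
            _, seconds = time_one_shot(generate, prompt)
            results[(model, label)] = seconds
    return results


def fake_backend(prompt: str) -> bytes:
    """Hypothetical placeholder for a real model call; returns dummy bytes."""
    return prompt.encode("utf-8")


if __name__ == "__main__":
    for (model, label), seconds in run_comparison({"stub": fake_backend}).items():
        print(f"{model} / {label}: {seconds:.3f} s")
```

A single run per prompt keeps the test cheap, but timings from one call are noisy, so treat them as rough indications only.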

The test results are listed below.

You can use text-to-image models such as Flux, SD3, and DALL·E 3, as well as ControlNets, from a single Takin.ai account. Start with a free account to try the examples in this post.

Flux Schnell (fastest, took only 1.3 seconds):

Flux Pro (took about 8.1 seconds):

DALL·E 3:

Stable Diffusion 3:

Kling:

PS. The featured image for this post was generated using the HiddenArt tool from Takin.ai.