mattnewton 2 days ago [-]
Hi HN, we're releasing weights for our latest text to image model and publishing this writeup on how we trained it in quite a bit of depth.
I hope there is something in the report for everyone, we included a fair bit on the actual training and data infrastructure usually not written about much, that I think will be interesting to people here. There's more that didn't fit, happy to answer questions!
ttul 1 days ago [-]
This is a massive technical report for an open weights image gen model. As someone who has followed this space closely, it’s really cool to read about the behind-the-scenes experimentation and effort that went into the final product. I hope you will release some of the fine-tuning tools so the community can experiment with them as well and really push what the model’s capable of.
mattnewton 1 days ago [-]
You can find some links and details in the GitHub readme for finetuning / LoRA support. Ostris, musubi tuner, fal and Hugging Face diffusers are all day-0 supported :)
https://github.com/krea-ai/krea-2
We recommend training off the undistilled, Raw checkpoint, and then applying the LoRA to the Turbo model for inference.
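To make the recommendation above concrete, here is a minimal sketch of what "train on Raw, apply to Turbo" means at the level of a single layer. It assumes the standard LoRA update, where a low-rank product B·A scaled by alpha / rank is added to a frozen weight matrix. The function names, shapes, and numbers are hypothetical and are not taken from the Krea codebase or any of the tools listed above.

```python
# Hypothetical illustration: names, shapes, and values are made up.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def apply_lora(base_weight, lora_b, lora_a, alpha, scale=1.0):
    """Return base_weight + scale * (alpha / rank) * (lora_b @ lora_a)."""
    rank = len(lora_a)              # lora_a has shape (rank, in_features)
    delta = matmul(lora_b, lora_a)  # shape (out_features, in_features)
    factor = scale * alpha / rank
    return [[w + factor * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(base_weight, delta)]


# Adapter factors learned against the Raw checkpoint (toy numbers).
lora_b = [[1.0], [2.0]]   # (out_features=2, rank=1)
lora_a = [[0.5, -0.5]]    # (rank=1, in_features=2)

# The same layer in the Turbo checkpoint: same shape, different values.
turbo_weight = [[1.0, 0.0], [0.0, 1.0]]

patched = apply_lora(turbo_weight, lora_b, lora_a, alpha=1.0)
print(patched)
```

The adapter is only a delta, so it can be added to any checkpoint whose layers have matching shapes. How well a delta learned on Raw transfers to Turbo is an empirical question that this sketch does not answer.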
ttul 1 days ago [-]
It's pretty great that you are providing the undistilled model on day 0. Here's a pro-tip: With Flux.2 Klein, someone created a turbo slider LoRA - basically a diff of the turbo 9B model vs. the undistilled 9B model. What's great about this LoRA is that you can sample using a heavier weighting of the undistilled weights during early sampling steps and then finish the sampling off with mostly the distilled weights. The result is a better "finish" (taking advantage of the distilled model's refinement for image quality) without sacrificing the undistilled model's greater ability to adhere to the prompt, because the undistilled model doesn't have to devote its weights so much to looking good.
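A rough sketch of the slider idea described above, under loud assumptions: it interpolates full weight lists rather than a low-rank LoRA, and the linear ramp and the 0.25 starting share are invented for illustration. The actual Flux.2 Klein slider LoRA and its recommended schedule may differ.

```python
# Hypothetical sketch: the schedule shape and the 0.25 floor are assumptions.

def slider_strength(step, num_steps, start=0.25, end=1.0):
    """Linearly ramp the distilled-weight share from `start` to `end`."""
    if num_steps <= 1:
        return end
    progress = step / (num_steps - 1)
    return start + progress * (end - start)


def blend_weights(raw, turbo, strength):
    """Interpolate per weight: raw + strength * (turbo - raw)."""
    return [r + strength * (t - r) for r, t in zip(raw, turbo)]


raw_weights = [0.0, 2.0, -1.0]    # toy stand-in for the undistilled model
turbo_weights = [1.0, 4.0, -3.0]  # toy stand-in for the distilled model

schedule = [slider_strength(i, 8) for i in range(8)]
early = blend_weights(raw_weights, turbo_weights, schedule[0])
late = blend_weights(raw_weights, turbo_weights, schedule[-1])
print(schedule)
print(early, late)
```

Early steps run mostly on the undistilled weights and the last step runs on the distilled weights alone, which matches the "prompt adherence first, finish later" ordering in the comment.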
dvrp 1 days ago [-]
Thanks! You should definitely check out the r/stablediffusion sub-reddit; people are going crazy over it!
We also had 0-day support from people like Ostris and ComfyUI from the open source community
Taek 1 days ago [-]
What is Krea's approach to content such as pornography and gore? It's been frustrating to see all of the leading models take a very hard line on excluding vice content, even when it is perfectly legal, in the name of safety.
kouteiheika 19 hours ago [-]
> What is Krea's approach to content such as pornography and gore?
You mean whether it's censored? Of course it is. They even say so themselves[1]:
> The open source version needed to go through some alignment training so there might be some inconsistencies between closed / open version.
The "alignment training" is pretty much a code word for censorship/lobotomy, because unlike the API version they can't slap a safe/unsafe classifier on an uncensored model. Of course, just to be clear: I don't blame them. If they don't do this then the next post we'd see is them being in hot water. (Remember what happened with Grok? That's what happens when you don't take a hard line here.)
> If they don't do this then the next post we'd see is them being in hot water. (Remember what happened with Grok? That's what happens when you don't take a hard line here.)
I disagree.
Grok is a hosted service that can use classifiers and has been personally guided by Elon Musk directly before, that was choosing to generate and distribute child porn for users. Then after getting in trouble, had the gall to continue doing it behind a paywall.
b112 7 hours ago [-]
> that was choosing to generate and distribute child porn for users.
This is a slant, a lie of presumption and ignorance.
Ignorance of what "open" means, presumption of assigning intent without validation.
Some believe that lobotomizing models leads to suboptimal results overall. Some also believe that the user, not the tool is responsible.
In these contexts, you'd better call a paint manufacturing company "choosing to enable child porn".
I am so sick of US style politics, with its insane labelling by both parties, and their worshippers, with intent to besmirch on every breath, because someone doesn't agree with them in some narrow political context.
From outside the US, most can barely tell the difference between a democrat and a republican. Both sides are slimy, smarmy, dishonest, without honour, and spew lies and bull about anyone they don't agree with.
It's absolutely disgusting.
vunderba 1 days ago [-]
Neat! Between Ideogram4, Flux2, Qwen-Image, ZiT, and Krea - there's been a lot of positive movement in the open-weights space.
The original Flux.1 Krea is actually in my GenAI Showdown benchmark site from all the way back in July of last year (which feels like a lifetime in this space), so I’m looking forward to putting this new one through its paces.
dvrp 1 days ago [-]
Hello HN,
I am Diego Rodriguez, Co-founder & CTO at Krea.
We are releasing the weights and a _juicy_ technical report---at least given current industry standards. In it we describe data curation/captioning, model architecture, post-training, RL pipelines, prompt expansion, style references, and our infrastructure in great detail.
When it comes to the weights themselves, there are actually 2 releases:
* Krea 2 Turbo. This model is both guidance- and timestep- distilled for faster inference.
* Krea 2 RAW. This model is actually meant to be hackable/fine-tunable.
One of the things we think the (open) LLM community does well is release models in different sizes and also at different stages of the training pipelines; we are releasing two checkpoints at both the mid-training and post-training stage. This is rare in the image & multimedia community, so we can't help but feel proud of this release.
Some of our team members will be answering questions since we are at the front page for now (thank you HN!).
Happy hacking!
vunderba 20 hours ago [-]
Results are in! This is a really impressive showing, especially given how fast the Turbo model is at 8 steps. The only locally hostable model that managed to outperform it was Ideogram 4, which is significantly slower (think minutes vs seconds).
It did fall to the usual “model killers”: the nine-pointed star, Count Rugen, the overcrowded flat Earth. But overall, it really punched above its weight class, scoring the highest among locally hostable models and coming in just below Ideogram 4, passing 6 of the 15 tests.
Great job Krea team!
GenAI link to compare locally hostable models only:
> It did fall to the usual “model killers”: the nine-pointed star, Count Rugen, the overcrowded flat Earth.
I'd never heard of text to image model killers so I had a good chuckle at this. Such oddly specific things for us to arrive at as a test method
vunderba 4 hours ago [-]
Haha yeah, the site automatically assigns the term to any benchmark that fewer than 25% of the tested models are able to pass.
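The labelling rule described above is simple enough to sketch. This assumes results are stored as one pass/fail flag per model per test; the data layout, function name, and every outcome below are invented for illustration and are not the site's actual code or scores.

```python
# Hypothetical sketch: data layout and all outcomes are made up.

def model_killers(results, threshold=0.25):
    """Return the tests that fewer than `threshold` of the models pass.

    `results` maps test name -> {model name: True if the model passed}.
    """
    killers = []
    for test, outcomes in results.items():
        pass_rate = sum(outcomes.values()) / len(outcomes)
        if pass_rate < threshold:
            killers.append(test)
    return killers


toy_results = {
    "nine-pointed star": {
        "model_a": False, "model_b": False, "model_c": False,
        "model_d": False, "model_e": True,
    },
    "some easier test": {
        "model_a": True, "model_b": True, "model_c": False,
        "model_d": True, "model_e": True,
    },
}

print(model_killers(toy_results))
```

With the strict "fewer than" comparison, a test passed by exactly 25% of models is not labelled a killer.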
What’s more surprising to me is that, unlike the “pelican riding a bicycle” whose objectivity has been slightly compromised as newer models have incorporated it into their training data, the arbitrary-point star has been wiping models out ever since the early days of Flux back in 2024.
I personally love the test because it's something that even an elementary school child with no artistic experience at all can do, but state of the art models struggle heavily.
ACCount37 1 days ago [-]
Good to have more open weight models, and I really appreciate the in-depth write-up.
I also like the "keep the manifold wide" approach of trying to make a model capable of many styles as opposed to getting it "dialed in" for a dozen of style presets.
But it does feel very much like "fighting the past war" - now that advanced "image-to-image"/"agentic composition" models like Nano Banana 2 or Images 2.0 are out there in force.
I seriously doubt that the basic Qwen 3 VL in cross can get anywhere near that level of I2I. And robust I2I is very desirable - editing, adjustment, character consistency, the generalization of whatever you're doing with style transfer now (underexplained BTW).
Trying to hit that level of I2I is not by any means easy, but it's pretty clear to me that this is where the next frontier for image models lies. Feels like Ideogram might be building up to it, but I'm yet to see it anywhere else in open weight space.
dvrp 1 days ago [-]
I appreciate the skepticism but we find internally that this model is used more than Nano Banana for many cases like moodboarding (also, 4x cheaper than NBP never hurts). Agentic workflows are compatible with Krea 2 so I’m not sure I follow there. If you are talking about an edit model, that’s coming too.
Also, we are on par with them in t2i benchmarks, check the artificial analysis link I posted in my top comment.
And you cannot re-train nano banana or ChatGPT to understand your brand, which is what our customers complain about constantly.
Plus open-source! It’s hard to do an apples-to-apples comparison.
ACCount37 24 hours ago [-]
"Compatible" is one thing - "built for" is a different beast. The difference can be like that between Images 1.0 and Images 2.0 - the sheer leap in compositional capabilities was staggering.
"Edit model" is a part of it, yes. So is style transfer. But less as an endpoint and more of a subset of what advanced I2I enables.
"Re-train to understand your brand" is a fine marketing pitch, but in practical terms, it's hard to justify burning a LoRA for most uses. Enthusiasts absolutely do it, but enthusiasts are built different. Robust I2I can accomplish a lot of the same, but with a workflow that's closer to "drag and drop your references" than to "try to get a LoRA to do what you wanted it to do on a very slim set of images".
Modern LoRA pipelines are getting closer to "reliable" and "braindead simple", but you can't escape the "wait N hours for the GPUs to churn" of a fine-tune no matter what you do. And iteration time kills - a lot of the value of AI in workflows is that it does what it does fast and allows you to iterate at speed.
You can think of "LoRA vs I2I" as an image twin of "SFT vs in-context learning" in LLM land. Both are useful, neither substitutes for the other fully, but there's a reason why most reach for the latter way before they reach for the former.
I like the T2I from what I've seen, mind. Perhaps more than Images 2.0 or even NB2. I just think that focusing solely on T2I to the exclusion of advanced editing and composition capabilities is a very 2024 thing.
dvrp 22 hours ago [-]
"it's hard to justify burning a LoRA for most uses" -> Not really, it's literally cheaper on Krea than using ChatGPT Images; NBP and GPT-Images 2.0 are quite expensive, you'd be surprised. LoRAs are one of our stickiest features (this doesn't mean they are intuitive; it just means that customers who use them are suddenly retained way more because of how much better their images become). But yeah, nothing else out there offers a nice training UI like Krea's, where you can just drag-and-drop a moodboard and get a LoRA in a few minutes. It literally takes only a few minutes on Krea; definitely not "N hours for GPUs to churn".
This model does image to image; what's the issue with Qwen 3 VL? And is style transfer unexplained? "reference" is mentioned 11 times on the page (more specifically, I read it and it seemed to discuss it a lot).
> 2.3 Revenue Threshold for Commercial Use. Commercial Use under this Agreement of the Krea Model, Derivatives, or Outputs is permitted only if you (including all affiliated entities under common ownership or control) have total company-wide annual revenue of less than one million United States dollars ($1,000,000 USD), calculated on a trailing twelve-month basis and including all revenue from all sources. If you meet or exceed this threshold, you must obtain a separate enterprise license from Krea prior to any Commercial Use. If your revenue meets or exceeds this threshold at any time during your use of the Krea Model under this Agreement, you must immediately cease Commercial Use and contact Krea. Enterprise license inquiries may be directed to opensource@krea.ai.
> 4.1 General Restrictions. You shall not, and shall not permit any third party to: (a) Use the Krea Model, any Derivative, or any Output in violation of applicable law, regulation, this Agreement, or the Acceptable Use Policy;
> 4.2 Content Filtering Requirement. You must implement reasonable and appropriate Content Filter measures to detect, prevent, and mitigate the generation or distribution of prohibited, harmful, or unlawful content through your deployment of the Krea Model or any Derivative. Such measures may include, but are not limited to: (a) open-source content classifiers, such as Falconsai/nsfw_image_detection, NudeNet, or CompVis safety checker; (b) commercial content moderation APIs, such as Hive Moderation or Microsoft Azure AI Content Safety; (c) manual human review processes; and/or (d) any combination of the foregoing or other technically appropriate measures.
> 4.4 Acceptable Use Policy Compliance. You must comply with the Acceptable Use Policy, which is incorporated herein by reference.
> You shall not use or allow others to use the Krea 2 Raw Model or Krea 2 Turbo Model, any Derivative, or any Output for any of the following purposes:
> (8) Circumventing or removing any safety measures, usage restrictions, content filters, content provenance, or watermarking mechanisms implemented by Krea or any deployer;
It's a good model; sadly, the use of the Qwen VAE is a bit of a downer.
mattnewton 1 days ago [-]
Krea 2 Large (on the website and API) was trained with the FLUX 2 VAE, if you want to test it out and push realism. After working with both, I think the Flux VAE has a slight edge in learning realistic textures, but the gap is smaller than you might think. The Qwen VAE was overall very good in ablations and good at learning to produce a diverse set of styles.
Edit: your account has unfortunately been breaking the site guidelines like this in other places as well (e.g. https://news.ycombinator.com/item?id=48567675). Can you please fix this? I don't want to ban you, but we've already had to ask you this before.
BoredPositron 1 days ago [-]
Kill the account dang.
TheSpiceIsLife 22 hours ago [-]
[dead]
mattnewton 1 days ago [-]
Definitely encourage you to test the models. We tried to optimize for realistic focus and not over-sharpening, which leads to a "hyper" AI-look.
It's hard to benchmark because people generally prefer sharp, saturated orangish pictures all else equal, but I believe these are bad shortcuts for the model to learn realism.
BoredPositron 1 days ago [-]
Is my taste the problem, or am I simply holding it wrong? The qwen VAE's shortcomings are well documented, and Krea 2 produces the same blurry, airbrushed output as qwen image. Between the chaotic release and every interaction I've had with your team, I've grown to genuinely dislike your platform/company. Good luck.
mobiuscog 1 days ago [-]
It's been mentioned by some that using the wan2.1 vae instead solves this.
I haven't personally had time to try yet.
dvrp 1 days ago [-]
There is a lot of discourse about it on Reddit. Check the AMA link I put in the comment above to learn more. The short version is that it wasn’t released when we started; we use it for internal models and hope to do further open source releases.
pwython 1 days ago [-]
Looking forward to playing with Krea 2, I use Z-Image Turbo daily -- it has replaced my stock photo subscriptions, for realism and illustrations.
May I ask how much did the training cost you?
sangwulee 1 days ago [-]
A lot of coffee for sure. Regarding the training cost, it's hard to give a good estimate because we used a shared Kubernetes cluster with inference + research workloads.
Pxtl 17 hours ago [-]
What are people using for self hosting these? I tried ollama with open-webui and it didn't support image generation at all.
kadoban 15 hours ago [-]
I haven't done this model yet, but comfyui will definitely support it and I've found it a nice interface once you get used to it. Copy/paste a workflow to start with if you're lost.
Eisenstein 8 hours ago [-]
Koboldcpp supports image generation, but you will have to wait for the next release for Krea2 support.
[1] -- https://www.reddit.com/r/StableDiffusion/comments/1udnm0a/we...
We are on par with Nano Banana in terms of image quality as per Artificial Analysis text-to-image benchmarks (https://artificialanalysis.ai/image/leaderboard/text-to-imag...).
We also attached a permissive license for individuals and small businesses.
Useful links:
- Marketing page around the OSS release: https://www.krea.ai/krea-2-open-source
- Huggingface model: https://www.krea.ai/krea-2/huggingface
- GitHub repository: https://www.krea.ai/krea-2/github
- Reddit AMA: https://www.reddit.com/r/StableDiffusion/comments/1udnm0a/we...
- Technical report: https://www.krea.ai/blog/krea-2-technical-report
Thank you and I hope you enjoy this release---happy hacking!
GenAI link to compare locally hostable models only:
https://genai-showdown.specr.net/?models=fd,hd,kd,qi,f2d,zt,...
Learn more here: https://www.krea.ai/blog/krea-2-lora-training.
The acceptable use policy is on the website (https://www.krea.ai/krea-2-use-policy) and includes:
> You shall not use or allow others to use the Krea 2 Raw Model or Krea 2 Turbo Model, any Derivative, or any Output for any of the following purposes:
> (8) Circumventing or removing any safety measures, usage restrictions, content filters, content provenance, or watermarking mechanisms implemented by Krea or any deployer;
Please edit out swipes from your HN comments, as the guidelines request: https://news.ycombinator.com/newsguidelines.html.
* https://github.com/LostRuins/koboldcpp
I tried two of the Krea 2 models in LM Studio, but loading the downloaded models errored out. (Maybe I'm doing it wrong, since it's an image model.)
Previously: https://news.ycombinator.com/item?id=47800562
You can try it right away at krea.ai/image (warning: you need to sign up).