DeepSeek, the Chinese AI startup that just upended industry assumptions about model development costs, has released a new family of open-source multimodal AI models that apparently outperform OpenAI’s DALL-E 3 on key benchmarks.
The model, dubbed Janus Pro, comes in two sizes: 1 billion parameters (extremely small) and 7 billion parameters (around the size of SD 3.5L), and both can be downloaded right away from Hugging Face, the machine learning and data science hub.
The largest version, Janus Pro 7B, beats not just OpenAI’s DALL-E 3 but also other leading models like PixArt-alpha, Emu3-Gen, and SDXL on the industry benchmarks GenEval and DPG-Bench, according to data shared by DeepSeek AI.
Its release comes shortly after DeepSeek’s R1 language model, which reportedly matched GPT-4’s capabilities for only $5 million in development costs, sparked a contentious debate about the current state of the AI industry.
The Chinese company’s breakthrough has also sparked industry-wide concerns that it could dethrone incumbents and halt Nvidia’s growth trajectory; the chipmaker suffered the largest single-day market-cap loss in history on Monday.
DeepSeek’s Janus Pro model uses what the company calls a “novel autoregressive framework” that decouples visual encoding into separate pathways while maintaining a single, unified transformer architecture.
This design allows the model to both analyze images and generate images at 768×768 resolution.
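For intuition, that decoupling can be pictured as two separate visual pathways projecting into one token space consumed by a single shared transformer. The sketch below is purely conceptual, not DeepSeek’s implementation; every function name and dimension is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

D = 64  # illustrative hidden size of the shared transformer

# Two decoupled visual pathways project into the SAME token space:
# one for understanding (continuous features), one for generation
# (discrete image-token embeddings).
W_understand = rng.standard_normal((256, D)) * 0.02
W_generate = rng.standard_normal((128, D)) * 0.02

def encode_for_understanding(features: np.ndarray) -> np.ndarray:
    """Map continuous image features into the transformer's token space."""
    return features @ W_understand

def encode_for_generation(code_embs: np.ndarray) -> np.ndarray:
    """Map discrete image-token embeddings into the same token space."""
    return code_embs @ W_generate

def shared_transformer(tokens: np.ndarray) -> np.ndarray:
    """Stand-in for the single unified transformer both pathways share."""
    return tokens  # identity placeholder for the sketch

# Both pathways yield sequences the one transformer can consume.
und = shared_transformer(encode_for_understanding(rng.standard_normal((16, 256))))
gen = shared_transformer(encode_for_generation(rng.standard_normal((9, 128))))
print(und.shape, gen.shape)  # (16, 64) (9, 64)
```

The point of the design is that image analysis and image synthesis use different encoders suited to each task, while all downstream reasoning happens in one model.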
In its release documentation, DeepSeek claimed that Janus Pro outperforms previous unified models and matches or exceeds the performance of task-specific models. “The simplicity, high flexibility, and effectiveness of Janus Pro make it a strong candidate for next-generation unified multimodal models,” the company wrote.
The company continued its practice of open-sourcing its releases, which contrasts starkly with the closed, proprietary approach of U.S. tech giants: it published a full technical report on the model and, as with DeepSeek R1, made it available for immediate download free of charge.
So, what’s our verdict? Well, the model is very versatile.
That said, don’t expect it to dethrone any of the specialized models you love. It can generate text, analyze images, and create pictures, but when pitted against models that do just one of those things well, it is, at best, on par.
Testing the model
Note that there is no easy way to run it in the traditional UIs: Comfy, A1111, Fooocus, and Draw Things are not compatible with it right now. This means that setting up the model locally is a little involved and requires running commands in a terminal.
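If you do want to try it locally, the setup looks roughly like the sketch below. This assumes DeepSeek’s Janus GitHub repository and the Hugging Face model id shown; check the repo’s README for the exact, current inference commands, which may differ:

```shell
# Clone DeepSeek's Janus repository and install its dependencies
# (the editable install is an assumption; follow the repo's README).
git clone https://github.com/deepseek-ai/Janus.git
cd Janus
pip install -e .

# Pre-download the 7B weights from Hugging Face (several GB).
huggingface-cli download deepseek-ai/Janus-Pro-7B
```

From there, the repository’s own example scripts handle understanding and generation; expect to need a GPU with substantial VRAM for the 7B variant.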
However, some Hugging Face users have created spaces to try the model. DeepSeek’s official space is not available, so we recommend using NeuroSenko’s free space to try Janus 7B.
Be aware of what you click, as some titles may be misleading. For example, the Space run by AP123 says it runs Janus Pro 7B but actually runs Janus Pro 1.5B, which may leave you wasting a lot of time testing the model and getting bad results. Trust us: we know because it happened to us.
Visual understanding
The model is skilled at visual analysis and can accurately describe the elements in a photo.
It demonstrated a good sense of space and the relationship between various objects.
Additionally, it proved more accurate than LLaVA, the best-known open-source vision model, providing more precise descriptions of scenes and responding better to the user’s visual prompts.
However, it is still inferior to GPT Vision, especially for tasks that call for extra logic or analysis beyond what is explicitly visible in the image. For instance, we asked the model to evaluate and explain the message of this image.
The model responded: “The image appears to be a humorous cartoon that depicts a scene where a woman is licking the end of a long, red tongue that is attached to a boy.”
The overall tone of the image, it added, appears lighthearted and playful, possibly suggesting a scenario in which the woman is performing a mischievous or teasing act.
The model frequently fails in situations like this, where more logic than a simple description is required.
ChatGPT, on the other hand, actually understood the meaning behind the image: “This metaphor suggests that the mother’s attitudes, words, or values are directly influencing the child’s actions, particularly in a negative way such as bullying or discrimination,” it concluded. Accurately, shall we add.
A league of its own
Image generation appears robust and accurate, but it does call for careful instruction to produce good results.
DeepSeek claims Janus Pro beats SD 1.5, SDXL, and PixArt-alpha, but it’s important to emphasize that this must be a comparison against the base, non-fine-tuned models.
In other words, the comparison only holds against the weakest versions of the current models; arguably, nobody uses a base SD 1.5 for creating art when hundreds of fine-tunes are available that produce results competitive with even the most advanced models like Flux or Stable Diffusion 3.5.
The generations are not particularly high-quality, but they do seem better than what SD 1.5 or SDXL produced when they first came out.
For example, here is a head-to-head comparison of the images generated by Janus and SDXL for the same prompt:
In terms of understanding the core idea, Janus beat SDXL, producing a baby fox where SDXL generated an adult one.
It also captured the photorealistic style better, and the other elements of the prompt (fluffy, cinematic) were present as well.
Despite not sticking to the prompt, SDXL produced a crisper image: the overall quality is better, the eyes look realistic, and the details are easier to spot.
This pattern was consistent in other generations: good prompt understanding but poor execution, with blurry images that feel outdated considering how good current state-of-the-art image generators are.
However, it’s important to note that Janus is a multimodal LLM capable of generating text conversations, analyzing images, and generating them as well. Flux, SDXL, and the other models aren’t built for those tasks.
Janus is therefore much more versatile at its core, just not great at any one thing compared to specialized models that are exceptionally good at a particular task.
Being open source, Janus’s standing among generative AI enthusiasts will depend on a stream of updates that attempt to improve those weak points.
Edited by Sebastian Sinclair and Josh Quittner