With GPT-4o, ChatGPT can generate art with readable text

Key Takeaways

ChatGPT has a new model, GPT-4o, and it can integrate text, visuals, and audio.
More importantly, the new model allows correct text placement on generated images.
GPT-4o’s updated capabilities could help with graphic design. It also doesn’t require a paid subscription, and there’s a desktop app.

ChatGPT can spit out text and, with the DALL-E integration, churn out images, but ask the artificial intelligence platform to combine the two and the result is typically an unreadable, jumbled mess. That’s changing, however, with the move to GPT-4o, where the “o” stands for “omni.” While OpenAI’s demonstration on May 13 focused on using the end-to-end text, vision, and audio capabilities to have a real-time conversation, the update could bring key graphic design capabilities to ChatGPT. Early demos show the AI not just generating images that have legible, correctly spelled text, but also using an existing image of a person to replicate that face in the new image.

GPT-4o’s approach to text, visuals, and audio

Everything is integrated into a single model

The key change coming with the launch of GPT-4o is the ability to both input and generate any mix of text, audio, and images. That’s because OpenAI trained a new model end-to-end that works across text, vision, and audio. Previously, GPT-4 would use separate models for audio, text, and images. With everything integrated into a single model, OpenAI explains that ChatGPT doesn’t lose information between models, which opens up a number of new possibilities.

While the live demo on May 13 focused on how that single end-to-end model allows you to use video to solve homework problems or have a real-time audio conversation, it also helps correct something the AI model is notoriously bad at: placing text on an image. GPT-4 can attempt to place text, but the result typically contains misspellings, even when you tell the chatbot exactly how to spell each word.

In several samples of the upcoming GPT-4o’s capabilities, the AI was able to place writing on an image of a typewriter, create a graphic with a poem, and create a movie poster. In the demonstrations, the wording was given to the AI, and the misspellings that did appear were in generated text that had not been explicitly spelled out. But ChatGPT was able to generate images with legible, correctly spelled text taken from the prompt.

You can use real faces in generated images

Imagine making a movie poster with actors’ faces

In one demonstration, ChatGPT created a movie poster with the actors’ faces on it along with the correctly spelled text. This was made possible by uploading the photos of the actors and spelling out the text to include. While some AI platforms can create a new photo with a real person’s face, ChatGPT wasn’t previously able to create a photo that had much likeness to the original.

In another demonstration, the chatbot was able to place the OpenAI logo on an image. Another tasked the bot with creating a concrete poem where the word “Omni” appeared in the shape of the OpenAI logo.

The generated images in OpenAI’s demonstrations are not perfect — when asked to take one correctly spelled poem image to dark mode, the software generates some misspellings. But the demonstration shows a much more legible, sensible result than the nonsensical way that GPT-4 generates text on images.

The software’s new ability to handle a mix of text, photos, and speech also allows it to answer questions about a photo and extract text from images.

The demonstrations suggest ChatGPT could have more capabilities in graphic design with the launch of GPT-4o over the next few weeks. However, those capabilities could have some consequences. One of the easiest ways to tell if an image was generated by AI is to look at things like street signs or laptop screens where text appears jumbled. If AI learns to spell on images, that’s one less feature to signal the authenticity of an image floating around the web.

The end-to-end model integrating text, vision, and audio also comes with faster speed, more features without a paid subscription, and a desktop app for Mac. OpenAI says that GPT-4o will roll out over the next few weeks.
