2.2 | Overview: two types of GenAI
In this module, you will get an overview of two fundamental types of generative AI models. You will learn the differences between language models and image generation models, understand how each of them works, and discover typical application areas for both technologies.
What you already know & what you will learn
1. Language Models (LLMs)
Definition & How They Work
Large Language Models (LLMs) are trained on vast amounts of text to learn statistical patterns, in particular which words (more precisely, tokens) are likely to follow one another. This allows them to take context into account and generate human-like text for a wide variety of tasks.
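The idea of learning statistical patterns from text can be illustrated with a deliberately tiny sketch. The example below is not how an LLM is built (real models use neural networks trained on billions of tokens); it only counts which word follows which in a made-up sample sentence and then samples from those counts. The function names and the sample corpus are assumptions chosen for illustration.

```python
import random
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers

def generate(model, start_word, length, seed=0):
    """Generate text by repeatedly sampling a next word from the counts."""
    rng = random.Random(seed)
    output = [start_word]
    for _ in range(length - 1):
        candidates = model.get(output[-1])
        if not candidates:  # no known follower: stop early
            break
        words = list(candidates.keys())
        weights = list(candidates.values())
        output.append(rng.choices(words, weights=weights, k=1)[0])
    return " ".join(output)

# Made-up sample text; a real model would see vastly more data.
corpus = "the cat sat on the mat . the dog sat on the rug ."
model = train_bigram_model(corpus)
print(generate(model, "the", 6))
```

Even this toy version produces word sequences that look like the training text, which hints at why far larger models trained on far more text can produce fluent language.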
2. Image Models
Definition & How They Work
Image generation models convert text prompts into images, often through a "diffusion process": starting from random noise, the model removes noise step by step until an image emerges that matches the textual description.
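The step-by-step character of this process can be sketched with a toy example. The code below is only an analogy, not a diffusion model: it uses a known target (a list of five numbers standing in for an image) and moves random noise toward it in small steps. A real diffusion model has no such target; a trained neural network estimates at each step which noise to remove, guided by the prompt. All names and values are assumptions for illustration.

```python
import random

def toy_denoise(target, steps=50, seed=42):
    """Start from random noise and move step by step toward `target`.

    `target` stands in for "the image that matches the prompt".
    """
    rng = random.Random(seed)
    image = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise
    for step in range(steps):
        remaining = steps - step
        # Close a fraction of the remaining gap; the final step closes it fully.
        image = [pixel + (goal - pixel) / remaining
                 for pixel, goal in zip(image, target)]
    return image

target = [0.0, 0.25, 0.5, 0.75, 1.0]  # a tiny 5-"pixel" image
result = toy_denoise(target)
print([round(value, 3) for value in result])
```

The point of the sketch is the loop: the result is not produced in one shot but refined over many small steps, which is also why image models expose settings such as the number of steps.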
From Knowledge to Application: Practical Tips
Tip 1: Engage in a Dialogue
Treat a language model such as ChatGPT as a conversational partner. Start with a simple request and refine the result step by step in the dialogue until it fits your requirements.
Tip 2: Choose the Right Tool for Image Enhancement
Professional results depend on improving an image step by step (iterative design). Which tool suits this best depends on your requirements.
For a quick start:
- ChatGPT (with DALL-E): Ideal for refining an image in a dialogue. After the first generation, simply give follow-up instructions like "Change the main color to blue" or "Add a person in the background."
- Canva ("Text to Image"): Perfect for integration into your daily design workflow. You generate an image and immediately continue editing it in the context of your presentation or social media post.
For advanced users & maximum control:
- Midjourney: Offers powerful commands for variation, panning, and zooming to precisely develop an idea and control the composition.
- Adobe Photoshop (with Generative Fill): The first choice for professionals who need to integrate AI-generated elements into existing images, or retouch and extend those images, with pixel-perfect precision.
Tip 3: Create a Consistent Brand Look
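For professional branding, images (e.g., for a blog series) should share a consistent style. Fixing a "seed" helps to achieve this. The seed is a number that determines the random starting noise of the image generation, so the same seed with the same settings reproduces the same starting point.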
The workflow is simple:
- Create a "key visual" that perfectly captures the desired style.
- Fix its seed number and use it for all subsequent images.
- Only change the subject part of the prompt.
Result: You get different subjects, but they all share the same visual character. Platforms like Midjourney or specialized business solutions, such as the image generator of the xpandAI Platform, offer such functions to ensure a persistent visual language.
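The workflow above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the interface of any particular platform: `BRAND_SEED`, `BRAND_STYLE`, `build_request`, and `starting_noise` are hypothetical names, and the request dictionary is only a stand-in for whatever settings your image tool accepts. The second part uses Python's random number generator to show why a fixed seed gives a reproducible starting point.

```python
import random

BRAND_SEED = 1234  # hypothetical seed noted down from the key visual
BRAND_STYLE = "flat illustration, vibrant blue and silver, soft lighting"  # hypothetical style text

def build_request(subject):
    """Combine a changing subject with the fixed style and the fixed seed."""
    return {"prompt": f"{subject}, {BRAND_STYLE}", "seed": BRAND_SEED}

def starting_noise(seed, size=4):
    """Return the pseudo-random starting values produced by `seed`."""
    rng = random.Random(seed)
    return [round(rng.random(), 3) for _ in range(size)]

# Only the subject changes between images.
image_requests = [build_request(subject)
                  for subject in ["a lighthouse", "a city skyline", "a mountain trail"]]
for request in image_requests:
    print(request)

# Same seed: identical starting noise. Different seed: different noise.
same = starting_noise(BRAND_SEED) == starting_noise(BRAND_SEED)
different = starting_noise(BRAND_SEED) != starting_noise(BRAND_SEED + 1)
print(same, different)
```

Keeping the style text and the seed in one place makes it harder to change them by accident when writing the prompt for the next image in a series.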
Tip 4: Combine Language and Image Models
Use a language model as an intelligent assistant to develop a first-class prompt for an image model.
Example – Instruction for ChatGPT:
I need a highly detailed prompt for an image model. The goal is a header image for the "About Us" page of a tech consulting firm.
- Subject: A diverse team working in a bright, modern office.
- Action: They are discussing around a conference table with a glowing, holographic data graphic.
- Style: Photorealistic, professional camera, slightly blurred background (bokeh).
- Colors: The office furniture is neutral; the data graphic should incorporate our brand colors (vibrant blue and silver).
Create a concise, English prompt from this.
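If you write such instructions regularly, the structure of the example (goal, labeled details, closing request) can be turned into a small template. The sketch below only assembles the text; it does not call any AI service. The function name `build_instruction` and its parameters are assumptions for illustration.

```python
def build_instruction(goal, fields):
    """Assemble a structured instruction for a language model.

    `fields` maps a label (e.g. "Subject") to its description.
    """
    lines = [
        "I need a highly detailed prompt for an image model. "
        f"The goal is {goal}."
    ]
    lines += [f"- {label}: {text}" for label, text in fields.items()]
    lines.append("Create a concise, English prompt from this.")
    return "\n".join(lines)

instruction = build_instruction(
    goal='a header image for the "About Us" page of a tech consulting firm',
    fields={
        "Subject": "A diverse team working in a bright, modern office.",
        "Action": "They are discussing around a conference table with a "
                  "glowing, holographic data graphic.",
        "Style": "Photorealistic, professional camera, slightly blurred "
                 "background (bokeh).",
        "Colors": "The office furniture is neutral; the data graphic should "
                  "incorporate our brand colors (vibrant blue and silver).",
    },
)
print(instruction)
```

The printed text can be pasted into the chat as is. Changing only the goal and the field values gives you a consistent briefing format for every new image.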
Your Takeaway
- Two fundamental types of generative AI are language models for text and image models for visuals.
- For professional results, iterative improvement is crucial. Choose the right tool for the job, from accessible entry-level options to professional tools.
- Use techniques like fixing a "seed" to ensure a consistent visual language for your brand across multiple images.
- Use language models as strategic partners to develop first-class prompts for image models and thus achieve your goals faster.