How Can Designers Make Generative AI Interfaces More Usable?

Kendall Jeong

Summer Associate, Design

Read Time

8 min read

Published On

July 25, 2023

In recent years, artificial intelligence (AI) has stepped into the limelight due to the rapid advancement and efficacy of generative AI technology. Generative AI refers to the development of systems capable of autonomously generating complex and realistic content, such as images, videos, music, and text, through deep learning techniques.

The impressive capabilities of new generative tools like ChatGPT, Google’s Bard, and DALL-E 2 have particularly captured the public’s attention. These interfaces make it easy for anyone with access to a computer or smartphone to interact directly with an AI model. The models themselves have also become more powerful than ever, thanks to recent breakthroughs in machine learning and neural networks and to the vast amounts of data now available for training.

The rise of generative AI has opened up exciting possibilities across various industries, from creative arts and entertainment to healthcare and robotics, promising to reshape our world in ways we never thought possible.

What is Prompt-Driven Input?

Generative AI interfaces currently use prompt-driven input. Prompt-driven input refers to the questions, instructions, and information that a user gives to an underlying AI model to specify their intended output. This interaction is typically presented to the user as a single text field and allows for a truly versatile experience. From drafting business proposals to troubleshooting software bugs to explaining rocket science to a 5-year-old, a nearly unlimited range of outputs is now possible within a single, compact text field.

Prompt-driven inputs allow generative AI to answer questions by consolidating a sea of online data into bite-sized replies. While AI image/video generation interfaces currently use one-way interactions (user to AI), chatbots incorporate prompt-driven inputs within dynamic conversation threads. These back-and-forth messaging systems are particularly powerful in their ability to mimic natural dialogue and refer back to previous points in the conversation. This is useful for generating content because users can continually build upon a conversation and correct errors made by the AI model along the way. AI chatbots have also recently improved their conversational charm with drastic enhancements to context and tone recognition.
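To make the idea of a conversation thread concrete, here is a minimal sketch of how a chatbot interface might keep track of earlier turns. The `Conversation` class, its method names, and the sample messages are all hypothetical and invented for illustration; real chatbots may store and pass context quite differently.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical container for a chat thread; not any vendor's API."""

    messages: list[dict[str, str]] = field(default_factory=list)

    def add(self, role: str, content: str) -> None:
        # Append one turn; role is "user" or "assistant" in this sketch.
        self.messages.append({"role": role, "content": content})

    def context_for_next_reply(self) -> str:
        # Flatten every prior turn so a model could refer back to it.
        return "\n".join(f"{m['role']}: {m['content']}" for m in self.messages)


thread = Conversation()
thread.add("user", "Write a caption for a smoothie shop.")
thread.add("assistant", "Sip into summer with our newest blend!")
thread.add("user", "Make it shorter and drop the exclamation mark.")
print(thread.context_for_next_reply())
```

Because the third message travels together with the first two, the follow-up request ("make it shorter") has something to refer to. That is the property that lets users build on a conversation instead of starting over.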

A key challenge with prompt-driven generative AI is the learning curve involved in articulating prompts. Especially for those with a limited understanding of AI capabilities, or with low literacy, it can be difficult to prompt the precise results that generative AI is now capable of. Frustration no longer stems from the chatbot’s inadequacy, but rather from the user’s own limitations. Education about how to use AI can alleviate these pain points, but effective User Experience (UX) design also has the potential to bridge the gap between users’ current understanding and seamless interaction with AI.

As AI chatbots grow more sophisticated, their interfaces need to improve as well. Generative AI design will need to continue evolving to become more usable and widely adopted.

Current Methods for Improving Prompt-Driven AI Design

Explicit Augmentation Features

The versatility of post-modification in AI chatbots certainly has its own benefits but can be ineffective when a user knows that they want to use AI for a specific purpose, like content writing or image generation. Recall the Berry Blast smoothie example. After initially prompting ChatGPT to write a social media caption, I’m unhappy with the results and realize that I should have added the business address, specified the sentence length, and clarified that I want to come up with the hashtags on my own. Now I have to consecutively ask ChatGPT to make these adjustments, which seems easy enough, but what if we could bypass these steps at the outset?

Instead of having to write out individual prompts for augmentations, these actions can be condensed into a simple click with explicit augmentation features.

Explicit augmentation features seem most fundamental to image generation, and Adobe Firefly demonstrates this well. Its augmentation features act as a supportive toolkit, allowing the user to make quick alterations without having to type them into the text field. For necessary functions like defining aspect ratio, a drop-down menu with commonly used options makes more sense than having to type that command in each time. The same goes for specifying color/tone and general style. Explicit augmentation features improve the efficiency of image generation by cutting down on the time it takes the user to specify their needs.

Explicit augmentation features can also be adapted for chatbots. For instance, alternative chatbots like ChatFlash and Chatsonic allow their users to select the AI’s personality through a drop-down menu conveniently located above the search bar. Some personalities include math teacher, interviewer, English translator, travel guide, stand-up comedian, SEO author, influencer, journalist, and more. ChatFlash even has a “create your own” personality that enables customization to meet each user’s unique needs.

Chatsonic interface includes a “Current Personality” augmentation feature
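One way to picture what happens behind a drop-down menu like this is as a small lookup that turns each selection into prompt text. The sketch below is an assumption about how such a feature could work, not a description of how Adobe Firefly, ChatFlash, or Chatsonic are actually built; the option tables, their wording, and the `build_prompt` function are all hypothetical.

```python
# Hypothetical option tables; the labels and wording are illustrative only.
ASPECT_RATIOS = {"Square": "1:1", "Landscape": "4:3", "Widescreen": "16:9"}
PERSONALITIES = {
    "Travel guide": "Answer as a friendly travel guide.",
    "Math teacher": "Answer as a patient math teacher.",
}


def build_prompt(user_text, aspect_ratio=None, personality=None):
    """Combine typed text with whatever the user picked from the menus."""
    parts = []
    if personality is not None:
        # A personality choice becomes an instruction placed before the text.
        parts.append(PERSONALITIES[personality])
    parts.append(user_text.strip())
    if aspect_ratio is not None:
        # An aspect ratio choice becomes a short suffix after the text.
        parts.append(f"Aspect ratio: {ASPECT_RATIOS[aspect_ratio]}.")
    return " ".join(parts)


print(build_prompt("A bowl of fruit on a table", aspect_ratio="Widescreen"))
print(build_prompt("Plan a day in Lisbon", personality="Travel guide"))
```

The user only types the part that is unique to their request. Everything repetitive, such as the ratio or the persona, is chosen with a click and attached automatically.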

Templates

Building on explicit augmentation features, templates are another design choice that can enhance user experience with generative AI. Templates are essentially “fill-in-the-blank” forms that are tailored to write content for different use cases. Multiple fields are formulated to effectively generate the template’s intended output. These fields might include company, product name, product description, tone of voice, audience, keywords, and output length. Instead of the buttons used to select explicit augmentation features, templates rely on text input from the user. The primary difference here from traditional prompts is that input text is now subdivided and organized for optimized output.
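As a rough illustration of the "fill-in-the-blank" idea, the sketch below assembles separate form fields into a single prompt. It uses Python's standard `string.Template`; the template wording, the `fill_template` helper, and the sample company and product are invented for this example and do not reflect how Jasper or any other product implements templates.

```python
from string import Template

# Hypothetical template; field names follow the examples listed above.
PRODUCT_DESCRIPTION = Template(
    "Write a product description for $product_name by $company. "
    "Details: $product_description. "
    "Audience: $audience. Tone of voice: $tone. "
    "Include these keywords: $keywords. "
    "Keep it under $output_length words."
)


def fill_template(template, fields):
    """Return the finished prompt, or raise if a field was left blank."""
    missing = [name for name, value in fields.items() if not str(value).strip()]
    if missing:
        raise ValueError(f"Missing fields: {', '.join(missing)}")
    # substitute() raises KeyError if a placeholder has no matching field.
    return template.substitute(fields)


# Invented sample input, standing in for what a user would type into the form.
example_fields = {
    "company": "Acme Outdoors",
    "product_name": "TrailLite Tent",
    "product_description": "a two-person tent that packs down small",
    "audience": "weekend hikers",
    "tone": "friendly",
    "keywords": "lightweight, waterproof",
    "output_length": 60,
}
print(fill_template(PRODUCT_DESCRIPTION, example_fields))
```

Splitting the input this way means the interface, not the user, is responsible for phrasing a well-formed prompt, and it can flag a blank field before anything is sent to the model.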

Example use case of Jasper’s dynamic template

Jasper templates can also generate image output

Jasper is an AI application that’s successfully adopting this generative AI feature. The company markets itself as “a better AI for business” and offers 50+ hyper-specific templates to its users. These templates can write personal bios, headlines, real estate listings, SEO-optimized title tags, PAS frameworks, and more.

Templates fine-tune generative AI for specialized uses. In Jasper’s case, it enables accurate content writing for business professionals. However, this versatile tool can also enhance the capabilities of generative AI for various other professions and fields. For example, imagine a template designed for medical professionals that could assist doctors in generating accurate and detailed medical reports. Similarly, templates tailored for legal professionals could aid lawyers in drafting contracts or legal documents, providing them with a head start and ensuring accuracy in their work.

The ability to customize AI’s output through templates can enable users in various fields to streamline their work, enhance productivity, and focus on higher-level tasks.

Looking to the Future of Generative AI

As generative AI interfaces continue to evolve, incorporating personalized and context-aware features holds great potential for enhancing user experiences and maximizing efficiency. Contextual suggested searches provide ongoing assistance and better match users’ needs, while explicit augmentation features allow for quick adjustments with a single click. And for specialized use cases, templates subdivide input for optimized, custom output.

Generative AI platforms are now considering alternatives to text-driven prompts. It’s understandable why text was the first choice: it’s compatible with a wide range of applications and mirrors the way people already converse. However, typing prompts can still create a level of friction between input and output. Speaking, on the other hand, might allow for faster and even more efficient communication, so long as its NLP model exceeds the capabilities of current, clunky voice recognition programs.

Beyond speech, emerging technologies that allow for eye tracking and spatial movement inputs make for interesting thought experiments. Foveated rendering is a current technique used in some VR headsets that concentrates rendering resources on the area that the eye is looking at, reducing the rendering workload for an optimized experience. Though this is a hyper-specific example, foveated rendering sets the precedent that non-verbal input is within our capabilities. These non-traditional inputs combined with generative AI might be closer than anticipated, especially considering the recent release of Apple’s Vision Pro.

Regardless, an ongoing exploration into how designers package generative AI has the potential to enrich user experiences in a fast-paced and ever-changing technological landscape. Only innovative approaches, iteration, and time will tell.