What is generative AI? Everything you need to know

What is generative AI? Everything you need to know

What is generative AI? Everything you need to know

Generative artificial intelligence is artificial intelligence capable of generating text, images, videos, or other data using generative models, often in response to prompts

 

Enterprise

AI

 

Home AI technologies

Tech Accelerator

DEFINITION

What is generative AI? Everything you need to know

George Lawton

Generative AI is a type of artificial intelligence technology that can produce various types of content, including text, imagery, audio and synthetic data. The recent buzz around generative AI has been driven by the simplicity of new user interfaces for creating high-quality text, graphics and videos in a matter of seconds.

 

The technology, it should be noted, is not brand-new. Generative AI was introduced in the 1960s in chatbots. But it was not until 2014, with the introduction of generative adversarial networks, or GANs -- a type of machine learning algorithm -- that generative AI could create convincingly authentic images, videos and audio of real people.

On the one hand, this newfound capability has opened up opportunities that include better movie dubbing and rich educational content. It also unlocked concerns about deepfakes -- digitally forged images or videos -- and harmful cybersecurity attacks on businesses, including nefarious requests that realistically mimic an employee's boss.

 

Two additional recent advances that will be discussed in more detail below have played a critical part in generative AI going mainstream: transformers and the breakthrough language models they enabled. Transformers are a type of machine learning that made it possible for researchers to train ever-larger models without having to label all of the data in advance. New models could thus be trained on billions of pages of text, resulting in answers with more depth. In addition, transformers unlocked a new notion called attention that enabled models to track the connections between words across pages, chapters and books rather than just in individual sentences. And not just words: Transformers could also use their ability to track connections to analyze code, proteins, chemicals and DNA.

 

The rapid advances in so-called large language models (LLMs) -- i.e., models with billions or even trillions of parameters -- have opened a new era in which generative AI models can write engaging text, paint photorealistic images and even create somewhat entertaining sitcoms on the fly. Moreover, innovations in multimodal AI enable teams to generate content across multiple types of media, including text, graphics and video. This is the basis for tools like Dall-E that automatically create images from a text description or generate text captions from images.

 

These breakthroughs notwithstanding, we are still in the early days of using generative AI to create readable text and photorealistic stylized graphics. Early implementations have had issues with accuracy and bias, as well as being prone to hallucinations and spitting back weird answers. Still, progress thus far indicates that the inherent capabilities of this generative AI could fundamentally change enterprise technology how businesses operate. Going forward, this technology could help write code, design new drugs, develop products, redesign business processes and transform supply chains.

How does generative AI work?

Generative AI starts with a prompt that could be in the form of a text, an image, a video, a design, musical notes, or any input that the AI system can process. Various AI algorithms then return new content in response to the prompt. Content can include essays, solutions to problems, or realistic fakes created from pictures or audio of a person.

 

Early versions of generative AI required submitting data via an API or an otherwise complicated process. Developers had to familiarize themselves with special tools and write applications using languages such as Python.

 

Now, pioneers in generative AI are developing better user experiences that let you describe a request in plain language. After an initial response, you can also customize the results with feedback about the style, tone and other elements you want the generated content to reflect.

Generative AI models

Generative AI models combine various AI algorithms to represent and process content. For example, to generate text, various natural language processing techniques transform raw characters (e.g., letters, punctuation and words) into sentences, parts of speech, entities and actions, which are represented as vectors using multiple encoding techniques. Similarly, images are transformed into various visual elements, also expressed as vectors. One caution is that these techniques can also encode the biases, racism, deception and puffery contained in the training data.

 

Once developers settle on a way to represent the world, they apply a particular neural network to generate new content in response to a query or prompt. Techniques such as GANs and variational autoencoders (VAEs) -- neural networks with a decoder and encoder -- are suitable for generating realistic human faces, synthetic data for AI training or even facsimiles of particular humans.

 

Recent progress in transformers such as Google's Bidirectional Encoder Representations from Transformers (BERT), OpenAI's GPT and Google AlphaFold have also resulted in neural networks that can not only encode language, images and proteins but also generate new content. 

 

How neural networks are transforming generative AI

Researchers have been creating AI and other tools for programmatically generating content since the early days of AI. The earliest approaches, known as rule-based systems and later as "expert systems," used explicitly crafted rules for generating responses or data sets.

 

Neural networks, which form the basis of much of the AI and machine learning applications today, flipped the problem around. Designed to mimic how the human brain works, neural networks "learn" the rules from finding patterns in existing data sets. Developed in the 1950s and 1960s, the first neural networks were limited by a lack of computational power and small data sets. It was not until the advent of big data in the mid-2000s and improvements in computer hardware that neural networks became practical for generating content.

 

The field accelerated when researchers found a way to get neural networks to run in parallel across the graphics processing units (GPUs) that were being used in the computer gaming industry to render video games. New machine learning techniques developed in the past decade, including the aforementioned generative adversarial networks and transformers, have set the stage for the recent remarkable advances in AI-generated content.

 

What are Dall-E, ChatGPT and Bard?

ChatGPT, Dall-E and Bard are popular generative AI interfaces.

 

Dall-E. Trained on a large data set of images and their associated text descriptions, Dall-E is an example of a multimodal AI application that identifies connections across multiple media, such as vision, text and audio. In this case, it connects the meaning of words to visual elements. It was built using OpenAI's GPT implementation in 2021. Dall-E 2, a second, more capable version, was released in 2022. It enables users to generate imagery in multiple styles driven by user prompts.

 

ChatGPT. The AI-powered chatbot that took the world by storm in November 2022 was built on OpenAI's GPT-3.5 implementation. OpenAI has provided a way to interact and fine-tune text responses via a chat interface with interactive feedback. Earlier versions of GPT were only accessible via an API. GPT-4 was released March 14, 2023. ChatGPT incorporates the history of its conversation with a user into its results, simulating a real conversation. After the incredible popularity of the new GPT interface, Microsoft announced a significant new investment into OpenAI and integrated a version of GPT into its Bing search engine.

 

Enterprise

AI

 

Home AI technologies

Tech Accelerator

DEFINITION

What is generative AI? Everything you need to know

George Lawton

Generative AI is a type of artificial intelligence technology that can produce various types of content, including text, imagery, audio and synthetic data. The recent buzz around generative AI has been driven by the simplicity of new user interfaces for creating high-quality text, graphics and videos in a matter of seconds.

 

The technology, it should be noted, is not brand-new. Generative AI was introduced in the 1960s in chatbots. But it was not until 2014, with the introduction of generative adversarial networks, or GANs -- a type of machine learning algorithm -- that generative AI could create convincingly authentic images, videos and audio of real people.

 

On the one hand, this newfound capability has opened up opportunities that include better movie dubbing and rich educational content. It also unlocked concerns about deepfakes -- digitally forged images or videos -- and harmful cybersecurity attacks on businesses, including nefarious requests that realistically mimic an employee's boss.

 

Two additional recent advances that will be discussed in more detail below have played a critical part in generative AI going mainstream: transformers and the breakthrough language models they enabled. Transformers are a type of machine learning that made it possible for researchers to train ever-larger models without having to label all of the data in advance. New models could thus be trained on billions of pages of text, resulting in answers with more depth. In addition, transformers unlocked a new notion called attention that enabled models to track the connections between words across pages, chapters and books rather than just in individual sentences. And not just words: Transformers could also use their ability to track connections to analyze code, proteins, chemicals and DNA.

 

The rapid advances in so-called large language models (LLMs) -- i.e., models with billions or even trillions of parameters -- have opened a new era in which generative AI models can write engaging text, paint photorealistic images and even create somewhat entertaining sitcoms on the fly. Moreover, innovations in multimodal AI enable teams to generate content across multiple types of media, including text, graphics and video. This is the basis for tools like Dall-E that automatically create images from a text description or generate text captions from images.

 

These breakthroughs notwithstanding, we are still in the early days of using generative AI to create readable text and photorealistic stylized graphics. Early implementations have had issues with accuracy and bias, as well as being prone to hallucinations and spitting back weird answers. Still, progress thus far indicates that the inherent capabilities of this generative AI could fundamentally change enterprise technology how businesses operate. Going forward, this technology could help write code, design new drugs, develop products, redesign business processes and transform supply chains.

 

 

How does generative AI work?

Generative AI starts with a prompt that could be in the form of a text, an image, a video, a design, musical notes, or any input that the AI system can process. Various AI algorithms then return new content in response to the prompt. Content can include essays, solutions to problems, or realistic fakes created from pictures or audio of a person.

 

Early versions of generative AI required submitting data via an API or an otherwise complicated process. Developers had to familiarize themselves with special tools and write applications using languages such as Python.

 

Now, pioneers in generative AI are developing better user experiences that let you describe a request in plain language. After an initial response, you can also customize the results with feedback about the style, tone and other elements you want the generated content to reflect.

 

 

Generative AI models

Generative AI models combine various AI algorithms to represent and process content. For example, to generate text, various natural language processing techniques transform raw characters (e.g., letters, punctuation and words) into sentences, parts of speech, entities and actions, which are represented as vectors using multiple encoding techniques. Similarly, images are transformed into various visual elements, also expressed as vectors. One caution is that these techniques can also encode the biases, racism, deception and puffery contained in the training data.

 

Once developers settle on a way to represent the world, they apply a particular neural network to generate new content in response to a query or prompt. Techniques such as GANs and variational autoencoders (VAEs) -- neural networks with a decoder and encoder -- are suitable for generating realistic human faces, synthetic data for AI training or even facsimiles of particular humans.

 

Recent progress in transformers such as Google's Bidirectional Encoder Representations from Transformers (BERT), OpenAI's GPT and Google AlphaFold have also resulted in neural networks that can not only encode language, images and proteins but also generate new content. 

 

How neural networks are transforming generative AI

Researchers have been creating AI and other tools for programmatically generating content since the early days of AI. The earliest approaches, known as rule-based systems and later as "expert systems," used explicitly crafted rules for generating responses or data sets.

 

Neural networks, which form the basis of much of the AI and machine learning applications today, flipped the problem around. Designed to mimic how the human brain works, neural networks "learn" the rules from finding patterns in existing data sets. Developed in the 1950s and 1960s, the first neural networks were limited by a lack of computational power and small data sets. It was not until the advent of big data in the mid-2000s and improvements in computer hardware that neural networks became practical for generating content.

 

The field accelerated when researchers found a way to get neural networks to run in parallel across the graphics processing units (GPUs) that were being used in the computer gaming industry to render video games. New machine learning techniques developed in the past decade, including the aforementioned generative adversarial networks and transformers, have set the stage for the recent remarkable advances in AI-generated content.

 

What are Dall-E, ChatGPT and Bard?

ChatGPT, Dall-E and Bard are popular generative AI interfaces.

 

Dall-E. Trained on a large data set of images and their associated text descriptions, Dall-E is an example of a multimodal AI application that identifies connections across multiple media, such as vision, text and audio. In this case, it connects the meaning of words to visual elements. It was built using OpenAI's GPT implementation in 2021. Dall-E 2, a second, more capable version, was released in 2022. It enables users to generate imagery in multiple styles driven by user prompts.

 

ChatGPT. The AI-powered chatbot that took the world by storm in November 2022 was built on OpenAI's GPT-3.5 implementation. OpenAI has provided a way to interact and fine-tune text responses via a chat interface with interactive feedback. Earlier versions of GPT were only accessible via an API. GPT-4 was released March 14, 2023. ChatGPT incorporates the history of its conversation with a user into its results, simulating a real conversation. After the incredible popularity of the new GPT interface, Microsoft announced a significant new investment into OpenAI and integrated a version of GPT into its Bing search engine.

 

Here is a snapshot of the differences between ChatGPT and Bard.

Bard. Google was another early leader in pioneering transformer AI techniques for processing language, proteins and other types of content. It open sourced some of these models for researchers. However, it never released a public interface for these models. Microsoft's decision to implement GPT into Bing drove Google to rush to market a public-facing chatbot, Google Bard, built on a lightweight version of its LaMDA family of large language models. Google suffered a significant loss in stock price following Bard's rushed debut after the language model incorrectly said the Webb telescope was the first to discover a planet in a foreign solar system. Meanwhile, Microsoft and ChatGPT implementations also lost face in their early outings due to inaccurate results and erratic behavior. Google has since unveiled a new version of Bard built on its most advanced LLM, PaLM 2, which allows Bard to be more efficient and visual in its response to user queries.

 

What are use cases for generative AI?

Generative AI can be applied in various use cases to generate virtually any kind of content. The technology is becoming more accessible to users of all kinds thanks to cutting-edge breakthroughs like GPT that can be tuned for different applications. Some of the use cases for generative AI include the following:

 

Implementing chatbots for customer service and technical support.

Deploying deepfakes for mimicking people or even specific individuals.

Improving dubbing for movies and educational content in different languages.

Writing email responses, dating profiles, resumes and term papers.

Creating photorealistic art in a particular style.

Improving product demonstration videos.

Suggesting new drug compounds to test.

Designing physical products and buildings.

Optimizing new chip designs.

Writing music in a specific style or tone.

Enterprise

AI

 

Home AI technologies

Tech Accelerator

DEFINITION

What is generative AI? Everything you need to know

George Lawton

Generative AI is a type of artificial intelligence technology that can produce various types of content, including text, imagery, audio and synthetic data. The recent buzz around generative AI has been driven by the simplicity of new user interfaces for creating high-quality text, graphics and videos in a matter of seconds.

 

The technology, it should be noted, is not brand-new. Generative AI was introduced in the 1960s in chatbots. But it was not until 2014, with the introduction of generative adversarial networks, or GANs -- a type of machine learning algorithm -- that generative AI could create convincingly authentic images, videos and audio of real people.

 

On the one hand, this newfound capability has opened up opportunities that include better movie dubbing and rich educational content. It also unlocked concerns about deepfakes -- digitally forged images or videos -- and harmful cybersecurity attacks on businesses, including nefarious requests that realistically mimic an employee's boss.

 

Two additional recent advances that will be discussed in more detail below have played a critical part in generative AI going mainstream: transformers and the breakthrough language models they enabled. Transformers are a type of machine learning that made it possible for researchers to train ever-larger models without having to label all of the data in advance. New models could thus be trained on billions of pages of text, resulting in answers with more depth. In addition, transformers unlocked a new notion called attention that enabled models to track the connections between words across pages, chapters and books rather than just in individual sentences. And not just words: Transformers could also use their ability to track connections to analyze code, proteins, chemicals and DNA.

 

The rapid advances in so-called large language models (LLMs) -- i.e., models with billions or even trillions of parameters -- have opened a new era in which generative AI models can write engaging text, paint photorealistic images and even create somewhat entertaining sitcoms on the fly. Moreover, innovations in multimodal AI enable teams to generate content across multiple types of media, including text, graphics and video. This is the basis for tools like Dall-E that automatically create images from a text description or generate text captions from images.

 

These breakthroughs notwithstanding, we are still in the early days of using generative AI to create readable text and photorealistic stylized graphics. Early implementations have had issues with accuracy and bias, as well as being prone to hallucinations and spitting back weird answers. Still, progress thus far indicates that the inherent capabilities of this generative AI could fundamentally change enterprise technology how businesses operate. Going forward, this technology could help write code, design new drugs, develop products, redesign business processes and transform supply chains.

 

 

How does generative AI work?

Generative AI starts with a prompt that could be in the form of a text, an image, a video, a design, musical notes, or any input that the AI system can process. Various AI algorithms then return new content in response to the prompt. Content can include essays, solutions to problems, or realistic fakes created from pictures or audio of a person.

 

Early versions of generative AI required submitting data via an API or an otherwise complicated process. Developers had to familiarize themselves with special tools and write applications using languages such as Python.

 

Now, pioneers in generative AI are developing better user experiences that let you describe a request in plain language. After an initial response, you can also customize the results with feedback about the style, tone and other elements you want the generated content to reflect.

 

 

Generative AI models

Generative AI models combine various AI algorithms to represent and process content. For example, to generate text, various natural language processing techniques transform raw characters (e.g., letters, punctuation and words) into sentences, parts of speech, entities and actions, which are represented as vectors using multiple encoding techniques. Similarly, images are transformed into various visual elements, also expressed as vectors. One caution is that these techniques can also encode the biases, racism, deception and puffery contained in the training data.

 

Once developers settle on a way to represent the world, they apply a particular neural network to generate new content in response to a query or prompt. Techniques such as GANs and variational autoencoders (VAEs) -- neural networks with a decoder and encoder -- are suitable for generating realistic human faces, synthetic data for AI training or even facsimiles of particular humans.

 

Recent progress in transformers such as Google's Bidirectional Encoder Representations from Transformers (BERT), OpenAI's GPT and Google AlphaFold have also resulted in neural networks that can not only encode language, images and proteins but also generate new content. 

 

How neural networks are transforming generative AI

Researchers have been creating AI and other tools for programmatically generating content since the early days of AI. The earliest approaches, known as rule-based systems and later as "expert systems," used explicitly crafted rules for generating responses or data sets.

 

Neural networks, which form the basis of much of the AI and machine learning applications today, flipped the problem around. Designed to mimic how the human brain works, neural networks "learn" the rules from finding patterns in existing data sets. Developed in the 1950s and 1960s, the first neural networks were limited by a lack of computational power and small data sets. It was not until the advent of big data in the mid-2000s and improvements in computer hardware that neural networks became practical for generating content.

 

The field accelerated when researchers found a way to get neural networks to run in parallel across the graphics processing units (GPUs) that were being used in the computer gaming industry to render video games. New machine learning techniques developed in the past decade, including the aforementioned generative adversarial networks and transformers, have set the stage for the recent remarkable advances in AI-generated content.

 

What are Dall-E, ChatGPT and Bard?

ChatGPT, Dall-E and Bard are popular generative AI interfaces.

 

Dall-E. Trained on a large data set of images and their associated text descriptions, Dall-E is an example of a multimodal AI application that identifies connections across multiple media, such as vision, text and audio. In this case, it connects the meaning of words to visual elements. It was built using OpenAI's GPT implementation in 2021. Dall-E 2, a second, more capable version, was released in 2022. It enables users to generate imagery in multiple styles driven by user prompts.

 

ChatGPT. The AI-powered chatbot that took the world by storm in November 2022 was built on OpenAI's GPT-3.5 implementation. OpenAI has provided a way to interact and fine-tune text responses via a chat interface with interactive feedback. Earlier versions of GPT were only accessible via an API. GPT-4 was released March 14, 2023. ChatGPT incorporates the history of its conversation with a user into its results, simulating a real conversation. After the incredible popularity of the new GPT interface, Microsoft announced a significant new investment into OpenAI and integrated a version of GPT into its Bing search engine.

 

Here is a snapshot of the differences between ChatGPT and Bard.

Bard. Google was another early leader in pioneering transformer AI techniques for processing language, proteins and other types of content. It open sourced some of these models for researchers. However, it never released a public interface for these models. Microsoft's decision to implement GPT into Bing drove Google to rush to market a public-facing chatbot, Google Bard, built on a lightweight version of its LaMDA family of large language models. Google suffered a significant loss in stock price following Bard's rushed debut after the language model incorrectly said the Webb telescope was the first to discover a planet in a foreign solar system. Meanwhile, Microsoft and ChatGPT implementations also lost face in their early outings due to inaccurate results and erratic behavior. Google has since unveiled a new version of Bard built on its most advanced LLM, PaLM 2, which allows Bard to be more efficient and visual in its response to user queries.

 

What are use cases for generative AI?

Generative AI can be applied in various use cases to generate virtually any kind of content. The technology is becoming more accessible to users of all kinds thanks to cutting-edge breakthroughs like GPT that can be tuned for different applications. Some of the use cases for generative AI include the following:

 

Implementing chatbots for customer service and technical support.

Deploying deepfakes for mimicking people or even specific individuals.

Improving dubbing for movies and educational content in different languages.

Writing email responses, dating profiles, resumes and term papers.

Creating photorealistic art in a particular style.

Improving product demonstration videos.

Suggesting new drug compounds to test.

Designing physical products and buildings.

Optimizing new chip designs.

Writing music in a specific style or tone.

 

What are the benefits of generative AI?

Generative AI can be applied extensively across many areas of the business. It can make it easier to interpret and understand existing content and automatically create new content. Developers are exploring ways that generative AI can improve existing workflows, with an eye to adapting workflows entirely to take advantage of the technology. Some of the potential benefits of implementing generative AI include the following:

 

Automating the manual process of writing content.

Reducing the effort of responding to emails.

Improving the response to specific technical queries.

Creating realistic representations of people.

Summarizing complex information into a coherent narrative.

Simplifying the process of creating content in a particular style.

What are the limitations of generative AI?

Early implementations of generative AI vividly illustrate its many limitations. Some of the challenges generative AI presents result from the specific approaches used to implement particular use cases. For example, a summary of a complex topic is easier to read than an explanation that includes various sources supporting key points. The readability of the summary, however, comes at the expense of a user being able to vet where the information comes from.

 

Here are some of the limitations to consider when implementing or using a generative AI app:

 

It does not always identify the source of content.

It can be challenging to assess the bias of original sources.

Realistic-sounding content makes it harder to identify inaccurate information.

It can be difficult to understand how to tune for new circumstances.

Results can gloss over bias, prejudice and hatred.

Attention is all you need: Transformers bring new capability

In 2017, Google reported on a new type of neural network architecture that brought significant improvements in efficiency and accuracy to tasks like natural language processing. The breakthrough approach, called transformers, was based on the concept of attention.

 

At a high level, attention refers to the mathematical description of how things (e.g., words) relate to, complement and modify each other. The researchers described the architecture in their seminal paper, "Attention is all you need," showing how a transformer neural network was able to translate between English and French with more accuracy and in only a quarter of the training time than other neural nets. The breakthrough technique could also discover relationships, or hidden orders, between other things buried in the data that humans might have been unaware of because they were too complicated to express or discern.

 

Transformer architecture has evolved rapidly since it was introduced, giving rise to LLMs such as GPT-3 and better pre-training techniques, such as Google's BERT.

 

What are the concerns surrounding generative AI?

The rise of generative AI is also fueling various concerns. These relate to the quality of results, potential for misuse and abuse, and the potential to disrupt existing business models. Here are some of the specific types of problematic issues posed by the current state of generative AI:

 

It can provide inaccurate and misleading information.

It is more difficult to trust without knowing the source and provenance of information.

It can promote new kinds of plagiarism that ignore the rights of content creators and artists of original content.

It might disrupt existing business models built around search engine optimization and advertising.

It makes it easier to generate fake news.

It makes it easier to claim that real photographic evidence of a wrongdoing was just an AI-generated fake.

It could impersonate people for more effective social engineering cyber attacks.

Implementing generative AI is not just about technology. Businesses must also consider its impact on people and processes.

What are some examples of generative AI tools?

Generative AI tools exist for various modalities, such as text, imagery, music, code and voices. Some popular AI content generators to explore include the following:

 

Text generation tools include GPT, Jasper, AI-Writer and Lex.

Image generation tools include Dall-E 2, Midjourney and Stable Diffusion.

Music generation tools include Amper, Dadabots and MuseNet.

Code generation tools include CodeStarter, Codex, GitHub Copilot and Tabnine.

Voice synthesis tools include Descript, Listnr and Podcast.ai.

AI chip design tool companies include Synopsys, Cadence, Google and Nvidia.

 

Enterprise

AI

 

Home AI technologies

Tech Accelerator

DEFINITION

What is generative AI? Everything you need to know

George Lawton

Generative AI is a type of artificial intelligence technology that can produce various types of content, including text, imagery, audio and synthetic data. The recent buzz around generative AI has been driven by the simplicity of new user interfaces for creating high-quality text, graphics and videos in a matter of seconds.

 

The technology, it should be noted, is not brand-new. Generative AI was introduced in the 1960s in chatbots. But it was not until 2014, with the introduction of generative adversarial networks, or GANs -- a type of machine learning algorithm -- that generative AI could create convincingly authentic images, videos and audio of real people.

 

On the one hand, this newfound capability has opened up opportunities that include better movie dubbing and rich educational content. It also unlocked concerns about deepfakes -- digitally forged images or videos -- and harmful cybersecurity attacks on businesses, including nefarious requests that realistically mimic an employee's boss.

 

Two additional recent advances that will be discussed in more detail below have played a critical part in generative AI going mainstream: transformers and the breakthrough language models they enabled. Transformers are a type of machine learning that made it possible for researchers to train ever-larger models without having to label all of the data in advance. New models could thus be trained on billions of pages of text, resulting in answers with more depth. In addition, transformers unlocked a new notion called attention that enabled models to track the connections between words across pages, chapters and books rather than just in individual sentences. And not just words: Transformers could also use their ability to track connections to analyze code, proteins, chemicals and DNA.

 

The rapid advances in so-called large language models (LLMs) -- i.e., models with billions or even trillions of parameters -- have opened a new era in which generative AI models can write engaging text, paint photorealistic images and even create somewhat entertaining sitcoms on the fly. Moreover, innovations in multimodal AI enable teams to generate content across multiple types of media, including text, graphics and video. This is the basis for tools like Dall-E that automatically create images from a text description or generate text captions from images.

 

These breakthroughs notwithstanding, we are still in the early days of using generative AI to create readable text and photorealistic stylized graphics. Early implementations have had issues with accuracy and bias, as well as being prone to hallucinations and spitting back weird answers. Still, progress thus far indicates that the inherent capabilities of this generative AI could fundamentally change enterprise technology how businesses operate. Going forward, this technology could help write code, design new drugs, develop products, redesign business processes and transform supply chains.

 

 

How does generative AI work?

Generative AI starts with a prompt that could be in the form of a text, an image, a video, a design, musical notes, or any input that the AI system can process. Various AI algorithms then return new content in response to the prompt. Content can include essays, solutions to problems, or realistic fakes created from pictures or audio of a person.

 

Early versions of generative AI required submitting data via an API or an otherwise complicated process. Developers had to familiarize themselves with special tools and write applications using languages such as Python.

 

Now, pioneers in generative AI are developing better user experiences that let you describe a request in plain language. After an initial response, you can also customize the results with feedback about the style, tone and other elements you want the generated content to reflect.

 

 

Generative AI models

Generative AI models combine various AI algorithms to represent and process content. For example, to generate text, various natural language processing techniques transform raw characters (e.g., letters, punctuation and words) into sentences, parts of speech, entities and actions, which are represented as vectors using multiple encoding techniques. Similarly, images are transformed into various visual elements, also expressed as vectors. One caution is that these techniques can also encode the biases, racism, deception and puffery contained in the training data.

 

Once developers settle on a way to represent the world, they apply a particular neural network to generate new content in response to a query or prompt. Techniques such as GANs and variational autoencoders (VAEs) -- neural networks with a decoder and encoder -- are suitable for generating realistic human faces, synthetic data for AI training or even facsimiles of particular humans.

 

Recent progress in transformers such as Google's Bidirectional Encoder Representations from Transformers (BERT), OpenAI's GPT and Google AlphaFold have also resulted in neural networks that can not only encode language, images and proteins but also generate new content. 

 

How neural networks are transforming generative AI

Researchers have been creating AI and other tools for programmatically generating content since the early days of AI. The earliest approaches, known as rule-based systems and later as "expert systems," used explicitly crafted rules for generating responses or data sets.

 

Neural networks, which form the basis of much of the AI and machine learning applications today, flipped the problem around. Designed to mimic how the human brain works, neural networks "learn" the rules from finding patterns in existing data sets. Developed in the 1950s and 1960s, the first neural networks were limited by a lack of computational power and small data sets. It was not until the advent of big data in the mid-2000s and improvements in computer hardware that neural networks became practical for generating content.

 

The field accelerated when researchers found a way to get neural networks to run in parallel across the graphics processing units (GPUs) that were being used in the computer gaming industry to render video games. New machine learning techniques developed in the past decade, including the aforementioned generative adversarial networks and transformers, have set the stage for the recent remarkable advances in AI-generated content.

 

What are Dall-E, ChatGPT and Bard?

ChatGPT, Dall-E and Bard are popular generative AI interfaces.

 

Dall-E. Trained on a large data set of images and their associated text descriptions, Dall-E is an example of a multimodal AI application that identifies connections across multiple media, such as vision, text and audio. In this case, it connects the meaning of words to visual elements. It was built using OpenAI's GPT implementation in 2021. Dall-E 2, a second, more capable version, was released in 2022. It enables users to generate imagery in multiple styles driven by user prompts.

 

ChatGPT. The AI-powered chatbot that took the world by storm in November 2022 was built on OpenAI's GPT-3.5 implementation. OpenAI has provided a way to interact and fine-tune text responses via a chat interface with interactive feedback. Earlier versions of GPT were only accessible via an API. GPT-4 was released March 14, 2023. ChatGPT incorporates the history of its conversation with a user into its results, simulating a real conversation. After the incredible popularity of the new GPT interface, Microsoft announced a significant new investment into OpenAI and integrated a version of GPT into its Bing search engine.

 

Here is a snapshot of the differences between ChatGPT and Bard.

Bard. Google was another early leader in pioneering transformer AI techniques for processing language, proteins and other types of content. It open sourced some of these models for researchers. However, it never released a public interface for these models. Microsoft's decision to implement GPT into Bing drove Google to rush to market a public-facing chatbot, Google Bard, built on a lightweight version of its LaMDA family of large language models. Google suffered a significant loss in stock price following Bard's rushed debut after the language model incorrectly said the Webb telescope was the first to discover a planet in a foreign solar system. Meanwhile, Microsoft and ChatGPT implementations also lost face in their early outings due to inaccurate results and erratic behavior. Google has since unveiled a new version of Bard built on its most advanced LLM, PaLM 2, which allows Bard to be more efficient and visual in its response to user queries.

 

What are use cases for generative AI?

Generative AI can be applied in various use cases to generate virtually any kind of content. The technology is becoming more accessible to users of all kinds thanks to cutting-edge breakthroughs like GPT that can be tuned for different applications. Some of the use cases for generative AI include the following:

 

Implementing chatbots for customer service and technical support.

Deploying deepfakes for mimicking people or even specific individuals.

Improving dubbing for movies and educational content in different languages.

Writing email responses, dating profiles, resumes and term papers.

Creating photorealistic art in a particular style.

Improving product demonstration videos.

Suggesting new drug compounds to test.

Designing physical products and buildings.

Optimizing new chip designs.

Writing music in a specific style or tone.

 

What are the benefits of generative AI?

Generative AI can be applied extensively across many areas of the business. It can make it easier to interpret and understand existing content and automatically create new content. Developers are exploring ways that generative AI can improve existing workflows, with an eye to adapting workflows entirely to take advantage of the technology. Some of the potential benefits of implementing generative AI include the following:

 

Automating the manual process of writing content.

Reducing the effort of responding to emails.

Improving the response to specific technical queries.

Creating realistic representations of people.

Summarizing complex information into a coherent narrative.

Simplifying the process of creating content in a particular style.

What are the limitations of generative AI?

Early implementations of generative AI vividly illustrate its many limitations. Some of the challenges generative AI presents result from the specific approaches used to implement particular use cases. For example, a summary of a complex topic is easier to read than an explanation that includes various sources supporting key points. The readability of the summary, however, comes at the expense of a user being able to vet where the information comes from.

 

Here are some of the limitations to consider when implementing or using a generative AI app:

 

It does not always identify the source of content.

It can be challenging to assess the bias of original sources.

Realistic-sounding content makes it harder to identify inaccurate information.

It can be difficult to understand how to tune for new circumstances.

Results can gloss over bias, prejudice and hatred.

Attention is all you need: Transformers bring new capability

In 2017, Google reported on a new type of neural network architecture that brought significant improvements in efficiency and accuracy to tasks like natural language processing. The breakthrough approach, called transformers, was based on the concept of attention.

 

At a high level, attention refers to the mathematical description of how things (e.g., words) relate to, complement and modify each other. The researchers described the architecture in their seminal paper, "Attention is all you need," showing how a transformer neural network was able to translate between English and French with more accuracy and in only a quarter of the training time than other neural nets. The breakthrough technique could also discover relationships, or hidden orders, between other things buried in the data that humans might have been unaware of because they were too complicated to express or discern.

 

Transformer architecture has evolved rapidly since it was introduced, giving rise to LLMs such as GPT-3 and better pre-training techniques, such as Google's BERT.

 

What are the concerns surrounding generative AI?

The rise of generative AI is also fueling various concerns. These relate to the quality of results, potential for misuse and abuse, and the potential to disrupt existing business models. Here are some of the specific types of problematic issues posed by the current state of generative AI:

 

It can provide inaccurate and misleading information.

It is more difficult to trust without knowing the source and provenance of information.

It can promote new kinds of plagiarism that ignore the rights of content creators and artists of original content.

It might disrupt existing business models built around search engine optimization and advertising.

It makes it easier to generate fake news.

It makes it easier to claim that real photographic evidence of a wrongdoing was just an AI-generated fake.

It could impersonate people for more effective social engineering cyber attacks.

Implementing generative AI is not just about technology. Businesses must also consider its impact on people and processes.

What are some examples of generative AI tools?

Generative AI tools exist for various modalities, such as text, imagery, music, code and voices. Some popular AI content generators to explore include the following:

 

Text generation tools include GPT, Jasper, AI-Writer and Lex.

Image generation tools include Dall-E 2, Midjourney and Stable Diffusion.

Music generation tools include Amper, Dadabots and MuseNet.

Code generation tools include CodeStarter, Codex, GitHub Copilot and Tabnine.

Voice synthesis tools include Descript, Listnr and Podcast.ai.

AI chip design tool companies include Synopsys, Cadence, Google and Nvidia.

 

 

Use cases for generative AI, by industry

New generative AI technologies have sometimes been described as general-purpose technologies akin to steam power, electricity and computing because they can profoundly affect many industries and use cases. It's essential to keep in mind that, like previous general-purpose technologies, it often took decades for people to find the best way to organize workflows to take advantage of the new approach rather than speeding up small portions of existing workflows. Here are some ways generative AI applications could impact different industries:

 

Finance can watch transactions in the context of an individual's history to build better fraud detection systems.

Legal firms can use generative AI to design and interpret contracts, analyze evidence and suggest arguments.

Manufacturers can use generative AI to combine data from cameras, X-ray and other metrics to identify defective parts and the root causes more accurately and economically.

Film and media companies can use generative AI to produce content more economically and translate it into other languages with the actors' own voices.

The medical industry can use generative AI to identify promising drug candidates more efficiently.

Architectural firms can use generative AI to design and adapt prototypes more quickly.

Gaming companies can use generative AI to design game content and levels.

Ethics and bias in generative AI

Despite their promise, the new generative AI tools open a can of worms regarding accuracy, trustworthiness, bias, hallucination and plagiarism -- ethical issues that likely will take years to sort out. None of the issues are particularly new to AI. Microsoft's first foray into chatbots in 2016, called Tay, for example, had to be turned off after it started spewing inflammatory rhetoric on Twitter.

 

What is new is that the latest crop of generative AI apps sounds more coherent on the surface. But this combination of humanlike language and coherence is not synonymous with human intelligence, and there currently is great debate about whether generative AI models can be trained to have reasoning ability. One Google engineer was even fired after publicly declaring the company's generative AI app, Language Models for Dialog Applications (LaMDA), was sentient.

 

The convincing realism of generative AI content introduces a new set of AI risks. It makes it harder to detect AI-generated content and, more importantly, makes it more difficult to detect when things are wrong. This can be a big problem when we rely on generative AI results to write code or provide medical advice. Many results of generative AI are not transparent, so it is hard to determine if, for example, they infringe on copyrights or if there is problem with the original sources from which they draw results. If you don't know how the AI came to a conclusion, you cannot reason about why it might be wrong.

 

 

Generative AI vs. AI

Generative AI focuses on creating new and original content, chat responses, designs, synthetic data or even deepfakes. It's particularly valuable in creative fields and for novel problem-solving, as it can autonomously generate many types of new outputs.

 

Generative AI, as noted above, relies on neural network techniques such as transformers, GANs and VAEs. Other kinds of AI, in distinction, use techniques including convolutional neural networks, recurrent neural networks and reinforcement learning.

 

Generative AI often starts with a prompt that lets a user or data source submit a starting query or data set to guide content generation. This can be an iterative process to explore content variations. Traditional AI algorithms, on the other hand, often follow a predefined set of rules to process data and produce a result.

 

Both approaches have their strengths and weaknesses depending on the problem to be solved, with generative AI being well-suited for tasks involving NLP and calling for the creation of new content, and traditional algorithms more effective for tasks involving rule-based processing and predetermined outcomes.

 

Generative AI vs. predictive AI vs. conversational AI

Predictive AI, in distinction to generative AI, uses patterns in historical data to forecast outcomes, classify events and actionable insights. Organizations use predictive AI to sharpen decision-making and develop data-driven strategies.

 

Conversational AI helps AI systems like virtual assistants, chatbots and customer service apps interact and engage with humans in a natural way. It uses techniques from NLP and machine learning to understand language and provide human-like text or speech responses.

 

Generative AI history

The Eliza chatbot created by Joseph Weizenbaum in the 1960s was one of the earliest examples of generative AI. These early implementations used a rules-based approach that broke easily due to a limited vocabulary, lack of context and overreliance on patterns, among other shortcomings. Early chatbots were also difficult to customize and extend.

 

The field saw a resurgence in the wake of advances in neural networks and deep learning in 2010 that enabled the technology to automatically learn to parse existing text, classify image elements and transcribe audio.

 

Ian Goodfellow introduced GANs in 2014. This deep learning technique provided a novel approach for organizing competing neural networks to generate and then rate content variations. These could generate realistic people, voices, music and text. This inspired interest in -- and fear of -- how generative AI could be used to create realistic deepfakes that impersonate voices and people in videos.

 

Since then, progress in other neural network techniques and architectures has helped expand generative AI capabilities. Techniques include VAEs, long short-term memory, transformers, diffusion models and neural radiance fields.

 

Tags: ai