The wait has come to an end as OpenAI has finally announced GPT-4, the latest version of its family of deep learning models capable of generating, editing, and iterating with users on creative and technical writing tasks. This cutting-edge model boasts advanced reasoning capabilities and the ability to process both images and text, making it a powerful tool for businesses across various sectors.
Broadly speaking, GPT-4 is a large language model (LLM), the current state of the art for natural language processing (NLP) applications. These neural networks are behemoths with billions of learnable parameters, trained for weeks on cutting-edge hardware using enormous corpora of text scraped from the internet. Through this process, the models learn to extract contextual information from input prompts and (most of the time) respond accurately to many different user requests.
GPT-4 builds on top of OpenAI’s ChatGPT and brings a much more appealing user experience thanks to a series of additional features. It has a context window of roughly 25,000 words, eight times larger than ChatGPT’s, and it outperforms most of its competitors both in benchmark tests and in exams designed for human evaluation. In the developer demo, OpenAI showed how it can also follow instructions to write code on its own, identify bugs based solely on a copy-pasted stack trace, and even suggest corrections when instructed to update to new library versions. Furthermore, GPT-4 is the first LLM in the family to be truly multimodal: it can take both text and image inputs and combine insights from both sources to produce surprising results, such as describing a meme or a funny image, suggesting recipes from ingredients in a picture, or even producing a fully working HTML website straight from a handwritten mockup.
We know only a few details about the model behind the curtains, as OpenAI deliberately decided not to include much technical detail in its 98-page technical report, citing “the competitive landscape and the safety implications of large-scale models like GPT-4”. We do know that it has a Transformer architecture at its core and that it was pre-trained to predict the next token in a document (unlike its cousin BERT, which is pre-trained to fill in masked tokens instead). These two features give GPT its name: Generative Pre-trained Transformer. We also know that its training set included both data from the internet and samples licensed from third-party providers. Furthermore, the company reported that the pre-trained version was fine-tuned using Reinforcement Learning from Human Feedback (RLHF), the very same technique applied to its older sibling GPT-3.5 to turn it into ChatGPT. To this end, OpenAI also exploited real-world feedback from ChatGPT users, whose numbers have already surpassed 100 million.
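To make the pre-training objective concrete, here is a deliberately simplified sketch of next-token prediction and greedy autoregressive decoding. The tiny vocabulary and the `toy_model` scoring function are invented for illustration and bear no relation to GPT-4’s actual weights or tokenizer; only the loop structure (score all candidate tokens, append the most likely one, repeat) mirrors how such models generate text.

```python
import numpy as np

# Toy illustration of next-token prediction (NOT OpenAI's actual model):
# a "model" assigns a probability to each vocabulary token given the
# tokens seen so far, and generation repeatedly appends the likeliest one.

VOCAB = ["the", "cat", "sat", "on", "mat", "."]

def toy_model(context):
    """Hypothetical stand-in for an LLM: returns a probability
    distribution over VOCAB given the context tokens."""
    # Deterministic toy logits: favor the token that follows the last one.
    last = VOCAB.index(context[-1]) if context else -1
    logits = np.full(len(VOCAB), -1.0)
    logits[(last + 1) % len(VOCAB)] = 1.0
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()  # softmax over the vocabulary

def generate(context, steps):
    """Greedy autoregressive decoding: append the argmax token each step."""
    tokens = list(context)
    for _ in range(steps):
        probs = toy_model(tokens)
        tokens.append(VOCAB[int(np.argmax(probs))])
    return tokens

print(generate(["the"], 4))  # -> ['the', 'cat', 'sat', 'on', 'mat']
```

A real LLM replaces `toy_model` with a Transformer over tens of thousands of tokens and typically samples from the distribution rather than always taking the argmax, but the autoregressive loop is the same.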
In terms of safety, OpenAI has described in its report significant efforts to address potential social biases, hallucinations, and other safety-related issues. The model was scrutinized by over 50 AI safety experts to identify mitigations for adversarial use, unwanted content generation, and privacy concerns, and several guardrails were implemented to prevent it from, for example, giving instructions for making unsafe chemical compounds. Thanks to this work, GPT-4 is 82% less likely than GPT-3.5 to respond to requests for disallowed content and 40% more likely to produce factual responses. As an amusing side note, OpenAI reported that it even used GPT-4 itself in this safety research, e.g., to generate training data for fine-tuning and to iterate on classifiers across training, evaluation, and monitoring.
With such a powerful tool in hand, OpenAI has already begun collaborating with commercial partners to explore new, previously unforeseen applications. Duolingo, the most popular mobile app for learning languages, has, for instance, created new features that let users interact with a GPT-4-powered chatbot playing the role of a native-speaking tutor. Taking advantage of GPT-4’s combined image and text capabilities, Be My Eyes has created a Virtual Volunteer feature for visually impaired people that provides human-like descriptions of the observed environment, such as identifying what’s inside the fridge or reading the details of a medicine on a prescription. These are just a few of the remarkable things this technology enables.
GPT-4 is already available to ChatGPT Plus subscribers and, with limitations, powers Microsoft’s Bing Chat assistant. OpenAI has also opened a waitlist for access to the GPT-4 API for commercial and research purposes. The multimodal text-plus-image capability is not yet publicly accessible, but the company has indicated that it will be released soon.
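For developers coming off the waitlist, a minimal sketch of a GPT-4 API call looks like the following. This assumes the publicly documented chat completions endpoint and an API key stored in the `OPENAI_API_KEY` environment variable; the prompt contents are invented for illustration, and the request is only sent when a key is actually configured.

```python
import json
import os
import urllib.request

# Hedged sketch: calling the GPT-4 chat completions endpoint over plain
# HTTPS. The endpoint URL and payload shape follow OpenAI's chat API;
# an API key (granted via the GPT-4 waitlist) is assumed to be set in
# the OPENAI_API_KEY environment variable.
payload = {
    "model": "gpt-4",
    "messages": [
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain what a context window is."},
    ],
}

key = os.getenv("OPENAI_API_KEY")
if key:  # only send the request when a key is configured
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        print(body["choices"][0]["message"]["content"])
```

The same request can equally be made with OpenAI’s official client library; the raw-HTTP form is shown here only to keep the sketch dependency-free.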
The release of GPT-4, with its multimodal capabilities and advanced reasoning abilities, marks a significant advance in natural language processing. Part of our job as a product development company is to support our clients in incorporating this technology reliably, accurately, and securely into their products, or to help them create new applications based on it. The potential of GPT-4 is vast, and at Arionkoder we are excited to collaborate with businesses to create innovative solutions for their specific needs. Our expertise in building AI applications for various industries, combined with the power of GPT-4, can transform the way companies operate and interact with their customers. Contact us to learn more about how we can help you incorporate this groundbreaking technology into your products and services and stay ahead of the competition. Let’s work together to unlock your potential with GPT-4!