AI-Memer Components, Diagram by Author, pie photo by W.carter

The user starts by entering a search query to find a background image, like "apple pie". The system then checks for matching images in Wikimedia Commons and the OpenImages dataset. Both datasets have corresponding text descriptions of the images. I use the CLIP encoders from OpenAI to first perform a semantic search on the text descriptions. A semantic search looks for matching concepts, not just matching words. I then perform a semantic search on the images themselves. The user checks out the top 10 images that match the query and selects their favorite. Either the GPT-3 model from OpenAI or the GPT-Neo model from EleutherAI is then used to generate 10 possible captions. The user selects the best caption to create the new meme, which can be downloaded.

Meme by AI-Memer, Image by Atsuko Sato, Caption by OpenAI GPT-3, License: CC BY-SA 4.0

What are memes, again?

The term originated in Richard Dawkins' book, The Selfish Gene. The Wiktionary defines the word meme as "any unit of cultural information, such as a practice or idea, that is transmitted verbally or by repeated action from one mind to another in a comparable way to the transmission of genes." In the age of the Internet, the term meme has been narrowed to mean a piece of content, typically an image with a funny caption, that is spread online via social media.

Here's what others created before me

Dylan Wenzlau created an automatic meme generator using a deep convolutional network. He used 100M public meme captions by users of the Imgflip Meme Generator and trained the system to generate captions based on 48 commonly used background images. You can read about his system here, and run it online here.

The background images are pulled from two sources, Wikimedia Commons and the OpenImages dataset. I use OpenAI's CLIP to perform a semantic search. The CLIP system performs two functions, encoding both text and images into "embeddings", which are strings of numbers that represent the gist of the original data. The CLIP model was pre-trained on 400 million pairs of images and text labels, such that the embeddings encoded from the images are similar to the embeddings encoded from the corresponding text labels. For more information about how CLIP works, check out my article, here.

Wikimedia Commons has over 73 million JPEG files. Most of them are released with permissive rights, like the Creative Commons Attribution license. I use Goldsmith's Wikipedia search API to find the top 3 pages related to the text query and gather the image descriptions using the CommonsAPI on the Magnus Toolserver. I use the copyfileobj() function in Python to download the image files. There are typically 3 to 10 images on a Wikipedia page, so about 9 to 30 images in total come down.

The OpenImages dataset from Google comprises 675,000 photos scraped from Flickr that were all released under the Creative Commons Attribution license. A dataset of image descriptions is available for download. I ran each of the descriptions through OpenAI's CLIP system and cached the embeddings for quick access. When the user types in a query, I run it through CLIP and compare it to the cached embeddings. I then download the top 20 matching images using the OpenImages download API.

For a final filtering pass, I run the images from the 3 Wikipedia pages and the 20 images from OpenImages through the image encoder and compare the results to the embedding of the text query. I present the top 10 images to the user to choose their favorite. For example, if you search for "apple pie", you will be presented with the top 10 images sorted by closest match.

I use two different implementations of GPT to generate the captions. There's the latest GPT-3 Da Vinci model from OpenAI that does an excellent job, but you have to be enrolled in their beta program to use it. And there's the open-source GPT-Neo model from EleutherAI. The model is a lot smaller, but it's free to use. OpenAI's GPT-3 Da Vinci is currently the largest AI model for Natural Language Processing. I am using their latest "zero-shot" style of prompting with their new Da Vinci Instruct model. Instead of providing examples of what you are asking the model to do, you can simply ask it what to do directly.
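The semantic search described above boils down to comparing a query embedding against the cached description embeddings and keeping the closest matches. Here is a minimal sketch of that ranking step, assuming the CLIP embeddings are already computed and stored as NumPy arrays (the function name `top_matches` is mine, not from the AI-Memer code):

```python
import numpy as np

def top_matches(query_emb, cached_embs, k=10):
    """Rank cached embeddings by cosine similarity to the query embedding."""
    # Normalize so a plain dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    c = cached_embs / np.linalg.norm(cached_embs, axis=1, keepdims=True)
    sims = c @ q                    # one similarity score per cached item
    order = np.argsort(-sims)[:k]   # indices of the k closest matches
    return order, sims[order]

# Toy example: 5 cached 4-dimensional "embeddings" and a query vector
cached = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.9, 0.1, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0],
                   [0.7, 0.7, 0.0, 0.0]])
query = np.array([1.0, 0.0, 0.0, 0.0])
idx, scores = top_matches(query, cached, k=3)
# idx → [0, 2, 4]: the vectors pointing most nearly the same way as the query
```

In the real system the same function works unchanged whether the cached embeddings came from the text encoder (the first pass) or the image encoder (the final filtering pass), since CLIP maps both into the same space.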
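Downloading the image files amounts to streaming each HTTP response straight to disk with the standard library's copyfileobj(). A sketch of that step, with a hypothetical helper name of my own; the demo below uses an in-memory stream in place of a live HTTP response, since copyfileobj() accepts any file-like object:

```python
import io
import shutil
import tempfile
import urllib.request
from pathlib import Path

def download_image(url, dest_path):
    """Stream an image from `url` to `dest_path` without loading it all into memory."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)

# Offline demo: copyfileobj copies from any readable file-like object
fake_response = io.BytesIO(b"\x89PNG fake image bytes")
dest = Path(tempfile.mkdtemp()) / "pie.jpg"
with open(dest, "wb") as out:
    shutil.copyfileobj(fake_response, out)
```

Streaming in chunks this way keeps memory use flat even when pulling down all 9 to 30 Wikipedia images plus the 20 OpenImages candidates.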
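Zero-shot prompting with the Da Vinci Instruct model means the prompt simply states the task; no example memes are included for the model to imitate. A sketch of what such a prompt might look like, with wording that is my guess rather than the prompt AI-Memer actually uses, and with the legacy openai.Completion call (current during the GPT-3 beta) left commented out since it needs an API key:

```python
def build_caption_prompt(description: str) -> str:
    """Zero-shot prompt: describe the task directly instead of giving examples."""
    return (f"Write a funny caption for a meme. "
            f"The background image shows {description}.\n\nCaption:")

prompt = build_caption_prompt("a freshly baked apple pie on a windowsill")

# With access to OpenAI's beta program (parameter values are illustrative):
# import openai
# response = openai.Completion.create(
#     engine="davinci-instruct-beta",
#     prompt=prompt,
#     max_tokens=32,
#     n=10,             # request 10 candidate captions
#     temperature=0.9,  # higher temperature for more varied jokes
# )
# captions = [choice.text.strip() for choice in response.choices]
```

The same prompt string can be fed to GPT-Neo instead; only the generation call changes, not the zero-shot framing.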