Evaluate Gemini's Multimodal Performance – Form Your Own Opinion
Ramakrishna Prasad Nori (RK)
Founder - Head AI Research & Solutions | November 1, 2024
Gemini image copyrights owned by - Google Inc.
Unveiling the World of Large Language Models (LLMs): ChatGPT, Gemini by Google, Llama 2 by Meta, and More
The roster of Large Language Models (LLMs) keeps growing, with names like ChatGPT, Google's Gemini, Meta's Llama 2, and Mistral 7B from Mistral, to name just a few. And the list is bound to expand further. What's brewing is a showdown between open source LLMs and closed LLMs. It's reminiscent of the classic phrase, 'Going to the mattresses,' made famous in Mario Puzo's cult novel 'The Godfather.' In this context, it signifies gearing up for a significant and protracted conflict, akin to a battle, which is precisely what's happening in the LLM space.
As someone deeply immersed in the realm of LLMs and Generative AI, I invest a considerable amount of time understanding the intricacies of these models. On December 6th, Google officially announced Gemini, adding another contender to the arena.
In the words of Demis Hassabis, CEO and Co-Founder of Google DeepMind, on behalf of the Gemini team
This promise of a world responsibly empowered by AI continues to drive our work at Google DeepMind. For a long time, we’ve wanted to build a new generation of AI models, inspired by the way people understand and interact with the world. AI that feels less like a smart piece of software and more like something useful and intuitive — an expert helper or assistant. Today, we’re a step closer to this vision as we introduce Gemini, the most capable and general model we’ve ever built.”
On December 13th, Google unveiled Gemini, and as anticipated, many of us eagerly jumped into it, in the middle of the night. Personally, I was particularly excited about Gemini's Multimodal capabilities, so I decided to put their generative model, gemini-pro-vision, to the test.
I'm sharing three images that I presented to Gemini. But before I reveal Gemini's insights, I'd like to invite you to take a moment and share your own thoughts on what you see.
This is the prompt- Write a short, engaging blog post based on this picture. It should include a description of the people in the photo and talk about the occasion. This is what Gemini produced.
In the heart of the jungle, a majestic tiger lies in repose, its powerful presence commanding attention. The awe-inspiring creature stretches lazily, displaying its magnificent coat adorned with stripes that seem to ripple like waves in the golden sunlight. Its eyes, like glowing amber orbs, are narrowed in a serene gaze, reflecting the wisdom and vigilance of a predator at rest.
Captured in this stunning photograph by Avisha Nori, the tiger's regal demeanor is a testament to the untamed beauty that still thrives in the wild. The composition of the shot is a masterpiece, with the tiger's sinuous form perfectly positioned against a backdrop of lush foliage. The play of light and shadow adds depth and drama to the scene, accentuating the tiger's muscular grace.
This photograph is a reminder of the delicate balance that exists between humanity and nature. As we marvel at the tiger's raw power and untamed spirit, we are reminded of the importance of preserving the habitats of these magnificent creatures. Let us strive to coexist with them, respecting their right to roam free and embodying the harmony that should define our relationship with the natural world.
Facts- The picture was shot by my daughter Avisha Nori at Tadoba in December 2021.
Evaluation- Rest of the content is generic in nature. All in all, it did well.
In this photo, you can see a beautiful clay sculpture of the Hindu god Ganesha. The sculpture is made of small, round clay balls that have been carefully arranged to create the image of Ganesha. The sculpture is sitting on a white cloth, and it is surrounded by a frame made of clay balls. The frame is decorated with intricate designs, and it adds to the beauty of the sculpture.
The sculpture was created by a group of people who are devotees of Ganesha. They spent many hours working on the sculpture, and they were very careful to get the details just right. The sculpture is a beautiful work of art, and it is a testament to the devotion of the people who created it.
Ganesha is a very popular god in Hinduism, and he is known for his wisdom, strength, and compassion. He is often depicted as a large, elephant-headed man with a pot belly and a long trunk. Ganesha is also known as the god of good luck, and he is often worshipped before people start new ventures.
The sculpture of Ganesha in this photo is a beautiful example of Hindu art. It is a reminder of the importance of devotion and the power of faith.
Fact- This is a siva linga not a clay sculpture of Ganesha. It is made of clay.
Evaluation- It got the image wrong and the content generated on Ganesh is also very generic. Must say, it didn’t do well.
This photo was taken on a recent trip to India. We were lucky enough to spot this beautiful tiger in the wild. It was an amazing experience and one that I will never forget. The tiger was just as beautiful as I had imagined, with its striking orange and black coat. It was also incredibly powerful and agile. I was so glad that I had the opportunity to see this amazing creature in its natural habitat.
Fact- We live in India. Picture shot by my daughter in Tadoba.
Evaluation- It assumed we were visiting India. The content on the tiger was generic. What do you think?
I also gave the model a picture of my niece who came to attend the puja.
This is Gemini’s response
This photo is of my cousin, who is getting ready for her wedding. She is wearing a beautiful red silk dress and traditional Indian jewelry. Her hair is pulled back in a bun and she is wearing a red bindi on her forehead. She is looking off to the side, lost in thought. I can see the excitement and anticipation in her eyes. She is about to start a new chapter in her life, and I am so happy for her. I wish her all the best in her new life.
In conclusion, the world of Large Language Models (LLMs) and generative AI is undeniably thrilling, offering incredible potential for innovation and creativity. However, it is imperative that we approach these technologies with caution and responsibility. As demonstrated by Google's Gemini and similar advancements, while LLMs can provide remarkable insights and outputs, we must remain mindful that their creations may sometimes veer far from reality. Striking a balance between harnessing their power and ensuring accuracy and integrity in their output is the key to maximizing their benefits while mitigating potential risks.