Google recently unveiled an upgraded version of its large language model, Gemini 1.5 Pro, which can now process audio. With this enhancement, Gemini can analyze uploaded audio files, extract key information from sources such as earnings calls or the audio track of a video, and provide insights without needing a written transcript.
At its Google Cloud Next conference on Tuesday, Google announced that Gemini 1.5 Pro is now available to the public through Vertex AI, its platform for building AI applications. This is the first time the model has been accessible to external developers since it was introduced in February of this year.
Gemini 1.5 Pro is positioned as the "middle-weight" model in the Gemini family and should not be confused with Gemini Ultra, the largest and most powerful variant. Even so, Google says that Gemini 1.5 Pro outperforms Ultra, can follow complex instructions, and can be used without fine-tuning.
The full functionality of Gemini 1.5 Pro is only accessible through Vertex AI. Currently, most users interact with Gemini models through the Gemini chatbot. Gemini Ultra powers the Gemini Advanced chatbot and can handle long inputs, but it is slower than Gemini 1.5 Pro.
In addition to the Gemini 1.5 Pro update, Google has also updated other AI models, including Imagen 2, the text-to-image model behind Gemini's image generation capabilities. Imagen 2 gains inpainting and outpainting: inpainting lets users add or remove elements within an image, while outpainting extends an image beyond its original borders.
To make images generated by Imagen models traceable to their source, Google is adding its SynthID digital watermarking technology to all generated images. SynthID embeds watermarks that are virtually invisible to the viewer but can be detected with specialized tools, which identifies where an image came from.
Outpainting and inpainting are not unique to Imagen. Other text-to-image models, including Stability AI’s Stable Cascade and Getty’s Generative AI by iStock, offer them as well, and similar features have reached consumer electronics such as Samsung Galaxy mobile phones.
Beyond image generation, Google has also demonstrated a method that grounds AI-generated answers in Google Search results, aiming to give users accurate, up-to-date information. Answers generated by large language models are not always accurate and can occasionally mislead users. For that reason, Google has placed certain restrictions on the Gemini model, including barring it from answering questions related to the 2024 US election.