With the Gemini action, you can generate text, process text-and-image inputs, and count tokens.


Before you add this action, make sure you have added the Gemini API key.

Adding Gemini action

To add a Gemini action, follow these steps:

  1. Select the Widget (e.g., Container, Button, etc.) on which you want to add the action.

  2. Select Actions from the Properties panel (the right menu), and click Open. This will open an Action Flow Editor in a new popup window.

  3. Click + Add Action.

  4. On the right side, search for and select the Gemini action (under Integrations).

  5. Set the Action Type. Note that if you set this type to Text from Image, you must also provide an image.

  6. Provide the Text prompt that will be used to generate the result from the Gemini AI model. For this example, we use this prompt: "When users upload a photo, you analyze the food in the photo and tell if it is healthy to eat."

  7. Provide the Action Output Variable Name where the result of the generation will be stored. Later, you can access this variable from anywhere on the page.
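The rule in step 5 can be pictured as a small validation check. The following is a minimal Python sketch for illustration only, not FlutterFlow's implementation; the `GeminiActionConfig` class and its field names are hypothetical and simply mirror the settings chosen in steps 5 to 7.

```python
from dataclasses import dataclass
from typing import Optional

# Action types as they are named in this guide.
ACTION_TYPES = ("Generate Text", "Count Tokens", "Text from Image")


@dataclass
class GeminiActionConfig:
    """Hypothetical model of the settings chosen in steps 5 to 7."""

    action_type: str
    prompt: str
    output_variable: str
    image: Optional[str] = None  # image URL or file reference (assumed representation)

    def validate(self) -> None:
        """Raise ValueError if the configuration breaks the rule from step 5."""
        if self.action_type not in ACTION_TYPES:
            raise ValueError(f"Unknown action type: {self.action_type}")
        if self.action_type == "Text from Image" and not self.image:
            raise ValueError("Text from Image requires an image")


# A text-only action passes; Text from Image without an image would raise.
GeminiActionConfig("Generate Text", "Summarize this note.", "summary").validate()
```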

Types of Gemini action

The following are the types of Gemini actions you can add:

1. Generate Text

This action allows you to create natural language text based on the text prompts you provide.


  • Input: Text prompt - "Write a brief summary of the benefits of exercise."

  • Output: Action Output Variable Name - "Exercise can improve mental health, increase lifespan, enhance physical fitness, and reduce the risk of chronic diseases."

2. Count Tokens

With this action, you can count the number of tokens in a given text prompt. This is particularly useful for applications that need to monitor or restrict the length of text inputs, ensuring that content stays within desired limits or quotas.

A token can be a word, but it can also be a part of a word or even punctuation. The division of text into tokens depends on the tokenization algorithm being used. For Gemini models, a token is equivalent to about 4 characters. 100 tokens are about 60-80 English words.


  • Input: Text prompt - "Gemini is fun!"

  • Output: Action Output Variable Name - 5
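The rule of thumb above (about 4 characters per token) can be turned into a quick pre-check before running the action. This is a rough sketch in Python; `estimate_tokens` and `fits_budget` are hypothetical helper names, and the result is only an approximation that may differ from the actual count the Count Tokens action returns.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (about 4 characters per token)."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


def fits_budget(text: str, max_tokens: int) -> bool:
    """Return True if the estimated token count is within the given limit."""
    return estimate_tokens(text) <= max_tokens


prompt = "Write a brief summary of the benefits of exercise."
print(estimate_tokens(prompt))   # approximate, not the model's real count
print(fits_budget(prompt, 100))
```

Use an estimate like this only as a cheap guard; rely on the Count Tokens action itself when the exact number matters.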

3. Text from Image

This action enables your app to analyze images and generate descriptive text about them. It can interpret the content of an image, such as identifying objects, scenery, or activities, and then provide a textual description.


  • Input: Text prompt - "Identify the object in the image."

  • Input: Image Type - There are two ways you can provide an image.

    • Image Network URL: You can provide the URL of the image hosted on the internet. If you upload an image to Firebase or Supabase, you can provide the image via Widget State > Uploaded File URL.

    • Uploaded Image File: You can also provide an image file directly from your device via Widget State > Uploaded Local File.

  • Output: Action Output Variable Name - "This is a pipe organ. It is a large musical instrument that is used in churches, concert halls, and other large buildings. The sound of a pipe organ is very powerful and can be used to create a wide variety of music."
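FlutterFlow builds and sends the request for you, so no code is needed. For readers curious about what a text-and-image prompt looks like at the API level, here is a minimal Python sketch of a request body in the shape the Gemini REST API's `generateContent` method is documented to accept: one text part plus one base64-encoded inline image part. This is not FlutterFlow's internal code; `build_image_prompt_body` is a hypothetical helper, and the JPEG MIME type and placeholder bytes are assumptions.

```python
import base64
import json


def build_image_prompt_body(prompt: str, image_bytes: bytes,
                            mime_type: str = "image/jpeg") -> dict:
    """Build a generateContent-style body with a text part and an inline image part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


# Placeholder bytes stand in for a real uploaded image file.
body = build_image_prompt_body("Identify the object in the image.", b"\xff\xd8\xff")
print(json.dumps(body, indent=2))
```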

Published Date: March 15, 2024
