Agent Platform RAG Engine Management

This skill provides instructions on how to interact with Agent Platform RAG Engine using the Agent Platform Python SDK. You MUST use the vertexai Python SDK to perform RAG Engine operations, rather than raw REST calls or MCP tools, because this code is intended to be run by external clients.

Phase 0: Environment Setup

CRITICAL: Before running any of the Python snippets below, you MUST ensure the environment is correctly initialized by following these steps:

Google Cloud Authentication: Authenticate with your Google Cloud credentials and configure active Application Default Credentials (ADC) for Agent Platform access:
```
gcloud auth login
gcloud auth application-default login
```
Virtual Environment: Create and activate a dedicated virtual environment:
```
python3 -m venv ~/rag_agent_venv
source ~/rag_agent_venv/bin/activate
```
Install Dependencies: Install the required Agent Platform SDKs:
```
pip install google-cloud-aiplatform google-genai
```
Execution: Advise the user that every time they execute a Python snippet, they must ensure this virtual environment is activated first.

Workflow Decision Tree

Information Gathering: Has the user provided the Project ID, Region, and Corpus ID?
- No -> Proceed to [1. Listing Corpora and Files] to discover the necessary Resource Names and IDs. Only ask the user if discovery fails.
- Yes -> Proceed.
Task Type: What does the user want to do?
- List Corpora and Files -> Proceed to [1. Listing Corpora and Files].
- Inspect a Corpus -> Proceed to [2. Getting / Inspecting a RAG Engine Corpus].
- Search for Contexts -> Proceed to [3. Retrieving Contexts].
- Answer questions using RAG Engine -> Proceed to [4. Answering the User with Retrieved Context].

[!TIP] Placeholder Parameter Replacement: The Python scripts below use bracketed string placeholders (like "{project_id}", "{region}", and "{corpus_id}"). You MUST dynamically replace these placeholders with the actual Project ID, Region, and Corpus ID values provided in the user's prompt (or active context) before generating, providing, or executing the scripts.

1. Listing Corpora and Files (Discovery)

If you do not know the Resource Name of the corpus or file, you MUST list them first to discover them. The SDK handles pagination automatically when converted to a list, but you can also use manual pagination for large sets.

1.1 Listing and Discovering Corpora

import vertexai
from vertexai.preview import rag

vertexai.init(project="{project_id}", location="{region}")

# Approach A: List ALL (Automatic Pagination)
# The SDK's Pager iterates through all pages for you.
all_corpora = list(rag.list_corpora())
print(f"Found {len(all_corpora)} corpora in total.")
for c in all_corpora:
    print(f"Corpus Name: {c.name} | Display Name: {c.display_name}")

# Approach B: Manual Pagination (for very large projects)
pager = rag.list_corpora(page_size=10)
# Process first page
for c in pager:
    print(f"Corpus: {c.display_name}")

# Get next page if needed
if pager.next_page_token:
    second_page = rag.list_corpora(
        page_size=10, page_token=pager.next_page_token
    )

1.2 Listing and Discovering Files

To understand what files (and types) are in a corpus, list them and inspect the display_name (usually includes the extension).

import vertexai
from vertexai.preview import rag

vertexai.init(project="{project_id}", location="{region}")
corpus_name = (
    "projects/{project_id}/locations/{region}/ragCorpora/{corpus_id}"
)

# List files with automatic pagination
files = list(rag.list_files(corpus_name=corpus_name))
print(f"Found {len(files)} files.")

for f in files:
    # High-level SDK RagFile objects usually have name, display_name,
    # description
    print(f"File: {f.display_name} | Resource: {f.name}")
    # Tip: Check extension to understand file type (PDF, TXT, etc.)
    if f.display_name.lower().endswith(".pdf"):
        print("  Type: PDF")
    elif f.display_name.lower().endswith(".txt"):
        print("  Type: Plain Text")

2. Getting / Inspecting an Agent Platform RAG Engine Corpus

To retrieve details about an existing Agent Platform RAG Engine corpus:

import vertexai
from vertexai.preview import rag

vertexai.init(project="{project_id}", location="{region}")

# To get details of a specific corpus
corpus_name = (
    "projects/{project_id}/locations/{region}/ragCorpora/{corpus_id}"
)
corpus = rag.get_corpus(name=corpus_name)
print(f"Corpus Name: {corpus.name}")
print(f"Display Name: {corpus.display_name}")

3. Retrieving Contexts

To retrieve relevant contexts from a RAG Engine corpus based on a query:

import vertexai
from vertexai.preview import rag

vertexai.init(project="{project_id}", location="{region}")

corpus_name = (
    "projects/{project_id}/locations/{region}/ragCorpora/{corpus_id}"
)
query = "What is the speed of light?"

# Retrieve contexts
response = rag.retrieval_query(
    rag_corpora=[corpus_name],
    text=query,
    similarity_top_k=3
)

for context in response.contexts.contexts:
    print(f"Context text: {context.text}")
    print(f"Source: {context.source_uri}")

4. Answering the User with Retrieved Context

To use the retrieved context alongside an Agent Platform model to generate a grounded response:

from google import genai
from google.genai import types

client = genai.Client(enterprise=True, project="{project_id}", location="{region}")
corpus_name = (
    "projects/{project_id}/locations/{region}/ragCorpora/{corpus_id}"
)

# Define the Agent Platform RAG Engine tool pointing to the corpus
rag_tool = types.Tool(
    retrieval=types.Retrieval(
        vertex_rag_store=types.VertexRagStore(
            rag_resources=[types.VertexRagStoreRagResource(rag_corpus=corpus_name)],
            rag_retrieval_config=types.RagRetrievalConfig(
                top_k=3,
                filter=types.RagRetrievalConfigFilter(
                    vector_similarity_threshold=0.5,
                ),
            ),
        )
    )
)

# Generate content using the RAG Engine tool
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What is the speed of light?",
    config=types.GenerateContentConfig(
        tools=[rag_tool]
    )
)
print(response.text)

Agent Platform Rag Engine Management