Databricks Certified Generative AI Engineer Associate Exam Dumps & Study Guide
The Databricks Certified Generative AI Engineer Associate certification is the premier credential for data professionals who want to demonstrate their expertise in building and deploying generative AI applications. As organizations increasingly adopt AI and large language models (LLMs) to drive business operations, the ability to design and manage robust, scalable, and efficient AI solutions has become a highly sought-after skill. The Databricks certification validates your expertise in leveraging the Databricks platform to develop and deploy generative AI applications. It is an essential credential for any professional looking to lead in the age of modern AI engineering.
Overview of the Exam
The Generative AI Engineer certification exam is a rigorous assessment that covers the building and deployment of generative AI applications on the Databricks platform. It is a 90-minute exam consisting of 45 multiple-choice questions. The exam is designed to test your knowledge of generative AI concepts, including prompt engineering, LLM fine-tuning, and Retrieval-Augmented Generation (RAG). From understanding the AI lifecycle and model evaluation to deploying AI applications and ensuring security, the certification ensures that you have the skills necessary to build and maintain modern generative AI solutions. Achieving the Databricks certification proves that you are a highly skilled professional who can handle the technical demands of enterprise-grade AI engineering.
Target Audience
The Generative AI Engineer certification is intended for data engineers, data scientists, and AI developers who have a solid understanding of the Databricks platform and generative AI technologies. It is ideal for individuals in roles such as:
1. AI Engineers and Developers
2. Data Scientists
3. Data Engineers
4. Machine Learning Engineers
To be successful, candidates should have at least six months of hands-on experience in using the Databricks platform for AI development and a thorough understanding of generative AI concepts and tools.
Key Topics Covered
The Generative AI Engineer certification exam is organized into five main domains:
1. Generative AI Fundamentals (20%): Understanding core concepts of generative AI, LLMs, and prompt engineering.
2. Developing Generative AI Applications (30%): Implementing AI applications using RAG, fine-tuning, and various AI frameworks.
3. Deploying and Monitoring AI Applications (20%): Deploying AI models and monitoring their performance and quality.
4. Security and Governance (15%): Ensuring AI application security and regulatory compliance.
5. AI Lifecycle Management (15%): Managing the entire AI development and deployment lifecycle using MLflow and other tools.
Benefits of Getting Certified
Earning the Databricks Generative AI Engineer certification provides several significant benefits. First, it offers industry recognition of your specialized expertise in AI and Databricks technologies; because Databricks is a leader in the AI and big data industry, these skills are in high demand across the globe. Second, it can lead to increased career opportunities and higher salary potential in a variety of roles. Third, it demonstrates your commitment to professional excellence and your dedication to staying current with the latest AI engineering practices. By holding this certification, you join a global community of Databricks professionals and gain access to exclusive resources and continuing education opportunities.
Why Choose NotJustExam.com for Your AI Prep?
The Generative AI Engineer certification exam is challenging and requires a deep understanding of Databricks' complex AI features and generative AI concepts. NotJustExam.com is the best resource to help you master this material. Our platform offers an extensive bank of practice questions that are designed to mirror the actual exam’s format and difficulty.
What makes NotJustExam.com stand out is our focus on interactive logic and the accuracy of our explanations. We don’t just provide a list of questions; we provide a high-quality learning experience. Every question in our bank includes an in-depth, accurate explanation that helps you understand the technical reasoning behind the correct AI solutions. This ensures that you are truly learning the material and building the confidence needed to succeed on the exam. Our content is regularly updated to reflect the latest AI features and exam updates. With NotJustExam.com, you can approach your AI Engineer exam with the assurance that comes from thorough, high-quality preparation. Start your journey toward becoming a Certified Generative AI Engineer today with us!
Free Databricks Certified Generative AI Engineer Associate Practice Questions Preview
Question 1
A Generative AI Engineer has created a RAG application to look up answers to questions about a series of fantasy novels that are being asked on the author’s web forum. The fantasy novel texts are chunked and embedded into a vector store with metadata (page number, chapter number, book title), retrieved with the user’s query, and provided to an LLM for response generation. The Generative AI Engineer used their intuition to pick the chunking strategy and associated configurations but now wants to more methodically choose the best values.
Which TWO strategies should the Generative AI Engineer take to optimize their chunking strategy and parameters? (Choose two.)
- A. Change embedding models and compare performance.
- B. Add a classifier for user queries that predicts which book will best contain the answer. Use this to filter retrieval.
- C. Choose an appropriate evaluation metric (such as recall or NDCG) and experiment with changes in the chunking strategy, such as splitting chunks by paragraphs or chapters. Choose the strategy that gives the best performance metric.
- D. Pass known questions and best answers to an LLM and instruct the LLM to provide the best token count. Use a summary statistic (mean, median, etc.) of the best token counts to choose chunk size.
- E. Create an LLM-as-a-judge metric to evaluate how well previous questions are answered by the most appropriate chunk. Optimize the chunking parameters based upon the values of the metric.
Correct Answer:
CE
Explanation:
I agree with the suggested answer CE. Optimizing a RAG pipeline requires a methodical, metrics-driven approach to evaluate how changes in chunking strategy directly impact retrieval quality and generation accuracy.
Reason
Option C is correct because it advocates for using quantitative retrieval metrics like Recall or NDCG to measure the effectiveness of different chunking strategies (e.g., fixed-size vs. semantic boundaries). Option E is correct because LLM-as-a-judge provides a scalable way to evaluate the semantic relevance and quality of the generated answers relative to the retrieved chunks, allowing for fine-grained optimization of parameters based on model feedback.
Why the other options are not as suitable
- Option A is incorrect because changing the embedding model optimizes the vector representation of text but does not directly optimize the chunking strategy itself; it introduces a new variable rather than tuning the existing one.
- Option B is incorrect because adding a classifier for metadata filtering is a retrieval optimization technique, not a method for determining the ideal size or boundary of the chunks.
- Option D is incorrect because asking an LLM to provide a 'best token count' for a chunk is an arbitrary and non-standard heuristic; chunking needs to be evaluated based on the performance of the downstream task (retrieval and generation) rather than a predicted statistic.
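To make Option C concrete, the sketch below scores two hypothetical chunking strategies by mean recall@k over a small labeled evaluation set. The strategy names, chunk IDs, and retrieval results are invented for illustration; in practice the retrievals would come from the actual vector store under each chunking configuration.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant chunk ids found in the top-k retrieved list."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

# Labeled evaluation set: query -> ids of chunks known to contain the answer.
eval_set = {
    "Who forged the silver blade?": ["bk1_ch3_p2"],
    "Where is the northern keep?": ["bk2_ch1_p5", "bk2_ch1_p6"],
}

# Simulated top-k retrieval results for two candidate chunking strategies.
results = {
    "by_paragraph": {
        "Who forged the silver blade?": ["bk1_ch3_p2", "bk1_ch4_p1"],
        "Where is the northern keep?": ["bk2_ch1_p5", "bk3_ch2_p1"],
    },
    "by_chapter": {
        "Who forged the silver blade?": ["bk1_ch4_p1", "bk1_ch3_p2"],
        "Where is the northern keep?": ["bk3_ch2_p1", "bk2_ch2_p4"],
    },
}

scores = {}
for strategy, retrievals in results.items():
    per_query = [recall_at_k(retrievals[q], rel, k=2) for q, rel in eval_set.items()]
    scores[strategy] = sum(per_query) / len(per_query)

best = max(scores, key=scores.get)  # strategy with the highest mean recall@2
```

The same loop structure extends to NDCG or an LLM-as-a-judge score (Option E): only the metric function changes, while the experiment-and-compare workflow stays identical.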
Question 2
A Generative AI Engineer is designing a RAG application for answering user questions on technical regulations as they learn a new sport.
What are the steps needed to build this RAG application and deploy it?
- A. Ingest documents from a source –> Index the documents and save to Vector Search –> User submits queries against an LLM –> LLM retrieves relevant documents –> Evaluate model –> LLM generates a response –> Deploy it using Model Serving
- B. Ingest documents from a source –> Index the documents and save to Vector Search –> User submits queries against an LLM –> LLM retrieves relevant documents –> LLM generates a response –> Evaluate model –> Deploy it using Model Serving
- C. Ingest documents from a source –> Index the documents and save to Vector Search –> Evaluate model –> Deploy it using Model Serving
- D. User submits queries against an LLM –> Ingest documents from a source –> Index the documents and save to Vector Search –> LLM retrieves relevant documents –> LLM generates a response –> Evaluate model –> Deploy it using Model Serving
Correct Answer:
B
Explanation:
I agree with the community and the suggested answer of Option B. It correctly outlines the logical lifecycle of a RAG (Retrieval-Augmented Generation) application, moving from data preparation to inference testing, then evaluation, and finally production deployment.
Reason
Option B is correct because it follows the necessary technical sequence: 1) Ingestion and Vector Search indexing prepare the knowledge base. 2) The middle steps (Query -> Retrieval -> Generation) represent the 'inner loop' or functional logic of the RAG chain. 3) Evaluation must occur after the generation step to assess the quality of the retrieved context and the final response. 4) Model Serving is the final step to make the validated application available for production use.
Why the other options are not as suitable
- Option A is incorrect because it places Evaluate model before the LLM generates a response. You cannot fully evaluate a RAG system's end-to-end performance without the generated output.
- Option C is incorrect because it skips the core logic of the RAG application (the query, retrieval, and generation phase), meaning there is no application behavior to evaluate before deployment.
- Option D is incorrect because it starts with User submits queries before any documents have been ingested or indexed into the Vector Search, which would result in the retrieval step failing.
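The Option B sequence can be sketched end-to-end with stand-in functions. None of these are real Databricks APIs; a production build would use Vector Search for the index, MLflow for evaluation, and Model Serving for deployment.

```python
# Illustrative ordering only: ingest -> index -> query -> retrieve -> generate
# -> evaluate -> deploy. Each function is a toy stand-in.

def ingest(source):            # 1. ingest documents from a source
    return [f"rulebook text from {source}"]

def index(docs):               # 2. build a (toy) vector index
    return {i: d for i, d in enumerate(docs)}

def retrieve(index_, query):   # 3-4. user query drives retrieval
    return list(index_.values())

def generate(query, context):  # 5. LLM generates a grounded response
    return f"answer to '{query}' using {len(context)} docs"

def evaluate(response):        # 6. evaluate before deployment
    return "answer to" in response

docs = ingest("sport_regulations")
idx = index(docs)
response = generate("What is offside?", retrieve(idx, "What is offside?"))
ok = evaluate(response)        # 7. only a validated app goes to Model Serving
```

Note that `evaluate` runs on the generated response, which is exactly why Option A (evaluating before generation) cannot work.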
Question 3
A Generative AI Engineer just deployed an LLM application at a digital marketing company that assists with answering customer service inquiries.
Which metric should they monitor for their customer service LLM application in production?
- A. Number of customer inquiries processed per unit of time
- B. Energy usage per query
- C. Final perplexity scores for the training of the model
- D. HuggingFace Leaderboard values for the base LLM
Correct Answer:
A
Explanation:
I agree with the chosen answer A. In a production environment for a customer service application, throughput and latency metrics are critical for ensuring the system can handle the real-world load of user requests.
Reason
Option A is correct because monitoring the number of inquiries processed per unit of time (throughput) is a standard operational metric for production LLM applications. It helps engineers understand if the system is meeting service-level agreements (SLAs), managing traffic spikes, and maintaining responsiveness for the end users.
Why the other options are not as suitable
- Option B is incorrect because energy usage per query is generally a sustainability or cost-optimization metric rather than a primary performance or reliability metric used for monitoring the health of a customer service application in production.
- Option C is incorrect because perplexity scores are evaluated during the training or fine-tuning phase to measure how well the model predicts a sample; they are not a live production monitoring metric for application performance.
- Option D is incorrect because HuggingFace Leaderboard values are static benchmarks for base models used during the model selection phase of development, not dynamic metrics used to monitor a deployed application.
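As a minimal illustration of Option A, throughput can be computed directly from request timestamps. The timestamps below are synthetic; in production they would come from the serving endpoint's logs.

```python
from datetime import datetime, timedelta

# Synthetic log: 30 requests arriving 10 seconds apart.
start = datetime(2024, 1, 1, 12, 0, 0)
request_times = [start + timedelta(seconds=10 * i) for i in range(30)]

# Inquiries processed per minute over the observed window.
window_seconds = (request_times[-1] - request_times[0]).total_seconds()
throughput_per_min = len(request_times) / (window_seconds / 60)
```

Tracking this number over time (alongside latency) is what lets engineers spot traffic spikes and SLA violations, which is precisely what the static metrics in Options C and D cannot do.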
Question 4
A Generative AI Engineer is building a Generative AI system that suggests the best matched employee team member to newly scoped projects. The team member is selected from a very large team. The match should be based upon project date availability and how well their employee profile matches the project scope. Both the employee profile and project scope are unstructured text.
How should the Generative AI Engineer architect their system?
- A. Create a tool for finding available team members given project dates. Embed all project scopes into a vector store, perform a retrieval using team member profiles to find the best team member.
- B. Create a tool for finding team member availability given project dates, and another tool that uses an LLM to extract keywords from project scopes. Iterate through available team members’ profiles and perform keyword matching to find the best available team member.
- C. Create a tool to find available team members given project dates. Create a second tool that can calculate a similarity score for a combination of team member profile and the project scope. Iterate through the team members and rank by best score to select a team member.
- D. Create a tool for finding available team members given project dates. Embed team profiles into a vector store and use the project scope and filtering to perform retrieval to find the available best matched team members.
Correct Answer:
D
Explanation:
I agree with the suggested answer D. In a RAG-based system dealing with a very large team, the most efficient architecture involves pre-computing embeddings for the static entities (the team member profiles) and using the dynamic input (the project scope) as a query. Using metadata filtering for availability ensures the search space is limited to valid candidates, optimizing performance.
Reason
Option D is correct because it aligns with vector search best practices for large datasets. By embedding team profiles into a vector store, the system can perform a semantic search using the project scope as the query vector. It correctly identifies that availability is a hard constraint that should be handled via filtering within the vector store or before the search, ensuring that only eligible team members are ranked by similarity.
Why the other options are not as suitable
- Option A is incorrect because it suggests embedding the project scopes and using team profiles as the query; since there are many team members and only one project scope at a time, it is far more efficient to search a database of team profiles.
- Option B is incorrect because keyword matching (lexical search) is less effective than semantic search for unstructured text like project scopes and profiles, and iterating through all members is not scalable.
- Option C is incorrect because it suggests iterating through team members to calculate scores individually; for a very large team, this O(n) approach is computationally expensive and slow compared to indexed approximate nearest neighbor (ANN) search in a vector store.
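A toy version of Option D is sketched below: apply the availability filter as a hard constraint first, then rank the remaining profiles by cosine similarity to the project-scope query vector. The two-dimensional vectors are hand-made stand-ins for real embedding-model output, and a real system would use an indexed vector store rather than a Python dict.

```python
import math

# Pre-embedded team member profiles with an availability flag.
profiles = {
    "alice": {"vec": [0.9, 0.1], "available": True},
    "bob":   {"vec": [0.8, 0.2], "available": False},  # filtered out pre-search
    "cara":  {"vec": [0.2, 0.9], "available": True},
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

project_scope_vec = [1.0, 0.0]  # embedding of the project-scope text (the query)

# Hard constraint first: only available members enter the similarity ranking.
candidates = {n: p for n, p in profiles.items() if p["available"]}
best = max(candidates, key=lambda n: cosine(candidates[n]["vec"], project_scope_vec))
```

The key design point is the direction of the search: many static profiles live in the store, and the single dynamic project scope is the query, not the other way around (Option A).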
Question 5
A Generative AI Engineer is designing an LLM-powered live sports commentary platform. The platform provides real-time updates and LLM-generated analyses for any users who would like to have live summaries, rather than reading a series of potentially outdated news articles.
Which tool below will give the platform access to real-time data for generating game analyses based on the latest game scores?
- A. DatabricksIQ
- B. Foundation Model APIs
- C. Feature Serving
- D. AutoML
Correct Answer:
C
Explanation:
I agree with the suggested answer C (Feature Serving). In a Generative AI context on Databricks, providing real-time, structured data (like live scores) to an LLM is best handled by Feature Serving, which allows for low-latency retrieval of the most recent state stored in Unity Catalog.
Reason
Feature Serving is designed to provide real-time access to features stored in Unity Catalog. In this scenario, live sports scores can be treated as features that are updated via streaming or batch processes and served via a REST API. This allows the LLM application to fetch the exact, latest metadata needed to ground the generative analysis in factual, real-time data.
Why the other options are not as suitable
- Option A is incorrect because DatabricksIQ refers to the AI-powered engine that powers the Databricks platform's internal capabilities (like natural language to SQL), rather than a specific tool for serving real-time external data to user-built models.
- Option B is incorrect because Foundation Model APIs provide access to the models themselves (like Llama or Mixtral), but they do not inherently provide the real-time data or context; the LLM would still need an external source for the live scores.
- Option D is incorrect because AutoML is used for automating the machine learning model development process (training and tuning) and does not play a role in real-time data retrieval during model inference.
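The grounding pattern looks roughly like the sketch below: fetch the latest scores as features, then inject them into the LLM prompt. The endpoint name and feature schema are invented for illustration; an actual call would POST to a Databricks Feature Serving REST endpoint rather than use the local stand-in function shown here.

```python
import json

def fetch_live_features(game_id):
    # Stand-in for a Feature Serving call, conceptually something like:
    #   POST {workspace}/serving-endpoints/live-scores/invocations
    # returning the most recent feature values for this game.
    return {"game_id": game_id, "home_score": 2, "away_score": 1, "minute": 78}

def build_prompt(game_id):
    features = fetch_live_features(game_id)
    # Ground the generative analysis in the real-time state.
    return "Write a short live analysis. Current state: " + json.dumps(features)

prompt = build_prompt("match_42")
```

The LLM itself (reached via Foundation Model APIs) never knows the live score; it is the feature lookup that supplies the real-time context, which is why Option B alone is insufficient.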
Question 6
A Generative AI Engineer has a provisioned throughput model serving endpoint as part of a RAG application and would like to monitor the serving endpoint’s incoming requests and outgoing responses. The current approach is to include a micro-service in between the endpoint and the user interface to write logs to a remote server.
Which Databricks feature should they use instead which will perform the same task?
- A. Vector Search
- B. Lakeview
- C. DBSQL
- D. Inference Tables
Correct Answer:
D
Explanation:
I agree with the chosen answer D. Inference Tables are the native Databricks solution specifically designed to log request and response payloads from Model Serving endpoints directly into a Delta Lake table for monitoring and analysis.
Reason
Option D is correct because Inference Tables automatically capture incoming requests and outgoing responses from Model Serving endpoints. They provide a scalable, managed way to log data without requiring external micro-services or custom logging logic, supporting real-time monitoring and long-term quality tracking in RAG applications.
Why the other options are not as suitable
- Option A is incorrect because Vector Search is a similarity search engine used to retrieve relevant document chunks during the retrieval phase of a RAG pipeline, not a monitoring or logging tool.
- Option B is incorrect because Lakeview (now known as AI/BI Dashboards) is a visualization tool used to create dashboards; while it can visualize log data, it does not perform the actual capture or logging of endpoint requests.
- Option C is incorrect because DBSQL (Databricks SQL) is a warehouse service used to run queries and manage data, but it is not a feature for automatically intercepting and logging model serving traffic.
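Once an inference table is enabled, monitoring becomes a query over logged rows rather than custom middleware. The sketch below emulates a few logged rows locally (the schema is simplified, and the table name in the comment is hypothetical; real inference tables capture full request/response payloads and timestamps).

```python
# Local stand-in for rows an inference table would log automatically.
# A roughly equivalent check in SQL might look like:
#   SELECT avg(CASE WHEN status_code != 200 THEN 1 ELSE 0 END)
#   FROM my_endpoint_payload_table
rows = [
    {"timestamp": "2024-01-01T12:00:00Z", "status_code": 200},
    {"timestamp": "2024-01-01T12:00:05Z", "status_code": 500},
]

# Fraction of failed requests across the logged traffic.
error_rate = sum(r["status_code"] != 200 for r in rows) / len(rows)
```

The point of Option D is that this data lands in a Delta table with no micro-service in the request path, so analyses like this run after the fact without adding latency.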
Question 7
A Generative AI Engineer is tasked with improving the RAG quality by addressing its inflammatory outputs.
Which action would be most effective in mitigating the problem of offensive text outputs?
- A. Increase the frequency of upstream data updates
- B. Inform the user of the expected RAG behavior
- C. Restrict access to the data sources to a limited number of users
- D. Curate upstream data properly that includes manual review before it is fed into the RAG system
Correct Answer:
D
Explanation:
I agree with the suggested answer D. In a Retrieval Augmented Generation (RAG) architecture, the model's output is heavily grounded in the retrieved context; therefore, the most effective way to prevent the generation of inflammatory or offensive content is to ensure the source knowledge base is curated and free of such material.
Reason
Option D is correct because data curation and manual review act as a primary safety layer. By filtering out offensive, biased, or inflammatory content from the upstream data before it is indexed in the vector database, the RAG system is deprived of the toxic context that would otherwise lead to harmful completions. This aligns with Data Governance and Responsible AI best practices in the Databricks Lakehouse.
Why the other options are not as suitable
- Option A is incorrect because increasing the frequency of updates does not address the quality or safety of the content; it would only refresh the potentially offensive data more often.
- Option B is incorrect because simply informing the user of bad behavior (disclaimers) does not mitigate the problem or prevent the offensive output from occurring in the first place.
- Option C is incorrect because restricting access to data sources based on user count does not solve the underlying issue of the data containing inflammatory content; authorized users would still receive offensive responses.
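A deliberately simplistic illustration of the Option D curation step: documents flagged by a blocklist are routed to manual review instead of being indexed. A real pipeline would pair a trained toxicity classifier with human review; the blocklist terms and documents here are invented placeholders.

```python
# Placeholder blocklist; real curation would use a toxicity classifier
# plus human review rather than literal string matching.
BLOCKLIST = {"slur_example", "insult_example"}

docs = [
    "How to reset your password safely.",
    "This insult_example text should never reach the index.",
]

to_index, needs_review = [], []
for doc in docs:
    if any(term in doc.lower() for term in BLOCKLIST):
        needs_review.append(doc)   # held back for manual review
    else:
        to_index.append(doc)       # safe to embed into the vector store
```

Because the RAG system can only retrieve what was indexed, screening content upstream removes the toxic context at its source rather than trying to suppress it at generation time.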
Question 8
A Generative AI Engineer is creating an LLM-based application. The documents for its retriever have been chunked to a maximum of 512 tokens each. The Generative AI Engineer knows that cost and latency are more important than quality for this application. They have several context length levels to choose from.
Which will fulfill their need?
- A. context length 514: smallest model is 0.44GB and embedding dimension 768
- B. context length 2048: smallest model is 11GB and embedding dimension 2560
- C. context length 32768: smallest model is 14GB and embedding dimension 4096
- D. context length 512: smallest model is 0.13GB and embedding dimension 384
Correct Answer:
D
Explanation:
I agree with the suggested answer Option D. The scenario explicitly states that cost and latency are the primary constraints, taking precedence over quality. Therefore, selecting the model with the lowest resource footprint (smallest model size and smallest embedding dimension) that matches the chunk size of 512 tokens is the optimal engineering choice.
Reason
Option D is correct because it offers the lowest latency and cost among all provided choices. It has the smallest model size (0.13GB) and the smallest embedding dimension (384), which directly reduces computational overhead and memory usage. Since the documents are chunked at 512 tokens, a context length of 512 is sufficient to process each individual chunk.
Why the other options are not as suitable
- Option A is incorrect because while it fits the 512-token requirement with a context length of 514, its model size (0.44GB) and embedding dimension (768) are significantly larger than Option D's, leading to higher cost and latency.
- Option B is incorrect because a context length of 2048 is far in excess of the 512-token chunk size, and the 11GB model size would dramatically increase resource consumption and inference time.
- Option C is incorrect because a 32768 context length and 14GB model size represent the high end of quality and capacity, which contradicts the goal of prioritizing low cost and low latency.
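The selection logic reduces to a two-step filter-then-minimize over the answer choices, sketched below with the numbers from the question:

```python
# The four answer choices, transcribed from the question.
options = {
    "A": {"context": 514,   "size_gb": 0.44, "dim": 768},
    "B": {"context": 2048,  "size_gb": 11.0, "dim": 2560},
    "C": {"context": 32768, "size_gb": 14.0, "dim": 4096},
    "D": {"context": 512,   "size_gb": 0.13, "dim": 384},
}

chunk_tokens = 512

# Step 1: keep only options whose context window fits the 512-token chunks.
viable = {k: v for k, v in options.items() if v["context"] >= chunk_tokens}

# Step 2: among viable options, pick the cheapest by model footprint.
cheapest = min(viable, key=lambda k: viable[k]["size_gb"])
```

All four options technically fit the chunks, so the decision comes down entirely to footprint, and D wins on both model size and embedding dimension.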
Question 9
A small and cost-conscious startup in the cancer research field wants to build a RAG application using Foundation Model APIs.
Which strategy would allow the startup to build a good-quality RAG application while being cost-conscious and able to cater to customer needs?
- A. Limit the number of relevant documents available for the RAG application to retrieve from
- B. Pick a smaller LLM that is domain-specific
- C. Limit the number of queries a customer can send per day
- D. Use the largest LLM possible because that gives the best performance for any general queries
Correct Answer:
B
Explanation:
I agree with the suggested answer B. Selecting a smaller, domain-specific model is a recognized best practice for balancing performance and operational costs in specialized fields like medical research.
Reason
Option B is correct because domain-specific LLMs (e.g., those trained on medical or scientific corpora) often outperform larger general-purpose models on specialized tasks while requiring significantly fewer computational resources. For a startup, this reduces the token-based costs of Foundation Model APIs and decreases latency, meeting both cost-consciousness and customer performance needs.
Why the other options are not as suitable
- Option A is incorrect because limiting the retrieval corpus directly degrades the quality and accuracy of the RAG application, which is counterproductive for a research-focused field.
- Option C is incorrect because limiting user queries restricts the utility of the product for the customer and does not address the underlying efficiency of the architecture.
- Option D is incorrect because using the largest possible LLM is the least cost-conscious approach, as Foundation Model APIs typically charge based on model size and complexity, which would likely exceed the startup's budget without guaranteeing better domain-specific accuracy.
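A back-of-envelope cost comparison makes the trade-off concrete. The per-token prices and monthly volume below are invented for the sketch, not real Foundation Model API rates; only the shape of the calculation matters.

```python
# Invented illustrative prices (USD per 1K tokens) and monthly volume.
price_per_1k_tokens = {"small_domain_llm": 0.0005, "largest_general_llm": 0.0150}
monthly_tokens = 20_000_000  # assumed combined prompt + completion volume

# Monthly API cost under each choice.
costs = {m: monthly_tokens / 1000 * p for m, p in price_per_1k_tokens.items()}
```

Under these assumed numbers the smaller model is 30x cheaper per month, and if it is domain-tuned for medical text it may also answer cancer-research questions more accurately than the general-purpose giant.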
Question 10
A Generative AI Engineer is responsible for developing a chatbot to enable their company’s internal HelpDesk Call Center team to more quickly find related tickets and provide resolution. While creating the GenAI application work breakdown tasks for this project, they realize they need to start planning which data sources (either Unity Catalog volume or Delta table) they could choose for this application. They have collected several candidate data sources for consideration:
- call_rep_history: a Delta table with primary keys representative_id, call_id. This table is maintained to calculate representatives’ call resolution from the call_duration and call_start_time fields.
- transcript Volume: a Unity Catalog Volume of all recordings as *.wav files, along with text transcripts as *.txt files.
- call_cust_history: a Delta table with primary keys customer_id, call_id. This table is maintained to calculate how much internal customers use the HelpDesk, to ensure that the chargeback model is consistent with actual service use.
- call_detail: a Delta table that includes a snapshot of all call details, updated hourly. It includes root_cause and resolution fields, but those fields may be empty for calls that are still active.
- maintenance_schedule: a Delta table listing both HelpDesk application outages and planned upcoming maintenance downtimes.
They need sources that could add context to best identify ticket root cause and resolution.
Which TWO sources do that? (Choose two.)
- A. call_cust_history
- B. maintenance_schedule
- C. call_rep_history
- D. call_detail
- E. transcript Volume
Correct Answer:
DE
Explanation:
I agree with the suggested answer D and E. These sources provide direct evidence and historical data regarding the underlying causes and solutions for support tickets, which is essential for a Retrieval-Augmented Generation (RAG) system used in a chatbot.
Reason
Option D (call_detail) is correct because it explicitly contains root_cause and resolution fields, providing the chatbot with past examples of how similar issues were diagnosed and solved. Option E (transcript Volume) is correct because raw text transcripts contain the actual dialogue between customers and agents, offering deep, unstructured context and specific troubleshooting steps that might not be captured in structured tables.
Why the other options are not as suitable
- Option A is incorrect because call_cust_history focuses on billing and usage metrics (charge back models) rather than technical resolution data.
- Option B is incorrect because while a maintenance_schedule shows when systems were down, it does not provide the specific root cause or resolution for individual user tickets.
- Option C is incorrect because call_rep_history is a performance tracking table for HR/management purposes (calculating resolution rates and duration) and lacks the technical descriptive text needed to train or prompt a GenAI model on how to solve a specific problem.
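Preparing the two chosen sources for retrieval might look like the sketch below: keep only call_detail rows whose root_cause and resolution are populated, and pair each with its transcript to form documents for the chatbot's index. The rows, transcript text, and joining logic are made-up stand-ins for what would be Delta table reads and Volume file loads.

```python
# Stand-in for the call_detail Delta table (still-active calls have empty fields).
call_detail = [
    {"call_id": 1, "root_cause": "expired cert", "resolution": "renewed cert"},
    {"call_id": 2, "root_cause": None, "resolution": None},  # active call, skip
]

# Stand-in for *.txt transcripts loaded from the Unity Catalog Volume.
transcripts = {1: "Agent: the certificate had expired...", 2: "Agent: looking into it..."}

# Build retrievable documents only from resolved calls, pairing structured
# root_cause/resolution fields with the unstructured transcript text.
corpus = [
    {"call_id": r["call_id"],
     "text": f"{transcripts[r['call_id']]} Root cause: {r['root_cause']}. "
             f"Resolution: {r['resolution']}."}
    for r in call_detail
    if r["root_cause"] and r["resolution"]
]
```

This pairing is exactly why D and E complement each other: call_detail supplies the labeled outcome, while the transcript supplies the troubleshooting narrative that led to it.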