2026 Easy Success Snowflake DEA-C02 Exam in First Try
Best DEA-C02 Exam Dumps for the Preparation of Latest Exam Questions
NEW QUESTION # 124
You are designing a data pipeline in Snowflake to process IoT sensor data. The data arrives in JSON format, and you need to extract specific nested fields using a Snowpark UDF for performance reasons. Which of the following statements are true regarding best practices and limitations when working with complex JSON data and Snowpark UDFs (Python or Scala)? (Select all that apply)
- A. Ensure the UDF is idempotent, meaning it produces the same output for the same input, as Snowflake might execute UDFs multiple times for optimization purposes.
- B. For highly complex JSON structures, consider using a Scala UDF with a robust JSON parsing library like Jackson or Gson for potentially better performance and control over error handling compared to Python UDFs.
- C. Leverage Snowflake's built-in 'PARSE_JSON' function and 'GET_PATH' function outside of the UDF as much as possible before passing the data to the UDF to reduce the complexity within the UDF itself.
- D. When working with Snowpark Python UDFs, it's recommended to use the 'json' module in Python to parse the JSON data within the UDF, as it's optimized for Snowflake's internal JSON representation.
- E. The maximum size of the JSON document that can be processed by a Snowpark UDF is directly limited by the maximum size of the UDF code itself (typically a few MB), requiring chunking strategies for large JSON payloads.
Answer: A,B,C
Explanation:
Options A, B, and C are correct. A: UDF idempotency is crucial for reliable results, because Snowflake might rerun a UDF for optimization purposes. B: Scala UDFs often offer better performance for complex operations such as JSON parsing, thanks to the JVM's efficiency and mature libraries such as Jackson, which also give finer control over error handling. C: Pre-processing JSON data with Snowflake's built-in functions ('PARSE_JSON', 'GET_PATH') reduces the UDF's workload and can improve performance. Option D is incorrect because Snowflake has its own internal JSON representation; the standard Python 'json' module is not optimized for it. Option E is incorrect because the amount of data a UDF can process is limited by the maximum input size of the UDF and the available memory, not by the size of the UDF code.
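As a minimal sketch of options A and C, the handler below is a pure, deterministic Python function of the kind that could back a Python UDF. It assumes the calling query has already narrowed the payload (for example with GET_PATH) and that the VARIANT argument arrives as a native Python dict, which is how Snowflake maps VARIANT values for Python handlers. The field names `device`, `readings`, and `temp_c` are hypothetical.

```python
from typing import Any, Optional


def extract_temperature(payload: Optional[dict[str, Any]]) -> Optional[float]:
    """Return payload["device"]["readings"]["temp_c"] as a float, or None.

    The field names are hypothetical. The function has no side effects and
    depends only on its input, so re-running it on the same row always yields
    the same result (idempotent), which is safe if Snowflake re-executes it.
    """
    if not isinstance(payload, dict):
        return None
    node: Any = payload
    # Walk the nested path defensively instead of raising on missing keys.
    for key in ("device", "readings", "temp_c"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    try:
        return float(node)
    except (TypeError, ValueError):
        # Malformed readings become NULL rather than failing the whole query.
        return None
```

Because the VARIANT value is already a dict, the handler never calls `json.loads`, which is also why option D's premise does not hold.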
NEW QUESTION # 125
You are configuring cross-cloud replication for a Snowflake database named 'SALES_DB' from an AWS (us-east-1) account to an Azure (eastus) account. You have already set up the necessary network policies and security integrations. However, replication is failing with the following error: 'Replication of database SALES_DB failed due to insufficient privileges on object 'SALES_DB.PUBLIC.ORDERS'.' What is the MOST LIKELY cause of this issue, and how would you resolve it? (Assume the replication group and target database exist).
- A. The replication group is missing the 'ORDERS' table. Alter the replication group to include the 'ORDERS' table: 'ALTER REPLICATION GROUP ADD DATABASE SALES_DB;'
- B. The target Azure account does not have sufficient storage capacity. Increase the storage quota for the Azure account.
- C. The network policy is blocking access to the ORDERS table. Update the network policy to allow access to the ORDERS table.
- D. The user account performing the replication does not have the 'ACCOUNTADMIN' role in the AWS account. Grant the 'ACCOUNTADMIN' role to the user.
- E. The replication group does not have the necessary permissions to access the 'ORDERS' table in the AWS account. Grant the 'OWNERSHIP' privilege on the 'ORDERS' table to the replication group: 'GRANT OWNERSHIP ON TABLE SALES_DB.PUBLIC.ORDERS TO REPLICATION GROUP
Answer: E
Explanation:
The error message indicates a privilege issue on the 'ORDERS' table. Replication requires the replication group to have 'OWNERSHIP' privileges on the objects being replicated; granting 'OWNERSHIP' ensures that the replication process can access and replicate the table's data and metadata. ACCOUNTADMIN is not required for granular object-level replication. Simply adding the database to the group, while necessary initially, does not automatically grant the necessary privileges on the objects it contains.
NEW QUESTION # 126
You are ingesting data from an AWS S3 bucket into a Snowflake table using a COPY INTO statement. The COPY INTO command fails with an error indicating 'Invalid stage location specified'. You have verified that the stage name is correct and the Snowflake user has the necessary privileges to access the stage. However, the error persists. Which of the following are potential causes and solutions for this issue?
- A. The S3 bucket policy is not correctly configured to allow Snowflake to assume the IAM role. Review the bucket policy to ensure it grants access to the Snowflake IAM role.
- B. The network policy configured in Snowflake is blocking access to the AWS S3 endpoint. Check the network policy rules and ensure they allow outbound traffic to the S3 region.
- C. The IAM role associated with the Snowflake stage is incorrect or does not have sufficient permissions to access the S3 bucket. Verify the IAM role configuration and permissions.
- D. The S3 bucket is encrypted using KMS and the Snowflake integration lacks the necessary key grant. Check the KMS key policy to ensure the storage integration IAM role has decrypt permission.
- E. The external stage definition in Snowflake includes an incorrect storage integration. Examine and correct the STORAGE INTEGRATION parameter in the CREATE STAGE statement.
Answer: A,C,D,E
Explanation:
The 'Invalid stage location specified' error can be misleading: it often masks underlying issues with permissions, network connectivity, or encryption. Options A, C, D, and E represent the most common causes: an incorrect S3 bucket policy (A), IAM role misconfiguration (C), missing KMS key grants (D), and an incorrect STORAGE_INTEGRATION (E). Network policy issues (B) are unlikely: Snowflake network policies restrict inbound client connections to Snowflake rather than Snowflake's outbound access to S3, and a blocked connection typically produces a more direct error message.
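Option E can be checked cheaply by reviewing the stage definition itself before digging into IAM, bucket, or KMS policies. The sketch below validates the shape of an `s3://bucket/path/` URL (using a simplified bucket-name rule) and renders the CREATE STAGE statement so it can be reviewed or passed to a Snowpark `session.sql(...)` call. The stage, bucket, and integration names are hypothetical.

```python
import re

# Simplified S3 bucket rule: 3-63 chars of lowercase letters, digits, dots, hyphens,
# optionally followed by a path. Other URL schemes are rejected.
_S3_URL = re.compile(r"^s3://[a-z0-9][a-z0-9.\-]{1,61}[a-z0-9](/[^\s']*)?$")


def create_stage_sql(stage: str, url: str, integration: str) -> str:
    """Render CREATE STAGE DDL for an S3 location backed by a storage integration."""
    if not _S3_URL.match(url):
        raise ValueError(f"not an s3:// URL: {url!r}")
    if not url.endswith("/"):
        url += "/"  # treat the location as a folder prefix
    return (
        f"CREATE OR REPLACE STAGE {stage}\n"
        f"  URL = '{url}'\n"
        f"  STORAGE_INTEGRATION = {integration};"
    )
```

Rendering the DDL from one place makes it easy to diff the stage against the integration's allowed locations when the error appears.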
NEW QUESTION # 127
You are tasked with optimizing the performance of a Snowflake virtual warehouse used for running several types of queries: short-running analytical queries with strict latency requirements, long-running batch data transformations, and ad-hoc queries from data scientists. The workload is unpredictable, and the team wants to minimize queueing and maximize resource utilization. Which warehouse configuration would be MOST appropriate to handle this mixed workload, minimizing cost and maximizing performance?
- A. A single Small warehouse with auto-suspend set to 60 minutes.
- B. A single X-Large warehouse with auto-suspend set to 5 minutes.
- C. Three separate warehouses: a Medium warehouse for analytical queries, a Large warehouse for batch transformations, and an X-Small warehouse for ad-hoc queries.
- D. A multi-cluster warehouse with a scaling policy of 'Standard' and a minimum of 1 and maximum of 3 clusters with auto-suspend set to 10 minutes.
- E. A multi-cluster warehouse with a scaling policy of 'Economy' and a minimum of 1 and maximum of 2 clusters with auto-suspend set to 5 minutes.
Answer: D
Explanation:
A multi-cluster warehouse with a 'Standard' scaling policy lets Snowflake automatically scale out the number of clusters based on workload demand, providing better performance during peak times and reducing queueing. The 'Standard' policy favors starting additional clusters quickly to minimize queueing, which suits a mix of short- and long-running queries. Option A is too small for the workload. Option B, a single large warehouse, is less flexible than a multi-cluster warehouse. Option C can work, but it increases administrative overhead and might not handle unpredictable workloads as efficiently as a multi-cluster warehouse. Option E might be cost-effective, but the 'Economy' policy will likely lead to performance issues during peak times.
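Option D expressed as DDL might look like the sketch below. AUTO_SUSPEND is specified in seconds, so ten minutes becomes 600. The warehouse name and the MEDIUM size are hypothetical choices, and multi-cluster warehouses require Enterprise Edition or higher.

```python
def multi_cluster_warehouse_sql(
    name: str,
    size: str = "MEDIUM",
    min_clusters: int = 1,
    max_clusters: int = 3,
    policy: str = "STANDARD",
    auto_suspend_minutes: int = 10,
) -> str:
    """Render CREATE WAREHOUSE DDL for a multi-cluster warehouse."""
    if not 1 <= min_clusters <= max_clusters:
        raise ValueError("need 1 <= min_clusters <= max_clusters")
    if policy not in ("STANDARD", "ECONOMY"):
        raise ValueError("SCALING_POLICY must be STANDARD or ECONOMY")
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name}\n"
        f"  WAREHOUSE_SIZE = {size}\n"
        f"  MIN_CLUSTER_COUNT = {min_clusters}\n"
        f"  MAX_CLUSTER_COUNT = {max_clusters}\n"
        f"  SCALING_POLICY = {policy}\n"
        # AUTO_SUSPEND takes seconds, not minutes.
        f"  AUTO_SUSPEND = {auto_suspend_minutes * 60}  -- seconds\n"
        "  AUTO_RESUME = TRUE;"
    )
```

Switching `policy` to "ECONOMY" and `max_clusters` to 2 renders option E instead, which makes the trade-off between the two answers easy to compare.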
NEW QUESTION # 128
You need to unload data from a Snowflake table named 'CUSTOMER_DATA' to an AWS S3 bucket. The data should be unloaded in Parquet format, partitioned by the 'CUSTOMER_REGION' column, and automatically compressed with GZIP. Furthermore, you only want to unload customers whose 'REGISTRATION_DATE' is after '2023-01-01'. Which of the following 'COPY INTO' statements correctly achieves this?
- A. Option E
- B. Option C
- C. Option A
- D. Option D
- E. Option B
Answer: A
Explanation:
The correct 'COPY INTO' statement uses a named stage and a named file format. A subquery filters the data on 'REGISTRATION_DATE', and the 'PARTITION BY' clause partitions the output by 'CUSTOMER_REGION'. The file format must be created separately and referenced by name. The other statements have syntax errors, incorrect stage references, or incorrectly ordered clauses. Option 'A' neither uses a stage nor allows a WHERE condition. Option 'B' does not work because 'TYPE' and 'COMPRESSION' are properties of a file format, not direct arguments to FILE_FORMAT. 'C' includes 'TYPE' and 'COMPRESSION' inline when this is not allowed. 'D' has the same FILE_FORMAT error as 'B' and 'C' and does not use a stage.
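To make the effect of PARTITION BY concrete, the sketch below simulates it in plain Python: rows are filtered the way the WHERE clause in the unload subquery would filter them, then grouped under the path prefix that an expression such as `'region=' || CUSTOMER_REGION` would produce, since Snowflake uses the string value of the PARTITION BY expression as the path prefix of each output file. This is a model of the behavior, not Snowflake code; the `customer_id` field and the `region=` prefix are hypothetical.

```python
from collections import defaultdict
from datetime import date


def plan_partitioned_unload(rows, cutoff=date(2023, 1, 1)):
    """Group customer ids by the path prefix a PARTITION BY expression would yield.

    Models: COPY INTO @stage/customers/
            FROM (SELECT ... WHERE registration_date > cutoff)
            PARTITION BY ('region=' || customer_region)
    """
    partitions = defaultdict(list)
    for row in rows:
        # WHERE registration_date > '2023-01-01' (strictly after the cutoff).
        if row["registration_date"] <= cutoff:
            continue
        # One path prefix per distinct value of the partition expression.
        prefix = f"region={row['customer_region']}/"
        partitions[prefix].append(row["customer_id"])
    return dict(partitions)
```

A customer registered exactly on 2023-01-01 is excluded, matching the "after" wording in the question.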
NEW QUESTION # 129
A financial services company, 'Acme Finance', wants to share aggregated, anonymized transaction data with a research firm, 'Data Insights', through a Snowflake Data Clean Room. Acme Finance needs to ensure that Data Insights can only analyze the data using predefined aggregate functions and cannot access the raw, underlying transactional details. Acme Finance has already created a secure view to share the aggregated data. Which of the following steps are necessary to grant Data Insights access to the data securely while enforcing the required restrictions?
- A. Create a share object and grant USAGE privilege on the database containing the secure view to the share. Then, grant SELECT privilege on the secure view to the share. Finally, share the share with Data Insights' Snowflake account using their account identifier.
- B. Grant SELECT privilege on the secure view directly to the role used by Data Insights' Snowflake account.
- C. Create a masking policy that only allows aggregate functions to be executed by Data Insights' role and apply it to the relevant columns in the underlying table. Then, grant SELECT privilege on the secure view directly to the role used by Data Insights' Snowflake account.
- D. Create an external function that Data Insights can call to execute pre-approved aggregate functions on the underlying data. Grant USAGE on the function to Data Insights' role and create a secure view that uses that function.
- E. Create a row access policy that restricts the rows returned based on the role used by Data Insights. Then, grant SELECT privilege on the secure view directly to the role used by Data Insights' Snowflake account.
Answer: A
Explanation:
Option A correctly outlines the process: create a share, grant USAGE on the database and SELECT on the secure view to the share, and then add the consumer account to the share. This is the fundamental mechanism for secure data sharing in Snowflake. Option B is incorrect because it does not use sharing; a provider cannot grant privileges directly to a role in another account. Options C and E are data governance mechanisms (masking and row access policies) that control data visibility within an account, not across accounts. Option D suggests an external function, which is not a standard approach for Data Clean Rooms and adds unnecessary complexity.
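The steps in option A correspond to the provider-side statements rendered by the sketch below. It also grants USAGE on the schema that contains the view, which Snowflake requires in addition to USAGE on the database. The share, database, schema, view, and consumer account names are hypothetical.

```python
def share_secure_view_sql(share: str, database: str, schema: str, view: str, consumer: str) -> list[str]:
    """Return the provider-side statements for sharing one secure view."""
    fq_view = f"{database}.{schema}.{view}"
    return [
        f"CREATE SHARE {share};",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        # Schema USAGE is needed as well, or the consumer cannot resolve the view.
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
        f"GRANT SELECT ON VIEW {fq_view} TO SHARE {share};",
        # consumer is an account identifier such as ORGNAME.ACCOUNTNAME.
        f"ALTER SHARE {share} ADD ACCOUNTS = {consumer};",
    ]
```

Only the secure view is granted, so the consumer never receives privileges on the underlying transaction tables.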
NEW QUESTION # 130
You have a Snowflake table called 'RAW_ORDERS' that contains semi-structured JSON data in a column named 'ORDER_DETAILS'. You need to extract specific fields from the JSON data, perform some data type conversions, and then load the transformed data into a relational table named 'CLEAN_ORDERS'. Your requirements are as follows: 1. Extract the (STRING) from the JSON and store it as 'ORDER_ID' (NUMBER). 2. Extract the (STRING) from the JSON and store it as 'CUSTOMER_ID' (NUMBER). 3. Extract the 'order_date' (STRING) from the JSON and store it as 'ORDER_DATE' (DATE). 4. Extract (STRING) from the JSON and store it as 'TOTAL_AMOUNT' (FLOAT). Which of the following Snowpark Python code snippets correctly transforms the data and loads it into the 'CLEAN_ORDERS' table using a combination of Snowpark DataFrame operations and SQL? Assume that session 'sp' is already initialized.
- A. Option A
- B. Option C
- C. Option E
- D. Option D
- E. Option B
Answer: A
Explanation:
Option A correctly uses Snowpark to extract and transform the JSON data. It uses the correct colon-notation syntax for extracting values from the column, then uses 'to_date', and 'to_float' to perform the necessary data type conversions. Finally, it saves the transformed DataFrame to a new table called 'CLEAN_ORDERS' using overwrite mode. Option B uses incorrect syntax with 'col('ORDER_DETAILS:order_id')'. Option C uses the 'get()' function, which is more appropriate when the JSON structure is unknown; when the structure is known, colon notation is simpler and faster. Option D attempts to use , which returns a JSON object that can be accessed with square brackets; the correct way to access JSON properties in this manner without parsing is colon notation. Option E uses which is a Snowflake SQL function, not a native Snowpark function, and requires a JSON path, making it less efficient than direct extraction. Although E will work, A is the more optimized way to write the code.
NEW QUESTION # 131
A Snowflake data pipeline utilizes Snowpipe to ingest JSON data from cloud storage into a raw staging table 'RAW_DATA'. Subsequently, a series of transformation tasks are executed to cleanse, transform, and load the data into fact and dimension tables. You've noticed significant performance degradation in the transformation tasks, especially when dealing with large JSON payloads and deeply nested structures. Which of the following optimization techniques, applied at different stages of the pipeline, would MOST likely improve the overall performance of the data transformation tasks?
- A. Partitioning the 'RAW_DATA' staging table based on the ingestion timestamp to reduce the amount of data scanned during transformation.
- B. Replacing the transformation tasks with external functions implemented in Python using Snowpark, leveraging the power of Pandas DataFrames for JSON processing.
- C. Increasing the virtual warehouse size used by the transformation tasks to provide more compute resources.
- D. Employing Snowflake's 'LATERAL FLATTEN' function with appropriate 'PATH' expressions to efficiently extract the required attributes from the JSON data during transformation.
- E. Using the file format option when defining the Snowpipe integration to remove the outer array from the JSON data before ingestion.
Answer: A,C,D
Explanation:
Options A, C, and D address different aspects of performance optimization. Reducing the data scanned (A) improves query performance; in Snowflake this comes from micro-partition pruning, for example by clustering 'RAW_DATA' on the ingestion timestamp. Increasing the virtual warehouse size (C) provides more resources for the transformation tasks. Using 'LATERAL FLATTEN' effectively (D) optimizes JSON parsing. While the file format option in (E) can help, its benefit depends on the JSON structure. Snowpark-based external processing (B) can introduce overhead due to serialization and deserialization between Snowflake and the external environment.
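LATERAL FLATTEN turns each element of a nested array into its own output row, with columns such as INDEX and VALUE. The plain-Python sketch below models that row explosion for a PATH-style lookup so the effect is easy to see; it is an illustration, not Snowflake's implementation, and the `id`/`payload`/`order.items` structure is hypothetical.

```python
from typing import Any, Iterator


def lateral_flatten(rows: list[dict[str, Any]], path: str) -> Iterator[tuple[Any, int, Any]]:
    """Yield (row_id, index, value) for every array element found at `path`.

    `path` uses dot notation (e.g. "order.items"), loosely mirroring
    LATERAL FLATTEN(INPUT => payload, PATH => 'order.items').
    This sketch only expands arrays; rows whose path is missing produce
    no output, as FLATTEN does with its default OUTER => FALSE.
    """
    for row in rows:
        node: Any = row.get("payload")
        # Follow the path one key at a time; a missing key yields None.
        for key in path.split("."):
            node = node.get(key) if isinstance(node, dict) else None
        if isinstance(node, list):
            for index, value in enumerate(node):
                yield row["id"], index, value
```

Pointing PATH at exactly the array you need keeps the expansion narrow, which is why targeted paths matter for deeply nested payloads.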
NEW QUESTION # 132
You have a Snowflake table named 'ORDERS' clustered on 'ORDER_DATE'. After a significant data load, you want to evaluate the effectiveness of the clustering. Which of the following SQL queries, using Snowflake system functions, will provide insights into the clustering depth and overlap of micro-partitions in the 'ORDERS' table, specifically helping you identify whether re-clustering is necessary? Assume that the table
- A.

- B.

- C.

- D.

- E.

Answer: A
Explanation:
The query SELECT avg_depth, avg_overlap FROM is the correct approach. The function, when given the table name and the clustering key column(s), returns information about the clustering state. Using 'TABLE()' allows you to extract 'avg_depth' and 'avg_overlap', which are key metrics for assessing clustering effectiveness. 'avg_depth' indicates how well the data is clustered (lower is better), and 'avg_overlap' indicates the degree of overlap between micro-partitions (lower is better). A high 'avg_depth' or 'avg_overlap' suggests the need for re-clustering. Option A returns JSON, which is difficult to process to get the required metrics. Option B is missing the clustering key. Option C returns JSON rather than the desired output. Option E is not valid SQL syntax in Snowflake.
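For comparison, Snowflake's SYSTEM$CLUSTERING_INFORMATION function returns its metrics as a JSON string, including the keys `average_depth` and `average_overlaps`. The sketch below extracts those two values client-side. The re-clustering threshold is a hypothetical cut-off chosen for illustration, and the JSON in the usage test is made-up sample input, not real output.

```python
import json


def clustering_summary(info_json: str, depth_threshold: float = 4.0) -> dict:
    """Extract depth/overlap metrics from SYSTEM$CLUSTERING_INFORMATION output.

    depth_threshold is a hypothetical cut-off, not a Snowflake recommendation.
    """
    info = json.loads(info_json)
    depth = float(info["average_depth"])
    overlaps = float(info["average_overlaps"])
    return {
        "average_depth": depth,
        "average_overlaps": overlaps,
        # Lower is better for both; a high depth means queries touch more
        # overlapping micro-partitions for the same key range.
        "consider_reclustering": depth > depth_threshold,
    }
```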
NEW QUESTION # 133
You are developing a data pipeline that extracts data from an on-premises PostgreSQL database, transforms it, and loads it into Snowflake. You want to use the Snowflake Python connector in conjunction with a secure method for accessing the PostgreSQL database. Which of the following approaches provides the MOST secure and manageable way to handle the PostgreSQL connection credentials in your Python script when deploying to a production environment?
- A. Prompt the user for the PostgreSQL username and password each time the script is executed.
- B. Hardcode the PostgreSQL username and password directly into the Python script.
- C. Store the PostgreSQL username and password in a dedicated secrets management service (e.g., AWS Secrets Manager, HashiCorp Vault, Azure Key Vault) and retrieve them in the Python script using the appropriate API.
- D. Store the PostgreSQL username and password in environment variables and retrieve them in the Python script using 'os.environ'
- E. Store the PostgreSQL username and password in a configuration file (e.g., JSON or YAML) and load the file in the Python script.
Answer: C
Explanation:
Option C, using a dedicated secrets management service, provides the most secure and manageable approach. Secrets management services are designed to securely store and manage sensitive information like database credentials. They offer features like encryption, access control, auditing, and versioning, making them the best choice for production environments. Option B (hardcoding) is highly insecure. Options D and E are better than B but still less secure than a secrets management service, because environment variables and configuration files can be accidentally exposed or committed to version control. Option A is impractical and insecure for automated pipelines.
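A minimal sketch of option C, assuming AWS Secrets Manager: the loader depends only on a client exposing `get_secret_value(SecretId=...)`, the method boto3's Secrets Manager client provides, and on the secret being stored as a JSON string with hypothetical `username` and `password` keys. The in-memory client exists only so the sketch can be exercised without AWS.

```python
import json
from typing import Any, Protocol


class SecretsClient(Protocol):
    """Shape of boto3's Secrets Manager client that this sketch relies on."""

    def get_secret_value(self, *, SecretId: str) -> dict[str, Any]: ...


def load_postgres_credentials(client: SecretsClient, secret_id: str) -> dict[str, str]:
    """Fetch PostgreSQL credentials at runtime instead of embedding them in code.

    Assumes the secret is a JSON string with hypothetical "username"/"password" keys.
    """
    response = client.get_secret_value(SecretId=secret_id)
    secret = json.loads(response["SecretString"])
    return {"user": secret["username"], "password": secret["password"]}


class StaticSecretsClient:
    """In-memory stand-in for local tests; never use it for real credentials."""

    def __init__(self, secrets: dict[str, dict[str, str]]) -> None:
        self._secrets = secrets

    def get_secret_value(self, *, SecretId: str) -> dict[str, Any]:
        return {"SecretString": json.dumps(self._secrets[SecretId])}
```

In production the client would come from `boto3.client("secretsmanager")`, and the returned dict can be unpacked into a driver call such as `psycopg2.connect(host=..., dbname=..., **creds)`. The secret name used in the test below is hypothetical.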
NEW QUESTION # 134
You have a 'SALES' table and a 'PRODUCTS' table. The 'SALES' table contains daily sales transactions, including 'SALE_DATE', 'PRODUCT_ID', and 'QUANTITY'. The 'PRODUCTS' table contains 'PRODUCT and 'CATEGORY'. You need to create a materialized view to track the total quantity sold per category daily, optimized for fast query performance. You anticipate frequent updates to the 'SALES' table but infrequent changes to the 'PRODUCTS' table. Which of the following strategies would provide the MOST efficient materialized view implementation, considering both data freshness and query performance?
- A. Create a standard materialized view that joins 'SALES' and 'PRODUCTS', grouping by 'SALE_DATE' and 'CATEGORY', and defining a clustering key on 'CATEGORY'.
- B. Create a standard materialized view that joins 'SALES' and 'PRODUCTS', grouping by 'SALE_DATE' and 'CATEGORY', and defining a clustering key on 'SALE_DATE' and 'CATEGORY'.
- C. Create a standard materialized view that joins 'SALES' and 'PRODUCTS', grouping by 'SALE_DATE' and 'CATEGORY' without any specific clustering key.
- D. Create two materialized views: one for daily sales by product and another joining the first with 'PRODUCTS' to aggregate by category. Cluster the first view by 'SALE_DATE' and the second by 'CATEGORY'.
- E. Create a standard materialized view that joins 'SALES' and 'PRODUCTS', grouping by 'SALE_DATE' and 'CATEGORY', and defining a clustering key on 'SALE_DATE'.
Answer: E
Explanation:
Option E is the most efficient. Clustering the materialized view on 'SALE_DATE' significantly improves query performance when filtering or grouping by date, a common operation on time-series data. Although frequent updates to 'SALES' increase the maintenance cost of the materialized view, date-based queries will be very efficient. Option C is less efficient due to the lack of clustering. Option A may not be the best choice if filtering and grouping primarily occur on date. Option B is also reasonable, but Option E is better if most query filters are on 'SALE_DATE'. Option D introduces complexity, and its two refreshes may delay data availability.
NEW QUESTION # 135
You are developing a data pipeline that uses Snowpipe Streaming to ingest JSON data into a Snowflake table. Some JSON documents contain nested arrays and complex structures. You need to flatten the JSON structure during ingestion to simplify querying. Consider the following JSON document: { "order_id": 123, "customer": { "id": "cust123", "name": "John Doe", "address": { "street": "123 Main St", "city": "Anytown" } }, "items": [ {"product_id": "prod1", "quantity": 2}, {"product_id": "prod2", "quantity": 1} ] } Which approach would you use within the 'COPY INTO' statement of your Snowpipe to flatten this JSON structure during ingestion?
- A. Use the 'FLATTEN()' table function directly within the 'COPY INTO' statement to expand the 'items' array and extract nested fields. For nested objects, use dot notation directly in the SELECT list (e.g., 'customer.name').
- B. Create a separate transformation pipeline using Snowflake Tasks to flatten the data after it is ingested into the table.
- C. Use JavaScript UDFs within the 'COPY INTO' statement to recursively flatten the JSON structure.
- D. Pre-process the JSON documents before loading them into the stage using a custom script to flatten the structure.
- E. Snowpipe and the 'COPY INTO' command automatically flatten JSON data during ingestion; no additional steps are required.
Answer: A
Explanation:
Snowflake's 'FLATTEN()' function, combined with dot notation for nested objects, provides the most efficient way to flatten JSON data during ingestion within the 'COPY INTO' statement. Options B, C, and D introduce unnecessary complexity and latency. Snowpipe does NOT automatically flatten JSON (E).
NEW QUESTION # 136
You are tasked with building a data pipeline that incrementally loads data from an external cloud storage location (AWS S3) into a Snowflake table named 'SALES_DATA'. You want to optimize the pipeline for cost and performance. Which combination of Snowflake features and configurations would be MOST efficient and cost-effective for this scenario, assuming the data volume is substantial and constantly growing?
- A. Develop a custom Python script that uses the Snowflake Connector for Python to connect to Snowflake and execute a COPY INTO command. Schedule the script to run on an EC2 instance using cron.
- B. Use a Snowflake Task to regularly truncate and reload 'SALES_DATA' from S3 using COPY INTO. This ensures data consistency.
- C. Create an external stage pointing to the S3 bucket. Create a Snowpipe with auto-ingest enabled, using an AWS SNS topic and SQS queue for event notifications. Configure the pipe with an error notification integration to monitor ingestion failures.
- D. Employ a third-party ETL tool to extract data from S3, transform it, and load it into Snowflake using JDBC. Schedule the ETL process using the tool's built-in scheduler.
- E. Use a Snowflake Task scheduled every 5 minutes to execute a COPY INTO command from S3, with no file format specified, assuming the data is CSV and auto-detection will work.
Answer: C
Explanation:
Snowpipe with auto-ingest is the most efficient and cost-effective solution for continuously loading data into Snowflake from cloud storage. It leverages event notifications to trigger data loading as soon as new files are available, minimizing latency and compute costs. Option E lacks error handling and a proper file format specification. Option A involves custom coding and infrastructure management. Option D introduces the overhead and cost of a third-party ETL tool. Option B is inefficient because it truncates and reloads the entire table, losing the benefits of incremental loading.
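Option C maps onto an external stage plus a pipe with AUTO_INGEST enabled. The sketch below renders the pipe DDL; all object names are hypothetical, the stage, file format, and notification integration are assumed to exist already, and the bucket's event notifications still have to be routed to the SQS queue listed in the pipe's `notification_channel` (visible via SHOW PIPES).

```python
def auto_ingest_pipe_sql(pipe: str, table: str, stage: str, file_format: str, error_integration: str) -> str:
    """Render CREATE PIPE DDL for event-driven (auto-ingest) loading from S3."""
    return (
        f"CREATE OR REPLACE PIPE {pipe}\n"
        "  AUTO_INGEST = TRUE\n"
        # Failed loads are reported through this notification integration.
        f"  ERROR_INTEGRATION = {error_integration}\n"
        "  AS\n"
        f"  COPY INTO {table}\n"
        f"  FROM @{stage}\n"
        f"  FILE_FORMAT = (FORMAT_NAME = '{file_format}');"
    )
```

Naming the file format explicitly avoids the format guessing that makes option E fragile.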
NEW QUESTION # 137
Which of the following statements are true regarding data masking policies in Snowflake? (Select all that apply)
- A. Data masking policies are supported on external tables.
- B. Once a masking policy is applied to a column, the original data is permanently altered.
- C. Data masking policies can be applied to both tables and views.
- D. Different masking policies cannot be applied to different columns within the same table.
- E. The 'CURRENT_ROLE()' function can be used within a masking policy to implement role-based data masking.
Answer: A,C,E
Explanation:
A, C, and E are correct. Masking policies can be applied to columns in tables and views (C), and the CURRENT_ROLE() function is essential for implementing role-based masking (E). Data masking policies are also supported on external tables (A). B is incorrect because masking policies are applied dynamically at query time and do not alter the underlying data. D is incorrect; different policies can be applied to different columns in the same table.
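The sketch below models option E and the reason B is false: the policy body is evaluated per query using the caller's role, so what a caller sees depends on CURRENT_ROLE() while the stored values stay untouched. The SQL in the comment follows CREATE MASKING POLICY syntax; the policy name, role, and mask string are hypothetical.

```python
from typing import Optional

# Python model of a role-based masking policy such as:
#   CREATE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
#     CASE WHEN CURRENT_ROLE() IN ('ANALYST') THEN val ELSE '*****' END;
UNMASKED_ROLES = frozenset({"ANALYST"})  # hypothetical role list


def email_mask(val: Optional[str], current_role: str) -> Optional[str]:
    """Return the value unchanged for privileged roles, a fixed mask otherwise."""
    return val if current_role in UNMASKED_ROLES else "*****"


def query_column(stored: list[Optional[str]], current_role: str) -> list[Optional[str]]:
    """Apply the policy at read time; the stored list itself is never modified."""
    return [email_mask(v, current_role) for v in stored]
```

Attaching a different policy to each column (the reason D is false) is done per column with ALTER TABLE ... MODIFY COLUMN ... SET MASKING POLICY.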
NEW QUESTION # 138
A critical database, 'PRODUCTION_DB', in your Snowflake account was accidentally dropped. You need to restore it as quickly as possible, but you're unsure if Time Travel retention is sufficient. Which method guarantees restoration of the database even if it falls outside the Time Travel window?
- A. Utilize the data cloning feature: 'CREATE DATABASE CLONE PRODUCTION_DB BEFORE (STATEMENT 'DROP DATABASE PRODUCTION_DB');'
- B. Restore from a Snowflake-managed backup using the 'CREATE DATABASE ... FROM BACKUP' command. Specify the timestamp before the drop occurred.
- C. Fail-safe cannot be directly accessed by the user for restoration purposes; it is only used by Snowflake Support in extreme disaster recovery scenarios.
- D. Contact Snowflake Support and request restoration from Fail-safe.
- E. Use the 'UNDROP DATABASE PRODUCTION command.
Answer: C
Explanation:
Fail-safe is a last resort for data recovery managed entirely by Snowflake; users cannot directly access or restore data from Fail-safe. Options A and E rely on Time Travel and fail if the drop falls outside the retention window. Option D is partially correct: you can contact Snowflake Support, who may then use Fail-safe if appropriate, but option C is the most accurate answer.
NEW QUESTION # 139
You have a table 'ORDERS' in your Snowflake database. You are implementing a new data transformation pipeline. Before deploying the pipeline to production, you want to validate the changes in a development environment. You decide to use Time Travel to create a snapshot of the 'ORDERS' table before the transformation and compare it with the transformed data. Which sequence of SQL commands would best facilitate this validation, assuming your development database and schema structure mirrors production?
- A.

- B.

- C.

- D.

- E.

Answer: D
Explanation:
Option D is the most complete and reliable approach. It first creates a backup table in production using Time Travel before the transformation. It then clones both the original (pre-transformation) table and the transformed table into the development environment, and finally compares these cloned tables to validate the transformation. Using LAST_QUERY_ID() would not be suitable because the clone would have to be created in the same session, while a TIMESTAMP-based approach is less reliable because it is hard to pin down exactly when the query executed.
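The pre-transformation snapshot at the heart of option D is a Time Travel clone. The sketch below renders that statement either relative to the current time (AT with a negative OFFSET in seconds) or relative to a specific query (BEFORE with a query ID, the kind of value LAST_QUERY_ID() returns). The database, schema, and table names are hypothetical.

```python
from typing import Optional


def clone_at_sql(
    target: str,
    source: str,
    *,
    offset_seconds: Optional[int] = None,
    before_statement: Optional[str] = None,
) -> str:
    """Render a zero-copy clone of `source` as it existed at an earlier point."""
    if (offset_seconds is None) == (before_statement is None):
        raise ValueError("specify exactly one of offset_seconds or before_statement")
    if offset_seconds is not None:
        if offset_seconds <= 0:
            raise ValueError("offset_seconds must be positive (how far back to go)")
        # OFFSET is relative to now and expressed as negative seconds.
        point = f"AT(OFFSET => -{offset_seconds})"
    else:
        # State immediately before the given query ID ran.
        point = f"BEFORE(STATEMENT => '{before_statement}')"
    return f"CREATE OR REPLACE TABLE {target} CLONE {source} {point};"
```

Anchoring on a query ID avoids the clock-synchronization issue the explanation raises for timestamp-based snapshots.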
NEW QUESTION # 140
You're managing a Snowflake data warehouse and need to create a development environment for testing a complex stored procedure that updates a critical table, 'SALES_DATA'. The procedure is located in the 'PRODUCTION' database and you want to ensure minimal impact to the production environment during development. You decide to use cloning and time travel. Which of the following strategies is the MOST efficient and safest approach to achieve this, minimizing downtime and resource consumption in production?
- A. Clone the entire 'PRODUCTION' database into a new development database. This ensures developers have access to all necessary data and dependencies but consumes significant storage and may impact production performance during the cloning process.
- B. Clone the 'PRODUCTION' database. Immediately after cloning, use Time Travel to revert the 'SALES_DATA' table in the development database to a state before the stored procedure was last run in production. Then clone the stored procedure itself. This gives a starting point without the procedure's impact.
- C. Clone only the 'SALES_DATA' table into a development database. This minimizes storage consumption but requires developers to manually recreate or mock any dependencies the stored procedure has on other tables in the 'PRODUCTION' database.
- D. Create a snapshot of the 'SALES_DATA' table using Time Travel at a specific timestamp (e.g., 1 hour ago), then clone only the stored procedure, updating it to point to the Time Travel version of 'SALES_DATA' in the development environment. This provides a consistent dataset for testing while minimizing the impact on production and cloned data volumes.
- E. Clone the schema in which 'SALES_DATA' is stored along with the stored procedure. Use time travel on the cloned schema to revert all objects in the schema to a point in time before the stored procedure was last run, then update the stored procedure to point to the cloned schema. This gives a consistent starting point for testing in development.
Answer: E
Explanation:
Option E offers the best balance of minimal impact and realistic testing. Cloning the entire database (A) is resource-intensive. Cloning only the table (C) requires significant manual setup to address dependencies. Option B might result in unpredictable behavior because only 'SALES_DATA' is reverted while related tables stay at their current state. Option D is almost correct, but other objects in the 'PRODUCTION' schema might change, resulting in incomplete testing. Cloning the schema and using Time Travel at the schema level before updating the procedure gives the most consistent and efficient development setup.
NEW QUESTION # 141
You are tasked with managing a large Snowflake table called 'TRANSACTIONS'. Due to compliance requirements, you need to archive data older than one year to long-term storage (AWS S3) while ensuring the queries against the current 'TRANSACTIONS' table remain performant. What is the MOST efficient strategy using Snowflake features and considering minimal impact on query performance?
- A. Export the historical data to S3 using COPY INTO, truncate the 'TRANSACTIONS' table, and then create an external table pointing to the archived data in S3.
- B. Partition the 'TRANSACTIONS' table by date. Export the old partitions of the 'TRANSACTIONS' table to S3 using COPY INTO. Then, drop the old partitions from the 'TRANSACTIONS' table and create an external table that points to the data in S3.
- C. Create an external table pointing to S3. Then create a new table named 'TRANSACTIONS_ARCHIVE' in Snowflake, copy the historical data from the 'TRANSACTIONS' table into 'TRANSACTIONS_ARCHIVE', and then delete the archived data from the 'TRANSACTIONS' table.
- D. Use Time Travel to clone the 'TRANSACTIONS' table to a point in time one year ago. Then, export the cloned table to S3 and drop the cloned table. Delete the archived data from the 'TRANSACTIONS' table.
- E. Create a new table 'TRANSACTIONS_ARCHIVE' in Snowflake, copy the historical data, and then delete the archived data from the 'TRANSACTIONS' table.
Answer: A
Explanation:
Option A is the most efficient. Using 'COPY INTO' to export to S3 is a fast, optimized way to move data. Truncating the table is faster than deleting a large number of rows. Creating an external table lets you query the archived data in S3 if needed, without ingesting it into Snowflake. Options C and E create another Snowflake table, which consumes Snowflake storage and credits and can be costly for long-term retention. Option D adds unnecessary steps: exporting directly with COPY INTO achieves the same result without an intermediate clone, and Time Travel retention (at most 90 days) cannot reach back a full year anyway. Option B relies on partitioning, which Snowflake does not support natively; it would require manual management using external tables and views.
NEW QUESTION # 142
You are designing a data warehouse for an e-commerce company. One of the requirements is to provide fast analytics on order fulfillment times by region. You have two tables: 'ORDERS': contains order information, including the order ID, 'ORDER_DATE', 'REGION_ID', and 'FULFILLMENT_DATE'. 'REGIONS': contains region information, including 'REGION_ID' and the region name. Due to the large size of the 'ORDERS' table and the complexity of calculating fulfillment times, you decide to use materialized views.
Which of the following combinations of materialized view definition and Snowflake features would BEST optimize query performance and minimize data staleness for this scenario? Choose two options.
- A. Create a materialized view that joins 'ORDERS' and 'REGIONS', calculates 'FULFILLMENT_TIME' grouped by 'REGION_NAME', and clusters by 'REGION_NAME'. Configure incremental data refreshes.
- B. Create a materialized view that joins 'ORDERS' and 'REGIONS', calculates the difference between 'FULFILLMENT_DATE' and 'ORDER_DATE' as the fulfillment time, and groups by 'REGION_NAME'. Cluster the view by 'REGION_NAME'.
- C. Partition the 'ORDERS' table by 'ORDER_DATE' and create a materialized view that calculates 'FULFILLMENT_TIME' grouped by 'REGION_NAME', clustering by 'ORDER_DATE'.
- D. Use Snowflake's search optimization service on the 'ORDERS' table instead of creating a materialized view.
- E. Create a materialized view that joins 'ORDERS' and 'REGIONS', calculates 'FULFILLMENT_TIME', and groups by 'REGION_NAME'. Do not specify a clustering key.
Answer: A,B
Explanation:
Options A and B both optimize query performance. Each materialized view pre-computes the join and the fulfillment-time aggregation by region, which significantly speeds up queries, and clustering by 'REGION_NAME' further optimizes queries filtered by region. In option A, incremental refreshes keep the view fresh with minimal performance impact. Option E is less performant because the view has no clustering key. Option C relies on partitioning the 'ORDERS' table, which Snowflake does not offer as a user-defined feature (micro-partitioning is automatic), so it does nothing for the materialized view's query performance. Option D, search optimization, speeds up selective point lookups but does not pre-compute the join and aggregation this workload needs.
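To make concrete what the materialized views in options A and B pre-compute, here is a plain-Python mirror of the join and per-region aggregation. It is a sketch only: it runs outside Snowflake on hypothetical in-memory rows, and measuring fulfillment time in whole days and averaging it are assumptions, since the question does not name the aggregate.

```python
from collections import defaultdict
from datetime import date


def fulfillment_time_by_region(orders, regions):
    """Average fulfillment time (whole days) per region name.

    orders:  iterable of dicts with ORDER_DATE, FULFILLMENT_DATE, REGION_ID
    regions: dict mapping REGION_ID -> region name (stands in for REGIONS)
    """
    total_days = defaultdict(int)
    order_count = defaultdict(int)
    for order in orders:
        if order["FULFILLMENT_DATE"] is None:
            continue  # not fulfilled yet, so no fulfillment time to measure
        name = regions[order["REGION_ID"]]  # the ORDERS-to-REGIONS join
        days = (order["FULFILLMENT_DATE"] - order["ORDER_DATE"]).days
        total_days[name] += days
        order_count[name] += 1
    return {name: total_days[name] / order_count[name] for name in total_days}


# Hypothetical sample data.
regions = {1: "EMEA", 2: "APAC"}
orders = [
    {"ORDER_DATE": date(2024, 1, 1), "FULFILLMENT_DATE": date(2024, 1, 4), "REGION_ID": 1},
    {"ORDER_DATE": date(2024, 1, 2), "FULFILLMENT_DATE": date(2024, 1, 3), "REGION_ID": 1},
    {"ORDER_DATE": date(2024, 1, 5), "FULFILLMENT_DATE": date(2024, 1, 10), "REGION_ID": 2},
    {"ORDER_DATE": date(2024, 1, 6), "FULFILLMENT_DATE": None, "REGION_ID": 2},
]
print(fulfillment_time_by_region(orders, regions))  # {'EMEA': 2.0, 'APAC': 5.0}
```

Because Snowflake maintains a materialized view's results in the background, queries read the stored per-region results instead of repeating this work on every call.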
NEW QUESTION # 143
You are tasked with implementing data masking on a 'CUSTOMER' table. The requirement is to mask the 'EMAIL' column for all users except those with the 'DATA_ADMIN' role. You have the following code snippet. What is wrong with it?
- A. The WITH clause is unnecessary.
- B. The masking policy is applied to the wrong column. It should be applied to the ID column, not the EMAIL column.
- C. There is no code provided, so there is nothing wrong with it.
- D. The masking policy syntax is incorrect. It should use 'CASE WHEN IS_ROLE_IN_SESSION('DATA_ADMIN') THEN EMAIL ELSE '[email protected]' END'.
- E. Without the masking policy code, it's impossible to determine if there is anything wrong.
Answer: E
Explanation:
Without the masking policy code, it is impossible to determine whether there are any errors, so the correct answer is E. Options A and B might be correct, but they cannot be verified without the code. Option C wrongly concludes that nothing is wrong because no code is shown; the snippet is simply not visible here. Option D might also be correct, but again it cannot be confirmed without the code.
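For reference, here is a sketch of a policy in the shape option D describes, paired with a plain-Python simulation of its CASE expression. It is illustrative only and is not the snippet the question refers to: the policy name 'email_mask' and the placeholder '*****' are hypothetical, and the simulation treats 'session_roles' as every role active in the session, including inherited roles, since IS_ROLE_IN_SESSION checks the session's active role hierarchy.

```python
# Hypothetical DDL in Snowflake masking-policy syntax; shown for reference, not executed.
EMAIL_MASK_DDL = """\
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN IS_ROLE_IN_SESSION('DATA_ADMIN') THEN val ELSE '*****' END;
ALTER TABLE CUSTOMER MODIFY COLUMN EMAIL SET MASKING POLICY email_mask;
"""

MASKED_VALUE = "*****"  # hypothetical placeholder returned to non-admin roles


def apply_email_mask(val: str, session_roles: set[str]) -> str:
    """Simulate the CASE expression above.

    session_roles should contain every role active in the session,
    including inherited ones, mirroring IS_ROLE_IN_SESSION.
    """
    if "DATA_ADMIN" in session_roles:
        return val
    return MASKED_VALUE


print(apply_email_mask("jane@example.com", {"DATA_ADMIN"}))  # jane@example.com
print(apply_email_mask("jane@example.com", {"ANALYST"}))     # *****
```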
NEW QUESTION # 144
You are using Snowpark Python to transform a DataFrame 'df_orders' containing order data. You need to filter the DataFrame to include only orders with a total amount greater than $1000 that were placed within the last 30 days. Assume the DataFrame has columns 'order_id', 'order_date' (timestamp), and 'total_amount' (numeric). Which of the following code snippets is the MOST efficient and correct way to achieve this filtering using Snowpark?
- A. Option D
- B. Option C
- C. Option A
- D. Option E
- E. Option B
Answer: A
Explanation:
Option D is the most efficient and correct. It uses 'snowflake.snowpark.functions' to reference the columns and 'dateadd()' for date arithmetic, keeping the entire filtering logic within Snowpark. Options A and C attempt to use native Python date functions, and Option E passes a SQL string directly to the filter, bypassing Snowpark's function calls. 'filter()' and its alias are functionally equivalent in Snowpark, so the choice of method does not matter. Option B, while technically correct, uses 'dateadd' in a form better suited to Snowflake SQL than to Snowpark operations.
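The predicate that option D expresses can be checked with a plain-Python mirror over in-memory rows. This is a sketch under assumptions: it runs outside Snowflake, uses a fixed "now" for reproducibility, and the commented Snowpark expression is one typical way to write the same filter with 'col()', 'lit()', 'dateadd()', and 'current_timestamp()'. It is not the exam's hidden snippet.

```python
from datetime import datetime, timedelta

# Snowpark form of the same filter (not executed here; needs a live Session):
#   from snowflake.snowpark.functions import col, current_timestamp, dateadd, lit
#   df_orders.filter(
#       (col("total_amount") > 1000)
#       & (col("order_date") >= dateadd("day", lit(-30), current_timestamp()))
#   )


def recent_large_orders(rows, now, min_amount=1000, days=30):
    """Keep rows with total_amount > min_amount placed within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [
        r for r in rows
        if r["total_amount"] > min_amount and r["order_date"] >= cutoff
    ]


now = datetime(2024, 6, 30)
rows = [
    {"order_id": 1, "order_date": datetime(2024, 6, 20), "total_amount": 1500},
    {"order_id": 2, "order_date": datetime(2024, 6, 25), "total_amount": 500},
    {"order_id": 3, "order_date": datetime(2024, 5, 1), "total_amount": 2000},
    {"order_id": 4, "order_date": datetime(2024, 6, 28), "total_amount": 1000},
]
print([r["order_id"] for r in recent_large_orders(rows, now)])  # [1]
```

The strict '>' excludes an order of exactly $1000, matching "greater than $1000" in the question.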
NEW QUESTION # 145
You're building a data product on the Snowflake Marketplace that includes a view that aggregates data from a table containing Personally Identifiable Information (PII). You need to ensure that consumers of your data product CANNOT directly access the underlying PII data but can only see the aggregated results from the view. What is the MOST secure and recommended approach to achieve this?
- A. Grant the 'SELECT' privilege only on the view to the share used for the Marketplace listing. Do not grant any privileges on the underlying PII table.
- B. Grant the 'SELECT' privilege directly on the underlying PII table to the share used for the Marketplace listing, along with the 'SELECT' privilege on 'sensitive_data_view'.
- C. Grant USAGE privilege on the database containing the PII table and to the share.
- D. Grant the 'READ' privilege on the internal stage containing the data files backing the PII table.
- E. Create a stored procedure that returns the aggregated data, and grant EXECUTE privilege on the stored procedure to the share. The stored procedure SELECTs from the PII table.
Answer: A
Explanation:
Granting 'SELECT' only on the view (option A) ensures that consumers can access the view but not the underlying PII data. Granting 'SELECT' on the underlying table (option B) defeats the purpose of the view. Using a stored procedure (option E), while potentially masking the data access, is less performant and can still expose data if not carefully implemented. The 'USAGE' privilege (option C) only allows access to the database, not the data itself. 'READ' on the stage (option D) allows direct access to the raw data, which exposes the PII.
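The grants in option A can be sketched as the statements a provider might run. The share, database, schema, and view names below are hypothetical. Two points from Snowflake's sharing model are worth keeping in mind: the share still needs USAGE on the database and schema (necessary, but as option C shows, not sufficient on its own), and the view must be created as a secure view ('CREATE SECURE VIEW') before it can be added to a share.

```python
def share_view_statements(share: str, database: str, schema: str, view: str) -> list[str]:
    """Grants that expose only an aggregated secure view through a share.

    No statement grants anything on the underlying PII table.
    """
    fq_schema = f"{database}.{schema}"
    return [
        f"CREATE SHARE {share};",
        # USAGE lets the share resolve the container objects; it exposes no rows.
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {fq_schema} TO SHARE {share};",
        # The only data-bearing grant: SELECT on the (secure) aggregated view.
        f"GRANT SELECT ON VIEW {fq_schema}.{view} TO SHARE {share};",
    ]


for stmt in share_view_statements("mkt_share", "sales_db", "public", "sensitive_data_view"):
    print(stmt)
```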
NEW QUESTION # 146
You have a requirement to continuously load data from a cloud storage location into a Snowflake table. The source data is in Avro format and is being appended to the cloud storage location frequently. You want to automate this process using Snowpipe. You've already created the Snowpipe and the associated stage and file format. However, you notice that some files are being skipped during the ingestion process, and data is missing in your Snowflake table. What is the MOST likely reason for this issue, assuming all necessary permissions and configurations (stage, file format, pipe definition) are correctly set up?
- A. The cloud storage event notifications are not properly configured to trigger Snowpipe.
- B. The Snowpipe is paused due to exceeding the daily quota.
- C. The file format definition in Snowflake is incompatible with the Avro schema.
- D. Snowflake does not support Avro format for Snowpipe.
- E. The data files in cloud storage are not being automatically detected by Snowpipe.
Answer: A
Explanation:
Option A is the most likely reason. Snowpipe auto-ingest relies on cloud storage event notifications (e.g., SQS for AWS S3, Event Grid for Azure Blob Storage, Pub/Sub for Google Cloud Storage) to trigger data ingestion. If these notifications are not properly configured, Snowpipe is never notified that new files have been added to the storage location, so those files are skipped. Option B is possible, but less likely if the pipe was just created and initial data loading is failing. Option E is incorrect as stated; Snowpipe detects files through event notifications, not by continuously scanning the storage location. Option C could be an issue, but the question states that the file format is correctly set up. Option D is incorrect; Snowpipe does support the Avro format.
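When files are skipped, a few Snowflake commands help confirm the event-notification diagnosis. The sketch below only assembles them as strings; the pipe name is hypothetical, and the comments summarize what each command is generally used for rather than quoting its documentation.

```python
def snowpipe_diagnostics(pipe: str) -> list[str]:
    """Statements for checking an auto-ingest pipe that appears to skip files."""
    return [
        # The notification_channel column shows the queue that the bucket's
        # event notifications must target (for S3, an SQS queue ARN).
        f"DESC PIPE {pipe};",
        # Returns JSON describing the pipe's execution state and pending files.
        f"SELECT SYSTEM$PIPE_STATUS('{pipe}');",
        # After fixing notifications, queues recently staged files that were missed.
        f"ALTER PIPE {pipe} REFRESH;",
    ]


for stmt in snowpipe_diagnostics("avro_pipe"):
    print(stmt)
```

If the pipe reports a notification channel but no bucket event is configured to send to it, new files never reach the pipe, which is the situation option A describes.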
NEW QUESTION # 147
......
DEA-C02 Study Material, Preparation Guide and PDF Download: https://pass4itsure.passleadervce.com/SnowPro-Advanced/reliable-DEA-C02-exam-learning-guide.html