Documentation | Pancake

Pancake Quick Start Guide

Copy and paste into a Snowflake worksheet

Global Privileges and Warehouse Set Up Script

Script Placeholders:

<application_install_role> (x1)
<accountadmin_or_equivalent_role> (x2)
<warehouse_create_role> (x2)

Script Steps

#1 - Grant access to Pancake to other roles in your organization
#2 - Grant access to Pancake to execute tasks in your account to read and process data.
#3 - Create the Pancake Warehouses and grant the USAGE privilege to Pancake
#4 - Create a temporary table used to store the Pancake Procedure call results used to add warehouses to Pancake
#5 - Add the warehouse information to Pancake.

--#1 - Execute the following SQL statement to grant your ACCOUNTADMIN role or any other role that can grant the EXECUTE TASK privilege on your account to PANCAKE so that role can complete the next step. If you are installing Pancake with the ACCOUNTADMIN role then grant access to other roles that will need access to Pancake such as the role used to create warehouses.

USE ROLE <application_install_role>;
GRANT APPLICATION ROLE PANCAKE.app_public TO ROLE <accountadmin_or_equivalent_role>;
GRANT APPLICATION ROLE PANCAKE.app_public TO ROLE <warehouse_create_role>;

--optional
GRANT APPLICATION ROLE PANCAKE.app_public TO ROLE <other_roles_that_need_access_to_pancake>;

--#2 - Execute this SQL to grant the EXECUTE TASK PRIVILEGE to PANCAKE. Tasks are used by PANCAKE to scan data.

USE ROLE <accountadmin_or_equivalent_role>;

GRANT EXECUTE TASK ON ACCOUNT TO APPLICATION PANCAKE;
GRANT CREATE WAREHOUSE ON ACCOUNT TO APPLICATION PANCAKE;

--#3 - To perform the next set of actions please use the role that has access to manage and create Snowflake virtual warehouses.

USE ROLE <warehouse_create_role>;

--Execute the following SQL to create Snowflake virtual warehouses that will be used by Pancake to scan semi-structured data.
--An X-Small virtual warehouse will be used to create data sources and complete quick scans with a limit 1000 or less per data source.

CREATE OR REPLACE WAREHOUSE PANCAKE_X_SMALL_01 WITH
WAREHOUSE_SIZE = 'X-Small';

--A medium sized Snowflake virtual warehouse with snowpark optimization is required to scan data because of its
--increased memory capacity. The max concurrency level is set to 1 to maximize compute and memory availability.

CREATE OR REPLACE WAREHOUSE PANCAKE_MEDIUM_OPTIMIZED_01 WITH
WAREHOUSE_SIZE = 'Medium'
WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'
MAX_CONCURRENCY_LEVEL = 1;

--Execute the following SQL to grant access to the newly created warehouse from Step #5 to the PANCAKE application.

GRANT USAGE on WAREHOUSE PANCAKE_X_SMALL_01 to APPLICATION PANCAKE;
GRANT USAGE on WAREHOUSE PANCAKE_MEDIUM_OPTIMIZED_01 to APPLICATION PANCAKE;

--#4 - Create the tempoary table to store the Pancake procedure call results

--To perform the next set of actions please use the role that was used to install the application or another role that has been granted access to the Pancake.

USE ROLE <warehouse_create_role>;

--Execute the following SQL statement to create a temporary table used to store results that are created by Pancake Procedures which are executed in later step #5.

CREATE or replace TEMPORARY TABLE PANCAKE.CORE.SETUP_RESULTS (
object_type varchar,
object_name varchar,
call_result varchar
);

--#5 - Create the following code block for each warehouse you have granted usage rights to use with PANCAKE.
--Add the newly created virtual warehouse in Step #5a and #6a to PANCAKE

DECLARE
warehouse_name VARCHAR DEFAULT 'PANCAKE_X_SMALL_01';
warehouse_size VARCHAR DEFAULT 'X-Small';
call_result VARCHAR;
BEGIN
CALL PANCAKE.CORE.ADD_WAREHOUSE(:warehouse_name, :warehouse_size) into :call_result;
insert into PANCAKE.CORE.SETUP_RESULTS values ('Warehouse',:warehouse_name,:call_result);
END;

--Add the newly created virtual warehouse in Step #5b and #6b to PANCAKE

DECLARE
warehouse_name VARCHAR DEFAULT 'PANCAKE_MEDIUM_OPTIMIZED_01';
warehouse_size VARCHAR DEFAULT 'Medium';
call_result VARCHAR;
BEGIN
CALL PANCAKE.CORE.ADD_WAREHOUSE(:warehouse_name, :warehouse_size) into :call_result;
insert into PANCAKE.CORE.SETUP_RESULTS values ('Warehouse',:warehouse_name,:call_result);
END;

--Show the results from all PANCAKE actions performed.
select * from PANCAKE.CORE.SETUP_RESULTS;

Data Object Grant Select Privileges and App Data Source Set Up Script

Script Steps
#1 - Execute select query to find variant columns in a database and produce code blocks to quickly onboard data sources to Pancake:
#2 - Grant the Select privilege to the database objects you want to scan.
#3 - Create temporary results table
#4 - Create Pancake data sources with a quick start scan configuration, then initiate the scan.

Script Placeholders:

<database_access_role> (x2)
<application_install_role> (x3)

<database_name> (x5 per database)
<schema_name> (x1 per schema)

#1 - Variant Column Query
Copy and run the following query for each database that may have database objects you wish to scan. This query is designed to filter for database objects that have columns with a VARIANT data type. Those objects could be Tables, External Tables, Views, or Materialized Views. You can download the results of this query into Google Sheets or Microsoft Excel to access code blocks generated for each object and column combination into this worksheet.

****In order to onboard data sources into PANCAKE more quickly, you can use the pre-generated code blocks to grant access to database objects to Pancake and for executing Pancake procedures to add data sources and initiate scans.

To perform the next set of actions please use the role that has access to all of the database objects in this database.

USE ROLE <database_access_role>;

select columns.table_catalog as db_name, columns.table_schema as schema_name, columns.table_name, column_name, tables.table_type as object_type, tables.row_count as object_row_count, tables.bytes as object_total_bytes, case when databases.type = 'IMPORTED DATABASE' then
concat('GRANT IMPORTED PRIVILEGES ON DATABASE ', case when upper(columns.table_catalog) not like columns.table_catalog or rlike(columns.table_catalog, '.[0-9].*') then '"' else '' end, columns.table_catalog, case when upper(columns.table_catalog) not like columns.table_catalog or rlike(columns.table_catalog, '.[0-9].*') then '"' else '' end, ' TO APPLICATION PANCAKE;')
else
concat('GRANT REFERENCES, SELECT ON ', case when tables.table_type = 'BASE TABLE' then 'TABLE' ELSE tables.table_type end ,' ', columns.table_catalog, '.', columns.table_schema, '.', case when upper(columns.table_name) not like columns.table_name or rlike(columns.table_name, '.[0-9].*') then '"' else '' end, columns.table_name, case when upper(columns.table_name) not like columns.table_name or rlike(columns.table_name, '.[0-9].*') then '"' else '' end, ' TO APPLICATION PANCAKE;') end as grant_statement_code_block,
concat('DECLARE \r\n\t',
'datasource_name VARCHAR DEFAULT ', '''', columns.table_catalog, '.', columns.table_schema, '.', columns.table_name, '''; \r\n\t',
'object_type VARCHAR DEFAULT ', '''', case when tables.table_type = 'BASE TABLE' then 'Table' when tables.table_type ='VIEW' then 'View' when tables.table_type = 'MATERIALIZED TABLE' then 'Materialized Table' when tables.table_type = 'EXTERNAL TABLE' then 'External Table' end, '''; \r\n\t',
'database_name VARCHAR DEFAULT ', '''', case when upper(columns.table_catalog) not like columns.table_catalog or rlike(columns.table_catalog, '.[0-9].*') then '"' else '' end, columns.table_catalog, case when upper(columns.table_catalog) not like columns.table_catalog or rlike(columns.table_catalog, '.[0-9].*') then '"' else '' end, '''; \r\n\t','schema_name VARCHAR DEFAULT ', '''', case when upper(columns.table_schema) not like columns.table_schema or rlike(columns.table_schema, '.[0-9].*') then '"' else '' end, columns.table_schema, case when upper(columns.table_schema) not like columns.table_schema or rlike(columns.table_schema, '.[0-9].*') then '"' else '' end, '''; \r\n\t',
'object_name VARCHAR DEFAULT ', '''', case when upper(columns.table_name) not like columns.table_name or rlike(columns.table_name, '.[0-9].*') then '"' else '' end, columns.table_name, case when upper(columns.table_name) not like columns.table_name or rlike(columns.table_name, '.[0-9].*') then '"' else '' end, '''; \r\n\t','column_name VARCHAR DEFAULT ', '''', case when upper(column_name) not like column_name or rlike(column_name, '.[0-9].*') then '"' else '' end, column_name, case when upper(column_name) not like column_name or rlike(column_name, '.[0-9].*') then '"' else '' end, '''; \r\n\t',
'datasource_tag VARCHAR DEFAULT ', '''', '''; \r\n\t',
'warehouse_name VARCHAR DEFAULT ', '''', 'PANCAKE_X_SMALL_01', '''; \r\n\t','record_limit DEFAULT 150; \r\n\t', 'initiate_scan DEFAULT TRUE; \r\n\t', 'call_result VARCHAR;\r\n',
'BEGIN \r\n\t','CALL PANCAKE.core.add_datasource_with_scan( \r\n',':datasource_name,:object_type,:database_name,:schema_name,:object_name,:column_name, \r\n',':warehouse_name, :record_limit, :initiate_scan, :datasource_tag) into\r\n',':call_result;\r\n',
'\r\ninsert into PANCAKE.CORE.SETUP_RESULTS values (:object_type,:object_name,:call_result);\r\n','\r\nEND;') as create_datasource_code_block
from <database_name>.information_schema.columns inner join <database_name>.information_schema.tables on columns.table_catalog = tables.table_catalog and
columns.table_schema = tables.table_schema and columns.table_name = tables.table_name inner join <database_name>.information_schema.databases on tables.table_catalog = databases.database_name
where data_type = 'VARIANT' and is_dynamic = 'NO';

--#2 - Grant the Select privilege to the database objects that Pancake will be allowed to scan.
--To perform the next set of actions please use the role that has access to all of the database objects in this database you wish to scan.

USE ROLE <database_access_role>;

/*
The following SQL statements are needed for each database, schema and database object you wish to grant access to Pancake to. You can use the pre-generated SQL statements found in the results of the Variant Column Query in Step #1 in the grant_statement_code_block. You can optionally download the results and open the csv file in Excel or upload to Google Sheets. Copy the code blocks for the database objects you want to scan and paste over the placeholder.

--#Execute the following SQL for each internal/external database that has database objects you wish to scan and discover.

--If your database was imported or from a share these next two statements are not required.
GRANT USAGE ON DATABASE <database_name> TO APPLICATION PANCAKE;
GRANT USAGE ON SCHEMA <database_name>.<schema_name> TO APPLICATION PANCAKE;

--#3 - Create the tempoary table to store the Pancake procedure call results

--To perform the next set of actions please use the role that was used to install the application or another role that has been granted access to the Pancake.

USE ROLE <application_install_role>;

--Execute the following SQL statement to create a temporary table used to store results that are created by Pancake Procedures which are executed in later steps (#9 and #15).

CREATE or replace TEMPORARY TABLE PANCAKE.CORE.SETUP_RESULTS (
object_type varchar,
object_name varchar,
call_result varchar
);

--paste the SQL grant statements from the Variant Column Query in Step #1 found in the "grant_statement_code_block" column.

#4 - Add data sources to Pancake.

The following section defines each of the components of the SQL Anonymous Code Block used to add database sources to Pancake.

DATA SOURCE NAME - a user defined name for the data source
OBJECT TYPE (can be a Table, View, Materialized View, or External Table)
DATABASE NAME
SCHEMA NAME
OBJECT NAME (the name of the table, view, materialized view, or external table)
OBJECT ATTRIBUTE NAME - the name of the VARIANT column
OBJECT UNIQUE ATTRIBUTE NAME - the name of the attribute at the root level that is unique across the entire dataset
WAREHOUSE NAME - the name of the warehouse created in the first step of this process
RECORD LIMIT - The record limit of the number of rows to scan (quick scan)
INITIATE SCAN - Whether or not to initiate the scan after it has been added to pancake. This quick scan will determine the records per minute metric to be used in creating a scan configuration to complete the full scan.

copy and paste the SQL code block from the Variant Column Query in Step #1 stored in the "create_datasource_code_block" column. To use the Variant Column Query in Step #1, you can optionally download the results and open the csv file in Excel or upload to Google Sheets. Copy the code blocks for the database objects you want to scan and paste into an editor. Remove any double quotes. Then copy and paste the code blocks over the placeholder text below.

After pasting the code blocks you can modify the unique_attribute_name which is required and datasource_tag which is optional.

To perform the next set of actions please use the role that was used to install the application or another role that has been granted access to the Pancake.

USE ROLE <application_install_role>;

--Show the results from all PANCAKE actions performed.
select * from PANCAKE.CORE.SETUP_RESULTS;