Selecting the correct Warehouse size in Snowflake can be tricky. If your queries are spilling over to the Remote Storage then you either need to re-write the query or increase the Warehouse size. But how do you identify these problematic queries that are running on undersized Warehouse? Identifying these queries is a Maximum Vector problem that be solved using Skyline queries. A Skyline query finds points in a vector space that dominate other points. Here is how a Skyline Query can be used to identify Snowflake queries that are problematic.
Skyline, also known as Pareto Frontier or Pareto Set in Business Management or Maxima in Computional Geometry, is a useful technique in multi-criteria analysis and decision support. Skyline aims to identify points in n-dimensions that are not dominated by any other point. In other words, a Skyline is formed using points that are best in at least one dimension and at least as good as other best points in every other dimension.
Skyline computation is an essential database operation that has many applications in multi-criteria decision making scenarios such as recommender systems. As such, Skyline queries have emerged as an increasingly popular tool for identifying a set of interesting objects that balance different user-specified criteria.
Skyline computation is an essential database operation that has many applications in multi-criteria decision making scenarios such as recommender systems. As such, Skyline queries have emerged as an increasingly popular tool for identifying a set of interesting objects that balance different user-specified criteria.
Sessionization or conventionally Gaps and Islands in Time-series Analysis are terms referring to the same problem of having to reset all parameters of a predictive or retrospective analysis in a time series when the time series has a gap.
Google recently added a new Table Function called RANGE_SESSIONIZE to Google BigQuery. RANGE_SESSIONIZE allows you to sessionize temporal data. Timestamps from events that either meet or overlap form one session.
RATIO_TO_REPORT SQL analytic function can be used to calculate percentage of the entire dataset or partitioned data. RATIO_TO_REPORT returns a value between 0 and 1 which indicates the weight of a numerical value in respect to the sum of all numerical values in the same partition. If multiplied by 100, the value of ratio_to_report can be interpreted as percentage.
RATIO_TO_REPORT is a powerful Snowflake Window function that lets you calculate the ratio of a value to the sum of the values in a window. The following query uses the RATIO_TO_REPORT function to calculate the percentage of the sales by channel (online vs. store) for each day.