<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Sparksql on ZRJ | 学习笔记</title>
        <link>https://blog.zrj.me/tags/sparksql/</link>
        <description>Recent content in Sparksql on ZRJ | 学习笔记</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>zh-CN</language>
        <lastBuildDate>Thu, 08 Nov 2018 15:23:18 +0800</lastBuildDate><atom:link href="https://blog.zrj.me/tags/sparksql/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>多维聚合函数</title>
        <link>https://blog.zrj.me/posts/2018-11-08-%E5%A4%9A%E7%BB%B4%E8%81%9A%E5%90%88%E5%87%BD%E6%95%B0/</link>
        <pubDate>Thu, 08 Nov 2018 15:23:18 +0800</pubDate>
        
        <guid>https://blog.zrj.me/posts/2018-11-08-%E5%A4%9A%E7%BB%B4%E8%81%9A%E5%90%88%E5%87%BD%E6%95%B0/</guid>
        <description>&lt;p&gt;使用 SQL 的时候，可以用多维聚合函数，有 grouping set, cube, roll up，具体区别可以参考这里 &lt;a class=&#34;link&#34; href=&#34;https://webcache.googleusercontent.com/search?q=cache:SVQdbpNdejYJ:https://blog.csdn.net/jiangshouzhuang/article/details/51057616&amp;#43;&amp;amp;cd=3&amp;amp;hl=zh-CN&amp;amp;ct=clnk&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://webcache.googleusercontent.com/search?q=cache:SVQdbpNdejYJ:https://blog.csdn.net/jiangshouzhuang/article/details/51057616+&amp;cd=3&amp;hl=zh-CN&amp;ct=clnk&lt;/a&gt; 这个原文挂了，要看 google cahche，摘录如下&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Hive分析窗口函数之GROUPING SETS，CUBE和ROLLUP 2016年04月04日 18:06:46 Free World 阅读数：2629 个人分类： Hive 所属专栏： Hive实战 版权声明：本文为博主原创文章，未经博主允许不得转载。 &lt;a class=&#34;link&#34; href=&#34;https://blog.csdn.net/jiangshouzhuang/article/details/51057616&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://blog.csdn.net/jiangshouzhuang/article/details/51057616&lt;/a&gt; 这几个分析函数通常用于OLAP中，不能累加，而且需要根据不同维度上钻和下钻的指标统计。&lt;/p&gt;
&lt;p&gt;环境信息：&lt;/p&gt;
&lt;p&gt;Hive版本为apache-hive-0.14.0-bin Hadoop版本为hadoop-2.6.0 Tez版本为tez-0.7.0&lt;/p&gt;
&lt;p&gt;数据：&lt;/p&gt;
&lt;p&gt;2016-03,2016-03-10,user1&lt;/p&gt;
&lt;p&gt;2016-03,2016-03-10,user5&lt;/p&gt;
&lt;p&gt;2016-03,2016-03-12,user7&lt;/p&gt;
&lt;p&gt;2016-04,2016-04-12,user3&lt;/p&gt;
&lt;p&gt;2016-04,2016-04-13,user2&lt;/p&gt;
&lt;p&gt;2016-04,2016-04-13,user4&lt;/p&gt;
&lt;p&gt;2016-04,2016-04-16,user4&lt;/p&gt;
&lt;p&gt;2016-03,2016-03-10,user2&lt;/p&gt;
&lt;p&gt;2016-03,2016-03-10,user3&lt;/p&gt;
&lt;p&gt;2016-04,2016-04-12,user5&lt;/p&gt;
&lt;p&gt;2016-04,2016-04-13,user6&lt;/p&gt;
&lt;p&gt;2016-04,2016-04-15,user3&lt;/p&gt;
&lt;p&gt;2016-04,2016-04-15,user2&lt;/p&gt;
&lt;p&gt;2016-04,2016-04-16,user1&lt;/p&gt;
&lt;p&gt;创建表：&lt;/p&gt;
&lt;p&gt;CREATE TABLE windows_gcr (&lt;/p&gt;
&lt;p&gt;op_month STRING,&lt;/p&gt;
&lt;p&gt;op_day STRING,&lt;/p&gt;
&lt;p&gt;userno STRING&lt;/p&gt;
&lt;p&gt;) ROW FORMAT DELIMITED&lt;/p&gt;
&lt;p&gt;FIELDS TERMINATED BY &amp;lsquo;,&amp;rsquo;&lt;/p&gt;
&lt;p&gt;stored as textfile;&lt;/p&gt;
&lt;p&gt;将数据导入Hive表中：&lt;/p&gt;
&lt;p&gt;load data local inpath &amp;lsquo;/home/hadoop/testhivedata/windows_gcr.txt&amp;rsquo; into table windows_gcr;&lt;/p&gt;
&lt;p&gt;1. GROUPING SETS&lt;/p&gt;
&lt;p&gt;在一个GROUP BY查询中，根据不同的维度组合进行聚合，等价于将不同维度的GROUP BY结果集进行UNION ALL&lt;/p&gt;
&lt;p&gt;按照op_month和op_day维度进行聚合:&lt;/p&gt;
&lt;p&gt;SELECT&lt;/p&gt;
&lt;p&gt;op_month,&lt;/p&gt;
&lt;p&gt;op_day,&lt;/p&gt;
&lt;p&gt;COUNT(DISTINCT userno) AS userno_count,&lt;/p&gt;
&lt;p&gt;GROUPING__ID&lt;/p&gt;
&lt;p&gt;FROM windows_gcr&lt;/p&gt;
&lt;p&gt;GROUP BY op_month,op_day&lt;/p&gt;
&lt;p&gt;GROUPING SETS (op_month,op_day)&lt;/p&gt;
&lt;p&gt;ORDER BY GROUPING__ID;&lt;/p&gt;
&lt;p&gt;结果：&lt;/p&gt;
&lt;p&gt;op_month op_day userno_count grouping__id&lt;/p&gt;
&lt;p&gt;2016-03 NULL 5 1&lt;/p&gt;
&lt;p&gt;2016-04 NULL 6 1&lt;/p&gt;
&lt;p&gt;NULL 2016-03-10 4 2&lt;/p&gt;
&lt;p&gt;NULL 2016-03-12 1 2&lt;/p&gt;
&lt;p&gt;NULL 2016-04-12 2 2&lt;/p&gt;
&lt;p&gt;NULL 2016-04-13 3 2&lt;/p&gt;
&lt;p&gt;NULL 2016-04-15 2 2&lt;/p&gt;
&lt;p&gt;NULL 2016-04-16 2 2&lt;/p&gt;
&lt;p&gt;其中的 GROUPING__ID，表示结果属于哪一个分组集合。&lt;/p&gt;
&lt;p&gt;等价于&lt;/p&gt;
&lt;p&gt;SELECT op_month,NULL asop_day,COUNT(DISTINCT userno) AS userno_count,1 AS GROUPING__ID FROM windows_gcr GROUP BY op_month&lt;/p&gt;
&lt;p&gt;UNION ALL&lt;/p&gt;
&lt;p&gt;SELECT NULL asop_month,op_day,COUNT(DISTINCT userno) AS userno_count,2 AS GROUPING__ID FROM windows_gcr GROUP BY op_day;&lt;/p&gt;
&lt;p&gt;再看一个例子：&lt;/p&gt;
&lt;p&gt;SELECT&lt;/p&gt;
&lt;p&gt;op_month,&lt;/p&gt;
&lt;p&gt;op_day,&lt;/p&gt;
&lt;p&gt;COUNT(DISTINCT userno) AS userno_count,&lt;/p&gt;
&lt;p&gt;GROUPING__ID&lt;/p&gt;
&lt;p&gt;FROM windows_gcr&lt;/p&gt;
&lt;p&gt;GROUP BY op_month,op_day&lt;/p&gt;
&lt;p&gt;GROUPING SETS(op_month,op_day,(op_month,op_day))&lt;/p&gt;
&lt;p&gt;ORDER BY GROUPING__ID;&lt;/p&gt;
&lt;p&gt;结果：&lt;/p&gt;
&lt;p&gt;op_month op_day userno_count grouping__id&lt;/p&gt;
&lt;p&gt;2016-04 NULL 6 1&lt;/p&gt;
&lt;p&gt;2016-03 NULL 5 1&lt;/p&gt;
&lt;p&gt;NULL 2016-03-12 1 2&lt;/p&gt;
&lt;p&gt;NULL 2016-04-12 2 2&lt;/p&gt;
&lt;p&gt;NULL 2016-04-13 3 2&lt;/p&gt;
&lt;p&gt;NULL 2016-04-15 2 2&lt;/p&gt;
&lt;p&gt;NULL 2016-04-16 2 2&lt;/p&gt;
&lt;p&gt;NULL 2016-03-10 4 2&lt;/p&gt;
&lt;p&gt;2016-03 2016-03-10 4 3&lt;/p&gt;
&lt;p&gt;2016-03 2016-03-12 1 3&lt;/p&gt;
&lt;p&gt;2016-04 2016-04-12 2 3&lt;/p&gt;
&lt;p&gt;2016-04 2016-04-13 3 3&lt;/p&gt;
&lt;p&gt;2016-04 2016-04-15 2 3&lt;/p&gt;
&lt;p&gt;2016-04 2016-04-16 2 3&lt;/p&gt;
&lt;p&gt;等价于&lt;/p&gt;
&lt;p&gt;SELECT op_month,NULL asop_day,COUNT(DISTINCT userno) AS userno_count,1 AS GROUPING__ID FROM windows_gcr GROUP BY op_month&lt;/p&gt;
&lt;p&gt;UNION ALL&lt;/p&gt;
&lt;p&gt;SELECT NULL asop_month,op_day,COUNT(DISTINCT userno) AS userno_count,2 AS GROUPING__ID FROM windows_gcr GROUP BY op_day&lt;/p&gt;
&lt;p&gt;UNION ALL&lt;/p&gt;
&lt;p&gt;SELECT op_month,op_day,COUNT(DISTINCTuserno) AS userno_count,3 AS GROUPING__ID FROM windows_gcr GROUP BY op_month,op_day;&lt;/p&gt;
&lt;p&gt;2. CUBE&lt;/p&gt;
&lt;p&gt;根据GROUP BY的维度的所有组合进行聚合。&lt;/p&gt;
&lt;p&gt;SELECT&lt;/p&gt;
&lt;p&gt;op_month,&lt;/p&gt;
&lt;p&gt;op_day,&lt;/p&gt;
&lt;p&gt;COUNT(DISTINCT userno) AS userno_count,&lt;/p&gt;
&lt;p&gt;GROUPING__ID&lt;/p&gt;
&lt;p&gt;FROM windows_gcr&lt;/p&gt;
&lt;p&gt;GROUP BY op_month,op_day&lt;/p&gt;
&lt;p&gt;WITH CUBE&lt;/p&gt;
&lt;p&gt;ORDER BY GROUPING__ID;&lt;/p&gt;
&lt;p&gt;结果：&lt;/p&gt;
&lt;p&gt;op_month op_day userno_count grouping__id&lt;/p&gt;
&lt;p&gt;NULL NULL 7 0&lt;/p&gt;
&lt;p&gt;2016-03 NULL 5 1&lt;/p&gt;
&lt;p&gt;2016-04 NULL 6 1&lt;/p&gt;
&lt;p&gt;NULL 2016-04-12 2 2&lt;/p&gt;
&lt;p&gt;NULL 2016-04-13 3 2&lt;/p&gt;
&lt;p&gt;NULL 2016-04-15 2 2&lt;/p&gt;
&lt;p&gt;NULL 2016-04-16 2 2&lt;/p&gt;
&lt;p&gt;NULL 2016-03-10 4 2&lt;/p&gt;
&lt;p&gt;NULL 2016-03-12 1 2&lt;/p&gt;
&lt;p&gt;2016-03 2016-03-10 4 3&lt;/p&gt;
&lt;p&gt;2016-03 2016-03-12 1 3&lt;/p&gt;
&lt;p&gt;2016-04 2016-04-16 2 3&lt;/p&gt;
&lt;p&gt;2016-04 2016-04-12 2 3&lt;/p&gt;
&lt;p&gt;2016-04 2016-04-13 3 3&lt;/p&gt;
&lt;p&gt;2016-04 2016-04-15 2 3&lt;/p&gt;
&lt;p&gt;等价于&lt;/p&gt;
&lt;p&gt;SELECT NULL as op_month,NULL asop_day,COUNT(DISTINCT userno) AS userno_count,0 AS GROUPING__ID FROM windows_gcr&lt;/p&gt;
&lt;p&gt;UNION ALL&lt;/p&gt;
&lt;p&gt;SELECT op_month,NULL asop_day,COUNT(DISTINCT userno) AS userno_count,1 AS GROUPING__ID FROM windows_gcr GROUP BY op_month&lt;/p&gt;
&lt;p&gt;UNION ALL&lt;/p&gt;
&lt;p&gt;SELECT NULL asop_month,op_day,COUNT(DISTINCT userno) AS userno_count,2 AS GROUPING__ID FROM windows_gcr GROUP BY op_day&lt;/p&gt;
&lt;p&gt;UNION ALL&lt;/p&gt;
&lt;p&gt;SELECT op_month,op_day,COUNT(DISTINCTuserno) AS userno_count,3 AS GROUPING__ID FROM windows_gcr GROUP BY op_month,op_day&lt;/p&gt;
&lt;p&gt;3. ROLLUP&lt;/p&gt;
&lt;p&gt;是CUBE的子集，以最左侧的维度为主，从该维度进行层级聚合。&lt;/p&gt;
&lt;p&gt;比如，以op_month维度进行层级聚合：&lt;/p&gt;
&lt;p&gt;SELECT&lt;/p&gt;
&lt;p&gt;op_month,&lt;/p&gt;
&lt;p&gt;op_day,&lt;/p&gt;
&lt;p&gt;COUNT(DISTINCT userno) AS userno_count,&lt;/p&gt;
&lt;p&gt;GROUPING__ID&lt;/p&gt;
&lt;p&gt;FROM windows_gcr&lt;/p&gt;
&lt;p&gt;GROUP BY op_month,op_day&lt;/p&gt;
&lt;p&gt;WITH ROLLUP&lt;/p&gt;
&lt;p&gt;ORDER BY GROUPING__ID;&lt;/p&gt;
&lt;p&gt;结果：&lt;/p&gt;
&lt;p&gt;op_month op_day userno_count grouping__id&lt;/p&gt;
&lt;p&gt;NULL NULL 7 0&lt;/p&gt;
&lt;p&gt;2016-03 NULL 5 1&lt;/p&gt;
&lt;p&gt;2016-04 NULL 6 1&lt;/p&gt;
&lt;p&gt;2016-03 2016-03-10 4 3&lt;/p&gt;
&lt;p&gt;2016-03 2016-03-12 1 3&lt;/p&gt;
&lt;p&gt;2016-04 2016-04-12 2 3&lt;/p&gt;
&lt;p&gt;2016-04 2016-04-13 3 3&lt;/p&gt;
&lt;p&gt;2016-04 2016-04-15 2 3&lt;/p&gt;
&lt;p&gt;2016-04 2016-04-16 2 3&lt;/p&gt;
&lt;p&gt;可以实现这样的上钻过程：&lt;/p&gt;
&lt;p&gt;月天的用户数-&amp;gt;月的用户数-&amp;gt;总用户数&lt;/p&gt;
&lt;p&gt;--把op_month和op_day调换顺序，则以op_day维度进行层级聚合：&lt;/p&gt;
&lt;p&gt;SELECT&lt;/p&gt;
&lt;p&gt;op_month,&lt;/p&gt;
&lt;p&gt;op_day,&lt;/p&gt;
&lt;p&gt;COUNT(DISTINCT userno) AS userno_count,&lt;/p&gt;
&lt;p&gt;GROUPING__ID&lt;/p&gt;
&lt;p&gt;FROM windows_gcr&lt;/p&gt;
&lt;p&gt;GROUP BY op_day,op_month&lt;/p&gt;
&lt;p&gt;WITH ROLLUP&lt;/p&gt;
&lt;p&gt;ORDER BY GROUPING__ID;&lt;/p&gt;
&lt;p&gt;结果：&lt;/p&gt;
&lt;p&gt;op_month op_day userno_count grouping__id&lt;/p&gt;
&lt;p&gt;NULL NULL 7 0&lt;/p&gt;
&lt;p&gt;NULL 2016-04-13 3 1&lt;/p&gt;
&lt;p&gt;NULL 2016-03-12 1 1&lt;/p&gt;
&lt;p&gt;NULL 2016-04-15 2 1&lt;/p&gt;
&lt;p&gt;NULL 2016-03-10 4 1&lt;/p&gt;
&lt;p&gt;NULL 2016-04-16 2 1&lt;/p&gt;
&lt;p&gt;NULL 2016-04-12 2 1&lt;/p&gt;
&lt;p&gt;2016-04 2016-04-12 2 3&lt;/p&gt;
&lt;p&gt;2016-03 2016-03-10 4 3&lt;/p&gt;
&lt;p&gt;2016-03 2016-03-12 1 3&lt;/p&gt;
&lt;p&gt;2016-04 2016-04-13 3 3&lt;/p&gt;
&lt;p&gt;2016-04 2016-04-15 2 3&lt;/p&gt;
&lt;p&gt;2016-04 2016-04-16 2 3&lt;/p&gt;
&lt;p&gt;可以实现这样的上钻过程：&lt;/p&gt;
&lt;p&gt;天月的UV-&amp;gt;天的UV-&amp;gt;总UV&lt;/p&gt;
&lt;p&gt;具体可以访问Hive官网：&lt;/p&gt;
&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://cwiki.apache.org/confluence/display/Hive/Enhanced&amp;#43;Aggregation%2C&amp;#43;Cube%2C&amp;#43;Grouping&amp;#43;and&amp;#43;Rollup&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;另外还有这篇 &lt;a class=&#34;link&#34; href=&#34;https://blog.csdn.net/moon&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://blog.csdn.net/moon&lt;/a&gt;_yang_bj/article/details/17200367&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;GROUPING SETS GROUPING SETS作为GROUP BY的子句，允许开发人员在GROUP BY语句后面指定多个统计选项，可以简单理解为多条group by语句通过union all把查询结果聚合起来结合起来，下面是几个实例可以帮助我们了解，&lt;/p&gt;
&lt;p&gt;以acorn_3g.test_xinyan_reg为例：&lt;/p&gt;
&lt;p&gt;[dp@YZSJHL19-87 xjob]$ hive -e &amp;ldquo;use acorn_3g;desc test_xinyan_reg;&amp;rdquo; user_id                 bigint                  None                 device_id               int                     None   手机，平板              os_id                   int                     None   操作系统类型              app_id                  int                     None   手机app_id              client_version          string                  None   客户端版本              from_id                 int                     None  四级渠道 几个demo帮助大家了解：&lt;/p&gt;
&lt;p&gt;grouping sets语句 等价hive语句 select device_id,os_id,app_id,count(user_id) from  test_xinyan_reg group by device_id,os_id,app_id grouping sets((device_id))  SELECT device_id,null,null,count(user_id) FROM test_xinyan_reg group by device_id select device_id,os_id,app_id,count(user_id) from  test_xinyan_reg group by device_id,os_id,app_id grouping sets((device_id,os_id)) SELECT device_id,os_id,null,count(user_id) FROM test_xinyan_reg group by device_id,os_id select device_id,os_id,app_id,count(user_id) from  test_xinyan_reg group by device_id,os_id,app_id grouping sets((device_id,os_id),(device_id)) SELECT device_id,os_id,null,count(user_id) FROM test_xinyan_reg group by device_id,os_id  UNION ALL  SELECT device_id,null,null,count(user_id) FROM test_xinyan_reg group by device_id select device_id,os_id,app_id,count(user_id) from  test_xinyan_reg group by device_id,os_id,app_id grouping sets((device_id),(os_id),(device_id,os_id),()) SELECT device_id,null,null,count(user_id) FROM test_xinyan_reg group by device_id  UNION ALL  SELECT null,os_id,null,count(user_id) FROM test_xinyan_reg group by os_id  UNION ALL  SELECT device_id,os_id,null,count(user_id) FROM test_xinyan_reg group by device_id,os_id   UNION ALL  SELECT null,null,null,count(user_id) FROM test_xinyan_reg CUBE函数 cube简称数据魔方，可以实现hive多个任意维度的查询，cube(a,b,c)则首先会对(a,b,c)进行group by，然后依次是(a,b),(a,c),(a),(b,c),(b),(c),最后在对全表进行group by，他会统计所选列中值的所有组合的聚合&lt;/p&gt;
&lt;p&gt;select device_id,os_id,app_id,client_version,from_id,count(user_id)  from test_xinyan_reg  group by device_id,os_id,app_id,client_version,from_id with cube; 手工实现需要写的hql语句（写个程序自己生成的，手写累死）：&lt;/p&gt;
&lt;p&gt;SELECT device_id,null,null,null,null ,count(user_id) FROM test_xinyan_reg group by device_id UNION ALL SELECT null,os_id,null,null,null ,count(user_id) FROM test_xinyan_reg group by os_id UNION ALL SELECT device_id,os_id,null,null,null ,count(user_id) FROM test_xinyan_reg group by device_id,os_id UNION ALL SELECT null,null,app_id,null,null ,count(user_id) FROM test_xinyan_reg group by app_id UNION ALL SELECT device_id,null,app_id,null,null ,count(user_id) FROM test_xinyan_reg group by device_id,app_id UNION ALL SELECT null,os_id,app_id,null,null ,count(user_id) FROM test_xinyan_reg group by os_id,app_id UNION ALL SELECT device_id,os_id,app_id,null,null ,count(user_id) FROM test_xinyan_reg group by device_id,os_id,app_id UNION ALL SELECT null,null,null,client_version,null ,count(user_id) FROM test_xinyan_reg group by client_version UNION ALL SELECT device_id,null,null,client_version,null ,count(user_id) FROM test_xinyan_reg group by device_id,client_version UNION ALL SELECT null,os_id,null,client_version,null ,count(user_id) FROM test_xinyan_reg group by os_id,client_version UNION ALL SELECT device_id,os_id,null,client_version,null ,count(user_id) FROM test_xinyan_reg group by device_id,os_id,client_version UNION ALL SELECT null,null,app_id,client_version,null ,count(user_id) FROM test_xinyan_reg group by app_id,client_version UNION ALL SELECT device_id,null,app_id,client_version,null ,count(user_id) FROM test_xinyan_reg group by device_id,app_id,client_version UNION ALL SELECT null,os_id,app_id,client_version,null ,count(user_id) FROM test_xinyan_reg group by os_id,app_id,client_version UNION ALL SELECT device_id,os_id,app_id,client_version,null ,count(user_id) FROM test_xinyan_reg group by device_id,os_id,app_id,client_version UNION ALL SELECT null,null,null,null,from_id ,count(user_id) FROM test_xinyan_reg group by from_id UNION ALL SELECT device_id,null,null,null,from_id ,count(user_id) FROM test_xinyan_reg group by device_id,from_id UNION ALL SELECT null,os_id,null,null,from_id ,count(user_id) FROM test_xinyan_reg group by os_id,from_id UNION ALL SELECT device_id,os_id,null,null,from_id ,count(user_id) FROM test_xinyan_reg group by device_id,os_id,from_id UNION ALL SELECT null,null,app_id,null,from_id ,count(user_id) FROM test_xinyan_reg group by app_id,from_id UNION ALL SELECT device_id,null,app_id,null,from_id ,count(user_id) FROM test_xinyan_reg group by device_id,app_id,from_id UNION ALL SELECT null,os_id,app_id,null,from_id ,count(user_id) FROM test_xinyan_reg group by os_id,app_id,from_id UNION ALL SELECT device_id,os_id,app_id,null,from_id ,count(user_id) FROM test_xinyan_reg group by device_id,os_id,app_id,from_id UNION ALL SELECT null,null,null,client_version,from_id ,count(user_id) FROM test_xinyan_reg group by client_version,from_id UNION ALL SELECT device_id,null,null,client_version,from_id ,count(user_id) FROM test_xinyan_reg group by device_id,client_version,from_id UNION ALL SELECT null,os_id,null,client_version,from_id ,count(user_id) FROM test_xinyan_reg group by os_id,client_version,from_id UNION ALL SELECT device_id,os_id,null,client_version,from_id ,count(user_id) FROM test_xinyan_reg group by device_id,os_id,client_version,from_id UNION ALL SELECT null,null,app_id,client_version,from_id ,count(user_id) FROM test_xinyan_reg group by app_id,client_version,from_id UNION ALL SELECT device_id,null,app_id,client_version,from_id ,count(user_id) FROM test_xinyan_reg group by device_id,app_id,client_version,from_id UNION ALL SELECT null,os_id,app_id,client_version,from_id ,count(user_id) FROM test_xinyan_reg group by os_id,app_id,client_version,from_id UNION ALL SELECT device_id,os_id,app_id,client_version,from_id ,count(user_id) FROM test_xinyan_reg group by device_id,os_id,app_id,client_version,from_id UNION ALL SELECT null,null,null,null,null ,count(user_id) FROM test_xinyan_reg 看着很蛋疼是不是，体会到cube的强大了吗！(低版本hive可以通过union all方式解决，算是没有办法的办法)&lt;/p&gt;
&lt;p&gt;ROLL UP函数 rollup可以实现从右到做递减多级的统计，显示统计某一层次结构的聚合。&lt;/p&gt;
&lt;p&gt;select device_id,os_id,app_id,client_version,from_id,count(user_id)  from test_xinyan_reg  group by device_id,os_id,app_id,client_version,from_id with rollup; 等价以下sql语句：&lt;/p&gt;
&lt;p&gt;select device_id,os_id,app_id,client_version,from_id,count(user_id)  from test_xinyan_reg  group by device_id,os_id,app_id,client_version,from_id  grouping sets ((device_id,os_id,app_id,client_version,from_id),(device_id,os_id,app_id,client_version),(device_id,os_id,app_id),(device_id,os_id),(device_id),()); Grouping_ID函数 当我们没有统计某一列时，它的值显示为null，这可能与列本身就有null值冲突，这就需要一种方法区分是没有统计还是值本来就是null。（写一个排列组合的算法，就马上理解了，grouping_id其实就是所统计各列二进制和）&lt;/p&gt;
&lt;p&gt;直接拿官方文档一个例子，O(∩_∩)O哈哈~&lt;/p&gt;
&lt;p&gt;Column1 (key) Column2 (value) 1 NULL 1 1 2 2 3 3 3 NULL 4 5 hql统计：&lt;/p&gt;
&lt;p&gt; SELECT key, value, GROUPING__ID, count(*) from T1 GROUP BY key, value WITH ROLLUP 统计结果如下：&lt;/p&gt;
&lt;p&gt;        NULL NULL 0     00 6 1 NULL 1     10 2 1 NULL 3     11 1 1 1 3     11 1 2 NULL 1     10 1 2 2 3     11 1 3 NULL 1     10 2 3 NULL 3     11 1 3 3 3     11 1 4 NULL 1     10 1 4 5 3     11 1 GROUPING__ID转变为二进制，如果对应位上有值为null，说明这列本身值就是null。（通过类DataFilterNull.py 扫描，可以筛选过滤掉列中null、“”统计结果), &amp;mdash;&amp;mdash;&amp;mdash;&amp;mdash;&amp;mdash;&amp;mdash;&amp;mdash; 作者：扫大街的程序员 来源：CSDN 原文：https://blog.csdn.net/moon_yang_bj/article/details/17200367 版权声明：本文为博主原创文章，转载请附上博文链接！&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;其实简单总结，就是 cube 最全，grouping set 可以自定义，而 rollup 就是逐级上卷&lt;/p&gt;
</description>
        </item>
        
    </channel>
</rss>
