Hive Sql从表中动态获取空列计数

我正在使用datastax spark集成和spark SQL thrift服务器,它为我提供了一个Hive SQL接口来查询Cassandra中的表.

我的数据库中的表是动态创建的,我想要做的是仅根据表名在表的每列中获取空值的计数.

我可以使用describe database.table获取列名,但在hive SQL中,如何在另一个为所有列计数null的select查询中使用其输出.

更新1:使用Dudu的解决方案回溯

Error running query: TExecuteStatementResp(status=TStatus(errorCode=0,
errorMessage=”org.apache.spark.sql.AnalysisException: Invalid usage of
‘*’ in explode/json_tuple/UDTF;”, sqlState=None,
infoMessages=[“org.apache.hive.service.cli.HiveSQLException:org.apache.spark.sql.AnalysisException:
Invalid usage of ‘
‘ in explode/json_tuple/UDTF;:16:15″,
‘org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation:org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute:SparkExecuteStatementOperation.scala:258′,
‘org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation:runInternal:SparkExecuteStatementOperation.scala:152’,
‘org.apache.hive.service.cli.operation.Operation:run:Operation.java:257′,
‘org.apache.hive.service.cli.session.HiveSessionImpl:executeStatementInternal:HiveSessionImpl.java:388’,
‘org.apache.hive.service.cli.session.HiveSessionImpl:executeStatement:HiveSessionImpl.java:369’,
‘org.apache.hive.service.cli.CLIService:executeStatement:CLIService.java:262’,
‘org.apache.hive.service.cli.thrift.ThriftCLIService:ExecuteStatement:ThriftCLIService.java:437’,
‘org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1313’,
‘org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement:getResult:TCLIService.java:1298’,
‘org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39’,
‘org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39’,
‘org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56’,
‘org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:286’,
‘java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1142’,
‘java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:617’,
‘java.lang.Thread:run:Thread.java:745’], statusCode=3),
operationHandle=None)

在以下解决方案中,不需要单独处理每个列.
结果是列索引和该列中的空值数.
您可以稍后通过列索引将其连接到从Metastore检索到的信息.
一个限制是包含确切文本null的字符串将被计为空值.

演示

CTE(由mytable定义的mytable)显然可以替换为实际表

with        mytable as 
            (
                select  stack
                        (
                            5

                           ,1   ,1.2     ,date '2017-06-21'     ,null
                           ,2   ,2.3     ,null                  ,null
                           ,3   ,null    ,null                  ,'hello'
                           ,4   ,4.5     ,null                  ,'world'
                           ,5   ,null    ,date '2017-07-22'     ,null
                        ) as (id,amt,dt,txt)
            )

select      pe.pos                                          as col_index
           ,count(case when pe.val='null' then 1 end)       as nulls_count

from        mytable t lateral view posexplode (split(printf(concat('%s',repeat('\u0001%s',field(unhex(1),t.*,unhex(1))-2)),t.*),'\\x01')) pe

group by    pe.pos       
;
+-----------+-------------+
| col_index | nulls_count |
+-----------+-------------+
|         0 |           0 |
|         1 |           2 |
|         2 |           3 |
|         3 |           3 |
+-----------+-------------+
相关文章
相关标签/搜索