A problem using CarbonData (a columnar database similar to Hive)

Loading data with spark-shell:

carbon.sql("load data inpath '/test/part-r-00023.gz' into table log_t options('DELIMITER'=',','MULTILINE'='true','SKIP_EMPTY_LINE'='TRUE','SINGLE_PASS'='TRUE','header'='false','fileheader'='ip,endtime,tingliu_time,deal_ip,jsonstring')")

The load fails with the following error, which says the maximum string length was exceeded:
org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: Length of parsed input (32001) exceeds the maximum number of characters defined in your parser settings (32000).
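
One way to narrow this down is to check which input rows actually carry a field longer than the 32000 characters named in the error. The sketch below is a hypothetical helper, not part of CarbonData: `OversizedFields`, `gzipLines` and `find` are names invented for illustration. It assumes a local copy of the file, the five columns listed in `fileheader`, and a naive split on the delimiter that ignores quoting and fields spanning several lines, so treat it as a first pass only.

```scala
import java.io.FileInputStream
import java.util.regex.Pattern
import java.util.zip.GZIPInputStream
import scala.io.Source

// Hypothetical diagnostic helper; it is not part of CarbonData.
object OversizedFields {
  // Limit quoted in the error message (32000 characters).
  val MaxChars = 32000

  // Reads a gzip-compressed text file line by line (assumes UTF-8).
  def gzipLines(path: String): Iterator[String] =
    Source.fromInputStream(new GZIPInputStream(new FileInputStream(path)), "UTF-8").getLines()

  // Returns (1-based line number, 0-based field index, field length)
  // for every field longer than `limit`. Splitting stops after `columns`
  // fields, so delimiters inside the last column stay in that column.
  def find(lines: Iterator[String],
           delimiter: Char = ',',
           columns: Int = 5,
           limit: Int = MaxChars): List[(Int, Int, Int)] =
    lines.zipWithIndex.flatMap { case (line, lineIdx) =>
      line.split(Pattern.quote(delimiter.toString), columns).toList.zipWithIndex.collect {
        case (field, fieldIdx) if field.length > limit => (lineIdx + 1, fieldIdx, field.length)
      }
    }.toList
}
```

With a local copy of the file, `OversizedFields.find(OversizedFields.gzipLines(path))` lists the offending rows. If it reports nothing, the oversized value may come from an unbalanced quote or a field that spans several lines, which this naive split does not detect.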

不可替代 · posted 2018-4-8 18:09:19

## Understanding limit:
sql(
  s"""
     | CREATE TABLE boolean_table(
     | intField INT,
     | booleanField BOOLEAN,
     | stringField STRING,
     | doubleField DOUBLE,
     | booleanField2 BOOLEAN
     | )
     | STORED BY 'carbondata'
     | TBLPROPERTIES('sort_columns'='booleanField,booleanField2')
   """.stripMargin)

for (i <- 0 until 100) {
  sql("select * from boolean_table limit 3").show()
}

limit operates per blocklet. A page holds 32000 rows by default and a blocklet is 64 MB by default, but a blocklet is only filled to roughly 80-90% before the writer starts another one.
So once the data file grows beyond roughly 50 MB, it spans more than one blocklet, and a select with limit may return different rows from one run to the next.
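
The figures above can be checked with a little arithmetic: 64 MB at an 80% fill ratio gives about 51 MB, which matches the "roughly 50 MB" threshold. The sketch below is plain Scala with no CarbonData calls. `LimitSketch` and its functions are hypothetical names, and the model assumes every blocklet is filled to the same ratio and that the order in which blocklets are read is not fixed between runs; the post does not state either point.

```scala
// Toy model of the behaviour described above; not CarbonData code.
object LimitSketch {
  // Default blocklet size quoted in the post, in MB.
  val BlockletMb = 64.0

  // Size at which the writer rolls over to a new blocklet.
  def rolloverMb(fillRatio: Double): Double = BlockletMb * fillRatio

  // Blocklets needed for a file, assuming a uniform fill ratio.
  def blockletCount(fileMb: Double, fillRatio: Double): Int =
    math.max(1, math.ceil(fileMb / rolloverMb(fillRatio)).toInt)

  // Models "limit n" without ORDER BY: rows are taken from the
  // blocklets in whatever order they happen to be read.
  def limit[A](blocklets: Seq[Seq[A]], readOrder: Seq[Int], n: Int): Seq[A] =
    readOrder.flatMap(i => blocklets(i)).take(n)
}
```

In this model a 40 MB file fits in one blocklet, so `limit 3` always sees the same rows, while a 60 MB file needs two and the result depends on which blocklet is read first. Adding an ORDER BY to the query is the usual way to get a stable result from limit in SQL.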

liuyou2036 · posted 2020-7-17 10:23:47
