Apache kylin: Würfelerzeugung scheitert bei Schritt 5 - KeyValue Größe zu groß

Ich begann Apache Kylin (Version 1.5.3) zu verwenden. Beim Erstellen eines Cubes erhalte ich einen Fehler bei Schritt 5 'Save Cuboid Statistics'. Das Protokoll sagt:Apache kylin: Würfelerzeugung scheitert bei Schritt 5 - KeyValue Größe zu groß

java.lang.IllegalArgumentException: KeyValue size too large 
at org.apache.hadoop.hbase.client.HTable.validatePut(HTable.java:1521) 
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.validatePut(BufferedMutatorImpl.java:147) 
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.doMutate(BufferedMutatorImpl.java:134) 
at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:98) 
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1038) 
at org.apache.kylin.storage.hbase.HBaseResourceStore.putResourceImpl(HBaseResourceStore.java:242) 
at org.apache.kylin.common.persistence.ResourceStore.putResource(ResourceStore.java:208) 
at org.apache.kylin.engine.mr.steps.SaveStatisticsStep.doWork(SaveStatisticsStep.java:113) 
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112) 
at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:57) 
at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:112) 
at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:127) 
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
at java.lang.Thread.run(Thread.java:745)

Zuerst habe ich versucht, den gleichen Würfel mit weniger Dimension zu erstellen, und es funktioniert. Erstellen eines Antheres Cube mit den Out-Dimensionen funktioniert auch. Aber wenn ich versuche, einen Würfel mit all diesen (13) Dimensionen zu erstellen, scheitert es. Ich müde auch hbase.client.keyvalue.maxsize auf 0 setzen, um die Prüfung zu deaktivieren. Immer noch der gleiche Fehler.

Weiß jemand, was das Problem ist und wie ich es lösen kann?

Ich benutze Kylin auf der Sandbox HDP 2.4 nebenbei.

Vielen Dank für Hilfe im Voraus

Søren

Quelle

2016-08-09 Søren

Was ist die "hbase.client.keyvalue.maxsize" in Ihrer hbase-Konfiguration? –

"hbase.client.keyvalue.maxsize" wird auf 0 atm gesetzt. Normalerweise sollte der Check deaktiviert sein. –

Versuchen Sie es mit kylin.hbase.client.keyvalue.maxsize = 1048576 –

Stellen Sie sicher, dass Wert von kylin.hbase.client.keyvalue.maxsize (die in kylin config-Datei befindet - conf/kylin.properteis) und hbase.client .keyvalue.maxsize (das sich in der hbase-Konfigurationsdatei befindet) ist identisch. Normalerweise bekommen wir Schlüsselwert zu groß Fehler, wenn der Wert von kylin.hbase.client.keyvalue.maxsize größer als hbase.client.keyvalue.maxsize

Nachstehend finden Sie die Probe kylin Eigenschaften

# kylin server's mode 
kylin.server.mode=all 

# optional information for the owner of kylin platform, it can be your team's email 
# currently it will be attached to each kylin's htable attribute 
[email protected] 

# List of web servers in use, this enables one web server instance to sync up with other servers. 
kylin.rest.servers=localhost:7070 

# The metadata store in hbase 
[email protected] 

# The storage for final cube file in hbase 
kylin.storage.url=hbase 

# Temp folder in hdfs, make sure user has the right access to the hdfs directory 
kylin.hdfs.working.dir=/kylin 

# HBase Cluster FileSystem, which serving hbase, format as hdfs://hbase-cluster:8020 
# leave empty if hbase running on same cluster with hive and mapreduce 
kylin.hbase.cluster.fs= 

kylin.job.mapreduce.default.reduce.input.mb=500 

# max job retry on error, default 0: no retry 
kylin.job.retry=0 

# If true, job engine will not assume that hadoop CLI reside on the same server as it self 
# you will have to specify kylin.job.remote.cli.hostname, kylin.job.remote.cli.username and kylin.job.remote.cli.password 
# It should not be set to "true" unless you're NOT running Kylin.sh on a hadoop client machine 
# (Thus kylin instance has to ssh to another real hadoop client machine to execute hbase,hive,hadoop commands) 
kylin.job.run.as.remote.cmd=false 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.hostname= 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.username= 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.password= 

# Used by test cases to prepare synthetic data for sample cube 
kylin.job.remote.cli.working.dir=/tmp/kylin 

# Max count of concurrent jobs running 
kylin.job.concurrent.max.limit=10 

# Time interval to check hadoop job status 
kylin.job.yarn.app.rest.check.interval.seconds=10 

# Hive database name for putting the intermediate flat tables 
kylin.job.hive.database.for.intermediatetable=default 

#default compression codec for htable,snappy,lzo,gzip,lz4 
kylin.hbase.default.compression.codec=snappy 

#the percentage of the sampling, default 100% 
kylin.job.cubing.inmem.sampling.percent=100 

# The cut size for hbase region, in GB. 
kylin.hbase.region.cut=5 

# The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster 
# set 0 to disable this optimization 
kylin.hbase.hfile.size.gb=2 

# Enable/disable ACL check for cube query 
kylin.query.security.enabled=true 

# whether get job status from resource manager with kerberos authentication 
kylin.job.status.with.kerberos=false 


## kylin security configurations 

# spring security profile, options: testing, ldap, saml 
# with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login 
kylin.security.profile=testing 

# default roles and admin roles in LDAP, for ldap and saml 
acl.defaultRole=ROLE_ANALYST,ROLE_MODELER 
acl.adminRole=ROLE_ADMIN 

#LDAP authentication configuration 
ldap.server=ldap://ldap_server:389 
ldap.username= 
ldap.password= 

#LDAP user account directory; 
ldap.user.searchBase= 
ldap.user.searchPattern= 
ldap.user.groupSearchBase= 

#LDAP service account directory 
ldap.service.searchBase= 
ldap.service.searchPattern= 
ldap.service.groupSearchBase= 

#SAML configurations for SSO 
# SAML IDP metadata file location 
saml.metadata.file=classpath:sso_metadata.xml 
saml.metadata.entityBaseURL=https://hostname/kylin 
saml.context.scheme=https 
saml.context.serverName=hostname 
saml.context.serverPort=443 
saml.context.contextPath=/kylin 


ganglia.group= 
ganglia.port=8664 

## Config for mail service 

# If true, will send email notification; 
mail.enabled=false 
mail.host= 
mail.username= 
mail.password= 
mail.sender= 

###########################config info for web####################### 

#help info ,format{name|displayName|link} ,optional 
kylin.web.help.length=4 
kylin.web.help.0=start|Getting Started| 
kylin.web.help.1=odbc|ODBC Driver| 
kylin.web.help.2=tableau|Tableau Guide| 
kylin.web.help.3=onboard|Cube Design Tutorial| 

#guide user how to build streaming cube 
kylin.web.streaming.guide=http://kylin.apache.org/ 

#hadoop url link ,optional 
kylin.web.hadoop= 
#job diagnostic url link ,optional 
kylin.web.diagnostic= 
#contact mail on web page ,optional 
kylin.web.contact_mail= 

###########################config info for front####################### 

#env DEV|QA|PROD 
deploy.env=QA 

###########################deprecated configs####################### 
kylin.sandbox=true 
kylin.web.hive.limit=20 
# The cut size for hbase region, 
#in GB. 
# E.g, for cube whose capacity be marked as "SMALL", split region per 5GB by default 
kylin.hbase.region.cut.small=5 
kylin.hbase.region.cut.medium=10 
kylin.hbase.region.cut.large=50 
kylin.hbase.client.keyvalue.maxsize=1048576

Innen Eigenschaften set kylin.hbase.client.keyvalue.maxsize = 1048576

Quelle

2016-08-10 21:09:09

@ Nithin K Anil

Kann nicht kylin.hbase.client.keyvalue.maxsize in kylin.properties finden. Kylin.properties sieht wie folgt aus:

> [[email protected] conf]# cat kylin.properties 
# 
# Licensed to the Apache Software Foundation (ASF) under one or more 
# contributor license agreements. See the NOTICE file distributed with 
# this work for additional information regarding copyright ownership. 
# The ASF licenses this file to You under the Apache License, Version 2.0 
# (the "License"); you may not use this file except in compliance with 
# the License. You may obtain a copy of the License at 
# 
# http://www.apache.org/licenses/LICENSE-2.0 
# 
# Unless required by applicable law or agreed to in writing, software 
# distributed under the License is distributed on an "AS IS" BASIS, 
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
# See the License for the specific language governing permissions and 
# limitations under the License. 
# 

# kylin server's mode 
kylin.server.mode=all 

# optional information for the owner of kylin platform, it can be your team's email 
# currently it will be attached to each kylin's htable attribute 
[email protected] 

# List of web servers in use, this enables one web server instance to sync up with other servers. 
kylin.rest.servers=localhost:7070 

# The metadata store in hbase 
[email protected] 

# The storage for final cube file in hbase 
kylin.storage.url=hbase 

# Temp folder in hdfs, make sure user has the right access to the hdfs directory 
kylin.hdfs.working.dir=/kylin 

# HBase Cluster FileSystem, which serving hbase, format as hdfs://hbase-cluster:8020 
# leave empty if hbase running on same cluster with hive and mapreduce 
kylin.hbase.cluster.fs= 

kylin.job.mapreduce.default.reduce.input.mb=500 

# max job retry on error, default 0: no retry 
kylin.job.retry=0 

# If true, job engine will not assume that hadoop CLI reside on the same server as it self 
# you will have to specify kylin.job.remote.cli.hostname, kylin.job.remote.cli.username and kylin.job.remote.cli.password 
# It should not be set to "true" unless you're NOT running Kylin.sh on a hadoop client machine 
# (Thus kylin instance has to ssh to another real hadoop client machine to execute hbase,hive,hadoop commands) 
kylin.job.run.as.remote.cmd=false 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.hostname= 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.username= 

# Only necessary when kylin.job.run.as.remote.cmd=true 
kylin.job.remote.cli.password= 

# Used by test cases to prepare synthetic data for sample cube 
kylin.job.remote.cli.working.dir=/tmp/kylin 

# Max count of concurrent jobs running 
kylin.job.concurrent.max.limit=10 

# Time interval to check hadoop job status 
kylin.job.yarn.app.rest.check.interval.seconds=10 

# Hive database name for putting the intermediate flat tables 
kylin.job.hive.database.for.intermediatetable=default 

#default compression codec for htable,snappy,lzo,gzip,lz4 
kylin.hbase.default.compression.codec=snappy 

#the percentage of the sampling, default 100% 
kylin.job.cubing.inmem.sampling.percent=100 

# The cut size for hbase region, in GB. 
kylin.hbase.region.cut=5 

# The hfile size of GB, smaller hfile leading to the converting hfile MR has more reducers and be faster 
# set 0 to disable this optimization 
kylin.hbase.hfile.size.gb=2 

# Enable/disable ACL check for cube query 
kylin.query.security.enabled=true 

# whether get job status from resource manager with kerberos authentication 
kylin.job.status.with.kerberos=false 


## kylin security configurations 

# spring security profile, options: testing, ldap, saml 
# with "testing" profile, user can use pre-defined name/pwd like KYLIN/ADMIN to login 
kylin.security.profile=testing 

# default roles and admin roles in LDAP, for ldap and saml 
acl.defaultRole=ROLE_ANALYST,ROLE_MODELER 
acl.adminRole=ROLE_ADMIN 

#LDAP authentication configuration 
ldap.server=ldap://ldap_server:389 
ldap.username= 
ldap.password= 

#LDAP user account directory; 
ldap.user.searchBase= 
ldap.user.searchPattern= 
ldap.user.groupSearchBase= 

#LDAP service account directory 
ldap.service.searchBase= 
ldap.service.searchPattern= 
ldap.service.groupSearchBase= 

#SAML configurations for SSO 
# SAML IDP metadata file location 
saml.metadata.file=classpath:sso_metadata.xml 
saml.metadata.entityBaseURL=https://hostname/kylin 
saml.context.scheme=https 
saml.context.serverName=hostname 
saml.context.serverPort=443 
saml.context.contextPath=/kylin 


ganglia.group= 
ganglia.port=8664 

## Config for mail service 

# If true, will send email notification; 
mail.enabled=false 
mail.host= 
mail.username= 
mail.password= 
mail.sender= 

###########################config info for web####################### 

#help info ,format{name|displayName|link} ,optional 
kylin.web.help.length=4 
kylin.web.help.0=start|Getting Started| 
kylin.web.help.1=odbc|ODBC Driver| 
kylin.web.help.2=tableau|Tableau Guide| 
kylin.web.help.3=onboard|Cube Design Tutorial| 

#guide user how to build streaming cube 
kylin.web.streaming.guide=http://kylin.apache.org/ 

#hadoop url link ,optional 
kylin.web.hadoop= 
#job diagnostic url link ,optional 
kylin.web.diagnostic= 
#contact mail on web page ,optional 
kylin.web.contact_mail= 

###########################config info for front####################### 

#env DEV|QA|PROD 
deploy.env=QA 

###########################deprecated configs####################### 
kylin.sandbox=true 
kylin.web.hive.limit=20 
# The cut size for hbase region, 
#in GB. 
# E.g, for cube whose capacity be marked as "SMALL", split region per 5GB by default 
kylin.hbase.region.cut.small=5 
kylin.hbase.region.cut.medium=10 
kylin.hbase.region.cut.large=50

Quelle

2016-08-11 09:56:49

Setzen Sie kylin.hbase.client.keyvalue.maxsize = 1048576 in die Eigenschaftendatei –

Wir Schlüsselgrenzen bei Splice-Maschine vor, wie gut getroffen haben ...

Auch von der KeyValue erinnern spec der Schlüssel erforderlich ist, in einer kurzen zu passen. KeyValue # getRowOffset()

Quelle

2016-08-11 19:43:59

Apache kylin: Würfelerzeugung scheitert bei Schritt 5 - KeyValue Größe zu groß

Antwort

Verwandte Themen