How can I select a subset of rows and create a new table from them in HBase?

I have a large table in HBase and want to split it into several smaller tables so that it is easier to work with. (The original table should be kept.) How can I do this?

For example, I have a table named 'all' with the following row keys:

animal-1, ... 
plant-1, ... 
animal-2, ... 
plant-2, ... 
human-1, ... 
human-2, ...

I want to split it into three tables, animal, plant and human, one for each kind of living thing. How can I do that?


2016-06-11 xirururu

MultiTableOutputFormat can serve this purpose. Please see my answer.

You can use MapReduce with MultiTableOutputFormat, as in the following example.

Note that the following example reads from a file, i.e. it uses TextInputFormat. In your case you would instead read from the HBase table 'all' using TableInputFormat, and instead of Table1, Table2, ... you would write to 'animal', 'plant' and 'human'.

For your requirement: if you run a scan over the HBase table and pass it to the mapper via TableInputFormat, the mapper's map method also receives the row key of each row. Compare that row key to decide which table the row should be inserted into.
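The row-key comparison described above can be sketched in plain Java. This is a minimal sketch, assuming every row key has the form <type>-<number> as in the question; RowKeyRouter and targetTable are hypothetical names for illustration, not part of the HBase API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

public class RowKeyRouter {

    // Destination tables from the question (assumed to exist already in HBase).
    static final Set<String> TARGET_TABLES =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList("animal", "plant", "human")));

    /**
     * Returns the destination table for a row key such as "animal-1" -> "animal".
     * Keys without a "<type>-" prefix, or with an unknown prefix, yield Optional.empty()
     * so the caller can skip them instead of failing the whole job.
     */
    static Optional<String> targetTable(String rowKey) {
        int dash = rowKey.indexOf('-');
        if (dash <= 0) {
            return Optional.empty(); // no prefix before the first '-'
        }
        String prefix = rowKey.substring(0, dash);
        return TARGET_TABLES.contains(prefix) ? Optional.of(prefix) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] keys = {"animal-1", "plant-2", "human-1", "fungus-3", "bad"};
        for (String key : keys) {
            System.out.println(key + " -> " + targetTable(key).orElse("(skipped)"));
        }
    }
}
```

Inside a TableMapper<ImmutableBytesWritable, Put>, one could convert the incoming key with Bytes.toString(row.get(), row.getOffset(), row.getLength()), call targetTable on it and, when a table name comes back, emit a Put built from the row's cells via context.write(new ImmutableBytesWritable(Bytes.toBytes(table)), put). As far as I know MultiTableOutputFormat does not create tables, so 'animal', 'plant' and 'human' would need to exist beforehand with the same column families as 'all'.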

See also "7.2.2. HBase MapReduce Read/Write Example" in the HBase reference guide.

package mapred;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class MultiTableMapper {

    static class InnerMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

        @Override
        public void map(LongWritable offset, Text value, Context context)
                throws IOException, InterruptedException {
            // Each input line holds tab-separated fields: StudentName<TAB>Marks
            String[] fields = value.toString().split("\t");

            // Placeholder row key. Every line gets the same key here, so rows overwrite
            // each other; replace it with a unique key per line (the original called a
            // commented-out HBaseManager.generateID()).
            byte[] rowId = Bytes.toBytes("12345");

            // Write the student name to Table1 (column family UserInfo).
            // Put.addColumn is the HBase 1.0+ name; older releases use put.add(...).
            Put put = new Put(rowId);
            put.addColumn(Bytes.toBytes("UserInfo"), Bytes.toBytes("StudentName"),
                    Bytes.toBytes(fields[0]));
            // With MultiTableOutputFormat the output key is the destination table name.
            context.write(new ImmutableBytesWritable(Bytes.toBytes("Table1")), put);

            // Write the marks to Table2 (column family MarksInfo).
            Put put1 = new Put(rowId);
            put1.addColumn(Bytes.toBytes("MarksInfo"), Bytes.toBytes("Marks"),
                    Bytes.toBytes(fields[1]));
            context.write(new ImmutableBytesWritable(Bytes.toBytes("Table2")), put1);
        }
    }

    public static void createSubmittableJob()
            throws IOException, ClassNotFoundException, InterruptedException {
        Path inputDir = new Path("in");
        // Loads hbase-site.xml on top of the Hadoop configuration.
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "my_custom_job");
        job.setJarByClass(InnerMapper.class);
        FileInputFormat.setInputPaths(job, inputDir);
        job.setInputFormatClass(TextInputFormat.class);
        job.setMapperClass(InnerMapper.class);
        // This is the key to writing to multiple tables in HBase.
        job.setOutputFormatClass(MultiTableOutputFormat.class);
        // Map-only job: the Puts go straight from the mapper to the output format.
        job.setNumReduceTasks(0);
        // Ship the HBase client jars with the job.
        TableMapReduceUtil.addDependencyJars(job);
        System.out.println(job.waitForCompletion(true));
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        createSubmittableJob();
    }
}


2016-06-11 15:20:17
