
nutch-2.3.1 + solr-4.10.4 error, experts please help! (I have no permission to post in the Nutch board; hoping a moderator can move this, thanks.)

Jeelon posted on 2016-8-3 10:18:09
I have seen similar error reports asked about online, but none were resolved, and repeated Googling turned up no solution, so I am asking here. Hoping an expert can help!
First, the environment: hadoop-2.5.1 + hbase-0.98 + nutch-2.3.1 + solr-4.10.4
Crawling works fine and completes normally (the data is visible when querying HBase), but building the index fails with the following error:
2016-08-02 00:22:48,533 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 104857600
2016-08-02 00:22:48,533 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 26214396; length = 6553600
2016-08-02 00:22:48,681 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
2016-08-02 00:22:48,929 INFO [main] org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
2016-08-02 00:22:48,968 INFO [main] org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor [.deflate]
2016-08-02 00:22:49,101 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NullPointerException
at org.apache.hadoop.io.Text.encode(Text.java:443)   
at org.apache.hadoop.io.Text.set(Text.java:198)   
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrRecordReader.nextKeyValue(SolrDeleteDuplicates.java:234)   
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)   
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)   
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)   
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)   
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)   
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)   
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)   
at java.security.AccessController.doPrivileged(Native Method)   
at javax.security.auth.Subject.doAs(Subject.java:415)   
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)   
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
2016-08-02 00:22:49,130 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
2016-08-02 00:22:49,257 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping MapTask metrics system...
2016-08-02 00:22:49,258 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system stopped.
2016-08-02 00:22:49,260 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system shutdown complete.
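The NPE at `Text.encode` means a null String reached `text.set(digest)` in `SolrDeleteDuplicates$SolrRecordReader.nextKeyValue`, i.e. `doc.getFieldValue(SolrConstants.DIGEST_FIELD)` returned null. A minimal plain-Java illustration of the failure mode (the `Map` is a hypothetical stand-in for a `SolrDocument`; no Hadoop/Solr dependencies):

```java
import java.util.HashMap;
import java.util.Map;

public class NullDigestDemo {
    public static void main(String[] args) {
        // Hypothetical stand-in for a SolrDocument whose "digest" field is absent.
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "http://example.com/");
        String digest = (String) doc.get("digest"); // field missing -> null
        try {
            // Text.set(digest) ultimately encodes the string to UTF-8;
            // with a null string this blows up with a NullPointerException.
            digest.getBytes("UTF-8");
        } catch (NullPointerException e) {
            System.out.println("NPE: digest field is null");
        } catch (java.io.UnsupportedEncodingException e) {
            // unreachable for UTF-8
        }
    }
}
```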
So I found the corresponding line in the source and wrapped it in a try/catch, as follows:
@Override
public boolean nextKeyValue() throws IOException, InterruptedException {
    if (currentDoc >= numDocs) {
        return false;
    }
    SolrDocument doc = solrDocs.get(currentDoc);
    String digest = (String) doc.getFieldValue(SolrConstants.DIGEST_FIELD);
    // try/catch added here
    try {
        text.set(digest);
    } catch (Exception e) {
        System.out.println("**********************************");
    }
    record.readSolrDocument(doc);
    currentDoc++;
    return true;
}
Restarting the indexing job, it still fails with a NullPointerException:
16/08/02 01:50:43 INFO solr.SolrDeleteDuplicates: SolrDeleteDuplicates: starting...
16/08/02 01:50:43 INFO solr.SolrDeleteDuplicates: SolrDeleteDuplicates: Solr url: http://192.168.159.120:8983/solr/
16/08/02 01:50:48 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/08/02 01:51:27 INFO mapreduce.JobSubmitter: number of splits:2
16/08/02 01:51:28 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
16/08/02 01:51:28 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
16/08/02 01:51:28 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
16/08/02 01:51:28 INFO Configuration.deprecation: mapred.compress.map.output is deprecated. Instead, use mapreduce.map.output.compress
16/08/02 01:51:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1470034090238_0032
16/08/02 01:51:48 INFO impl.YarnClientImpl: Submitted application application_1470034090238_0032
16/08/02 01:51:51 INFO mapreduce.Job: The url to track the job: http://CentOS641:8088/proxy/application_1470034090238_0032/
16/08/02 01:51:51 INFO mapreduce.Job: Running job: job_1470034090238_0032
16/08/02 01:54:10 INFO mapreduce.Job: Job job_1470034090238_0032 running in uber mode : false
16/08/02 01:54:10 INFO mapreduce.Job:  map 0% reduce 0%
16/08/02 01:55:34 INFO mapreduce.Job:  map 50% reduce 0%
16/08/02 01:55:34 INFO mapreduce.Job: Task Id : attempt_1470034090238_0032_m_000000_0, Status : FAILED
Error: java.lang.NullPointerException
at org.apache.hadoop.io.Text.encode(Text.java:443)        
at org.apache.hadoop.io.Text.encode(Text.java:424)        
at org.apache.hadoop.io.Text.writeString(Text.java:473)        
at org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrRecord.write(SolrDeleteDuplicates.java:140)        
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)        
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)        
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1134)        
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)        
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)        
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)        
at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)        
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)        
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)        
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)      
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)        
at java.security.AccessController.doPrivileged(Native Method)        
at javax.security.auth.Subject.doAs(Subject.java:415)        
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)        
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Container killed by the ApplicationMaster.
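Note that the NPE has moved: it is now thrown from `Text.writeString` inside `SolrRecord.write` (SolrDeleteDuplicates.java:140). Catching the exception in `nextKeyValue` only masked the read side; the document with the null digest is still handed to the mapper, and the NPE resurfaces when the record is serialized for map output. An analogous plain-Java illustration (`writeUTF` stands in for Hadoop's `Text.writeString`; both fail on a null String):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class WriteSideNpeDemo {
    public static void main(String[] args) throws IOException {
        DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream());
        String digest = null; // the field Solr never returned
        try {
            // Serializing a null String throws NullPointerException,
            // just as Text.writeString(out, digest) does in SolrRecord.write.
            out.writeUTF(digest);
        } catch (NullPointerException e) {
            System.out.println("NPE while serializing null digest");
        }
    }
}
```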

Many thanks in advance!


1 reply so far

tntzbzc replied on 2016-8-3 14:38:14


@Override
public boolean nextKeyValue() throws IOException, InterruptedException {
    if (currentDoc >= numDocs) {
        return false;
    }
    SolrDocument doc = solrDocs.get(currentDoc);
    String digest = (String) doc.getFieldValue(SolrConstants.DIGEST_FIELD);
    // try/catch added here
    try {
        text.set(digest);
    } catch (Exception e) {
        System.out.println("**********************************");
    }
    record.readSolrDocument(doc);
    currentDoc++;
    return true;
}
Since the second stack trace no longer points at that line, the error is not there; that spot is just where plain Java values get converted into Hadoop types. You could wrap the entire method body in a try/catch to narrow it down.
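A root cause commonly reported for this exact error is that the `digest` field is not stored in the Solr schema (worth checking that schema.xml defines a stored `digest` field, which Nutch's dedup job relies on), so every document comes back with a null digest. Rather than swallowing the exception, the reader could skip such documents. A sketch of that guard with plain-Java stand-ins (a `List` of `Map`s instead of `SolrDocumentList`/`SolrDocument`; names are hypothetical):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SkipNullDigest {
    // Hypothetical stand-ins; the real code uses SolrDocumentList / SolrDocument.
    static List<Map<String, Object>> solrDocs = new ArrayList<>();
    static int currentDoc = 0;

    // Guard sketch for nextKeyValue(): skip documents whose "digest"
    // field is missing instead of catching the resulting NPE downstream.
    static String nextDigest() {
        while (currentDoc < solrDocs.size()) {
            Object digest = solrDocs.get(currentDoc++).get("digest");
            if (digest != null) {
                return (String) digest; // now safe to pass to Text.set()
            }
            // digest == null: likely not stored in the Solr schema; skip doc
        }
        return null; // no more documents
    }

    public static void main(String[] args) {
        solrDocs.add(new HashMap<>());      // a doc with no digest stored
        Map<String, Object> ok = new HashMap<>();
        ok.put("digest", "abc123");
        solrDocs.add(ok);
        System.out.println(nextDigest());   // prints abc123
    }
}
```

This only hides broken documents from the dedup job; the durable fix is making sure Solr actually stores and returns the digest field.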
