about云 2016 Weekly Classics Review Roundup [Part 3]

Last edited by pig2 on 2016-8-28 18:55

about云 Weekly Classics Review Roundup

about云 2015 Weekly Classics Review Roundup [Part 2]

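about云 Classic Posts Summary: Week 4 of August 2016

Redis at scale: storage requirements for tens of billions of keys and the solutions
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19603
1. What is the background of the requirement?
2. What kind of data is stored?
3. What are the characteristics of the data?
4. What technical challenges exist?
5. What solutions are available?
6. What should you watch out for with the md5 hash-bucket approach?
7. What were the test results?

Why are queries faster when Spark connects to MySQL and runs them?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19617
1. Why can Spark speed up MySQL queries?
2. How do you run SQL in Spark?
3. How does SparkSQL push queries down to MySQL?
4. How do you cache query data with Spark?
5. How do you use Spark with Percona XtraDB Cluster?
6. What should you watch out for when partitioning Spark tables?
7. When does Spark perform poorly?

OpenStack Mitaka live migration analysis (1)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19615
1. What is live migration, and what does the process look like?
2. What is the difference between live migration and cold migration?
3. What are the current problems (bugs) with live migration?
4. How will live migration be optimized going forward?

Hive getting-started summary
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19599

Hive partitioning notes
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19596
1. How do you create tables and partitions in Hive?
2. How do you load data?
3. Does Hive default to static or dynamic partitioning?
4. How is dynamic partitioning implemented?

Big data architecture in practice at Ctrip, with related case studies
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19584
1. How does Ctrip's big data platform support high-concurrency applications?
2. How do you design a recommender system architecture?
3. How do you implement online real-time computation?

Introduction to machine learning, part 1
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19583

Deep learning will transform Chinese word segmentation in NLP
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19578
1. What methods are there for Chinese word segmentation?
2. What is an HMM?
3. What different types of networks are there in deep learning?

Some problems encountered using PathFilter in Hadoop
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19573

1. How does Hadoop filter files when analyzing data?

2. What should you watch out for when configuring the input path for file filtering in Hadoop?

3. Does the filtered input path support regular expressions?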

Resources:
OpenStack network configuration and management
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19574

Why Eclipse can run MapReduce programs directly
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19601

unitedstack private cloud solution
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19602

Processing massive telecom data on a cloud computing platform with a hybrid storage model
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19620

恒天云 private cloud build-out plan
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19619

Hive fundamentals: MySQL 5.6 from scratch
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19609

Cloud storage: design
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19589

Analysis of the TeraSort algorithm in Hadoop
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19588

Cloud platform sample questions
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19575

Q&A:

Error when operating HBase from Java
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19614

[Question] How are Hive external tables stored in a distributed environment?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19613

[Question] What is the difference between partitioning and bucketing when querying in Hive?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19612

MySQL 5.6 from scratch
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19598

Asked about real-time stream processing in an interview; experts, please weigh in!
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19595

Problems running Hive 2.0.0 after installation; help wanted
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19594

Can Spark be upgraded directly to 2.0?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19593

Dynamic resource pool cannot obtain resources
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19591

hadoop2.7.2 + hive1.2 + hbase0.98: error inserting data in Hive
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19585

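about云 Classic Posts Summary: Week 3 of August 2016

Financial big data architecture: overview and applications
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19521
1. How do you design a financial big data architecture?
2. How does IBM view future big data trends?
3. Which details are easily overlooked in architecture design?

Spark 2.0 Structured Streaming analysis
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19520
1. How does Spark Streaming differ between Spark 1.0 and Spark 2.0?
2. What is Structured Streaming?

Redis data "loss" problems
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19525
1. How do you troubleshoot Redis "data loss"?
2. What is the impact of data loss?
3. What are the common cases of Redis data loss?

Evolution of Ctrip's real-time big data platform: one third of Storm applications migrated to JStorm
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19526
1. Why build a real-time data platform?
2. What kind of real-time data platform is needed?
3. How is it implemented?
4. What pitfalls were hit along the way?
5. What new explorations are under way?
6. What are the future directions?

A look ahead at OpenStack billing (1)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19558
1. What is the current state of billing projects in OpenStack?
2. What environment is needed to implement billing?
3. How can it be implemented without cloudkitty?
4. What does the final flowchart look like?

ZooKeeper: features and how it works
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19566

1. What is ZooKeeper?

2. What does ZooKeeper provide?

3. What does ZooKeeper do?

The skills and toolset of a data analyst
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19565

1. What skills does a data analyst need?

2. What tools does a data analyst need to master?

Nutch 2.3.1, HBase, Hadoop, Solr integration (3)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19508
1. How do you query tables in HBase?
2. How do you configure Hadoop?
3. How do you set up passwordless SSH login?

Nutch 2.3.1, HBase, Hadoop, Solr integration (4): installing Nutch
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19509
1. How do you configure and compile Nutch?
2. What preparation is needed before starting Nutch?
3. What work do you think integrating Nutch requires?

Genetic algorithms: a heuristic search algorithm that simulates natural evolution
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19564
1. What is a genetic algorithm?
2. What are the two modes of evolutionary iteration?
3. In genetic algorithms a chromosome is called an individual; what are the three common gene encoding schemes?

Programmers: a profile of an introverted group, its strengths and weaknesses
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19542

Deep learning and natural language processing (5): Stanford cs224d, assignment quiz 2 with solutions
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19552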

Resources:
Lucene video tutorial (video)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19518

Lei Xin: how a startup with Google roots does AI in China
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19519

Dimensional modeling guide, by Z.RaiNy
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19532

OpenStack books (most of them are here)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19562

IBM: security management in the cloud era
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19551

Role-based access control management in cloud application systems
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19550

Cloudera Impala [book, in English]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19511

Q&A:
Confusion about Hive data storage and metadata
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19515

Help: HBase master fails to start on a cluster installed with Cloudera
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19556

Where are the common commands located under CDH?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19555

Manually changing a cloud instance's fixed IP
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19549

A question for the experts: after the master host and the two slave machines start, on the master through port 50070
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19544

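about云 Classic Posts Summary: Week 2 of August 2016

Data mining quick start
http://www.aboutyun.com/thread-19434-1-1.html
1. What is data mining?
2. Where do machine learning and data mining fit in?
3. What problems can data mining solve?

Building a cost-effective big data platform from 0 to N
http://www.aboutyun.com/thread-19441-1-1.html
1. How do you build a big data platform from scratch?
2. What data volumes are traditional data warehouses and log analysis tools suited to?
3. What are the mainstream OLAP tools?

Spark Streaming quick start
http://www.aboutyun.com/thread-19469-1-1.html
1. What is Spark Streaming for?
2. How does Spark Streaming work?
3. What is a discretized stream in Spark Streaming?

Translation: Spark in Hadoop: The Definitive Guide, part 5
http://www.aboutyun.com/thread-19448-1-1.html
1. How does Spark work?
2. How are Spark jobs submitted?
3. How is the DAG constructed?
4. How are tasks scheduled?
5. How are tasks executed?
6. What are Spark's executors and cluster managers?
7. How do Spark and YARN relate (Spark on YARN)?
8. What is YARN client mode?
9. What is YARN cluster mode?

How to use indexes in a data warehouse
http://www.aboutyun.com/thread-19483-1-1.html
1. What is the role of indexes in a database?
2. What is a dimension index?
3. What should you consider when building indexes on fact tables?

Using machine learning to compute job-skill matching
http://www.aboutyun.com/thread-19484-1-1.html
1. What is the approach of this article?
2. What is hierarchical clustering?
3. What is LDA?

Analysis of the Spark "Task not serializable" problem
http://www.aboutyun.com/thread-19464-1-1.html
1. What causes org.apache.spark.SparkException: Task not serializable?
2. How do you mark members that should not be serialized in a serializable class?
3. When must a class be serializable (extends Serializable)?
4. How do you resolve the Task not serializable error?

Nutch 2.3.1, HBase, Hadoop, Solr integration (1)
http://www.aboutyun.com/thread-19437-1-1.html
1. What preparation do you think integrating Nutch 2.3.1, HBase, Hadoop, and Solr requires?
2. What preparation did the author do?
3. Which issues do you consider most important during integration?

Nutch 2.3.1, HBase, Hadoop, Solr integration (2)
http://www.aboutyun.com/thread-19445-1-1.html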

Resources:
01 Lucene basics introductory video [limited time]
http://www.aboutyun.com/thread-19438-1-1.html

02 Lucene video tutorial and code by instructor Kong Hao
http://www.aboutyun.com/thread-19439-1-1.html

Hands-on Lucene
http://www.aboutyun.com/thread-19443-1-1.html

Full-text search [video tutorial]
http://www.aboutyun.com/thread-19444-1-1.html

Big data applications in commercial banks: theory, practice, and impact
http://www.aboutyun.com/thread-19470-1-1.html

Neutron Mitaka Update
http://www.aboutyun.com/thread-19471-1-1.html

A set of short introductory Lucene videos
http://www.aboutyun.com/thread-19454-1-1.html

Lucene (Tang Yangguang) 1
http://www.aboutyun.com/thread-19453-1-1.html

Q&A:
How to generate Parquet files from Java or Scala
http://www.aboutyun.com/thread-19482-1-1.html

Filtering queries by rowkey across tens of thousands of fields
http://www.aboutyun.com/thread-19476-1-1.html

Hive startup error: nullappender
http://www.aboutyun.com/thread-19491-1-1.html

Cinder cloud volume problem
http://www.aboutyun.com/thread-19486-1-1.html

How to save collected data as a txt file to a given HDFS path?
http://www.aboutyun.com/thread-19474-1-1.html

A question about map and reduce in Spark
http://www.aboutyun.com/thread-19447-1-1.html

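about云 Classic Posts Summary: Week 1 of August 2016

In deep mourning for Lei Xiaohua, CSDN blogger and audio/video expert, dead at just 26 [a call for IT people to look after their health]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19420

How Storm reads data from Kafka
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19403
1. Which version is this article based on?
2. How is reading Kafka data in Storm implemented?
3. What are the two ways to implement a Kafka Spout?

User guide for an open-source data visualization tool (for Apache Kylin)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19402

1. How do you install Caravel-Kylin and PyKylin?

2. How do you create a Kylin data source?

3. How do you add Kylin tables and configure their dimensions and metrics?

4. How do you do exploratory data analysis and visualization?

5. How do you customize your own dashboard?

6. How do you configure multi-table joins?

From application to platform: the evolution of cloud service architecture
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19393
1. What cloud service concepts are there?
2. What is the 1.0 single-application architecture?
3. What is the 2.0 service-oriented architecture?
4. What is 3.0 platformization?

The topic override problem when using Kafka Source and Kafka Sink together in Flume
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19391
1. What does it look like in the Kafka Sink?
2. What does it look like in the Kafka Source?
3. How do you solve it?

The future of enterprise IT: mining enterprise data to improve strategic decisions
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19375
1. What problems do enterprises run into while advancing informatization?
2. How should enterprise informatization be built out?

Machine learning algorithms: introduction to naive Bayes
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19376

1. How do you understand the naive Bayes algorithm?
2. How do you work through the naive Bayes formulas?
3. How do you use naive Bayes for document classification?

The relationship among artificial intelligence, machine learning, and deep learning
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19366

Several ways to achieve zero data loss with Spark Streaming and Kafka
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19364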

Resources:
Open-source tools for information retrieval (Luo Weihua)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19394

Solr: getting started [study notes]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19406

Hands-on Solr [English edition]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19405

Search engines: principles, technology, and systems
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19392

OpenStack hands-on guide (complete), re-posted after the link expired
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19385

Cloudera products and services
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19379

Introductory data mining course: index
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19378

Cloudera CDH installation on CentOS
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19369

Introductory data mining course, chapter 7: mining complex data
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19368

Q&A:

Storm hangs when reading Kafka data
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19389

Slow HBase operations through the Java API
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19371

Data issues after using alter table add column
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19365

Spark runs successfully on a single machine; how do I use or test it?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19361

Help with handling null values in Hive
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19360

Help with a keystone deployment problem
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19359

Launching an instance reports that no zone is available
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19412

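about云 Classic Posts Summary: Week 5 of July 2016

Redis design and implementation: Redis internals, the simple dynamic string (SDS)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19325
1. What is SDS?
2. What can SDS be used for?
3. What advantages does SDS have over C strings?

Apache Spark 2.0 official release available for download
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19318
1. What is new in Spark 2.0?
2. What is new in the Spark 2.0 API?
3. Why is Spark 2.0 faster?
4. What are the advantages of Structured Streaming?

Cloudera Search quick start guide
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19308
1. What are the prerequisites?
2. How do you load and index data in Search?
3. How do you query loaded data with Search?

Deep learning models for predicting users' online ad click behavior
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19304
1. Which industries currently apply deep learning?
2. How is deep learning applied to multi-field categorical datasets?
3. How do you evaluate model performance?

Handling empty RDDs in Spark Streaming
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19303
1. How do you check for an empty RDD?
2. How do Spark Streaming and Kafka handle empty RDDs?

A serious talk about programmers' self-cultivation
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19307

Stopping clusters across Spark versions without losing data
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19292
1. How do Spark 1.3 and earlier stop the cluster?
2. What problems arise if Spark 1.4 uses the earlier method?
3. How does Spark 1.4 stop the cluster without losing data?

Implementing a REST data source with the Spark DataSource API
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19291
1. What problem does this article solve, and in what scenario?
2. How is a REST data source implemented with the Spark DataSource API?
3. How many data scan methods does Spark SQL currently provide?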

Resources:

Data warehouse 12, 13: references and appendices
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19289

Introductory data mining course, chapter 1: data mining basics
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19310

Introductory data mining course, chapter 2: data preprocessing
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19311

Introductory data mining course, chapter 3: qualitative induction
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19323

Introductory data mining course, chapter 4: classification and prediction
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19324

Search engine technology fundamentals
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19300

Search engine technology and trends
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19299

Data warehouse series: combined download [limited time]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19296

Nutch getting-started tutorial
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19294

Search engine technology: data structures
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19293

Data warehouse 11: technology summary
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19288

Q&A:
HBase rowkey design and querying
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19342

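about云 Classic Posts Summary: Week 4 of July 2016

MapReduce shuffle: dissection and tuning
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19222
1. What is MapReduce?
2. What work do the Mapper and the Reducer do?
3. How do you tune MapReduce performance?

How enterprises should build a big data platform [technical perspective]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19255
1. As an engineer, how do you think a big data platform should be built?
2. What steps do you think building a big data platform involves?
3. How does this article build one?

How Hive executes queries, and a comparison with relational databases
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19282

1. What is Hive?

2. What is Hive's compilation flow?

3. How does Hive differ from a database?

Introduction to the ELK (ElasticSearch, Logstash, Kibana) platform
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19254
1. Which tools make up the ELK platform?
2. How do you configure and start ElasticSearch?
3. How do you configure and start Logstash?
4. How do you configure and start Kibana?

Python applications in finance, data analysis, and artificial intelligence
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19245
1. How is Python applied in finance?
2. How is Python used for analytics?
3. How is Python applied in artificial intelligence?
4. How is Python applied in mathematics?

How Hadoop 2.6.0's FileInputFormat splits tasks (i.e. how to control FileInputFormat's ma...
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19244

1. How is the first optimization done?
2. How is the second optimization done?
3. How is the third optimization done?
4. How is the source code analyzed?

Using broadcast variables and accumulators in Spark
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19223
1. What are broadcast variables and accumulators?
2. How are they used from Java and Scala?

Hadoop 2 field experience: analyzing execution-time problems of production MapReduce jobs
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19217
1. How was the problem of long-running MapReduce jobs analyzed?
2. How was the problem of long-running MapReduce jobs solved?

SolrCloud primer: key concepts and data migration
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19216
1. Why did SolrCloud come about?
2. What concepts does SolrCloud have, and what do they mean?
3. What are SolrCloud's two routing algorithms?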

Resources:
Data warehouse 1: the evolution of decision support systems
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19206

Data warehouse 2: the data warehouse environment
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19207

Data warehouse 3: designing the data warehouse
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19218

Data warehouse 4: granularity in the data warehouse
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19219

Data warehouse 5: the data warehouse and technology
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19232

Data warehouse 6: the distributed data warehouse
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19233

Data warehouse 7: executive information systems and the data warehouse
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19239

Data warehouse 8: external/unstructured data and the data warehouse
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19241

Data warehouse 9: migrating to the architected environment
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19256

Data warehouse 10: data warehouse design review checklist
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19257

HBase fully distributed configuration
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19247

Video tutorials from abroad, series index: SQL on Hadoop - analyzing big data with Hive
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19238

dubbo video series: index
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19211

Q&A:

Guidance wanted: what background do you need to learn big data?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19253

storm 1.0.1 startup problem
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19209

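about云 Classic Posts Summary: Week 3 of July 2016

[Lucene] Apache Lucene full-text search engine architecture: a hands-on introduction
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19160
1. What is the principle behind full-text search in Lucene?
2. How do you solve problems with Lucene?

Hidden Markov models (HMM) made easy
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19161
1. What is entropy?
2. How do you understand the maximum entropy model?
3. How do you understand the hidden Markov model (HMM)?

Time window examples in Spark 2.0 SQL
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19171
1. In which version was the window API introduced to Spark SQL?
2. What example does the article use to explain the window API?

Apache Kylin quick start
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19193
1. How did Kylin come about?
2. When would you use Apache Kylin?
3. How far has Apache Kylin come?

Porting SQL code to HBase with Phoenix
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19182
1. How do you use the HBase Shell?
2. How do you connect to HBase remotely from Java?
3. How do you install and configure Phoenix?
4. What syntax does Phoenix support?
5. How do you install and use SQuirrel?
6. How do you port SQL code to HBase with Phoenix?
7. How do you tune Phoenix performance?

Sina Weibo hybrid cloud architecture in practice: an introduction to elastic scheduling
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19180
1. How did the architecture of Sina's hybrid cloud elastic scheduling system evolve?
2. How does Sina's hybrid cloud schedule workloads sensibly onto compute nodes?
3. What is Swarm?

Summary of new technical features in Spark 2.0
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19170
1. What changed in Spark 2.0 SQL?
2. What changed in the DataFrame and Dataset APIs in Spark 2.0?
3. What are the Structured Streaming APIs?

Learning how to learn
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19153

Intro to algorithms: the connection between algorithms and recipes
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19152
1. What do algorithms and recipes have in common?
2. What does the article consider an algorithm to be?
3. What are the two necessary conditions of an algorithm?
4. What are the two pillars of algorithms?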

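Resources:

MLlib at Taobao: applications and improvements
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19184

Compendium of classic algorithms
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19145

dubbo video series 1: introductory basics [limited-time share]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19146

dubbo video series 2: introductory advanced topics [limited-time share]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19154

dubbo video series 3: high-availability architecture [limited-time share]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19165

dubbo video series 4: related documentation and resources used [limited-time share]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19172

dubbo video series 5: source code and related examples [limited-time share]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19183

RDDs (resilient distributed datasets): a fault-tolerant abstraction for in-memory cluster computing [in English]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19173

Monitoring an electronic trading environment with Spark [in English]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19166

An architecture for fast and general data processing on large clusters (revised edition)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19155

Q&A:

Confusion about choosing big data technologies
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19176

Oozie installed via CM reports that the database cannot be found when invoking a Hive job
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19163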

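about云 Classic Posts Summary: Week 2 of July 2016

Introduction to machine learning algorithms
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19121
1. What is a program?
2. What is an algorithm?
3. What is a machine learning algorithm?
4. What are the main tasks of machine learning?
5. Machine learning + databases = ?
6. What is natural language processing?

How to detect online fraud with deep learning
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19094
1. What is the current state of fraudulent advertising?
2. What is deep learning?
3. How do you build a system with convolutional neural networks?

Resource scheduling and optimization in the OpenStack cloud: an analysis
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19085

1. How does OpenStack schedule resources?

2. What is PRS?

3. How can OpenStack scheduling be optimized?

Resources:
Spinach: an ad hoc query engine built on Spark
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19097

Big data in smart cities (Li Deren)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19122

OpenStack hands-on guide (complete)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19117

Near-real-time health data center analytics using model-driven programming on Spark-Streaming and GraphX
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19112

Video tutorials from abroad, part 3: SQL on Hadoop - analyzing big data with Hive - the Hive query language
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19088

Video tutorials from abroad, part 4: SQL on Hadoop - analyzing big data with Hive - advanced HiveSQL
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19096

Video tutorials from abroad, part 5: SQL on Hadoop - analyzing big data with Hive - storage and ecosystem
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19111

The mind of the computer: philosophical principles of operating systems
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19090

Q&A:

Finding a job in big data
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19101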

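about云 Classic Posts Summary: Week 1 of July 2016

How to do data mining with R (1)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19063
1. Why learn R?
2. How do you compute with R?

Meituan's Spark performance optimization guide: basics
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19058
1. What are the common principles of Spark development tuning, and how are they applied?
2. What is the basic principle of how a Spark job runs?
3. Which parameters are available for Spark resource tuning?

Machine learning tutorial 2: installing Octave and plotting 3D function graphs
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19006

1. How do you install it on Mac?

2. How do you install it on CentOS 7?

3. What does the result look like?

Deep learning and natural language processing (1): Stanford cs224d Lecture 1
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18996

1. What is natural language processing?

2. What are word vectors?

3. What word segmentation models are there?

Applying deep learning to natural language processing (2): Stanford cs224d Lecture 2
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19005

1. How are word vectors evaluated?

2. How do you train for extrinsic tasks?

3. What is word-window classification?

Deep learning and natural language processing (3): Stanford cs224d Lecture 3
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19018
1. How do you understand neural networks?
2. What are forward computation and backpropagation?
3. How do you understand gradient checking, Xavier initialization of parameters, and the learning rate?

Deep learning and natural language processing (4): Stanford cs224d, assignment quiz 1 with solutions
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19041
1. What is softmax?
2. How do neural networks learn?
3. What is word2vec?
4. How do you do sentiment analysis?

Spark Python development: visualizing data processed by Spark
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18992

1. How do you preprocess data for visualization?

2. How do you create a wordcloud?

3. How do you geolocate tweets and show them on a map?

Spark Streaming performance optimization series: how to obtain and keep using enough cluster compute resources
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18984
1. What is the impact of data peaks?
2. How do you limit Spark's receive rate?

Machine learning tutorial 1: don't say you do machine learning if you don't know this linear algebra
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18997

Web crawler project overview and a simple example
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19067

Five traits of top programmers
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19038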

Resources:

Video tutorials from abroad, part 1: SQL on Hadoop - analyzing big data with Hive - Hadoop basics
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19056

Extracting relations from unstructured text with NLP on Spark [in English, about云]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19031

Pivoting data [row/column transposition] with Spark SQL
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19024

Spark: analytics operating system [about云]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19030

Improvements to the Spark SQL optimizer [about云]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19011

Learning deep recursive neural networks with Spark [about云]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19010

Building a real-time data warehouse [in English]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19000

How to use Spark properties in big data architecture, 2016
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18999

Q&A:

A question about storm startOffsetTime
http://www.aboutyun.com/forum.php?mod=viewthread&tid=19009

Help: this error when installing Hive
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18988

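about云 Classic Posts Summary: Week 4 of June 2016

Chinese word segmentation: principles and implementation
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18901

1. What are the mainstream segmentation methods?

2. What is rule-based or dictionary-based segmentation?

3. What is statistical segmentation?

Summary of HBase performance optimization methods
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18909

1. What kinds of performance optimization methods does HBase have?

2. How does HBase achieve high concurrency and batch reads and writes?

3. How do you optimize the rowkey?

Don't let having too much to learn crush you
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18919
1. Should you avoid learning too much?
2. What is just-in-time learning?
3. You can't know everything, can you?

Using Akka to remove bottlenecks in a Spark + ElasticSearch real-time computing platform
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18982
1. How do Spark and ElasticSearch address the real-time computing bottleneck?
2. How do Akka and ElasticSearch address the real-time computing bottleneck?

HBase best practices from NetEase Video Cloud: memory planning
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18932
1. How does the article plan HBase memory?
2. What is the memory planning approach for write-heavy workloads + LRUBlockCache?
3. How do you think memory should be planned for read-heavy workloads + BucketCache?

Kafka design analysis (4): Kafka Consumer design
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18921
1. What is the High Level Consumer?
2. How do you make the High Level Consumer rebalance?
3. How do you observe the Consumer state machine?

Kafka design analysis (2): Kafka high availability (HA), part 1
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18903
1. Why does Kafka need high availability?
2. Why does Kafka need replication?
3. How are all replicas distributed evenly across the cluster?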

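Resources:
Liu Yongping: Spark-streaming project practice at JD.com
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18924

Spark 2.0 documentation [2016, in English]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18970

Hadoop hands-on series 3: tracing the HDFS source, the job submission source, and more
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18905

Hadoop hands-on series 4: custom beans, the Hadoop serialization interface, and other customizations
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18926

Hadoop hands-on series 5: Hadoop HA principles, deployment, and the related ZooKeeper setup
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18938

Hadoop hands-on series 6: introductory Hive and HBase videos
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18951

Hadoop hands-on series 7: videos on the traffic project background, the behavior-trace enrichment module, and more
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18969

A book on machine learning
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18954

Spark configuration for enterprise system administrators [English material]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18939

Accelerating Spark in the enterprise [English edition]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18906

Q&A:

Help: this error when installing Hive
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18988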

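about云 Classic Posts Summary: Weeks 2 and 3 of June 2016

Spark for Python developers: streaming data processing with Spark
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18866
1. Where does Spark Streaming sit in data-intensive applications?
2. How does Spark Streaming work internally?
3. How is the underlying foundation of Spark Streaming implemented?
4. How do you build a fault-tolerant system?
5. How do you process real-time data over TCP sockets?
6. How do you control Twitter data in real time?
7. How do you process tweets in real time?
8. How do you build a reliable, scalable streaming application?
9. How do you set up Kafka?
10. How do you develop producers?
11. How do you develop consumers?
12. How do you develop a Spark Streaming consumer on Kafka?
13. How do you explore Flume?
14. What does a data pipeline built on Flume, Kafka, and Spark look like?

Hive HQL statements and solutions to data skew
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18889

1. How do you create internal and external tables in Hive?

2. How do you partition in Hive?

3. What are the commonly used basic operations in Hive?

4. How do you write user-defined functions in Hive?

5. What are the common kinds of data skew in Hive, and how do you fix them?

Kafka design analysis (1): Kafka background and architecture
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18894
1. What is Kafka used for?
2. What are the commonly used message queues?
3. What is Kafka's design philosophy?

How machines perceive human expression: emoji and deep learning
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18884
1. What is Dango?
2. What does Dango do?
3. How does Dango work?

Data collection products for big data systems: architecture summary, introduction, and analysis
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18869
1. What is Apache Flume?
2. What is Fluentd?
3. What is Logstash?
4. What is Scribe?
5. What is Chukwa?
6. What is Splunk Forwarder?

Kafka Streams getting-started guide
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18827

1. What is Kafka Streams?
2. What are the core concepts?
3. How are parameters configured?

Data analysis: counting mutual friends
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18826
1. How do you count friends?
2. How is it implemented in code?
3. Which algorithm is used?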

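A unified data modeling approach supporting relational databases and NoSQL
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18811

1. What is Unified Modelset?

2. How do you query in Unified Modelset?

3. How do you connect to databases uniformly in Unified Modelset?

Setting up and operating a Kylin environment
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18809

1. What is Kylin?

2. How do you set up a Kylin environment?

3. How does Kylin work?

Introducing and using Dataset in Spark 2.0
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18780
1. What is a Dataset?
2. What does the article see as the main difference between Dataset and RDD?
3. How many steps does the article's Dataset word count example take?

Joining Alibaba as a data analyst: my 10 key turning points
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18794

Kafka: sharing a SQL engine
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18793

1. What is the workflow for using SQL in Kafka?

2. How do you configure Kafka so that SQL can be used in it?

3. What should you watch out for when using SQL in Kafka?

hadoop2.6 + zookeeper-3.4.6 + hbase-1.0.3 + hive1.2.1 environment setup
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18824

Spark (1.6.1) SQL programming guide plus hands-on case analysis
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18753
1. What are the steps of a Spark SQL workflow?
2. How do you load and save data sources?
3. What save modes are there?

Highly available Hadoop platform: Oozie workflows
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18733

1. What is Oozie?

2. What are Oozie Server's dependencies?

3. How do you configure Oozie?

Six steps from data analysis novice to expert
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18727

1. What are the steps of data analysis?

2. How do you do data governance?

3. How do you do metric analysis?

An SSH-based web management system for HDFS files
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18726

1. How do you manage files with HDFS?

2. How do you search folders in HDFS?

3. How do you read sequence files?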

Resources:
Spark 2.0 (Chen Chao)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18756

A unified data storage and analytics platform based on Hadoop
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18818

Introductory Hive programming course (Shaojie)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18839

Spark Streaming usage and overview diagrams (Shi Jinkui)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18755

Hadoop hands-on series 2: tracing the Hadoop source, remote invocation, and more
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18886

Hadoop hands-on series 1: Hadoop basics, Hadoop job requirements, and more
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18873

Spark programming
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18859

A mini guide to reading the Spark source [book recommendation]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18855

Overall architecture of Taobao's 云梯 distributed computing platform
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18854

Casual talks on Hive: getting started
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18840

IBM: Bigtable system and structure [in English]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18817

Xie Hui of Tuniu: building reactive stream computing with Akka
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18797

Infrastructure for parallel development_Gator.pdf
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18741

Spark MLlib: practice and optimization (Lei Zongxiong)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18739

Zhang Ning: mobile big data technology in customer acquisition and operations for internet finance
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18730

Q&A:

MapReduce job hangs
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18868

What Hadoop cluster monitoring tools are there? Recommendations welcome
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18864

The simplest Spark SQL example keeps failing to run
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18837

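about云 Classic Posts Summary: Week 1 of June 2016

Spark in practice in Weibo's Feed algorithms
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18634

1. What does Sina Weibo's three-tier architecture look like?

2. What are the usage scenarios of the Sina Weibo Feed?

3. How is the Sina Weibo Feed ranked?

The Redis optimization journey behind trillions of Weibo requests
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18633

1. How is high availability optimized at the mechanism level?

2. How is extreme business-specific customization achieved?

3. How is Redis turned into a service?

Apache Spark 2.0 overview
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18702
1. What are the key features of Spark ML persistence?
2. Why is Apache Spark 2.0 said to bring persistence to machine learning models?

What changed in Hadoop 3.0
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18701
1. Is Hadoop 3.0 based on JDK 1.7 or 1.8?
2. What new features does Hadoop 3.0 have?
3. What changed in YARN in Hadoop 3.0?

40 HBase test questions [with answers]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18678
1. What are the basic skills?
2. What are the core HBase knowledge points?
3. What are the key points of advanced HBase use?
4. How are HBase installation, deployment, and startup tested?

Data structures and algorithms of search engine indexes
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18677
1. What is the foundation of indexing technology?
2. How do you build an index?
3. How is query processing done?
4. How are phrase queries done?

Basic architecture of an offline analysis system with Flume + Hadoop + Hive (1)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18660
1. How do you design the architecture of an offline analysis system?
2. How does Flume collect log information?
3. How do you clean log files with MapReduce?

Basic architecture of an offline analysis system with Flume + Hadoop + Hive (2)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18662
1. How do you clean log files with MapReduce?
2. How do you build a data warehouse with Hive?

Product search engine: recommender system design
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18641
1. What do you need to know about recommender systems?
2. How much do you know about Mahout?
3. How do you do personalized recommendation?

Where does a programmer's salary come from?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18640

1. Does ability determine salary?

2. How does the business relate to salary?

3. Do you have a "utilitarian mindset"?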

Resources:
Taobao's massive-data product technology
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18703

All OpenStack commands
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18705

Research on applying cloud computing to smart grid dispatch technical support systems
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18684

Applying the Hadoop cloud computing platform to video transcoding
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18667

Hadoop at Yahoo
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18666

Tencent Cloud storage: a professional storage solution
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18653

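about云 Classic Posts Summary: Week 3 of May 2016

How I prepare for technical interviews
http://www.aboutyun.com/thread-18480-1-1.html

Spark: two ways to configure Master High Availability (HA)
http://www.aboutyun.com/thread-18498-1-1.html

1. In what ways can Spark HA be implemented?

2. How do you configure HA via filesystem-based single-node recovery?

3. How do you configure HA with ZooKeeper-based Standby Masters?

Apache Flink: a detailed introduction
http://www.aboutyun.com/thread-18491-1-1.html
1. What is Apache Flink?
2. How does Flink differ from traditional approaches in implementing stream and batch processing?
3. What features does Apache Flink stream processing have?

Programmers, we all come home late at night
http://www.aboutyun.com/thread-18453-1-1.html

Collecting distributed logs inside Docker containers with Flume + Kafka: a practical application
http://www.aboutyun.com/thread-18452-1-1.html
1. How do you design the Flume + Kafka collection architecture?
2. How do you modify configuration files inside Docker?
3. How do you configure Flume?
4. How do you customize RollingByTypeAndDayFileSink?

Understanding OpenStack high availability (1): OpenStack HA and disaster recovery options, part 1
http://www.aboutyun.com/thread-18430-1-1.html
1. What is HA?
2. How many categories of OpenStack HA are there?
3. What OpenStack HA options exist?

Programmer, why are you so busy?
http://www.aboutyun.com/thread-18429-1-1.html

1. Are goals more important than results?

2. Is doing one thing well better than having done ten?

3. How do you reduce hesitation?

about云 daily read roundup (part 14, 2016.05.16)
http://www.aboutyun.com/thread-18424-1-1.html

Converting text to Parquet in Spark to improve performance
http://www.aboutyun.com/thread-18422-1-1.html

1. What is Parquet?

2. How is HBase data converted to Parquet?

Detecting keywords with Spark Streaming
http://www.aboutyun.com/thread-18420-1-1.html

1. What is Spark Streaming?

2. How do you do keyword detection with Spark Streaming?

3. How do you implement the keyword detection program?

Spark Streaming explained by example
http://www.aboutyun.com/thread-18409-1-1.html

1. What is Spark Streaming?

2. How does Spark Streaming work?

3. How do you implement Spark Streaming?

e袋洗's road to a microservice architecture, and its Docker practice
http://www.aboutyun.com/thread-18407-1-1.html

1. Why split into a microservice architecture?

2. What problems does a monolithic architecture have?

3. What problems might splitting into microservices bring?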

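Resources:
Complete cloud computing Docker virtualization tutorial set [limited time]
http://www.aboutyun.com/thread-18438-1-1.html

Search engine construction and crawler technology [full video set]
http://www.aboutyun.com/thread-18464-1-1.html

Soso: machine learning platform report
http://www.aboutyun.com/thread-18465-1-1.html

Huawei FusionInsight HD 2.3 core technology: Spark
http://www.aboutyun.com/thread-18439-1-1.html

A deep dive into Docker images, containers, and registries, and Docker in testing, 2 [video]
http://www.aboutyun.com/thread-18419-1-1.html

Why Docker is inevitably the present and future of cloud computing, 1 [video]
http://www.aboutyun.com/thread-18418-1-1.html

Q&A:

Configuring Spark HA (with ZooKeeper)
http://www.aboutyun.com/thread-18455-1-1.html

Multi-table operation problem in Hive 2.0
http://www.aboutyun.com/thread-18397-1-1.html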

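about云 Classic Posts Summary: Week 3 of May 2016

Search engine index data structures and algorithms
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18377

1. How do you understand the term-document model?

2. What is an inverted index, and how do you understand it?

3. How do you understand inverted lists?

4. What methods are there for building an index? How do you build a dynamic index?

5. What strategies are there for index updates?

6. Once the index is built, what query processing mechanisms are there?

7. How do you implement multi-field indexes?

8. How are phrase queries done?

9. How are distributed queries done?

Cloudera series 1: Cloudera getting-started guide
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18378
1. What products and tools does Cloudera provide?
2. What is Cloudera Navigator for?

Naive Bayes classification and prediction: principles and implementation
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18350
1. How do you understand Bayes' formula?
2. What is Bayesian inference?
3. How is the Bayes algorithm applied to examples?

Getting cluster NameNode and DataNode status in Hadoop
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18336

1. How do you set up the Configuration?

2. How do you obtain DataNode information?

3. How do you obtain the Active NameNode?

Interview experiences at LeTV, Kingsoft, and 360: accumulated experience matters
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18394

The Redis protocol in detail
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18371
1. How is the RESP protocol described?
2. How are Simple Strings used in responses?
3. How are RESP Errors used in responses?
4. How are RESP integers used in responses?
5. How are RESP Bulk Strings used in responses?
6. What do RESP arrays look like?
7. What does a NULL element in Arrays look like?
8. How do you send commands to the Redis server?
9. How do multiple commands and pipelining work?
10. How are inline commands used?
11. How do you implement a Redis client in PHP?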

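30 machine learning Q&As answered by AI heavyweights
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18370
1. Is reinforcement learning, as Yann LeCun puts it, the finishing touch?
2. How important is understanding the brain to understanding deep learning?
3. Is there anything deep learning can never learn?
4. What does Yoshua Bengio think of Kaggle and other machine learning competitions?
5. Where is deep learning research headed?
6. How can someone get started in machine learning?
7. What does Yoshua Bengio think of OpenAI?
8. Is the current hype around deep learning overblown?
9. What are the open research areas in deep learning?
10. Can deep learning succeed in natural language processing as it has in vision and speech?
11. How does deep learning differ from machine learning?
12. What advice do you have for young researchers entering machine learning?
13. Does AI pose an existential threat to humanity?
14. What is your view on solving problems with a single learning algorithm?
15. What are the benefits and challenges of doing deep learning research in academia compared with industry?
16. Is the main limitation of machine learning algorithms that they need too much data to learn?
17. Why is unsupervised learning important? What role does deep learning play in it?
18. Of the many parts of deep learning not yet thoroughly studied, which is the most puzzling?
19. Will traditional statistical learning beat deep learning again in the near future?
20. What advice is there for young researchers entering machine learning?

Hive 2.0 installation summary
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18361

Spark Streaming's data cleanup mechanism
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18352
1. How do you understand the relationship between DStream and RDD?
2. How are RDDs produced in Spark Streaming?
3. How do you release cached RDDs?

Importing the Hive 1.2.1 source into Eclipse
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18338
1. How do you configure local_reposity?
2. How do you compile it?
3. How do you use it?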

Resources:
Alibaba Lecture Room: large-scale offline data computing with Hadoop
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18369

Taobao Academy: large-scale offline data computing with ODPS
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18368

Algorithms in a Nutshell, George T. Heineman, scanned edition
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18357

Roundup of resources needed for a Hadoop cluster environment
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18385

Alibaba: HBase best practices (Shurui)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18381

Tmall Zhuifengtang: Java multithreading and concurrent programming v2
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18380

Introduction to the Design and Analysis of Algorithms, 2nd edition, Anany Levitin, scanned edition
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18358

Statistical analysis and plotting with R
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18344

Programming Hive (PDF)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18339

Image segmentation with Hadoop
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18328

A parallel ETL application based on improved chained MapReduce
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18327

Q&A:

NameNode and ResourceManager fail to start
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18405

Third-party JARs cannot be found in yarn-client mode
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18395

No hadoop, spark, or similar commands after a fully offline CDH install?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18391

Help! cloudera-scm-server fails to start when CDH is restarted after a sudden server power loss
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18389

Error on Hive multi-table operations
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18383

Question about cluster directories when installing with Cloudera Manager
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18372

Phoenix startup error
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18366

Help: how to read data out of HBase and import it into Hive with Java
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18359

Implementing a decision tree algorithm with MapReduce, help wanted
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18355

How to periodically clear Spark Streaming aggregation results and start counting again
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18347

YARN service fails to start on a fresh CDH 5.7.0 install
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18335

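about云 classic posts roundup: May 2016, week 2

Spark data ETL with some code examples
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18250
1. How is the data processed?
2. How are useful features extracted from the data?
3. What derived features are there?

Spark performance optimization: JVM parameter tuning
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18292
1. What kinds of JVM are there?
2. How is garbage collection monitored?
3. How is the executor memory fraction tuned?
4. What more advanced garbage collection tuning is there?

15 years of programming: lessons from a senior architect
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18295

HBase split methods and process
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18281
1. Can a region contain one or more stores?
2. What is a store?
3. What does a store contain?
4. What problem does pre-splitting solve?
5. How is pre-splitting done from the shell?
6. What is automatic splitting?
7. How is a split forced?
8. What do region splits involve?

From log statistics to big data analytics
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18242

1. How do you start big data analysis from scratch?

2. How do you use the latest technology to rework the system architecture?

3. How do you build a big data platform from scratch?

Analysis of YARN's underlying state machine implementation in Hadoop 2.6.0, with code examples
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18280
1. What is an event in YARN?
2. What does state mean in YARN?
3. What is a transition?
4. What is a state machine?
5. How do you build with a state machine?
6. What is a state transition?

Introduction to extending Spark functions
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18267
1. What role do UDFs play in Spark SQL?
2. What is the only difference between a UDF written in Scala and an ordinary Scala function?
3. How are UDFs used in Spark?

Ambari, a powerful tool for building big data platforms: the road to Kerberos integration
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18261

1. What is Kerberos?
2. What are the steps in the Kerberos authentication flow?
3. How are Ambari and Kerberos related?
4. How should the Ambari Kerberos Descriptor be understood?

Dr.Elephant getting-started guide
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18251
1. What is Dr.Elephant?
2. Why use Dr.Elephant?
3. What are its core features?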

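Resources:

Why Facebook uses HBase
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18298

Big data analytics: the path to business value
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18284

Facebook real-time data analytics [slides, in English]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18252

Baidu's language for massive data analysis
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18283

Taobao: a standalone service and standalone persistent storage for HDFS metadata
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18270

HDP 2.2 installation guide (offline install recommended)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18262

Ohio State University: an analytical model for developing big data processing software [in English]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18297

Li Jian, IBM: big data systems and architecture [in English]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18271

Let your application walk in the cloud (Yan Guoqi)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18246

Technical challenges and practice of cloud platforms with large-scale, cross-region distributed resources (Jin Jun)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18245

Q&A:

Handling Date-typed columns when reading Oracle from Spark
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18282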

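about云 classic posts roundup: May 2016, week 1

In a chief architect's eyes, what is the essence of architecture?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18211

Big data and cloud basics: SSH demystified, which side logs in to which without a password
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18232
1. Given two machines that hold the public and private halves of a key pair, can the machine holding the private key log in without a password to the machine holding the public key?
2. If many machines need passwordless login to one another, what is the simplest way to set it up?

How is technical leadership forged?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18210

DStream and DStreamGraph explained
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18206
1. What is the scope of this article?
2. What are DStream, transformation, and output operation?
3. How are the transformation and output in the quick example analyzed?
4. What is the DStream class hierarchy?
5. How do Dependency and DStreamGraph work?

Memory leaks and out-of-memory errors
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18177
1. What are memory leaks and out-of-memory errors?
2. How are the two related?
3. How can they be avoided in code?

A coder's painful confession: after 35 years as a programmer, my biggest regret is not seizing the chance to change careers
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18176

Introduction to Lucene's architecture
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18168

1. What are Lucene's advantages?

2. How do calls to the Lucene API implement indexing?

3. How is the search process implemented?

Shen Hao: visualization, letting the data speak
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18167

1. How do you discover the beauty of data visualization?

2. How do you let the data speak?

3. What exactly is data visualization?

How to improve CPU utilization with Spark on YARN
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18154
1. How do you improve CPU utilization with Spark on YARN?
2. In a parallel processing framework (Spark or MapReduce), why does the data need to be partitioned?

Spark source code analysis: Executor startup and task submission
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18151

1. What is spark-submit?

2. What is the Executor startup flow?

3. How does the Executor schedule tasks?

Processing data and charting with PySpark
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18150

1. What is PySpark?

2. How do you process data and chart it with PySpark?

3. What should you watch for when using PySpark?

CDH cluster tuning: memory, vcores, and DRF
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18145

1. What is DRF?

2. How do you tune a CDH cluster?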

Resources:

Hive optimization examples
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18152

Web scraping with Python
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18204

Building an extremely efficient search system (Chen Chao, Alibaba)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18171

Lightweight virtualization: Docker in practice (Chen Yifei)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18205

Hadoop Real-World Solutions book
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18213

Making machines learn faster.pdf
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18153

Why cloud monitoring [in English]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18186

Docker practice under a microservices architecture and Docker in test environments
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18185

HBase performance testing document
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18170

Q&A:

Sorting numbers in values
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18202

Spark Streaming memory growth: troubleshooting process and open questions
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18200

Question about values in the reduce phase of MapReduce
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18196

HTTP 500 error when Keystone creates the Identity service instance
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18195

Where is GC configured for Spark? SPARK_DEAMON_JAVA_OPTS does not seem to work
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18183

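about云 classic posts roundup: April 2016, week 4

Hadoop, HBase, Hive, and ZooKeeper: integration feasibility analysis and choosing versions
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18104
1. How do you determine which version is a stable release?
2. How does this article settle on each version?
3. Under what conditions are HBase 1.x and Hive 1.x compatible?

A primer on how HDFS works
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18075

1. What is a distributed file system?

2. How are metadata and data separated?

3. How does HDFS work?

CDH 5.7 quick offline installation tutorial
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18107
1. What is CDH?
2. What basic environment is needed to build a CDH cluster?
3. How is basic cluster configuration done?
4. How is Cloudera Manager installed?
5. How are CDH services installed?

How to build large-scale distributed systems on a cloud platform
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18120

1. How do you build large-scale distributed systems on a cloud platform?

2. How do you build a high-performance, highly available load-balancing cluster?

An Alipay architect on growing from engineer to architect
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18118
1. Is there a single agreed definition of an architect?
2. What are an architect's responsibilities?
3. How do architects grow?

HBase: investigating a RegionServer crash
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18093
1. How do you pinpoint the fault when a RegionServer crashes?
2. How do you troubleshoot and locate problems from the HBase logs?

Data-driven precision marketing in practice at Dianping
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18092
1. What are the basic components of O2O marketing?
2. What problems does data help operations and finance colleagues solve?
3. How do Meituan-Dianping takeout and WeChat red packets achieve precision marketing?

Practical | High-availability configuration for Cloudera products (hands-on)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18082

1. How is high availability set up?

2. How are other CDH components configured to use HDFS high availability?

3. How is Impala configured to use HDFS high availability?

Analysis of load-balancing algorithms in high-availability scenarios for distributed systems
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18081

1. Why does load balancing matter?

2. What load-balancing strategies are there?

3. Where is load balancing applied?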

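Resources:
Getting started with ZooKeeper: ten key points
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18073

Taobao's data mining platform based on Spark on YARN
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18072

Model analysis of the Hadoop Distributed File System
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18122

Li Chenghua: deep learning in automatic question answering systems
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18123

DVM: making VMs run like containers
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18113

Xia Fen: large-scale machine learning techniques
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18096

Wang Feng: computational advertising, short-text relevance computation on big data [Sogou Search]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18095

Software-defined networking (SDN) and cloud security (Mao Wenbo)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18087

A virtualization platform with hyperconverged compute and storage
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18086

Q&A:

Newcomer asking for help: errors when starting the metastore and hiveserver services before using Hive. How can this be fixed?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18077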

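about云 classic posts roundup: April 2016, week 3

The most complete and detailed guide to Hadoop, HBase, Hive, and ZooKeeper version compatibility [applies to any version]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18015
1. Which Hadoop and HBase versions are compatible?
2. Which Hadoop and Hive versions are compatible?
3. Which HBase and Hive versions are compatible?
4. Which HBase and ZooKeeper versions are compatible?

Overall architecture and technical implementation of LeTV's e-commerce cloud
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18008
1. What stages has the e-commerce system gone through?
2. What frameworks make up the LeTV e-commerce cloud architecture?
3. What architectures does the e-commerce cloud platform have?

CentOS 7: passwordless SSH public-key authentication
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17996
1. How is a key pair generated?
2. If ssh localhost fails, what are the likely causes?
3. What does the ssh localhost warning mean?

Where is big data headed tomorrow?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18037
1. What is the direction of big data?
2. How is big data being applied in various areas?

A first look at Flume + Spark Streaming
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18034

1. How do you test Flume?

2. How do you test with Spark Streaming together with Flume?

3. How do you send data from Flume to Spark Streaming?

Performance testing standards and solutions for enterprise OpenStack cloud computing, part 1
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18066

1. How do you gather requirements for cloud performance testing?

2. How do you analyze and tailor a cloud performance testing strategy for OpenStack?

3. How do you devise a cloud performance testing solution?

StreamDM: a streaming analytics algorithm engine built on Spark Streaming with online learning support
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18021
1. Does the Spark ecosystem lack a stream analytics algorithm engine that supports online learning?
2. What are StreamDM's architecture and task flow?
3. What are StreamDM's key features and advantages?

Source code analysis of Kafka log deletion
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17988
1. How is log retention time configured in Kafka?
2. How is log retention size configured in Kafka?
3. What is Kafka's log deletion process?

Notes from real-world Hadoop work
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17982

1. Why is CDH better?

2. How do you fix inefficient MapReduce jobs?

3. How do you resolve out-of-memory errors?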

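Q&A:
The most complete and detailed guide to Hadoop, HBase, Hive, and ZooKeeper version compatibility [applies to any version]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18024

Resources:
Scala for the Impatient, complete Chinese edition: clear 402-page e-book, 32 MB
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17991

Xu Lingbo, Taobao: ZooKeeper primer
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18001

NoSQL system architecture for real-time queries on big data (Fan Ang)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17986

A metrics framework for e-commerce data analysis
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18000

Intel: machine learning and neural networks on Apache Spark
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18042

Wang Dong: machine learning at Meituan, the computing behind eating, drinking, and having fun
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18041

Deng Shujun: machine learning applications in online education
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18025

Key practices for moving from virtualization to private cloud (Jin Ming)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=18012

The shift between cloud and device: a brand-new cloud development platform (Li Ping)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17985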

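about云 classic posts roundup: April 2016, week 2

Making a BAT offer no longer hard to get
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17920

Data warehouse technology based on SQL on Hadoop
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17919

Using Elasticsearch in Java applications
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17931
1. How does Elasticsearch work?
2. How do you access basic REST API information from the command line?
3. How do Java applications interact with Elasticsearch?

Key points of Hadoop YARN's architecture design
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17948
1. What is YARN's overall architecture?
2. How is YARN RPC implemented?
3. How does ResourceManager work internally?
4. How does NodeManager work internally?
5. What is the event handling mechanism?
6. What is a state machine?
7. How does NMLivelinessMonitor work (source code analysis)?

Mistakes programmers must never make in interviews
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17949

Interview reflections: skills a programmer with 3 years of experience should have
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17956

Redis data overview and complete command reference
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17957
1. What are Redis's use cases?
2. What data types does Redis have, and what operations do they support?

Seven tools that ignite the Spark big data engine
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17968
1. What tools are there for the Spark engine?
2. What does each tool do?

Spark configuration notes
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17937

1. What three ways does Spark provide to configure the system?

2. How are environment variables configured?

3. What does spark.executor.memory mean?

Feature processing with the Spark machine learning API
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17912

1. How do you extract features with the Spark machine learning API?

2. How do you select features with the Spark machine learning API?

3. What feature selection methods does the Spark machine learning API offer?

Installing OpenStack (Liberty) on a VM with Ubuntu 14.04: screenshot walkthrough of each part (part 1)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17943

Installing OpenStack (Liberty) on a VM with Ubuntu 14.04: key caveats and lessons learned (part 2)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17942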

Resources:
Kafka self-study guide
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17939

Architecting big data systems with enterprise storage
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17953

Cloud computing in China
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17947

Hadoop from zero to job-ready
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17946

Installing and using Sqoop2 1.99.4
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17938

Zhang Ximeng (Simon): growth hacking and being data-driven
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17923

Xiao Yonghong: introduction to Datatang's data services
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17922

An introduction to the art of programming using Scala [in English]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17913

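about云 classic posts roundup: April 2016, week 1

Building search and recommendation for a "small" data cluster with Solr
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17861
1. How does Solr connect to a MySQL database?
2. How are the data structures for location search handled?
3. What do the three steps of learning Solr cover?

Machine learning in risk modeling for financial big data
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17857
1. Why do internet finance and consumer finance need big data?
2. What are the T-L kernel model, the Random Forest model, and the ScoreNet model?
3. What does machine learning focus on in financial big data?

Storm in online services: troubleshooting CPU spikes on an idle cluster
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17874
1. What is Storm?
2. What does a CPU spike on an idle cluster look like?
3. How should it be investigated once it appears?

Dangdang's approach to high-availability architecture
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17844
1. What is high availability?
2. What non-functional requirements does a system have?
3. How is a highly available architecture designed?

Lei Jun responds to the "flying pig" theory: anyone's success takes ten thousand hours of hard practice
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17843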

Resources:
Summary of Spark topics
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17876

Sharing Hadoop: The Definitive Guide, 3rd edition, in Chinese
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17869

Transparent compressed storage in HDFS (Baidu)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17890

Docker in action in the cloud
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17877

C programming on Linux (for complete beginners).pdf
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17868

Tips from GrowingIO's use of Spark
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17860

Letting the data speak: Spark at TalkingData
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17851

Graph processing with Spark
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17846

Linux kernel analysis methods
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17845

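about云 classic posts roundup: March 2016, week 3

How Yihaodian's trading system architecture evolved toward high concurrency and high availability
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17782
1. What are the architectures and pain points of e-commerce (lightweight and heavyweight)?
2. Where does the evolution of e-commerce sites lead?
3. How do you prepare for architectural evolution?
4. How are core services planned?
5. How is the order database sharded horizontally?
6. What is SOA middleware?
7. What is a multi-active data center architecture?

Core IoT protocols and the evolution of message push technology
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17783
1. What are the IoT architecture and its key technologies?
2. What is the mobile internet communication model?
3. How has message push technology evolved?

HBase and ZooKeeper version correspondence
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17793

Setting the Hadoop user to access any HDFS file
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17777
1. How is the Hadoop user set so that it can access any HDFS file?
2. What is the principle behind a Hadoop user accessing any HDFS file?

Decision tree classification and prediction: principles and implementation
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17776
1. What is the decision tree algorithm?
2. What are the characteristics of the decision tree algorithm?
3. How can the decision tree algorithm be understood in depth?

Hadoop: an introduction to Hive and HBase
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17764
1. What does Hive (a data warehouse tool) look like to a beginner?
2. What are the three working modes of the Hive Metastore?
3. Where does Hive differ from traditional databases?
4. What is the design approach behind HBase?
5. How do you build a prefix tree (trie) of Chinese words in Java?

Deep learning programming models through code: TensorFlow vs. CNTK
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17761

1. What is TensorFlow?

2. What is CNTK?

3. How do CNTK and TensorFlow differ in network training?

Seven big data trends for 2016
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17760

1. What trends are there in big data?

2. Might big data "come down from its pedestal"?

3. Is the big data exchange model maturing?

The JVM class loading mechanism
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17750
1. What does the class loading process look like?
2. What happens in each stage of class loading?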

Resources:
JD.com service framework in practice (JD.com)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17749

Tips for deploying Spark
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17775

From Paxos to ZooKeeper: Distributed Consistency Principles and Practice [recommended book]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17804

The real future of Spark
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17803

Hadoop 2.7.1 distributed installation manual
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17789

Spark Streaming and the Internet of Things
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17788

JDBC study manual
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17774

SBT in Action [in English]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17755

ZooKeeper 3.4.6 distributed installation guide
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17754

Building Spark SQL DataFrames, Datasets, and streaming
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17748

Q&A:

Could an expert take a look? Hive reports errors after a system reboot
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17752

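about云 classic posts roundup: March 2016, week 3

How to build user profiles with Spark?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17710
1. Once we have the data, how do we analyze it?
2. How is cluster analysis done in Spark?

A modern path for machine learning developers: no need to start from statistics and calculus
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17726
1. What is machine learning?
2. How do you go about learning machine learning?
3. What are the common misconceptions about machine learning?

Distributed databases: challenges and analysis
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17711
1. What is the difference between relational and non-relational databases?
2. What categories of NoSQL are there?

Transferring data between Hive and DB2 with Sqoop on Hadoop: practice and summary
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17696
1. How is data imported from DB2 into Hive?
2. What is the difference between partitioned and unpartitioned tables?
3. How is data imported from Hive into DB2?

Up close with Redis, day 3 (Redis load balancing), part 1
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17689
1. How was load balancing introduced in Redis 3.x?
2. How is Redis Cluster implemented?
3. How do you create a cluster with the Ruby gem redis module and redis-trib.rb?

Introduction to collaborative filtering and a simple recommender implementation
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17678
1. What are recommender systems and collaborative filtering?
2. How are similarity measures implemented?

Hadoop: joining datasets with Python
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17677
1. What is Hadoop Streaming?
2. How does Python call Streaming?
3. How do you analyze data with Hadoop Streaming?

Up close with Redis, day 1
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17674
1. What is Redis?
2. How is Redis installed?
3. How do you connect to the Redis service using Spring Data with Jedis?

Up close with Redis, day 2 (Redis Sentinel)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17673
1. What are the pros and cons of RDB?
2. What are the pros and cons of AOF?
3. What is THP?

Cluster setup essentials: installing and configuring yum on Linux
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17668
1. How do you list and uninstall installed packages?
2. How do you change the yum repository?
3. How do you clean the yum cache?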

Resources:

Large-scale distributed website architecture: design and practice
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17660

Distributed storage technology: overall analysis, research, and applications
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17672

IBM storage solutions: storage for data analytics
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17671

Hive 1.2.1, Spark, and Sqoop installation guide
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17683

Top 5 mistakes when writing Spark applications
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17721

Open-source distributed file systems
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17699

The rpm command explained
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17698

Think Stats: probability and statistics for programmers.pdf
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17682

[O'Reilly: Mining the Social Web] (Matthew A. Russell)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17681

Building robust, adaptive streaming applications with Spark Streaming
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17680

Continuous integration for Spark apps
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17658

Q&A:

How to read one column of a CSV file on HDFS with Java
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17669

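about云 classic posts roundup: March 2016, week 2

Spark 1.6.0 quick start for beginners
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17571
1. How is Spark's interactive shell used?
2. What further RDD operations are there?
3. How does caching work?

Machine learning milestones: 66 years of progress in 2 minutes
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17603
1. In what areas is machine learning applied?
2. What breakthroughs has machine learning made?
3. What does the future hold for machine learning?

HDFS back to the source: leases, fault tolerance in reads and writes, and the NameNode's main data structures
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17620
1. Is Hadoop's mutual exclusion between reads and writes implemented through leases?
2. What two time limits does LeaseManager have?
3. What kind of mechanism is LeaseManagement?
4. What two main jobs does LeaseManager do?
5. When does lease recovery reclaim a lease?

Sparks of thought from the big data era
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17623

ZooKeeper principles and usage (detailed version)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17636
1. What are ZooKeeper's basic principles?
2. What are ZooKeeper's features?
3. What are ZooKeeper's use cases?

Spark on YARN: performance tuning
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17630
1. What configuration options are there for Spark tuning?
2. How do you do your own tuning at work?

Analysis and study of data mining mechanisms based on log files
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17594
1. What does data mining mean?
2. What challenges does log data face?
3. How is log data mined?

Clustering geolocation data with Spark and DBSCAN
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17593
1. How do you classify user events with machine learning?
2. How do you cluster data with Spark and DBSCAN?

Setting up a Python development environment in Eclipse
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17572
1. What is PyDev?
2. How is PyDev installed?
3. How do you resolve problems encountered during the Eclipse installation?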

Resources:
Alibaba Cloud: Apsara system, overall framework
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17586

Spark data visualization (original title: 流动力的spark可视化数据)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17622

Cloudera: time series data analysis with Spark [in English]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17621

Python learning material: the best getting-started guide for beginners!
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17607

Seven Languages in Seven Weeks: understanding multiple programming paradigms
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17605

RDDs in Spark
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17604

Five reasons enterprise adoption of Spark is unstoppable
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17585

Running Spark 1.6.0 on YARN
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17577

Five myths about Spark and big data
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17576

Q&A:

Questions about combine, partition, and shuffle in Hadoop
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17584

Help needed with a problem using Spark mapToPair and reduceByKey
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17570

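about云 classic posts roundup: March 2016, week 1

Communicating with the Hadoop 2.x RPC framework
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17461

HBase: concurrency control explained
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17460

1. What is HBase's synchronization mechanism?

2. How are HBase row locks implemented?

3. How does HBase control concurrent reads and writes of data?

What is a real programmer
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17484

The future of big data architecture
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17483

about云 Hadoop series 1: CentOS network configuration
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17481
1. How is CentOS networking configured?
2. What is the key to configuring the network?
3. How is the network restarted?

How Kafka achieves high throughput
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17525
1. How does Kafka implement a distributed messaging system?
2. How does Kafka achieve high throughput?

The architectural evolution of 58.com's high-performance mobile push platform
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17524
1. Why is mobile push needed?
2. How is the architecture designed?

Big data: building a Hadoop 2.6.0 + Spark 1.6.0 HA distributed cluster (5 nodes) [original]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17546
1. How is Hadoop installed?
2. How is ZooKeeper installed?
3. How is Spark installed?
4. How is the installation verified?

This is how I beat procrastination, and you can try it too
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17555

1. How do you track your daily activities and analyze how your time is used?

2. How do you build good habits?

3. How do you cure procrastination by changing your daily routine?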

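A heterogeneous distributed deep learning platform based on Spark
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17543
1. What is PADDLE?
2. What are the pain points of combining PADDLE with business logic?
3. What are the main goals of Spark on PADDLE 2.0?

The hosts file format explained: why is there a domain name entry too
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17531
1. Why does the hosts file contain domain names?
2. What is the domain name for?
3. Can the domain name be left unconfigured?

What have containers really brought to cloud computing?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17507

1. What are the benefits of containers?

2. What fundamental change have containers brought to cloud computing?

3. What should you watch for when using containers?

Programmer health: a 30-year-old IT worker suddenly goes deaf after working a month straight
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17506

1. What should you watch out for while developing?

2. Why is sudden deafness increasingly common among young people?

3. How can programmers schedule their work sensibly?

A brief history of machine learning
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17491
1. How did machine learning originate?
2. Which technical achievements were historic breakthroughs in machine learning?

The history of big data platforms as I lived it (part 2)
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17468

1. What did data models look like in the pre-internet era?

2. How is a data model designed?

3. What are the stages of designing a data model?

The pacing algorithm behind Facebook's ad system
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17464

1. What online advertising terms are there?

2. How does Facebook's pacing algorithm work?

Reading the big data landscape map to see what big data is really about
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17451
1. How does enterprise technology relate to big data?
2. What does the big data ecosystem look like today?
3. What is the current state of big data applications?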

Resources:
Spark Summit East 2016 slides, no. 5: real-time data pipelines with Kafka Connect and Spark Streaming
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17500

Spark Summit East 2016 slides, no. 9: Magellan, Spark as a geospatial analytics engine
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17526

Spark Summit East 2016 slides, no. 8: petabyte-scale data science with Spark and R
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17511

Spark Summit East 2016 slides, no. 6: Spark and the Enterprise
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17501

Spark Summit East 2016 slides, no. 3: Spark at Bloomberg: Dynamic Composable Analytics
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17485

Spark Summit East 2016 slides, no. 4: Monte Carlo Simulations in Ad lift Measuremen...
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17486

Spark Summit East 2016 slides, no. 2: Distributed Time Travel for Feature Generation
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17472

Spark Summit East 2016 slides, no. 1: Office 365 Spark
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17470

Q&A:

Highly available Hadoop 2.7 and HBase 1.1.3 cluster configuration: installing and deploying an HA cluster
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17508

Question about Spark Streaming's scheduling policy for time-sliced data
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17503

A new Maven project in IDEA does not auto-generate the src directory. How can this be fixed? Thanks a lot!
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17492

Hadoop, HBase, or Spark: which should I choose?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17478

Exception when writing logs to HDFS with stock official Flume
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17477

Task not serializable when reading and writing HBase with Spark
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17476

Error messages when syncing the database while setting up OpenStack Keystone
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17458

Error on startup after installing IDEA. Expert help appreciated, many thanks!
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17457

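about云 classic posts roundup: February 2016, week 4

Getting started with Hadoop: chapter 10, Hadoop tools
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17402
1. What tools does Hadoop have?
2. What is Hadoop Streaming for?
3. How is Hadoop cluster load simulated?
4. Which is Hadoop's data extraction and analysis tool?

A classic big data architecture case: rebuilding Kugou Music's big data platform
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17437

1. Why was Kugou Music's big data platform rebuilt?
2. What does the new-generation big data architecture look like?

Large website architecture series: an e-commerce site case study
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17407
1. What customer requirements does an e-commerce site consider?
2. How did the site architecture evolve?
3. What should be considered when optimizing an e-commerce architecture?

How Java HashMap works and how it is implemented
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17420
1. When would you use HashMap, and what are its characteristics?
2. Do you know how HashMap works?
3. Do you know how get and put work? What do equals() and hashCode() each do?

Reading the big data landscape map to see what big data is really about
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17451
1. How does enterprise technology relate to big data?
2. What does the big data ecosystem look like today?
3. What is the current state of big data applications?

Do I need to learn R?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17447
1. Why choose R?
2. What is R, and what is it used for?
3. What problems arise when using R?

The history of big data platforms as I lived it (1): the pre-internet era, part 1
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17435

1. What was the first-generation data warehouse architecture like?

2. What does a data mart architecture look like?

3. What was the second-generation data warehouse architecture like?

Spark Streaming in practice and optimization
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17421
1. What is Spark Streaming?
2. How is Spark Streaming used at Hulu?
3. How is Spark Streaming optimized?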

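Counting word frequencies in a novel with Hadoop and Chinese word segmentation
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17410
1. How does this article count word frequencies in a novel with Hadoop and Chinese word segmentation?
2. What environment is this article based on?
3. How is the project created in IDEA?
4. How is the project run in IDEA?

Hadoop turns ten! Doug Cutting's story and the future of big data technology as he sees it
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17400
1. Do you know Doug Cutting, the father of Hadoop?
2. What stages has Hadoop gone through?
3. What will Hadoop's future be?

Artificial intelligence today
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17389

1. What is artificial intelligence?

2. What threats might artificial intelligence bring?

3. What is machine learning?

How should a programmer ask the boss for a raise?
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17385

1. How should a programmer ask the boss for a raise?

2. What are the real reasons for a raise?

3. What homework should you do before asking for a raise?

Getting started with Redis 6: Redis publish/subscribe
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17384

1. What is Redis publish/subscribe (pub/sub)?

2. What features does Redis pub/sub have?

3. How is Redis pub/sub implemented in Java?

Getting started with Hadoop: chapter 7, YARN REST APIs
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17387

Getting started with Hadoop: chapter 11, Hadoop configuration
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17424

Hadoop official website help manual
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17427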

Resources:
The most complete big data solutions
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17391

Complete Linux command reference, full edition
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17404

Java programmer interview handbook
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17438

Introduction to 360's HDFS download platform
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17426

Building an HDFS distributed file system on 64-bit CentOS 6 [Hadoop 2.7]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17415

Python data analysis [in English]
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17414

Fuel 6.0 with OpenStack Juno
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17403

hadoop-eclipse-plugin for Hadoop 2.6 and Hadoop 2.7.1
http://www.aboutyun.com/forum.php?mod=viewthread&tid=17392

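about云 classic posts roundup: February 2016, week 3

Flash-sale system architecture: analysis and practice
http://www.aboutyun.com/thread-17322-1-1.html

1. How is the flash-sale business analyzed?
2. What challenges does the flash-sale business pose?
3. What are the design principles for a flash-sale framework?

Getting started with Hadoop: chapter 6, YARN documentation overview
http://www.aboutyun.com/thread-17338-1-1.html

Getting started with Redis 1: introduction
http://www.aboutyun.com/thread-17346-1-1.html
1. What is Redis?
2. How is Redis installed?
3. What Redis clients are there?

Getting started with Redis 2: Redis data types and related commands
http://www.aboutyun.com/thread-17347-1-1.html
1. What are the key-related commands in Redis?
2. What are the string-related commands in Redis?
3. What are the list-related commands in Redis?

Getting started with Redis 3: Redis key-value design and data storage optimization
http://www.aboutyun.com/thread-17361-1-1.html
1. How should Redis keys and values be designed?
2. What does Redis's data storage optimization look like?

Common data mining methods and functions, plus a cluster analysis case study
http://www.aboutyun.com/thread-17360-1-1.html
1. What are the common data mining methods?
2. What functions does data mining serve?
3. How is clustering applied in data mining?

Twitter's user recommendation algorithm
http://www.aboutyun.com/thread-17333-1-1.html
1. What is Twitter's user recommendation algorithm?
2. What types of user recommendation does Sina Weibo have?
3. What aspects does Twitter's algorithm mainly focus on?

Getting started with Hadoop: chapter 4, MapReduce documentation overview
http://www.aboutyun.com/thread-17319-1-1.html

Getting started with Hadoop: chapter 3, HDFS documentation overview (part 2)
http://www.aboutyun.com/thread-17316-1-1.html

Shell and awk: checking whether a program is running
http://www.aboutyun.com/thread-17313-1-1.html

1. How do you tell whether a given process is running on Linux?

2. How do you view per-type process counts on a server in descending order on Linux?

3. How do you use awk to check whether a program is running on Linux?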

Resources:
Integrating Hadoop with OpenStack in practice
http://www.aboutyun.com/thread-17314-1-1.html

Hadoop application case studies
http://www.aboutyun.com/thread-17340-1-1.html

Learning Scala: a quick start to Scala programming [in English]
http://www.aboutyun.com/thread-17370-1-1.html

Ceph performance tuning
http://www.aboutyun.com/thread-17369-1-1.html

Data banks in the big data era
http://www.aboutyun.com/thread-17352-1-1.html

Data visualization and NEV
http://www.aboutyun.com/thread-17351-1-1.html

Deploying a cloud platform with CentOS 6.5, OpenStack, and KVM
http://www.aboutyun.com/thread-17341-1-1.html

Distributed file system solution on RHEL 6.6: ceph-pub
http://www.aboutyun.com/thread-17325-1-1.html

Machine learning in English: Thoughtful Machine Learning [235 pages]
http://www.aboutyun.com/thread-17324-1-1.html

Spring 2.5 reference manual in Chinese
http://www.aboutyun.com/thread-17315-1-1.html

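about云 classic posts roundup: January 2016, week 5

Focused crawlers: principles and prospects in internet finance
http://www.aboutyun.com/thread-17177-1-1.html
1. What is a crawler?
2. Into how many categories does this article divide focused crawlers?
3. What does the structure of a deep focused crawler include?

Every architect should study Conway's law
http://www.aboutyun.com/thread-17205-1-1.html

Getting started with Log4j logging
http://www.aboutyun.com/thread-17185-1-1.html
1. What is log4j?
2. What are log4j's three components?
3. How is a log4j configuration file written?

Hadoop platform architecture: storage
http://www.aboutyun.com/thread-17212-1-1.html
1. How do you balance storage format choice against efficiency?
2. How is storage planned?
3. Why go distributed?

Hadoop platform architecture: hardware
http://www.aboutyun.com/thread-17211-1-1.html
1. What determines cluster size?
2. How is the hardware configuration chosen?
3. How is the Hadoop version chosen?
4. How should nodes be allocated?

Six requirements for enterprise OpenStack (part 3): elastic architecture and global delivery
http://www.aboutyun.com/thread-17198-1-1.html
1. Why is "enterprise grade" usually associated with highly reliable, highly scalable, high-performance, high-quality systems?
2. Is OpenStack's default networking a half-finished product?
3. How do you train your IT administrators to become the new cloud administrators?

How to choose a big data analysis tool that suits you
http://www.aboutyun.com/thread-17194-1-1.html

Hands-on MLlib regression (linear regression, decision trees): learning Spark (machine learning)
http://www.aboutyun.com/thread-17183-1-1.html
1. How does Spark MLlib implement linear regression?
2. How does Spark MLlib implement decision trees?
3. How is performance evaluated?

Six requirements for enterprise OpenStack (part 1): API high availability, management, and security
http://www.aboutyun.com/thread-17173-1-1.html
1. What is OpenStack in the enterprise data center?
2. Why is a highly reliable cloud API needed?
3. How is robust management achieved?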

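Resources:

Huawei: big data practice in the financial industry
http://www.aboutyun.com/thread-17221-1-1.html

Huawei's massive-scale video solution
http://www.aboutyun.com/thread-17220-1-1.html

Design and implementation of multi-user parallel file I/O on HDFS
http://www.aboutyun.com/thread-17210-1-1.html

A practical guide to machine learning: case studies explained
http://www.aboutyun.com/thread-17209-1-1.html

Fast offline installation of distributed OpenStack Kilo
http://www.aboutyun.com/thread-17202-1-1.html

Text mining handbook [in English]
http://www.aboutyun.com/thread-17190-1-1.html

Neural network learning: theoretical foundations [in English, 400 pages]
http://www.aboutyun.com/thread-17189-1-1.html

R data import and export
http://www.aboutyun.com/thread-17176-1-1.html

Time Series Analysis with Applications in R (2nd edition)
http://www.aboutyun.com/thread-17175-1-1.html

Big data processing with Spark: techniques, applications, and performance optimization
http://www.aboutyun.com/thread-17170-1-1.html

Q&A:

A Spark beginner's question about a problem in Streaming development
http://www.aboutyun.com/thread-17178-1-1.html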

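about云 classic posts roundup: January 2016, week 4

30-year-old programmers, are you feeling lost?
http://www.aboutyun.com/thread-17117-1-1.html

7 secrets for programmers to stay healthy
http://www.aboutyun.com/thread-17067-1-1.html

The hottest search engine: Elasticsearch explained and optimized
http://www.aboutyun.com/thread-17078-1-1.html
1. What is Elasticsearch?
2. What is Elasticsearch used for?
3. How is Elasticsearch optimized?

Load balancing: session sharing with Nginx, Tomcat, and Redis
http://www.aboutyun.com/thread-17124-1-1.html
1. How do you share sessions using software load balancing?
2. How do you configure the Tomcat and Nginx servers?
3. How do you set up a session-sharing environment?

Lao Yu on architecture: why architecture is a way of thinking
http://www.aboutyun.com/thread-17152-1-1.html

1. Why do architecture?

2. How is architecture done?

3. What is architecture?

Lessons from designing remote interfaces
http://www.aboutyun.com/thread-17068-1-1.html
1. What is the system architecture of a remote interface?
2. What is an RPC call?
3. Why have a client layer?

Spark performance tuning
http://www.aboutyun.com/thread-17118-1-1.html
1. How does Spark increase CPU utilization?
2. What is a partition?
3. What two ways are there to set the number of parallel executors?

The MapReduce process, and a shuffle-centered comparison of Spark and Hadoop
http://www.aboutyun.com/thread-17101-1-1.html
1. How is the MapReduce process broken down?
2. How is the Spark shuffle process broken down?
3. How do hash-based and sort-based shuffles compare?

Common crawler pitfalls and approaches to crawling in Java
http://www.aboutyun.com/thread-17100-1-1.html
1. What is the basic principle of a web crawler?
2. What is Jsoup?
3. What are the hard parts of crawling?

Turning Taobao's product detail page into a platform: thinking and practice
http://www.aboutyun.com/thread-17079-1-1.html
1. What is a platform?
2. How are modules and the platform related?
3. What should be considered when designing a platform?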

Q&A:
How long should Spark take to process 100 million rows?
http://www.aboutyun.com/thread-17085-1-1.html

Resources:
Youku: A Spark-Based Real-Time User Profiling System - Wang Fei - 1027
http://www.aboutyun.com/thread-17087-1-1.html

Hive Development Standards and Best Practices
http://www.aboutyun.com/thread-17103-1-1.html

Hive Development Materials
http://www.aboutyun.com/thread-17104-1-1.html

Getting Started with Spark: Running wordcount
http://www.aboutyun.com/thread-17088-1-1.html

Hadoop: The Definitive Guide (Chinese Edition)
http://www.aboutyun.com/thread-17106-1-1.html

Building Databases and Data Warehouses with Hive
http://www.aboutyun.com/thread-17119-1-1.html

ZooKeeper Visualization Tools
http://www.aboutyun.com/thread-17131-1-1.html

Using Hive in Cloud Computing
http://www.aboutyun.com/thread-17120-1-1.html

GDB Study Materials
http://www.aboutyun.com/thread-17072-1-1.html

Design and Implementation of Distributed Ray Tracing Based on MapReduce
http://www.aboutyun.com/thread-17070-1-1.html

Hadoop Version Differences Explained
http://www.aboutyun.com/thread-17069-1-1.html

Q&A:

After HiveServer2 starts, why can't telnet verify that the port is open?
http://www.aboutyun.com/thread-17059-1-1.html

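about云 Classic Posts Summary: January 2016, Week 3

2015 in Review, Spark Edition: The Formation of a New Ecosystem
http://www.aboutyun.com/thread-16974-1-1.html
1. Why is DataFrame more efficient than RDD in storage and computation?
2. From the API perspective, into which two broad categories can Spark be divided?
3. Spark supports many external data sources; which ones does the article list, and which do you know?
4. What are Spark's highlights in machine learning?

Five Predictions for Data, Analytics, and Machine Learning Trends in 2016
http://www.aboutyun.com/thread-17006-1-1.html
1. How would you forecast the big data industry?
2. How does the article make its predictions?
3. How do you think you should plan your 2016 based on the industry?

For Beginners: How to Learn Linux
http://www.aboutyun.com/thread-17037-1-1.html
1. What is Linux used for, and why learn it?
2. How is Linux developing across different fields?
3. How do you learn Linux step by step?

Using NoSQL Databases to Provide Cloud-Level Data Scalability
http://www.aboutyun.com/thread-17019-1-1.html
1. What are the design principles of NoSQL databases?
2. How does HBase work?
3. How does MongoDB work, illustrated by example?

HBase Data Import: Implementation Approaches Explained
http://www.aboutyun.com/thread-17016-1-1.html
1. How many ways are there to import data into HBase?
2. How does bulk load import data?
3. How does Sqoop import data into HBase?

Comparing Pig, Hive, and SQL: A Brief Look at the Differences Between Big Data Tools
http://www.aboutyun.com/thread-17005-1-1.html
1. When should you use Apache Pig?
2. When should you use Apache Hive?
3. When should you use SQL?

Apache HBase: 2015 in Review and Future Outlook
http://www.aboutyun.com/thread-16984-1-1.html
1. What are the API differences between HBase 0.98 and HBase 1.0?
2. How do HBase and HydraHBase differ?

Big Data and Its Analytics Will Have a Far-Reaching Impact in 2016
http://www.aboutyun.com/thread-16962-1-1.html
1. What is real-time big data technology?
2. What impact will big data and its analytics have in 2016?
3. How does real-time big data disrupt traditional business models?

Apache Spark 1.6 Officially Released: What Changed
http://www.aboutyun.com/thread-16973-1-1.html
1. What changed in Spark 1.6?
2. What performance improvements were made?
3. What new algorithms and features were added?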

Resources:
R and Financial Big Data Processing: Collection of Videos, Slides, and Code
http://www.aboutyun.com/thread-16993-1-1.html

Storm Study Manual (Chinese)
http://www.aboutyun.com/thread-17007-1-1.html

Fundamentals of Machine Learning and Data Mining
http://www.aboutyun.com/thread-17042-1-1.html

Top Ten Data Mining Algorithms with Case Studies
http://www.aboutyun.com/thread-17041-1-1.html

Spark Technology and Applications
http://www.aboutyun.com/thread-17028-1-1.html

In-Memory Computing with Spark
http://www.aboutyun.com/thread-17027-1-1.html

Hadoop Beginner's Hands-On Manual
http://www.aboutyun.com/thread-17008-1-1.html

Data Analysis Guide
http://www.aboutyun.com/thread-16992-1-1.html

cloudera-quickstart Installation and Usage Summary
http://www.aboutyun.com/thread-16976-1-1.html

An Introduction to R [Book]
http://www.aboutyun.com/thread-16975-1-1.html

Q&A:

Ran into a problem when trying to drive Spark from a Scala program
http://www.aboutyun.com/thread-16980-1-1.html

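about云 Classic Posts Summary: January 2016, Week 2

How Programmers Can Plan Their Way to a Monthly Salary of 30,000
http://www.aboutyun.com/thread-16862-1-1.html

Spark Operators: RDD Key-Value Transformations (1) – partitionBy, mapValues, flatMapValues
http://www.aboutyun.com/thread-16919-1-1.html
1. How should partitionBy in Spark be understood?
2. How should mapValues in Spark be understood?
3. How should flatMapValues in Spark be understood?

Spark Operators: Listing the Elements and Counts in Each RDD Partition
http://www.aboutyun.com/thread-16917-1-1.html
1. How should partitioning in Spark operators be understood?
2. How do you write code to look up the partitions and the data in each one?

Kuaidi Dache Architecture in Practice
http://www.aboutyun.com/thread-16950-1-1.html
1. What problems arise in client-server communication?
2. How do you build a real-time monitoring platform on Storm and HBase?
3. How do you refactor a web system into a distributed one?

Deep Learning Series (5): Convolutional Neural Networks
http://www.aboutyun.com/thread-16877-1-1.html
1. What is a convolutional neural network?
2. What are parameter reduction and weight sharing?

A Real Team Must Stamp Out These 6 Kinds of Negative Energy
http://www.aboutyun.com/thread-16866-1-1.html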

Resources:

Introduction to Hadoop Principles
http://www.aboutyun.com/thread-16867-1-1.html

Sina Weibo Data Analysis and Weibo Marketing Case Studies
http://www.aboutyun.com/thread-16868-1-1.html

ZooKeeper Programmer's Guide (Self-Translated)
http://www.aboutyun.com/thread-16924-1-1.html

ZooKeeper Text Notes
http://www.aboutyun.com/thread-16925-1-1.html

Linux Kernel Core Manual in Chinese (Illustrated Kernel)
http://www.aboutyun.com/thread-16910-1-1.html

Experience in Efficient Multithreaded Programming on Linux
http://www.aboutyun.com/thread-16909-1-1.html

Q&A:

The role of each Nova service
http://www.aboutyun.com/thread-16928-1-1.html

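about云 Classic Posts Summary: January 2016, Week 1

HBase High Availability: Principles and Practice
http://www.aboutyun.com/thread-16782-1-1.html

6 Related Skills Programmers Need to Master
http://www.aboutyun.com/thread-16806-1-1.html

Hadoop: Year in Review and 2016 Trends
http://www.aboutyun.com/thread-16837-1-1.html
1. How did Hadoop develop in 2015?
2. What are the trends for Hadoop in 2016?
3. Which technologies did Hadoop influence in 2015?

How to Design Your Annual Plan for 2016
http://www.aboutyun.com/thread-16838-1-1.html
1. How do you design an annual plan for 2016?
2. What does the SMART principle mean?
3. How do you draw up a study plan?

Deep Learning Series (3): Training Process, Common Models and Methods
http://www.aboutyun.com/thread-16799-1-1.html
1. What is the basic idea of Deep Learning?
2. What does the Deep Learning training process consist of?
3. What are the common Deep Learning models and methods?

Introduction to Chinese Word Segmentation: Concepts
http://www.aboutyun.com/thread-16791-1-1.html
1. Why is Chinese word segmentation necessary?
2. How does the article classify Chinese word segmentation techniques?
3. What are the common methods of Chinese word segmentation?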

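Resources:

Let the Data Speak: Sales Data Analysis Methods
http://www.aboutyun.com/thread-16847-1-1.html

Applications of R in Actuarial Science
http://www.aboutyun.com/thread-16805-1-1.html

Some Applications of R in Social Network Analysis
http://www.aboutyun.com/thread-16804-1-1.html

Shared: "Data Mining: 32 Classic Cases You Must Know" (E-book)
http://www.aboutyun.com/thread-16793-1-1.html

Advanced SPSS Statistical Analysis Tutorial, by Zhang Wentong
http://www.aboutyun.com/thread-16789-1-1.html

Basic SPSS Statistical Analysis Tutorial, by Zhang Wentong
http://www.aboutyun.com/thread-16788-1-1.html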