Help: how do I write the Scala code for this DataFrame?

aurae posted on 2016-6-6 15:54:17
Last edited by aurae on 2016-6-6 15:56

scala> val rescaledData = idfModel.transform(featurizedData)
rescaledData: org.apache.spark.sql.DataFrame = [category: string, text: string, words: array<string>, rawFeatures: vector, features: vector]

Inspecting the result with rescaledData.select($"category", $"words", $"features").take(10).foreach(println), one of the records looks like this:

[7725315424,WrappedArray(100, 花果茶, 罐装, 果味, 果粒茶, 水果茶, 花茶, 花草茶, 洛神花茶, 150g, 抱抱, 芝士, 奶茶, 袋装, 奶精, 速溶, 饮料, 奶茶粉, 冲饮, 牛轧糖, 花生, 牛扎糖, 抹茶, 风味, 纯手工, 台湾进口, 特产零食, 芒果, 来啦, 芒果, 奶茶, 袋装, 速溶, 奶茶饮料, 果味, 奶茶粉, 早餐, 冲饮, 套餐, 组合, 赠送, 专用, 杯子, 单拍, 发货, 杯子, 赠送, 实物, 为主, 煌上煌, 香辣, 牛肉片, 40gx10, 真空包装, 休闲零食小吃, 江西特产),(200,[1,11,12,16,17,18,20,25,27,33,46,48,53,63,74,77,80,81,83,87,96,99,102,110,122,124,130,134,142,143,144,150,151,152,157,163,165,166,174,179,185,189,193,195],[1.6174978294112006,1.4477794447705714,3.454728613979466,1.4429132551193986,2.856910343888337,1.9739284831584842,1.1986353917312151,3.313088207389643,1.3817176940924787,0.9014925107752052,0.8902564375082793,1.3240520522426698,1.8291470012996753,2.9853495293569248,1.400151395781317,1.217829838987362,1.1723674649106048,0.6727151417427395,1.6956156086751528,2.9853495293569248,0.9043213669756829,1.4094972581995544,1.2413603363975563,1.3862943611198906,1.6003054288708278,1.1649873576129823,1.1446214148333431,1.5667827368321843,1.5029312648456514,1.217829838987362,1.3591433720539399,1.114797613089127,1.2859924904183246,3.2006108577416557,1.3955110162248148,1.3817176940924787,1.2573606777439974,2.965044786428889,1.5029312648456514,1.4575834448671923,1.8895786330570585,1.3546890217045595,3.2936508890114413,1.5834036180682245])]

I want to get the top 10 features for each category, together with the words they correspond to.
Could someone show me how? Thanks.
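
For a single record, the selection can be sketched in plain Scala on the sparse vector's indices and values as printed above. The helper names topK and wordsFor are made up for illustration. The indexOf parameter is an assumption: features stores only hashed indices (the column names and the vector size of 200 suggest HashingTF), so mapping an index back to words needs the same hash function that built rawFeatures. On Spark 1.6 one candidate is new org.apache.spark.mllib.feature.HashingTF(200).indexOf, but verify that it reproduces the indices in rawFeatures before relying on it.

```scala
// Hypothetical helpers, not from the original post.

// Keep the k largest weights of a sparse vector given as parallel arrays.
def topK(indices: Array[Int], values: Array[Double], k: Int): Seq[(Int, Double)] =
  indices.zip(values).sortBy { case (_, weight) => -weight }.take(k).toSeq

// Attach to each selected index the words that hash into it.
// `indexOf` is an assumption: it must be the hash used to build rawFeatures.
// Several words can share one index, so each index maps to a list of words.
def wordsFor(
    words: Seq[String],
    top: Seq[(Int, Double)],
    indexOf: String => Int): Seq[(Int, Double, Seq[String])] = {
  val byIndex = words.distinct.groupBy(indexOf)
  top.map { case (index, weight) =>
    (index, weight, byIndex.getOrElse(index, Seq.empty[String]))
  }
}
```

Because hashing is lossy, an index with several colliding words cannot be attributed to just one of them; the sketch returns all candidates.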

2 replies

nextuser posted on 2016-6-6 18:29:06
It is a bit involved, but group by and order by should be enough.
That is, group the rows by category, then sort within each group.
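
One way to read the group by / order by suggestion, as a rough sketch rather than a definitive answer: explode each features vector into (index, weight) rows, group by category and index, then rank by weight within each category. The function name topFeaturesPerCategory is hypothetical, and taking the max weight when an index repeats within a category is an assumption (sum or average may suit better). The code targets the Spark 2.x+ API; on Spark 1.6, as in the post, the vector class is org.apache.spark.mllib.linalg.Vector and window functions need a HiveContext.

```scala
import org.apache.spark.ml.linalg.Vector
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, explode, max, row_number, udf}

// Hypothetical helper: expects `category` and `features` columns as in the post.
def topFeaturesPerCategory(df: DataFrame, k: Int): DataFrame = {
  // Turn each vector into its stored (index, weight) pairs.
  val nonZero = udf { v: Vector =>
    val sv = v.toSparse
    sv.indices.zip(sv.values).toSeq
  }

  // One row per (category, index, weight).
  val exploded = df
    .select(col("category"), explode(nonZero(col("features"))).as("fw"))
    .select(col("category"), col("fw._1").as("index"), col("fw._2").as("weight"))

  // Group by: collapse repeated indices within a category (max is an assumption).
  val perCategory = exploded
    .groupBy("category", "index")
    .agg(max("weight").as("weight"))

  // Order by: rank within each category and keep the k heaviest features.
  val byWeight = Window.partitionBy("category").orderBy(col("weight").desc)
  perCategory
    .withColumn("rank", row_number().over(byWeight))
    .filter(col("rank") <= k)
}
```

With the DataFrame from the question this would be called as topFeaturesPerCategory(rescaledData, 10).show(). The result holds feature indices only; turning them back into words still requires re-hashing the words column with the hash that produced rawFeatures.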

nextuser posted on 2016-6-6 18:37:49