
Help - Spark


vendetta

Recommended Posts

Created an initial RDD in the spark shell by importing data from a traditional database into HDFS, and processed the data using Scala on Spark.

sample:

column1    column2    column3
1          10.00      2
2          30.00      2
3          16.96      3
4          18.06      3

So I have to create a single-line RDD using only aggregateByKey in Scala (no Spark SQL, no DataFrame API), and it should return max(column2), count(column1), avg(column2), min(column2) grouped by column3 (like GROUP BY column3 in SQL).

I want single-line code using aggregateByKey in Scala for Spark.

 

 

 

I'm unable to work out how aggregateByKey works in this case; I got a solution using the DataFrame API and Spark SQL.

Please don't just say "google it"; I didn't find a solution there and it felt kind of tricky.

aggregateByKey's accumulator has four values here, so the initial values are (0.0F, 0.0F, 0.0F, 0.0F).
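For what it's worth, a minimal sketch of such a four-part zero value, assuming the accumulator tracks (max, count, sum, min) of column2; the layout is my guess from the requested aggregates, not confirmed code:

```scala
// Hypothetical zero value: (runningMax, count, sum, runningMin).
// Double.MinValue / Double.MaxValue are safer seeds for max/min than 0.0F,
// which would skew results for all-negative or all-positive data.
val zero = (Double.MinValue, 0L, 0.0, Double.MaxValue)
```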

😔

If anyone can explain, I will post the complete question in a PM.



 

Link to comment
Share on other sites

5 hours ago, vendetta said:

Created an initial RDD in the spark shell by importing data from a traditional database into HDFS, processing it with Scala on Spark. I have to produce max(column2), count(column1), avg(column2), min(column2) grouped by column3, using only aggregateByKey in Scala (no Spark SQL, no DataFrame API). I want single-line code.



 

You could try Spark with Python? Why only Scala?

Link to comment
Share on other sites

7 hours ago, vendetta said:

Created an initial RDD in the spark shell by importing data from a traditional database into HDFS, processing it with Scala on Spark. I have to produce max(column2), count(column1), avg(column2), min(column2) grouped by column3, using only aggregateByKey in Scala (no Spark SQL, no DataFrame API). I want single-line code.



 

Akka, aggregateByKey can only be used on a paired RDD... I can't understand what exactly you're trying to do here, and why are you trying to use only aggregateByKey?

 

Link to comment
Share on other sites

41 minutes ago, kasi said:

Akka, aggregateByKey can only be used on a paired RDD... I can't understand what exactly you're trying to do here, and why are you trying to use only aggregateByKey?

 

Uncle, convert the given data into key-value pairs and get the output using aggregateByKey.

The data above was already converted to an RDD; I posted only a sample shaped like those table columns. The original data contains more columns and TBs of records.

If you can explain, PM me.

 

 

Link to comment
Share on other sites

@kasi you ask why only aggregateByKey.

I already posted that getting the solution was easy using the Spark DataFrame API and Spark SQL.

aggregateByKey is just for learning; also, if there is an alternative that avoids a case class, it reduces the lines of code.

var Result = resultDF.map(a=>(a(1).toInt,a(2).toDouble)).aggregateByKey((0.0,0.0,0,9999999999999.0))((a,b)=>(math.max(a._1,b),a._2+1,a._3+y,math.min(a._4,y)),(a,b)=>(math.max(a._1,b._1),a._2+b._2,a._3+b._3,math.min(a._4,b._4))).sortBy(_._1, false);

Result.collect().foreach(println);

I can't understand how the accumulator and combiner work here.

That's the solution, and I need an explanation for it; the guy who gave it to me didn't explain, he's busy with other work and I don't want to disturb him.

If you know, please explain; well and good. If not, that's okay.
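One way to see what the accumulator and combiner do is to simulate them in plain Scala, without Spark. In this sketch (the partition split and values are invented for illustration) each partition is folded with the seqOp, and the partial results are merged with the combOp, which is what aggregateByKey does for each key:

```scala
// Accumulator layout: (max, count, sum, min) over the values of one key.
val zero = (Double.MinValue, 0L, 0.0, Double.MaxValue)

// seqOp ("accumulator"): fold one value into a partition-local accumulator.
def seqOp(acc: (Double, Long, Double, Double), v: Double) =
  (math.max(acc._1, v), acc._2 + 1, acc._3 + v, math.min(acc._4, v))

// combOp ("combiner"): merge two partition-local accumulators of one key.
def combOp(a: (Double, Long, Double, Double), b: (Double, Long, Double, Double)) =
  (math.max(a._1, b._1), a._2 + b._2, a._3 + b._3, math.min(a._4, b._4))

// Pretend key 2's values (10.00 and 30.00) landed on two different partitions.
val part1  = List(10.00).foldLeft(zero)(seqOp)  // accumulator on partition 1
val part2  = List(30.00).foldLeft(zero)(seqOp)  // accumulator on partition 2
val merged = combOp(part1, part2)               // what Spark does at the shuffle

val (mx, cnt, sum, mn) = merged
println((mx, cnt, sum / cnt, mn))               // max, count, avg, min
```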

Link to comment
Share on other sites

4 hours ago, JavaBava said:

Look at you, hoping to find in this DB what you couldn't even find on Google... Super, you rock.

I posted late at night; I'd been going with the flow, and when I got stuck here and couldn't understand it at all, I posted on the forums.

Why not here? As far as I know there are plenty of people on this DB too, and last time when someone conducted free Spark/Scala training classes, many of our folks attended, so I was hopeful.

But now I feel like, why did I even post here; whether or not an answer comes, people just exhibit sarcasm.

Anyway, it's okay if I don't get this right now.

Next week the people who can explain it will be free.

I know how to get this using reduceByKey and groupByKey, but aggregateByKey felt tricky.

All this is part of learning.

Link to comment
Share on other sites

16 minutes ago, vendetta said:

I need it in Scala only, and that too using only aggregateByKey.

 


This is the Scala code:

var Result = resultDF.map(a=>(a(1).toInt,a(2).toDouble)).aggregateByKey((0.0,0.0,0,9999999999999.0))((a,b)=>(math.max(a._1,b),a._2+1,a._3+y,math.min(a._4,y)),(a,b)=>(math.max(a._1,b._1),a._2+b._2,a._3+b._3,math.min(a._4,b._4))).sortBy(_._1, false);

Result.collect().foreach(println);

Are you trying to understand what is going on with the above Result variable?

Link to comment
Share on other sites

3 minutes ago, former said:

Are you trying to understand what is going on with the above Result variable?

That accumulator and combiner part, man.

Link to comment
Share on other sites

var Result = resultDF.map(a => (a(1).toInt, a(2).toDouble))

What this does: it takes each row of resultDF, picks out a(1) and a(2), and creates a paired RDD.
--> the output will look like (a(1), a(2))

aggregateByKey((0.0, 0.0, 0.0, 9999999999999.0))
Now you use this paired RDD to compute a tuple of four values per key; this step supplies the initial (zero) value of that accumulator. Note the third slot has to be 0.0 (a Double), not 0 (an Int), or the sum won't type-check.

I changed some of the code below; I don't think your version will work as posted, because y is never defined (it should be the incoming value b):

    (a, b) => (math.max(a._1, b), a._2 + 1, a._3 + b, math.min(a._4, b)),
    This first function is the seqOp (the "accumulator"): for each incoming value b it updates the per-partition accumulator a as (running max, count, sum, min).

    (a, b) => (math.max(a._1, b._1), a._2 + b._2, a._3 + b._3, math.min(a._4, b._4))).sortBy(_._1, false)
    This second function is the combOp (the "combiner"): it merges two partial accumulators, one from each partition, field by field into the final four values per key.
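Putting the seqOp and combOp together, a corrected end-to-end version might look like the sketch below. It assumes `rows` is an RDD[Array[String]] where row(2) holds column3 (the grouping key) and row(1) holds column2 (the value); the indices, the name `summarize`, and the final averaging step are my assumptions rather than the original code, and running it needs a live SparkContext.

```scala
import org.apache.spark.rdd.RDD

// Sketch only: (max, count, sum, min) per key, then sum/count -> avg.
def summarize(rows: RDD[Array[String]]) =
  rows.map(a => (a(2).toInt, a(1).toDouble))                       // (column3, column2)
    .aggregateByKey((Double.MinValue, 0L, 0.0, Double.MaxValue))(
      // seqOp: fold one value v into the per-partition accumulator
      (acc, v) => (math.max(acc._1, v), acc._2 + 1, acc._3 + v, math.min(acc._4, v)),
      // combOp: merge two partial accumulators across partitions
      (x, y) => (math.max(x._1, y._1), x._2 + y._2, x._3 + y._3, math.min(x._4, y._4))
    )
    .mapValues { case (mx, cnt, sum, mn) => (mx, cnt, sum / cnt, mn) } // max, count, avg, min
    .sortByKey(ascending = false)
```

On the sample table, key 2 (values 10.00 and 30.00) would come out as (30.0, 2, 20.0, 10.0), i.e. max, count, avg, min.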

Link to comment
Share on other sites

8 hours ago, kasi said:

The first function is the seqOp (the per-partition "accumulator"); the second, (a, b) => (math.max(a._1, b._1), a._2 + b._2, a._3 + b._3, math.min(a._4, b._4)), is the combOp (the "combiner") that merges partial accumulators.

 

@kasi

What are a._1, b._1, a._2, b._2, etc., bhayya?

In aggregateByKey, does initializing mean passing values of the expected output data type as a reference?

Link to comment
Share on other sites

12 hours ago, former said:

@kasi

What are a._1, b._1, a._2, b._2, etc., bhayya?

In aggregateByKey, does initializing mean passing values of the expected output data type as a reference?

Here a and b are two accumulator tuples, for example a = (val1, val2) and b = (val3, val4):

a._1 - val1

a._2 - val2

b._1 - val3

b._2 - val4
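In plain Scala, ._1, ._2, etc. are just positional accessors on a tuple; in the combiner, a and b are two such accumulator tuples (one per partition), not single values. The numbers below are made up for illustration:

```scala
val a = (30.0, 2L)   // say, a partial (max, count) from partition 1
val b = (18.06, 2L)  // a partial (max, count) from partition 2

// a._1 is the first element of a, b._2 is the second element of b, and so on.
val merged = (math.max(a._1, b._1), a._2 + b._2)
println(merged)  // (30.0,4)
```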

Link to comment
Share on other sites
