
Help - Spark


vendetta

Recommended Posts

Created an initial RDD in the spark shell by importing data from a traditional database into HDFS, and processed the data using Scala on Spark.

sample:

column1    column2    column3
1          10.00      2
2          30.00      2
3          16.96      3
4          18.06      3

So I have to create a single-line RDD using only aggregateByKey in Scala (no Spark SQL, no DataFrame API), and it should return max(column2), count(column1), avg(column2), min(column2) grouped by column3 (like GROUP BY column3 in SQL).

I want single-line code using aggregateByKey in Scala for Spark.

 

 

 

I'm unable to work out how aggregateByKey works in this case; I got a solution using the DataFrame API and Spark SQL.

Please don't just say "google it"; I didn't find a solution there and it felt kind of tricky.

aggregateByKey's accumulator has four values here, so the initial values are (0.0F, 0.0F, 0.0F, 0.0F).
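For what it's worth, a minimal sketch of such a four-part zero value, assuming the accumulator tracks (max, count, sum, min) of column2; the layout is my guess from the requested aggregates, not confirmed code:

```scala
// Hypothetical zero value: (runningMax, count, sum, runningMin).
// Double.MinValue / Double.MaxValue are safer seeds for max/min than 0.0F,
// which would skew results for all-negative or all-positive data.
val zero = (Double.MinValue, 0L, 0.0, Double.MaxValue)
```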

😔

If anyone can explain, I will post the complete question in a PM.



 

Link to comment
Share on other sites

5 hours ago, vendetta said:

Created an initial RDD in the spark shell by importing data from a traditional database into HDFS, processing it with Scala on Spark. I have to produce max(column2), count(column1), avg(column2), min(column2) grouped by column3, using only aggregateByKey in Scala (no Spark SQL, no DataFrame API). I want single-line code.



 

You could try Spark with Python? Why only Scala?

Link to comment
Share on other sites

7 hours ago, vendetta said:

Created an initial RDD in the spark shell by importing data from a traditional database into HDFS, processing it with Scala on Spark. I have to produce max(column2), count(column1), avg(column2), min(column2) grouped by column3, using only aggregateByKey in Scala (no Spark SQL, no DataFrame API). I want single-line code.



 

Akka, aggregateByKey can only be used on a paired RDD... I can't understand what exactly you're trying to do here, and why are you trying to use only aggregateByKey?

 

Link to comment
Share on other sites

41 minutes ago, kasi said:

Akka, aggregateByKey can only be used on a paired RDD... I can't understand what exactly you're trying to do here, and why are you trying to use only aggregateByKey?

 

Uncle, convert the given data into key-value pairs and get the output using aggregateByKey.

The data above was already converted to an RDD; I posted only a sample shaped like those table columns. The original data contains more columns and TBs of records.

If you can explain, PM me.

 

 

Link to comment
Share on other sites

@kasi you ask why only aggregateByKey.

I already posted that getting the solution was easy using the Spark DataFrame API and Spark SQL.

aggregateByKey is just for learning; also, if there is an alternative that avoids a case class, it reduces the lines of code.

var Result = resultDF.map(a=>(a(1).toInt,a(2).toDouble)).aggregateByKey((0.0,0.0,0,9999999999999.0))((a,b)=>(math.max(a._1,b),a._2+1,a._3+y,math.min(a._4,y)),(a,b)=>(math.max(a._1,b._1),a._2+b._2,a._3+b._3,math.min(a._4,b._4))).sortBy(_._1, false);

Result.collect().foreach(println);

I can't understand how the accumulator and combiner work here.

That's the solution, and I need an explanation for it; the guy who gave it to me didn't explain, he's busy with other work and I don't want to disturb him.

If you know, please explain; well and good. If not, that's okay.
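One way to see what the accumulator and combiner do is to simulate them in plain Scala, without Spark. In this sketch (the partition split and values are invented for illustration) each partition is folded with the seqOp, and the partial results are merged with the combOp, which is what aggregateByKey does for each key:

```scala
// Accumulator layout: (max, count, sum, min) over the values of one key.
val zero = (Double.MinValue, 0L, 0.0, Double.MaxValue)

// seqOp ("accumulator"): fold one value into a partition-local accumulator.
def seqOp(acc: (Double, Long, Double, Double), v: Double) =
  (math.max(acc._1, v), acc._2 + 1, acc._3 + v, math.min(acc._4, v))

// combOp ("combiner"): merge two partition-local accumulators of one key.
def combOp(a: (Double, Long, Double, Double), b: (Double, Long, Double, Double)) =
  (math.max(a._1, b._1), a._2 + b._2, a._3 + b._3, math.min(a._4, b._4))

// Pretend key 2's values (10.00 and 30.00) landed on two different partitions.
val part1  = List(10.00).foldLeft(zero)(seqOp)  // accumulator on partition 1
val part2  = List(30.00).foldLeft(zero)(seqOp)  // accumulator on partition 2
val merged = combOp(part1, part2)               // what Spark does at the shuffle

val (mx, cnt, sum, mn) = merged
println((mx, cnt, sum / cnt, mn))               // max, count, avg, min
```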

Link to comment
Share on other sites

4 hours ago, JavaBava said:

Look at you, hoping to find in this DB what you couldn't even find on Google... Super, you rock.

I posted late at night; I'd been going with the flow, and when I got stuck here and couldn't understand it at all, I posted on the forums.

Why not here? As far as I know there are plenty of people on this DB too, and last time when someone conducted free Spark/Scala training classes, many of our folks attended, so I was hopeful.

But now I feel like, why did I even post here; whether or not an answer comes, people just exhibit sarcasm.

Anyway, it's okay if I don't get this right now.

Next week the people who can explain it will be free.

I know how to get this using reduceByKey and groupByKey, but aggregateByKey felt tricky.

All this is part of learning.

Link to comment
Share on other sites

16 minutes ago, vendetta said:

I need it in Scala only, and that too using only aggregateByKey.

 


This is the Scala code:

var Result = resultDF.map(a=>(a(1).toInt,a(2).toDouble)).aggregateByKey((0.0,0.0,0,9999999999999.0))((a,b)=>(math.max(a._1,b),a._2+1,a._3+y,math.min(a._4,y)),(a,b)=>(math.max(a._1,b._1),a._2+b._2,a._3+b._3,math.min(a._4,b._4))).sortBy(_._1, false);

Result.collect().foreach(println);

Are you trying to understand what is going on with the above Result variable?

Link to comment
Share on other sites

3 minutes ago, former said:

Are you trying to understand what is going on with the above Result variable?

That accumulator and combiner part, man.

Link to comment
Share on other sites

var Result = resultDF.map(a => (a(1).toInt, a(2).toDouble))

What this does: it takes each row of resultDF, picks out a(1) and a(2), and creates a paired RDD.
--> the output will look like (a(1), a(2))

aggregateByKey((0.0, 0.0, 0.0, 9999999999999.0))
Now you use this paired RDD to compute a tuple of four values per key; this step supplies the initial (zero) value of that accumulator. Note the third slot has to be 0.0 (a Double), not 0 (an Int), or the sum won't type-check.

I changed some of the code below; I don't think your version will work as posted, because y is never defined (it should be the incoming value b):

    (a, b) => (math.max(a._1, b), a._2 + 1, a._3 + b, math.min(a._4, b)),
    This first function is the seqOp (the "accumulator"): for each incoming value b it updates the per-partition accumulator a as (running max, count, sum, min).

    (a, b) => (math.max(a._1, b._1), a._2 + b._2, a._3 + b._3, math.min(a._4, b._4))).sortBy(_._1, false)
    This second function is the combOp (the "combiner"): it merges two partial accumulators, one from each partition, field by field into the final four values per key.
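Putting the seqOp and combOp together, a corrected end-to-end version might look like the sketch below. It assumes `rows` is an RDD[Array[String]] where row(2) holds column3 (the grouping key) and row(1) holds column2 (the value); the indices, the name `summarize`, and the final averaging step are my assumptions rather than the original code, and running it needs a live SparkContext.

```scala
import org.apache.spark.rdd.RDD

// Sketch only: (max, count, sum, min) per key, then sum/count -> avg.
def summarize(rows: RDD[Array[String]]) =
  rows.map(a => (a(2).toInt, a(1).toDouble))                       // (column3, column2)
    .aggregateByKey((Double.MinValue, 0L, 0.0, Double.MaxValue))(
      // seqOp: fold one value v into the per-partition accumulator
      (acc, v) => (math.max(acc._1, v), acc._2 + 1, acc._3 + v, math.min(acc._4, v)),
      // combOp: merge two partial accumulators across partitions
      (x, y) => (math.max(x._1, y._1), x._2 + y._2, x._3 + y._3, math.min(x._4, y._4))
    )
    .mapValues { case (mx, cnt, sum, mn) => (mx, cnt, sum / cnt, mn) } // max, count, avg, min
    .sortByKey(ascending = false)
```

On the sample table, key 2 (values 10.00 and 30.00) would come out as (30.0, 2, 20.0, 10.0), i.e. max, count, avg, min.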

Link to comment
Share on other sites

8 hours ago, kasi said:

The first function is the seqOp (the per-partition "accumulator"); the second, (a, b) => (math.max(a._1, b._1), a._2 + b._2, a._3 + b._3, math.min(a._4, b._4)), is the combOp (the "combiner") that merges partial accumulators.

 

@kasi

What are a._1, b._1, a._2, b._2, etc., bhayya?

In aggregateByKey, does initializing mean passing values of the expected output data type as a reference?

Link to comment
Share on other sites

12 hours ago, former said:

@kasi

What are a._1, b._1, a._2, b._2, etc., bhayya?

In aggregateByKey, does initializing mean passing values of the expected output data type as a reference?

Here a and b are two accumulator tuples, for example a = (val1, val2) and b = (val3, val4):

a._1 - val1

a._2 - val2

b._1 - val3

b._2 - val4
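In plain Scala, ._1, ._2, etc. are just positional accessors on a tuple; in the combiner, a and b are two such accumulator tuples (one per partition), not single values. The numbers below are made up for illustration:

```scala
val a = (30.0, 2L)   // say, a partial (max, count) from partition 1
val b = (18.06, 2L)  // a partial (max, count) from partition 2

// a._1 is the first element of a, b._2 is the second element of b, and so on.
val merged = (math.max(a._1, b._1), a._2 + b._2)
println(merged)  // (30.0,4)
```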

Link to comment
Share on other sites
