
Big data guys, come on over....


Sarvapindi


14 minutes ago, SharkTank said:

It depends, but real-time work is at least 500 lines of code. For simple practice, around 30 lines is enough. There are plenty of datasets: go to Kaggle and download one. Install Scala, Spark, and IntelliJ, then start practicing.

500 lines?... I am done for, then.



1)  used to create the folder on HDFS
hadoop fs -mkdir//////*****
2) starting Sparkshell
spark-shell --master yarn
3) importing Packages
import org.apache.spark.sql.functions.{expr, col, column, desc}
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)

4) removing rows with empty or zero values from the data set

// First version, with the color_slug filter. =!= replaces !==, which is deprecated since Spark 2.0.
val af_temp = af.where(
  (col("maker") =!= "") && (col("model") =!= "") && (col("mileage") =!= 0) &&
  (col("manufacture_year") =!= 0) && (col("engine_displacement") =!= 0) && (col("engine_power") =!= 0) &&
  (col("body_type") =!= "") && (col("color_slug") =!= "") && (col("stk_year") =!= 0) &&
  (col("transmission") =!= "") && (col("door_count") =!= "") && (col("seat_count") =!= "") &&
  (col("fuel_type") =!= "") && (col("date_created") =!= "") && (col("date_last_seen") =!= "") &&
  (col("price_eur") =!= 0))

// Second version, without the color_slug filter
val af_temp = af.where(
  (col("maker") =!= "") && (col("model") =!= "") && (col("mileage") =!= 0) &&
  (col("manufacture_year") =!= 0) && (col("engine_displacement") =!= 0) && (col("engine_power") =!= 0) &&
  (col("body_type") =!= "") && (col("stk_year") =!= 0) && (col("transmission") =!= "") &&
  (col("door_count") =!= "") && (col("seat_count") =!= "") && (col("fuel_type") =!= "") &&
  (col("date_created") =!= "") && (col("date_last_seen") =!= "") && (col("price_eur") =!= 0))
5) Removing null values from the data set


val cars_nullaf = af_temp.where(
  (col("maker").isNotNull) && (col("model").isNotNull) && (col("mileage").isNotNull) &&
  (col("manufacture_year").isNotNull) && (col("engine_displacement").isNotNull) && (col("engine_power").isNotNull) &&
  (col("stk_year").isNotNull) && (col("transmission").isNotNull) && (col("door_count").isNotNull) &&
  (col("seat_count").isNotNull) && (col("fuel_type").isNotNull) && (col("date_created").isNotNull) &&
  (col("date_last_seen").isNotNull) && (col("price_eur").isNotNull))


cars_nullaf.select("maker","model","engine_power","transmission","fuel_type").orderBy(desc("engine_power")).show()


cars_nullaf.createOrReplaceTempView("cars")

val sqlDF = spark.sql("SELECT * FROM cars")

// COUNT(*) returns the number of rows in the view, despite the variable name
val Total_number_columns = spark.sql("SELECT COUNT(*) FROM cars")

Total_number_columns.show()


val Number_of_models_by_manufactuer = spark.sql("SELECT model, maker, COUNT(model) FROM cars GROUP BY maker, model")

Number_of_models_by_manufactuer.show()

val Number_of_models_by_manufactuer = spark.sql("SELECT maker, model, COUNT(model) AS top_car_models_sold FROM cars GROUP BY maker, model")

Number_of_models_by_manufactuer.show()

val Type_of_Tranmissions_sold = spark.sql("SELECT transmission, COUNT(transmission) FROM cars GROUP BY transmission")

Type_of_Tranmissions_sold.show()


val Type_of_Car_sold = spark.sql("SELECT maker, transmission, COUNT(*) FROM cars GROUP BY transmission, maker")

Type_of_Car_sold.show()

val AVG_Price_car_by_model = spark.sql("SELECT maker, model, AVG(price_eur) FROM cars GROUP BY maker, model")
AVG_Price_car_by_model.show()
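
The steps above use `af` without showing how it was loaded. A minimal sketch of that missing step, assuming the data set is a CSV file with a header row. The object name `LoadCars`, the helper `loadCars`, the `local[*]` master and the tiny stand-in file are all assumptions for illustration, not something from the post; in spark-shell the `spark` session already exists and only the `spark.read` chain is needed.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}

object LoadCars {
  // Reads a CSV file with a header row and lets Spark infer the column types.
  // `path` is hypothetical: point it at wherever the data set actually lives.
  def loadCars(spark: SparkSession, path: String): DataFrame =
    spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path)

  def main(args: Array[String]): Unit = {
    // local[*] is only for trying this on a laptop, not for a YARN cluster
    val spark = SparkSession.builder().appName("load-cars").master("local[*]").getOrCreate()

    // Stand-in file with made-up rows so the sketch runs anywhere
    val tmp = Files.createTempFile("cars", ".csv")
    Files.write(tmp, "maker,model,mileage,price_eur\nskoda,octavia,151000,10584.75\n".getBytes("UTF-8"))

    val af = loadCars(spark, tmp.toString)
    af.printSchema() // check the inferred types before filtering on 0 or ""
    af.show()

    spark.stop()
  }
}
```

Checking the schema first matters here because the filters compare some columns to `0` and others to `""`, which only makes sense if the inferred types match.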


 


22 minutes ago, Sarvapindi said:

500 lines?... I am done for, then.

If you have the interest, anything becomes easy, bro. The start always feels hard, but once you get into it and put in some hard work you will start enjoying it. Hard work is a must. Practice, practice coding. Practice Spark first, bro, it is simple. Then move slowly to application building, where OOP concepts are a must. From there you will keep learning more and more.

Edited by SharkTank

11 minutes ago, kevinUsa said:


(quoted post snipped: the same steps 1 to 5 and Spark SQL queries as in the post above)

Try to implement everything with DataFrames in a single shot. The more you limit Spark SQL usage the better, because the view occupies space. Use Spark 2. So what is the issue here? Post the error. As I said, run

spark.conf.set("spark.sql.shuffle.partitions", 2000) in your Spark Scala console. This should fix it.



spark.conf.set("spark.sql.shuffle.partitions", 100)
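
To make the "DataFrames in a single shot" advice concrete, here is a rough sketch of the same cleaning and two of the aggregations without a temp view. It only covers a subset of the columns from the post, and the object name `CarsDataFrameSketch`, the helper names, the sample rows and the shuffle partition value of 8 are assumptions chosen so the example runs on a laptop; the thread itself suggests 2000 or 100 for the real data.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{avg, col, count, desc}

object CarsDataFrameSketch {
  // Subset of the columns filtered in the post, grouped by the check applied to them
  val textCols: Seq[String] = Seq("maker", "model", "transmission", "fuel_type")
  val numericCols: Seq[String] = Seq("mileage", "engine_power", "price_eur")

  // Drops rows that have a null, an empty string, or a zero in the listed columns
  def clean(af: DataFrame): DataFrame = {
    val notEmpty: Column = textCols.map(c => col(c) =!= "").reduce(_ && _)
    val notZero: Column = numericCols.map(c => col(c) =!= 0).reduce(_ && _)
    af.na.drop(textCols ++ numericCols).where(notEmpty && notZero)
  }

  // DataFrame form of: SELECT maker, model, COUNT(model) ... GROUP BY maker, model
  def modelsByMaker(cars: DataFrame): DataFrame =
    cars.groupBy("maker", "model").agg(count(col("model")).as("top_car_models_sold"))

  // DataFrame form of: SELECT maker, model, AVG(price_eur) ... GROUP BY maker, model
  def avgPriceByModel(cars: DataFrame): DataFrame =
    cars.groupBy("maker", "model").agg(avg(col("price_eur")).as("avg_price_eur"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("cars-df").master("local[*]").getOrCreate()
    import spark.implicits._

    // Arbitrary small value for a tiny sample; tune it for the real data set
    spark.conf.set("spark.sql.shuffle.partitions", 8)

    // Made-up rows standing in for the real data set
    val af = Seq(
      ("skoda", "octavia", "man", "diesel", 151000, 77, 10584.75),
      ("skoda", "octavia", "auto", "petrol", 90000, 110, 15000.0),
      ("audi", "", "man", "diesel", 120000, 103, 9000.0), // empty model, dropped by clean
      ("bmw", "x5", "auto", "diesel", 0, 180, 30000.0)    // zero mileage, dropped by clean
    ).toDF("maker", "model", "transmission", "fuel_type", "mileage", "engine_power", "price_eur")

    val cars = clean(af)
    modelsByMaker(cars).orderBy(desc("top_car_models_sold")).show()
    avgPriceByModel(cars).show()

    spark.stop()
  }
}
```

Building the filter from column lists keeps the condition short, and `na.drop` on the same lists covers the separate null-removal step.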

2 minutes ago, SharkTank said:

Try to implement everything with DataFrames in a single shot. The more you limit Spark SQL usage the better, because the view occupies space. Use Spark 2. So what is the issue here? Post the error. As I said, run

spark.conf.set("spark.sql.shuffle.partitions", 2000) in your Spark Scala console. This should fix it.



spark.conf.set("spark.sql.shuffle.partitions", 100)

I am still learning, bro.

If I have any doubts I will PM you. If you have any good material, PM me

or post it here.

By the way, is my approach correct?

I used GCP for this.

 


27 minutes ago, Killer66 said:

Bhaiya, let big data go and look for something else. If you cannot code, it is hard to stay in big data 😐

That is what it will come to in the end.


38 minutes ago, Killer66 said:

Bhaiya, let big data go and look for something else. If you cannot code, it is hard to stay in big data 😐

The rates are not great either. In the current situation they are paying about the same as for reporting tools, damn it. If that is how it is, it is not worth it anymore. If we are real experts we can demand more; otherwise it is a waste.
