vendettaa (Author), posted February 24, 2018
23 minutes ago, NPReddy said: Not sure, buddy. Since you asked about Spark, I just tried to help. How do you write your Hive queries? In a shell? I think you could create some functions to perform this, but since you are dealing with billions of records, I do not think that is recommended. Wait for the experts.
A UDF would do it, but I don't want to go that route.
vendettaa (Author), posted February 24, 2018
2 hours ago, NPReddy said: I'm not an expert, but I think you can use the DataFrame except API for this. Load table 1 into DataFrame 1 and table 2 into DataFrame 2. Then dataframe1.select(keyColumn).except(dataframe2.select(keyColumn)) returns the keys from dataframe1 that are not present in dataframe2. It may not be a perfect answer, but you can adapt it to your use case.
@NPReddy thank you so much, man. except is helpful.
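For anyone landing here later, a minimal sketch of the except approach from the quoted reply might look like the following. The column name key, the sample values, and the local SparkSession are all assumptions for illustration; the real tables in this thread were never shown. The left_anti join is not from the thread. It is included only as a commonly used alternative when the full rows of the first table are needed instead of just the keys.

```scala
import org.apache.spark.sql.SparkSession

object ExceptSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; a real job would use the cluster config
    val spark = SparkSession.builder()
      .appName("except-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-ins for table 1 and table 2
    val df1 = Seq(1, 2, 3, 4).toDF("key")
    val df2 = Seq(3, 4, 5).toDF("key")

    // Keys present in df1 but absent from df2 (duplicates are removed)
    val keysOnlyInDf1 = df1.select("key").except(df2.select("key"))
    keysOnlyInDf1.show()

    // Alternative (an assumption, not from the thread): a left anti join
    // keeps the full rows of df1 that have no matching key in df2
    val rowsOnlyInDf1 = df1.join(df2, Seq("key"), "left_anti")
    rowsOnlyInDf1.show()

    spark.stop()
  }
}
```

Both calls avoid a UDF. On billions of records, either one still shuffles data by key, so it is worth testing on a sample first.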