Data Quality


Konebhar6

1 minute ago, Konebhar6 said:

My takeaway from the Informatica and Snowflake conferences: having good quality data is of utmost importance for AI and ML to succeed. Data Quality and Observability are the industry terms being used for this practice. Most companies will open positions related to this in the coming years. It's a good space to enter for people with analytical skills, DB skills, analyst backgrounds, etc.

Informatica and Collibra seem to be the front runners, with decent platforms. A lot of new companies with niche features in this area have come up. We'll have to see which of them grows into a good tool/company.

I'm not sure about "quality": no company gets quality data handed to it on a plate.

Take how FB uses WhatsApp data as an example. In most WhatsApp groups, 99 percent is junk, the most useless data in the world: happy new year, happy birthday, hi how are you, some motivational forwards, etc.

But type one message to your friend about stocks, day trading, Vegas, Hawaii, etc., and they curate the meaningful 0.5% out of the 99.5% junk and are able to serve the user a relevant ad. Companies with that kind of power are going to stay.

After one meaningful sentence you see ads for related stuff on your FB or Insta account. My point is that you won't always have quality data; we have to find the meaning in the junk data, curate it, and use it.

 


10 minutes ago, csrcsr said:

I'm not sure about "quality": no company gets quality data handed to it on a plate.

Take how FB uses WhatsApp data as an example. In most WhatsApp groups, 99 percent is junk, the most useless data in the world: happy new year, happy birthday, hi how are you, some motivational forwards, etc.

But type one message to your friend about stocks, day trading, Vegas, Hawaii, etc., and they curate the meaningful 0.5% out of the 99.5% junk and are able to serve the user a relevant ad. Companies with that kind of power are going to stay.

After one meaningful sentence you see ads for related stuff on your FB or Insta account. My point is that you won't always have quality data; we have to find the meaning in the junk data, curate it, and use it.

 

This DataQuality/DataAnalyst role will be responsible for doing that: analyze data, specify rules for the data sets, detect anomalies, and curate data for further processing.

For AI and ML to work properly and give the desired results with fewer hallucinations, quality data is needed. Pretty much every slide and every AI/ML topic covered this.
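To make "specify rules, detect anomalies, curate" a bit more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the records, the field names, the two rules, and the choice of a median-based outlier check are all made up, and this is not how Informatica, Collibra, or any other platform actually implements it.

```python
from statistics import median

# Hypothetical records; field names are illustrative only.
records = [
    {"id": 1, "amount": 120.0, "country": "US"},
    {"id": 2, "amount": 95.0, "country": "US"},
    {"id": 3, "amount": None, "country": "IN"},
    {"id": 4, "amount": 110.0, "country": ""},
    {"id": 5, "amount": 9000.0, "country": "US"},
    {"id": 6, "amount": 105.0, "country": "IN"},
]

# Each rule is a name plus a predicate that returns True when the row passes.
RULES = {
    "amount_present": lambda r: r["amount"] is not None,
    "country_present": lambda r: bool(r["country"]),
}


def failed_rules(row):
    """Return the names of all rules this row violates."""
    return [name for name, check in RULES.items() if not check(row)]


def find_outliers(rows, field, threshold=5.0):
    """Flag rows whose value is far from the median, measured in
    median absolute deviations (MAD). The threshold is an arbitrary choice."""
    values = [r[field] for r in rows]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [r["id"] for r in rows if abs(r[field] - med) / mad > threshold]


# Step 1: apply the rules and set aside rows that fail any of them.
rejected = {r["id"]: failed_rules(r) for r in records if failed_rules(r)}
clean = [r for r in records if r["id"] not in rejected]

# Step 2: look for anomalies among the rows that passed the rules.
outliers = find_outliers(clean, "amount")

# Step 3: the curated set is what is left for further processing.
curated = [r for r in clean if r["id"] not in outliers]
```

Keeping the rules in a plain dictionary means an analyst can add a new check without touching the rest of the pipeline, which is roughly the appeal of the rule-driven tools mentioned above.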


4 minutes ago, Konebhar6 said:

This DataQuality/DataAnalyst role will be responsible for doing that: analyze data, specify rules for the data sets, detect anomalies, and curate data for further processing.

For AI and ML to work properly and give the desired results with fewer hallucinations, quality data is needed. Pretty much every slide and every AI/ML topic covered this.

Brother, my point is that if you don't have quality data you cannot run AI, period. But no company will be handed "here is the quality data on the table, now build ML models on it"; it's way more complex than that. That's where big tech, trading, and financial companies are winning and small companies are still struggling. Every small company has an AI/ML team and has recruited highly paid data scientists from reputed universities, but nothing has come out of it for most of them so far. It's still evolving.


4 minutes ago, csrcsr said:

Brother, my point is that if you don't have quality data you cannot run AI, period. But no company will be handed "here is the quality data on the table, now build ML models on it"; it's way more complex than that. That's where big tech, trading, and financial companies are winning and small companies are still struggling. Every small company has an AI/ML team and has recruited highly paid data scientists from reputed universities, but nothing has come out of it for most of them so far. It's still evolving.

Need quality data to run AI: yes, I am saying the same thing.

No company will have quality data: yes, I am saying the same thing.

The point is to get from bad data to quality data. A lot of tools and companies are emerging in this space, and this is the space I am talking about. It's not just that; it also covers maintaining data lineage from source to destination systems, understanding dependencies, and seeing how data changes and what impact that has.
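As a rough illustration of the lineage and dependency part, lineage can be modelled as a directed graph from source to destination systems and walked to see what a change would impact. This is only a sketch: the dataset names below are invented, and real lineage tools usually track this at column level with far more metadata.

```python
from collections import defaultdict, deque

# Hypothetical pipeline: each pair is (upstream dataset, downstream dataset).
EDGES = [
    ("crm.contacts", "staging.contacts"),
    ("staging.contacts", "warehouse.customers"),
    ("erp.orders", "warehouse.orders"),
    ("warehouse.customers", "reports.churn"),
    ("warehouse.orders", "reports.churn"),
]


def build_graph(edges):
    """Map every upstream dataset to the datasets that read from it."""
    graph = defaultdict(list)
    for upstream, downstream in edges:
        graph[upstream].append(downstream)
    return graph


def impacted_by(graph, source):
    """Return every dataset reachable downstream of `source` (breadth-first)."""
    seen, queue = [], deque([source])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


graph = build_graph(EDGES)

# If crm.contacts changes, these are the datasets that need a second look.
impact = impacted_by(graph, "crm.contacts")
```

Reversing the edges gives the opposite question, namely which sources feed a given report.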


33 minutes ago, Konebhar6 said:

This DataQuality/DataAnalyst role will be responsible for doing that: analyze data, specify rules for the data sets, detect anomalies, and curate data for further processing.

For AI and ML to work properly and give the desired results with fewer hallucinations, quality data is needed. Pretty much every slide and every AI/ML topic covered this.

100% agree.

In my current role, I work on services that ingest unstructured data -> use ML models to enrich the data and convert it into as good a structure as can be queried -> use our custom query parsers and ranking algos to retrieve data.

We haven't used an LLM so far, but we are pivoting now.

Your point about companies focusing on data quality improvements is true, and it has been ongoing for a few years.


26 minutes ago, csrcsr said:

Brother, my point is that if you don't have quality data you cannot run AI, period. But no company will be handed "here is the quality data on the table, now build ML models on it"; it's way more complex than that. That's where big tech, trading, and financial companies are winning and small companies are still struggling. Every small company has an AI/ML team and has recruited highly paid data scientists from reputed universities, but nothing has come out of it for most of them so far. It's still evolving.

I think AI will be able to filter the junk data out from the real data, cleaning it up before loading… Instead of ELT, they will use ETL (cleaning up data as part of the transformation).

Google Photos and WhatsApp can detect duplicate photos and videos and use the same object reference across multiple users instead of creating copies.
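The "same object reference instead of copies" idea can be sketched with a content hash as the storage key. To be clear about the assumptions: this is a toy in-memory store with a made-up `ContentStore` class, it only catches byte-for-byte identical files, and it is not a description of how Google Photos or WhatsApp work internally. Catching resized or re-encoded duplicates would need something like perceptual hashing.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: identical bytes are kept only once."""

    def __init__(self):
        self.blobs = {}       # digest -> raw bytes (one copy per unique file)
        self.user_files = {}  # (user, filename) -> digest (cheap reference)

    def upload(self, user, filename, data):
        # The SHA-256 of the content acts as the object reference.
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self.blobs.setdefault(digest, data)
        self.user_files[(user, filename)] = digest
        return digest


store = ContentStore()
a = store.upload("alice", "new_year.jpg", b"same bytes")
b = store.upload("bob", "fwd.jpg", b"same bytes")
c = store.upload("bob", "trip.jpg", b"different bytes")
```

Here the two uploads of the same forward share one stored blob, while each user still keeps their own filename pointing at it.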


12 minutes ago, Konebhar6 said:

Need quality data to run AI: yes, I am saying the same thing.

No company will have quality data: yes, I am saying the same thing.

The point is to get from bad data to quality data. A lot of tools and companies are emerging in this space, and this is the space I am talking about. It's not just that; it also covers maintaining data lineage from source to destination systems, understanding dependencies, and seeing how data changes and what impact that has.

True, brother, but to have quality data we still need quality applications to build the user base. Companies are only chasing AI out of FOMO and forgetting their core products. Big companies are still doing great with their core products and using that data for AI. In some apps the core product itself is only so-so; Webull, for example, is investing in AI instead of making its application more modern like Robinhood's. And they depend on the user base: without DAU/MAU you won't get the quality. AI/ML should be one of the teams, not the only thing for your company. That's my take.


6 minutes ago, tollywood_hater said:

You two are at CEO level, bro, and yet you ended up stuck here on this DB.

Who knows... maybe they are CEOs at companies outside :D


Just now, Thokkalee said:

I think AI will be able to filter the junk data out from the real data, cleaning it up before loading… Instead of ELT, they will use ETL (cleaning up data as part of the transformation).

Google Photos and WhatsApp can detect duplicate photos and videos and use the same object reference across multiple users instead of creating copies.

Yeah, they do a lot of crap. For example, with your Ring camera, if your kid wears a Steph Curry shirt or shoes, it can scan that and they serve an ad on your Amazon account; that's the depth they go to. If you have Yosemite Vernal Falls hike photos in your Google account, they throw you a hiking stick ad. It's getting more and more powerful. Earlier the ad used to come after 3 days or so because it used to be batch processing; nowadays, with powerful GPUs and real-time processing, ads come in seconds. I am just using ads as an example, but imagine these powerful tools coming to trading, penny stocks, etc.


17 minutes ago, Konebhar6 said:

The point is to get from bad data to quality data. A lot of tools and companies are emerging in this space, and this is the space I am talking about. It's not just that; it also covers maintaining data lineage from source to destination systems, understanding dependencies, and seeing how data changes and what impact that has.

Bro, do you know of any such tech/companies from the conferences?

We are looking at a few vendors for our data cleanup, but trust me, most vendors are just bluffing, and we can see that in their presentations. Unfortunately, our data is so bad that we don't trust our own generated data, and we have been looking at alternative solutions such as data cleanup tools or firms. Internally we tried our best, but at best we can say it's 80% clean. We are unable to leverage our data for the future at this stage...


4 minutes ago, vetrivel said:

100% agree.

In my current role, I work on services that ingest unstructured data -> use ML models to enrich the data and convert it into as good a structure as can be queried -> use our custom query parsers and ranking algos to retrieve data.

We haven't used an LLM so far, but we are pivoting now.

Your point about companies focusing on data quality improvements is true, and it has been ongoing for a few years.

That's cool, bro, you are working on the latest tech; keep focusing on that. If possible, when you have time, can you share a roadmap for learning ML/AI tools for backend/frontend developers? Not core math and scientist level, but where we can contribute.


1 minute ago, csrcsr said:

True, brother, but to have quality data we still need quality applications to build the user base. Companies are only chasing AI out of FOMO and forgetting their core products. Big companies are still doing great with their core products and using that data for AI. In some apps the core product itself is only so-so; Webull, for example, is investing in AI instead of making its application more modern like Robinhood's. And they depend on the user base: without DAU/MAU you won't get the quality. AI/ML should be one of the teams, not the only thing for your company. That's my take.

The more confusion there is, the better it is for jobs, right? :D Every new practice starts this way: confusion ... Salim Pheku type projections ... FOMO by others ... We all saw what chatbots could do: nothing except frustrate users. They are getting better but are still not there yet.

In my opinion, the best AI I have seen so far is the digital assistant ... amazing notes by Microsoft Copilot. We are very primitive right now ... but these technologies will emerge ...

I work in the pharma industry ... there is plenty of value AI/ML could add, such as helping the business make decisions based on patterns found in data.

