How Can Large Sets of Data Be Efficiently Retrieved?


ahimsavaadhi1


Each file is around 10 MB to 500 MB, and those files are generated very frequently.

My current work churns out almost 40 GB of data per month,

and we are mandated to keep it for 10 years.

And sometimes we need to retrieve that data, which is cumbersome.

Do it like this...

Put those files in the server root and configure that path in the DB. I don't know your project exactly, but for Excel files, Word docs, PDFs and the like that users upload, or our own files, store them at the app server level and maintain the path in the DB. Based on the retrieval criteria, read the path, build a URL, and give the user a link; when the user clicks the link, the file will open from the server location. And when I say file system on the server, I mean the app server, not the DB server.
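A minimal sketch of that upload/store/link flow, assuming Flask and SQLite; the storage root, table layout, and route names are all illustrative, not a prescribed design:

```python
# Files live on the app server's disk; only their paths go into the DB.
# STORAGE_ROOT, the table layout, and the routes are assumptions for this sketch.
import os
import sqlite3
import uuid

from flask import Flask, request, send_file, url_for

app = Flask(__name__)
STORAGE_ROOT = "/var/appdata/files"  # app-server directory, not the DB server

def db():
    conn = sqlite3.connect("files.db")
    conn.execute("CREATE TABLE IF NOT EXISTS files "
                 "(id TEXT PRIMARY KEY, name TEXT, path TEXT)")
    return conn

@app.route("/upload", methods=["POST"])
def upload():
    f = request.files["file"]
    file_id = uuid.uuid4().hex
    path = os.path.join(STORAGE_ROOT, file_id)
    os.makedirs(STORAGE_ROOT, exist_ok=True)
    f.save(path)  # file lands on the app server's file system
    with db() as conn:
        conn.execute("INSERT INTO files VALUES (?, ?, ?)",
                     (file_id, f.filename, path))
    # Build the URL the user clicks; the link serves the file from the server location.
    return url_for("download", file_id=file_id, _external=True)

@app.route("/download/<file_id>")
def download(file_id):
    name, path = db().execute("SELECT name, path FROM files WHERE id = ?",
                              (file_id,)).fetchone()
    return send_file(path, download_name=name)
```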

Cassandra? Or Couch? Or Mongo?

Will the database automatically split those files and store them, and when I want to retrieve one, will it automatically fetch it?

MongoDB GridFS.

If you're really interested in having the contents of the files searched and matched against a search term, use the Elasticsearch database; it's free.
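To the question above about automatic splitting: GridFS does chunk each file on write (255 KB chunks by default) and reassembles it on read. A minimal pymongo sketch, with the connection string and file names as placeholder assumptions:

```python
# GridFS splits each file into chunks on write and streams them back as one
# file on read -- the automatic split-and-fetch behaviour asked about above.
import gridfs
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["archive"]  # placeholder URI/name
fs = gridfs.GridFS(db)

# Store: the driver chunks the file automatically.
with open("report.dlf", "rb") as f:
    file_id = fs.put(f, filename="report.dlf")

# Retrieve: the driver reassembles the chunks transparently.
with open("report_copy.dlf", "wb") as out:
    out.write(fs.get(file_id).read())

# Or look a file up by name instead of id.
latest = fs.get_last_version(filename="report.dlf")
```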

Not the contents, just the file itself for retrieval.

We use the ELK stack, but it doesn't fit the solution I'm looking for.

Per your requirements, 40 GB a month is roughly 500 GB a year, 5 TB over 10 years, or under 150 GB for your 3 months of data. Any RDBMS can easily handle those numbers. The bigger question is whether the data you receive is structured or unstructured. Any XML?

If it's structured data, you can use any commercial RDBMS; even Yahoo and Facebook use MySQL, man. Oracle and MySQL handle XML with very little effort, too.

You have to explore the big data world if your data is unstructured.
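As a rough illustration of the "RDBMS handles XML with little effort" point, here is a hypothetical snippet using MySQL's built-in ExtractValue() via mysql-connector-python; the connection details, table layout, and sample document are invented for the example:

```python
# Hypothetical: XML stored as text in MySQL, queried with XPath from SQL,
# so no application-side parsing is needed. All names here are assumptions.
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="app",
                               password="secret", database="archive")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS docs "
            "(id INT AUTO_INCREMENT PRIMARY KEY, body TEXT)")
cur.execute("INSERT INTO docs (body) VALUES (%s)",
            ("<file><name>r1.dlf</name><size>12</size></file>",))
conn.commit()

# XPath straight from SQL via MySQL's ExtractValue().
cur.execute("SELECT ExtractValue(body, '/file/name') FROM docs")
print(cur.fetchall())
```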


It's unstructured data: files in proprietary file formats.

If there is any chance you can parse the incoming unstructured data into structured data, then go with an RDBMS; otherwise, hail MongoDB. A sketch of that parse-then-load approach follows below.

Out of curiosity, how does your current manual setup handle this? Shell script calls, or are you using Python/Perl?
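A toy sketch of the parse-then-load idea, assuming the incoming files happen to be XML-like; the record layout and file names are invented for illustration:

```python
# Pull structured fields out of an unstructured payload and land them as
# plain relational rows. The <record>/<name>/<size> layout is an assumption.
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect("archive.db")
conn.execute("CREATE TABLE IF NOT EXISTS records "
             "(file TEXT, name TEXT, size INTEGER)")

def ingest(path):
    # Extract just the fields we care about from the incoming file...
    root = ET.parse(path).getroot()
    rows = [(path, r.findtext("name"), int(r.findtext("size")))
            for r in root.iter("record")]
    # ...and insert them as ordinary rows an RDBMS can index and query.
    with conn:
        conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)

ingest("batch_001.xml")
```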


In one of the projects, a proprietary piece of software generates chunks of data in DLF format, which is sent to imaging.

However, the requirement is that sometimes a DLF file needs to be retrieved and sent back for editing if there are issues or a change in properties.

Every day, on average, 5k files are generated, each varying between 1,000 KB and 18 MB.

So when there is a need, the developers come back for the file, and I end up copying those files for them, since they don't have access to the prod file system.

This happens so frequently that my own tasks are getting hampered and I end up doing repetitive work manually.

So I want to build a webapp around the files, store them in a database, and present them for retrieval with appropriate auditing and permissions.

You must be asking why I need a database to store the files: I have another requirement where the files need versioning and must be retrievable whenever required.
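One possible shape for that storage layer, sketched with SQLite: each file gets an incrementing version number, and every store or fetch writes an audit row. The table layout, function names, and the `who` argument are assumptions for illustration, not an existing design:

```python
# Hypothetical versioned file store with an audit trail; all names invented.
import sqlite3
import time

conn = sqlite3.connect("dlf_store.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS files (name TEXT, version INTEGER, data BLOB,
                                  PRIMARY KEY (name, version));
CREATE TABLE IF NOT EXISTS audit (who TEXT, action TEXT, name TEXT,
                                  version INTEGER, ts REAL);
""")

def store(name, data, who):
    """Save a new version of `name` (data is bytes) and record who did it."""
    with conn:
        version = conn.execute(
            "SELECT COALESCE(MAX(version), 0) + 1 FROM files WHERE name = ?",
            (name,)).fetchone()[0]
        conn.execute("INSERT INTO files VALUES (?, ?, ?)", (name, version, data))
        conn.execute("INSERT INTO audit VALUES (?, 'store', ?, ?, ?)",
                     (who, name, version, time.time()))
    return version

def fetch(name, who, version=None):
    """Return a version's bytes (latest by default); older versions stay retrievable."""
    if version is None:
        row = conn.execute("SELECT version, data FROM files WHERE name = ? "
                           "ORDER BY version DESC LIMIT 1", (name,)).fetchone()
    else:
        row = conn.execute("SELECT version, data FROM files "
                           "WHERE name = ? AND version = ?",
                           (name, version)).fetchone()
    with conn:
        conn.execute("INSERT INTO audit VALUES (?, 'fetch', ?, ?, ?)",
                     (who, name, row[0], time.time()))
    return row[1]
```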

If I were you, I would create a shared mount point with maybe 1 TB of storage to start, and use a shell script to copy all files into that mount (e.g., the find command's -mtime option). If each run of your software creates files that can be identified by day, plain wildcard matching will be fine. Run the script and you're all clear; no need to copy manually. Give the devs read access to that file system and ask them to copy from there.
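The same copy job sketched in Python rather than shell (mirroring find's -mtime idea); the source and destination paths are placeholders:

```python
# Copy files modified in the last 24 hours from the prod output directory
# to the shared mount -- the Python equivalent of `find ... -mtime -1`.
import shutil
import time
from pathlib import Path

SRC = Path("/prod/output")          # placeholder: where the software drops files
DEST = Path("/mnt/shared/archive")  # placeholder: the shared mount devs can read
CUTOFF = time.time() - 24 * 3600    # same idea as find's -mtime -1

DEST.mkdir(parents=True, exist_ok=True)
for f in SRC.iterdir():
    if f.is_file() and f.stat().st_mtime >= CUTOFF:
        shutil.copy2(f, DEST / f.name)  # copy, preserving timestamps
```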


Cassandra... big data.

I know it's used at my company, but I have no working knowledge of it, so that's all I know as of now.


Is your office using Cassandra? Is it working well?
