
For data science/scientist obsessed uncles.


Tellugodu


A Letter to Those Seeking to Become a Data Scientist

I try to shed light on the industry through my own story.
 

I write this letter to shed as much light on this industry as I can. It may help some of you answer questions like “Can I switch to this industry if I have no math background?”, “Do I need to be innovative to solve an industry problem?”, or “Do my current skill sets add value to a data science project?”. I share my story so that you know where I come from, along with the status quo of the industry, to help you decide. Please note that I have tried to stay neutral on the status quo.


— Stay away from unnecessary details in math.

My journey into data science started in 2005, when I became familiar with a beautiful concept named manifold learning. I loved mathematics, so I fell in love with manifold learning. You had to be good at linear algebra, analytic geometry, and probability theory to understand and implement manifold learning techniques. I developed a face recognition engine using these techniques that worked effectively, at least as a student project. With manifold learning, I was able to interpret high-dimensional data that was otherwise very hard to analyze. That gave me power.

Today, the industry does not ask you to know the math behind the algorithms. Why? Because most solutions are built through data-focused, rather than model-focused, methodologies. So, it is OK if you do not know the math in depth. You just need to learn how to use the libraries.

— You do not need to be innovative in this industry.

In 2013, I started learning natural language processing (NLP) with a powerful library named NLTK. That year, I had a chance to join an innovative company working on a gesture control armband. The company aimed to build an armband that fits on your forearm and recognizes your hand gestures from the muscle signals recorded at the forearm. I was in charge of developing a recognition engine that could run on an ARM microcontroller and serve a large number of users. I invented what we called a muscle language and used NLP techniques to design a gesture recognition engine. The algorithm had to run on a processor with little computational power and still work well for our users. It was a very successful project.
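
The post does not spell out how the muscle language worked, so the following is only one hypothetical reading of the idea, not the author's method. It assumes a muscle signal normalized to [0, 1], a made-up three-symbol alphabet, and a smoothed bigram count per gesture as the "NLP technique". Every name, threshold, and data point is an assumption for illustration.

```python
import math
from collections import Counter

ALPHABET = "LMH"  # hypothetical symbols: low, medium, high activation


def to_symbols(signal, low=0.3, high=0.7):
    """Quantize amplitudes in [0, 1] into a string over ALPHABET."""
    return "".join("L" if x < low else "H" if x >= high else "M" for x in signal)


def bigrams(text):
    """Return all overlapping two-symbol windows of `text`."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def train(examples):
    """Map {gesture: [signal, ...]} to per-gesture bigram counts."""
    return {
        gesture: Counter(bg for s in signals for bg in bigrams(to_symbols(s)))
        for gesture, signals in examples.items()
    }


def score(counts, text):
    """Add-one smoothed log-score of the bigrams in `text` under one gesture."""
    total = sum(counts.values()) + len(ALPHABET) ** 2
    return sum(math.log((counts[bg] + 1) / total) for bg in bigrams(text))


def classify(models, signal):
    """Pick the gesture whose bigram counts best explain the signal."""
    text = to_symbols(signal)
    return max(models, key=lambda g: score(models[g], text))


# Made-up training signals, one per gesture.
EXAMPLES = {
    "fist": [[0.1, 0.2, 0.5, 0.8, 0.9]],
    "wave": [[0.1, 0.9, 0.1, 0.9, 0.1]],
}
MODELS = train(EXAMPLES)

if __name__ == "__main__":
    print(classify(MODELS, [0.2, 0.8, 0.2, 0.8]))
```

Counting symbol pairs needs only integer tables and a few additions per sample, which is the kind of budget a microcontroller can afford. Whether the real engine looked anything like this is not stated in the post.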

You do not need to be innovative anymore. Innovation happens in the research labs of big companies or in a very few startups. They introduce powerful libraries, and you just need to learn how to use and tune those libraries. So, you do not need to be an innovative problem solver, especially when you have no constraint on computation power or data storage.

— Take advantage of your domain knowledge.

In 2015, my friend and I took on a project to build a computer vision (CV) solution for the automotive industry. My friend was a mechanical engineer and I was, obviously, a machine learning engineer. We were asked to develop a particle detection solution. We had to design and build our own data collection setup, which required other knowledge such as fluid dynamics. We figured out that, because of the laws of fluid dynamics, particles could easily get trapped in the imaging chamber if we did not pay attention to the details. I already knew how much data quality matters for any data science project. In this project, however, I felt it first-hand.

If you are, for example, a chemical engineer or an environmental engineer and want to be a data scientist, you can definitely bring a lot of value from your own toolset to a data science project. If you work with, for example, LinkedIn profile data, it does not make much difference what background you come from. Otherwise, your domain knowledge matters.

— Learn the top widely-used tools.

In 2018, I joined a company that aimed to build an NLP-based solution to understand and interpret regulatory documents. We developed parsing engines that could read documents and structure the data, as well as NLP models to interpret the data. I used spaCy this time. The NLTK library lost the game to spaCy, especially for industry solutions. spaCy was one of the few widely used libraries developed by a small startup.

You must learn tools that are currently and widely used. Select a set of tools such as spaCy, scikit-learn, and TensorFlow. Then forget about other tools, even those that were widely used previously. So, if you are new to the field, it is completely OK that you do not know NLTK, but you should learn spaCy, and you should learn it fast!

— Select your area of focus in data science firmly.

Over the past years, I have designed machine learning models for time-series, image, and text data. However, I do not count myself as a computer vision expert. I have more experience working with text and time-series data. Technology advances fast, and you cannot become an expert in all of these fields. They are all parts of artificial intelligence (AI), but they use different tools and methodologies that take a long time to learn.

It is OK if you are interested in doing different exciting projects in computer vision and natural language processing. However, you should select your area of focus sooner rather than later. That is the only way you can become a master of the domain. In the end, life is an energy- and time-limited game.

The Last Word

Year after year, the significance of a data scientist keeps shrinking compared to the most widely used libraries and tools. This is not necessarily bad, though. It means you may not need years of experience working with many different libraries and frameworks. You just need to learn the recently introduced ones.

A data scientist role can be outsourced in many cases, especially in the post-COVID-19 era. This can happen much more easily in companies working on non-confidential data. If you are living in North America, that may hit you to some extent.

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Learning data science is highly recommended if you want to advance your skill set within a company or if you want to run your own startup. However, IMHO, I cannot recommend quitting your job to become a data science rockstar. It is no longer the right time to do that.

I wrote an article explaining where artificial intelligence can add real value to a business. If you decide to pursue a career as a data scientist, I recommend reading this article. Kudos 😊


5 hours ago, Daaarling said:

What is the difference between a Data Analyst and a Data Scientist? A Senior Data Analyst is almost a Data Scientist.

data analyst -- load and read historical data

data scientist -- build models that predict future data

There are many more differences, but this is the one-line answer
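
To make the one-line answer concrete, here is a minimal sketch in plain Python. The monthly sales numbers and the function names are made up for illustration, and a straight-line least-squares fit stands in for whatever model a real project would use.

```python
from statistics import mean

# Hypothetical monthly sales figures (made up for illustration).
history = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]


def summarize(values):
    """Analyst-style view: describe what already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_line(values):
    """Scientist-style view: fit y = slope * t + intercept by least squares."""
    n = len(values)
    t_mean = (n - 1) / 2  # mean of the time steps 0, 1, ..., n-1
    y_mean = mean(values)
    cov = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    var = sum((t - t_mean) ** 2 for t in range(n))
    slope = cov / var
    return slope, y_mean - slope * t_mean


def forecast(values, steps_ahead):
    """Predict the value `steps_ahead` periods after the last observation."""
    slope, intercept = fit_line(values)
    return slope * (len(values) - 1 + steps_ahead) + intercept


if __name__ == "__main__":
    print(summarize(history))    # looks backward at the data we have
    print(forecast(history, 1))  # looks forward to a month not yet seen
```

`summarize` only reads the history, while `forecast` fits a model to it and extrapolates, which is the split the two lines above describe.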

 


1 hour ago, kathanayaka said:

data analyst -- load and read historical data

data scientist -- build models that predict future data

There are many more differences, but this is the one-line answer

 

Data engineer? 


11 minutes ago, BeautyQueen said:

Data engineer? 

Data Analyst -- This isn't much of a role, actually. Works with some ETL tools.

Data Engineer -- More advanced than an Analyst; works on Spark, Hadoop, big data (on-premise and cloud), databases, data warehousing, serverless ETL, and many more

Data Scientist -- ML (supervised, unsupervised), deep learning, Keras, TF, PyTorch, reinforcement learning, NLP, transfer learning, computer vision, and the list goes on (not all, but any of these)


14 hours ago, Tellugodu said:

A Letter to Those Seeking to Become a Data Scientist


What is the source of this article?

I want to read the article mentioned at the end.
