Hello all,

I'm writing this article for those who want to know more about data science careers but aren't sure how to get started.

I’ll start by going over the history of data science and the major concepts behind it, such as big data, data mining, machine learning and data analytics.

Then I will show you some best practices in data science.

Finally, we will check out the skills needed for data science jobs.


History of Data Science from Forbes magazine:

The best article I have read about the history of data science is from Forbes.

Here is the link. Enjoy reading it; you will find that the history of data science goes back to 1962.

A Very Short History of Data Science


Data Science Concepts and Scope:

Data science is a broad term that covers multiple concepts and specializations, including big data, machine learning, data mining and data analytics. Big data is especially relevant to data science these days.

1-Big Data:

Big data has many definitions; the Forbes article linked below collects 12 of them.

I like the 3 Vs definition: it is simple and covers the main characteristics of big data.

Volume (the size of the data is huge), Variety (various types of data: structured, unstructured and semi-structured) and Velocity (the rate at which incoming data arrives and needs to be processed).

https://www.forbes.com/sites/gilpress/2014/09/03/12-big-data-definitions-whats-yours/#6c99d4d713ae

 

2-Machine Learning:

Machine learning aims to free humans from the task of trying numerous possible solutions to a problem in order to isolate the best one.

https://www.forbes.com/sites/bernardmarr/2017/05/04/what-is-machine-learning-a-complete-beginners-guide-in-2017/#6b9a912a578f

3-Data Mining:

Data mining is one aspect of data science. It is the process of discovering patterns in a data set. At the beginning of a mining process, you don’t know what you’re looking for.

You employ algorithms, such as those used in machine learning, to discover a previously unknown pattern or relationship. Therefore, data mining uses machine learning as a tool in its search for new knowledge without any assumptions or previous knowledge.   
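To make the idea of discovering a previously unknown pattern more concrete, here is a minimal sketch that counts which items tend to appear together in shopping baskets. The baskets, the `frequent_pairs` function and the `min_support` threshold are all made up for illustration; real data mining tools use far more scalable algorithms.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
    {"eggs", "butter"},
]


def frequent_pairs(baskets, min_support):
    """Return item pairs that occur together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort the items so each pair is always counted under the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# We do not tell the code which pairs to look for; it reports what it finds.
patterns = frequent_pairs(baskets, min_support=2)
print(patterns)
```

Notice that nothing in the code says which products are related. The pairs that come out are the "new knowledge" found in the data.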

4-Data Analytics:

Unlike data mining, here you start with assumptions and prior knowledge, and you use data analytics to reach a decision.
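As a rough sketch of that difference, the example below starts from an assumption ("the discount increases daily sales") and uses the data only to confirm or reject it. The sales figures and the `average_uplift` function are hypothetical, and a real analysis would also apply a proper statistical test rather than just comparing averages.

```python
from statistics import mean

# Hypothetical daily sales figures, invented for illustration only.
sales_with_discount = [120, 135, 150, 142, 138]
sales_without_discount = [110, 118, 125, 121, 116]


def average_uplift(treated, baseline):
    """Difference between the average of the treated group and the baseline."""
    return mean(treated) - mean(baseline)


# The assumption comes first; the data is used to support a decision about it.
uplift = average_uplift(sales_with_discount, sales_without_discount)
decision = "keep the discount" if uplift > 0 else "drop the discount"
print(uplift, decision)
```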

 


Enabling Technologies That Make Data Science a Reality:

There are a number of underlying technologies that make data science a reality. These include data infrastructure, data management, and visualization technologies.

  1. Data infrastructure: the technologies that support how data is shared, processed and consumed, such as Hadoop and its distributed file system, HDFS.
  2. Data management: handled by database management systems (DBMS). Data science requires highly scalable, reliable, and efficient ways to store, manage, and process data, which is why a DBMS plays a critical role. With big data, however, there is also unstructured data that needs to be managed and stored.
  3. Data visualization: once data analysis is done, the newly acquired insight needs to be conveyed to the leadership and the rest of the organization. This is the data visualization step.

 

 

Now we have a clear picture of what data science is, the concepts related to it, and the three enabling technologies you need to know about to put data science into practice.

 

In the next article, we will go through real best practices in data science.

See you 🙂

 
