I wrote this article for readers who want to know more about data science careers but are not sure how to get started.
I’ll start with the history of data science and the major concepts behind it, such as big data, machine learning, data mining, and data analytics.
Then I will show you some best practices in data science.
Finally, we will look at the skills needed for data science jobs.
History of Data Science from Forbes magazine:
The best article I have read about the history of data science is from Forbes.
Here is the link; enjoy reading it. You will find that data science dates back to 1962.
Data Science Concepts and Scope:
Data science is a broad term that covers multiple concepts and specializations,
including big data, machine learning, data mining, and data analytics. Big data is especially relevant to data science these days.
1- Big data:
Big data has around 12 definitions.
I like the 3 Vs definition. It is simple and covers the main characteristics of big data:
Volume (the size of the data is huge), Variety (various types of data: structured, unstructured, and semi-structured), and Velocity (the rate at which incoming data needs to be processed).
2- Machine learning:
The goal of machine learning is to free humans from the task of trying numerous possible ways of solving a problem in order to isolate the best solution.
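To make the idea of "trying numerous possibilities" concrete, here is a minimal sketch in Python. The data (hours studied and exam scores) is made up for illustration, and the names `squared_error`, `candidates`, and `best_weight` are hypothetical. The program tries many candidate values for a single number and keeps the one that fits the data best. Real machine learning libraries use smarter search strategies, such as gradient descent, but the principle is the same.

```python
# Made-up observations: hours studied and the exam score achieved.
hours = [1, 2, 3, 4, 5]
scores = [11, 19, 31, 39, 50]


def squared_error(weight):
    """How badly the rule 'score = weight * hours' fits the observations."""
    return sum((weight * x - y) ** 2 for x, y in zip(hours, scores))


# Try 201 candidate weights (0.0, 0.1, ..., 20.0) and keep the best one.
candidates = [w / 10 for w in range(0, 201)]
best_weight = min(candidates, key=squared_error)

print("best weight:", best_weight)
```

The machine does the tedious part, checking every candidate, so a person does not have to.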
3- Data mining:
Data mining is one aspect of data science. It is the process of discovering patterns in a data set. At the beginning of a mining process, you don’t know what you’re looking for.
You employ algorithms, such as those used in machine learning, to discover a previously unknown pattern or relationship. Data mining therefore uses machine learning as a tool in its search for new knowledge, without any assumptions or prior knowledge.
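As an illustration of discovering a pattern without knowing in advance what to look for, here is a minimal sketch of k-means clustering on one-dimensional data, written in plain Python. K-means is one common clustering algorithm, chosen here only as an example. The purchase amounts are made up, and the helper names `assign` and `kmeans_1d` are hypothetical.

```python
def assign(values, centroids):
    """Put each value into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for v in values:
        nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
        clusters[nearest].append(v)
    return clusters


def kmeans_1d(values, k, max_iterations=20):
    """Group numbers into k clusters by moving each centroid to its cluster's mean."""
    # Assumption: start from k evenly spaced points of the sorted data,
    # so the result is deterministic.
    ordered = sorted(values)
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centroids = [ordered[round(i * step)] for i in range(k)]
    for _ in range(max_iterations):
        clusters = assign(values, centroids)
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:  # nothing moved, so the grouping is stable
            break
        centroids = updated
    return centroids, assign(values, centroids)


# Made-up purchase amounts; nobody told the algorithm there are two kinds of buyer.
purchase_amounts = [12, 15, 14, 11, 13, 98, 102, 95, 101, 99]
centroids, clusters = kmeans_1d(purchase_amounts, k=2)
print(centroids)
print(clusters)
```

The algorithm is only told how many groups to look for. It finds on its own that the purchases fall into a low-spending group and a high-spending group.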
4- Data analytics:
Unlike data mining, here you start with assumptions and prior knowledge, and you use data analytics to reach a decision.
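A minimal sketch of that idea, assuming a made-up scenario: we already believe a discount campaign raised the average order value, and we use the data to decide whether to keep it. The numbers, the 10% threshold, and all variable names are hypothetical. A real analysis would also check whether the difference is statistically significant.

```python
from statistics import mean

# Made-up order values before and after a discount campaign.
before_campaign = [20.0, 22.0, 19.0, 21.0, 18.0]
after_campaign = [24.0, 26.0, 23.0, 25.0, 27.0]

# Assumption we start from: the campaign raised the average order value.
# Hypothetical decision rule: keep the campaign if the average rose by at least 10%.
REQUIRED_LIFT = 0.10

average_before = mean(before_campaign)
average_after = mean(after_campaign)
lift = (average_after - average_before) / average_before
keep_campaign = lift >= REQUIRED_LIFT

print(f"lift: {lift:.0%}, keep campaign: {keep_campaign}")
```

The difference from data mining is the starting point: the question and the decision rule are fixed before looking at the data.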
Enabling Technologies for Data Science:
There are a number of underlying technologies that make data science a reality. These include data infrastructure, data management, and visualization technologies.
- Data infrastructure: technologies that support how data is shared, processed, and consumed, such as Hadoop with its distributed file system, HDFS.
- Data management: handled by database management systems (DBMS). Data science requires highly scalable, reliable, and efficient ways to store, manage, and process data, which is why DBMS play a critical role in data science. With big data, there is also unstructured data that needs to be managed and stored.
- Data visualization: once data analysis is done, the newly acquired insight needs to be conveyed to the leadership and the rest of the organization. This is the data visualization step.
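To give a feel for the data management layer, here is a minimal sketch using SQLite, a small DBMS that ships with Python through the `sqlite3` module. The `orders` table and its rows are made up for illustration. Production data science systems typically rely on much larger, distributed databases, but the store-then-query pattern is the same.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")

# Store some made-up structured data.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 14.5)],
)

# Let the DBMS do the processing: total spend per customer.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(rows)
```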
Now we have a clear picture of what data science is, the concepts related to it,
and the three kinds of technology you need to know about to put data science into practice.
In the next article, we will go through real best practices in data science.
see you 🙂