Initial Idea 1
Introduction
- What is Data Science?
- History
- Data Science Lifecycle
- Necessary Tools
- Who is a Data Scientist?
- Define objectives: Work with your customer and other stakeholders to understand and identify the business problems. Formulate questions that define the business goals that the data science techniques can target.
- Identify data sources: Find the relevant data that helps you answer the questions that define the objectives of the project. (Microsoft Learn, 2023)
- Data mining, also known as knowledge discovery in data (KDD), is the process of uncovering patterns and other valuable information from large data sets. (IBM, 2023)
- Data scrubbing, or data cleaning, is the process of standardizing the data according to a predetermined format. It includes handling missing data, fixing data errors, and removing any data outliers. Some examples of data scrubbing are:
  - Changing all date values to a common standard format.
  - Fixing spelling mistakes or additional spaces.
  - Fixing mathematical inaccuracies or removing commas from large numbers. (Amazon, 2023)
- Data exploration is preliminary data analysis that is used for planning further data modelling strategies. Data scientists gain an initial understanding of the data using descriptive statistics and data visualization tools. Then they explore the data to identify interesting patterns that can be studied or actioned. (Amazon, 2023)
- Feature engineering, in simple terms, is the act of converting raw observations into desired features using statistical or machine-learning approaches. (Patel, 2021)
- Software and machine learning algorithms are used to gain deeper insights, predict outcomes, and prescribe the best course of action. Machine learning techniques like association, classification, and clustering are applied to the training data set. The model might be tested against predetermined test data to assess result accuracy. The data model can be fine-tuned many times to improve result outcomes. (Amazon, 2023)
- Data scientists work together with analysts and businesses to convert data insights into action. They make diagrams, graphs, and charts to represent trends and predictions. Data summarization helps stakeholders understand and implement results effectively. (Amazon, 2023)
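The data scrubbing and data exploration steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a prescribed method: the records, the field names (`order_date`, `city`, `amount`), and the list of accepted date formats are all assumptions made up for the example.

```python
from datetime import datetime
from statistics import mean, median

# Hypothetical raw records; field names and values are invented for illustration.
raw_rows = [
    {"order_date": "2023-04-19", "city": " London ", "amount": "1,200"},
    {"order_date": "19/04/2023", "city": "london", "amount": "850"},
    {"order_date": "April 20, 2023", "city": "Leeds", "amount": ""},
]

# Assumed input formats; a real data set would need its own list.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def parse_date(text):
    """Return an ISO 8601 date string, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Remove commas from large numbers; return None for missing values."""
    cleaned = text.replace(",", "").strip()
    return float(cleaned) if cleaned else None


def scrub(row):
    """Standardize one record: common date format, trimmed text, numeric amount."""
    return {
        "order_date": parse_date(row["order_date"]),
        "city": row["city"].strip().title(),
        "amount": parse_amount(row["amount"]),
    }


clean_rows = [scrub(row) for row in raw_rows]

# Data exploration: descriptive statistics over the amounts that are present.
amounts = [r["amount"] for r in clean_rows if r["amount"] is not None]
summary = {"count": len(amounts), "mean": mean(amounts), "median": median(amounts)}

print(clean_rows)
print(summary)
```

Missing amounts are kept as `None` rather than filled in, so the decision about how to handle missing data stays visible for the later modelling step.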
Data Science Tools:
- R (commonly used through the RStudio IDE): An open-source programming language and environment for statistical computing and graphics.
- Python: A dynamic and flexible programming language. Its ecosystem offers numerous libraries, such as NumPy, Pandas, and Matplotlib, for analyzing data quickly.
To facilitate sharing code and other information, data scientists may use GitHub and Jupyter Notebooks. (IBM, 2023)
(Javinpaul, 2023)
Data scientists are a new breed of analytical data experts who have the technical skills to solve complex problems – and the curiosity to explore what problems need to be solved. (SAS UK, 2022)
Agarwal, S. (2018) Sudeep Agarwal. Available at: https://www.sudeep.co/data-science/2018/02/09/Understanding-the-Data-Science-Lifecycle.html (Accessed: April 19, 2023).
Amazon (2023) What is Data Science? Available at: https://aws.amazon.com/what-is/data-science (Accessed: April 19, 2023).
IBM (2023) What is data mining? Available at: https://www.ibm.com/topics/data-mining (Accessed: April 19, 2023).
IBM (2023) What is Data Science? Available at: https://www.ibm.com/uk-en/topics/data-science (Accessed: April 19, 2023).
Ismath, R. (2020) Introduction to data science, Medium. Analytics Vidhya. Available at: https://medium.com/analytics-vidhya/introduction-to-data-science-28deb32878e7 (Accessed: April 19, 2023).
Javinpaul (2023) Top 10 Tools Data Engineers and data scientist should learn in 2023, Medium. Javarevisited. Available at: https://medium.com/javarevisited/10-essential-tools-data-scientists-should-learn-in-2022-acbae6558643 (Accessed: April 19, 2023).
Microsoft Learn (2023) Data acquisition and understanding of Team Data Science Process - Azure Architecture Center, Microsoft Learn. Available at: https://learn.microsoft.com/en-us/azure/architecture/data-science-process/lifecycle-data (Accessed: April 19, 2023).
Patel, H. (2021) What is feature engineering - importance, tools and techniques for machine learning, Medium. Towards Data Science. Available at: https://towardsdatascience.com/what-is-feature-engineering-importance-tools-and-techniques-for-machine-learning-2080b0269f10 (Accessed: April 19, 2023).
SAS UK (2022) What is a data scientist? Available at: https://www.sas.com/en_gb/insights/analytics/what-is-a-data-scientist.html (Accessed: April 19, 2023).