Diagram 2 - draw.io (Initial Idea 1)

Diagram: Data Science Process Flowchart

A data science process flowchart illustrates the sequential steps in a typical data science project. The key components commonly found in such a flowchart are explained below:

01 - Business Understanding: 
  • Define objectives: Work with your customer and other stakeholders to understand and identify the business problems. Formulate questions that define the business goals that the data science techniques can target.
  • Identify data sources: Find the relevant data that helps you answer the questions that define the objectives of the project. (Microsoft Learn, 2023)
02 - Data Mining:
  • Data mining, also known as knowledge discovery in data (KDD), is the process of uncovering patterns and other valuable information from large data sets. (IBM, 2023)
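As a small illustration of uncovering a pattern in a data set, the sketch below counts which pairs of items appear together most often. The function name `frequent_pairs`, the `min_count` threshold, and the shopping-basket data are all assumptions made for this example; real data mining uses far larger data sets and dedicated tools.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_count=2):
    """Return item pairs that occur together in at least `min_count` transactions."""
    pair_counts = Counter()
    for items in transactions:
        # Sort and de-duplicate so each pair is counted once per transaction
        # and always appears in the same order.
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}


# Hypothetical example data: four shopping baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
]

patterns = frequent_pairs(baskets)
print(patterns)
```

Here the only pair that passes the threshold is bread and milk, which is the kind of pattern an analyst might then investigate further.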
03 - Data Cleaning:
  • Data scrubbing, or data cleaning, is the process of standardizing the data according to a predetermined format. It includes handling missing data, fixing data errors, and removing any data outliers. Some examples of data scrubbing are:
    • Changing all date values to a common standard format.
    • Fixing spelling mistakes or additional spaces.
    • Fixing mathematical inaccuracies or removing commas from large numbers. (Amazon, 2023)
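The scrubbing examples above can be sketched in a few lines of Python. This is a minimal sketch, not a complete cleaning pipeline: the helper names and the list of accepted date formats are assumptions chosen for illustration.

```python
from datetime import datetime

# Date formats assumed to appear in the raw data (hypothetical).
KNOWN_DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")


def clean_date(value):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = value.strip()
    for fmt in KNOWN_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return None


def clean_text(value):
    """Trim the ends and collapse repeated whitespace into single spaces."""
    return " ".join(value.split())


def clean_number(value):
    """Remove thousands separators (commas) and convert to a float."""
    return float(value.replace(",", "").strip())


print(clean_date("19/04/2023"))
print(clean_text("  New   York "))
print(clean_number("1,234,567.5"))
```

Returning `None` for an unrecognised date keeps the bad value visible as missing data, so it can be handled explicitly rather than silently guessed.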
04 - Data Exploration:
  • Data exploration is preliminary data analysis that is used for planning further data modelling strategies. Data scientists gain an initial understanding of the data using descriptive statistics and data visualization tools. Then they explore the data to identify interesting patterns that can be studied or actioned. (Amazon, 2023)
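A first pass at descriptive statistics might look like the sketch below, which uses only Python's standard `statistics` module. The `describe` helper and the sample values are assumptions made for this example.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # The sample standard deviation needs at least two values.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


# Hypothetical sample with one unusually large value.
sample = [12, 15, 11, 14, 90, 13]
summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

A mean that sits well above the median, as it does for this sample, is a common hint that the data contains an outlier worth a closer look.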
05 - Feature Engineering:
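  • Feature engineering is the process of selecting, transforming, and creating variables (features) from raw data so that a machine learning model can use them. (Patel, 2021)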
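As a rough illustration of feature engineering, the sketch below derives new features from a raw record. The field names (`order_date`, `quantity`, `unit_price`) and the derived features are hypothetical and chosen only for this example.

```python
from datetime import date


def engineer_features(record):
    """Derive model-ready features from one raw order record."""
    order_date = date.fromisoformat(record["order_date"])
    return {
        # Combine two raw columns into one more informative value.
        "total": record["quantity"] * record["unit_price"],
        # Monday is 0 and Sunday is 6.
        "day_of_week": order_date.weekday(),
        # Flag Saturday and Sunday orders.
        "is_weekend": order_date.weekday() >= 5,
    }


# Hypothetical raw record.
raw = {"order_date": "2023-04-22", "quantity": 3, "unit_price": 2.5}
print(engineer_features(raw))
```

The raw date string carries little meaning for a model on its own, while the derived day of the week and weekend flag expose patterns a model can learn from.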
06 - Predictive Modeling:
  • Software and machine learning algorithms are used to gain deeper insights, predict outcomes, and prescribe the best course of action. Machine learning techniques like association, classification, and clustering are applied to the training data set. The model might be tested against predetermined test data to assess result accuracy. The data model can be fine-tuned many times to improve result outcomes. (Amazon, 2023) 
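The idea of fitting a model on training data and checking it against predetermined test data can be sketched as follows. A nearest-neighbour classifier is used here only as a simple stand-in for the classification techniques mentioned above; the data points and labels are invented for the example.

```python
import math


def nearest_neighbour_predict(train, point):
    """Return the label of the training example closest to `point`."""
    closest = min(train, key=lambda example: math.dist(example[0], point))
    return closest[1]


def accuracy(train, test):
    """Return the fraction of test examples whose label is predicted correctly."""
    correct = sum(
        1
        for features, label in test
        if nearest_neighbour_predict(train, features) == label
    )
    return correct / len(test)


# Hypothetical labelled data: (features, label) pairs.
train_set = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 7.5), "high"),
]
test_set = [
    ((2.0, 1.5), "low"),
    ((8.5, 9.0), "high"),
    ((5.5, 5.5), "high"),
]

print(f"Test accuracy: {accuracy(train_set, test_set):.2f}")
```

Keeping the test set separate from the training set is what makes the accuracy figure a fair estimate of how the model behaves on data it has not seen.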
07 - Data Visualization:
  • Data scientists work together with analysts and businesses to convert data insights into action. They make diagrams, graphs, and charts to represent trends and predictions. Data summarization helps stakeholders understand and implement results effectively. (Amazon, 2023)
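In practice, charts are usually produced with a plotting library, but the principle of turning summarized numbers into a visual form can be shown with a plain-text bar chart. The `text_bar_chart` helper and the quarterly figures below are assumptions made for this example.

```python
def text_bar_chart(counts, width=20):
    """Render a dict of label -> value as a horizontal text bar chart."""
    largest = max(counts.values())
    # Scale bars so the largest value fills `width` characters.
    scale = width / largest if largest else 0
    lines = []
    for label, value in counts.items():
        bar = "#" * round(value * scale)
        lines.append(f"{label:<10} {bar} {value}")
    return "\n".join(lines)


# Hypothetical summarized data: sales per quarter.
sales = {"Q1": 10, "Q2": 20, "Q3": 15}
print(text_bar_chart(sales))
```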


Reference list:

Amazon (2023) What is data science? Available at: https://aws.amazon.com/what-is/data-science (Accessed: April 19, 2023).

IBM (2023) What is data mining?. Available at: https://www.ibm.com/topics/data-mining (Accessed: April 19, 2023).

Microsoft Learn (2023) Data acquisition and understanding of Team Data Science Process - Azure Architecture Center. Microsoft Learn. Available at: https://learn.microsoft.com/en-us/azure/architecture/data-science-process/lifecycle-data (Accessed: April 19, 2023).

Patel, H. (2021) What is feature engineering - importance, tools and techniques for machine learning. Medium. Towards Data Science. Available at: https://towardsdatascience.com/what-is-feature-engineering-importance-tools-and-techniques-for-machine-learning-2080b0269f10 (Accessed: April 19, 2023).
