Initial Idea 1

Introduction

As my Initial Idea 1 for my Major Project, I chose Data Science because of the growing demand for and interest in this field. I explored it further, and this blog post covers the following:
  • What is Data Science?
  • History
  • Data Science LifeCycle
  • Necessary Tools
  • Who is a Data Scientist?


Data Science
Data science combines math and statistics, specialized programming, advanced analytics, artificial intelligence (AI), and machine learning with specific subject matter expertise to uncover actionable insights hidden in an organization’s data. These insights can be used to guide decision-making and strategic planning. (IBM, 2023)

Data Science draws on several areas, including Mathematics, Machine Learning, Statistical Research, Data Processing and Computer Science.

(Ismath, 2020)
History of Data Science
While the term data science is not new, the meanings and connotations have changed over time. The word first appeared in the ’60s as an alternative name for statistics. In the late ’90s, computer science professionals formalized the term. A proposed definition for data science saw it as a separate field with three aspects: data design, collection, and analysis. It still took another decade for the term to be used outside of academia. (Amazon, 2023)

Data Science LifeCycle:

(Agarwal, 2018)
01 - Business Understanding: 
  • Define objectives: Work with your customer and other stakeholders to understand and identify the business problems. Formulate questions that define the business goals that the data science techniques can target.
  • Identify data sources: Find the relevant data that helps you answer the questions that define the objectives of the project. (Microsoft Learn, 2023)
02 - Data Mining:
  • Data mining, also known as knowledge discovery in data (KDD), is the process of uncovering patterns and other valuable information from large data sets. (IBM, 2023)
03 - Data Cleaning:
  • Data scrubbing, or data cleaning, is the process of standardizing the data according to a predetermined format. It includes handling missing data, fixing data errors, and removing any data outliers. Some examples of data scrubbing are:
    • Changing all date values to a common standard format.
    • Fixing spelling mistakes or additional spaces.
    • Fixing mathematical inaccuracies or removing commas from large numbers. (Amazon, 2023)
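The cleaning examples listed above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the records, the field names (name, joined, revenue) and the list of accepted date layouts are all assumptions made up for the example, not part of any real dataset.

```python
import re
from datetime import datetime

# Hypothetical raw records; field names and values are made up for illustration.
raw_records = [
    {"name": "  Alice  Smith ", "joined": "19/04/2023", "revenue": "1,250,000"},
    {"name": "Bob Jones", "joined": "2023-04-20", "revenue": "980"},
    {"name": "Carol  White", "joined": "April 21, 2023", "revenue": None},
]

# Date layouts we assume may appear in the raw data.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y")


def clean_date(value):
    """Return the date in ISO format (YYYY-MM-DD), or None if no layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_text(value):
    """Collapse repeated whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


def clean_number(value):
    """Remove thousands separators and convert to int; keep missing values as None."""
    if value is None:
        return None
    return int(value.replace(",", ""))


def clean_record(record):
    """Apply the three cleaning steps to one record."""
    return {
        "name": clean_text(record["name"]),
        "joined": clean_date(record["joined"]),
        "revenue": clean_number(record["revenue"]),
    }


cleaned = [clean_record(r) for r in raw_records]
for row in cleaned:
    print(row)
```

In this sketch a missing revenue is kept as None rather than guessed, so the decision on how to handle missing data is left to a later step.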
04 - Data Exploration:
  • Data exploration is preliminary data analysis that is used for planning further data modelling strategies. Data scientists gain an initial understanding of the data using descriptive statistics and data visualization tools. Then they explore the data to identify interesting patterns that can be studied or actioned. (Amazon, 2023)
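As a rough illustration of the descriptive statistics mentioned above, the sketch below summarises a small list of numbers with Python's built-in statistics module. The order counts are invented for the example, and the "more than two standard deviations from the mean" rule is just one simple assumption for spotting values worth a closer look.

```python
import statistics
from collections import Counter

# Hypothetical daily order counts, made up for illustration.
orders = [12, 15, 11, 14, 13, 40, 12, 16, 15, 12]

# Basic descriptive statistics for a first look at the data.
summary = {
    "count": len(orders),
    "min": min(orders),
    "max": max(orders),
    "mean": statistics.mean(orders),
    "median": statistics.median(orders),
    "stdev": statistics.stdev(orders),  # sample standard deviation
}
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")

# The most frequent value and how often it occurs.
most_common_value, frequency = Counter(orders).most_common(1)[0]
print(f"most common value: {most_common_value} (seen {frequency} times)")

# Assumed rule of thumb: flag values more than 2 standard deviations from the mean.
unusual = [x for x in orders if abs(x - summary["mean"]) > 2 * summary["stdev"]]
print(f"values worth a closer look: {unusual}")
```

A large gap between the mean and the median, as in this made-up data, is often a first hint that a few extreme values are pulling the average.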
05 - Feature Engineering:
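  • Feature engineering is the process of selecting, transforming, and creating input variables (features) from raw data so that a machine learning model can use them effectively. (Patel, 2021)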
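To show what turning raw data into features can look like, here is a minimal sketch in plain Python. The order rows, the column names and the fixed list of sales channels are all hypothetical. The derived features (day of week, weekend flag, price per item, one-hot channel columns) are common examples and not a prescribed set.

```python
from datetime import date

# Hypothetical raw rows; the column names are assumptions for this example.
raw_rows = [
    {"order_date": "2023-04-15", "total": 120.0, "items": 4, "channel": "web"},
    {"order_date": "2023-04-17", "total": 45.0, "items": 1, "channel": "store"},
    {"order_date": "2023-04-19", "total": 90.0, "items": 3, "channel": "app"},
]

# Assumed fixed set of categories used for one-hot encoding.
CHANNELS = ("web", "store", "app")


def build_features(row):
    """Derive numeric features from one raw row."""
    day = date.fromisoformat(row["order_date"])
    features = {
        "day_of_week": day.weekday(),            # Monday = 0 ... Sunday = 6
        "is_weekend": int(day.weekday() >= 5),   # 1 for Saturday or Sunday
        "price_per_item": row["total"] / row["items"],
    }
    # One-hot encoding: one 0/1 column per known channel.
    for channel in CHANNELS:
        features[f"channel_{channel}"] = int(row["channel"] == channel)
    return features


features = [build_features(row) for row in raw_rows]
for row in features:
    print(row)
```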
06 - Predictive Modeling:
  • Software and machine learning algorithms are used to gain deeper insights, predict outcomes, and prescribe the best course of action. Machine learning techniques like association, classification, and clustering are applied to the training data set. The model might be tested against predetermined test data to assess result accuracy. The data model can be fine-tuned many times to improve result outcomes. (Amazon, 2023) 
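The idea of training a classifier and then checking it against separate test data can be sketched as follows. This is a deliberately tiny, hand-written nearest-centroid classifier in plain Python, chosen only because it is short. It is not the method from the cited source, and in practice a machine learning library would normally be used instead. All data points and labels are invented.

```python
import math

# Hypothetical labelled points: ((feature_1, feature_2), label).
training_data = [
    ((1.0, 1.2), "low"), ((0.8, 1.0), "low"), ((1.2, 0.9), "low"),
    ((3.0, 3.2), "high"), ((3.3, 2.9), "high"), ((2.8, 3.1), "high"),
]
# Held-out examples that the model never sees during training.
test_data = [
    ((1.1, 1.1), "low"), ((3.1, 3.0), "high"),
    ((0.9, 1.3), "low"), ((2.9, 2.7), "high"),
]


def fit_centroids(rows):
    """Training step: compute the mean point (centroid) of each class."""
    grouped = {}
    for point, label in rows:
        grouped.setdefault(label, []).append(point)
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in grouped.items()
    }


def predict(centroids, point):
    """Prediction step: return the label whose centroid is closest to the point."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))


centroids = fit_centroids(training_data)

# Evaluation step: share of test examples that are classified correctly.
correct = sum(predict(centroids, point) == label for point, label in test_data)
accuracy = correct / len(test_data)
print(f"accuracy on held-out test data: {accuracy:.2f}")
```

Keeping the test data separate from the training data is the important part of the sketch: accuracy measured on the training points alone would tend to look better than the model really is.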
07 - Data Visualization:
  • Data scientists work together with analysts and businesses to convert data insights into action. They make diagrams, graphs, and charts to represent trends and predictions. Data summarization helps stakeholders understand and implement results effectively. (Amazon, 2023)
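As a very small stand-in for a real chart, the sketch below prints a text bar chart using only the Python standard library. The monthly sales figures are invented, and render_bar_chart is a hypothetical helper written for this example. A plotting library such as Matplotlib would normally be used for actual reports.

```python
# Hypothetical monthly sales figures, made up for illustration.
monthly_sales = {"Jan": 120, "Feb": 95, "Mar": 150, "Apr": 180}


def render_bar_chart(data, width=30):
    """Return a text bar chart where the largest value spans `width` characters."""
    largest = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<4}|{bar} {value}")
    return "\n".join(lines)


chart = render_bar_chart(monthly_sales)
print(chart)
```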

Data Science Tools:
Data scientists rely on popular programming languages to conduct exploratory data analysis and statistical regression. These open-source tools support pre-built statistical modelling, machine learning, and graphics capabilities. These languages include the following:
  • R: An open-source programming language and environment for statistical computing and graphics, commonly used through the RStudio development environment.
  • Python: A dynamic and flexible programming language. Its ecosystem includes numerous libraries, such as NumPy, Pandas, and Matplotlib, for analyzing data quickly.

To facilitate sharing code and other information, data scientists may use GitHub and Jupyter Notebooks. (IBM, 2023)

(Javinpaul, 2023)

Data Scientist
Data scientists are a new breed of analytical data experts who have the technical skills to solve complex problems – and the curiosity to explore what problems need to be solved. (SAS UK, 2022)
Conclusion
Overall, I covered the Data Science field, its history, and its lifecycle with an explanation of every step in the cycle, the main tools used in the field, such as R, Python, GitHub and Jupyter Notebooks, and the definition of a Data Scientist.




Reference list:

Agarwal, S. (2018) Understanding the Data Science Lifecycle, Sudeep Agarwal. Available at: https://www.sudeep.co/data-science/2018/02/09/Understanding-the-Data-Science-Lifecycle.html (Accessed: April 19, 2023).

Amazon (2023) What is Data Science? Available at: https://aws.amazon.com/what-is/data-science (Accessed: April 19, 2023).

IBM (2023) What is data mining? Available at: https://www.ibm.com/topics/data-mining (Accessed: April 19, 2023).

IBM (2023) What is Data Science? Available at: https://www.ibm.com/uk-en/topics/data-science (Accessed: April 19, 2023).

Ismath, R. (2020) Introduction to data science, Medium. Analytics Vidhya. Available at: https://medium.com/analytics-vidhya/introduction-to-data-science-28deb32878e7 (Accessed: April 19, 2023).

Javinpaul (2023) Top 10 Tools Data Engineers and data scientist should learn in 2023, Medium. Javarevisited. Available at: https://medium.com/javarevisited/10-essential-tools-data-scientists-should-learn-in-2022-acbae6558643 (Accessed: April 19, 2023).

Microsoft Learn (2023) Data acquisition and understanding of Team Data Science Process - Azure Architecture Center, Microsoft Learn. Available at: https://learn.microsoft.com/en-us/azure/architecture/data-science-process/lifecycle-data (Accessed: April 19, 2023).

Patel, H. (2021) What is feature engineering - importance, tools and techniques for machine learning, Medium. Towards Data Science. Available at: https://towardsdatascience.com/what-is-feature-engineering-importance-tools-and-techniques-for-machine-learning-2080b0269f10 (Accessed: April 19, 2023).

SAS UK (2022) What is a data scientist? Available at: https://www.sas.com/en_gb/insights/analytics/what-is-a-data-scientist.html (Accessed: April 19, 2023).

