Introduction to Data Science

Esma Bozkurt
Mar 8, 2021

Hi, I’m Esma Bozkurt, an electrical and electronics engineering student. I completed the Data Science with Deep Learning training given by data scientist Zafer Acar, and I have been working as a data scientist for 2 months. I would like to thank my esteemed instructor for providing an American-standard Data Science and Artificial Intelligence training with 256 hours of lectures and real-life projects in approximately 6 weeks. I am starting this series to pass on what I learned from him and to help friends who want to progress in this field. The series will cover:

  • Reading Data, Data Visualization, Exploratory Data Analysis (EDA)-PPR, Feature Engineering, Missing Value Imputation, Creating Dummies
  • Regression, Classification, Clustering, PyCaret
  • Principal Component Analysis (PCA), Graph Theory, Natural Language Processing (NLP), Computer Vision, Deep Learning, Recommender Systems, Time Series Analysis

I will summarize these topics as concisely as possible and support them with projects. Now let’s dive into the main subject: data science, a term you have probably heard a lot recently.

Data is raw information, and almost everything around us is data. It comes in many types, such as numerical and logical (true/false) values; our names, our numbers, and every post we like are all data. Data science is a discipline that brings together many fields, including statistics, scientific methods, and data analysis, to extract value from data.
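To make the idea of data types concrete, here is a minimal sketch with pandas, using made-up records (the column names `name`, `age`, and `liked_post` are hypothetical). Each column holds a different kind of data, and pandas reports the type it inferred for each one.

```python
import pandas as pd

# Hypothetical records: each column holds a different kind of data
people = pd.DataFrame({
    "name": ["Ayse", "Mehmet", "Zeynep"],   # text data
    "age": [21, 34, 28],                     # numerical data
    "liked_post": [True, False, True],       # logical (boolean) data
})

# Show the data type pandas inferred for every column
print(people.dtypes)
```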

It makes sense of the past.

It interprets the present.

It makes predictions about the future.

To do this, it uses statistics and probability together with Machine Learning and Deep Learning, which are sub-disciplines of Artificial Intelligence. These fields complement one another, and data scientists combine them to make sense of and analyze data. Python and R are the programming languages most commonly used as tools.

In data science, the first step is to read the data and make sense of it. We will do this with the Python library pandas. Next, we will use visualization tools, which are one of the best ways to understand data, and then move on to data analysis.
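As a minimal sketch of that first step: with a real file you would call something like `pd.read_csv("your_file.csv")` (a hypothetical file name). To keep the example self-contained, a tiny inline CSV with made-up values stands in for the file. A few pandas calls already tell us a lot: the first rows, the column types, summary statistics, and how many values are missing.

```python
import io

import pandas as pd

# A tiny inline CSV stands in for a real data file (hypothetical values)
csv_text = """city,population,area_km2
A,120000,85.5
B,45000,
C,980000,310.2
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())        # first rows of the table
df.info()               # column types and non-null counts
print(df.describe())    # summary statistics for numeric columns
print(df.isna().sum())  # number of missing values per column
```

The empty `area_km2` field for city B is read as `NaN`, which is exactly the kind of gap that missing value imputation deals with later in the series.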

Python — Sample graphics created using the Matplotlib Library

Matplotlib is a low-level Python plotting library with a MATLAB-like interface. Seaborn, which is built on top of Matplotlib and has a high-level interface, produces more attractive graphics with shorter code and makes our work easier.
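The difference between the two levels is easiest to see by drawing the same plot twice. The sketch below uses hypothetical height/weight data: with Matplotlib we loop over the groups and add labels and a legend ourselves, while a single Seaborn call handles the coloring, axis labels, and legend.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data
df = pd.DataFrame({
    "height": [150, 160, 165, 170, 175, 180, 185],
    "weight": [50, 56, 61, 66, 72, 79, 85],
    "group": ["a", "a", "b", "b", "a", "b", "b"],
})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Matplotlib: low level, so we plot each group and label everything ourselves
for name, part in df.groupby("group"):
    ax1.scatter(part["height"], part["weight"], label=name)
ax1.set_xlabel("height")
ax1.set_ylabel("weight")
ax1.legend(title="group")
ax1.set_title("Matplotlib")

# Seaborn: one call colors by group and adds axis labels and a legend
sns.scatterplot(data=df, x="height", y="weight", hue="group", ax=ax2)
ax2.set_title("Seaborn")

fig.tight_layout()
# fig.savefig("comparison.png")  # or plt.show() in an interactive session
```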

For example, a heatmap lets us examine the correlations between all numeric columns at a glance,
or sns.pairplot plots every pair of columns against each other…

After reading the data, we can pick the columns we want and turn them into meaningful graphs. Want a more ambitious graphic? Let’s see how we can examine the area and population of the cities in a state without being overwhelmed by numbers.

Area and population distribution map of cities in the state of California
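A map like this can be sketched as a scatter plot of longitude against latitude. The sketch below is not the code behind the figure: it assumes a table with hypothetical columns `longitude`, `latitude`, `population`, and `area_km2` filled with made-up values, and it assumes marker size shows area while color shows population on a log scale (so a few very large cities do not wash out the rest).

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for a city table; a real file would be loaded with pd.read_csv
cities = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "longitude": [-118.0, -122.0, -117.5, -121.0],
    "latitude": [34.0, 37.5, 33.0, 38.5],
    "population": [3_000_000, 800_000, 1_200_000, 400_000],
    "area_km2": [1200, 150, 900, 250],
})

# Keep only the columns needed for the map
lon, lat = cities["longitude"], cities["latitude"]
population, area = cities["population"], cities["area_km2"]

fig, ax = plt.subplots(figsize=(6, 6))
points = ax.scatter(
    lon, lat,
    s=area,                  # marker size encodes area
    c=np.log10(population),  # color encodes population on a log scale
    cmap="viridis", alpha=0.5, linewidths=0,
)
fig.colorbar(points, ax=ax, label="log$_{10}$(population)")
ax.set_xlabel("longitude")
ax.set_ylabel("latitude")
ax.set_title("City area and population")
```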

First, we read a data file containing information such as latitude, longitude, population, and area with the pandas library and selected the columns we were interested in. Then we drew a scatter plot, using marker size and color to show area and population. Don’t worry: this was an ambitious example meant to show what is possible. In the projects, we will use simpler, more practical graphics with shorter code. See you in my next post; I hope you enjoyed this one! :)

GitHub account: esmabozkurt
