Definition of Descriptive Statistics
Descriptive statistics is the branch of statistics concerned with summarizing and organizing data so that it is easier to understand. Its focus is on describing the essential features and characteristics of the dataset at hand, without drawing inferences or making predictions about a larger population.
The primary objective of descriptive statistics is to provide a clear and compact data summary. This helps researchers and analysts gain valuable insights and comprehend the dataset’s patterns, trends, and distributions. The summary typically includes measures like central tendency (such as mean, median, and mode), dispersion (including range, variance, and standard deviation), and the shape of the distribution (like skewness and kurtosis).
Descriptive statistics also includes presenting data in graphical form through charts, graphs, and tables. These visual representations further assist in understanding and interpreting the information effectively. Common graphical techniques include histograms, bar charts, pie charts, scatter plots, and box plots. Using descriptive statistics, researchers can communicate their findings more efficiently and make informed decisions based on the data’s characteristics.
Types of Descriptive Statistics
Descriptive statistics encompass various types and measures to help us make sense of data. Some experts categorize them into two types, while others may describe three or four.
Distribution (Also Called Frequency Distribution)
Datasets contain a range of scores or values. A frequency distribution summarizes how often each possible value of a variable occurs, expressed either as a count or as a percentage. For example, if you conducted a poll to find people’s favorite Beatle, you’d set up a column with each Beatle’s name (John, Paul, George, and Ringo) and a second column with the number of votes each one received.
Statisticians present frequency distributions as tables or as graphs such as bar charts.
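As a minimal sketch of the poll example, the snippet below builds a frequency table with counts and percentages. The vote list is made up purely for illustration, and `frequency_table` is a hypothetical helper name, not part of any library.

```python
from collections import Counter

# Hypothetical poll responses (illustrative only, not real survey data).
votes = ["John", "Paul", "Paul", "George", "Ringo", "John", "Paul", "George"]


def frequency_table(responses):
    """Return {value: (count, percent)} for each distinct response."""
    counts = Counter(responses)
    total = len(responses)
    return {value: (n, 100 * n / total) for value, n in counts.items()}


table = frequency_table(votes)

# Print one row per Beatle: name, number of votes, share of all votes.
for name, (count, percent) in sorted(table.items()):
    print(f"{name:<8}{count:>3}{percent:>8.1f}%")
```

The same dictionary could be passed to a plotting library to draw a bar chart; the table form is shown here because it needs only the standard library.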
Measures of Central Tendency
Measures of central tendency help us find the average or center of a dataset using three methods: mean, mode, and median.
Mean: Often written as “M,” the mean is the most common way to find an average. You calculate it by adding all the values and dividing the sum by the number of responses, represented as “N.” For example, if you want to know the average hours of sleep someone gets in a week, you’d gather the daily hours (e.g., 6, 8, 7, 10, 8, 4, 9), and the sum would be 52. With seven responses (N=7), dividing 52 by 7 gives you a mean of about 7.4.
Mode: The mode is the most frequent value in the dataset. A dataset can have one mode, several modes, or no mode at all when every value occurs only once. To find the mode, arrange the dataset from lowest to highest and look for the most common value. In the sleep study example, the mode is eight, the only value that appears twice.
Median: The median is the value in the middle of the dataset when arranged in ascending order. Once again, the median is eight using the sleep study data (4, 6, 7, 8, 8, 9, 10).
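The three measures can be checked on the sleep study data with Python’s standard `statistics` module. This is only a small sketch; the variable names are chosen here for illustration.

```python
import statistics

# Daily hours of sleep from the example above.
sleep_hours = [6, 8, 7, 10, 8, 4, 9]

mean = statistics.mean(sleep_hours)         # sum of values divided by N (52 / 7)
median = statistics.median(sleep_hours)     # middle value of the sorted data
modes = statistics.multimode(sleep_hours)   # list of all most frequent values

print(round(mean, 2), median, modes)
```

`multimode` is used rather than `mode` because it returns every most frequent value, which makes datasets with several modes visible.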
Variability (Also Called Dispersion)
Variability helps statisticians understand how spread out the responses are in a dataset. This aspect includes three measures: range, standard deviation, and variance.
Range: To determine the range, find the difference between the highest and lowest values in the dataset. In the sleep study example, subtracting four (lowest value) from ten (highest value) gives us a range of six.
Standard Deviation: This measure indicates how far, on average, the scores lie from the mean. A larger standard deviation means greater variability. Calculating it involves six steps:
- List the scores and their mean.
- Find the deviation by subtracting the mean from each score.
- Square each deviation.
- Add up all the squared deviations.
- Divide the sum of squared deviations by N-1. The result is the sample variance.
- Take the square root of the result.
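The six steps above can be sketched in Python as follows, using the sleep study data. The function name `sample_std_dev` is a hypothetical choice for this example; the result is compared against the standard library’s `statistics.stdev`, which also divides by N-1.

```python
import math
import statistics


def sample_std_dev(scores):
    """Sample standard deviation, following the six steps one by one."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two scores are needed to divide by N-1")
    mean = sum(scores) / n                       # step 1: the scores and their mean
    deviations = [x - mean for x in scores]      # step 2: score minus mean
    squared = [d ** 2 for d in deviations]       # step 3: square each deviation
    sum_of_squares = sum(squared)                # step 4: add the squared deviations
    variance = sum_of_squares / (n - 1)          # step 5: divide by N-1
    return math.sqrt(variance)                   # step 6: take the square root


sleep_hours = [6, 8, 7, 10, 8, 4, 9]
sd = sample_std_dev(sleep_hours)

print(round(sd, 2))                               # about 1.99 hours
print(math.isclose(sd, statistics.stdev(sleep_hours)))
```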
Understanding descriptive statistics helps researchers and analysts gain valuable insights and present data more effectively.
Univariate Descriptive Statistics
Univariate descriptive statistics analyze one variable at a time, without relating it to any other variable. This lets researchers describe each variable on its own terms. Typical univariate summaries include:
- Measures of central tendency (mean, mode, and median)
- Data dispersion (standard deviation, variance, range, minimum, maximum, and quartiles)
- Tables of frequency distribution
- Graphical representations like pie graphs, frequency polygons, histograms, and bar graphs
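As a rough illustration of a univariate summary, the sketch below collects the minimum, maximum, range, and quartiles of the sleep study data. `summarize` is a hypothetical helper name. Quartile conventions differ between textbooks and tools, so the quartile values depend on the method used; this example relies on the default ("exclusive") method of `statistics.quantiles`.

```python
import statistics


def summarize(values):
    """Return a small dictionary of univariate dispersion measures."""
    lowest, highest = min(values), max(values)
    return {
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
        # Three cut points that split the sorted data into four groups.
        "quartiles": statistics.quantiles(values, n=4),
    }


sleep_hours = [6, 8, 7, 10, 8, 4, 9]
summary = summarize(sleep_hours)

for key, value in summary.items():
    print(f"{key}: {value}")
```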
Bivariate Descriptive Statistics
Bivariate descriptive statistics involve analyzing two variables simultaneously to see if they are correlated. When the data are laid out in a two-way table, the columns conventionally represent the independent variable, and the rows represent the dependent variable.
There are many real-world applications for bivariate data analysis. For example, predicting when a natural event might occur can be highly valuable. It’s a powerful tool in a statistician’s toolkit.
Plotting one parameter against the other on a two-dimensional plane can give us a deeper understanding of the data. For instance, in the scatterplot below, we can see the relationship between the time between eruptions at Old Faithful and the duration of the eruptions.
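One common way to put a number on such a relationship is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below computes it from scratch. The paired values are invented for illustration and are not real Old Faithful measurements, and `pearson_r` is a hypothetical function name.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of paired deviations (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Made-up illustrative pairs (NOT real eruption data).
wait_minutes = [55, 60, 70, 80, 85, 90]
duration_minutes = [1.8, 2.0, 3.3, 4.1, 4.3, 4.6]

r = pearson_r(wait_minutes, duration_minutes)
print(round(r, 3))
```

A value close to 1 means the two variables rise together, a value close to -1 means one falls as the other rises, and a value near 0 means there is little linear relationship.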
FAQs
Can you explain what descriptive statistics is and how it is used?
Descriptive statistics provides fundamental details about the variables in a dataset and can reveal possible connections between variables. The three most commonly used kinds of descriptive statistics (frequency distribution, central tendency, and variability) can also be presented visually with graphs and charts.
Could you explain the different types of descriptive statistics?
Descriptive statistics involve a dataset’s frequency distribution, central tendency, and variability. Frequency distribution refers to the number of times different responses occur. Measures of central tendency give the typical or average value of a variable. Measures of variability show how spread out the values are.
What are the main categories of descriptive statistics?
There are four major categories of descriptive statistics:
- Measures of Frequency: Count, Percent, and Frequency. These show how often something occurs.
- Measures of Central Tendency: Mean, Median, and Mode. These provide information on a data set’s typical or central value.
- Measures of Dispersion or Variation: Range, Variance, and Standard Deviation. These help to understand how spread out the data is.
- Measures of Position: These show where a particular data point stands in relation to the rest of the data set.
What are the applications of descriptive analysis?
Descriptive analysis is useful in finance to showcase financial metrics and highlight business performance in reports.
Can you provide 8 descriptive statistics?
Descriptive statistics summarize data. Measures of central tendency (mean, median, mode) show typical values. Measures of variability (standard deviation, variance, minimum and maximum) indicate data spread, while skewness and kurtosis describe the shape of the distribution.