R is both a programming language and an environment for statistical computing and graphics. The R language is very similar to the S language, and it is widely used for statistical and mathematical computations. This article shows how you can use R for statistics.

The R language is highly extensible. It offers a wide range of statistical techniques that are helpful in analysis. The major strengths of R lie in its thorough documentation, good graphics features, and strong analytical functions.

## The Mean, Median and Mode in R for Statistics

In this section, let’s see how to find the mean, median and mode of data in R.

Mean:

The mean is the sum of the values divided by the total number of values. In R, use the mean() function to get the mean of the values.

```#Creating a vector
df <- c(34,56,25,67,54,56,78,98,34,16,58)

#Returns the mean of the values in vector
mean(df)```
``52.36364``
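To make the definition concrete, the short sketch below recomputes the mean by hand with sum() and length() and compares it with mean(). It also shows the na.rm argument, which matters once the data contains missing values. The names `values` and `with_na` are illustrative only and are not part of the original example.

```r
# Same values as in the example above
values <- c(34, 56, 25, 67, 54, 56, 78, 98, 34, 16, 58)

# Mean by definition: sum of the values divided by their count
manual_mean <- sum(values) / length(values)

# The built-in function should agree with the manual calculation
builtin_mean <- mean(values)
print(c(manual = manual_mean, builtin = builtin_mean))

# With a missing value, mean() returns NA unless na.rm = TRUE is set
with_na <- c(values, NA)
print(mean(with_na))
print(mean(with_na, na.rm = TRUE))
```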

Median:

The center or middle value of the sorted data is called the median. R offers the median() function to calculate it.

```#Creates a vector
df <- c(34,56,25,67,54,56,78,98,34,16,58)

#Returns the median of data
median(df)```
``56``
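As a rough illustration of what median() does, the sketch below sorts the same values and picks the middle element by hand. It then shows the even-length case, where median() averages the two middle values. The vector `even_values` is a made-up example.

```r
# Median by hand: sort the values, then pick the middle one
values <- c(34, 56, 25, 67, 54, 56, 78, 98, 34, 16, 58)
sorted <- sort(values)
n <- length(sorted)

# n is odd here (11 values), so there is a single middle element
middle_value <- sorted[(n + 1) / 2]
print(middle_value)

# For an even number of values, median() averages the two middle ones
even_values <- c(10, 20, 30, 40)
print(median(even_values))
```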

Mode:

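The mode is the value that occurs most often in the data. If every value occurs equally often, the data has no mode, and if several values share the highest frequency, the data has more than one mode. Base R’s mode() function reports the storage type of an object rather than the statistical mode, so the example uses the mfv() (most frequent value) function from the modeest package.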

```#Load the required library (install it once with install.packages("modeest"))
library(modeest)
df <- c(34,56,25,67,54,56,78,98,34,16,58)

#Returns the mode (most frequent value or values) of the data
mfv(df)```
`` 34 56``
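If installing an extra package is not an option, the mode can also be sketched in base R with table(), which counts how often each distinct value occurs. This is only one possible approach and not the method used by modeest; the variable names are illustrative.

```r
values <- c(34, 56, 25, 67, 54, 56, 78, 98, 34, 16, 58)

# Count how often each distinct value occurs
counts <- table(values)

# Keep every value whose count equals the highest count
modes <- as.numeric(names(counts)[counts == max(counts)])
print(modes)
```

Because every value with the top count is kept, the result can contain more than one mode, as it does for this vector.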

## Descriptive Statistics in R language

R is well suited to statistical analysis and includes many functions that help with statistical computations on your data. Descriptive statistics give plenty of insight into the shape and distribution of the data. For a numeric variable, the summary() function in R returns up to seven statistical properties; the count of NA’s is shown only when the variable contains missing values.

• Mean
• Median
• 1st Quartile
• 3rd Quartile
• Minimum value
• Maximum value
• NA’s
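As a small sketch of what these values look like for a single variable, the example below calls summary() on a made-up numeric vector `x` that contains one missing value, and then reproduces a few of the entries with the individual functions.

```r
# Hypothetical numeric vector with one missing value
x <- c(34, 56, 25, NA, 54, 56, 78)

# summary() reports the six statistics plus the number of NA's
s <- summary(x)
print(s)

# The same pieces can be computed one at a time
x_min <- min(x, na.rm = TRUE)
x_q1 <- quantile(x, probs = 0.25, na.rm = TRUE)
x_median <- median(x, na.rm = TRUE)
x_na <- sum(is.na(x))
print(c(min = x_min, median = x_median, missing = x_na))
```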

## Descriptive Statistics in Action

R is a valuable asset for modern analytical problems. With a single function call, R gives you these statistical properties for every column of your data, and it does so quickly.

In this section, let’s see descriptive statistics in action!

Let’s import one of the built-in datasets in R, the airquality dataset.

```#Importing the data
df <- datasets::airquality

#Prints the first 12 rows
head(df, 12)```
``````      Ozone Solar.R Wind Temp Month Day
1      41     190  7.4   67     5   1
2      36     118  8.0   72     5   2
3      12     149 12.6   74     5   3
4      18     313 11.5   62     5   4
5      NA      NA 14.3   56     5   5
6      28      NA 14.9   66     5   6
7      23     299  8.6   65     5   7
8      19      99 13.8   59     5   8
9       8      19 20.1   61     5   9
10     NA     194  8.6   69     5  10
11      7      NA  6.9   74     5  11
12     16     256  9.7   69     5  12``````

Fantastic! Our data is now ready for some statistical inference. Let’s use the summary() function to get the statistical properties of this data.

```#returns the statistical properties of data
summary(df)```
``````       Ozone            Solar.R              Wind
Min.   :  1.00      Min.   :  7.0      Min.   : 1.700
1st Qu.: 18.00      1st Qu.:115.8      1st Qu.: 7.400
Median : 31.50      Median :205.0      Median : 9.700
Mean   : 42.13      Mean   :185.9      Mean   : 9.958
3rd Qu.: 63.25      3rd Qu.:258.8      3rd Qu.:11.500
Max.   :168.00      Max.   :334.0      Max.   :20.700
NA's   :37          NA's   :7

``````
``````     Temp             Month
Min.   :56.00      Min.   :5.000
1st Qu.:72.00      1st Qu.:6.000
Median :79.00      Median :7.000
Mean   :77.88      Mean   :6.993
3rd Qu.:85.00      3rd Qu.:8.000
Max.   :97.00      Max.   :9.000

Day
Min.   : 1.0
1st Qu.: 8.0
Median :16.0
Mean   :15.8
3rd Qu.:23.0
Max.   :31.0

``````

Inferences drawn:

• The data has missing values (NA) in two columns: Ozone (37) and Solar.R (7).
• The maximum temperature is 97 and the minimum is 56.
• The maximum Ozone is 168 and the minimum is 1.
• The maximum wind speed is 20.7 and the minimum is 1.7.
• Apart from the missing values, every column falls within a plausible range, so the data can be used for further analysis to get more insights.
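The missing-value counts reported by summary() can be cross-checked directly. The sketch below counts the NA entries per column and shows why functions such as mean() need na.rm = TRUE on the affected columns. The variable names are illustrative.

```r
df <- datasets::airquality

# Count the missing values in every column
na_counts <- colSums(is.na(df))
print(na_counts)

# Number of rows with no missing value in any column
complete_rows <- sum(complete.cases(df))
print(complete_rows)

# Columns with NAs need na.rm = TRUE, otherwise the result is NA
ozone_mean <- mean(df$Ozone, na.rm = TRUE)
print(ozone_mean)
```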

## R for Statistics – Correlation between Attributes

With the help of the summary() function, we got around 7 statistical properties of our data and learned that it is good to go for further analysis. Now let’s find the correlation between the attributes. For this, we can use the cor() function in R.

```#Importing data (the output below matches R's built-in VADeaths dataset)
df <- datasets::VADeaths
#Prints data
df```
``````           Rural Male  Rural Female Urban Male Urban Female
50-54       11.7          8.7       15.4          8.4
55-59       18.1         11.7       24.3         13.6
60-64       26.9         20.3       37.0         19.3
65-69       41.0         30.9       54.6         35.1
70-74       66.0         54.3       71.1         50.0``````

This is our data. Note that to find the correlation between variables, the data must be numeric. If it is not, you can convert it using the as.numeric() function.
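As a minimal sketch of that conversion, the example below uses a made-up data frame `scores` in which one numeric column was stored as text. The data and column names are hypothetical.

```r
# Hypothetical data frame where one numeric column was read in as text
scores <- data.frame(
  hours = c(1, 2, 3, 4, 5),
  marks = c("52", "58", "61", "70", "74"),
  stringsAsFactors = FALSE
)

# cor() needs numeric input, so check the column types first
print(sapply(scores, is.numeric))

# Convert the character column, then compute the correlation
scores$marks <- as.numeric(scores$marks)
result <- cor(scores$hours, scores$marks)
print(result)
```

One caveat: calling as.numeric() on a factor returns the internal level codes, so a factor column is usually converted with as.numeric(as.character(x)) instead.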

Now, let’s apply the cor() function to get the correlation between all the attributes of the data.

```#Returns the correlation between attributes
cor(df)```
``````              Rural Male Rural Female Urban Male Urban Female
Rural Male    1.0000000    0.9979869  0.9841907    0.9934646
Rural Female  0.9979869    1.0000000  0.9739053    0.9867310
Urban Male    0.9841907    0.9739053  1.0000000    0.9918262
Urban Female  0.9934646    0.9867310  0.9918262    1.0000000``````

The most commonly used correlation method is the Pearson correlation, which is also the default method of cor(). Its value lies between -1 and 1: 1 indicates a perfect positive linear relationship, -1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship between the attributes.
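The Pearson coefficient is the covariance of two variables divided by the product of their standard deviations. The sketch below checks this against cor() using two columns of R's built-in VADeaths dataset, which appears to be the data shown above (that link is an assumption). It also uses a made-up vector `x` to show the two extreme values.

```r
deaths <- datasets::VADeaths
rural_male <- deaths[, "Rural Male"]
rural_female <- deaths[, "Rural Female"]

# Pearson correlation by definition: covariance over the product of the sds
manual_r <- cov(rural_male, rural_female) / (sd(rural_male) * sd(rural_female))

# cor() uses method = "pearson" by default
builtin_r <- cor(rural_male, rural_female, method = "pearson")
print(c(manual = manual_r, builtin = builtin_r))

# Extreme cases on a made-up vector: perfect positive and perfect negative
x <- c(1, 2, 3, 4, 5)
print(cor(x, 2 * x + 1))
print(cor(x, -2 * x + 1))
```

A value of -1 therefore signals a strong relationship too, only in the opposite direction.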

In this data, you can see that all the attributes are highly positively correlated with each other.

## R for statistics – The Quantile distribution

Quantiles are also important in data analysis for understanding how the data is distributed. They split the sorted data at given percentages; the 50% quantile, for example, is the median.

Let’s see how we can find the quantiles of the data.

```#Importing data
df <- datasets::iris
#Prints the first 12 rows
head(df, 12)```
``````      Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
3            4.7         3.2          1.3         0.2     setosa
4            4.6         3.1          1.5         0.2     setosa
5            5.0         3.6          1.4         0.2     setosa
6            5.4         3.9          1.7         0.4     setosa
7            4.6         3.4          1.4         0.3     setosa
8            5.0         3.4          1.5         0.2     setosa
9            4.4         2.9          1.4         0.2     setosa
10           4.9         3.1          1.5         0.1     setosa
11           5.4         3.7          1.5         0.2     setosa
12           4.8         3.4          1.6         0.2     setosa``````
```#Returns the quantile distribution
quantile(iris$Sepal.Length)```
``````0%  25%  50%  75% 100%
4.3  5.1  5.8  6.4  7.9 ``````
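By default quantile() returns the five cut points shown above. As a short sketch, the probs argument selects other percentages, and the difference between the 75% and 25% quantiles gives the interquartile range. The variable names here are illustrative.

```r
sepal_length <- datasets::iris$Sepal.Length

# Default call: the 0%, 25%, 50%, 75% and 100% quantiles
quartiles <- quantile(sepal_length)

# probs selects other cut points, here the 10th and 90th percentiles
tails <- quantile(sepal_length, probs = c(0.1, 0.9))
print(tails)

# Interquartile range: distance between the 75% and 25% quantiles
iqr_manual <- unname(quartiles["75%"] - quartiles["25%"])
print(iqr_manual)
```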

## R for statistics – Skewness of the data

Skewness measures how far a distribution departs from symmetry, that is, how much it leans to one side compared with a symmetric distribution such as the Gaussian. In data preparation, it is important to know the skewness of the data. Let’s use the skewness() function for this purpose.

Let’s use the same iris dataset for this.

```#Importing data
df <- datasets::iris
#Prints the first 12 rows
head(df, 12)```
``````      Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
3            4.7         3.2          1.3         0.2     setosa
4            4.6         3.1          1.5         0.2     setosa
5            5.0         3.6          1.4         0.2     setosa
6            5.4         3.9          1.7         0.4     setosa
7            4.6         3.4          1.4         0.3     setosa
8            5.0         3.4          1.5         0.2     setosa
9            4.4         2.9          1.4         0.2     setosa
10           4.9         3.1          1.5         0.1     setosa
11           5.4         3.7          1.5         0.2     setosa
12           4.8         3.4          1.6         0.2     setosa``````

Now, we will compute the skewness of two columns of this dataset using the skewness() function.

For this, we have to load the e1071 library.

```#Loads the library
library(e1071)
#Returns the skewness of the data
skewness(iris$Sepal.Length)```
``0.3086407``
```#Loads the library
library(e1071)
#Returns the skewness of the data
skewness(iris$Petal.Length)```
``-0.2694109``

As you can see from the output, a positive skewness means the data is right-skewed and a negative skewness means it is left-skewed. A skewness value near zero means the data is close to symmetric.
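To show what is being computed, here is a base R sketch that divides the third central moment by the cubed sample standard deviation. This is assumed to correspond to the default `type = 3` definition in e1071, and the helper `manual_skewness` is a hypothetical name, not part of any package.

```r
# Hypothetical helper: third central moment over the cubed standard deviation
manual_skewness <- function(x) {
  x <- x[!is.na(x)]              # drop missing values
  m3 <- mean((x - mean(x))^3)    # third central moment
  m3 / sd(x)^3                   # sd() uses the n - 1 denominator
}

sepal_skew <- manual_skewness(datasets::iris$Sepal.Length)
petal_skew <- manual_skewness(datasets::iris$Petal.Length)
print(c(Sepal.Length = sepal_skew, Petal.Length = petal_skew))
```

The signs agree with the interpretation above: Sepal.Length leans to the right and Petal.Length to the left.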

## Wrapping Up

R is a strong choice for statistical computing and statistical inference. You can use the R language for many statistical purposes, as shown in this article.

R has plenty of built-in functions that give useful insights into the input data. For example, the summary() function returns up to seven statistical properties of the data.

By now, I hope you have a better understanding of R for statistics. That’s all for now. Happy R!

Further reading: R CRAN project for statistics
