# Exploratory data analysis

Download your Google Analytics data and save it in a data frame called `gadata`:

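```r
library(RGoogleAnalytics)

# Get the sessions by day in 2014
query.list <- Init(start.date = "2014-01-01",
                   end.date = "2014-12-31",
                   dimensions = "ga:date",
                   metrics = "ga:sessions",
                   table.id = "ga:00000000")
# Build the query and download the data
# (assumes `token` was created earlier with Auth())
ga.query <- QueryBuilder(query.list)
gadata <- GetReportData(ga.query, token)
```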

Let's do some basic operations on the data.

## Min

What is the minimum number of sessions in 2014?

```r
min(gadata$sessions)
```
```
[1] 0
```

### Number of days with 0 sessions recorded

A minimum of 0 sessions suggests an error in tracking: there is no data for some days. When did it happen? Display the days with 0 sessions.

```r
subset(gadata, gadata$sessions == 0)
```
```
        date sessions
7   20140107        0
8   20140108        0
129 20140509        0
130 20140510        0
131 20140511        0
132 20140512        0
133 20140513        0
134 20140514        0
135 20140515        0
```
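The subset lists every zero-session day, but the gaps are easier to read as date ranges. Below is a minimal sketch that groups consecutive zero days with base R's `rle()` (run-length encoding). It assumes the rows are sorted by date with exactly one row per day, and the small `gadata` below is hypothetical toy data in the same shape as the export. On the real data, the same code should report two gaps: 20140107 to 20140108 and 20140509 to 20140515.

```r
# Hypothetical toy data in the same shape as the GA export:
# a character `date` column (YYYYMMDD) and an integer `sessions` column.
# Assumes rows are sorted by date, one row per day.
gadata <- data.frame(
  date = c("20140105", "20140106", "20140107", "20140108",
           "20140109", "20140110", "20140111", "20140112"),
  sessions = c(25L, 31L, 0L, 0L, 18L, 0L, 22L, 40L),
  stringsAsFactors = FALSE
)

# Run-length encoding of the "no data" flag: each run is a block of
# consecutive days that are all zero (TRUE) or all non-zero (FALSE)
runs <- rle(gadata$sessions == 0)

# Row positions where each run ends and where it starts
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

# Keep only the runs of zero-session days
gaps <- data.frame(
  from = gadata$date[starts[runs$values]],
  to = gadata$date[ends[runs$values]],
  days = runs$lengths[runs$values],
  stringsAsFactors = FALSE
)
print(gaps)
```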

How many days were there with 0 sessions? Use the `nrow()` function to count the rows that meet this condition.

```r
nrow(subset(gadata, gadata$sessions == 0))
```
```
[1] 9
```
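A comparison such as `sessions == 0` already returns a logical vector, so you can also count the matching days without building a subset first. In R, `sum()` treats `TRUE` as 1 and `FALSE` as 0, and `mean()` of a logical vector gives the proportion of `TRUE` values. Here is a small sketch on a hypothetical toy vector that stands in for `gadata$sessions`:

```r
# Hypothetical toy vector standing in for gadata$sessions
sessions <- c(25L, 31L, 0L, 0L, 18L, 0L, 22L, 40L)

# sum() counts TRUE values: the number of zero-session days
zero_days <- sum(sessions == 0)
print(zero_days)  # 3

# mean() of a logical vector is the share of TRUE values
zero_share <- mean(sessions == 0)
print(zero_share)  # 0.375
```

On the real data, `sum(gadata$sessions == 0)` should give the same count as the `nrow(subset(...))` call.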

There were 9 days with 0 sessions.


## Max

When did your website get the most traffic? Use the `max()` function.

```r
max(gadata$sessions)
```
```
[1] 204
```

The highest traffic was 204 sessions in a single day. When was it?

```r
subset(gadata, gadata$sessions == 204)
```
```
       date sessions
59 20140228      204
```

You can get the same result in one step by replacing the hard-coded value with a call to `max()`. This is shorter, but harder to read:

```r
subset(gadata, gadata$sessions == max(gadata$sessions))
```
```
       date sessions
59 20140228      204
```
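Base R also has `which.max()`, which returns the position of the largest value. The difference matters when several days tie for the maximum: `which.max()` returns only the first one, while comparing against `max()` keeps every tied row. The sketch below uses hypothetical toy data with a deliberate tie to show this:

```r
# Hypothetical toy data with a tie for the highest value
gadata <- data.frame(
  date = c("20140226", "20140227", "20140228", "20140301"),
  sessions = c(40L, 90L, 90L, 35L),
  stringsAsFactors = FALSE
)

# which.max() returns the position of the FIRST maximum only
first_peak <- gadata[which.max(gadata$sessions), ]
print(first_peak)

# Comparing against max() keeps every row that ties for the maximum
all_peaks <- subset(gadata, gadata$sessions == max(gadata$sessions))
print(all_peaks)
```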

## Mean

What is the mean number of sessions per day? To calculate this, use the `mean()` function.

```r
mean(gadata$sessions)
```
```
[1] 27.6
```

The average number of sessions per day is 27.6.
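The 9 zero-session days are tracking gaps, not days without visitors, so they pull the mean down. Below is a minimal sketch that assumes every zero is a gap. It marks those days as `NA` and then excludes them with `na.rm = TRUE`. The vector is hypothetical toy data standing in for `gadata$sessions`.

```r
# Hypothetical toy vector standing in for gadata$sessions
sessions <- c(25L, 31L, 0L, 0L, 18L, 0L, 22L, 40L)

# Mean over all days, including the zero-session tracking gaps
mean_all <- mean(sessions)  # 136 / 8 = 17

# Assumption: every zero is a tracking gap, so mark it as missing
tracked <- replace(sessions, sessions == 0, NA)

# na.rm = TRUE drops the missing days before averaging
mean_tracked <- mean(tracked, na.rm = TRUE)  # 136 / 5 = 27.2
print(c(all_days = mean_all, tracked_days = mean_tracked))
```

The same `NA` vector can be passed to `sd()` and `median()` with `na.rm = TRUE`, so all statistics ignore the gaps consistently.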

## Standard deviation

To check how much the number of sessions varies from day to day, use the `sd()` function.

```r
sd(gadata$sessions)
```
```
[1] 22.12984
```

The average number of sessions is 27.6 ± 22.13 (mean ± one standard deviation). The standard deviation is almost as large as the mean, so the daily values are widely spread and the mean alone does not describe a typical day well.
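`sd()` returns the sample standard deviation, which divides the summed squared deviations by n - 1 rather than n. The sketch below checks this by hand on a hypothetical toy vector. It also computes the coefficient of variation (standard deviation divided by mean), which gives a unit-free measure of spread. For the data above, that is 22.12984 / 27.6, or roughly 0.8.

```r
# Hypothetical toy vector standing in for gadata$sessions
sessions <- c(25, 31, 0, 0, 18, 0, 22, 40)
n <- length(sessions)

# Sample standard deviation: square root of the summed squared deviations
# from the mean, divided by n - 1 (the formula sd() uses)
manual_sd <- sqrt(sum((sessions - mean(sessions))^2) / (n - 1))
print(c(manual = manual_sd, sd = sd(sessions)))

# Coefficient of variation: the standard deviation relative to the mean
cv <- sd(sessions) / mean(sessions)
print(cv)
```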

## Median

If the values are widely spread or contain outliers, it is better to also calculate the median: the middle value of the dataset once it is sorted. Unlike the mean, a few extreme days do not pull it up or down.

```r
median(gadata$sessions)
```
```
[1] 21
```

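The median is 21 sessions per day: on half of the days the website had 21 sessions or fewer, and on the other half 21 or more. The median is lower than the mean (27.6) because a few high-traffic days, such as the 204-session peak, pull the mean up.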
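To make the definition concrete, here is a hand-written median function. The helper `manual_median()` is hypothetical and exists only to illustrate the definition, which R's `median()` already implements. The sketch compares the mean and median of a toy vector with and without one outlier day.

```r
# Hypothetical toy vector: eight ordinary days plus one outlier day
sessions <- c(25, 31, 0, 0, 18, 0, 22, 40, 204)

# Median by hand: the middle value of the sorted data, or the mean of
# the two middle values when the length is even
manual_median <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n %% 2 == 1) x[(n + 1) / 2] else mean(x[n / 2 + 0:1])
}

# Compare both statistics with and without the outlier (the last value)
without_outlier <- sessions[-length(sessions)]
print(c(mean = mean(without_outlier), median = manual_median(without_outlier)))
print(c(mean = mean(sessions), median = manual_median(sessions)))
```

Adding the single 204-session day more than doubles the mean, from 17 to about 37.8. The median only moves from 20 to 22.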

## Summary

If you want, you can get most of these statistics at once with the `summary()` function. It does not report the standard deviation, but it adds the first and third quartiles:

```r
summary(gadata)
```
```
     date              sessions
Length:365         Min.   :  0.0
Class :character   1st Qu.: 12.0
Mode  :character   Median : 21.0
                   Mean   : 27.6
                   3rd Qu.: 40.0
                   Max.   :204.0
```

`summary()` reports basic statistics (minimum, quartiles, median, mean, maximum) for numeric columns. For character columns such as `date`, it only reports the length, class, and mode.
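The export stores `date` as a `YYYYMMDD` string, which is why `summary()` says little about it. The sketch below converts the column to R's `Date` class with `as.Date()`, so that `summary()` reports the date range. It then totals the sessions per month with `aggregate()`. The rows are hypothetical toy data, and the `month` column is an added helper that is not part of the export.

```r
# Hypothetical rows in the same format as the GA export
gadata <- data.frame(
  date = c("20140130", "20140131", "20140201", "20140202"),
  sessions = c(12L, 30L, 21L, 40L),
  stringsAsFactors = FALSE
)

# Parse YYYYMMDD strings into Date objects
gadata$date <- as.Date(gadata$date, format = "%Y%m%d")

# With a Date column, summary() shows the date range and quartiles
print(summary(gadata))

# Dates also make grouping easy, e.g. total sessions per month
gadata$month <- format(gadata$date, "%Y-%m")
monthly <- aggregate(sessions ~ month, data = gadata, FUN = sum)
print(monthly)
```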

## Source code

The complete source code of the examples shown above is in my GitHub repository: