# Exploratory data analysis

Download your data and save it in a data frame called `gadata`:

```
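# Get the sessions by day in 2014
library(RGoogleAnalytics)

query.list <- Init(start.date = "2014-01-01",
                   end.date = "2014-12-31",
                   dimensions = "ga:date",
                   metrics = "ga:sessions",
                   table.id = "ga:00000000")

# Build the query and download the data (`token` is assumed to come from an earlier Auth() call)
ga.query <- QueryBuilder(query.list)
gadata <- GetReportData(ga.query, token)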
```
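If you do not have a Google Analytics view at hand, a small stand-in data frame with the same two columns is enough to try the functions in this chapter. The sketch below is only an illustration: the name `demo.data` and all of its values are made up, so the results will differ from the outputs shown in the text.

```r
# Hypothetical stand-in for the downloaded data: made-up values,
# same column names and types as the real result (date stored as character).
demo.data <- data.frame(
  date = format(seq(as.Date("2014-01-01"), by = "day", length.out = 7), "%Y%m%d"),
  sessions = c(12, 0, 35, 21, 0, 204, 18),
  stringsAsFactors = FALSE
)

# Inspect the structure: 7 rows, a character column and a numeric column
str(demo.data)
```

Assigning it with `gadata <- demo.data` lets you run the commands below unchanged.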

Let's do some basic operations on the data.

## Min

What is the minimum number of sessions in 2014?

```
min(gadata$sessions)
```

```
[1] 0
```

### Number of days with 0 sessions recorded

It looks like there was a tracking error, and no data was recorded on some days. Which days were affected? Display the days with 0 sessions.

```
subset(gadata, gadata$sessions == 0)
```

```
        date sessions
7   20140107        0
8   20140108        0
129 20140509        0
130 20140510        0
131 20140511        0
132 20140512        0
133 20140513        0
134 20140514        0
135 20140515        0
```
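The dates are returned as character strings such as `20140107`. Converting them to the `Date` class makes them easier to read and lets you compute the gaps between them. A minimal sketch, using three of the dates from the output above (the variable names are only illustrative):

```r
# Three of the zero-session dates from the output, as character strings
zero.dates <- c("20140107", "20140108", "20140509")

# Parse the YYYYMMDD strings into Date objects
parsed <- as.Date(zero.dates, format = "%Y%m%d")
parsed

# Number of days between consecutive dates; a gap of 1 means adjacent days
gaps <- as.numeric(diff(parsed))
gaps
```

A gap of 1 means the two dates belong to the same outage, while a large gap marks the start of a separate one.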

How many days had 0 sessions? Use the `nrow()` function to count the rows that meet this condition.

```
nrow(subset(gadata, gadata$sessions == 0))
```

```
[1] 9
```

There were 9 days with 0 sessions.
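The same count can be obtained without building a subset, because R treats `TRUE` as 1 when a logical vector is summed. A minimal sketch on a made-up data frame (`toy` and its values are hypothetical):

```r
# Hypothetical data: four days, two of them with no sessions
toy <- data.frame(
  date = c("20140106", "20140107", "20140108", "20140109"),
  sessions = c(15, 0, 0, 8),
  stringsAsFactors = FALSE
)

# Count the TRUE values of the comparison directly
zero.days <- sum(toy$sessions == 0)
zero.days

# Same result with the subset-and-count approach
nrow(subset(toy, sessions == 0))
```

Both approaches return the same number; `sum()` is shorter when you only need the count and not the rows themselves.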

## Max

What was the highest daily traffic on your website? Use the `max()` function.

```
max(gadata$sessions)
```

```
[1] 204
```

The highest traffic is 204 sessions in 1 day. When was it?

```
subset(gadata, gadata$sessions == 204)
```

```
       date sessions
59 20140228      204
```

You can get the same result in one step by replacing the value with `max()`. This version is shorter but harder to read:

```
subset(gadata, gadata$sessions == max(gadata$sessions))
```

```
       date sessions
59 20140228      204
```
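Another option is `which.max()`, which returns the position of the largest value, so it can be used as a row index. Note that it returns only the first position when several days share the maximum, whereas `subset()` returns all of them. A minimal sketch on a made-up data frame (`toy` and the two smaller values are hypothetical):

```r
# Hypothetical data: three days, the last one with the peak
toy <- data.frame(
  date = c("20140226", "20140227", "20140228"),
  sessions = c(31, 47, 204),
  stringsAsFactors = FALSE
)

# which.max() gives the row number of the (first) maximum
peak <- toy[which.max(toy$sessions), ]
peak
```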

## Mean

What is the mean number of sessions per day? To calculate this, use the `mean()` function.

```
mean(gadata$sessions)
```

```
[1] 27.6
```

The average number of sessions per day is equal to 27.6.

## Standard deviation

You can measure how much the number of sessions varies from day to day. Use the `sd()` function.

```
sd(gadata$sessions)
```

```
[1] 22.12984
```

The mean number of sessions is 27.6 with a standard deviation of 22.13. The daily values vary widely, so it is better not to rely on the mean alone.
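To see what `sd()` computes, the sketch below reproduces it by hand on a made-up vector: the square root of the sum of squared deviations from the mean, divided by n - 1. It also computes the ratio of the standard deviation to the mean, which for the data above is roughly 22.13 / 27.6, or about 0.8. The vector `x` and the variable names are hypothetical.

```r
# Hypothetical daily session counts
x <- c(10, 20, 30, 40, 100)
n <- length(x)

# Sample standard deviation computed step by step (denominator n - 1)
manual.sd <- sqrt(sum((x - mean(x))^2) / (n - 1))

# Should agree with the built-in function
all.equal(manual.sd, sd(x))

# Standard deviation relative to the mean (coefficient of variation)
cv <- sd(x) / mean(x)
cv
```

A ratio close to 1 means the typical deviation is about as large as the mean itself, which is a sign that the mean is a poor summary on its own.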

## Median

If a dataset has a high standard deviation, it is better to also calculate the median (the middle value of the sorted dataset), which is less sensitive to extreme values than the mean.

```
median(gadata$sessions)
```

```
[1] 21
```

The median is 21 sessions per day: on half of the days there were 21 sessions or fewer.
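The following sketch illustrates why the median is the safer choice here. It uses a made-up vector and then adds a single extreme day: the mean jumps, while the median barely moves. All values and names are hypothetical.

```r
# Hypothetical daily session counts without any extreme day
x <- c(12, 18, 21, 25, 30)
mean(x)
median(x)

# The same data with one extreme day added
x.outlier <- c(x, 204)
mean(x.outlier)    # pulled strongly towards the extreme value
median(x.outlier)  # stays close to the typical day
```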

## Summary

You can get all of these statistics with a single function, `summary()`.

```
summary(gadata)
```

```
     date              sessions    
 Length:365         Min.   :  0.0  
 Class :character   1st Qu.: 12.0  
 Mode  :character   Median : 21.0  
                    Mean   : 27.6  
                    3rd Qu.: 40.0  
                    Max.   :204.0  
```

The result contains basic statistics for numeric variables and a short description (length, class, and mode) for character variables.
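The `1st Qu.` and `3rd Qu.` rows are the quartiles: the values below which 25% and 75% of the observations fall. If you need only those numbers, `quantile()` returns them directly. A minimal sketch on a made-up vector (`x` is hypothetical):

```r
# Hypothetical numeric vector
x <- c(2, 4, 6, 8, 10)

# First quartile, median, and third quartile
q <- quantile(x, probs = c(0.25, 0.5, 0.75))
q

# summary() reports the same quartiles alongside min, mean, and max
s <- summary(x)
s
```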

## Source code

The complete source code of the examples shown above is in my GitHub repository:

github.com/michalbrys/R-Google-Analytics/blob/master/2_eda.R