Monday, June 27, 2016

Visualizing in R through ggplot2


Visualizations help us interpret data quickly and make informed business decisions. For the past few days, I have been exploring the visualization features of R through the ggplot2 package, and I thought I would share some of the interesting things I learned.

Installation:

# Install and load the ggplot2 package in R
> install.packages("ggplot2")
> library(ggplot2)

Visualizing the Quantiles:

Let's plot the quantiles of city mileage for each class of car. NOTE: The mpg data set used here ships with the ggplot2 package.

> ggplot(data = mpg, mapping = aes(sample = cty, color = class)) +
    stat_qq(geom = 'point', distribution = stats::qunif) +
    labs(x = 'Quantiles', y = 'City Mileage', color = 'Class',
         title = 'Quantile Plot, City Mileage (Grouped by Class)')

Clearly, the 2seater and compact classes have the highest mpg of all the classes, with suv, minivan and pickup making up the bottom three. No surprises there, but it's interesting to see that the differences in mpg are small at the lower quantiles and grow toward the higher quantiles, reaching their maximum at the top. In other words, the maximum mpg is significantly higher for 2seater than for pickup, whereas the difference between their lowest mpg values is not that significant. This will be clearer from the line graph below, which shows the trend.
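
One way such a line graph could be drawn is to switch the geom in stat_qq from points to lines. This is a minimal sketch under that assumption, reusing the same mpg data and uniform quantiles as the point plot; the name p_line is only for illustration.

```r
library(ggplot2)

# Same quantile plot as before, but each class is drawn as a line.
# Assumption: only the geom changes; data and mappings stay the same.
p_line <- ggplot(data = mpg, mapping = aes(sample = cty, color = class)) +
  stat_qq(geom = "line", distribution = stats::qunif) +
  labs(x = "Quantiles", y = "City Mileage", color = "Class",
       title = "Quantile Plot, City Mileage (Grouped by Class)")

# Typing the object name at the top level draws the plot
p_line
```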

facet_wrap:

We used color on the plot above to differentiate the classes, but if you'd rather not have all the lines on the same chart and would prefer to see them on separate charts, the facet_wrap function can help. It creates one plot per class category.

> last_plot() + facet_wrap('class', nrow = 2)

facet_wrap + Combined average:

It's difficult to compare individual charts and gather insights, so let's compare each of them with the combined average to see where they stand.
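
One way to build this comparison is to compute the quantiles by hand and overlay the pooled curve on every panel. The sketch below is an assumption about how such a chart could be made, not the original code: it treats the "combined average" as the quantiles of all classes pooled together, and the names probs, by_class and overall are made up for illustration.

```r
library(ggplot2)

# Probabilities at which the empirical quantiles are evaluated
probs <- seq(0, 1, by = 0.05)

# Quantiles of city mileage within each class
classes <- sort(unique(mpg$class))
by_class <- do.call(rbind, lapply(classes, function(cl) {
  data.frame(class = cl,
             prob  = probs,
             cty   = as.numeric(quantile(mpg$cty[mpg$class == cl], probs)))
}))

# Quantiles of city mileage over all classes pooled together
overall <- data.frame(prob = probs,
                      cty  = as.numeric(quantile(mpg$cty, probs)))

p_facet <- ggplot(by_class, aes(x = prob, y = cty)) +
  geom_line(aes(color = class)) +
  # No class column in `overall`, so this layer appears in every panel
  geom_line(data = overall, linetype = "dashed") +
  facet_wrap(~ class, nrow = 2) +
  labs(x = "Quantiles", y = "City Mileage", color = "Class")

p_facet
```

Because the overall data frame has no class column, facet_wrap repeats the dashed reference line in every panel, which makes each class easy to read against the pooled distribution.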

Things to call out? Interestingly, minivans sit above the combined average at the lower quantiles, but that lead falls off as we move to the higher quantiles. The 2seater graph is also worth a look: it is higher than average at the lower quantiles, and the gap shrinks as we move to the higher quantiles.

We can also plot error bars based on 95% confidence intervals (CIs) using a normal approximation.
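
A hedged sketch of how such error bars might be produced with stat_summary and ggplot2's built-in mean_se helper, where mult = 1.96 turns the standard error into an approximate 95% interval. Grouping by model year and faceting by class is an assumption made here for illustration, and p_err is a made-up name.

```r
library(ggplot2)

# Mean city mileage per model year with approximate 95% CIs.
# mean_se() returns mean +/- mult * standard error; mult = 1.96
# corresponds to the normal approximation.
# Assumption: year on the x axis, one panel per class.
p_err <- ggplot(mpg, aes(x = factor(year), y = cty)) +
  stat_summary(fun.data = mean_se, fun.args = list(mult = 1.96),
               geom = "errorbar", width = 0.2) +
  stat_summary(fun.data = mean_se, geom = "point") +
  facet_wrap(~ class, nrow = 2) +
  labs(x = "Year", y = "City Mileage")

p_err
```

The mpg data only covers the model years 1999 and 2008, so each panel shows two means with their intervals.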

Companies have been able to increase their cars' fuel efficiency with the roll-out of more efficient engines every year. The exceptions are the midsize and subcompact cars, which showed a decreasing trend year over year. Interesting!

Engine size does have an impact on both city and highway mileage, as is evident from the plot below.
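
A plot along these lines could be sketched as a scatter of engine displacement against both mileage columns, with a straight-line fit for each. The linear fit and the name p_engine are assumptions made for illustration, not necessarily what the original chart used.

```r
library(ggplot2)

# Engine displacement (displ, in litres) against city and highway mileage.
# Assumption: a simple linear trend line is enough to show the direction.
p_engine <- ggplot(mpg, aes(x = displ)) +
  geom_point(aes(y = cty, color = "City"), alpha = 0.5) +
  geom_point(aes(y = hwy, color = "Highway"), alpha = 0.5) +
  geom_smooth(aes(y = cty, color = "City"),
              method = "lm", formula = y ~ x, se = FALSE) +
  geom_smooth(aes(y = hwy, color = "Highway"),
              method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Engine displacement (litres)", y = "Miles per gallon",
       color = "Mileage")

p_engine

# Correlations give a quick numeric check of the same relationship
cor(mpg$displ, mpg$cty)
cor(mpg$displ, mpg$hwy)
```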

I definitely recommend checking out this package, as it's a powerful tool for visualization, and I love the fact that it does both the analysis and the plotting at the same time. More to come on this topic.