4 Examples of analysing data in R
These are some example vignettes - worked examples - of how to run a set of analyses in R. You might find some of them useful. You can also explore the examples at https://quantifiedself.com/show-and-tell/ and in various other sources.
Remember, there is no expectation that you code in DSI 36100. There are many freely available tools (e.g. spreadsheet tools) that will help you run the kind of basic exploratory statistical analyses you need for AT2, and visualise the data.
There are also free tools that will analyse text and images (although possibly not in batch, unless you use an app to collect the images), again without coding. The key thing for this assignment is thinking creatively about how to analyse and make sense of the data and the justification for that, not demonstrating advanced coding skills.
Most of the examples below are from MDSI students (a couple from me, and a few externals).
4.1 Other useful tools and snippets of code
You might also find these few posts useful:
- An Introduction to Accessing RESTful APIs Using R, by Werner Schott, March 31, 2019: In the past, working in R has meant importing data into the application from numerous sources. I have found this very manual and not very reproducible in the development of applications. One of the weaknesses I have identified and would like to address is to understand how to access online data resources using the appropriate packages in R. The scope of this exercise is not to master the use of one package or several packages, but rather to understand how certain R packages fit into the overall process, and what that process is. http://rpubs.com/plantagenet/481658 (a bare-bones httr/jsonlite sketch of this pattern follows this list)
- Use R-Studio on any PC (with portable apps), Joshua McCarthy, 29 March 2019: Someone out there has kindly adapted R to the Portable Apps framework! This means we can use R and R-Studio anywhere we like without needing to install them locally, quite handy for work, University computers and VMs. We can also keep all our libraries and packages in one location, making them easier to manage when moving between workstations. http://rpubs.com/jsmccid/rportable
- https://github.com/thomasp85/gganimate - this is very cool; create animated gif visualisations
- http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html - nice examples of a bunch of ggplot charts
- Create sonified data (using audio to represent the shape of data, as we use visualisation) http://playitbyr.org/
- https://cran.r-project.org/web/packages/googleway/vignettes/googleway-vignette.html - connect to a variety of google map based services
- Want to visualise your web history? Here’s one way to do that https://www.webhistorian.org/education/
- Some time ago I saw a blog post on visualising your twitter history using streamgraphs in R. As my tweets moved from mostly psychology/philosophy tweeting around my teaching, through to my current mish-mash of learning analytics stuff, I thought it’d be interesting to play with this. So, here’s the code reproduced, plus my streamgraph. http://sjgknight.com/finding-knowledge/2016/01/visualising-twitter-history/
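If you want to try the API pattern from the first post above, here is a bare-bones, hedged sketch using the httr and jsonlite packages. The endpoint, query parameters and response structure are placeholders rather than any particular API, so swap in the service you actually want to query (and whatever authentication it requires).
# A minimal sketch of calling a RESTful API from R with httr and jsonlite.
# The endpoint below is a placeholder - substitute the API you actually want to use.
library(httr)
library(jsonlite)

endpoint <- "https://api.example.com/v1/records"   # hypothetical endpoint
resp <- GET(endpoint, query = list(limit = 10))    # add add_headers()/authenticate() as the API requires

if (status_code(resp) == 200) {
  parsed <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
  str(parsed)   # inspect the parsed structure before tidying it into a data frame
} else {
  warning("Request failed with status ", status_code(resp))
}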
Remember, not all of these approaches are good or sensible for every dataset. Consider how you use standard features like tables and reported numbers, and don't go overboard with complex visualisations.
4.2 gganimate
if(!file.exists("vignettes/gganimate.html")){
file.copy(RShowDoc("gganimate", package = "gganimate"), "vignettes/", overwrite = FALSE)
#file.copy(vignette("gganimate"), "vignettes/", overwrite = FALSE)
}
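If you want to experiment before reading the vignette included below, here is a minimal, hedged sketch of the gganimate API. It is my own toy example on the built-in mtcars data, not code from the vignette.
# A minimal gganimate sketch: a scatterplot that transitions between the levels of cyl.
library(ggplot2)
library(gganimate)

p <- ggplot(mtcars, aes(x = mpg, y = disp)) +
  geom_point() +
  transition_states(cyl, transition_length = 2, state_length = 1) +
  labs(title = "Cylinders: {closest_state}")

animate(p, nframes = 50)   # renders the animation (a gif by default if gifski is installed)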
knitr::include_url("vignettes/gganimate.html")4.3 Garmin
#uses the library trackeR. See vignette at https://cran.r-project.org/web/packages/trackeR/vignettes/TourDetrackeR.html - read for detail
#First we're going to load some dummy data.
#to do this with your own data, you'll need to use the function read_container
#TO DO THAT YOU SHOULD UNCOMMENT THE FOLLOWING TWO LINES
#BUT FOR THE PURPOSES OF THIS DEMO I'M JUST GOING TO IMPORT THE PACKAGE DATA
#filepath <- "private/your_Garmin_export.TCX.gz"
#runDF <- read_container(filepath, type = "tcx", timezone = "GMT") #check other options
##################################################
##################################################
#knitr::include_url("http://cran.r-project.org/web/packages/trackeR/vignettes/TourDetrackeR.html")
#Because I actually have this file locally as part of the package I'm going to save a copy into a new directory, and then display that version
if(!file.exists("vignettes/TourDetrackeR")){
file.copy(RShowDoc("TourDetrackeR", package = "trackeR"), "vignettes/", overwrite = FALSE)
}## [1] FALSE
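Before the full vignette, here is a hedged sketch of what you might do once read_container() has loaded your export. It uses the runs dataset bundled with trackeR instead of real Garmin data, and the exact plotting options are described in the vignette itself.
# A hedged trackeR sketch on the package's bundled example data.
library(trackeR)
data("runs", package = "trackeR")

runSummary <- summary(runs)   # per-session summaries (distance, duration, average speed, etc.)
plot(runSummary)              # overview of the summaries across sessions
plot(runs, session = 1)       # profile of a single session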
knitr::include_url("vignettes/TourDetrackeR.html")4.4 Walking data (and covid impacts)
knitr::include_url("https://methodmatters.github.io/impact-covid-19-pandemic-2020-steps/")4.5 Your location
#This one is from this markdown https://rpubs.com/Geoff_W/481405 by Geoff, an alumnus of the MDSI
if(!file.exists("vignettes/location.html")){
download.file("https://rstudio-pubs-static.s3.amazonaws.com/481405_ff86c5e272d84cc6a47c3d4b3c0fb0c3.html", "vignettes/location.html")
}
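Separately from Geoff's markdown included below, here is a hedged sketch of mapping location points with the leaflet package. The coordinates and labels are made up for illustration; you would use whatever latitude/longitude columns your own export contains.
# A minimal leaflet sketch with illustrative coordinates.
library(leaflet)

locations <- data.frame(
  lat   = c(-33.8832, -33.8688),
  lon   = c(151.2005, 151.2093),
  label = c("UTS", "Sydney CBD")
)

leaflet(locations) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lon, lat = ~lat, label = ~label)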
knitr::include_url("vignettes/location.html")4.6 Text mining demo
Some basic functions for working with text data.
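Before the included chapters, here is a minimal, hedged sketch of the tidytext workflow: tokenise, drop stop words, count. The text is a made-up placeholder; you would use whatever text you have collected.
# A minimal tidytext sketch on placeholder text.
library(dplyr)
library(tidytext)

docs <- tibble(
  doc  = 1:2,
  text = c("I tracked my coffee and my sleep this week",
           "Less coffee seemed to mean slightly better sleep")
)

docs %>%
  unnest_tokens(word, text) %>%            # one word per row
  anti_join(stop_words, by = "word") %>%   # drop common stop words
  count(word, sort = TRUE)                 # word frequencies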
knitr::include_url("https://rafalab.github.io/dsbook/text-mining.html")if(!file.exists("vignettes/tidytext")){
file.copy(RShowDoc("tidytext", package = "tidytext"), "vignettes/", overwrite = FALSE)
}
knitr::include_url("vignettes/tidytext.html")#knitr::include_url("https://cran.r-project.org/web/packages/tidytext/vignettes/tidytext.html")4.7 Sentiment on text
sentimentr is a nicer sentiment analysis package.
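A hedged sketch of sentimentr on a couple of made-up sentences, just to show the basic calls; see the README included below for the full range of options.
# Sentence-level sentiment with sentimentr.
library(sentimentr)

txt <- c("I really enjoyed today's walk.", "The traffic was awful and I was late.")

sentiment(get_sentences(txt))      # sentiment score per sentence
sentiment_by(get_sentences(txt))   # scores aggregated by input element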
#knitr::include_url("https://github.com/trinker/sentimentr/blob/master/README.md")
if(!file.exists("vignettes/sentimentr.html")){
download.file("https://github.com/trinker/sentimentr/blob/master/README.md", "vignettes/sentimentr.html")
}
knitr::include_url("vignettes/sentimentr.html")4.8 Named entity recognition
Can we identify things (people, places, etc.) in text?
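Here is a hedged sketch of entity extraction with cleanNLP. It assumes the spaCy backend is set up (cleanNLP needs a Python installation with spacy for the entity table; see the package documentation), and the sentence is a made-up placeholder; run str() on the result to check the output on your own machine.
# Entity extraction with cleanNLP, assuming the spaCy backend is configured.
library(cleanNLP)
cnlp_init_spacy()

txt <- "Sarah flew from Sydney to Auckland in March."
anno <- cnlp_annotate(txt)

anno$entity   # one row per detected entity, with entity and entity_type columns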
#I don't have strong views on whether this package is the best (the vignette could be expanded a bit...), but it is a tidy approach to NLP and has nice entity extraction (see entity_type and entity)
knitr::include_url("https://statsmaths.github.io/cleanNLP/state-of-union.html")#another simple option is entity
#knitr::include_url("https://github.com/trinker/entity/blob/master/README.md")
if(!file.exists("vignettes/NER.html")){
download.file("https://github.com/trinker/entity/blob/master/README.md", "vignettes/NER.html")
}
knitr::include_url("vignettes/NER.html")#https://towardsdatascience.com/quick-guide-to-entity-recognition-and-geocoding-with-r-c0a915932895
#Notice they also geocode their data using https://cran.r-project.org/web/packages/tidygeocoder/vignettes/tidygeocoder.html
4.9 Image processing
Here’s a silly example of some functions for dealing with image data, including image recognition.
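For plain image handling (rather than recognition), a hedged sketch using the magick package is below; this is a different, simpler tool than the clarifai demo included afterwards, and the file path is a placeholder for one of your own photos.
# Basic image handling with magick; the path is hypothetical.
library(magick)

img <- image_read("private/example_photo.jpg")   # hypothetical path to your photo
image_info(img)                                  # dimensions, format, filesize
small <- image_scale(img, "400")                 # resize to 400px wide
image_write(small, "private/example_photo_small.jpg")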
#knitr::include_url("https://rpubs.com/sjgknight/food")
knitr::include_url("vignettes/clarifai.html")4.10 EDA and Descriptive Statistics
Show some…
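In the meantime, here is a minimal, hedged sketch of descriptive statistics on a made-up daily-tracking data frame standing in for your own data.
# Basic descriptive statistics with base R and dplyr on simulated tracking data.
library(dplyr)

tracking <- tibble(
  day    = as.Date("2023-03-01") + 0:13,
  steps  = round(rnorm(14, mean = 8000, sd = 2000)),
  coffee = sample(0:4, 14, replace = TRUE)
)

summary(tracking)   # quick summaries of each column

tracking %>%
  summarise(
    mean_steps       = mean(steps),
    sd_steps         = sd(steps),
    cor_steps_coffee = cor(steps, coffee)
  )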
4.11 QS_Ledger (a set of Python modules)
The repo at https://github.com/markwk/qs_ledger is a set of Python scripts for getting and analysing data from the services below. I’ve included it as a submodule in git, in the vignettes directory. At the time of writing it covered the following (a short R sketch for reading one of these exports follows the list):
- Apple Health: fitness and health tracking, data analysis and dashboard from iPhone or Apple Watch (includes example of Elastic Search integration and Kibana Health Dashboard).
- AutoSleep: iOS sleep tracking data analysis of sleep per night and rolling averages.
- Fitbit: fitness and health tracking and analysis of Steps, Sleep, and Heart Rate from a Fitbit wearable.
- GoodReads: book reading tracking and data analysis for GoodReads.
- Google Calendar: past events, meetings and times for Google Calendar.
- Google Sheets: get data from any Google Sheet which can be useful for pulling data from IFTTT integrations that add data.
- Habitica: habit and task tracking with Habitica’s gamified approach to task management.
- Instapaper: articles read and highlighted passages from Instapaper.
- Kindle Highlights: Parser and Highlight Extract from Kindle clippings, along with a sample data analysis and tool to export highlights to separate markdown files.
- Last.fm: music tracking and analysis of music listening history from Last.fm.
- Oura: Oura ring activity, sleep and wellness data.
- RescueTime: track computer usage and analysis of computer activities and time with RescueTime.
- Pocket: articles read and read count from Pocket.
- Strava: activities downloader (runs, cycling, swimming, etc.) and analysis from Strava.
- Todoist: task tracking and analysis of todo’s and tasks completed history from Todoist app.
- Toggl: time tracking and analysis of manual timelog entries from Toggl.
- WordCounter: extract wordcounter app history and visualize recent periods of word counts.
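Although qs_ledger itself is Python, once you have an export as a CSV (many of these services will give you one directly) it is straightforward to read into R. This is a hedged sketch only: the file name and column names are hypothetical, so check what your own export actually contains.
# Reading and summarising a hypothetical CSV export.
library(readr)
library(dplyr)

step_data <- read_csv("private/fitbit_steps_export.csv")   # hypothetical file name

step_data %>%
  mutate(date = as.Date(date)) %>%                 # assumes 'date' and 'steps' columns
  group_by(week = format(date, "%Y-%U")) %>%       # label each row with its week of the year
  summarise(total_steps = sum(steps, na.rm = TRUE))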
4.12 Other fun examples
4.12.1 Search history
if(!file.exists("vignettes/search.html")){
download.file("https://rstudio-pubs-static.s3.amazonaws.com/359000_436aec1332894fe8b4651e1aab587d12.html", "vignettes/search.html")
}
knitr::include_url("vignettes/search.html")…your example here!
4.13 Social media analysis
There are lots of tools for analysing social media data, including ones that do not require any coding skill.
For Facebook data, two easy approaches are:
- extract ad data: https://github.com/RitwikGA/FacebookReportingTool
- analyse your ‘likes’ for personality: https://applymagicsauce.com/demo
This post describes working with your Facebook download data in R.
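If you want to poke at a Facebook download yourself, here is a hedged sketch of reading part of it with jsonlite. The download is a folder of JSON files; the path, file name and field names below are hypothetical, so inspect the structure of your own export before relying on them.
# Reading part of a Facebook data download with jsonlite; names are hypothetical.
library(jsonlite)

posts_raw <- fromJSON("private/facebook-export/posts/your_posts_1.json",
                      simplifyDataFrame = TRUE)
str(posts_raw, max.level = 2)   # inspect what the export actually contains

# Timestamps in these exports are typically Unix epoch seconds (verify for your file):
# as.POSIXct(posts_raw$timestamp, origin = "1970-01-01", tz = "UTC")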