r functions: pca

in programming on 10 January 2013

Summary

Started a GitHub repository to put online R functions I've created for some common types of analysis and plots. Aim to have a core set of functions to make figures look prettier, even on preliminary analysis. A couple of examples are included, focusing on the first function: principal component analysis.

A GitHub repository can be found at: R plotting functions.

For those who want to dive right in, the git repository (includes a readme):

R plotting functions

Update: The PCA script presented here could be greatly simplified using ggplot2, a very useful graphics package for R. I'll leave implementation for a future post.

Been writing a lot of R functions and trying to make them generalizable and accepting of multiple input types. Since I was helped by others who posted code and other useful information about R online, thought I should contribute back.

There are a multitude of packages to help create useful and pretty plots with R. But sometimes it is also helpful to have functions that combine features of these packages into one nice example. That is what I hope to achieve with this repository. Each type of plot or analysis will have its own functions, examples and plots. This will allow users to verify that the functions work. Further, I hope those who just want to see a particular R feature implemented within a wider, working function can benefit from this.

USA Crime PCA: high-population urban centers cluster in this high-dimensional analysis.

I have included two example images created with my first function, a script to do principal component analysis on arbitrary datasets. By getting the scores from the PCA object R creates, I can create a plot that is softer on the eyes than biplot or other standard functions. In addition, it allows me to input any arbitrary list and have the function highlight the subset of items in that list on the component graph. This allows easier visualization and understanding for human readers.
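A minimal sketch of that idea, assuming base R's `prcomp` and base graphics. The function name `plot_pca_highlight` and its arguments are hypothetical and are not taken from the repository, which may do this differently:

```r
# Hypothetical helper: run PCA, plot the first two component scores,
# and highlight the rows whose names appear in `highlight`.
plot_pca_highlight <- function(data, highlight = character(0)) {
  # Center and scale so variables on different scales contribute equally.
  pca <- prcomp(data, center = TRUE, scale. = TRUE)
  scores <- as.data.frame(pca$x[, 1:2])

  # Flag the rows that belong to the subset of interest.
  scores$highlighted <- rownames(scores) %in% highlight

  # Percent of variance explained by each component, for the axis labels.
  var_explained <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

  plot(scores$PC1, scores$PC2,
       col = ifelse(scores$highlighted, "firebrick", "grey70"),
       pch = 19,
       xlab = paste0("PC1 (", var_explained[1], "%)"),
       ylab = paste0("PC2 (", var_explained[2], "%)"))

  # Label only the highlighted points to keep the plot uncluttered.
  hl <- scores[scores$highlighted, ]
  if (nrow(hl) > 0) {
    text(hl$PC1, hl$PC2, labels = rownames(hl), pos = 3, cex = 0.7)
  }

  # Return the scores invisibly so they can be reused.
  invisible(scores)
}

# Usage with a built-in dataset; the highlighted cars are an arbitrary choice.
scores <- plot_pca_highlight(mtcars, highlight = c("Mazda RX4", "Fiat 128"))
```

Returning the scores lets the same data frame feed other plots or summaries without rerunning the PCA.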

The first example looks at multiple crime statistics across US states. Analyzing each statistic individually might not tell us much about how crime relates to each state, but PCA shows that these variables are related and that states with large urban populations group together, as seen in the clustering of the 70th percentile states.
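A rough sketch of how such a subset could be picked and checked numerically. Two assumptions here are mine and not from the post: R's built-in `USArrests` data stands in for the crime dataset, and "70th percentile" is read as the 70th percentile of urban population.

```r
# Assumption: R's built-in USArrests data stands in for the crime dataset.
crime <- USArrests

# Assumed reading: states at or above the 70th percentile of urban population.
cutoff <- quantile(crime$UrbanPop, probs = 0.70)
urban_states <- rownames(crime)[crime$UrbanPop >= cutoff]

# PCA on the centered, scaled variables; the scores are per-state coordinates.
pca <- prcomp(crime, center = TRUE, scale. = TRUE)
scores <- as.data.frame(pca$x[, 1:2])

# Mean component scores for the two groups: a quick check of separation.
is_urban <- rownames(scores) %in% urban_states
group_means <- aggregate(scores, by = list(urban = is_urban), FUN = mean)
print(group_means)

# Loadings show which variables drive each component.
print(round(pca$rotation[, 1:2], 2))
```

The `urban_states` vector is the kind of arbitrary list that could then be passed to a highlighting plot function.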

Next, I included some preliminary data, mostly uninformative to the uninitiated but visually nice, looking at biophysical protein properties across the entire yeast genome, and then highlighted the kinases to show that this analysis can properly group related protein subsets. Obviously this is a rough first analysis, but you get the picture.

S. cer protein properties: yeast kinases group together when analyzing several biophysical protein properties.

Alright, this was supposed to be short, so I'll end it here. In the future I'll include code and explain the thought process behind it.

-biafra

bahanonu [at] alum.mit.edu


Biafra Ahanonu, PhD
