r functions: pca

Summary

Started a GitHub repository to host R functions I've created for some common types of analysis and plots. The aim is to have a core set of functions that make figures look prettier, even for preliminary analysis. A couple of examples are included, focusing on the first function: principal component analysis.

A GitHub repository can be found at: R plotting functions.

For those who want to dive right in, the git repository (includes a readme):

R plotting functions

Update: The PCA script presented here could be greatly simplified using ggplot2, a very useful graphics package for R. I'll leave implementation for a future post.


Been writing a lot of R functions and trying to make them generalizable and accepting of multiple input types. Since I was helped by others who posted code and other useful information about R online, thought I should contribute back.

There are a multitude of packages to help create useful and pretty plots with R. But sometimes it is also helpful to have functions that combine features of these packages into one nice example. That is what I hope to achieve with this repository. Each type of plot or analysis will have its own functions, examples and plots. This will allow users to verify that the functions work. Further, I hope those who just want to see a particular R feature implemented within a wider, working function can benefit from this.

USA Crime PCA: states with high-population urban centers cluster in this high-dimensional analysis.

I have included two example images created with my first function, a script to do principal component analysis on arbitrary datasets. By getting the scores from the PCA object R creates, I can create a plot that is softer on the eyes than biplot or other standard functions. In addition, it allows me to input any arbitrary list and have the function highlight the subset of items in that list on the component graph. This allows easier visualization and understanding for human readers.
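To make the idea concrete, here is a minimal sketch of that approach in base R. It is not the code from the repository: the function name `plot_pca_highlight`, its arguments, and the colors are all hypothetical, and it assumes the rows to highlight are identified by row name. It pulls the scores out of the object returned by `prcomp` and plots the first two components, coloring and labeling only the requested subset.

```r
# Hypothetical helper (not the repository's function): run PCA and plot the
# first two component scores, highlighting a chosen subset of rows.
plot_pca_highlight <- function(data, highlight = character(0), main = "PCA") {
  # Keep numeric columns only; scale so no single variable dominates.
  numeric_data <- data[, sapply(data, is.numeric), drop = FALSE]
  pca <- prcomp(numeric_data, center = TRUE, scale. = TRUE)

  # pca$x holds the scores: one row per observation, one column per component.
  scores <- as.data.frame(pca$x[, 1:2])
  scores$highlighted <- rownames(scores) %in% highlight

  # Percent of variance explained by each component, used in the axis labels.
  var_explained <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)

  plot(scores$PC1, scores$PC2,
       pch = 19,
       col = ifelse(scores$highlighted, "firebrick", "grey70"),
       xlab = paste0("PC1 (", var_explained[1], "%)"),
       ylab = paste0("PC2 (", var_explained[2], "%)"),
       main = main)

  # Label only the highlighted points to keep the plot uncluttered.
  hl <- scores[scores$highlighted, ]
  if (nrow(hl) > 0) {
    text(hl$PC1, hl$PC2, labels = rownames(hl), pos = 3, cex = 0.7)
  }
  invisible(scores)
}

# Usage with a built-in dataset; the highlighted car names are arbitrary.
scores <- plot_pca_highlight(mtcars, highlight = c("Mazda RX4", "Fiat 128"))
```

Returning the scores invisibly keeps the call usable purely for its plot while still letting you inspect or re-plot the component values afterwards.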

The first example looks at multiple crime statistics in the USA across states. Analyzing each statistic individually might not tell us much about crime in the US as it relates to each state. By doing PCA, we see that there is some relation between these variables and that states with large urban populations group together, as seen in the clustering of the 70th percentile states.
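A rough way to reproduce this kind of plot is sketched below. It rests on two assumptions that the post does not confirm: that R's built-in `USArrests` dataset is a reasonable stand-in for the crime data, and that "70th percentile" refers to the urban population share (`UrbanPop`). Treat it as an illustration of the idea rather than the original analysis.

```r
# Assumption: the built-in USArrests data stands in for the post's crime data.
data(USArrests)

# Assumption: "70th percentile" is taken over the urban population share.
cutoff <- quantile(USArrests$UrbanPop, probs = 0.70)
urban_states <- rownames(USArrests)[USArrests$UrbanPop >= cutoff]

# PCA on standardized variables; the scores live in pca$x.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]
is_urban <- rownames(scores) %in% urban_states

# Plot PC1 against PC2, coloring and labeling the high-urban-population states.
plot(scores, pch = 19,
     col = ifelse(is_urban, "steelblue", "grey75"),
     main = "USArrests PCA (illustrative)")
text(scores[is_urban, , drop = FALSE],
     labels = rownames(scores)[is_urban], pos = 3, cex = 0.6)
legend("topright",
       legend = c("UrbanPop >= 70th percentile", "other states"),
       col = c("steelblue", "grey75"), pch = 19, bty = "n")
```

Note that `UrbanPop` is both a PCA input and the highlighting criterion here, so some of the grouping follows directly from that choice.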

Next, I included some preliminary data, mostly uninformative to the uninitiated but visually nice, looking at biophysical protein properties across the entire yeast genome and highlighting the kinases to show that this analysis can properly group related protein subsets. Obviously this is a rough first analysis, but you get the picture.

S. cer protein properties: yeast kinases group together when analyzing several biophysical protein properties.

Alright, this was supposed to be short, so I'll end it here. In the future I'll include code and explain the thought process behind it.

-biafra
bahanonu [at] alum.mit.edu

©2006-2025 | Site created & coded by Biafra Ahanonu | Updated 21 October 2024