My Experience with GSOC and R

It all began when I started searching for a Google Summer of Code project last year (November 2015). While searching the web, I came across this page, which suggested a project idea. I didn’t fully understand the problem at first, but I contacted the mentors and familiarized myself with R and the theoretical background of the package. In March, I submitted a project proposal and was finally selected for GSOC. (Of course, I had to submit a test to the mentors first.)

The actual challenge began only then. My mentor, Professor Gregory Nuel, sent me a paper he had presented, which included a detailed explanation of the change point model and some code examples. I must say that it took a while to understand the document completely, given the different notations it used. I kept bothering my mentors for further details whenever I had a question. Much of the complexity of the project came from the theoretical background it required.

I started with simple steps. The model-specific implementation required choosing a few models and finalizing the calculations for the log evidence, which I was able to do with the help of my mentors. The core implementation of the forward-backward algorithm was done in C++. I ran into several issues interfacing it with the R code through Rcpp, which I resolved by adding a useDynLib() directive to the NAMESPACE file:

useDynLib(postCP)

After the core forward-backward algorithms were interfaced, I could proceed with the core model part. Following the change point model, I wrote code to find the initial regression coefficients using standard R functions. The result was then used to calculate model-specific log evidences, which were passed to the forward-backward algorithm to compute posterior probability distributions using an HMM. Finally, the posterior change point probability distribution could be obtained.
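To make this step a little more concrete, here is a minimal C++ sketch of a forward-backward pass for a change point model. It is only an illustration, not the postCP source: the function name change_point_posterior, the log_ev[i][k] layout (log evidence of observation i under segment k), and the assumption of a uniform prior over segmentations are all my own choices for the example. The hidden state is the segment index, which can either stay the same or move to the next segment at each observation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

const double NEG_INF = -std::numeric_limits<double>::infinity();

// log(exp(a) + exp(b)), computed without overflow.
double log_sum_exp(double a, double b) {
    if (a == NEG_INF) return b;
    if (b == NEG_INF) return a;
    const double m = a > b ? a : b;
    return m + std::log(std::exp(a - m) + std::exp(b - m));
}

// Hypothetical helper (not part of postCP).
// Input:  log_ev[i][k], log evidence of observation i under segment k.
// Output: cp[i][k], posterior probability that segment k ends at
//         observation i, for k = 0..K-2 (assumes a uniform prior over
//         segmentations).
Matrix change_point_posterior(const Matrix& log_ev) {
    const std::size_t n = log_ev.size();
    assert(n > 0);
    const std::size_t K = log_ev[0].size();
    assert(K >= 1 && n >= K);

    Matrix fwd(n, std::vector<double>(K, NEG_INF));
    Matrix bwd(n, std::vector<double>(K, NEG_INF));

    // Forward pass: the path must start in segment 0.
    fwd[0][0] = log_ev[0][0];
    for (std::size_t i = 1; i < n; ++i) {
        for (std::size_t k = 0; k < K; ++k) {
            const double stay = fwd[i - 1][k];
            const double move = k > 0 ? fwd[i - 1][k - 1] : NEG_INF;
            fwd[i][k] = log_ev[i][k] + log_sum_exp(stay, move);
        }
    }

    // Backward pass: the path must end in segment K-1.
    bwd[n - 1][K - 1] = 0.0;
    for (std::size_t i = n - 1; i-- > 0;) {
        for (std::size_t k = 0; k < K; ++k) {
            const double stay = log_ev[i + 1][k] + bwd[i + 1][k];
            const double move =
                k + 1 < K ? log_ev[i + 1][k + 1] + bwd[i + 1][k + 1] : NEG_INF;
            bwd[i][k] = log_sum_exp(stay, move);
        }
    }

    // Log of the total evidence summed over all valid segmentations.
    const double log_z = fwd[n - 1][K - 1];

    // cp[i][k] = P(state k at i, state k+1 at i+1 | data).
    Matrix cp(n, std::vector<double>(K > 1 ? K - 1 : 0, 0.0));
    for (std::size_t i = 0; i + 1 < n; ++i) {
        for (std::size_t k = 0; k + 1 < K; ++k) {
            cp[i][k] = std::exp(fwd[i][k] + log_ev[i + 1][k + 1] +
                                bwd[i + 1][k + 1] - log_z);
        }
    }
    return cp;
}
```

Working in log space keeps the products of many small evidences from underflowing. For a given k, the entries cp[i][k] sum to one over i, so each column can be read as the posterior distribution of the location of the k-th change point.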

Using the change point probability distribution, the parameters and the log evidence were then updated, which completed the core implementation of the package.

At that point, however, I realized that I hadn’t followed a proper coding style guideline. I therefore referred to Hadley Wickham’s R style guide and “R Style. An Rchaeological Commentary”. Still, all of this would have been an utter waste if users had no way to learn how to use the functions, so I also spent time improving the function documentation.

To give some insight into the theoretical background behind the package, I added vignettes, which are the long-form documentation of R packages.

Throughout the project I learned new things, and I always tried to develop the postCP package in the most R-compliant way. All the requirements have been met and the project has been completed successfully.

Here is the source folder which contains the improvements I made to the package.

https://drive.google.com/drive/folders/0B8xO3Cc0h6rIbXNDTUM1TDlMbDQ?usp=sharing

Here are the commits I made throughout the project.

https://github.com/malithj/postCP_Improvement/commits/feature-glmsyntax?author=malithj
https://github.com/malithj/postCP_Improvement/commits/feature-model?author=malithj

The final package can be found at the link below.

https://github.com/malithj/postCP_Improvement

 
