Posts

Final Project Package

OncoMarker: Targeted Genomic Biomarker Discovery in Breast Cancer Introduction High-throughput sequencing has transformed oncology, producing massive datasets that describe gene expression, mutations, and epigenetic alterations. While this wealth of information has propelled research, whole-genome analyses often overwhelm computational pipelines and slow clinical translation: analyzing all ~20,000 genes is computationally heavy and hard to interpret. OncoMarker addresses this challenge by providing a streamlined R framework for analyzing targeted gene panels, enabling researchers and clinicians to quickly identify differential expression patterns, visualize results, and stratify patients based on biomarker risk. The package is designed to be accessible, operating on pre-processed expression matrices rather than raw sequencing data. Ideology The philosophy of OncoMarker is simple: "The Simple Twist". Instead of starting w...

Assignment 12

Assignment 12: My R Markdown Primer: Bioinformatics Workflow Objectives I explored R Markdown and its capabilities for creating reproducible reports, particularly in bioinformatics. R Markdown allows you to combine narrative text, code, and output in a single document, which is especially useful for RNA-Seq analyses where workflows can be complex and data-intensive. I practiced: Writing narrative text with headings, bullet points, and emphasis. Using inline and displayed LaTeX equations, for example, the Hardy-Weinberg equilibrium: Inline: $p^2 + 2pq + q^2 = 1$ Displayed: $$p^2 + 2pq + q^2 = 1$$ This helped me document mathematical models used in bioinformatics clearly. Integration of Code and Narrative I loaded the airway RNA-Seq dataset from Bioconductor, converted it into a DESeq2 object, and performed a variance stabilizing transformation (VST). I then ran PCA to visualize sample clustering. All of this was embedd...
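As a small numeric check of the Hardy-Weinberg identity quoted in the excerpt above, the sketch below computes the three genotype frequencies for one allele frequency. The function name `hw_genotype_freqs` and the value `p = 0.7` are illustrative assumptions, not taken from the post.

```r
# Hypothetical helper (not from the post): genotype frequencies under
# Hardy-Weinberg equilibrium for a biallelic locus with allele frequency p.
hw_genotype_freqs <- function(p) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 0, p <= 1)
  q <- 1 - p
  c(AA = p^2, Aa = 2 * p * q, aa = q^2)
}

# Illustrative allele frequency; any value in [0, 1] works.
freqs <- hw_genotype_freqs(0.7)
print(freqs)

# The three terms p^2, 2pq and q^2 should add up to 1.
print(sum(freqs))
```

In an R Markdown document, a chunk like this could sit directly under the displayed equation so the formula and its numeric check are rendered together.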

Assignment 11

Assignment 11: Debugging Tukey Outlier Function in R Objectives For this assignment, I debugged an R function called tukey_multiple() that was supposed to identify rows in a numeric matrix whose values are outliers in every column according to the Tukey rule (1.5 × IQR). The function initially contained a deliberate bug involving a logical operator that prevented it from running correctly. The steps included: Reproducing the error. Diagnosing the bug. Fixing the code. Validating the fix. Adding defensive programming checks. Documenting the debugging workflow. R Code Reproduce the Error Error Message Diagnosing the Bug && is a scalar (short-circuit) operator In R, && works on single logical values: older versions silently used only the first element of each vector, and since R 4.3.0 an operand longer than one is an error. Its output is a single TRUE or FALSE, not a vector. outliers[, j] and tukey.outlier(x[, j]) are vectors In the test case, each column has length 10. Using && tries to combine two vectors into a s...
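To make the `&&` versus `&` distinction from the excerpt above concrete, here is a minimal sketch of an "outlier in every column" check built on the element-wise operator. It is not the post's actual `tukey_multiple()` code: the names `tukey_outlier` and `tukey_all_columns` and the test matrix are hypothetical stand-ins.

```r
# Hypothetical helper: TRUE where a value lies outside the Tukey fences
# (more than 1.5 x IQR below Q1 or above Q3).
tukey_outlier <- function(v) {
  q <- unname(quantile(v, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  v < q[1] - fence | v > q[2] + fence
}

# Hypothetical stand-in for the function under repair: flags rows whose
# values are outliers in every column of a numeric matrix.
tukey_all_columns <- function(x) {
  # Defensive check: fail early on anything that is not a numeric matrix.
  stopifnot(is.matrix(x), is.numeric(x))
  flagged <- rep(TRUE, nrow(x))
  for (j in seq_len(ncol(x))) {
    # `&` combines the two logical vectors element by element;
    # `&&` is meant for single TRUE/FALSE values only.
    flagged <- flagged & tukey_outlier(x[, j])
  }
  flagged
}

# Illustrative data: only the last row is extreme in both columns.
m <- cbind(c(1:9, 100), c(11:19, 500))
print(which(tukey_all_columns(m)))
```

The `stopifnot()` call at the top of `tukey_all_columns()` is one simple form of the defensive checks the excerpt mentions.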

Assignment 10

  Assignment 10: Building Your Own R Package Objectives The Premitha package is an R package designed to simplify common data analysis and visualization tasks. It is intended for students, researchers, and analysts who want to quickly explore datasets, summarize numeric variables, and create basic plots without writing repetitive code. R Code # Step 0: Install tools install.packages("devtools") install.packages("roxygen2") library(devtools) library(roxygen2) # Step 1: Create your package setwd("/Users/premithapagadala/Documents/R_Programming_Fall2025_Pagadala_Premitha/Assignments/Assignment_10_Create_Package") create("Premitha") setwd("Premitha") #working directory inside your package # Step 2: Write DESCRIPTION file desc_lines <- ' Package: Premitha Title: Simple Tools for Streamlined Data Analysis Version: 0.0.0.9000 Authors@R:      person("Premitha", "Pagadala",             email = "premithapagadala@g...

Assignment 9

Assignment 9: Visualization in R – Base Graphics, Lattice, and ggplot2 Objectives Compare three major visualization systems in R: Base graphics, Lattice, and ggplot2. Apply all three to the same dataset to observe syntactic, conceptual, and visual differences. Produce clear, reproducible code and interpret key findings. Dataset Overview For this assignment, I used the Breast Cancer Wisconsin Diagnostic dataset (brca) from the dslabs package, a common bioinformatics dataset used to classify tumor samples as Benign (B) or Malignant (M) based on cell nucleus measurements. This dataset contains features computed from breast mass cell nuclei. brca$y = Diagnosis (Benign or Malignant) brca$x = 30 numeric features describing cell characteristics Load and explore the data # Load libraries and dataset install.packages("dslabs") library(dslabs) data("brca") # 'brca$y' contains the diagnosis (Benign/Malignant) # 'brca$x' is a matrix of 30...

Assignment 8

Assignment 8: Input/Output, String Manipulation, and the plyr Package Objectives This assignment focused on using R to handle input/output operations, perform data summarization using the plyr package, and apply string manipulation techniques. By completing this exercise, I learned how to import data files, compute grouped statistics, filter data based on character patterns, and export processed results in multiple formats. R Code # Step 1: Import dataset into R # Choose file interactively or specify directly student6 <- read.table( file.choose(), header = TRUE, sep = ",", stringsAsFactors = FALSE ) # Step 2: Install and load plyr package install.packages("plyr") library(plyr) # Compute mean Grade by Sex gender_mean <- ddply( student6, "Sex", summarise, GradeAverage = mean(Grade, na.rm = TRUE) ) # Step 3: Write the grouped means to a text file write.table( gender_mean, file = "gend...

Assignment 7

Assignment 7: Exploring R’s Object-Oriented Systems (S3 & S4) In this assignment, I explored R's object-oriented systems, focusing on S3 and S4 classes. The objective was to understand how R dispatches generic functions based on object class and to practice creating and manipulating S3 and S4 objects. Objectives The goal of this assignment was to explore R’s two primary object-oriented systems, S3 and S4, and understand how generic functions work with different object types. Specifically, we: Selected and inspected a dataset. Tested base R generic functions. Created custom S3 and S4 objects. Compared the behavior and structure of both systems. R Code Step 1: Loading the Dataset We begin by using one of R’s built-in datasets, PlantGrowth, which measures the effect of different treatments on plant growth. # Load dataset data("PlantGrowth") head(PlantGrowth) # Describe structure str(PlantGrowth) Output: Step 2: Test Generic Functions # Summarize datas...
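The contrast between S3 and S4 dispatch described in the excerpt above can be sketched with one tiny object in each system. The class names `plant_s3` and `PlantS4`, the generic `describe`, and the field values are illustrative assumptions and not the objects created in the post.

```r
library(methods)  # S4 machinery; attached by default in interactive sessions

# S3: the class is just an attribute, and print() dispatches to the
# function named print.<class>.
plant <- structure(list(group = "ctrl", weight = 4.17), class = "plant_s3")
print.plant_s3 <- function(x, ...) {
  cat("S3 plant:", x$group, "weight", x$weight, "\n")
  invisible(x)
}
print(plant)

# S4: the class is declared formally with typed slots, and methods are
# registered on a generic with setMethod().
setClass("PlantS4", representation(group = "character", weight = "numeric"))
setGeneric("describe", function(object) standardGeneric("describe"))
setMethod("describe", "PlantS4", function(object) {
  paste("S4 plant:", object@group, "weight", object@weight)
})

p4 <- new("PlantS4", group = "ctrl", weight = 4.17)
print(describe(p4))

# isS4() distinguishes the two kinds of object.
print(c(s3 = isS4(plant), s4 = isS4(p4)))
```

One practical difference this shows: S3 fields are reached with `$` and are never validated, while S4 slots are reached with `@` and `new()` rejects values of the wrong type.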