Assignment 4

 

 Assignment 4: Visualizing and Interpreting Hospital Patient Data




In this assignment, I explored a small hospital dataset to visualize and interpret patient vitals, particularly blood pressure, alongside physician assessments. The goals were to practice data cleaning, handle missing values, create boxplots and histograms, and interpret trends in patient data. 

R Code

1. Data Preparation and Cleaning

# Define vectors

Frequency     <- c(0.6, 0.3, 0.4, 0.4, 0.2, 0.6, 0.3, 0.4, 0.9, 0.2)

BloodPressure <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)

FirstAssess   <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)    # bad=1, good=0

SecondAssess  <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1)    # low=0, high=1

FinalDecision <- c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)    # low=0, high=1

# Create dataframe

df_hosp <- data.frame(

  Frequency, BloodPressure, FirstAssess,

  SecondAssess, FinalDecision, stringsAsFactors = FALSE

)

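# Inspect the raw data, then drop rows containing NA

summary(df_hosp)              # before cleaning: FirstAssess shows 1 NA

df_hosp <- na.omit(df_hosp)   # listwise deletion: removes patient 9

summary(df_hosp)              # after cleaning: 9 patients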
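
Before dropping rows, it can help to see exactly where the missing values are. The following is a minimal sketch that rebuilds the data frame from the vectors above under the illustrative name df_raw (a name not used in the original code) and uses colSums(is.na()) and complete.cases() to show what na.omit() will remove.

```r
# Rebuild the raw data frame from the vectors defined in the post
Frequency     <- c(0.6, 0.3, 0.4, 0.4, 0.2, 0.6, 0.3, 0.4, 0.9, 0.2)
BloodPressure <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)
FirstAssess   <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)    # bad=1, good=0
SecondAssess  <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1)     # low=0, high=1
FinalDecision <- c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)     # low=0, high=1

# df_raw is an illustrative name for the data before cleaning
df_raw <- data.frame(Frequency, BloodPressure, FirstAssess,
                     SecondAssess, FinalDecision)

# Count missing values in each column
na_counts <- colSums(is.na(df_raw))
print(na_counts)

# Show the rows that na.omit() would drop (any NA in any column)
dropped <- df_raw[!complete.cases(df_raw), ]
print(dropped)

# Number of rows before and after cleaning
n_before <- nrow(df_raw)
n_after  <- nrow(na.omit(df_raw))
cat("Rows before:", n_before, " after:", n_after, "\n")
```

Note that na.omit() removes the whole row, so the dropped patient's Frequency and BloodPressure values are lost as well, even though only FirstAssess was missing.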

2. Generate Basic Visualizations

    A. Side-by-Side Boxplots

    # Boxplot: Blood Pressure by First MD Assessment

    boxplot(

      BloodPressure ~ FirstAssess,

      data = df_hosp,

      names = c("Good","Bad"),

      ylab = "Blood Pressure",

      main = "BP by First MD Assessment"

    )

    # Boxplot: Blood Pressure by Second MD Assessment

    boxplot(

      BloodPressure ~ SecondAssess,

      data = df_hosp,

      names = c("Low","High"),

      ylab = "Blood Pressure",

      main = "BP by Second MD Assessment"

    )

    # Boxplot: Blood Pressure by Final Decision

    boxplot(

      BloodPressure ~ FinalDecision,

      data = df_hosp,

      names = c("Low","High"),

      ylab = "Blood Pressure",

      main = "BP by Final Decision"

    )

    B. Histograms of Frequency and Blood Pressure

    # Histogram of Visit Frequency

    hist(

      df_hosp$Frequency,

      breaks = seq(0, 1, by = 0.1),

      xlab = "Visit Frequency",

      main = "Histogram of Visit Frequency"

    )

    # Histogram of Blood Pressure

    hist(

      df_hosp$BloodPressure,

      breaks = 8,

      xlab = "Blood Pressure",

      main = "Histogram of Blood Pressure"

    )

    Output

    Summary

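    Before cleaning: 10 patients, 1 NA in FirstAssess; BP median 95, mean 102.6; maximum Frequency 0.9.
    After cleaning: 9 patients, no NA; BP median 87, mean 99; maximum Frequency 0.6.
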
    Plots

    Figure 1: Blood pressure grouped by first physician assessment.

    Figure 2: Blood pressure grouped by second physician assessment.

    Figure 3: Blood pressure grouped by final decision.

    Observations:

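    • The first and second assessments do not separate patients cleanly by BP: the “Bad” median (87) is slightly below “Good” (93.5), and the second-assessment “High” median (78) is below “Low” (95).

    • The final decision “High” corresponds to a higher median BP (109, versus 68.5 for “Low”).

    • Extreme values (e.g., BP = 205) stretch the boxplot whiskers.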

    Figure 4: Distribution of patient visit frequencies.

    Figure 5: Distribution of blood pressure values.

    Observations:

    • In the cleaned data, visit frequencies fall between 0.2 and 0.6; the one extreme value (0.9) belonged to the row removed by na.omit().

    • Blood pressure distribution is skewed with extremes (32, 42, 176, 205).

    • Outliers impact both mean and variability. 


    How BP relates to each assessment and final decision

    In the cleaned data, the final decision is the only grouping that clearly tracks blood pressure: the “high” group has a median BP of 109, compared with 68.5 for “low.” The two individual assessments do not show the same pattern. Patients labeled “bad” by the first doctor (1) have a slightly lower median BP (87) than those labeled “good” (93.5). Patients labeled “high” by the second doctor (1) have a lower median (78) than those labeled “low” (95), but a much wider range (32 to 205 versus 59 to 109), because that group contains both the lowest and the highest readings. Overall, the final decisions match the measured blood pressure in this dataset better than either doctor’s individual assessment.
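
Reading medians off a boxplot by eye is error-prone with groups this small, so a numeric check is useful. The sketch below computes the median BP within each group using tapply() on the cleaned data; the names med_first, med_second, and med_final are illustrative and not part of the original code.

```r
# Cleaned data as in the post (rows with any NA removed)
df_hosp <- na.omit(data.frame(
  BloodPressure = c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176),
  FirstAssess   = c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1),   # bad=1, good=0
  SecondAssess  = c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1),    # low=0, high=1
  FinalDecision = c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)     # low=0, high=1
))

# Median BP within each level (0 or 1) of each grouping variable
med_first  <- tapply(df_hosp$BloodPressure, df_hosp$FirstAssess,   median)
med_second <- tapply(df_hosp$BloodPressure, df_hosp$SecondAssess,  median)
med_final  <- tapply(df_hosp$BloodPressure, df_hosp$FinalDecision, median)

print(med_first)
print(med_second)
print(med_final)
```

Each result is a named vector with one entry per group, which can be compared directly with the center lines of the corresponding boxplot.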

    Notable patterns, outliers, and clinical implications

    In the raw data, most visit frequencies are between 0.2 and 0.6, with one very high value at 0.9. That value belongs to the patient removed by na.omit(), so the histogram of the cleaned data ends at 0.6. Blood pressure values vary widely: some very high (176, 205) and some very low (32, 42). These extreme values stretch the boxplot whiskers and make the data look more spread out. Clinically, it makes sense that higher blood pressure matches the “high” final decision, but the extremes could be errors, unusual cases, or real health emergencies. Since the dataset is small and made-up, we should not overgeneralize.
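
Whether a value counts as an outlier depends on the rule used. As a sketch, boxplot.stats() applies the same 1.5 × IQR rule that boxplot() uses by default, so it can show which BP values would be drawn as separate points. The vectors bp_clean and final_clean below are illustrative copies of the cleaned BloodPressure and FinalDecision columns.

```r
# Blood pressure and final decision after na.omit() (patient 9 removed)
bp_clean    <- c(103, 87, 32, 42, 59, 109, 78, 205, 176)
final_clean <- c(0, 1, 0, 1, 0, 1, 0, 1, 1)

# Values more than 1.5 * IQR beyond the hinges are returned in $out
bp_stats <- boxplot.stats(bp_clean, coef = 1.5)
print(bp_stats$stats)  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(bp_stats$out)    # values flagged as outliers across all patients

# Number of flagged values within each final-decision group
n_out_by_group <- sapply(
  split(bp_clean, final_clean),
  function(x) length(boxplot.stats(x)$out)
)
print(n_out_by_group)
```

A value can be flagged in the pooled data yet fall inside the whiskers of its own group, because each group's hinges are computed separately.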

    Limitations and NA handling

    The raw dataset contained 10 patients, but one patient (Frequency = 0.9, BP = 135) had a missing value in FirstAssess, which the summary statistics before cleaning report as an NA. After applying na.omit(), the dataset was reduced to 9 patients. This removal had noticeable effects on the distributions: the maximum visit frequency dropped from 0.9 to 0.6, the median blood pressure decreased from 95 to 87, and the mean BP declined from 102.6 to 99. The BP quartiles also shifted downward (the third quartile fell from 128.5 to 109) once the moderately high BP case of 135 was gone, although the overall BP range of 32 to 205 did not change.

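    As a result, the cleaned dataset shows slightly lower mean and median blood pressure, a narrower frequency range, and a narrower interquartile range for BP. The removal also affects the group comparisons, because na.omit() drops the whole row even for plots that do not use FirstAssess. The final decision “high” corresponds to higher blood pressure with or without the dropped patient, but the second assessment does not: with that patient (BP = 135, SecondAssess = 1) included, the “high” group’s median BP is 106.5, above the “low” group’s 95, and without that patient it falls to 78. This highlights how even a single missing value can shift distributions and visuals, underscoring the importance of carefully considering how to handle missing data.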
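
One way to examine the effect of the NA handling is to compare listwise deletion, which is what na.omit() does, with a per-analysis approach that drops a row only when a variable used in that analysis is missing. The sketch below is an illustration of that comparison rather than the method used in the assignment; df_raw, df_clean, bp_summary, and keep_second are hypothetical names.

```r
# Raw data as defined in the post
df_raw <- data.frame(
  Frequency     = c(0.6, 0.3, 0.4, 0.4, 0.2, 0.6, 0.3, 0.4, 0.9, 0.2),
  BloodPressure = c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176),
  FirstAssess   = c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1),
  SecondAssess  = c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1),
  FinalDecision = c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)
)

# Listwise deletion: drop every row that has an NA in any column
df_clean <- na.omit(df_raw)

# Overall BP mean and median before and after listwise deletion
bp_summary <- rbind(
  raw   = c(mean = mean(df_raw$BloodPressure),
            median = median(df_raw$BloodPressure)),
  clean = c(mean = mean(df_clean$BloodPressure),
            median = median(df_clean$BloodPressure))
)
print(bp_summary)

# Per-analysis alternative: keep rows complete for the two columns in use
keep_second <- complete.cases(df_raw[, c("BloodPressure", "SecondAssess")])
med_second_pairwise <- tapply(df_raw$BloodPressure[keep_second],
                              df_raw$SecondAssess[keep_second], median)
med_second_listwise <- tapply(df_clean$BloodPressure,
                              df_clean$SecondAssess, median)
print(med_second_pairwise)
print(med_second_listwise)
```

The per-analysis approach keeps more data for each plot, at the cost of different plots being based on slightly different sets of patients.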
