Assignment 4: Visualizing and Interpreting Hospital Patient Data
In this assignment, I explored a small hospital dataset to visualize and interpret patient vitals, particularly blood pressure, alongside physician assessments. The goals were to practice data cleaning, handle missing values, create boxplots and histograms, and interpret trends in patient data.
R Code
1. Data Preparation and Cleaning
# Define vectors
Frequency <- c(0.6, 0.3, 0.4, 0.4, 0.2, 0.6, 0.3, 0.4, 0.9, 0.2)
BloodPressure <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)
FirstAssess <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1) # bad=1, good=0
SecondAssess <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1) # low=0, high=1
FinalDecision <- c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1) # low=0, high=1
# Create dataframe
df_hosp <- data.frame(
  Frequency, BloodPressure, FirstAssess,
  SecondAssess, FinalDecision, stringsAsFactors = FALSE
)
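# Inspect the raw data, then handle NA
summary(df_hosp)             # FirstAssess reports one NA
df_hosp <- na.omit(df_hosp)  # drop the row with the missing assessment
summary(df_hosp)             # 9 complete rows remain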
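Before dropping rows, it can help to see exactly where the missing values sit. The sketch below rebuilds the raw data under the illustrative name df_raw (not used in the original script) and relies only on base R's is.na() and complete.cases().

```r
# Rebuild the raw data (df_raw is an illustrative name, not from the original script)
df_raw <- data.frame(
  Frequency     = c(0.6, 0.3, 0.4, 0.4, 0.2, 0.6, 0.3, 0.4, 0.9, 0.2),
  BloodPressure = c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176),
  FirstAssess   = c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1),
  SecondAssess  = c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1),
  FinalDecision = c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)
)

# Count missing values per column
na_counts <- colSums(is.na(df_raw))
print(na_counts)

# Show the incomplete rows that na.omit() would drop
incomplete_rows <- df_raw[!complete.cases(df_raw), ]
print(incomplete_rows)
```

Looking at the incomplete rows first makes it clear which patient, and which readings, leave the analysis when na.omit() is applied.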
2. Generate Basic Visualizations
A. Side-by-Side Boxplots
# Boxplot: Blood Pressure by First MD Assessment
boxplot(
  BloodPressure ~ FirstAssess,
  data = df_hosp,
  names = c("Good", "Bad"),
  ylab = "Blood Pressure",
  main = "BP by First MD Assessment"
)
# Boxplot: Blood Pressure by Second MD Assessment
boxplot(
  BloodPressure ~ SecondAssess,
  data = df_hosp,
  names = c("Low", "High"),
  ylab = "Blood Pressure",
  main = "BP by Second MD Assessment"
)
# Boxplot: Blood Pressure by Final Decision
boxplot(
  BloodPressure ~ FinalDecision,
  data = df_hosp,
  names = c("Low", "High"),
  ylab = "Blood Pressure",
  main = "BP by Final Decision"
)
B. Histograms of Frequency and Blood Pressure
# Histogram of Visit Frequency
hist(
  df_hosp$Frequency,
  breaks = seq(0, 1, by = 0.1),
  xlab = "Visit Frequency",
  main = "Histogram of Visit Frequency"
)
# Histogram of Blood Pressure
hist(
  df_hosp$BloodPressure,
  breaks = 8,
  xlab = "Blood Pressure",
  main = "Histogram of Blood Pressure"
)
Output
Summary
Plots
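Observations (boxplots):
- The final decision “High” group has a clearly higher median BP (109) than the “Low” group (68.5).
- The individual assessments are less consistent: the “Bad” and “High” groups have lower medians (87 and 78) than the “Good” and “Low” groups (93.5 and 95).
- Extreme values (e.g., BP = 205) stretch the boxplot whiskers.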
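Observations (histograms):
- In the cleaned data, all visit frequencies fall between 0.2 and 0.6; the single extreme value of 0.9 belonged to the patient removed by na.omit().
- The blood pressure distribution is right-skewed (mean 99 versus median 87), with extremes at both ends (32, 42, 176, 205).
- These extreme values affect both the mean and the variability.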
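Whether a reading counts as an outlier under the boxplot rule can be checked directly. The sketch below applies base R's boxplot.stats() to the pooled blood pressure readings of the nine complete cases (not to the per-group boxplots), then compares the mean and standard deviation with and without the flagged values. The names bp_clean, flagged, with_all, and without_out are illustrative and do not appear in the original script.

```r
# Blood pressure for the nine complete cases (row with NA in FirstAssess removed)
bp_clean <- c(103, 87, 32, 42, 59, 109, 78, 205, 176)

# Values beyond 1.5 * IQR from the hinges, the default rule boxplot() uses
flagged <- boxplot.stats(bp_clean)$out
print(flagged)

# Mean and standard deviation with and without the flagged values
kept <- bp_clean[!bp_clean %in% flagged]
with_all    <- c(mean = mean(bp_clean), sd = sd(bp_clean))
without_out <- c(mean = mean(kept),     sd = sd(kept))
print(rbind(with_all, without_out))
```

With so few patients, dropping flagged values is a sensitivity check rather than a cleaning step; the comparison only shows how much a single reading moves the summary statistics.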
How BP relates to each assessment and final decision
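In the cleaned data, the final decision tracks blood pressure most clearly: patients with a final decision of “high” have a median BP of 109 (mean 123.8), compared with a median of 68.5 (mean 68) for “low.” The two individual assessments are less consistent. Patients labeled “bad” by the first doctor (1) have a slightly lower median BP (87) than those labeled “good” (93.5). Patients labeled “high” by the second doctor (1) have a lower median (78) than the “low” group (95), but a higher mean (106.6 versus 89.5) and a much wider range (32 to 205 versus 59 to 109), because that group contains both the lowest and the highest readings. Overall, only the final decision lines up cleanly with measured blood pressure in this dataset, and with nine patients these group differences are tentative.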
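The group comparison above can be checked numerically instead of being read off the boxplots. The following is a minimal sketch that rebuilds the cleaned data from the same vectors as the data preparation step; df_clean, group_stats, and bp_by_group are illustrative names not used in the original script.

```r
# Rebuild the cleaned data frame so this snippet runs on its own
BloodPressure <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)
FirstAssess   <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)  # bad=1, good=0
SecondAssess  <- c(0, 0, 1, 1, 0, 0, 1, 1, 1, 1)   # low=0, high=1
FinalDecision <- c(0, 1, 0, 1, 0, 1, 0, 1, 1, 1)   # low=0, high=1
df_clean <- na.omit(
  data.frame(BloodPressure, FirstAssess, SecondAssess, FinalDecision)
)

# Median and mean BP for each level (0/1) of each grouping variable
group_stats <- lapply(
  c("FirstAssess", "SecondAssess", "FinalDecision"),
  function(g) {
    data.frame(
      group  = g,
      level  = sort(unique(df_clean[[g]])),
      median = as.vector(tapply(df_clean$BloodPressure, df_clean[[g]], median)),
      mean   = as.vector(tapply(df_clean$BloodPressure, df_clean[[g]], mean))
    )
  }
)
bp_by_group <- do.call(rbind, group_stats)
print(bp_by_group)
```

Reporting both the median and the mean is deliberate: in groups of four or five patients, one extreme reading can pull the two statistics in different directions.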
Notable patterns, outliers, and clinical implications
Limitations and NA handling
The raw dataset contained 10 patients, but one patient had a missing value in FirstAssess (Frequency = 0.9, BP = 135). The summary statistics before cleaning reflected this NA. After applying na.omit(), the dataset was reduced to 9 patients. This removal had noticeable effects on the distributions: the maximum visit frequency dropped from 0.9 to 0.6, the median blood pressure decreased from 95 to 87, and the mean BP declined from 102.6 to 99. The quartiles also shifted downward (first quartile from 63.75 to 59, third quartile from 128.5 to 109), giving a narrower interquartile range without the moderately high BP case of 135.
As a result, the cleaned dataset shows a slightly lower mean and median blood pressure, a narrower frequency range, and a narrower interquartile range. Importantly, the association between a final decision of “high” and higher blood pressure remained the same. This highlights how even a single missing value can shift distributions and visuals, underscoring the importance of carefully considering how to handle missing data.
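The before-and-after figures quoted above can be reproduced side by side. This is a small sketch using the same vectors as the data preparation step; df_raw, df_clean, describe, and compare are illustrative names that do not appear in the original script.

```r
# Raw data: 10 patients, one with a missing FirstAssess
Frequency     <- c(0.6, 0.3, 0.4, 0.4, 0.2, 0.6, 0.3, 0.4, 0.9, 0.2)
BloodPressure <- c(103, 87, 32, 42, 59, 109, 78, 205, 135, 176)
FirstAssess   <- c(1, 1, 1, 1, 0, 0, 0, 0, NA, 1)
df_raw   <- data.frame(Frequency, BloodPressure, FirstAssess)
df_clean <- na.omit(df_raw)

# Helper (hypothetical) that collects the statistics discussed in the text
describe <- function(d) {
  c(
    n         = nrow(d),
    mean_bp   = mean(d$BloodPressure),
    median_bp = median(d$BloodPressure),
    q1_bp     = unname(quantile(d$BloodPressure, 0.25)),
    q3_bp     = unname(quantile(d$BloodPressure, 0.75)),
    max_freq  = max(d$Frequency)
  )
}

compare <- rbind(raw = describe(df_raw), cleaned = describe(df_clean))
print(compare)
```

An alternative to dropping the row would be to keep it for the plots that do not involve FirstAssess, since the patient's other values are complete; that choice is not explored in the original analysis.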