Assignment 11: Debugging Tukey Outlier Function in R
Objectives
For this assignment, I debugged an R function called tukey_multiple() that was supposed to identify rows in a numeric matrix whose values are outliers in every column according to the Tukey rule (1.5 × IQR). The function initially contained a deliberate bug involving a logical operator that prevented it from running correctly.
The steps included:
- Reproducing the error.
- Diagnosing the bug.
- Fixing the code.
- Validating the fix.
- Adding defensive programming checks.
- Documenting the debugging workflow.
R Code
Reproduce the Error
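A minimal sketch of how the error can be reproduced. The helper tukey.outlier() is not shown in this post, so the version below is an assumption (values outside Q1 - 1.5 × IQR and Q3 + 1.5 × IQR are flagged), as are the name tukey_multiple_buggy, the loop structure, and the seed.

```r
# Hypothetical helper (not shown in the post): flags values outside
# the Tukey fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR.
tukey.outlier <- function(v) {
  q <- unname(quantile(v, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  v < q[1] - fence | v > q[2] + fence
}

# Assumed shape of the buggy function: note the scalar && in the loop.
tukey_multiple_buggy <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    outliers[, j] <- outliers[, j] && tukey.outlier(x[, j])  # bug
  }
  outlier.vec <- vector("logical", length = nrow(x))
  for (i in seq_len(nrow(x))) {
    outlier.vec[i] <- all(outliers[i, ])
  }
  outlier.vec
}

set.seed(123)  # assumed seed, only for reproducibility
test_mat <- matrix(rnorm(50), nrow = 10)

# Capture the failure as a message instead of stopping the script.
result <- tryCatch(
  tukey_multiple_buggy(test_mat),
  error = function(e) conditionMessage(e)
)
print(result)
```

On R 4.3.0 or later, result should hold the error message raised by &&. On older versions the call may finish (with a warning in R 4.2.x) and return a logical vector that is silently wrong.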
Error Message
- && is a scalar (short-circuit) operator
  - In R, && expects each operand to be a single logical value. Versions before R 4.2.0 silently used only the first element of each vector, R 4.2.0 issues a warning, and R 4.3.0 and later stop with an error when an operand is longer than one.
  - Its output is a single TRUE or FALSE, not a vector.
- outliers[, j] and tukey.outlier(x[, j]) are vectors
  - In the test case, each column has length 10.
  - Using && tries to combine two vectors into a single logical value.
- Length mismatch
  - The left-hand side outliers[, j] expects a vector of length 10.
  - The right-hand side can produce at most a single TRUE/FALSE.
  - In R 4.3.0 and later, && stops with an error as soon as it receives the length-10 operand, before any assignment happens. In older versions the single value would be recycled across all 10 rows, so the function would run but return wrong results.
- The correct operator is &
  - & performs element-wise logical AND between vectors.
  - This allows each element of outliers[, j] to be combined with the corresponding element of tukey.outlier(x[, j]).
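The difference between the two operators can be seen on a small example. This is an illustrative sketch, and the vectors a and b are made up for the purpose.

```r
a <- c(TRUE, TRUE, FALSE)
b <- c(TRUE, FALSE, FALSE)

# Element-wise AND: one result per position.
elementwise <- a & b
print(elementwise)  # TRUE FALSE FALSE

# Scalar AND is meant for length-one conditions, such as those in if().
scalar <- (length(a) == 3) && is.logical(b)
print(scalar)  # TRUE
```

As a rule of thumb, & belongs in vectorized computations and && belongs in control flow where each side is a single TRUE or FALSE.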
Fixing the Function
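A sketch of what the corrected function might look like, with && replaced by & and a check that the input is a numeric matrix. The helper tukey.outlier() and the exact wording of the error message are assumptions, since neither appears in this post.

```r
# Hypothetical helper (not shown in the post): flags values outside
# the Tukey fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR.
tukey.outlier <- function(v) {
  q <- unname(quantile(v, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  v < q[1] - fence | v > q[2] + fence
}

tukey_multiple <- function(x) {
  # Defensive check: fail early with a clear message on bad input.
  if (!is.matrix(x) || !is.numeric(x)) {
    stop("x must be a numeric matrix.")
  }
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    # Element-wise & keeps one flag per row of column j.
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j])
  }
  outlier.vec <- vector("logical", length = nrow(x))
  for (i in seq_len(nrow(x))) {
    # A row counts only if it is an outlier in every column.
    outlier.vec[i] <- all(outliers[i, ])
  }
  outlier.vec
}

# Usage on a random 10 x 5 matrix.
res <- tukey_multiple(matrix(rnorm(50), nrow = 10))
print(res)
```

Here && is still appropriate inside the input check (written as ||), because is.matrix() and is.numeric() each return a single value.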
Output
- Returns a logical vector of length 10 (number of rows in test_mat).
- Each element indicates whether the row contains outliers in all columns.
- Function executes without error.
Test Matrix Data:
- A random 10 × 5 numeric matrix (test_mat) was used to reproduce the bug.
- Each cell contained a normally distributed value (rnorm) and was used to test the function’s ability to detect outliers.
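Purely random normal data rarely contains a row that is an outlier in every column, so one way to validate the logic is to plant such a row. The sketch below is an assumption-laden illustration: tukey.outlier() is a hypothetical helper, the planted value 100 and the seed are arbitrary, and the apply()-based check is a compact equivalent of the all-columns rule rather than the function from this assignment.

```r
# Hypothetical helper (not shown in the post): flags values outside
# the Tukey fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR.
tukey.outlier <- function(v) {
  q <- unname(quantile(v, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  v < q[1] - fence | v > q[2] + fence
}

set.seed(42)  # assumed seed, only for reproducibility
test_mat <- matrix(rnorm(50), nrow = 10)
test_mat[3, ] <- 100  # plant a row that is extreme in every column

# Compact equivalent of the all-columns rule: apply() builds a
# 10 x 5 logical matrix, and a row is flagged when all 5 are TRUE.
flag_matrix <- apply(test_mat, 2, tukey.outlier)
flags <- rowSums(flag_matrix) == ncol(test_mat)

print(which(flags))  # the planted row 3 should appear here
```

If the planted row is not flagged, either the outlier rule or the row-wise combination is wrong, which makes this a quick regression check after any change to the function.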
Conclusion
Debugging the tukey_multiple() function highlighted the importance of understanding vectorized operations in R. The original error was caused by using && instead of &: && expects single logical values, so applying it to whole columns fails (with an error in R 4.3.0 and later). After replacing it with the element-wise & and adding checks to ensure the input is a numeric matrix, the function runs without error and rejects invalid input with a clear message. Console outputs from the test matrix confirm that the corrected function returns one logical value per row, indicating whether that row is an outlier in every column.