Assignment 11

 

Assignment 11: Debugging a Tukey Outlier Function in R

Objectives

For this assignment, I debugged an R function called tukey_multiple() that is meant to flag the rows of a numeric matrix whose values are outliers in every column according to the Tukey rule (more than 1.5 × IQR below the first quartile or above the third). The function initially contained a deliberate bug: the scalar logical operator && was used where the element-wise operator & was needed, which prevented it from running correctly.

The steps included:

  1. Reproducing the error.

  2. Diagnosing the bug.

  3. Fixing the code.

  4. Validating the fix.

  5. Adding defensive programming checks.

  6. Documenting the debugging workflow.

R Code

Reproduce the Error
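
The code for this step did not survive in the post, so the following is a hedged reconstruction rather than the assignment's exact code. It assumes the buggy tukey_multiple() matched the corrected version shown later, with && in place of &. The call is wrapped in tryCatch() because the outcome depends on the R version: an error in R 4.3.0 and later, a warning in R 4.2, and a silently wrong result before that.

```r
# Helper: flags values outside the Tukey fences of a single vector
tukey.outlier <- function(v) {
  q1 <- quantile(v, 0.25)
  q3 <- quantile(v, 0.75)
  iqr <- q3 - q1
  v < q1 - 1.5 * iqr | v > q3 + 1.5 * iqr
}

# Assumed reconstruction of the buggy function (not the original source)
tukey_multiple <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    # BUG: && is a scalar operator, but both operands are whole columns
    outliers[, j] <- outliers[, j] && tukey.outlier(x[, j])
  }
  outlier.vec <- logical(nrow(x))
  for (i in seq_len(nrow(x))) {
    outlier.vec[i] <- all(outliers[i, ])
  }
  outlier.vec
}

set.seed(123)
test_mat <- matrix(rnorm(50), nrow = 10)

# Capture whatever condition this R version raises instead of stopping
result <- tryCatch(
  tukey_multiple(test_mat),
  error   = function(e) paste("error:", conditionMessage(e)),
  warning = function(w) paste("warning:", conditionMessage(w))
)
print(result)
```

On a current R installation, result should hold the error text raised by &&, which is the message to look for when diagnosing the bug.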

Error Message

Diagnosing the Bug

  1. && is a scalar (short-circuit) operator

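    • In R, && is intended for single TRUE/FALSE values, such as the condition of an if statement.

    • Given longer vectors, R versions before 4.2 silently used only the first element of each; R 4.2 warns, and R 4.3.0 and later raise an error.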

    • Its output is a single TRUE or FALSE, not a vector.

  2. outliers[, j] and tukey.outlier(x[, j]) are vectors

    • In the test case, each column has length 10.

    • Using && tries to combine two vectors into a single logical value.

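  3. Where the error comes from

    • The left-hand side outliers[, j] holds 10 values.

    • The right-hand side is supposed to produce a single TRUE/FALSE.

    • Assigning a single value to a column is not itself a problem, because R recycles it across all 10 positions. The error is raised by && itself, which in R 4.3.0 and later rejects operands of length greater than one. In older versions the code would run, but each column would be filled based only on its first element, which is silently wrong.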

  4. The correct operator is &

    • & performs element-wise logical AND between vectors.

    • This allows each element of outliers[, j] to be combined with the corresponding element of tukey.outlier(x[, j]).
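
A minimal sketch of the difference between the two operators, using made-up logical vectors (a, b and m are illustrative names, not part of the assignment):

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

# Element-wise AND: one result per position
elementwise <- a & b
print(elementwise)   # TRUE FALSE FALSE

# Scalar AND: meant for length-one operands only
scalar <- a[1] && b[1]
print(scalar)        # TRUE

# Assigning a single value to a matrix column is allowed: R recycles it
m <- matrix(TRUE, nrow = 3, ncol = 2)
m[, 1] <- scalar && FALSE
print(m[, 1])        # FALSE FALSE FALSE
```

The last step shows that a length-one result can be assigned to a whole column without complaint, so the failure in the buggy function has to come from && being handed full columns.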

Fixing the Function

### Corrected Tukey multiple outlier function
tukey.outlier <- function(v) {
  q1 <- quantile(v, 0.25)
  q3 <- quantile(v, 0.75)
  iqr <- q3 - q1
  lower <- q1 - 1.5 * iqr
  upper <- q3 + 1.5 * iqr
  v < lower | v > upper
}
corrected_tukey <- function(x) {
  
  # Defensive programming checks
  if (!is.matrix(x)) {
    stop("Input must be a matrix.")
  }
  if (!is.numeric(x)) {
    stop("Matrix must be numeric.")
  }
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    # FIXED: element-wise AND (&) instead of scalar AND (&&)
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j])
  }
  outlier.vec <- logical(nrow(x))
  for (i in seq_len(nrow(x))) {
    outlier.vec[i] <- all(outliers[i, ])
  }
  return(outlier.vec)
}

# Test matrix
set.seed(123)
test_mat <- matrix(rnorm(50), nrow = 10)

# Running this version works
corrected_tukey(test_mat)

Output

  • Returns a logical vector of length 10 (number of rows in test_mat).

  • Each element indicates whether the row contains outliers in all columns.

  • Function executes without error.
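
Random normal data rarely contains a row that is extreme in every column, so a hand-built matrix makes the validation more direct. The sketch below is an assumed extra test, not part of the original assignment: known_mat, partial_mat and try_msg are hypothetical names, and the two functions are repeated only so the block runs on its own.

```r
tukey.outlier <- function(v) {
  q1 <- quantile(v, 0.25)
  q3 <- quantile(v, 0.75)
  iqr <- q3 - q1
  v < q1 - 1.5 * iqr | v > q3 + 1.5 * iqr
}

corrected_tukey <- function(x) {
  if (!is.matrix(x)) stop("Input must be a matrix.")
  if (!is.numeric(x)) stop("Matrix must be numeric.")
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j])
  }
  outlier.vec <- logical(nrow(x))
  for (i in seq_len(nrow(x))) {
    outlier.vec[i] <- all(outliers[i, ])
  }
  outlier.vec
}

# Three identical columns 1..10, then row 10 is made extreme in every column
known_mat <- matrix(rep(1:10, 3), nrow = 10)
known_mat[10, ] <- 100
known_result <- corrected_tukey(known_mat)
print(known_result)    # only row 10 is TRUE

# Row 10 is now ordinary in column 1, so it no longer qualifies
partial_mat <- known_mat
partial_mat[10, 1] <- 10
partial_result <- corrected_tukey(partial_mat)
print(partial_result)  # all FALSE

# Hypothetical helper: returns the error message instead of stopping
try_msg <- function(expr) {
  tryCatch({ expr; "no error" }, error = function(e) conditionMessage(e))
}
not_matrix_msg  <- try_msg(corrected_tukey(data.frame(a = 1:3)))
not_numeric_msg <- try_msg(corrected_tukey(matrix(letters[1:4], nrow = 2)))
print(not_matrix_msg)
print(not_numeric_msg)
```

In each column of known_mat the quartiles are 3.25 and 7.75, so the upper fence is 14.5 and only the value 100 falls outside it.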

Test Matrix Data:

  • A random 10 × 5 numeric matrix (test_mat) was used to reproduce the bug.

  • Each cell contained a normally distributed value (rnorm) and was used to test the function’s ability to detect outliers.


Conclusion

Debugging the tukey_multiple() function highlighted the importance of understanding vectorized operations in R. The original error came from using the scalar operator && where the element-wise operator & was needed: && expects length-one operands, so applying it to whole columns fails in current versions of R. After replacing it with & and adding checks that the input is a numeric matrix, the function runs without error on the 10 × 5 test matrix and returns a logical vector with one entry per row. The input checks also make it stop early with a clear message when it is given anything other than a numeric matrix.
