Assignment 3

September 13, 2025

Assignment 3: Analyzing 2016 “Poll” Data in R

In this assignment, I analyze a small dataset of polling results using R. The dataset compares two fictional polls, one from ABC and one from CBS, for seven political candidates. The purpose is to practice data wrangling, visualization, and interpretation with ggplot2 while also reflecting on how to properly use polling data.

R Code

# Step 1: Define data

Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Bernie")

ABC_poll <- c( 4, 62, 51, 21, 2, 14, 15)

CBS_poll <- c( 12, 75, 43, 19, 1, 21, 19)

# Step 2: Create data frame

df_polls <- data.frame(Name, ABC_poll, CBS_poll)

# Step 3: Inspect data structure and first few rows

str(df_polls)

head(df_polls)

# Step 4: Summary statistics

mean(df_polls$ABC_poll) # Mean ABC poll

median(df_polls$CBS_poll) # Median CBS poll

range(df_polls[, c("ABC_poll","CBS_poll")]) # Range for both polls

# Step 5: Add difference column

df_polls$Diff <- df_polls$CBS_poll - df_polls$ABC_poll

df_polls

# Step 6: Visualization with ggplot2

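# Install the packages only if they are missing, not on every run

if (!requireNamespace("ggplot2", quietly = TRUE)) install.packages("ggplot2")

if (!requireNamespace("tidyr", quietly = TRUE)) install.packages("tidyr")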

library(tidyr)

df_long <- pivot_longer(df_polls, cols = c("ABC_poll", "CBS_poll"),

                        names_to = "Poll", values_to = "Value")

library(ggplot2)

ggplot(df_long, aes(x = Name, y = Value, fill = Poll)) +

  geom_col(position = "dodge") +

  labs(title = "Comparison of 2016 Poll Results", x = "Candidate", y = "Poll Percentage",

       fill = "Poll Source") +

  theme_minimal()

Output

1. Structure of Data Frame (str())

2. First Rows (head())

3. Summary Statistics

Mean (average) of ABC poll values = 24.14286
Median (middle value) of CBS poll values = 19
Range of poll scores = 1 to 75

4. Data with Differences

Table showing each candidate’s ABC poll, CBS poll, and the calculated difference (CBS - ABC).

Figure 1: A ggplot2 bar chart comparing ABC and CBS values side by side for each candidate.

Key Patterns in the Data

The ABC and CBS results differ clearly for some candidates. For example, Donald’s score is 62 in the ABC poll but 75 in the CBS poll, a gap of 13 points. Jeb and Ted each differ by 8 points, in opposite directions: Jeb is higher in CBS and Ted is higher in ABC. Hillary also shows a difference, with 14 in ABC and 21 in CBS, a gap of 7 points. In contrast, Carly’s and Marco’s results differ by only 1 and 2 points, suggesting more consistency between the two polls for those candidates.
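One way to check these gaps is to rank the candidates by the size of the difference between the two polls. The following is a minimal sketch in base R; it rebuilds the practice data so it runs on its own, and the AbsDiff column and gap_ranked name are illustrative additions, not part of the assignment code.

```r
# Rebuild the practice data so this block runs on its own
df_polls <- data.frame(
  Name     = c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Bernie"),
  ABC_poll = c(4, 62, 51, 21, 2, 14, 15),
  CBS_poll = c(12, 75, 43, 19, 1, 21, 19)
)

# Signed gap (CBS minus ABC) and its absolute size
# AbsDiff is a hypothetical helper column added for this example
df_polls$Diff    <- df_polls$CBS_poll - df_polls$ABC_poll
df_polls$AbsDiff <- abs(df_polls$Diff)

# Sort so the largest disagreements between the polls come first
gap_ranked <- df_polls[order(-df_polls$AbsDiff), c("Name", "Diff", "AbsDiff")]
print(gap_ranked, row.names = FALSE)
```

Sorting on the absolute difference puts Donald at the top and Carly at the bottom, while the signed Diff column still shows which poll reported the higher value.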

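The summary statistics add another view, but the mean of one poll cannot be compared directly with the median of the other. The mean score in the ABC poll is about 24.1, while the median score in the CBS poll is 19. Comparing like with like, the CBS mean is about 27.1 (versus 24.1 for ABC) and the CBS median is 19 (versus 15 for ABC), so CBS reported somewhat higher values overall, driven by Jeb, Donald, Hillary, and Bernie. The bar chart (Figure 1) makes these differences easy to compare side by side.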
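To compare the two polls on the same statistic, one option is to compute both the mean and the median for each poll. This is a small sketch using base R only; the polls and poll_summary names are chosen for illustration.

```r
# Poll values from the practice dataset (fictional numbers)
polls <- data.frame(
  ABC_poll = c(4, 62, 51, 21, 2, 14, 15),
  CBS_poll = c(12, 75, 43, 19, 1, 21, 19)
)

# Apply the same two statistics to every poll column;
# the result is a matrix with one row per statistic and one column per poll
poll_summary <- sapply(polls, function(x) c(mean = mean(x), median = median(x)))

print(round(poll_summary, 2))
```

Putting both statistics in one table avoids mixing a mean from one poll with a median from the other.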

Limitations of Using Made-Up Data

This dataset is fictional, so the numbers do not represent real voter behavior. Made-up data is useful for practicing R and learning visualization, but it has no real-world meaning. The differences we observe are only examples created for training. In real analysis, using fabricated data without clear labeling could be misleading. Readers might think the numbers reflect actual public opinion. That is why it is important to state clearly that this dataset is for practice only.

Collecting and Validating Real Poll Data

For meaningful results, poll data should come from reliable sources like FiveThirtyEight, Pew Research, or Gallup. These groups provide details such as sample size, margin of error, and methodology, which help assess the trustworthiness of the numbers. To validate results, analysts should compare multiple polls, check when the surveys were conducted, and review whether questions were unbiased. Data should also be cleaned and checked for outliers before making graphs. These steps make the analysis more accurate and reliable.
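As a rough illustration of how sample size relates to margin of error, the sketch below uses the standard approximation for a proportion from a simple random sample at about 95% confidence. The margin_of_error function name, the sample size of 1000, and the proportion of 0.5 are assumptions for the example and do not come from the practice dataset; real pollsters often apply weighting, so their published margins may differ.

```r
# Approximate margin of error for a proportion from a simple random sample.
# p: estimated proportion (0 to 1), n: sample size, z: critical value
# (1.96 corresponds to roughly 95% confidence)
margin_of_error <- function(p, n, z = 1.96) {
  z * sqrt(p * (1 - p) / n)
}

# Hypothetical poll: 1000 respondents, candidate at 50%
moe <- margin_of_error(p = 0.5, n = 1000)

# Express as percentage points
print(round(100 * moe, 1))
```

Under these assumptions the margin of error is about 3.1 percentage points, so two polls that differ by only 1 or 2 points would be hard to tell apart.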


R Programming Journal – Premitha Pagadala
