Social Media
In this assignment I analyze data from the 2016 GSS sample and use it to estimate population parameters of interest about US adults. The full GSS sample data file has 2867 observations of 935 variables, but I am only interested in a few of these variables, so I use a smaller file.
The General Social Survey (GSS) gathers data on American society in order to monitor and explain trends in attitudes, behaviors, and attributes. Many trends have been tracked for decades, so one can see how attitudes in American society have evolved.
Let’s first load the libraries which we will need to work on this assignment:
#loading libraries
library(tidyverse) # Load ggplot2, dplyr, and all the other tidyverse packages
library(mosaic)
library(ggthemes)
library(lubridate)
library(here)
library(skimr)
library(janitor)
library(httr)
library(readxl)
library(vroom)
I prefer to look at the data first before starting my analysis:
#read data
gss <- read_csv(here::here("data", "smallgss2016.csv"),
na = c("", "Don't know",
"No answer", "Not applicable"))
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## emailmin = col_character(),
## emailhr = col_character(),
## snapchat = col_character(),
## instagrm = col_character(),
## twitter = col_character(),
## sex = col_character(),
## degree = col_character()
## )
glimpse(gss)
## Rows: 2,867
## Columns: 7
## $ emailmin <chr> "0", "30", "NA", "10", "NA", "0", "0", "NA", "0", "NA", "0", …
## $ emailhr <chr> "12", "0", "NA", "0", "NA", "2", "40", "NA", "0", "NA", "2", …
## $ snapchat <chr> "NA", "No", "No", "NA", "Yes", "No", "NA", "Yes", "NA", "No",…
## $ instagrm <chr> "NA", "No", "No", "NA", "Yes", "Yes", "NA", "Yes", "NA", "No"…
## $ twitter <chr> "NA", "No", "No", "NA", "No", "No", "NA", "No", "NA", "No", "…
## $ sex <chr> "Male", "Male", "Male", "Female", "Female", "Female", "Male",…
## $ degree <chr> "Bachelor", "High school", "Bachelor", "High school", "Gradua…
Note that missing responses appear as the text "NA" rather than as true missing values, because "NA" is not among the strings passed to the na argument of read_csv(). This is why the code in the following steps compares against the string "NA".
Instagram and Snapchat, by sex
These are the steps to estimate the population proportion of Snapchat or Instagram users in 2016:
1. Create a new variable, snap_insta, that is Yes if the respondent reported using Snapchat (snapchat) or Instagram (instagrm), and No if not. If both answers are NA, the value of the new variable is also NA.
#Creating a new variable 'snap_insta'
snap_insta <- gss %>%
mutate(snap_insta = if_else(snapchat == "NA" & instagrm == "NA", "NA",
if_else(snapchat=="Yes" | instagrm == "Yes", "Yes", "No")))
snap_insta
## # A tibble: 2,867 x 8
## emailmin emailhr snapchat instagrm twitter sex degree snap_insta
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 0 12 NA NA NA Male Bachelor NA
## 2 30 0 No No No Male High school No
## 3 NA NA No No No Male Bachelor No
## 4 10 0 NA NA NA Female High school NA
## 5 NA NA Yes Yes No Female Graduate Yes
## 6 0 2 No Yes No Female Junior college Yes
## 7 0 40 NA NA NA Male High school NA
## 8 NA NA Yes Yes No Female High school Yes
## 9 0 0 NA NA NA Male High school NA
## 10 NA NA No No No Male Junior college No
## # … with 2,857 more rows
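To make the recoding rule concrete, here is a minimal sketch on a few made-up rows (the values are illustrative and not taken from the GSS file). It uses base R's ifelse() instead of dplyr so that it runs without the data file, and it assumes, as in the data above, that missing answers are stored as the text "NA". The name toy_snap_insta is hypothetical.

```r
# Toy answers (hypothetical, for illustration only)
snapchat <- c("NA", "No", "Yes", "NA", "No")
instagrm <- c("NA", "No", "No", "Yes", "NA")

# Same rule as above: "NA" only when both answers are missing,
# "Yes" when at least one answer is "Yes", otherwise "No"
toy_snap_insta <- ifelse(
  snapchat == "NA" & instagrm == "NA", "NA",
  ifelse(snapchat == "Yes" | instagrm == "Yes", "Yes", "No")
)

toy_snap_insta
```

Under this rule, a respondent with only one missing answer is classified by the other answer, so the fourth toy row becomes Yes and the fifth becomes No.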
2. Calculate the proportion of Yes’s for snap_insta among those who answered the question, i.e. excluding NAs.
#Calculating proportion of 'snap_insta' users
snap_insta %>%
filter(snap_insta != "NA") %>%
summarize(
Proportion_Insta_Snap = count(snap_insta =="Yes")/ n())
## # A tibble: 1 x 1
## Proportion_Insta_Snap
## <dbl>
## 1 0.375
3. Use the CI formula for proportions to construct 95% CIs for men and women who used either Snapchat or Instagram.
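For reference, the interval computed in the code below is the usual normal-approximation (Wald) interval for a proportion. Here $\hat{p}$ is the sample proportion of Yes answers, $n$ is the number of respondents who answered, and 1.96 is the approximate 97.5th percentile of the standard normal distribution:

```latex
\[
SE(\hat{p}) = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}, \qquad
95\%\ \text{CI} = \hat{p} \pm 1.96 \cdot SE(\hat{p})
\]
```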
# CI for Male population
male_proportion <- snap_insta %>%
filter(sex == "Male", snap_insta != "NA") %>%
summarize(
proportion = count(snap_insta == "Yes")/n(),
se = sqrt((proportion*(1 - proportion)/n())),
lower_ci = proportion - 1.96*se, #we are using normal distribution to approximate
#binomial distribution and directly use 1.96 as the critical value
upper_ci = proportion + 1.96*se) %>%
knitr::kable(caption = "95% CI for men who used either Snapchat or Instagram") %>%
kableExtra::kable_styling()
# CI for Female population
female_proportion <- snap_insta %>%
filter(sex == "Female", snap_insta != "NA") %>%
summarize(
proportion = count(snap_insta == "Yes")/n(),
se = sqrt((proportion*(1 - proportion)/n())),
lower_ci = proportion - 1.96*se,
upper_ci = proportion + 1.96*se) %>%
knitr::kable(caption = "95% CI for women who used either Snapchat or Instagram") %>%
kableExtra::kable_styling()
#print
male_proportion
| proportion | se | lower_ci | upper_ci |
|---|---|---|---|
| 0.318408 | 0.0189712 | 0.2812243 | 0.3555916 |
female_proportion
| proportion | se | lower_ci | upper_ci |
|---|---|---|---|
| 0.4187256 | 0.0177907 | 0.3838559 | 0.4535953 |
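The same calculation can be wrapped in a small helper. This is a sketch and not part of the original analysis: the function name wald_ci and the counts in the usage example are hypothetical, and qnorm() is used to obtain the critical value that the code above approximates as 1.96.

```r
# Hypothetical helper: normal-approximation (Wald) CI for a proportion
wald_ci <- function(successes, n, conf_level = 0.95) {
  p_hat <- successes / n
  se <- sqrt(p_hat * (1 - p_hat) / n)
  z <- qnorm(1 - (1 - conf_level) / 2)  # about 1.96 for a 95% interval
  c(proportion = p_hat, se = se,
    lower_ci = p_hat - z * se, upper_ci = p_hat + z * se)
}

# Made-up counts for illustration: 30 "Yes" answers out of 100 respondents
example_ci <- wald_ci(successes = 30, n = 100)
round(example_ci, 3)
```

Passing the number of Yes answers and the number of respondents for each sex should give the same columns as the tables above, up to the small difference between 1.96 and the exact critical value.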
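Looking at the 95% CIs for men and women, we can see that they do not overlap: the interval for men runs from about 28.1% to 35.6%, while the interval for women runs from about 38.4% to 45.4%. This suggests that the proportion of women using either Snapchat or Instagram is higher than that of men, and that the difference is statistically significant at the 5% level.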
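One way to check this comparison more directly is to build a 95% CI for the difference between the two proportions. The sketch below uses only the proportions and standard errors reported in the tables above and assumes the male and female samples are independent. The variable names are illustrative.

```r
# Values copied from the two tables above
p_male <- 0.318408
se_male <- 0.0189712
p_female <- 0.4187256
se_female <- 0.0177907

# Difference in proportions and its standard error
# (variances add for independent samples)
diff_prop <- p_female - p_male
se_diff <- sqrt(se_male^2 + se_female^2)

# 95% CI for the difference, using 1.96 as the critical value
diff_ci <- c(lower_ci = diff_prop - 1.96 * se_diff,
             upper_ci = diff_prop + 1.96 * se_diff)
round(diff_ci, 3)
```

With these inputs the interval lies entirely above zero (roughly 0.05 to 0.15), which is consistent with a higher proportion of women than men using either platform.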