Social Media

In this assignment, I analyze sample data from the 2016 GSS and use it to estimate population parameters of interest for US adults. The full GSS sample file has 2,867 observations of 935 variables, but only a handful of these variables are relevant here, so I work with a smaller file.

The General Social Survey (GSS) gathers data on American society in order to monitor and explain trends in attitudes, behaviors, and attributes. Many trends have been tracked for decades, so one can see how attitudes and behaviors in American society have evolved.

Let’s first load the libraries which we will need to work on this assignment:

#loading libraries
library(tidyverse)  # Load ggplot2, dplyr, and all the other tidyverse packages
library(mosaic)
library(ggthemes)
library(lubridate)
library(here)
library(skimr)
library(janitor)
library(httr)
library(readxl)
library(vroom)

I prefer to look at the data first before starting my analysis:

#read data
gss <- read_csv(here::here("data", "smallgss2016.csv"), 
                na = c("", "Don't know",
                       "No answer", "Not applicable"))

## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   emailmin = col_character(),
##   emailhr = col_character(),
##   snapchat = col_character(),
##   instagrm = col_character(),
##   twitter = col_character(),
##   sex = col_character(),
##   degree = col_character()
## )

glimpse(gss)

## Rows: 2,867
## Columns: 7
## $ emailmin <chr> "0", "30", "NA", "10", "NA", "0", "0", "NA", "0", "NA", "0", …
## $ emailhr  <chr> "12", "0", "NA", "0", "NA", "2", "40", "NA", "0", "NA", "2", …
## $ snapchat <chr> "NA", "No", "No", "NA", "Yes", "No", "NA", "Yes", "NA", "No",…
## $ instagrm <chr> "NA", "No", "No", "NA", "Yes", "Yes", "NA", "Yes", "NA", "No"…
## $ twitter  <chr> "NA", "No", "No", "NA", "No", "No", "NA", "No", "NA", "No", "…
## $ sex      <chr> "Male", "Male", "Male", "Female", "Female", "Female", "Male",…
## $ degree   <chr> "Bachelor", "High school", "Bachelor", "High school", "Gradua…

Instagram and Snapchat, by sex

These are the relevant steps to estimate the population proportion of Snapchat or Instagram users in 2016:

1. Create a new variable, snap_insta, that is Yes if the respondent reported using either Snapchat (snapchat) or Instagram (instagrm), and No if not. If both answers are NA, the new variable is also NA.

#Creating a new variable 'snap_insta'
snap_insta <- gss %>%
  mutate(snap_insta = if_else(snapchat == "NA" & instagrm == "NA", "NA", 
                              if_else(snapchat=="Yes" | instagrm == "Yes", "Yes", "No")))

snap_insta

## # A tibble: 2,867 x 8
##    emailmin emailhr snapchat instagrm twitter sex    degree         snap_insta
##    <chr>    <chr>   <chr>    <chr>    <chr>   <chr>  <chr>          <chr>     
##  1 0        12      NA       NA       NA      Male   Bachelor       NA        
##  2 30       0       No       No       No      Male   High school    No        
##  3 NA       NA      No       No       No      Male   Bachelor       No        
##  4 10       0       NA       NA       NA      Female High school    NA        
##  5 NA       NA      Yes      Yes      No      Female Graduate       Yes       
##  6 0        2       No       Yes      No      Female Junior college Yes       
##  7 0        40      NA       NA       NA      Male   High school    NA        
##  8 NA       NA      Yes      Yes      No      Female High school    Yes       
##  9 0        0       NA       NA       NA      Male   High school    NA        
## 10 NA       NA      No       No       No      Male   Junior college No        
## # … with 2,857 more rows
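The nested if_else() above can also be written with case_when(), which reads top to bottom like the rule in step 1. This is only a minimal sketch on a made-up toy data frame (toy and toy_flagged are hypothetical names, not part of the assignment data), and it assumes, as in the glimpse() output, that missing answers are stored as the literal string "NA":

```r
library(dplyr)

# Hypothetical toy data mimicking the two survey columns;
# "NA" is a literal string here, not a true missing value.
toy <- data.frame(
  snapchat = c("NA", "No", "Yes", "No", "NA"),
  instagrm = c("NA", "No", "Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

toy_flagged <- toy %>%
  mutate(snap_insta = case_when(
    snapchat == "NA" & instagrm == "NA"   ~ "NA",   # neither question answered
    snapchat == "Yes" | instagrm == "Yes" ~ "Yes",  # uses at least one platform
    TRUE                                  ~ "No"    # answered, uses neither
  ))

toy_flagged$snap_insta
```

The conditions are checked in order, so a respondent only reaches the "Yes"/"No" rules if at least one of the two questions was answered.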

2. Calculate the proportion of Yes’s for snap_insta among those who answered the question, i.e. excluding NAs.

#Calculating proportion of 'snap_insta' users
snap_insta %>%
  filter(snap_insta != "NA") %>%
  summarize(
    Proportion_Insta_Snap = count(snap_insta =="Yes")/ n())

## # A tibble: 1 x 1
##   Proportion_Insta_Snap
##                   <dbl>
## 1                 0.375
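The count() call above is presumably the one from mosaic, which counts TRUE values in a logical vector. An equivalent that needs only dplyr is to take the mean of the logical vector, since TRUE counts as 1 and FALSE as 0. A minimal sketch on made-up data (toy_answers and toy_prop are hypothetical names):

```r
library(dplyr)

# Hypothetical toy answers; "NA" is a literal string, as in the data above.
toy_answers <- data.frame(
  snap_insta = c("Yes", "No", "NA", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

toy_prop <- toy_answers %>%
  filter(snap_insta != "NA") %>%                               # keep respondents who answered
  summarize(Proportion_Insta_Snap = mean(snap_insta == "Yes")) # share of Yes among them

toy_prop
```

Here two of the five remaining respondents answered Yes, so the proportion is 2/5.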

3. Use the CI formula for proportions to construct 95% CIs for the proportions of men and women who used either Snapchat or Instagram.
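For reference, the interval computed in the code is the usual normal-approximation interval for a proportion. The notation is my own: \hat{p} is the sample proportion of Yes answers and n is the number of respondents in the group who answered the question.

```latex
\[
\hat{p} \pm z^{*}\,\mathrm{SE}(\hat{p}),
\qquad
\mathrm{SE}(\hat{p}) = \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}},
\qquad
z^{*} = 1.96 \text{ for a 95\% interval}
\]
```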

# CI for Male population
male_proportion <- snap_insta %>%
  filter(sex == "Male", snap_insta != "NA") %>%
  summarize(
    proportion = count(snap_insta == "Yes")/n(),
    se = sqrt((proportion*(1 - proportion)/n())),
    lower_ci = proportion - 1.96*se, #we are using normal distribution to approximate
                                     #binomial distribution and directly use 1.96 as the critical value
    upper_ci = proportion + 1.96*se) %>% 
  knitr::kable(caption = "95% CI for men who used either Snapchat or Instagram") %>%
  kableExtra::kable_styling()

# CI for Female population
female_proportion <- snap_insta %>%
  filter(sex == "Female", snap_insta != "NA") %>%
  summarize(
    proportion = count(snap_insta == "Yes")/n(),
    se = sqrt((proportion*(1 - proportion)/n())),
    lower_ci = proportion - 1.96*se,
    upper_ci = proportion + 1.96*se) %>% 
  knitr::kable(caption = "95% CI for women who used either Snapchat or Instagram") %>%
  kableExtra::kable_styling()

#print
male_proportion

95% CI for men who used either Snapchat or Instagram
proportion	se	lower_ci	upper_ci
0.318408	0.0189712	0.2812243	0.3555916

female_proportion

95% CI for women who used either Snapchat or Instagram
proportion	se	lower_ci	upper_ci
0.4187256	0.0177907	0.3838559	0.4535953
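The two blocks above differ only in the value of sex, so the same intervals could probably be computed in one pass with group_by(). This is a sketch under that assumption, run on a made-up toy data frame (toy_gss and ci_by_sex are hypothetical names) rather than the real GSS file:

```r
library(dplyr)

# Hypothetical toy data; "NA" is a literal string, as in the data above.
toy_gss <- data.frame(
  sex        = c("Male", "Male", "Male", "Male",
                 "Female", "Female", "Female", "Female", "Female"),
  snap_insta = c("Yes", "No", "No", "NA",
                 "Yes", "Yes", "No", "NA", "No"),
  stringsAsFactors = FALSE
)

z_star <- 1.96  # critical value used in the text for a 95% interval

ci_by_sex <- toy_gss %>%
  filter(snap_insta != "NA") %>%   # drop respondents who did not answer
  group_by(sex) %>%                # one row of results per sex
  summarize(
    n_resp     = n(),                                     # respondents who answered
    proportion = mean(snap_insta == "Yes"),               # sample proportion of Yes
    se         = sqrt(proportion * (1 - proportion) / n_resp),
    lower_ci   = proportion - z_star * se,
    upper_ci   = proportion + z_star * se
  )

ci_by_sex
```

With only a few rows per group the normal approximation is not trustworthy, so the toy output is only meant to show the shape of the result: one row per sex with its proportion, standard error, and interval bounds.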

Looking at the 95% CIs for men and women, we can see that they do not overlap. We can therefore conclude that the proportion of women using either Snapchat or Instagram is higher than that of men.