Last updated: 2024-03-02

Checks: 5 1

Knit directory: ~/OneDrive - University of Iowa/PhD portfolio/8. Teaching Assistant/5. Spring 2024 (Lesa Hoffman)/PSQF 6270 Generalized Linear Models/3. Examples/PSQF6270_Logistic_Regression_Demonstration/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.

The command set.seed(12345) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Tracking code development and connecting the code version to the results is critical for reproducibility. To start using Git, open the Terminal and type git init in your project directory.


This project is not being versioned with Git. To obtain the full reproducibility benefits of using workflowr, please see ?wflow_start.


Prologue

This is a research project I worked on in 2021. This is an updated version (2024) that adds some new ideas and R-code to the original material. This document is intended to show some basic options for manipulating data and analyzing and presenting results using the R environment. Hopefully you can use it as a reference in the future!

About Rmarkdown Files

This is an Rmarkdown file, which means that we can use the R environment to create different types of files: html, pdf, or word. There are countless options for customizing Rmarkdown files. A great advantage of these files is that the code is reproducible. If you are interested in creating your own Rmarkdown files, visit this website. Here are some basic instructions for the html file that you are reading:

# Rmarkdown setup

# if (!require("klippy")) install.packages("klippy")
library(klippy) # To visualize code in the file
klippy(position = c('top', 'right'), color = 'darkred', tooltip_message = 'Click to copy')

library(knitr) # To set the characteristics of the chunks of code in the file
opts_chunk$set(echo = TRUE, # The chunk will be displayed, not only the results
               warning = FALSE, # Not to show warning messages
               message = FALSE, # Not to show messages about packages
               cache = FALSE) # No to rerun code which results are already in the environment

# biblio <- bibtex::read.bib("Ref1.bib") # List of references included in the document

library(multcomp)     # To perform linear combinations
library(readr)        # To load excel files into R environment
library(dplyr)        # To manipulate data
library(forcats)      # To manipulate factors
library(kableExtra)   # To create and customize tables for html files
library(ggplot2)      # To graph results
library(broom)        # To visualize model results in a 'tidy' way
library(flextable)    # To create and customize tables for word files
library(SurrogateRsq) # # To compute Surrogate R2. For addition info, you can visit https://bpspsychub.onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12289
library(DiagrammeR)   # To create flow diagrams
library(patchwork)    # To rearrange plots
library(dvmisc)       # To estimate probabilities from logit

Introduction

This document contains details about the procedure carried out in the research Impact of Academic and Socioeconomic Factors on University Admission in Chile. In particular, the objective of this material is to show some basic functions for manipulating, analyzing and visualizing data in the R environment. Throughout this example, we will be using functions from the dplyr package, which greatly facilitate the actions necessary to clean and organize our data.

For pedagogical purposes, along with the dplyr package functions we will be using base functions of the R environment. Personally, I recommend using the base functions as much as possible, since when using package functions we are depending on the specific version we used. Also, the need for packages can affect the reproducibility of our work.

If you are interested in learning how to use the functions of the dplyr package and the R environment in general, I recommend you to visit the web page of this [course] (https://psqf6250.brandonlebeau.org/). It is Computer Packages for Statistical Analysis (PSQF:6250:0EXW), a course taught by Professor Brandon Lebeau (EMS program) that will be available next fall semester 2024 (highly recommended if you are just starting to learn how to use the R environment.).

Data sets

# Loading each dataset

registration <- read_csv2("C:/Users/geral/OneDrive - University of Iowa/CIDCIE/1. Bases de Datos/PSU/PSU 2019 - 2021/A_Inscritos_puntajes/A_INSCRITOS_PUNTAJES_PDT_2021_PUB_MRUN.csv")

socioeconomic <- read_csv2("C:/Users/geral/OneDrive - University of Iowa/CIDCIE/1. Bases de Datos/PSU/PSU 2019 - 2021/B_Socioeconomico/B_SOCIOECONOMICO_DOMICILIO_PDT_2021_PUB_MRUN.csv")

application <-read_csv2("C:/Users/geral/OneDrive - University of Iowa/CIDCIE/1. Bases de Datos/PSU/PSU 2019 - 2021/C_Postulacion_Seleccion/C_POSTULANTES_SELECCION_PDT_2021_PUB_MRUN.csv")

admission <- read_csv2("C:/Users/geral/OneDrive - University of Iowa/CIDCIE/1. Bases de Datos/PSU/PSU 2019 - 2021/D_Matricula_MRUN/D_MATRICULA_PDT__2021_PUB_MRUN.csv")

Data munging

One of the most important steps in any research is to prepare our data for analysis. Here are some ways to filter and select the data needed for our study.

Note: %>% this symbol is known as ‘pipe’, and it allows us to chain operations, which helps to reduce the code needed to perform some actions. Most of the functions or ‘verbs’ you will see in data manipulation, such as select, filter, group_by, summarise, mutate, or arrange, are part of the dplyr package.

From each dataset, we will extract student and school characterization variables. In addition, we will filter our data to include only those students who are first-time participants in the admissions process and who have learned from the general national curriculum. Finally, we will create some variables with the information we want in our analysis.

# Select important variables from first stage ----------------------------------
dat_reg <- registration %>%
  
  dplyr::select(MRUN,                # Student ID
         COD_SEXO,            # Biological Sex (1 = Male; 2 = Female)
         RBD,                 # School ID
         CODIGO_ENS,          # Type of Curriculum (310, 410, 510, and 610 correspond to National Curriculum)
         RAMA_EDUCACIONAL,    # School Type (H = Humanist; T = Technical)
         DEPENDENCIA,         # School Sector (1,2,5 = Public; 3 = Priv Subsidized; 4 = Private)
         PROMEDIO_NOTAS,      # GPA
         PTJE_NEM,            # Standardized GPA score (350-850)
         PTJE_RANKING,        # Ranking score based on student GPA (350-850)
         CLEC_ACTUAL,         # Language Admission Test score (350-850)
         MATE_ACTUAL,         # Math Admission Test score (350-850)
         PUNTAJES_PROCESO) %>% # Year of Admission (1 = 2021, 2 = 2020, 3 = other)
  
  filter(PUNTAJES_PROCESO == 1 & CODIGO_ENS %in% c('310', '410', '510', '610')) %>% #Filtering data
  
  dplyr::select(MRUN, COD_SEXO, RBD, RAMA_EDUCACIONAL, DEPENDENCIA, PROMEDIO_NOTAS,
         PTJE_NEM, PTJE_RANKING,CLEC_ACTUAL,MATE_ACTUAL) %>% # Once we filter the data, we select the variables of interest.
  
  mutate(                                             # Now we can create new variables
    gender = ifelse(COD_SEXO == 2, 1,0),              # For biological sex, Male is the reference
    type_hs = ifelse(RAMA_EDUCACIONAL == 'H1', 0,1),  # For school type, Humanist is the reference
    sector_sub = ifelse(DEPENDENCIA == 3, 1, 0),      # For school sector, Public is the reference
    sector_priv = ifelse(DEPENDENCIA == 4, 1, 0)) %>% # For school sector, Public is the reference
  
  dplyr::select(MRUN,RBD,PROMEDIO_NOTAS, PTJE_NEM, PTJE_RANKING, CLEC_ACTUAL,MATE_ACTUAL,
         gender, type_hs, sector_sub, sector_priv) # Selecting our new variables of interest

# Giving new names to the variables in our database
colnames(dat_reg) <- c('MRUN', 'RBD', 'gpa', 'NEM', 'RKG', 'LEN', 'MATH', 'gender', 'type_hs','sector_sub', 'sector_priv')

rbind(
  head(dat_reg[order(dat_reg$RBD),]),
  tail(dat_reg[order(dat_reg$RBD),]))
# A tibble: 12 × 11
       MRUN   RBD   gpa   NEM   RKG   LEN  MATH gender type_hs sector_sub
      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>   <dbl>      <dbl>
 1 23290465     1  5.43   504   504   377   431      0       1          0
 2  9350214     1  4.9    382   382    NA    NA      0       1          0
 3  5305385     1  5.03   412   412    NA    NA      0       1          0
 4 18264046     1  5.18   446   446    NA    NA      0       1          0
 5 17963535     1  5.3    474   474   283   334      1       1          0
 6 15899462     1  6.68   771   850   515   203      1       1          0
 7  3836426 41702  6.48   727   770   551   563      1       0          0
 8  1063824 41702  6.08   641   659   515   529      0       0          0
 9 18359771 41702  5.53   523   523   490   493      1       0          0
10 20895102 41780  5.78   577   590   629   570      1       0          0
11  5939548 99999  6.15   656   693   675   706      0       0          0
12 21086254 99999 NA        0     0   596   475      0       0          0
# ℹ 1 more variable: sector_priv <dbl>
dat_reg <- dat_reg %>% filter(RBD != 99999)

# Select important variables from second stage ---------------------------------
dat_soc <- socioeconomic %>%
  dplyr::select(MRUN,                       # School ID
         INGRESO_PERCAPITA_GRUPO_FA, # Family Income Decile
         EDUCACION_MADRE,            # Mother Education 
         EDUCACION_PADRE)            # Father Education 

dat_reg_soc <- left_join(dat_reg, dat_soc, by = 'MRUN') # Merging socioeconomic and academic information

dat_reg_soc <- dat_reg_soc %>%
  mutate(
    inc = INGRESO_PERCAPITA_GRUPO_FA,
    ed_m = EDUCACION_MADRE,
    ed_d = EDUCACION_PADRE,
    ed_dad = ifelse(ed_d >= 0 & ed_d <=5, 0,
                    ifelse(ed_d >= 6 & ed_d <= 9, 1,
                           ifelse(ed_d == 13, NA, 2))), # Father Education (1 = Primary; 2 = Secondary; 3 = Higher Education)
    ed_mom = ifelse(ed_m >= 1 & ed_m <= 5, 0,
                    ifelse(ed_m >= 6 & ed_m <= 9, 1,
                           ifelse(ed_m == 13, NA, 2))), # Mother Education (1 = Primary; 2 = Secondary; 3 = Higher Education)
    income = ifelse(inc == 1|inc == 2, 0,
                    ifelse(inc == 3|inc == 4, 1,
                           ifelse(inc == 5|inc == 6, 2,
                                  ifelse(inc == 7|inc == 8, 3, 4))))) %>% # Family Income Quintile
  
  dplyr::select('MRUN', 'gpa', 'NEM', 'RKG', 'LEN', 'MATH',
         'gender', 'type_hs', 'sector_sub', 'sector_priv', 'ed_dad', 'ed_mom', 'income') # Selecting our new set of variables

# Select important variables from third stage ----------------------------------
dat_app <- application %>%
  dplyr::select(MRUN,                         # School ID
         SITUACION_POSTULANTE) %>%     # Application Status (1 = valid, 2 = invalid)
  
  mutate(
    appl = ifelse(SITUACION_POSTULANTE == 'P', 1,0)) %>%
  
  dplyr::select(MRUN, appl) # Variables of interest

dat_reg_soc_app <- left_join(dat_reg_soc, dat_app, by = 'MRUN') # Merging socioeconomic, academic, and application information

# Select important variables from fourth stage ---------------------------------
dat_adm <- admission %>%
  dplyr::select(MRUN, VIA_ADMISION) %>%
  
  mutate(
    admitted = ifelse(VIA_ADMISION > 0, 1, 0)) %>% # Dependent variable (Admitted = 1: Not admitted = 0)
  filter(!is.na(MRUN)) %>% # Removing NAs
  dplyr::select(MRUN, admitted) %>%
  unique() # Filtering unique IDs

dat_four_stages <- left_join(dat_reg_soc_app, dat_adm, by = 'MRUN') # Merging socioeconomic, academic, application, and admission information

# Giving new names to the variables in our database
colnames(dat_four_stages) <- c('ID', 'GPA', 'NEM', 'RKG', 'LEN', 'MATH', 'GENDER', 'TYPE',
                               'SECTOR_SUB', 'SECTOR_PRIV', 'ED_DAD', 'ED_MOM', 'INCOME', 'APPLY', 'ADM')

# New variables for the coming analysis
dat_four_stages <- dat_four_stages %>%
  mutate(
    NEM = ifelse(NEM == 0, NA, NEM),
    RKG = ifelse(RKG == 0, NA, RKG),
    LEN = ifelse(LEN == 0, NA, LEN),
    MATH = ifelse(MATH == 0, NA, MATH),
    ED_SECONDARY = ifelse(ED_MOM == 1, 1, 0),
    ED_TERTIARY =   ifelse(ED_MOM == 2, 1, 0),
    INCOME_Q2 = ifelse(INCOME == 1, 1,0),
    INCOME_Q3 = ifelse(INCOME == 2, 1,0),
    INCOME_Q4 = ifelse(INCOME == 3, 1,0),
    INCOME_Q5 = ifelse(INCOME == 4, 1,0),
    APPLY = ifelse(is.na(APPLY), 0, APPLY),
    ADM = ifelse(is.na(ADM), 0, 1))

# Labels for columns
dat_analysis <- dat_four_stages %>%
  dplyr::select('GPA', 'NEM', 'RKG', 'LEN', 'MATH',
         'GENDER', 'TYPE', 'SECTOR_SUB', 'SECTOR_PRIV',
         'INCOME_Q2', 'INCOME_Q3', 'INCOME_Q4', 'INCOME_Q5',
         'APPLY', 'ADM', 'ED_SECONDARY', 'ED_TERTIARY', 'INCOME') %>%
  filter(LEN > 0 & MATH > 0)

# Some variables must be factors to be included in the ggplot plots. We did this after realizing the usefulness of creating variables with certain characteristics instead of doing it in the ggplot function.
dat_analysis <- dat_analysis %>% 
  mutate(
    Sector = ifelse(SECTOR_SUB == 1, 1,
                    ifelse(SECTOR_PRIV == 1,2,0)),
    Sector = factor(Sector),
    Sector = fct_recode(Sector,
                        'Public' = "0",
                        'Subsidized Private' = "1",
                        'Private'= "2"),
    Fam_Income = factor(INCOME),
    Fam_Income = fct_recode(Fam_Income,
                            'Income Q1' = "0",
                            'Income Q2' = "1",
                            'Income Q3'= "2",
                            'Income Q4' = '3',
                            'Income Q5' = '4'),
    Mother_Ed = ifelse(ED_SECONDARY == 1, 1,
                       ifelse(ED_TERTIARY == 1,2,0)),
    Mother_Ed = factor(Mother_Ed),
    Mother_Ed = fct_recode(Mother_Ed,
                           'Primary Education' = "0",
                           'Secondary Education' = "1",
                           'Tertiary Education'= "2"))

# write.table(dat_analysis, file = "dat_analysis.csv", sep = ",") # Ee save the latest data set to be loaded easily
# remove(list=ls()) # Once the data manipulation is done, we can remove the objects in the R environment that are not necessary
# dat_analysis <- read_csv("C:/Users/geral/OneDrive - University of Iowa/.../dat_analysis.csv") # we reload our dataset

kable_styling(kable(data.frame(head(dat_analysis, 10)), digits = 3,
                    caption = 'Table 1: Dataset'),
              bootstrap_options = c('hover', 'responsive', 'bordered'),
              full_width = F, position = 'center', font_size =12) %>% 
  scroll_box(height = '250px')
Table 1: Dataset
GPA NEM RKG LEN MATH GENDER TYPE SECTOR_SUB SECTOR_PRIV INCOME_Q2 INCOME_Q3 INCOME_Q4 INCOME_Q5 APPLY ADM ED_SECONDARY ED_TERTIARY INCOME Sector Fam_Income Mother_Ed
5.40 496 497 675 674 0 0 1 0 1 0 0 0 0 0 1 0 1 Subsidized Private Income Q2 Secondary Education
5.43 504 504 377 431 0 1 0 0 0 0 1 0 0 0 0 0 3 Public Income Q4 Primary Education
5.70 559 568 304 475 1 0 0 0 0 0 0 0 0 0 1 0 0 Public Income Q1 Secondary Education
5.65 550 553 562 535 0 1 0 0 0 0 0 1 1 1 0 0 4 Public Income Q5 Primary Education
5.83 589 595 366 319 1 1 0 0 0 0 0 0 0 0 0 0 0 Public Income Q1 Primary Education
4.95 391 391 466 443 1 0 0 0 0 0 1 0 1 0 0 0 3 Public Income Q4 Primary Education
6.83 803 850 495 540 1 0 0 0 0 1 0 0 0 0 1 0 2 Public Income Q3 Secondary Education
5.70 559 574 355 509 0 0 0 0 1 0 0 0 0 0 1 0 1 Public Income Q2 Secondary Education
6.03 631 631 440 595 1 0 0 1 0 0 0 1 1 1 0 1 4 Private Income Q5 Tertiary Education
5.48 513 531 209 516 0 0 0 0 1 0 0 0 0 0 0 0 1 Public Income Q2 Primary Education

Sample description

When we were analyzing the data, we realized that many students were not applying to selective programs, even though they had taken the entrance exams and had valid scores. In order not to overestimate the probability of not being admitted, we eliminated students who did not apply to a selective program.

grViz("digraph flowchart {

label= 'Flowchart Admission process 2021';
labelloc=top;
labeljust=left;

# node definitions with substituted label text
node [fontname = Arial, shape = rectangle, fontsize = 10]        
tab1 [label = '@@1']
tab2 [label = '@@2']
tab3 [label = '@@3']
tab4 [label = '@@4']
tab5 [label = '@@5']

# edge definitions with the node IDs
tab1 -> tab2;
tab2 -> tab3;
tab3 -> tab4;
tab4 -> tab5
}

[1]: 'Registration (n = 276,059)'
[2]: 'Participants 2021 graduates of regular education levels (n = 211,985)'
[3]: 'Students taking the PSU Tests (n = 184,221)'
[4]: 'Students who applied to a university (n = 90,120)'
[5]: 'Admitted students (75,544)'
")
# We decided to remove students who did not apply to any selective program so as not to overestimate the probability of not bein admitted to a selective program.
dat_analysis <- dat_analysis %>%
  filter(APPLY == 1)

# Given that the relative fit of the models requires the same sample size between competitive models, we removed incomplete cases. 
dat_analysis <- dat_analysis %>%
  filter(complete.cases(.)) # Final dataset

# Academic predictors had a scale between 150 and 850 points. To avoid small coefficients, we divided these variables into 10. In addition, we centered these predictors on 550 (55), which represents average performance.
dat_analysis <-  dat_analysis %>% 
  mutate(
  NEM_55 = ((NEM/10) - 55),
  RKG_55 = ((RKG/10) - 55),
  LEN_55 = ((LEN/10) - 55),
  MATH_55 = ((MATH/10) - 55)) # We centered academic predictors at 550 points, given that this score represents an average performance

After filtering our data, we can analyze some characteristics of the sample.

prop_inc <- dat_analysis %>%
  count(Fam_Income) %>%
  mutate(prop = (n/sum(n))*100) %>%
  dplyr::select(Fam_Income, prop)%>%
  rename(var = Fam_Income)

prop_sector <- dat_analysis %>%
  count(Sector) %>%
  mutate(prop = (n/sum(n))*100) %>%
  dplyr::select(Sector, prop) %>%
  rename(var = Sector)

prop_type <- dat_analysis %>%
  count(TYPE) %>%
  mutate(prop = (n/sum(n))*100) %>%
  dplyr::select(TYPE, prop) %>%
  rename(var = TYPE)%>%
    mutate(var = ifelse(var == 0, 'Humanist', 'Technical school'))

prop_mom <- dat_analysis %>%
  count(Mother_Ed) %>%
  mutate(prop = (n/sum(n))*100) %>%
  dplyr::select(Mother_Ed, prop) %>%
  rename(var = Mother_Ed)

prop_gen <- dat_analysis %>%
  count(GENDER) %>%
  mutate(prop = (n/sum(n))*100) %>%
  dplyr::select(GENDER, prop) %>%
  rename(var = GENDER) %>%
    mutate(var = ifelse(var == 1, 'Female', 'Male'))

sample <- dplyr::bind_rows(
  prop_inc,
  prop_sector,
  prop_type,
  prop_mom,
  prop_gen)
sample
# A tibble: 15 × 2
   var                  prop
   <chr>               <dbl>
 1 Income Q1            24.7
 2 Income Q2            23.1
 3 Income Q3            15.0
 4 Income Q4            15.2
 5 Income Q5            21.9
 6 Public               27.1
 7 Subsidized Private   53.1
 8 Private              19.8
 9 Humanist             86.5
10 Technical school     13.5
11 Primary Education    11.6
12 Secondary Education  47.1
13 Tertiary Education   41.3
14 Male                 44.7
15 Female               55.3

Table 1

Sample Description (n = 87,995)

kable(sample,
      digits = 2,
      col.names = c('Variable', '%'),
      align = 'lc')%>%
  kable_styling(bootstrap_options = c('striped','condensed', 'bordered'),
                full_width = F, position = 'center', font_size =12)%>%
  column_spec(1, italic = T)%>%
  row_spec(0, bold = T)%>%
  kableExtra::group_rows('Family Income Quintile', 1, 5) %>%
  kableExtra::group_rows('School Sector', 6, 8) %>%
  kableExtra::group_rows('School Type', 9, 10)%>%
  kableExtra::group_rows('Education Mother', 11, 13) %>%
  kableExtra::group_rows('Biological Sex', 14, 15)
Variable %
Family Income Quintile
Income Q1 24.74
Income Q2 23.14
Income Q3 14.95
Income Q4 15.22
Income Q5 21.94
School Sector
Public 27.11
Subsidized Private 53.13
Private 19.76
School Type
Humanist 86.50
Technical school 13.50
Education Mother
Primary Education 11.61
Secondary Education 47.11
Tertiary Education 41.28
Biological Sex
Male 44.65
Female 55.35

Results: Bivariate Analysis

Table 2

Academic Factor Scores by Socioeconomic Attributes (n = 87,995)

gender_means <- dat_analysis %>%
  group_by(GENDER)%>%
  summarise(
    mean_len = mean(LEN),
    sd_len = sd(LEN),
    mean_math = mean(MATH),
    sd_math = sd(MATH),
    mean_nem = mean(NEM),
    sd_nem = sd(NEM),
    mean_rkg = mean(RKG),
    sd_rkg = sd(RKG))%>%
  rename(var = GENDER)%>%
  mutate(var = ifelse(var == 0, 'Male', 'Female'))

type_means <- dat_analysis %>%
  group_by(TYPE)%>%
  summarise(
    mean_len = mean(LEN),
    sd_len = sd(LEN),
    mean_math = mean(MATH),
    sd_math = sd(MATH),
    mean_nem = mean(NEM),
    sd_nem = sd(NEM),
    mean_rkg = mean(RKG),
    sd_rkg = sd(RKG))%>%
  rename(var = TYPE)%>%
  mutate(var = ifelse(var == 0, 'Humanist', 'Technical'))

sector_means <- dat_analysis %>%
  group_by(Sector)%>%
  summarise(
    mean_len = mean(LEN),
    sd_len = sd(LEN),
    mean_math = mean(MATH),
    sd_math = sd(MATH),
    mean_nem = mean(NEM),
    sd_nem = sd(NEM),
    mean_rkg = mean(RKG),
    sd_rkg = sd(RKG))%>%
  rename(var = Sector)

parents_means <- dat_analysis %>%
  group_by(Mother_Ed)%>%
  summarise(
    mean_len = mean(LEN),
    sd_len = sd(LEN),
    mean_math = mean(MATH),
    sd_math = sd(MATH),
    mean_nem = mean(NEM),
    sd_nem = sd(NEM),
    mean_rkg = mean(RKG),
    sd_rkg = sd(RKG))%>%
  rename(var = Mother_Ed)

income_means <- dat_analysis %>%
  group_by(Fam_Income)%>%
  summarise(
    mean_len = mean(LEN),
    sd_len = sd(LEN),
    mean_math = mean(MATH),
    sd_math = sd(MATH),
    mean_nem = mean(NEM),
    sd_nem = sd(NEM),
    mean_rkg = mean(RKG),
    sd_rkg = sd(RKG))%>%
  rename(var = Fam_Income)

means <- dat_analysis %>%
  summarise(
    mean_len = mean(LEN),
    sd_len = sd(LEN),
    mean_math = mean(MATH),
    sd_math = sd(MATH),
    mean_nem = mean(NEM),
    sd_nem = sd(NEM),
    mean_rkg = mean(RKG),
    sd_rkg = sd(RKG))
means <- cbind(var = 'Avg',means)

data_means <- dplyr::bind_rows(
  gender_means,
  type_means,
  sector_means,
  parents_means,
  income_means,
  means)

kable(data_means,
      digits = 2,
      col.names = c('Variable', 'Mean', 'SD','Mean', 'SD','Mean', 'SD','Mean', 'SD'),
      align = 'lcccccccc')%>%
  kable_styling(bootstrap_options = c('striped','condensed', 'bordered'),
                full_width = F, position = 'center', font_size =12)%>%
  column_spec(1, italic = T)%>%
  row_spec(0, bold = T)%>%
  add_header_above(c('', 'Language' = 2, 'Math' = 2, 'NEM' = 2, 'Ranking' = 2))%>%
  kableExtra::group_rows('Biological Sex', 1, 2) %>%
  kableExtra::group_rows('School Type', 3, 4) %>%
  kableExtra::group_rows('School Sector', 5,7)%>%
  kableExtra::group_rows('Education Mother', 8,10) %>%
  kableExtra::group_rows('Family Income Quintile', 11, 15) %>%
  kableExtra::group_rows('General', 16,16)
Language
Math
NEM
Ranking
Variable Mean SD Mean SD Mean SD Mean SD
Biological Sex
Male 564.39 84.50 580.21 89.13 629.37 100.26 653.70 117.11
Female 558.57 83.79 548.34 86.51 653.26 96.17 682.67 113.15
School Type
Humanist 567.97 83.49 571.80 87.98 646.74 97.62 671.16 113.58
Technical 517.58 74.85 503.43 71.96 616.04 101.64 660.60 128.97
School Sector
Public 544.55 82.40 532.50 84.06 638.60 99.20 669.42 118.34
Subsidized Private 553.75 79.35 551.85 78.65 629.85 97.98 658.96 118.00
Private 603.93 85.09 632.66 85.98 682.37 89.31 699.12 100.36
Education Mother
Primary Education 528.40 76.88 521.20 78.02 627.13 99.43 661.54 121.81
Secondary Education 549.84 79.86 545.99 80.99 632.38 98.07 661.67 117.65
Tertiary Education 583.32 85.31 593.13 91.15 658.60 97.04 681.23 110.95
Family Income Quintile
Income Q1 540.51 78.77 532.29 79.43 627.62 98.91 658.73 119.86
Income Q2 552.90 80.21 547.55 81.27 634.51 98.57 664.48 118.50
Income Q3 559.74 81.74 557.29 82.36 638.18 96.88 666.21 115.43
Income Q4 564.07 83.93 565.67 84.86 640.66 97.86 666.47 114.75
Income Q5 592.15 86.82 614.03 92.44 672.39 94.27 692.35 105.99
General
Avg 561.17 84.16 562.57 89.11 642.60 98.73 669.73 115.83

Figure 1

Average Scores in the Admission Factors by Mother Education (n = 87,995)

plotmathsec <- ggplot(dat_analysis, aes(x=Mother_Ed, y = MATH, fill = Mother_Ed))+
  geom_boxplot()+
  theme_bw()+
  labs(x = "Education", y = "Math", fill = "Education")+
  scale_x_discrete(labels = c('','',''))+
  scale_fill_discrete(guide = "none")+
  scale_y_continuous(limits = c(150, 850), breaks = seq(400, 800, 200)) +
  theme(axis.title.x=element_blank())

plotlensec <- ggplot(dat_analysis, aes(x=Mother_Ed, y = LEN, fill = Mother_Ed))+
  geom_boxplot()+
  theme_bw()+
  labs(x = "Education", y = "Language", fill = "Education")+
  scale_fill_discrete(guide = 'none')+
  scale_x_discrete(labels = c('','',''))+
  scale_y_continuous(limits = c(150, 850), breaks = seq(400, 800, 200)) +
  theme(axis.title.x=element_blank())

plotrkgsec <- ggplot(dat_analysis, aes(x=Mother_Ed, y = RKG, fill = Mother_Ed))+
  geom_boxplot()+
  theme_bw()+
  labs(x = "Education", y = "Ranking", fill = "Education")+
  scale_x_discrete(labels = c('','',''))+
  scale_fill_discrete(guide = 'none')+
  scale_y_continuous(limits = c(150, 850), breaks = seq(400, 800, 200)) +
  theme(axis.title.x=element_blank())

plotnemsec <- ggplot(dat_analysis, aes(x=Mother_Ed, y = NEM, fill = Mother_Ed))+
  geom_boxplot()+
  theme_bw()+
  labs(x = "Education", y = "NEM", fill = "Education")+
  scale_x_discrete(labels = c('','',''))+
  scale_fill_discrete(guide = "none")+
  scale_y_continuous(limits = c(150, 850), breaks = seq(400, 800, 200)) +
  theme(axis.title.x=element_blank())

(plotmathsec+plotlensec)/(plotrkgsec+plotnemsec)

Figure 2

Average Scores in the Admission Factors by Family Income Quintile (n = 87,995)

plotmathsec <- ggplot(dat_analysis, aes(x=Fam_Income, y = MATH, fill = Fam_Income))+
  geom_boxplot()+
  theme_bw()+
  labs(x = "Family Income", y = "Mathematics", fill = "Family Income")+
  scale_x_discrete(labels = c('','','','',''))+
  scale_fill_discrete(guide = "none")+
  scale_y_continuous(limits = c(150, 850), breaks = seq(400, 800, 200)) +
  theme(axis.title.x=element_blank())

plotlensec <- ggplot(dat_analysis, aes(x=Fam_Income, y = LEN, fill = Fam_Income))+
  geom_boxplot()+
  theme_bw()+
  labs(x = "Family Income", y = "Language", fill = "Family Income")+
  scale_fill_discrete(guide = 'none')+
  scale_x_discrete(labels = c('','','','',''))+
  scale_y_continuous(limits = c(150, 850), breaks = seq(400, 800, 200)) +
  theme(axis.title.x=element_blank())

plotrkgsec <- ggplot(dat_analysis, aes(x=Fam_Income, y = RKG, fill = Fam_Income))+
  geom_boxplot()+
  theme_bw()+
  labs(x = "Family Income", y = "Ranking", fill = "Family Income")+
  scale_x_discrete(labels = c('','','','',''))+
  scale_fill_discrete(guide = "none")+
  scale_y_continuous(limits = c(150, 850), breaks = seq(400, 800, 200)) +
  theme(axis.title.x=element_blank())

plotnemsec <- ggplot(dat_analysis, aes(x=Fam_Income, y = NEM, fill = Fam_Income))+
  geom_boxplot()+
  theme_bw()+
  labs(x = "Family Income", y = "NEM", fill = "Family Income")+
  scale_x_discrete(labels = c('','','','',''))+
  #scale_fill_discrete(guide = "none")+
  scale_y_continuous(limits = c(150, 850), breaks = seq(400, 800, 200)) +
  theme(axis.title.x=element_blank())

(plotmathsec+plotlensec)/(plotrkgsec+plotnemsec)

Binary Logistic Regression

# First, we fit an empty or null model to check what the model is fitting
Empty_mod<- glm(ADM ~ 1, family = binomial(link = 'logit'), data = dat_analysis)
summary(Empty_mod)

tidy(Empty_mod) # From broom, 

# The model is predicting the logit of the probability of being admitted
round(exp(coef(Empty_mod)) / (1 + exp(coef(Empty_mod))),4)
round(prop.table(table(dat_analysis$ADM)),4)

Call:
glm(formula = ADM ~ 1, family = binomial(link = "logit"), data = dat_analysis)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.7333   0.7097   0.7097   0.7097   0.7097  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 1.250361   0.008103   154.3   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 93314  on 87994  degrees of freedom
Residual deviance: 93314  on 87994  degrees of freedom
AIC: 93316

Number of Fisher Scoring iterations: 4

# A tibble: 1 × 5
  term        estimate std.error statistic p.value
  <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 (Intercept)     1.25   0.00810      154.       0
(Intercept) 
     0.7774 

     0      1 
0.2226 0.7774 
# Now, we fit and compare different model versions until a satisfactory solution is found

# Full model
adm_mod1 <- glm(
  formula = ADM ~ 1 + NEM_55 + RKG_55 + LEN_55 + MATH_55 + 
    GENDER + INCOME_Q2 + INCOME_Q3 + INCOME_Q4 + INCOME_Q5 + 
    ED_SECONDARY + ED_TERTIARY + SECTOR_SUB + SECTOR_PRIV + TYPE,
  family = binomial(link = 'logit'), 
  data = dat_analysis)

summary(adm_mod1)

# Second version of the model (- School Type)
adm_mod2 <- glm(
  formula = ADM ~ 1 + NEM_55 + RKG_55 + LEN_55 + MATH_55 + 
    GENDER + INCOME_Q2 + INCOME_Q3 + INCOME_Q4 + INCOME_Q5 + 
    ED_SECONDARY + ED_TERTIARY + SECTOR_SUB + SECTOR_PRIV, 
  family = binomial(link = 'logit'), 
  data = dat_analysis)

summary(adm_mod2)

# LRT
DevTest = -2*(logLik(adm_mod2) - logLik(adm_mod1))
RegPvalue = pchisq((DevTest), df=1, lower.tail=FALSE)
MixPvalue = RegPvalue/2
DevTest; RegPvalue; MixPvalue

# Third version of the model ( - Education Mother)
adm_mod3 <- glm(
  formula = ADM ~ 1 + NEM_55 + RKG_55 + LEN_55 + MATH_55 + 
    GENDER + INCOME_Q2 + INCOME_Q3 + INCOME_Q4 + INCOME_Q5 + 
    SECTOR_SUB + SECTOR_PRIV, 
  family = binomial(link = 'logit'), 
  data = dat_analysis)

summary(adm_mod3)

# LRT comparing the second vs the third version
DevTest = -2*(logLik(adm_mod3) - logLik(adm_mod2))
RegPvalue = pchisq((DevTest), df=2, lower.tail=FALSE)
MixPvalue = RegPvalue/2
DevTest; RegPvalue; MixPvalue

wald_adm_mod3 = glht(adm_mod2, linfct = c('NEM_55 = 0', 'RKG_55 = 0', 'LEN_55 = 0', 'MATH_55 = 0',
                                          'GENDER = 0', 'INCOME_Q2 = 0', 'INCOME_Q3 = 0', 'INCOME_Q4 = 0', 'INCOME_Q5 = 0',
                                          'SECTOR_SUB = 0' , 'SECTOR_PRIV = 0'))
summary(wald_adm_mod3, test = Chisqtest()) # Joint chi-square test

adm_mod3_1 <- glm(
  formula = ADM ~ 1 + NEM_55 + RKG_55 + LEN_55 + MATH_55 + 
    GENDER + INCOME_Q2 + INCOME_Q3 + INCOME_Q4 + INCOME_Q5 + 
    SECTOR_SUB + SECTOR_PRIV, 
  family = binomial(link = 'probit'), 
  data = dat_analysis)

According to the results of the relative fit between models, mother education and school type were statistically non-significant predictors, so they were removed. Our final results were reported in Table 3.

Table 3

Binary Logistic Regression Results (n = 87,995)

mod_results <- tidy(adm_mod3)
mod_results <- round(mod_results[2:5],3)
coef <- data.frame(adm_mod3$coefficients)
coef_names <- rownames(coef)
flextable(cbind(coef_names,mod_results)) # Coefficients in Logit scale

coef_names

estimate

std.error

statistic

p.value

(Intercept)

1.352

0.024

55.270

0.000

NEM_55

-0.011

0.004

-2.593

0.010

RKG_55

0.022

0.003

6.247

0.000

LEN_55

0.024

0.001

20.901

0.000

MATH_55

0.036

0.001

29.528

0.000

GENDER

-0.335

0.018

-18.917

0.000

INCOME_Q2

-0.064

0.023

-2.765

0.006

INCOME_Q3

-0.075

0.027

-2.790

0.005

INCOME_Q4

-0.108

0.027

-4.007

0.000

INCOME_Q5

-0.023

0.029

-0.775

0.438

SECTOR_SUB

-0.071

0.019

-3.655

0.000

SECTOR_PRIV

0.165

0.032

5.116

0.000

surr_rsq(adm_mod3_1, adm_mod3_1, dat_analysis, avg.num = 100) # Surrogate R2
------------------------------------------------------------------------ 
The surrogate R-squared of the model 
------------------------------------------------------------------------ 
ADM ~ 1 + NEM_55 + RKG_55 + LEN_55 + MATH_55 + GENDER + INCOME_Q2 +  
    INCOME_Q3 + INCOME_Q4 + INCOME_Q5 + SECTOR_SUB + SECTOR_PRIV 
------------------------------------------------------------------------ 
is: 
[1]  0.10161
exp(adm_mod3$coefficients) # Computing Odds Ratio for each predictor
(Intercept)      NEM_55      RKG_55      LEN_55     MATH_55      GENDER 
  3.8659858   0.9890920   1.0219394   1.0243255   1.0366175   0.7152494 
  INCOME_Q2   INCOME_Q3   INCOME_Q4   INCOME_Q5  SECTOR_SUB SECTOR_PRIV 
  0.9376850   0.9281447   0.8974563   0.9775428   0.9318013   1.1795144 

Figure 3

Binary Logistic Regression Results (n = 87,995)

dat <- read.table(header=TRUE, text='
Fixed_effects               Odds_Ratio
NEM_550                     0.989
Ranking_550                 1.022
Language_550                1.024
Mathematics_550             1.037
Sex_Female                  0.715
Income_Q2                     0.938
Income_Q3                     0.928
Income_Q4                     0.897
Income_Q5                     0.978
Sector_Private_Subsidized     0.932
Sector_Private              1.180')

acad_factors <- ggplot(dat[c(1:4),], aes(x = Odds_Ratio, y = Fixed_effects))+
  geom_point(size = 2.5)+
  geom_vline(xintercept = 1, linetype="dashed", 
             color = "blue", size=1) +
  scale_x_continuous(limits = c(.70, 1.25), breaks = seq(.70, 1.25, 0.1))+
  labs(x = 'Odds Ratio', y = 'Predictors')+
  scale_y_discrete(limits=rev)+
  theme_bw(base_size = 14)+
  ggtitle("Academic factors")+
  theme(axis.title = element_text(size = 12), plot.title = element_text(size = 12))

nonacad_factors <- ggplot(dat[c(5:11),], aes(x = Odds_Ratio, y = Fixed_effects))+
  geom_point(size = 2.5)+
  geom_vline(xintercept = 1, linetype="dashed", 
             color = "blue", size=1) +
  scale_x_continuous(limits = c(.70, 1.25), breaks = seq(.70, 1.25, 0.1))+
  labs(x = 'Odds Ratio', y = '')+
  scale_y_discrete(limits=rev)+
  theme_bw(base_size = 14)+
  ggtitle("Non-academic factors")+
  theme(axis.title = element_text(size = 12), plot.title = element_text(size = 12))

acad_factors+nonacad_factors

Figure 4

Estimated Probabilities of Being Admitted (n = 87,995)

prob_mod <- dat_analysis %>%
  dplyr::select(ADM,NEM_55,RKG_55,LEN_55,MATH_55,GENDER,INCOME_Q2,INCOME_Q3,INCOME_Q4,INCOME_Q5,SECTOR_SUB,SECTOR_PRIV) %>%
  mutate(probability = predict(adm_mod3, type = 'response'))
probs <- (summary(glht(model=adm_mod3, linfct=rbind(
  "Males_Pub_Q1" = c(1,0,0,0,0,0,0,0,0,0,0,0),
  "Males_Pub_Q2" = c(1,0,0,0,0,0,1,0,0,0,0,0),
  "Males_Pub_Q3" = c(1,0,0,0,0,0,0,1,0,0,0,0),
  "Males_Pub_Q4" = c(1,0,0,0,0,0,0,0,1,0,0,0),
  "Males_Pub_Q5" = c(1,0,0,0,0,0,0,0,0,1,0,0),
  "Males_Sub_Q1" = c(1,0,0,0,0,0,0,0,0,0,1,0),
  "Males_Sub_Q2" = c(1,0,0,0,0,0,1,0,0,0,1,0),
  "Males_Sub_Q3" = c(1,0,0,0,0,0,0,1,0,0,1,0),
  "Males_Sub_Q4" = c(1,0,0,0,0,0,0,0,1,0,1,0),
  "Males_Sub_Q5" = c(1,0,0,0,0,0,0,0,0,1,1,0),
  "Males_Pri_Q1" = c(1,0,0,0,0,0,0,0,0,0,0,1),
  "Males_Pri_Q2" = c(1,0,0,0,0,0,1,0,0,0,0,1),
  "Males_Pri_Q3" = c(1,0,0,0,0,0,0,1,0,0,0,1),
  "Males_Pri_Q4" = c(1,0,0,0,0,0,0,0,1,0,0,1),
  "Males_Pri_Q5" = c(1,0,0,0,0,0,0,0,0,1,0,1),
  "Females_Pub_Q1" = c(1,0,0,0,0,1,0,0,0,0,0,0),
  "Females_Pub_Q2" = c(1,0,0,0,0,1,1,0,0,0,0,0),
  "Females_Pub_Q3" = c(1,0,0,0,0,1,0,1,0,0,0,0),
  "Females_Pub_Q4" = c(1,0,0,0,0,1,0,0,1,0,0,0),
  "Females_Pub_Q5" = c(1,0,0,0,0,1,0,0,0,1,0,0),
  "Females_Sub_Q1" = c(1,0,0,0,0,1,0,0,0,0,1,0),
  "Females_Sub_Q2" = c(1,0,0,0,0,1,1,0,0,0,1,0),
  "Females_Sub_Q3" = c(1,0,0,0,0,1,0,1,0,0,1,0),
  "Females_Sub_Q4" = c(1,0,0,0,0,1,0,0,1,0,1,0),
  "Females_Sub_Q5" = c(1,0,0,0,0,1,0,0,0,1,1,0),
  "Females_Pri_Q1" = c(1,0,0,0,0,1,0,0,0,0,0,1),
  "Females_Pri_Q2" = c(1,0,0,0,0,1,1,0,0,0,0,1),
  "Females_Pri_Q3" = c(1,0,0,0,0,1,0,1,0,0,0,1),
  "Females_Pri_Q4" = c(1,0,0,0,0,1,0,0,1,0,0,1),
  "Females_Pri_Q5" = c(1,0,0,0,0,1,0,0,0,1,0,1))),test=adjusted("none")))

logit_pr <- logit_prob(probs[["test"]][["coefficients"]]) # Inverse logit link function
logit_pr <- data.frame(logit_pr)
logit_pr
                logit_pr
Males_Pub_Q1   0.7944918
Males_Pub_Q2   0.7837874
Males_Pub_Q3   0.7820493
Males_Pub_Q4   0.7762640
Males_Pub_Q5   0.7907585
Males_Sub_Q1   0.7827188
Males_Sub_Q2   0.7715774
Males_Sub_Q3   0.7697701
Males_Sub_Q4   0.7637572
Males_Sub_Q5   0.7788312
Males_Pri_Q1   0.8201434
Males_Pri_Q2   0.8104564
Males_Pri_Q3   0.8088804
Males_Pri_Q4   0.8036285
Males_Pri_Q5   0.8167687
Females_Pub_Q1 0.7344059
Females_Pub_Q2 0.7216682
Females_Pub_Q3 0.7196094
Females_Pub_Q4 0.7127754
Females_Pub_Q5 0.7299521
Females_Sub_Q1 0.7204021
Females_Sub_Q2 0.7072605
Females_Sub_Q3 0.7051387
Females_Sub_Q4 0.6980999
Females_Sub_Q5 0.7158043
Females_Pri_Q1 0.7653423
Females_Pri_Q2 0.7535905
Females_Pri_Q3 0.7516866
Females_Pri_Q4 0.7453577
Females_Pri_Q5 0.7612386
tab_prob <- cbind(
a = data.frame(logit_pr[1:5,]),
b = data.frame(logit_pr[6:10,]),
c = data.frame(logit_pr[11:15,]),
d = data.frame(logit_pr[16:20,]),
e = data.frame(logit_pr[21:25,]),
f = data.frame(logit_pr[26:30,]))

colnames(tab_prob) <- c('male_pub', 'male_sub', 'male_priv','fem_pub', 'fem_sub', 'fem_priv')
rownames(tab_prob) <- c('Q1', 'Q2', 'Q3','Q4', 'Q5')
flextable(cbind(x = c('Q1', 'Q2', 'Q3','Q4', 'Q5'), round(tab_prob,3)))

x

male_pub

male_sub

male_priv

fem_pub

fem_sub

fem_priv

Q1

0.794

0.783

0.820

0.734

0.720

0.765

Q2

0.784

0.772

0.810

0.722

0.707

0.754

Q3

0.782

0.770

0.809

0.720

0.705

0.752

Q4

0.776

0.764

0.804

0.713

0.698

0.745

Q5

0.791

0.779

0.817

0.730

0.716

0.761

Figure 5

Estimate Probabilities of Admission According to Biological Sex, School Sector, and Family Income Quintile (n = 87,995)

data_prob <- data.frame(Quintil = c('Q1', 'Q2', 'Q3', 'Q4', 'Q5'), tab_prob)

males_probs <- ggplot(data_prob, aes(x = Quintil, group=1))+
  geom_point(aes(y = male_pub))+
  geom_line(aes(y = male_pub), color = 'red', size = 1)+
  geom_point(aes(y = male_sub))+
  geom_line(aes(y = male_sub), color = 'green', size = 1)+
  geom_point(aes(y = male_priv))+
  geom_line(aes(y = male_priv), color = 'blue', size = 1)+
  scale_y_continuous(limits = c(.6, .9), breaks = seq(.6, .9, .1))+
  labs(x = 'Quintile family income', y = 'Probability of admission')+
  theme_bw(base_size = 14)+
  ggtitle("Males")+
  theme(axis.title = element_text(size = 12), plot.title = element_text(size = 12))

females_probs <- ggplot(data_prob, aes(x = Quintil, group=1))+
  geom_point(aes(y = fem_pub))+
  geom_line(aes(y = fem_pub), color = 'red', size = 1)+
  geom_point(aes(y = fem_sub))+
  geom_line(aes(y = fem_sub), color = 'green', size = 1)+
  geom_point(aes(y = fem_priv))+
  geom_line(aes(y = fem_priv), color = 'blue', size = 1)+
  scale_y_continuous(limits = c(.6, .9), breaks = seq(.6, .9, .1))+
  labs(x = 'Quintile family income', y = '')+
  theme_bw(base_size = 14)+
  ggtitle("Females")+
  theme(axis.title = element_text(size = 12), plot.title = element_text(size = 12))

males_probs+females_probs


sessionInfo()
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22631)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 
[2] LC_CTYPE=English_United States.utf8   
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] dvmisc_1.1.4       rbenchmark_1.0.0   patchwork_1.2.0    DiagrammeR_1.0.11 
 [5] SurrogateRsq_0.2.0 scales_1.3.0       progress_1.2.2     PAsso_0.1.10      
 [9] DescTools_0.99.50  flextable_0.9.3    broom_1.0.5        ggplot2_3.5.0     
[13] kableExtra_1.3.4   forcats_1.0.0      dplyr_1.1.3        readr_2.1.4       
[17] multcomp_1.4-25    TH.data_1.1-2      MASS_7.3-58.2      survival_3.5-7    
[21] mvtnorm_1.2-3      knitr_1.43         klippy_0.0.0.9500 

loaded via a namespace (and not attached):
  [1] readxl_1.4.3            uuid_1.1-1              backports_1.4.1        
  [4] copula_1.1-3            workflowr_1.7.1         VGAM_1.1-9             
  [7] systemfonts_1.0.4       plyr_1.8.8              lazyeval_0.2.2         
 [10] copBasic_2.2.3          splines_4.2.3           pspline_1.0-19         
 [13] digest_0.6.33           foreach_1.5.2           htmltools_0.5.6        
 [16] fansi_1.0.4             magrittr_2.0.3          tzdb_0.4.0             
 [19] vroom_1.6.3             officer_0.6.2           stabledist_0.7-1       
 [22] sandwich_3.0-2          svglite_2.1.1           askpass_1.2.0          
 [25] prettyunits_1.1.1       gfonts_0.2.0            colorspace_2.1-0       
 [28] rvest_1.0.3             mitools_2.4             textshaping_0.3.6      
 [31] xfun_0.40               crayon_1.5.2            jsonlite_1.8.7         
 [34] Exact_3.2               zoo_1.8-12              iterators_1.0.14       
 [37] glue_1.6.2              gtable_0.3.4            webshot_0.5.5          
 [40] lmomco_2.4.14           fontquiver_0.2.1        DBI_1.1.3              
 [43] GGally_2.2.1            Rcpp_1.0.11             viridisLite_0.4.2      
 [46] xtable_1.8-4            bit_4.0.5               randtoolbox_2.0.4      
 [49] proxy_0.4-27            survey_4.2-1            stats4_4.2.3           
 [52] fontLiberation_0.1.0    htmlwidgets_1.6.2       httr_1.4.7             
 [55] RColorBrewer_1.1-3      ellipsis_0.3.2          farver_2.1.1           
 [58] pkgconfig_2.0.3         sass_0.4.7              utf8_1.2.3             
 [61] crul_1.4.0              tidyselect_1.2.0        rlang_1.1.1            
 [64] later_1.3.1             tab_5.1.1               visNetwork_2.1.2       
 [67] munsell_0.5.0           cellranger_1.1.0        tools_4.2.3            
 [70] cachem_1.0.8            cli_3.6.1               generics_0.1.3         
 [73] evaluate_0.21           stringr_1.5.0           fastmap_1.1.1          
 [76] yaml_2.3.7              ragg_1.2.5              goftest_1.2-3          
 [79] bit64_4.0.5             fs_1.6.3                zip_2.3.0              
 [82] purrr_1.0.2             rootSolve_1.8.2.4       mime_0.12              
 [85] rngWELL_0.10-9          xml2_1.3.5              compiler_4.2.3         
 [88] rstudioapi_0.15.0       plotly_4.10.3           curl_5.0.2             
 [91] e1071_1.7-13            tibble_3.2.1            bslib_0.5.1            
 [94] pcaPP_2.0-4             stringi_1.7.12          gsl_2.1-8              
 [97] highr_0.10              gdtools_0.3.3           lattice_0.20-45        
[100] Matrix_1.6-1            fontBitstreamVera_0.1.1 vctrs_0.6.3            
[103] pillar_1.9.0            lifecycle_1.0.3         ADGofTest_0.3          
[106] jquerylib_0.1.4         data.table_1.14.8       lmom_3.0               
[109] httpuv_1.6.11           R6_2.5.1                promises_1.2.1         
[112] gridExtra_2.3           gld_2.6.6               codetools_0.2-19       
[115] Lmoments_1.3-1          ggstats_0.5.1           boot_1.3-28.1          
[118] assertthat_0.2.1        openssl_2.1.0           rprojroot_2.0.3        
[121] withr_2.5.0             httpcode_0.3.0          parallel_4.2.3         
[124] expm_0.999-7            hms_1.1.3               grid_4.2.3             
[127] tidyr_1.3.0             class_7.3-21            rmarkdown_2.24         
[130] git2r_0.33.0            numDeriv_2016.8-1.1     shiny_1.7.5