Last updated: 2024-03-02
Checks: 5 1
Knit directory: ~/OneDrive - University of
Iowa/PhD portfolio/8. Teaching Assistant/5. Spring 2024 (Lesa
Hoffman)/PSQF 6270 Generalized Linear Models/3.
Examples/PSQF6270_Logistic_Regression_Demonstration/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproduciblity it’s best to always run the code in an empty environment.
The command set.seed(12345)
was run prior to running the
code in the R Markdown file. Setting a seed ensures that any results
that rely on randomness, e.g. subsampling or permutations, are
reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Tracking code development and connecting the code version to the
results is critical for reproducibility. To start using Git, open the
Terminal and type git init
in your project directory.
This project is not being versioned with Git. To obtain the full
reproducibility benefits of using workflowr, please see
?wflow_start
.
This is a research project I worked on in 2021. This is an updated version (2024) that adds some new ideas and R-code to the original material. This document is intended to show some basic options for manipulating data and analyzing and presenting results using the R environment. Hopefully you can use it as a reference in the future!
This is an Rmarkdown file, which means that we can use the R environment to create different types of files: html, pdf, or word. There are countless options for customizing Rmarkdown files. A great advantage of these files is that the code is reproducible. If you are interested in creating your own Rmarkdown files, visit this website. Here are some basic instructions for the html file that you are reading:
# Rmarkdown setup
# if (!require("klippy")) install.packages("klippy")
library(klippy) # To visualize code in the file
klippy(position = c('top', 'right'), color = 'darkred', tooltip_message = 'Click to copy')
library(knitr) # To set the characteristics of the chunks of code in the file
opts_chunk$set(echo = TRUE, # The chunk will be displayed, not only the results
warning = FALSE, # Not to show warning messages
message = FALSE, # Not to show messages about packages
cache = FALSE) # No to rerun code which results are already in the environment
# biblio <- bibtex::read.bib("Ref1.bib") # List of references included in the document
library(multcomp) # To perform linear combinations
library(readr) # To load excel files into R environment
library(dplyr) # To manipulate data
library(forcats) # To manipulate factors
library(kableExtra) # To create and customize tables for html files
library(ggplot2) # To graph results
library(broom) # To visualize model results in a 'tidy' way
library(flextable) # To create and customize tables for word files
library(SurrogateRsq) # # To compute Surrogate R2. For addition info, you can visit https://bpspsychub.onlinelibrary.wiley.com/doi/epdf/10.1111/bmsp.12289
library(DiagrammeR) # To create flow diagrams
library(patchwork) # To rearrange plots
library(dvmisc) # To estimate probabilities from logit
This document contains details about the procedure carried out in the
research Impact of Academic and Socioeconomic Factors on University
Admission in Chile. In particular, the objective of this material
is to show some basic functions for manipulating, analyzing and
visualizing data in the R environment. Throughout this example, we will
be using functions from the dplyr
package, which greatly
facilitate the actions necessary to clean and organize our data.
For pedagogical purposes, along with the dplyr
package
functions we will be using base functions of the R environment.
Personally, I recommend using the base functions as much as possible,
since when using package functions we are depending on the specific
version we used. Also, the need for packages can affect the
reproducibility of our work.
If you are interested in learning how to use the functions of the dplyr package and the R environment in general, I recommend you to visit the web page of this [course] (https://psqf6250.brandonlebeau.org/). It is Computer Packages for Statistical Analysis (PSQF:6250:0EXW), a course taught by Professor Brandon Lebeau (EMS program) that will be available next fall semester 2024 (highly recommended if you are just starting to learn how to use the R environment.).
# Loading each dataset
registration <- read_csv2("C:/Users/geral/OneDrive - University of Iowa/CIDCIE/1. Bases de Datos/PSU/PSU 2019 - 2021/A_Inscritos_puntajes/A_INSCRITOS_PUNTAJES_PDT_2021_PUB_MRUN.csv")
socioeconomic <- read_csv2("C:/Users/geral/OneDrive - University of Iowa/CIDCIE/1. Bases de Datos/PSU/PSU 2019 - 2021/B_Socioeconomico/B_SOCIOECONOMICO_DOMICILIO_PDT_2021_PUB_MRUN.csv")
application <-read_csv2("C:/Users/geral/OneDrive - University of Iowa/CIDCIE/1. Bases de Datos/PSU/PSU 2019 - 2021/C_Postulacion_Seleccion/C_POSTULANTES_SELECCION_PDT_2021_PUB_MRUN.csv")
admission <- read_csv2("C:/Users/geral/OneDrive - University of Iowa/CIDCIE/1. Bases de Datos/PSU/PSU 2019 - 2021/D_Matricula_MRUN/D_MATRICULA_PDT__2021_PUB_MRUN.csv")
One of the most important steps in any research is to prepare our data for analysis. Here are some ways to filter and select the data needed for our study.
Note: %>% this symbol is known as ‘pipe’, and it
allows us to chain operations, which helps to reduce the code needed to
perform some actions. Most of the functions or ‘verbs’ you will see in
data manipulation, such as select
, filter
,
group_by
, summarise
, mutate
, or
arrange
, are part of the dplyr package.
From each dataset, we will extract student and school characterization variables. In addition, we will filter our data to include only those students who are first-time participants in the admissions process and who have learned from the general national curriculum. Finally, we will create some variables with the information we want in our analysis.
# Select important variables from first stage ----------------------------------
dat_reg <- registration %>%
dplyr::select(MRUN, # Student ID
COD_SEXO, # Biological Sex (1 = Male; 2 = Female)
RBD, # School ID
CODIGO_ENS, # Type of Curriculum (310, 410, 510, and 610 correspond to National Curriculum)
RAMA_EDUCACIONAL, # School Type (H = Humanist; T = Technical)
DEPENDENCIA, # School Sector (1,2,5 = Public; 3 = Priv Subsidized; 4 = Private)
PROMEDIO_NOTAS, # GPA
PTJE_NEM, # Standardized GPA score (350-850)
PTJE_RANKING, # Ranking score based on student GPA (350-850)
CLEC_ACTUAL, # Language Admission Test score (350-850)
MATE_ACTUAL, # Math Admission Test score (350-850)
PUNTAJES_PROCESO) %>% # Year of Admission (1 = 2021, 2 = 2020, 3 = other)
filter(PUNTAJES_PROCESO == 1 & CODIGO_ENS %in% c('310', '410', '510', '610')) %>% #Filtering data
dplyr::select(MRUN, COD_SEXO, RBD, RAMA_EDUCACIONAL, DEPENDENCIA, PROMEDIO_NOTAS,
PTJE_NEM, PTJE_RANKING,CLEC_ACTUAL,MATE_ACTUAL) %>% # Once we filter the data, we select the variables of interest.
mutate( # Now we can create new variables
gender = ifelse(COD_SEXO == 2, 1,0), # For biological sex, Male is the reference
type_hs = ifelse(RAMA_EDUCACIONAL == 'H1', 0,1), # For school type, Humanist is the reference
sector_sub = ifelse(DEPENDENCIA == 3, 1, 0), # For school sector, Public is the reference
sector_priv = ifelse(DEPENDENCIA == 4, 1, 0)) %>% # For school sector, Public is the reference
dplyr::select(MRUN,RBD,PROMEDIO_NOTAS, PTJE_NEM, PTJE_RANKING, CLEC_ACTUAL,MATE_ACTUAL,
gender, type_hs, sector_sub, sector_priv) # Selecting our new variables of interest
# Giving new names to the variables in our database
colnames(dat_reg) <- c('MRUN', 'RBD', 'gpa', 'NEM', 'RKG', 'LEN', 'MATH', 'gender', 'type_hs','sector_sub', 'sector_priv')
rbind(
head(dat_reg[order(dat_reg$RBD),]),
tail(dat_reg[order(dat_reg$RBD),]))
# A tibble: 12 × 11
MRUN RBD gpa NEM RKG LEN MATH gender type_hs sector_sub
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 23290465 1 5.43 504 504 377 431 0 1 0
2 9350214 1 4.9 382 382 NA NA 0 1 0
3 5305385 1 5.03 412 412 NA NA 0 1 0
4 18264046 1 5.18 446 446 NA NA 0 1 0
5 17963535 1 5.3 474 474 283 334 1 1 0
6 15899462 1 6.68 771 850 515 203 1 1 0
7 3836426 41702 6.48 727 770 551 563 1 0 0
8 1063824 41702 6.08 641 659 515 529 0 0 0
9 18359771 41702 5.53 523 523 490 493 1 0 0
10 20895102 41780 5.78 577 590 629 570 1 0 0
11 5939548 99999 6.15 656 693 675 706 0 0 0
12 21086254 99999 NA 0 0 596 475 0 0 0
# ℹ 1 more variable: sector_priv <dbl>
dat_reg <- dat_reg %>% filter(RBD != 99999)
# Select important variables from second stage ---------------------------------
dat_soc <- socioeconomic %>%
dplyr::select(MRUN, # School ID
INGRESO_PERCAPITA_GRUPO_FA, # Family Income Decile
EDUCACION_MADRE, # Mother Education
EDUCACION_PADRE) # Father Education
dat_reg_soc <- left_join(dat_reg, dat_soc, by = 'MRUN') # Merging socioeconomic and academic information
dat_reg_soc <- dat_reg_soc %>%
mutate(
inc = INGRESO_PERCAPITA_GRUPO_FA,
ed_m = EDUCACION_MADRE,
ed_d = EDUCACION_PADRE,
ed_dad = ifelse(ed_d >= 0 & ed_d <=5, 0,
ifelse(ed_d >= 6 & ed_d <= 9, 1,
ifelse(ed_d == 13, NA, 2))), # Father Education (1 = Primary; 2 = Secondary; 3 = Higher Education)
ed_mom = ifelse(ed_m >= 1 & ed_m <= 5, 0,
ifelse(ed_m >= 6 & ed_m <= 9, 1,
ifelse(ed_m == 13, NA, 2))), # Mother Education (1 = Primary; 2 = Secondary; 3 = Higher Education)
income = ifelse(inc == 1|inc == 2, 0,
ifelse(inc == 3|inc == 4, 1,
ifelse(inc == 5|inc == 6, 2,
ifelse(inc == 7|inc == 8, 3, 4))))) %>% # Family Income Quintile
dplyr::select('MRUN', 'gpa', 'NEM', 'RKG', 'LEN', 'MATH',
'gender', 'type_hs', 'sector_sub', 'sector_priv', 'ed_dad', 'ed_mom', 'income') # Selecting our new set of variables
# Select important variables from third stage ----------------------------------
dat_app <- application %>%
dplyr::select(MRUN, # School ID
SITUACION_POSTULANTE) %>% # Application Status (1 = valid, 2 = invalid)
mutate(
appl = ifelse(SITUACION_POSTULANTE == 'P', 1,0)) %>%
dplyr::select(MRUN, appl) # Variables of interest
dat_reg_soc_app <- left_join(dat_reg_soc, dat_app, by = 'MRUN') # Merging socioeconomic, academic, and application information
# Select important variables from fourth stage ---------------------------------
dat_adm <- admission %>%
dplyr::select(MRUN, VIA_ADMISION) %>%
mutate(
admitted = ifelse(VIA_ADMISION > 0, 1, 0)) %>% # Dependent variable (Admitted = 1: Not admitted = 0)
filter(!is.na(MRUN)) %>% # Removing NAs
dplyr::select(MRUN, admitted) %>%
unique() # Filtering unique IDs
dat_four_stages <- left_join(dat_reg_soc_app, dat_adm, by = 'MRUN') # Merging socioeconomic, academic, application, and admission information
# Giving new names to the variables in our database
colnames(dat_four_stages) <- c('ID', 'GPA', 'NEM', 'RKG', 'LEN', 'MATH', 'GENDER', 'TYPE',
'SECTOR_SUB', 'SECTOR_PRIV', 'ED_DAD', 'ED_MOM', 'INCOME', 'APPLY', 'ADM')
# New variables for the coming analysis
dat_four_stages <- dat_four_stages %>%
mutate(
NEM = ifelse(NEM == 0, NA, NEM),
RKG = ifelse(RKG == 0, NA, RKG),
LEN = ifelse(LEN == 0, NA, LEN),
MATH = ifelse(MATH == 0, NA, MATH),
ED_SECONDARY = ifelse(ED_MOM == 1, 1, 0),
ED_TERTIARY = ifelse(ED_MOM == 2, 1, 0),
INCOME_Q2 = ifelse(INCOME == 1, 1,0),
INCOME_Q3 = ifelse(INCOME == 2, 1,0),
INCOME_Q4 = ifelse(INCOME == 3, 1,0),
INCOME_Q5 = ifelse(INCOME == 4, 1,0),
APPLY = ifelse(is.na(APPLY), 0, APPLY),
ADM = ifelse(is.na(ADM), 0, 1))
# Labels for columns
dat_analysis <- dat_four_stages %>%
dplyr::select('GPA', 'NEM', 'RKG', 'LEN', 'MATH',
'GENDER', 'TYPE', 'SECTOR_SUB', 'SECTOR_PRIV',
'INCOME_Q2', 'INCOME_Q3', 'INCOME_Q4', 'INCOME_Q5',
'APPLY', 'ADM', 'ED_SECONDARY', 'ED_TERTIARY', 'INCOME') %>%
filter(LEN > 0 & MATH > 0)
# Some variables must be factors to be included in the ggplot plots. We did this after realizing the usefulness of creating variables with certain characteristics instead of doing it in the ggplot function.
dat_analysis <- dat_analysis %>%
mutate(
Sector = ifelse(SECTOR_SUB == 1, 1,
ifelse(SECTOR_PRIV == 1,2,0)),
Sector = factor(Sector),
Sector = fct_recode(Sector,
'Public' = "0",
'Subsidized Private' = "1",
'Private'= "2"),
Fam_Income = factor(INCOME),
Fam_Income = fct_recode(Fam_Income,
'Income Q1' = "0",
'Income Q2' = "1",
'Income Q3'= "2",
'Income Q4' = '3',
'Income Q5' = '4'),
Mother_Ed = ifelse(ED_SECONDARY == 1, 1,
ifelse(ED_TERTIARY == 1,2,0)),
Mother_Ed = factor(Mother_Ed),
Mother_Ed = fct_recode(Mother_Ed,
'Primary Education' = "0",
'Secondary Education' = "1",
'Tertiary Education'= "2"))
# write.table(dat_analysis, file = "dat_analysis.csv", sep = ",") # Ee save the latest data set to be loaded easily
# remove(list=ls()) # Once the data manipulation is done, we can remove the objects in the R environment that are not necessary
# dat_analysis <- read_csv("C:/Users/geral/OneDrive - University of Iowa/.../dat_analysis.csv") # we reload our dataset
kable_styling(kable(data.frame(head(dat_analysis, 10)), digits = 3,
caption = 'Table 1: Dataset'),
bootstrap_options = c('hover', 'responsive', 'bordered'),
full_width = F, position = 'center', font_size =12) %>%
scroll_box(height = '250px')
GPA | NEM | RKG | LEN | MATH | GENDER | TYPE | SECTOR_SUB | SECTOR_PRIV | INCOME_Q2 | INCOME_Q3 | INCOME_Q4 | INCOME_Q5 | APPLY | ADM | ED_SECONDARY | ED_TERTIARY | INCOME | Sector | Fam_Income | Mother_Ed |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5.40 | 496 | 497 | 675 | 674 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | Subsidized Private | Income Q2 | Secondary Education |
5.43 | 504 | 504 | 377 | 431 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 3 | Public | Income Q4 | Primary Education |
5.70 | 559 | 568 | 304 | 475 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | Public | Income Q1 | Secondary Education |
5.65 | 550 | 553 | 562 | 535 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 4 | Public | Income Q5 | Primary Education |
5.83 | 589 | 595 | 366 | 319 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | Public | Income Q1 | Primary Education |
4.95 | 391 | 391 | 466 | 443 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 3 | Public | Income Q4 | Primary Education |
6.83 | 803 | 850 | 495 | 540 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 2 | Public | Income Q3 | Secondary Education |
5.70 | 559 | 574 | 355 | 509 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | Public | Income Q2 | Secondary Education |
6.03 | 631 | 631 | 440 | 595 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 4 | Private | Income Q5 | Tertiary Education |
5.48 | 513 | 531 | 209 | 516 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | Public | Income Q2 | Primary Education |
When we were analyzing the data, we realized that many students were not applying to selective programs, even though they had taken the entrance exams and had valid scores. In order not to overestimate the probability of not being admitted, we eliminated students who did not apply to a selective program.
grViz("digraph flowchart {
label= 'Flowchart Admission process 2021';
labelloc=top;
labeljust=left;
# node definitions with substituted label text
node [fontname = Arial, shape = rectangle, fontsize = 10]
tab1 [label = '@@1']
tab2 [label = '@@2']
tab3 [label = '@@3']
tab4 [label = '@@4']
tab5 [label = '@@5']
# edge definitions with the node IDs
tab1 -> tab2;
tab2 -> tab3;
tab3 -> tab4;
tab4 -> tab5
}
[1]: 'Registration (n = 276,059)'
[2]: 'Participants 2021 graduates of regular education levels (n = 211,985)'
[3]: 'Students taking the PSU Tests (n = 184,221)'
[4]: 'Students who applied to a university (n = 90,120)'
[5]: 'Admitted students (75,544)'
")
# We decided to remove students who did not apply to any selective program so as not to overestimate the probability of not bein admitted to a selective program.
dat_analysis <- dat_analysis %>%
filter(APPLY == 1)
# Given that the relative fit of the models requires the same sample size between competitive models, we removed incomplete cases.
dat_analysis <- dat_analysis %>%
filter(complete.cases(.)) # Final dataset
# Academic predictors had a scale between 150 and 850 points. To avoid small coefficients, we divided these variables into 10. In addition, we centered these predictors on 550 (55), which represents average performance.
dat_analysis <- dat_analysis %>%
mutate(
NEM_55 = ((NEM/10) - 55),
RKG_55 = ((RKG/10) - 55),
LEN_55 = ((LEN/10) - 55),
MATH_55 = ((MATH/10) - 55)) # We centered academic predictors at 550 points, given that this score represents an average performance
After filtering our data, we can analyze some characteristics of the sample.
prop_inc <- dat_analysis %>%
count(Fam_Income) %>%
mutate(prop = (n/sum(n))*100) %>%
dplyr::select(Fam_Income, prop)%>%
rename(var = Fam_Income)
prop_sector <- dat_analysis %>%
count(Sector) %>%
mutate(prop = (n/sum(n))*100) %>%
dplyr::select(Sector, prop) %>%
rename(var = Sector)
prop_type <- dat_analysis %>%
count(TYPE) %>%
mutate(prop = (n/sum(n))*100) %>%
dplyr::select(TYPE, prop) %>%
rename(var = TYPE)%>%
mutate(var = ifelse(var == 0, 'Humanist', 'Technical school'))
prop_mom <- dat_analysis %>%
count(Mother_Ed) %>%
mutate(prop = (n/sum(n))*100) %>%
dplyr::select(Mother_Ed, prop) %>%
rename(var = Mother_Ed)
prop_gen <- dat_analysis %>%
count(GENDER) %>%
mutate(prop = (n/sum(n))*100) %>%
dplyr::select(GENDER, prop) %>%
rename(var = GENDER) %>%
mutate(var = ifelse(var == 1, 'Female', 'Male'))
sample <- dplyr::bind_rows(
prop_inc,
prop_sector,
prop_type,
prop_mom,
prop_gen)
sample
# A tibble: 15 × 2
var prop
<chr> <dbl>
1 Income Q1 24.7
2 Income Q2 23.1
3 Income Q3 15.0
4 Income Q4 15.2
5 Income Q5 21.9
6 Public 27.1
7 Subsidized Private 53.1
8 Private 19.8
9 Humanist 86.5
10 Technical school 13.5
11 Primary Education 11.6
12 Secondary Education 47.1
13 Tertiary Education 41.3
14 Male 44.7
15 Female 55.3
Table 1
Sample Description (n = 87,995)
kable(sample,
digits = 2,
col.names = c('Variable', '%'),
align = 'lc')%>%
kable_styling(bootstrap_options = c('striped','condensed', 'bordered'),
full_width = F, position = 'center', font_size =12)%>%
column_spec(1, italic = T)%>%
row_spec(0, bold = T)%>%
kableExtra::group_rows('Family Income Quintile', 1, 5) %>%
kableExtra::group_rows('School Sector', 6, 8) %>%
kableExtra::group_rows('School Type', 9, 10)%>%
kableExtra::group_rows('Education Mother', 11, 13) %>%
kableExtra::group_rows('Biological Sex', 14, 15)
Variable | % |
---|---|
Family Income Quintile | |
Income Q1 | 24.74 |
Income Q2 | 23.14 |
Income Q3 | 14.95 |
Income Q4 | 15.22 |
Income Q5 | 21.94 |
School Sector | |
Public | 27.11 |
Subsidized Private | 53.13 |
Private | 19.76 |
School Type | |
Humanist | 86.50 |
Technical school | 13.50 |
Education Mother | |
Primary Education | 11.61 |
Secondary Education | 47.11 |
Tertiary Education | 41.28 |
Biological Sex | |
Male | 44.65 |
Female | 55.35 |
Table 2
Academic Factor Scores by Socioeconomic Attributes (n = 87,995)
gender_means <- dat_analysis %>%
group_by(GENDER)%>%
summarise(
mean_len = mean(LEN),
sd_len = sd(LEN),
mean_math = mean(MATH),
sd_math = sd(MATH),
mean_nem = mean(NEM),
sd_nem = sd(NEM),
mean_rkg = mean(RKG),
sd_rkg = sd(RKG))%>%
rename(var = GENDER)%>%
mutate(var = ifelse(var == 0, 'Male', 'Female'))
type_means <- dat_analysis %>%
group_by(TYPE)%>%
summarise(
mean_len = mean(LEN),
sd_len = sd(LEN),
mean_math = mean(MATH),
sd_math = sd(MATH),
mean_nem = mean(NEM),
sd_nem = sd(NEM),
mean_rkg = mean(RKG),
sd_rkg = sd(RKG))%>%
rename(var = TYPE)%>%
mutate(var = ifelse(var == 0, 'Humanist', 'Technical'))
sector_means <- dat_analysis %>%
group_by(Sector)%>%
summarise(
mean_len = mean(LEN),
sd_len = sd(LEN),
mean_math = mean(MATH),
sd_math = sd(MATH),
mean_nem = mean(NEM),
sd_nem = sd(NEM),
mean_rkg = mean(RKG),
sd_rkg = sd(RKG))%>%
rename(var = Sector)
parents_means <- dat_analysis %>%
group_by(Mother_Ed)%>%
summarise(
mean_len = mean(LEN),
sd_len = sd(LEN),
mean_math = mean(MATH),
sd_math = sd(MATH),
mean_nem = mean(NEM),
sd_nem = sd(NEM),
mean_rkg = mean(RKG),
sd_rkg = sd(RKG))%>%
rename(var = Mother_Ed)
income_means <- dat_analysis %>%
group_by(Fam_Income)%>%
summarise(
mean_len = mean(LEN),
sd_len = sd(LEN),
mean_math = mean(MATH),
sd_math = sd(MATH),
mean_nem = mean(NEM),
sd_nem = sd(NEM),
mean_rkg = mean(RKG),
sd_rkg = sd(RKG))%>%
rename(var = Fam_Income)
means <- dat_analysis %>%
summarise(
mean_len = mean(LEN),
sd_len = sd(LEN),
mean_math = mean(MATH),
sd_math = sd(MATH),
mean_nem = mean(NEM),
sd_nem = sd(NEM),
mean_rkg = mean(RKG),
sd_rkg = sd(RKG))
means <- cbind(var = 'Avg',means)
data_means <- dplyr::bind_rows(
gender_means,
type_means,
sector_means,
parents_means,
income_means,
means)
kable(data_means,
digits = 2,
col.names = c('Variable', 'Mean', 'SD','Mean', 'SD','Mean', 'SD','Mean', 'SD'),
align = 'lcccccccc')%>%
kable_styling(bootstrap_options = c('striped','condensed', 'bordered'),
full_width = F, position = 'center', font_size =12)%>%
column_spec(1, italic = T)%>%
row_spec(0, bold = T)%>%
add_header_above(c('', 'Language' = 2, 'Math' = 2, 'NEM' = 2, 'Ranking' = 2))%>%
kableExtra::group_rows('Biological Sex', 1, 2) %>%
kableExtra::group_rows('School Type', 3, 4) %>%
kableExtra::group_rows('School Sector', 5,7)%>%
kableExtra::group_rows('Education Mother', 8,10) %>%
kableExtra::group_rows('Family Income Quintile', 11, 15) %>%
kableExtra::group_rows('General', 16,16)
Variable | Mean | SD | Mean | SD | Mean | SD | Mean | SD |
---|---|---|---|---|---|---|---|---|
Biological Sex | ||||||||
Male | 564.39 | 84.50 | 580.21 | 89.13 | 629.37 | 100.26 | 653.70 | 117.11 |
Female | 558.57 | 83.79 | 548.34 | 86.51 | 653.26 | 96.17 | 682.67 | 113.15 |
School Type | ||||||||
Humanist | 567.97 | 83.49 | 571.80 | 87.98 | 646.74 | 97.62 | 671.16 | 113.58 |
Technical | 517.58 | 74.85 | 503.43 | 71.96 | 616.04 | 101.64 | 660.60 | 128.97 |
School Sector | ||||||||
Public | 544.55 | 82.40 | 532.50 | 84.06 | 638.60 | 99.20 | 669.42 | 118.34 |
Subsidized Private | 553.75 | 79.35 | 551.85 | 78.65 | 629.85 | 97.98 | 658.96 | 118.00 |
Private | 603.93 | 85.09 | 632.66 | 85.98 | 682.37 | 89.31 | 699.12 | 100.36 |
Education Mother | ||||||||
Primary Education | 528.40 | 76.88 | 521.20 | 78.02 | 627.13 | 99.43 | 661.54 | 121.81 |
Secondary Education | 549.84 | 79.86 | 545.99 | 80.99 | 632.38 | 98.07 | 661.67 | 117.65 |
Tertiary Education | 583.32 | 85.31 | 593.13 | 91.15 | 658.60 | 97.04 | 681.23 | 110.95 |
Family Income Quintile | ||||||||
Income Q1 | 540.51 | 78.77 | 532.29 | 79.43 | 627.62 | 98.91 | 658.73 | 119.86 |
Income Q2 | 552.90 | 80.21 | 547.55 | 81.27 | 634.51 | 98.57 | 664.48 | 118.50 |
Income Q3 | 559.74 | 81.74 | 557.29 | 82.36 | 638.18 | 96.88 | 666.21 | 115.43 |
Income Q4 | 564.07 | 83.93 | 565.67 | 84.86 | 640.66 | 97.86 | 666.47 | 114.75 |
Income Q5 | 592.15 | 86.82 | 614.03 | 92.44 | 672.39 | 94.27 | 692.35 | 105.99 |
General | ||||||||
Avg | 561.17 | 84.16 | 562.57 | 89.11 | 642.60 | 98.73 | 669.73 | 115.83 |
Figure 1
Average Scores in the Admission Factors by Mother Education (n = 87,995)
plotmathsec <- ggplot(dat_analysis, aes(x=Mother_Ed, y = MATH, fill = Mother_Ed))+
geom_boxplot()+
theme_bw()+
labs(x = "Education", y = "Math", fill = "Education")+
scale_x_discrete(labels = c('','',''))+
scale_fill_discrete(guide = "none")+
scale_y_continuous(limits = c(150, 850), breaks = seq(400, 800, 200)) +
theme(axis.title.x=element_blank())
plotlensec <- ggplot(dat_analysis, aes(x=Mother_Ed, y = LEN, fill = Mother_Ed))+
geom_boxplot()+
theme_bw()+
labs(x = "Education", y = "Language", fill = "Education")+
scale_fill_discrete(guide = 'none')+
scale_x_discrete(labels = c('','',''))+
scale_y_continuous(limits = c(150, 850), breaks = seq(400, 800, 200)) +
theme(axis.title.x=element_blank())
plotrkgsec <- ggplot(dat_analysis, aes(x=Mother_Ed, y = RKG, fill = Mother_Ed))+
geom_boxplot()+
theme_bw()+
labs(x = "Education", y = "Ranking", fill = "Education")+
scale_x_discrete(labels = c('','',''))+
scale_fill_discrete(guide = 'none')+
scale_y_continuous(limits = c(150, 850), breaks = seq(400, 800, 200)) +
theme(axis.title.x=element_blank())
plotnemsec <- ggplot(dat_analysis, aes(x=Mother_Ed, y = NEM, fill = Mother_Ed))+
geom_boxplot()+
theme_bw()+
labs(x = "Education", y = "NEM", fill = "Education")+
scale_x_discrete(labels = c('','',''))+
scale_fill_discrete(guide = "none")+
scale_y_continuous(limits = c(150, 850), breaks = seq(400, 800, 200)) +
theme(axis.title.x=element_blank())
(plotmathsec+plotlensec)/(plotrkgsec+plotnemsec)
Figure 2
Average Scores in the Admission Factors by Family Income Quintile (n = 87,995)
plotmathsec <- ggplot(dat_analysis, aes(x=Fam_Income, y = MATH, fill = Fam_Income))+
geom_boxplot()+
theme_bw()+
labs(x = "Family Income", y = "Mathematics", fill = "Family Income")+
scale_x_discrete(labels = c('','','','',''))+
scale_fill_discrete(guide = "none")+
scale_y_continuous(limits = c(150, 850), breaks = seq(400, 800, 200)) +
theme(axis.title.x=element_blank())
plotlensec <- ggplot(dat_analysis, aes(x=Fam_Income, y = LEN, fill = Fam_Income))+
geom_boxplot()+
theme_bw()+
labs(x = "Family Income", y = "Language", fill = "Family Income")+
scale_fill_discrete(guide = 'none')+
scale_x_discrete(labels = c('','','','',''))+
scale_y_continuous(limits = c(150, 850), breaks = seq(400, 800, 200)) +
theme(axis.title.x=element_blank())
plotrkgsec <- ggplot(dat_analysis, aes(x=Fam_Income, y = RKG, fill = Fam_Income))+
geom_boxplot()+
theme_bw()+
labs(x = "Family Income", y = "Ranking", fill = "Family Income")+
scale_x_discrete(labels = c('','','','',''))+
scale_fill_discrete(guide = "none")+
scale_y_continuous(limits = c(150, 850), breaks = seq(400, 800, 200)) +
theme(axis.title.x=element_blank())
plotnemsec <- ggplot(dat_analysis, aes(x=Fam_Income, y = NEM, fill = Fam_Income))+
geom_boxplot()+
theme_bw()+
labs(x = "Family Income", y = "NEM", fill = "Family Income")+
scale_x_discrete(labels = c('','','','',''))+
#scale_fill_discrete(guide = "none")+
scale_y_continuous(limits = c(150, 850), breaks = seq(400, 800, 200)) +
theme(axis.title.x=element_blank())
(plotmathsec+plotlensec)/(plotrkgsec+plotnemsec)
# First, we fit an empty or null model to check what the model is fitting
Empty_mod<- glm(ADM ~ 1, family = binomial(link = 'logit'), data = dat_analysis)
summary(Empty_mod)
tidy(Empty_mod) # From broom,
# The model is predicting the logit of the probability of being admitted
round(exp(coef(Empty_mod)) / (1 + exp(coef(Empty_mod))),4)
round(prop.table(table(dat_analysis$ADM)),4)
Call:
glm(formula = ADM ~ 1, family = binomial(link = "logit"), data = dat_analysis)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.7333 0.7097 0.7097 0.7097 0.7097
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.250361 0.008103 154.3 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 93314 on 87994 degrees of freedom
Residual deviance: 93314 on 87994 degrees of freedom
AIC: 93316
Number of Fisher Scoring iterations: 4
# A tibble: 1 × 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 1.25 0.00810 154. 0
(Intercept)
0.7774
0 1
0.2226 0.7774
# Now, we fit and compare different model versions until a satisfactory solution is found
# Full model
adm_mod1 <- glm(
formula = ADM ~ 1 + NEM_55 + RKG_55 + LEN_55 + MATH_55 +
GENDER + INCOME_Q2 + INCOME_Q3 + INCOME_Q4 + INCOME_Q5 +
ED_SECONDARY + ED_TERTIARY + SECTOR_SUB + SECTOR_PRIV + TYPE,
family = binomial(link = 'logit'),
data = dat_analysis)
summary(adm_mod1)
# Second version of the model (- School Type)
adm_mod2 <- glm(
formula = ADM ~ 1 + NEM_55 + RKG_55 + LEN_55 + MATH_55 +
GENDER + INCOME_Q2 + INCOME_Q3 + INCOME_Q4 + INCOME_Q5 +
ED_SECONDARY + ED_TERTIARY + SECTOR_SUB + SECTOR_PRIV,
family = binomial(link = 'logit'),
data = dat_analysis)
summary(adm_mod2)
# LRT
DevTest = -2*(logLik(adm_mod2) - logLik(adm_mod1))
RegPvalue = pchisq((DevTest), df=1, lower.tail=FALSE)
MixPvalue = RegPvalue/2
DevTest; RegPvalue; MixPvalue
# Third version of the model ( - Education Mother)
adm_mod3 <- glm(
formula = ADM ~ 1 + NEM_55 + RKG_55 + LEN_55 + MATH_55 +
GENDER + INCOME_Q2 + INCOME_Q3 + INCOME_Q4 + INCOME_Q5 +
SECTOR_SUB + SECTOR_PRIV,
family = binomial(link = 'logit'),
data = dat_analysis)
summary(adm_mod3)
# LRT comparing the second vs the third version
DevTest = -2*(logLik(adm_mod3) - logLik(adm_mod2))
RegPvalue = pchisq((DevTest), df=2, lower.tail=FALSE)
MixPvalue = RegPvalue/2
DevTest; RegPvalue; MixPvalue
wald_adm_mod3 = glht(adm_mod2, linfct = c('NEM_55 = 0', 'RKG_55 = 0', 'LEN_55 = 0', 'MATH_55 = 0',
'GENDER = 0', 'INCOME_Q2 = 0', 'INCOME_Q3 = 0', 'INCOME_Q4 = 0', 'INCOME_Q5 = 0',
'SECTOR_SUB = 0' , 'SECTOR_PRIV = 0'))
summary(wald_adm_mod3, test = Chisqtest()) # Joint chi-square test
adm_mod3_1 <- glm(
formula = ADM ~ 1 + NEM_55 + RKG_55 + LEN_55 + MATH_55 +
GENDER + INCOME_Q2 + INCOME_Q3 + INCOME_Q4 + INCOME_Q5 +
SECTOR_SUB + SECTOR_PRIV,
family = binomial(link = 'probit'),
data = dat_analysis)
According to the results of the relative fit between models,
mother education
and school type
were
statistically non-significant predictors, so they were removed. Our
final results were reported in Table 3.
Table 3
Binary Logistic Regression Results (n = 87,995)
mod_results <- tidy(adm_mod3)
mod_results <- round(mod_results[2:5],3)
coef <- data.frame(adm_mod3$coefficients)
coef_names <- rownames(coef)
flextable(cbind(coef_names,mod_results)) # Coefficients in Logit scale
coef_names | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 1.352 | 0.024 | 55.270 | 0.000 |
NEM_55 | -0.011 | 0.004 | -2.593 | 0.010 |
RKG_55 | 0.022 | 0.003 | 6.247 | 0.000 |
LEN_55 | 0.024 | 0.001 | 20.901 | 0.000 |
MATH_55 | 0.036 | 0.001 | 29.528 | 0.000 |
GENDER | -0.335 | 0.018 | -18.917 | 0.000 |
INCOME_Q2 | -0.064 | 0.023 | -2.765 | 0.006 |
INCOME_Q3 | -0.075 | 0.027 | -2.790 | 0.005 |
INCOME_Q4 | -0.108 | 0.027 | -4.007 | 0.000 |
INCOME_Q5 | -0.023 | 0.029 | -0.775 | 0.438 |
SECTOR_SUB | -0.071 | 0.019 | -3.655 | 0.000 |
SECTOR_PRIV | 0.165 | 0.032 | 5.116 | 0.000 |
surr_rsq(adm_mod3_1, adm_mod3_1, dat_analysis, avg.num = 100) # Surrogate R2
------------------------------------------------------------------------
The surrogate R-squared of the model
------------------------------------------------------------------------
ADM ~ 1 + NEM_55 + RKG_55 + LEN_55 + MATH_55 + GENDER + INCOME_Q2 +
INCOME_Q3 + INCOME_Q4 + INCOME_Q5 + SECTOR_SUB + SECTOR_PRIV
------------------------------------------------------------------------
is:
[1] 0.10161
exp(adm_mod3$coefficients) # Computing Odds Ratio for each predictor
(Intercept) NEM_55 RKG_55 LEN_55 MATH_55 GENDER
3.8659858 0.9890920 1.0219394 1.0243255 1.0366175 0.7152494
INCOME_Q2 INCOME_Q3 INCOME_Q4 INCOME_Q5 SECTOR_SUB SECTOR_PRIV
0.9376850 0.9281447 0.8974563 0.9775428 0.9318013 1.1795144
Figure 3
Binary Logistic Regression Results (n = 87,995)
dat <- read.table(header=TRUE, text='
Fixed_effects Odds_Ratio
NEM_550 0.989
Ranking_550 1.022
Language_550 1.024
Mathematics_550 1.037
Sex_Female 0.715
Income_Q2 0.938
Income_Q3 0.928
Income_Q4 0.897
Income_Q5 0.978
Sector_Private_Subsidized 0.932
Sector_Private 1.180')
acad_factors <- ggplot(dat[c(1:4),], aes(x = Odds_Ratio, y = Fixed_effects))+
geom_point(size = 2.5)+
geom_vline(xintercept = 1, linetype="dashed",
color = "blue", size=1) +
scale_x_continuous(limits = c(.70, 1.25), breaks = seq(.70, 1.25, 0.1))+
labs(x = 'Odds Ratio', y = 'Predictors')+
scale_y_discrete(limits=rev)+
theme_bw(base_size = 14)+
ggtitle("Academic factors")+
theme(axis.title = element_text(size = 12), plot.title = element_text(size = 12))
nonacad_factors <- ggplot(dat[c(5:11),], aes(x = Odds_Ratio, y = Fixed_effects))+
geom_point(size = 2.5)+
geom_vline(xintercept = 1, linetype="dashed",
color = "blue", size=1) +
scale_x_continuous(limits = c(.70, 1.25), breaks = seq(.70, 1.25, 0.1))+
labs(x = 'Odds Ratio', y = '')+
scale_y_discrete(limits=rev)+
theme_bw(base_size = 14)+
ggtitle("Non-academic factors")+
theme(axis.title = element_text(size = 12), plot.title = element_text(size = 12))
acad_factors+nonacad_factors
Figure 4
Estimated Probabilities of Being Admitted (n = 87,995)
prob_mod <- dat_analysis %>%
dplyr::select(ADM,NEM_55,RKG_55,LEN_55,MATH_55,GENDER,INCOME_Q2,INCOME_Q3,INCOME_Q4,INCOME_Q5,SECTOR_SUB,SECTOR_PRIV) %>%
mutate(probability = predict(adm_mod3, type = 'response'))
probs <- (summary(glht(model=adm_mod3, linfct=rbind(
"Males_Pub_Q1" = c(1,0,0,0,0,0,0,0,0,0,0,0),
"Males_Pub_Q2" = c(1,0,0,0,0,0,1,0,0,0,0,0),
"Males_Pub_Q3" = c(1,0,0,0,0,0,0,1,0,0,0,0),
"Males_Pub_Q4" = c(1,0,0,0,0,0,0,0,1,0,0,0),
"Males_Pub_Q5" = c(1,0,0,0,0,0,0,0,0,1,0,0),
"Males_Sub_Q1" = c(1,0,0,0,0,0,0,0,0,0,1,0),
"Males_Sub_Q2" = c(1,0,0,0,0,0,1,0,0,0,1,0),
"Males_Sub_Q3" = c(1,0,0,0,0,0,0,1,0,0,1,0),
"Males_Sub_Q4" = c(1,0,0,0,0,0,0,0,1,0,1,0),
"Males_Sub_Q5" = c(1,0,0,0,0,0,0,0,0,1,1,0),
"Males_Pri_Q1" = c(1,0,0,0,0,0,0,0,0,0,0,1),
"Males_Pri_Q2" = c(1,0,0,0,0,0,1,0,0,0,0,1),
"Males_Pri_Q3" = c(1,0,0,0,0,0,0,1,0,0,0,1),
"Males_Pri_Q4" = c(1,0,0,0,0,0,0,0,1,0,0,1),
"Males_Pri_Q5" = c(1,0,0,0,0,0,0,0,0,1,0,1),
"Females_Pub_Q1" = c(1,0,0,0,0,1,0,0,0,0,0,0),
"Females_Pub_Q2" = c(1,0,0,0,0,1,1,0,0,0,0,0),
"Females_Pub_Q3" = c(1,0,0,0,0,1,0,1,0,0,0,0),
"Females_Pub_Q4" = c(1,0,0,0,0,1,0,0,1,0,0,0),
"Females_Pub_Q5" = c(1,0,0,0,0,1,0,0,0,1,0,0),
"Females_Sub_Q1" = c(1,0,0,0,0,1,0,0,0,0,1,0),
"Females_Sub_Q2" = c(1,0,0,0,0,1,1,0,0,0,1,0),
"Females_Sub_Q3" = c(1,0,0,0,0,1,0,1,0,0,1,0),
"Females_Sub_Q4" = c(1,0,0,0,0,1,0,0,1,0,1,0),
"Females_Sub_Q5" = c(1,0,0,0,0,1,0,0,0,1,1,0),
"Females_Pri_Q1" = c(1,0,0,0,0,1,0,0,0,0,0,1),
"Females_Pri_Q2" = c(1,0,0,0,0,1,1,0,0,0,0,1),
"Females_Pri_Q3" = c(1,0,0,0,0,1,0,1,0,0,0,1),
"Females_Pri_Q4" = c(1,0,0,0,0,1,0,0,1,0,0,1),
"Females_Pri_Q5" = c(1,0,0,0,0,1,0,0,0,1,0,1))),test=adjusted("none")))
logit_pr <- logit_prob(probs[["test"]][["coefficients"]]) # Inverse logit link function
logit_pr <- data.frame(logit_pr)
logit_pr
logit_pr
Males_Pub_Q1 0.7944918
Males_Pub_Q2 0.7837874
Males_Pub_Q3 0.7820493
Males_Pub_Q4 0.7762640
Males_Pub_Q5 0.7907585
Males_Sub_Q1 0.7827188
Males_Sub_Q2 0.7715774
Males_Sub_Q3 0.7697701
Males_Sub_Q4 0.7637572
Males_Sub_Q5 0.7788312
Males_Pri_Q1 0.8201434
Males_Pri_Q2 0.8104564
Males_Pri_Q3 0.8088804
Males_Pri_Q4 0.8036285
Males_Pri_Q5 0.8167687
Females_Pub_Q1 0.7344059
Females_Pub_Q2 0.7216682
Females_Pub_Q3 0.7196094
Females_Pub_Q4 0.7127754
Females_Pub_Q5 0.7299521
Females_Sub_Q1 0.7204021
Females_Sub_Q2 0.7072605
Females_Sub_Q3 0.7051387
Females_Sub_Q4 0.6980999
Females_Sub_Q5 0.7158043
Females_Pri_Q1 0.7653423
Females_Pri_Q2 0.7535905
Females_Pri_Q3 0.7516866
Females_Pri_Q4 0.7453577
Females_Pri_Q5 0.7612386
tab_prob <- cbind(
a = data.frame(logit_pr[1:5,]),
b = data.frame(logit_pr[6:10,]),
c = data.frame(logit_pr[11:15,]),
d = data.frame(logit_pr[16:20,]),
e = data.frame(logit_pr[21:25,]),
f = data.frame(logit_pr[26:30,]))
colnames(tab_prob) <- c('male_pub', 'male_sub', 'male_priv','fem_pub', 'fem_sub', 'fem_priv')
rownames(tab_prob) <- c('Q1', 'Q2', 'Q3','Q4', 'Q5')
flextable(cbind(x = c('Q1', 'Q2', 'Q3','Q4', 'Q5'), round(tab_prob,3)))
x | male_pub | male_sub | male_priv | fem_pub | fem_sub | fem_priv |
---|---|---|---|---|---|---|
Q1 | 0.794 | 0.783 | 0.820 | 0.734 | 0.720 | 0.765 |
Q2 | 0.784 | 0.772 | 0.810 | 0.722 | 0.707 | 0.754 |
Q3 | 0.782 | 0.770 | 0.809 | 0.720 | 0.705 | 0.752 |
Q4 | 0.776 | 0.764 | 0.804 | 0.713 | 0.698 | 0.745 |
Q5 | 0.791 | 0.779 | 0.817 | 0.730 | 0.716 | 0.761 |
Figure 5
Estimate Probabilities of Admission According to Biological Sex, School Sector, and Family Income Quintile (n = 87,995)
data_prob <- data.frame(Quintil = c('Q1', 'Q2', 'Q3', 'Q4', 'Q5'), tab_prob)
males_probs <- ggplot(data_prob, aes(x = Quintil, group=1))+
geom_point(aes(y = male_pub))+
geom_line(aes(y = male_pub), color = 'red', size = 1)+
geom_point(aes(y = male_sub))+
geom_line(aes(y = male_sub), color = 'green', size = 1)+
geom_point(aes(y = male_priv))+
geom_line(aes(y = male_priv), color = 'blue', size = 1)+
scale_y_continuous(limits = c(.6, .9), breaks = seq(.6, .9, .1))+
labs(x = 'Quintile family income', y = 'Probability of admission')+
theme_bw(base_size = 14)+
ggtitle("Males")+
theme(axis.title = element_text(size = 12), plot.title = element_text(size = 12))
females_probs <- ggplot(data_prob, aes(x = Quintil, group=1))+
geom_point(aes(y = fem_pub))+
geom_line(aes(y = fem_pub), color = 'red', size = 1)+
geom_point(aes(y = fem_sub))+
geom_line(aes(y = fem_sub), color = 'green', size = 1)+
geom_point(aes(y = fem_priv))+
geom_line(aes(y = fem_priv), color = 'blue', size = 1)+
scale_y_continuous(limits = c(.6, .9), breaks = seq(.6, .9, .1))+
labs(x = 'Quintile family income', y = '')+
theme_bw(base_size = 14)+
ggtitle("Females")+
theme(axis.title = element_text(size = 12), plot.title = element_text(size = 12))
males_probs+females_probs
sessionInfo()
R version 4.2.3 (2023-03-15 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dvmisc_1.1.4 rbenchmark_1.0.0 patchwork_1.2.0 DiagrammeR_1.0.11
[5] SurrogateRsq_0.2.0 scales_1.3.0 progress_1.2.2 PAsso_0.1.10
[9] DescTools_0.99.50 flextable_0.9.3 broom_1.0.5 ggplot2_3.5.0
[13] kableExtra_1.3.4 forcats_1.0.0 dplyr_1.1.3 readr_2.1.4
[17] multcomp_1.4-25 TH.data_1.1-2 MASS_7.3-58.2 survival_3.5-7
[21] mvtnorm_1.2-3 knitr_1.43 klippy_0.0.0.9500
loaded via a namespace (and not attached):
[1] readxl_1.4.3 uuid_1.1-1 backports_1.4.1
[4] copula_1.1-3 workflowr_1.7.1 VGAM_1.1-9
[7] systemfonts_1.0.4 plyr_1.8.8 lazyeval_0.2.2
[10] copBasic_2.2.3 splines_4.2.3 pspline_1.0-19
[13] digest_0.6.33 foreach_1.5.2 htmltools_0.5.6
[16] fansi_1.0.4 magrittr_2.0.3 tzdb_0.4.0
[19] vroom_1.6.3 officer_0.6.2 stabledist_0.7-1
[22] sandwich_3.0-2 svglite_2.1.1 askpass_1.2.0
[25] prettyunits_1.1.1 gfonts_0.2.0 colorspace_2.1-0
[28] rvest_1.0.3 mitools_2.4 textshaping_0.3.6
[31] xfun_0.40 crayon_1.5.2 jsonlite_1.8.7
[34] Exact_3.2 zoo_1.8-12 iterators_1.0.14
[37] glue_1.6.2 gtable_0.3.4 webshot_0.5.5
[40] lmomco_2.4.14 fontquiver_0.2.1 DBI_1.1.3
[43] GGally_2.2.1 Rcpp_1.0.11 viridisLite_0.4.2
[46] xtable_1.8-4 bit_4.0.5 randtoolbox_2.0.4
[49] proxy_0.4-27 survey_4.2-1 stats4_4.2.3
[52] fontLiberation_0.1.0 htmlwidgets_1.6.2 httr_1.4.7
[55] RColorBrewer_1.1-3 ellipsis_0.3.2 farver_2.1.1
[58] pkgconfig_2.0.3 sass_0.4.7 utf8_1.2.3
[61] crul_1.4.0 tidyselect_1.2.0 rlang_1.1.1
[64] later_1.3.1 tab_5.1.1 visNetwork_2.1.2
[67] munsell_0.5.0 cellranger_1.1.0 tools_4.2.3
[70] cachem_1.0.8 cli_3.6.1 generics_0.1.3
[73] evaluate_0.21 stringr_1.5.0 fastmap_1.1.1
[76] yaml_2.3.7 ragg_1.2.5 goftest_1.2-3
[79] bit64_4.0.5 fs_1.6.3 zip_2.3.0
[82] purrr_1.0.2 rootSolve_1.8.2.4 mime_0.12
[85] rngWELL_0.10-9 xml2_1.3.5 compiler_4.2.3
[88] rstudioapi_0.15.0 plotly_4.10.3 curl_5.0.2
[91] e1071_1.7-13 tibble_3.2.1 bslib_0.5.1
[94] pcaPP_2.0-4 stringi_1.7.12 gsl_2.1-8
[97] highr_0.10 gdtools_0.3.3 lattice_0.20-45
[100] Matrix_1.6-1 fontBitstreamVera_0.1.1 vctrs_0.6.3
[103] pillar_1.9.0 lifecycle_1.0.3 ADGofTest_0.3
[106] jquerylib_0.1.4 data.table_1.14.8 lmom_3.0
[109] httpuv_1.6.11 R6_2.5.1 promises_1.2.1
[112] gridExtra_2.3 gld_2.6.6 codetools_0.2-19
[115] Lmoments_1.3-1 ggstats_0.5.1 boot_1.3-28.1
[118] assertthat_0.2.1 openssl_2.1.0 rprojroot_2.0.3
[121] withr_2.5.0 httpcode_0.3.0 parallel_4.2.3
[124] expm_0.999-7 hms_1.1.3 grid_4.2.3
[127] tidyr_1.3.0 class_7.3-21 rmarkdown_2.24
[130] git2r_0.33.0 numDeriv_2016.8-1.1 shiny_1.7.5