Commit 0c65393

aoliveram, Copilot, and gvegayon authored
Add WKU Epi Game dataset files (#70)
* Fix round_to_seq example: change plot(w,x) to plot(w)
  Co-authored-by: gvegayon <893619+gvegayon@users.noreply.github.com>
* Bump version to 1.24.1, update NEWS.md and README.md
  Co-authored-by: gvegayon <893619+gvegayon@users.noreply.github.com>
* Add WKU Epi Game dataset files
* Add data generation scripts for epigames dataset
* Add and compile Roxygen2 documentation for epigames/wku dataset
* Add collapse_timeframes utility
* Add epigames hourly raw dataset
* Add epigames diffnet object pipeline
* Add epigames dataset documentation and update namespace
* Remove old unused variables, datasets
* Use usethis::use_data() in data-raw scripts
* Consolidate epigames docs in data.r, remove standalone epigames.R
* Rename epigames_raw -> epigames, regenerate rda files
* Regenerate Rd files: add wku_diffnet, fix epigames aliases
* Add tests for collapse_timeframes
* Set compress='xz' in use_data calls
* Fix epigames data: add TOA, fix ID mapping, remove right-censoring
* Bump version to 1.24.2, update NEWS, refine tests
* feat: add binarize, cumulative and symmetric params to collapse_timeframes
* test: add tests for new collapse_timeframes parameters
* refactor: update data-raw scripts to use new hourly data and params
* data: update epigames datasets with 594-node hourly source
* rem: remove wku_diffnet.RData and its documentation
* chore: bump version to 1.25.0
* doc: consolidate news for version 1.25.0
* style: cleanup and formatting in news, data docs and collapse_timeframes

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gvegayon <893619+gvegayon@users.noreply.github.com>
1 parent 91a6c71 commit 0c65393

28 files changed

Lines changed: 762 additions & 38 deletions

DESCRIPTION

Lines changed: 4 additions & 3 deletions
@@ -1,14 +1,14 @@
 Package: netdiffuseR
 Title: Analysis of Diffusion and Contagion Processes on Networks
-Version: 1.24.0
+Version: 1.25.0
 Authors@R: c(
     person("George", "Vega Yon", email="g.vegayon@gmail.com", role=c("aut", "cre"),
     comment=c(ORCID = "0000-0002-3171-0844", what="Rewrite functions with Rcpp, plus new features")
     ),
     person("Thomas", "Valente", email="tvalente@usc.edu", role=c("aut", "cph"),
     comment=c(ORCID="0000-0002-8824-5816", what="R original code")),
     person("Anibal", "Olivera Morales", role = c("aut", "ctb"),
-    comment=c(ORCID="0009-0000-3736-7939", what="Multi-diffusion version")),
+    comment=c(ORCID="0009-0000-3736-7939", what="Developer from V1.23.0")),
     person("Stephanie", "Dyal", email="stepharp@usc.edu", role=c("ctb"), comment="Package's first version"),
     person("Timothy", "Hayes", email="timothybhayes@gmail.com", role=c("ctb"), comment="Package's first version")
     )
@@ -21,7 +21,7 @@ Description: Empirical statistical analysis, visualization and simulation of
     9781881303213>, Myers (2000) <DOI:10.1086/303110>, Iyengar and others (2011)
     <DOI:10.1287/mksc.1100.0566>, Burt (1987) <DOI:10.1086/228667>; among others.
 Depends:
-    R (>= 3.1.1)
+    R (>= 3.5)
 License: MIT + file LICENSE
 LazyData: true
 Imports:
@@ -65,6 +65,7 @@ Collate:
     'bass.r'
     'bootnet.r'
     'citer_environment.R'
+    'collapse_timeframes.R'
     'data.r'
     'degree_adoption_diagnostic.R'
     'diffnet-c.R'

NAMESPACE

Lines changed: 1 addition & 0 deletions
@@ -98,6 +98,7 @@ export(bootnet)
 export(classify)
 export(classify_adopters)
 export(classify_graph)
+export(collapse_timeframes)
 export(compare_matrix)
 export(cumulative_adopt_count)
 export(degree_adoption_diagnostic)

NEWS.md

Lines changed: 15 additions & 0 deletions
@@ -1,3 +1,18 @@
+# Changes in netdiffuseR version 1.25.0 (2026-03-14)
+
+* New function `collapse_timeframes()`: aggregates high-resolution or
+  continuous-time longitudinal edgelists into discrete time windows, ready
+  for use with `edgelist_to_adjmat()` or `as_diffnet()`. The function contains
+  parameters such as `binarize`, `cumulative`, and `symmetric` for better control
+  over the aggregation process.
+
+* New dataset `epigames` and `epigamesDiffNet`: a simulated epidemic game
+  network with 594 nodes and 15 time periods from the WKU Epi Games study.
+
+* Fixed CRAN example error in `round_to_seq()`: `plot(w, x)` replaced with
+  `plot(w)` to avoid `%||%` operator issue in R 4.4.0+'s `formula.default`
+  when called via `plot.data.frame()`.
+
 # Changes in netdiffuseR version 1.24.0 (2025-12-09)
 
 * New function `degree_adoption_diagnostic()` analyzes the correlation between network
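
As a rough illustration of what the `binarize` and `symmetric` options named in the 1.25.0 NEWS entry do, the base-R sketch below applies the same kind of post-processing to a small, made-up aggregated edgelist. The data frame `agg`, its values, and the helper name `symmetrize` are hypothetical; the code is not the package's implementation and only mirrors the `unique(rbind(...))` pattern that appears in `R/collapse_timeframes.R` in this commit.

```r
# Hypothetical aggregated edgelist: one time window, two opposite edges
agg <- data.frame(
  sender   = c(1, 2),
  receiver = c(2, 1),
  time     = c(1, 1),
  weight   = c(3, 1)
)

# Hypothetical helper: append reversed edges, then drop exact duplicate rows
symmetrize <- function(el) {
  reversed <- el
  reversed$sender   <- el$receiver
  reversed$receiver <- el$sender
  out <- unique(rbind(el, reversed))
  rownames(out) <- NULL
  out
}

# binarize: every edge that survives aggregation gets weight 1
bin <- agg
bin$weight <- 1

sym_binary   <- symmetrize(bin)  # reversed rows are exact duplicates
sym_weighted <- symmetrize(agg)  # weights differ, so reversed rows are kept

nrow(sym_binary)
nrow(sym_weighted)
```

In this sketch the binarized edgelist stays at two rows, while the weighted one grows to four, because rows that differ only in weight are not duplicates. Whether that matters depends on how the result is consumed downstream.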

R/collapse_timeframes.R

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
+#' Collapse Timeframes in a Longitudinal Edgelist
+#'
+#' @description
+#' Allows users to take a high-resolution or continuous-time longitudinal
+#' edgelist and dynamically collapse or discretize it into larger time windows.
+#' The output is a shorter, aggregated edgelist ready to be passed into
+#' \code{[edgelist_to_adjmat]} or \code{[as_diffnet]}.
+#'
+#' @param edgelist A \code{data.frame} representing the longitudinal edgelist.
+#' @param ego Character scalar. Name of the column representing the ego (sender).
+#' @param alter Character scalar. Name of the column representing the alter (receiver).
+#' @param timevar Character scalar. Name of the column representing the time variable.
+#' @param weightvar Character scalar or \code{NULL}. Name of the column representing
+#' the edge weight. If \code{NULL}, the function tallies the number of interactions
+#' within the time window as the weight.
+#' @param window_size Numeric scalar. The size of the time window to collapse into.
+#' @param time_format Character scalar or \code{NULL}. If the time variable is a
+#' character or factor, the format passed to \code{as.POSIXct}.
+#' For example, \code{"\%d-\%m-\%Y \%H:\%M"}.
+#' @param relative_time Logical scalar. If \code{TRUE}, normalizes the binned
+#' times into a strict integer sequence starting at 1 (1, 2, 3...).
+#' @param binarize Logical scalar. If \code{TRUE}, sets all resulting edge weights to 1.
+#' @param cumulative Logical scalar. If \code{TRUE}, edges from previous time windows
+#' are carried over to subsequent windows.
+#' @param symmetric Logical scalar. If \code{TRUE}, the resulting graph will be
+#' symmetrized (i.e., if an edge A->B exists, an edge B->A is added).
+#'
+#' @return A \code{data.frame} with 4 columns: the ego, the alter, the new collapsed
+#' discrete time, and the aggregated weight.
+#'
+#' @export
+#' @examples
+#' \dontrun{
+#' # Load the package's hourly dataset
+#' load(system.file("data/epigames_raw.rda", package = "netdiffuseR"))
+#'
+#' # Collapse the hourly edgelist into a daily edgelist (window_size = 24)
+#' daily_edgelist <- collapse_timeframes(
+#' edgelist = epigames_raw$edgelist,
+#' timevar = "time",
+#' weightvar = "weight",
+#' window_size = 24
+#' )
+#' head(daily_edgelist)
+#' }
+collapse_timeframes <- function(
+    edgelist,
+    ego = "sender",
+    alter = "receiver",
+    timevar = "time",
+    weightvar = NULL,
+    window_size = 1,
+    time_format = NULL,
+    relative_time = TRUE,
+    binarize = FALSE,
+    cumulative = FALSE,
+    symmetric = FALSE) {
+  # Step 1: Time Column Parsing
+  time_raw <- edgelist[[timevar]]
+
+  if (is.character(time_raw) || is.factor(time_raw)) {
+    if (!is.null(time_format)) {
+      time_raw <- as.numeric(as.POSIXct(as.character(time_raw), format = time_format))
+    } else {
+      time_raw <- as.numeric(as.POSIXct(as.character(time_raw)))
+    }
+  } else if (!is.numeric(time_raw) && !is.integer(time_raw)) {
+    # e.g., Date or POSIXct already
+    time_raw <- as.numeric(time_raw)
+  }
+
+  # Check for NAs after conversion
+  if (any(is.na(time_raw))) {
+    warning("There are NA values in the parsed time variable.")
+  }
+
+  # Step 2: Binning / Window Creation
+  t_min <- min(time_raw, na.rm = TRUE)
+  # Adding a tiny offset so min time doesn't fall out of bounds or shift unnecessarily
+  discrete_time <- ceiling((time_raw - t_min + 1e-9) / window_size)
+  # Ensure the minimum index is 1 at this stage
+  min_dt <- min(discrete_time, na.rm = TRUE)
+  if (min_dt < 1) {
+    discrete_time <- discrete_time - min_dt + 1
+  }
+
+  # Step 3: Handling relative_time
+  if (relative_time) { # e.g. strict sequence 1, 2, 3
+    sorted_unique_times <- sort(unique(discrete_time[!is.na(discrete_time)]))
+    time_map <- stats::setNames(seq_along(sorted_unique_times), sorted_unique_times)
+    discrete_time <- unname(time_map[as.character(discrete_time)])
+  }
+
+  # Create a working data frame to hold things
+  work_df <- data.frame(
+    ego_col = edgelist[[ego]],
+    alter_col = edgelist[[alter]],
+    time_col = discrete_time
+  )
+
+  # Step 4: Aggregation
+  if (is.null(weightvar)) {
+    work_df$weight_col <- 1
+  } else {
+    work_df$weight_col <- edgelist[[weightvar]]
+  }
+
+  # Remove rows with NAs in essential grouping variables
+  work_df <- work_df[!is.na(work_df$ego_col) & !is.na(work_df$alter_col) & !is.na(work_df$time_col), ]
+
+  agg_df <- stats::aggregate(
+    weight_col ~ ego_col + alter_col + time_col,
+    data = work_df,
+    FUN = sum,
+    na.rm = TRUE
+  )
+
+  # Step 5: Output with 4 clean columns
+  weight_col_name <- if (is.null(weightvar)) "weight" else weightvar
+  colnames(agg_df) <- c(ego, alter, timevar, weight_col_name)
+
+  # Step 6: Post-aggregation processing
+
+  # 6.1 Binarize
+  if (binarize) {
+    agg_df[[weight_col_name]] <- 1
+  }
+
+  # 6.2 Symmetrize
+  if (symmetric) {
+    rev_df <- agg_df
+    rev_df[[ego]] <- agg_df[[alter]]
+    rev_df[[alter]] <- agg_df[[ego]]
+
+    # Combine and de-duplicate (in case they already existed symmetrically)
+    agg_df <- unique(rbind(agg_df, rev_df))
+  }
+
+  # 6.3 Cumulative
+  if (cumulative) {
+    all_periods <- sort(unique(agg_df[[timevar]]))
+    if (length(all_periods) > 1) {
+      cumulative_el <- agg_df[agg_df[[timevar]] == all_periods[1], ]
+      for (t_idx in 2:length(all_periods)) {
+        t <- all_periods[t_idx]
+        current <- agg_df[agg_df[[timevar]] == t, ]
+        prev <- cumulative_el[cumulative_el[[timevar]] == all_periods[t_idx - 1], ]
+        if (nrow(prev) > 0) {
+          prev[[timevar]] <- t
+        }
+        # Combine current window with previous accumulated edges and de-duplicate
+        combined <- unique(rbind(current, prev))
+        cumulative_el <- rbind(cumulative_el, combined)
+      }
+      agg_df <- cumulative_el
+    }
+  }
+
+  # Apply standard sort for consistent outputs: time, ego, alter
+  order_idx <- order(agg_df[[timevar]], agg_df[[ego]], agg_df[[alter]])
+  agg_df <- agg_df[order_idx, ]
+  rownames(agg_df) <- NULL
+
+  return(agg_df)
+}
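
The binning in Steps 2 and 3 of `collapse_timeframes()` can be tried in isolation. The sketch below uses made-up hourly timestamps (the vector `time_raw` and its values are assumptions for illustration) and the same `ceiling()` formula as the diff. For the `relative_time` remapping it uses `match()` as a shorthand, where the committed code uses a named-vector lookup.

```r
# Hypothetical timestamps, in hours since the start of observation
time_raw    <- c(0, 5, 23, 24, 47, 48, 120)
window_size <- 24

# Step 2: same formula as in the diff; the tiny offset puts the
# earliest observation in window 1 instead of window 0
t_min         <- min(time_raw, na.rm = TRUE)
discrete_time <- ceiling((time_raw - t_min + 1e-9) / window_size)

# Step 3 (relative_time = TRUE): renumber the occupied windows 1, 2, 3, ...
# match() is a swapped-in shorthand for the named-vector lookup in the diff
occupied <- sort(unique(discrete_time))
relative <- match(discrete_time, occupied)

data.frame(time_raw, discrete_time, relative)
```

In this sketch a timestamp that lands exactly on a window boundary (24, 48) is assigned to the later window, and the empty windows between hour 48 and hour 120 disappear once the occupied windows are renumbered.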

R/data.r

Lines changed: 66 additions & 17 deletions
@@ -778,23 +778,23 @@ NULL # "medInnovationsDiffNet"
 #' the Brazilian Farmers collected as part of the three country study implemented
 #' by Everett Rogers (Rogers, Ascroft, & Röling, 1970), and Korean Family Planning
 #' data collected by researchers at the Seoul National University's School of
-#' Public (Rogers & Kincaid, 1981). The table below summarizes the three datasets:
-#'
-#' \tabular{lccc}{
-#' \tab \bold{Medical Innovation} \tab \bold{Brazilian Farmers} \tab \bold{Korean Family Planning} \cr
-#' \emph{Country} \tab USA \tab Brazil \tab Korean \cr
-#' \emph{# Respondents} \tab 125 Doctors \tab 692 Farmers \tab 1,047 Women \cr
-#' \emph{# Communities} \tab 4 \tab 11 \tab 25 \cr
-#' \emph{Innovation} \tab Tetracycline \tab Hybrid Corn Seed \tab Family Planning \cr
-#' \emph{Time for Diffusion} \tab 18 Months \tab 20 Years \tab 11 Years \cr
-#' \emph{Year Data Collected} \tab 1955-1956 \tab 1966 \tab 1973 \cr
-#' \emph{Ave. Time to 50\%} \tab 6 \tab 16 \tab 7 \cr
-#' \emph{Highest Saturation} \tab 0.89 \tab 0.98 \tab 0.83 \cr
-#' \emph{Lowest Saturation} \tab 0.81 \tab 0.29 \tab 0.44 \cr
-#' \emph{Citation} \tab Coleman et al (1966) \tab Rogers et al (1970) \tab Rogers & Kincaid (1981) \cr
-#' }
-#'
-#' All datasets include a column called \emph{study} which is coded as
+#' Public (Rogers & Kincaid, 1981). The table below summarizes the datasets:
+#'
+#' \tabular{lcccc}{
+#' \tab \bold{Medical Innovation} \tab \bold{Brazilian Farmers} \tab \bold{Korean Family Planning} \tab \bold{WKU Epi Games} \cr
+#' \emph{Country} \tab USA \tab Brazil \tab Korean \tab USA \cr
+#' \emph{# Respondents} \tab 125 Doctors \tab 692 Farmers \tab 1,047 Women \tab 594 Students \cr
+#' \emph{# Communities} \tab 4 \tab 11 \tab 25 \tab Multiple groups \cr
+#' \emph{Innovation} \tab Tetracycline \tab Hybrid Corn Seed \tab Family Planning \tab Masks/Medicine \cr
+#' \emph{Time for Diffusion} \tab 18 Months \tab 20 Years \tab 11 Years \tab 15 Periods \cr
+#' \emph{Year Data Collected} \tab 1955-1956 \tab 1966 \tab 1973 \tab Recent \cr
+#' \emph{Ave. Time to 50\%} \tab 6 \tab 16 \tab 7 \tab N/A \cr
+#' \emph{Highest Saturation} \tab 0.89 \tab 0.98 \tab 0.83 \tab N/A \cr
+#' \emph{Lowest Saturation} \tab 0.81 \tab 0.29 \tab 0.44 \tab N/A \cr
+#' \emph{Citation} \tab Coleman et al (1966) \tab Rogers et al (1970) \tab Rogers & Kincaid (1981) \tab WKU \cr
+#' }
+#'
+#' All core datasets include a column called \emph{study} which is coded as
 #' (1) Medical Innovation (2) Brazilian Farmers, (3) Korean Family Planning.
 #'
 #' @section Right censored data:
@@ -938,3 +938,52 @@ NULL
 #' @author George G. Vega Yon
 #' @name fakeEdgelist
 NULL # "fakeEdgelist"
+
+
+#' Epi Games Dataset
+#'
+#' @description
+#' The WKU Epi Games dataset represents a simulated epidemic or game environment with
+#' dynamic encounters over 15 time periods. It provides both node-level
+#' attributes and a longitudinal edgelist.
+#'
+#' @format A list with two data frames:
+#'
+#' **attributes**: A data frame with 594 rows and 9 variables representing nodes:
+#' \describe{
+#' \item{id}{Unique identifier for the participant.}
+#' \item{toa}{Time of Adoption (1 to 15), representing when the individual was first infected. Non-infected individuals have `NA`.}
+#' \item{qyes_total}{Cumulative count of times the player participated or scored positively in informative/educational "quarantine" questionnaires.}
+#' \item{qno_total}{Cumulative count of times the non-quarantine questionnaire factor was registered.}
+#' \item{mask_prop}{Proportion of time (across 15 steps) the participant used the mask intervention (0.0 to 1.0).}
+#' \item{med_prop}{Proportion of time the individual used pharmacological interventions or treatments.}
+#' \item{group}{Experimental group or node cohort.}
+#' \item{final_score}{Final score obtained in the game.}
+#' \item{status}{Final state label ("infected" or "not_infected").}
+#' }
+#'
+#' **edgelist**: A longitudinal data frame with 23,684 rows and 4 variables representing edges/contacts:
+#' \describe{
+#' \item{sender}{Origin node ID of the contact.}
+#' \item{receiver}{Destination node ID of the contact.}
+#' \item{time}{Time period of the contact (1 to 15).}
+#' \item{weight}{Strength, duration, or density of the exposure.}
+#' }
+#'
+#' @source WKU Epi Game simulation
+#' @family diffusion datasets
+#' @name epigames
+NULL # "epigames"
+
+#' \code{diffnet} version of the Epi Games data
+#'
+#' A directed dynamic graph with 594 vertices and 15 time periods. The attributes
+#' in the graph are described in \code{\link{epigames}}.
+#'
+#' Non-adopters have \code{toa = NA}.
+#'
+#' @format A \code{\link{diffnet}} class object.
+#' @source WKU Epi Game simulation
+#' @family diffusion datasets
+#' @name epigamesDiffNet
+NULL
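
Because the documentation above codes non-adopters as `toa = NA`, any summary over time of adoption has to handle missing values explicitly. The base-R sketch below is a minimal illustration with a made-up `toa` vector (the values are hypothetical, not taken from `epigames`); it computes the share of nodes that have adopted by each of 15 periods.

```r
# Hypothetical time-of-adoption vector; NA marks nodes that never adopt
toa     <- c(2, NA, 5, 2, 15, NA)
periods <- 1:15

# Share of nodes that have adopted by each period.
# !is.na(toa) guards the comparison, so NA never counts as adopted.
cum_adopt <- vapply(
  periods,
  function(t) mean(!is.na(toa) & toa <= t),
  numeric(1)
)

# With non-adopters present, the curve levels off below 1
max(cum_adopt)
```

The same pattern should carry over to the `toa` column of the `attributes` data frame, assuming it is stored as a numeric vector as the `@format` section describes.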

R/plot_diffnet2.r

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 #'
 #' x <- rnorm(100)
 #' w <- data.frame(as.integer(round_to_seq(x, as_factor = TRUE)),x)
-#' plot(w,x)
+#' plot(w)
 #'
 #' @seealso Used in \code{\link{diffmap}} and \code{\link{plot_diffnet2}}
 round_to_seq <- function(x, nlevels=20, as_factor=FALSE) {

README.md

Lines changed: 2 additions & 2 deletions
@@ -52,7 +52,7 @@ And the actual R package:
 Vega Yon G, Olivera Morales A, Valente T (2025). _netdiffuseR:
 Analysis of Diffusion and Contagion Processes on Networks_.
 doi:10.5281/zenodo.1039317 <https://doi.org/10.5281/zenodo.1039317>,
-R package version 1.24.0, <https://github.com/USCCANA/netdiffuseR>.
+R package version 1.24.1, <https://github.com/USCCANA/netdiffuseR>.
 
 To see these entries in BibTeX format, use 'print(<citation>,
 bibtex=TRUE)', 'toBibtex(.)', or set
@@ -374,7 +374,7 @@ sessionInfo()
 #> [1] stats graphics grDevices utils datasets methods base
 #>
 #> other attached packages:
-#> [1] netdiffuseR_1.24.0
+#> [1] netdiffuseR_1.24.1
 #>
 #> loaded via a namespace (and not attached):
 #> [1] Matrix_1.7-4 jsonlite_2.0.0 dplyr_1.1.4

data-raw/epigames.R

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
1+
# data-raw/epigames.R
2+
# Pre-processing script for the EpiGames Raw Dataset
3+
4+
rm(list = ls())
5+
6+
# The raw data consists of an attributes data frame and an hourly edgelist,
7+
# both using consistent node IDs (1-594).
8+
load("data-raw/epigames_hourly.rda")
9+
10+
epigames <- epigames_hourly
11+
12+
# Save compressed raw data
13+
usethis::use_data(epigames, overwrite = TRUE, compress = "xz")
