Introduction
- This demo script illustrates the use of the sotk2 package through a step-by-step workflow:
  - It demonstrates how to construct a Spatial Omics Set (soSet) by integrating (i) NMF-derived outputs as inputs and (ii) an all-pairs correlation matrix as the primary output.
  - The workflow then proceeds to create a spatial omics object (soObj), in which a metagene network is inferred by thresholding correlation strengths (for example, Spearman's rho > 0.5).
  - A community detection algorithm is subsequently applied to the resulting network to identify metagene communities, which can be interpreted as candidate biological modules.
  - Following community detection, each community can be systematically annotated using sample-level metadata (for example, molecular subtype labels such as the Verhaak classification) to facilitate biological interpretation.
  - Users may also incorporate external annotations or project-specific features onto the network to support tailored interpretation and downstream analyses.
Data download to run the demos
- Demo data are available from Zenodo (https://doi.org/10.5281/zenodo.18063318).
- You may download the files directly from the Zenodo record or use the provided helper script to retrieve them programmatically.
- Two data bundles are provided: core and full.
  - The core bundle contains five files comprising cNMF outputs for the GLASS, IVYGAP, and HEILAND datasets, along with the GLASS cohort expression matrix and sample-level annotation metadata.
  - The full bundle includes two additional files; however, UKF269_T_Visium.RDS (which contains a Visium v1 slide image) is approximately 542 MB. Because of its size, we recommend downloading the core bundle first to begin running the demos, and downloading UKF269_T_Visium.RDS in the background. This Visium object is required for the visualization steps later in the workflow.
- Downloads are handled by the package helper sotk2::download_demo_data(), which fetches the requested bundle and MD5-verifies each file. Set download_dir to control where the files land.
library(sotk2)
# Optional: choose a custom directory; otherwise the default
# tools::R_user_dir("sotk2", "data") is used.
# download_dir <- "/path/to/download"
options(timeout = max(1200, getOption("timeout"))) # 20 minutes
if (exists("download_dir") && is.character(download_dir) &&
    length(download_dir) == 1 && nzchar(download_dir)) {
  paths <- download_demo_data(set = "core", download_dir = download_dir)
} else {
  paths <- download_demo_data(set = "core")
}
## Checking demo data files... (set = core)
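The MD5 verification mentioned above can be illustrated with base R alone. The following is a minimal sketch of the idea, not the internals of download_demo_data(); the helper name verify_md5 and the temporary file are hypothetical and exist only for this illustration.

```r
# Hypothetical helper (not part of sotk2): compare a file's MD5 checksum
# against an expected value using tools::md5sum() from base R.
verify_md5 <- function(path, expected) {
  observed <- unname(tools::md5sum(path))  # NA if the file does not exist
  !is.na(observed) && identical(tolower(observed), tolower(expected))
}

# Stand-in file so the sketch runs without any downloaded data.
tmp <- tempfile(fileext = ".txt")
writeLines("sotk2 demo", tmp)
reference_md5 <- unname(tools::md5sum(tmp))  # plays the role of a published checksum

ok_before <- verify_md5(tmp, reference_md5)  # file is intact

# Simulate a corrupted or truncated download by changing the content.
writeLines("sotk2 demo (modified)", tmp)
ok_after <- verify_md5(tmp, reference_md5)   # checksum no longer matches
```

In practice the reference checksum would come from the data provider rather than from the file itself; a mismatch usually means the download should be repeated.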
Directory settings
- This block defines the working directory used throughout the demo to store downloaded inputs and generated outputs.
- If a valid download_dir variable already exists in the current R session, the script reuses that user-specified location.
- Otherwise, it defaults to a standard, package-specific data directory returned by tools::R_user_dir("sotk2", "data"), which provides a consistent and user-writable storage path across platforms.
# Keep a valid user-specified download_dir; otherwise fall back to the
# package-specific default directory.
if (!(exists("download_dir") && is.character(download_dir) &&
      length(download_dir) == 1 && nzchar(download_dir))) {
  download_dir <- tools::R_user_dir("sotk2", "data")
}
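Before loading anything, it can help to confirm that the expected input files are actually present in the chosen directory. The sketch below uses only base R; demo_dir is a hypothetical stand-in for download_dir so that the example runs on its own, and the three file names are the cNMF result files loaded later in this demo.

```r
# Hypothetical stand-in for download_dir; in the demo, use download_dir itself.
demo_dir <- file.path(tempdir(), "sotk2_demo")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)

# cNMF result files that the demo reads with readRDS() later on.
expected <- c("nmfRes_GLASS.RDS", "nmfRes_IVYGAP.RDS", "nmfRes_HEILAND.RDS")
present <- file.exists(file.path(demo_dir, expected))
names(present) <- expected

# Report anything missing instead of failing later inside readRDS().
if (!all(present)) {
  message("Missing file(s): ", paste(expected[!present], collapse = ", "))
}
```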
Load the sotk2 library
- If sotk2 is not yet installed, please refer to the project repository for installation instructions: https://github.com/Snyder-Institute/sotk2.
- After loading sotk2, its required dependencies (including igraph, methods, NMF, RColorBrewer, and stringr) are loaded to support the full demonstration workflow.
# install.packages("devtools")
# devtools::install_github("Snyder-Institute/sotk2")
library(sotk2)
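If loading fails, one quick diagnostic is to check which of the dependencies named above are installed. This is a small base R sketch, assuming only the package names listed in this section; note that requireNamespace() loads a namespace as a side effect when the package is available.

```r
# Dependencies named in this section.
deps <- c("igraph", "methods", "NMF", "RColorBrewer", "stringr")

# TRUE for each package whose namespace can be loaded, FALSE otherwise.
installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)

missing_deps <- deps[!installed]
if (length(missing_deps) > 0) {
  message("Not installed: ", paste(missing_deps, collapse = ", "))
}
```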
Load cNMF objects
- In this section, we load the precomputed cNMF result objects for three cohorts (GLASS, IVYGAP, and HEILAND) from the downloaded demo files.
- These cohort-specific objects are then assembled into a single named list (dataL), which serves as the primary input for subsequent sotk2 workflows.
- We additionally define:
  - The candidate factorization ranks to consider for each cohort (rankL)
  - A cohort-specific color palette (dataCol) to ensure consistent visualization across figures
  - The correlation method (corMethod) used to quantify metagene similarity during the downstream integration and network-construction steps
glass <- readRDS(file.path(download_dir, "nmfRes_GLASS.RDS"))
ivygap <- readRDS(file.path(download_dir, "nmfRes_IVYGAP.RDS"))
heiland <- readRDS(file.path(download_dir, "nmfRes_HEILAND.RDS"))
dataL <- list(
  GLASS = glass,
  IVYGAP = ivygap,
  HEILAND = heiland
)
rankL <- list(
  GLASS = c(3:10, 15, 20),
  IVYGAP = c(3:10, 15, 20),
  HEILAND = seq(5, 30, 5)
)
dataCol <- c(
  "GLASS" = "cyan3",
  "IVYGAP" = "chartreuse1",
  "HEILAND" = "magenta"
)
corMethod <- "spearman"
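Because dataL, rankL, and dataCol are matched by cohort name, a short consistency check can catch typos early. The sketch below redefines rankL and dataCol so it runs on its own, and it assumes that a rank-k factorization contributes k metagenes; under that assumption the total agrees with the #Metagenes count that SOSet() reports.

```r
cohorts <- c("GLASS", "IVYGAP", "HEILAND")  # names used in dataL

rankL <- list(
  GLASS = c(3:10, 15, 20),
  IVYGAP = c(3:10, 15, 20),
  HEILAND = seq(5, 30, 5)
)
dataCol <- c(GLASS = "cyan3", IVYGAP = "chartreuse1", HEILAND = "magenta")

# Every cohort needs a rank specification and a color.
stopifnot(setequal(names(rankL), cohorts))
stopifnot(setequal(names(dataCol), cohorts))

# All colors should be named R colors.
stopifnot(all(dataCol %in% grDevices::colors()))

# Assumption: a rank-k factorization yields k metagenes.
nMetagenes <- vapply(rankL, sum, numeric(1))
nMetagenes       # GLASS 87, IVYGAP 87, HEILAND 105
sum(nMetagenes)  # 279
```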
Calculate pairwise correlations
- In this section, we construct a Spatial Omics Set (soSet) by integrating the cohort-specific cNMF objects and the corresponding rank specifications.
- Internally, SOSet() concatenates the metagene loading matrices (the W matrices) across cohorts and selected ranks, thereby placing all metagenes into a common representation suitable for cross-dataset comparison.
- It then computes an all-pairs correlation matrix across the concatenated metagenes using the specified correlation metric (corMet; here, Spearman).
- The resulting soSet object stores the inputs, cohort metadata (including visualization colors), and the computed correlation structure, which serves as the basis for downstream network inference and community detection.
soSet <- SOSet(
  NMFobjL = dataL,
  NMFrankL = rankL,
  dataCol = dataCol,
  corMet = corMethod
)
## 3 dataset(s) found in the list: GLASS, IVYGAP, HEILAND
## Loading: GLASS [Rank: 3, 4, 5, 6, 7, 8, 9, 10, 15, 20]
## Loading: IVYGAP [Rank: 3, 4, 5, 6, 7, 8, 9, 10, 15, 20]
## Loading: HEILAND [Rank: 5, 10, 15, 20, 25, 30]
## WARNING::Number of genes are different across datasets.
## Calculating: all pairwise correlation coefficients.
## Correlation computed with
## • Method: 'spearman'
## • Missing treated using: 'pairwise.complete.obs'
##
## Assigned color(s):
## > GLASS: cyan3
## > IVYGAP: chartreuse1
## > HEILAND: magenta
soSet
## Dataset(s):
## GLASS (cyan3)
## IVYGAP (chartreuse1)
## HEILAND (magenta)
## Select rank(s):
## GLASS: 3, 4, 5, 6, 7, 8, 9, 10, 15, 20
## IVYGAP: 3, 4, 5, 6, 7, 8, 9, 10, 15, 20
## HEILAND: 5, 10, 15, 20, 25, 30
## Basis (W) matrices:
## #Genes : 26316
## #Metagenes : 279
## Correlation method:
## spearman (pairwise.complete.obs)
## Correlation matrix:
## Symmetric matrix with 279 columns X 279 rows.
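To make the concatenation and correlation step more concrete, the following toy example mimics the idea in base R. It is only a sketch under stated assumptions, not the SOSet() implementation: the matrices W1 and W2, the gene names, and the helper pad_genes are hypothetical, and genes absent from a cohort are filled with NA so that pairwise.complete.obs restricts each correlation to the shared genes.

```r
set.seed(1)

# Two toy W matrices (genes x metagenes) with partially overlapping genes.
W1 <- matrix(runif(16), nrow = 8,
             dimnames = list(paste0("g", 1:8), c("A_k2_1", "A_k2_2")))
W2 <- matrix(runif(24), nrow = 8,
             dimnames = list(paste0("g", 3:10), c("B_k3_1", "B_k3_2", "B_k3_3")))

# Hypothetical helper: expand a W matrix to a common gene set, padding with NA.
pad_genes <- function(W, genes) {
  out <- matrix(NA_real_, nrow = length(genes), ncol = ncol(W),
                dimnames = list(genes, colnames(W)))
  out[rownames(W), ] <- W
  out
}

allGenes <- union(rownames(W1), rownames(W2))
Wall <- cbind(pad_genes(W1, allGenes), pad_genes(W2, allGenes))

# All-pairs Spearman correlation; each pair uses only genes observed in both.
corMat <- cor(Wall, method = "spearman", use = "pairwise.complete.obs")
dim(corMat)  # 5 5
```

With differing gene sets, cross-cohort correlations are computed on fewer genes than within-cohort ones, which is worth keeping in mind when choosing a threshold.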
Generate a correlation network
- This section converts the correlation structure stored in soSet into a metagene similarity network and performs community detection under user-defined settings.
  - First, a correlation threshold (coefThre) is applied to retain only sufficiently strong metagene-metagene associations (here, correlations ≥ 0.3), yielding a sparse graph representation.
  - Community detection is then run with a fixed random seed (seed) to ensure reproducibility and a specified number of iterations (niter) to stabilize the optimization.
  - The parameters commWeight and cohortWeight are not used during the community detection step. Instead, they are applied after communities have been identified to rewire or reweight edges for visualization, thereby influencing the spatial arrangement of nodes in network layouts. Specifically, cohortWeight increases the tendency for metagenes from the same cohort (dataset) to cluster together, which is useful when cohort-specific structure is a primary interpretive focus. In contrast, commWeight increases the tendency for metagenes assigned to the same community to cluster, which is preferable when emphasizing community-level modular organization. By tuning these weights, users can generate complementary visual representations of the same inferred communities without altering the underlying community assignments.
  - The resulting soObj stores the inferred network and the identified communities for downstream annotation and visualization.
corrCoefThre <- 0.3
seed <- 1234
niter <- 1000
commWeight <- 100
cohortWeight <- 10
soObj <- SOTK(
  SOSet = soSet,
  coefThre = corrCoefThre,
  seed = seed,
  niter = niter,
  commWeight = commWeight,
  cohortWeight = cohortWeight
)
## Seed: 1234
## Community search algorithm:
## Fast Greedy
## Updating weights for
## Community #1
## Community #2
## Community #3
## Community #4
## Community #5
## Community #6
## Community #7
## Community #8
## Updating weights for
## Data: GLASS
## Data: IVYGAP
## Data: HEILAND
## Calculating new layout based on new weights.
## Community-level network generated.
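The thresholding and community detection steps can be sketched on a toy correlation matrix with igraph. This is an illustration under stated assumptions rather than the SOTK() internals: the matrix values are made up, and cluster_fast_greedy() is used because the log above reports a Fast Greedy search.

```r
library(igraph)

# Toy correlation matrix (hypothetical values): two blocks of three metagenes,
# strongly correlated within a block (0.8) and weakly across blocks (0.1).
nodes <- paste0("M", 1:6)
corMat <- matrix(0.1, nrow = 6, ncol = 6, dimnames = list(nodes, nodes))
corMat[1:3, 1:3] <- 0.8
corMat[4:6, 4:6] <- 0.8
diag(corMat) <- 1

# Keep only associations at or above the threshold; drop self-loops.
coefThre <- 0.3
adj <- corMat
adj[adj < coefThre] <- 0
diag(adj) <- 0

# Build a weighted undirected graph and detect communities.
g <- graph_from_adjacency_matrix(adj, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)
comm <- cluster_fast_greedy(g)
memb <- membership(comm)
```

In this toy case the threshold removes every cross-block edge, so the two blocks come out as separate communities; real data are rarely this clean, and the number of communities depends on the chosen threshold.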
soObj
## Correlation network:
## Nodes : 279
## Communities : 8 identified
## Parameters:
## coefThre : 0.3
## seed : 1234
## niter : 1000
## drop : FALSE
## searchMet : greedy
## commWeight : 100
## cohortWeight: 10
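The role of commWeight can be illustrated with a small igraph example in which within-community edges receive a larger weight for layout purposes only. This is a sketch that assumes a simple multiplicative reweighting; the exact scheme used by sotk2 is not shown here, and the graph, the membership vector memb, and the variable layoutWeight are hypothetical.

```r
library(igraph)

# Toy graph: two triangles joined by a single bridge edge (C-D).
g <- graph_from_literal(A - B, B - C, A - C, C - D, D - E, E - F, D - F)

# Assumed community assignment, as if it came from a prior detection step.
memb <- c(A = 1, B = 1, C = 1, D = 2, E = 2, F = 2)

# Flag edges whose endpoints belong to the same community.
endpoints <- ends(g, E(g))  # two-column matrix of vertex names
same <- unname(memb[endpoints[, 1]] == memb[endpoints[, 2]])

# Assumption: within-community edges are scaled by commWeight for layout only.
commWeight <- 100
layoutWeight <- ifelse(same, commWeight, 1)

# Heavier edges pull their endpoints closer in a force-directed layout.
set.seed(1234)
xy <- layout_with_fr(g, weights = layoutWeight)
```

The membership vector is never modified, which mirrors the point made above: the weights change where nodes are drawn, not which community they belong to.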
Save the object for reuse
- To avoid repeating computationally intensive steps in later parts of the demo, we serialize the resulting soObj object to disk as an .RDS file.
- This saved object can be reloaded in downstream sections to reproduce the same network and community assignments without rerunning the correlation-network construction and community detection.
saveRDS(soObj, file.path(download_dir, "soObj.RDS"))
message("soObj.RDS was created.")
## soObj.RDS was created.
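Reloading works through readRDS(). The round trip below uses a small stand-in list and a temporary file so that it runs without the demo data; in the demo itself, the path would be file.path(download_dir, "soObj.RDS").

```r
# Stand-in object (hypothetical); the demo saves the real soObj instead.
toyObj <- list(coefThre = 0.3, seed = 1234, niter = 1000)

rdsPath <- file.path(tempdir(), "soObj_demo.RDS")
saveRDS(toyObj, rdsPath)

# Reload and confirm that the object survived serialization unchanged.
reloaded <- readRDS(rdsPath)
roundTripOk <- identical(toyObj, reloaded)
```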