Skip to content

Possible inversion of LOCO logic in build_k_matrix function #25

Description

@gzygaozhiyuan

Hi Dr. Endelman,
I’ve been benchmarking the kinship-calculating section of the script on GitHub and noticed a small but critical inversion in the way the LOCO (Leave-One-Chromosome-Out) matrices are built.
Current code (set.K):

f1 <- function(marks, data) {
      M <- scale(data@geno[, marks], center = T, scale = F)
      K <- tcrossprod(M)
      K <- K/mean(diag(K))

markers <- split(data@map$Marker, data@map$Chrom)   # per-chr SNPs
K <- lapply(markers, f1, data = data)               # f1 uses those SNPs

This produces “chromosome-specific” kinship matrices, i.e. each matrix is built from the SNPs on the same chromosome that will later be tested. In the standard LOCO implementation, we do the opposite: when we scan chromosome i we estimate kinship from all SNPs except those on chromosome i, so that the random-effect term does not absorb the very signal we are trying to detect.

allSNPidx <- seq_len(ncol(data@geno))
markers   <- split(data@map$Marker, data@map$Chrom)
K <- lapply(names(markers), function(chr) {
  out   <- which(data@map$Chrom == chr)  
  M     <- scale(data@geno[, -out], center = T, scale = F)
  tcrossprod(M) / mean(diag(tcrossprod(M)))
})

That would make the resulting list K[[i]] correspond to the kinship matrix used when scanning chromosome i, consistent with the usual LOCO definition.

Thanks very much for maintaining the code—it's been incredibly useful.

Best regards,
Zhiyuan
China

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions