Issue of plot_varimax_z_pairs() when leverage value is zero #78

@yixuan

Hi all,

I am the author of the RSpectra package, and I found a potential issue with plot_varimax_z_pairs() while preparing a new release of RSpectra.

The source code of plot_varimax_z_pairs() is simple:

> plot_varimax_z_pairs
function (fa, factors = 1:min(5, fa$rank), ...) 
{
    stop_if_not_installed("dplyr")
    stop_if_not_installed("GGally")
    stop_if_not_installed("purrr")
    fa %>% get_varimax_z(factors) %>% dplyr::select(-id) %>% 
        dplyr::mutate(leverage = purrr::pmap_dbl(., sum)) %>% 
        dplyr::sample_n(min(nrow(.), 1000), weight = leverage^2) %>% 
        dplyr::select(-leverage) %>% GGally::ggpairs(ggplot2::aes(alpha = 0.001), 
        ...) + ggplot2::theme_minimal()
}

Consider the following reproducible code:

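library(vsp)
library(magrittr)  # for %>%
data(enron, package = "igraphdata")
fa <- vsp(enron, rank = 3)
# `factors` is not defined outside the function; 1:3 is the default 1:min(5, fa$rank)
r <- fa %>% get_varimax_z(1:3) %>% dplyr::select(-id) %>% 
        dplyr::mutate(leverage = purrr::pmap_dbl(., sum))
r[160:170, ]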

And the output is:

> r[160:170, ]
# A tibble: 11 × 4
          z1        z2        z3 leverage
       <dbl>     <dbl>     <dbl>    <dbl>
 1  1.34e- 2  6.60e- 2  4.39e- 2 1.23e- 1
 2 -1.83e- 3 -2.07e- 3  4.31e- 1 4.27e- 1
 3  7.73e- 2  1.79e- 1  1.68e- 2 2.73e- 1
 4 -1.60e- 2 -1.47e- 1  3.91e+ 0 3.75e+ 0
 5  8.22e- 2  4.80e+ 0 -1.03e- 2 4.87e+ 0
 6 -5.00e-15 -2.17e-13  1.21e-12 9.90e-13
 7 -8.80e- 3 -8.05e- 2  1.91e+ 0 1.83e+ 0
 8 -2.01e- 4 -1.19e- 3  4.94e- 2 4.80e- 2
 9  4.39e- 3  5.72e- 1  1.83e- 2 5.95e- 1
10  2.09e- 5  3.84e- 3  1.53e- 2 1.92e- 2
11 -2.63e- 2 -1.43e- 1  8.96e+ 0 8.79e+ 0

Note that the leverage value in row 6 is very close to zero (about 1e-12). With the new version of RSpectra, the output is almost identical, up to rounding error:

> r1[160:170, ]
# A tibble: 11 × 4
           z1       z2      z3 leverage
        <dbl>    <dbl>   <dbl>    <dbl>
 1  0.0134     0.0660   0.0439   0.123 
 2 -0.00183   -0.00207  0.431    0.427 
 3  0.0773     0.179    0.0168   0.273 
 4 -0.0160    -0.147    3.91     3.75  
 5  0.0822     4.80    -0.0103   4.87  
 6  0          0        0        0     
 7 -0.00880   -0.0805   1.91     1.83  
 8 -0.000201  -0.00119  0.0494   0.0480
 9  0.00439    0.572    0.0183   0.595 
10  0.0000209  0.00384  0.0153   0.0192
11 -0.0263    -0.143    8.96     8.79  

Here row 6 is exactly zero.

This is where the issue occurs. plot_varimax_z_pairs() calls sample_n(tbl, size, replace, weight) with the default replace = FALSE, and sampling without replacement needs at least size rows with strictly positive weight. Because size = min(nrow(.), 1000), any data set with at most 1000 rows requests every row, so a single exact zero in leverage makes sample_n() throw an error:

Error in `dplyr::sample_n()`:
! Can't compute indices.
Caused by error in `sample.int()`:
! too few positive probabilities
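
The underlying failure can be reproduced in base R alone, without vsp or dplyr. The following is a minimal sketch with made-up weights (the vector `w` and the names `err_msg` and `ok` are illustrative, not taken from the package):

```r
# Three items, one of which has weight exactly zero (made-up values).
w <- c(1, 0, 2)

# Asking for all three items without replacement fails, because only
# two items have a positive probability of being drawn.
err_msg <- tryCatch(
  {
    sample.int(3, size = 3, replace = FALSE, prob = w)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(err_msg)

# Adding a tiny positive offset makes every item drawable again.
ok <- sample.int(3, size = 3, replace = FALSE, prob = w + 1e-10)
print(sort(ok))
```

Requesting only two items from `w` would succeed, which is why the problem only shows up when the sample size reaches the number of rows.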

The fix is also simple: we can just use weight = leverage^2 + 1e-10, so that every weight is strictly positive while the sampling probabilities barely change.
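
To illustrate the proposal, here is a small sketch of just the sampling step on a toy data frame. The data frame `z` is a made-up stand-in for the output of get_varimax_z(), and `eps` is a hypothetical name for the 1e-10 offset; this is not the package's actual code.

```r
library(dplyr)

# Toy stand-in for the varimax factors; the middle row is exactly zero.
z <- data.frame(
  z1 = c(0.0134, 0, 0.0822),
  z2 = c(0.0660, 0, 4.80),
  z3 = c(0.0439, 0, -0.0103)
)
z$leverage <- rowSums(z)  # same row-wise sum as purrr::pmap_dbl(., sum)

eps <- 1e-10  # hypothetical name for the proposed offset

# Current weighting: only 2 of 3 weights are positive, so requesting
# all 3 rows without replacement raises an error.
original <- tryCatch(
  sample_n(z, min(nrow(z), 1000), weight = leverage^2),
  error = function(e) "error"
)

# Proposed weighting: every weight is strictly positive.
patched <- sample_n(z, min(nrow(z), 1000), weight = leverage^2 + eps)
print(nrow(patched))
```

With the offset, the zero-leverage row is still drawn only when the sample has to include every row, so the plot for larger data sets should be essentially unaffected.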

Could you consider making this change in a future version? Thanks.
