22 Skills and Hobbies to Learn in 2020

It’s 2020, and we are trying our best to physically distance to keep ourselves and everyone around us safe. Other than binge-watching TV shows, it’s the perfect time for self-care, self-reflection, and self-improvement. It’s also the perfect time for us to take up hobbies and learn new skills (some more useful than others).

Here are some skills and hobbies that are on my list. Let’s check them off together!

  1. Cooking: Learn to cook your favourite dish, learn about cooking skills 101, knife skills, etc.
  2. Learn a new language: Right now, I am trying to improve my Mandarin, and learn Korean.
  3. Morse Code: You can learn Morse code from Google and from CW online. I hear that learning Morse code by sound is most effective.
  4. Learn and practice throwing darts: I got my dart board from Dollarama. When bars/pubs open up, you can be the next dart-throwing champion.
  5. Learn to solve the Rubik’s cube
  6. Learn to play an instrument: I’m trying to learn the Ukulele now.
  7. Learn to skateboard
  8. Read books
  9. Renovate/Redecorate your home
  10. Learn some card/coin tricks
  11. Learn to tie knots
  12. Juggling
  13. Exercise / Workout
  14. Meditate: Take time to listen to your own thoughts and be present
  15. Start journaling / blogging: Can’t you see that’s what I’m trying to do now?
  16. Learn / Make bath bombs: Bath bombs only take 3 main ingredients to make. They make great gifts, and they make taking a bath very fun. 🙂
  17. Go biking / hiking and explore different neighbourhoods in your city!
  18. Virtually explore museums from around the world.
  19. Learn / improve your coding skills
  20. Learn / find an alternate source of passive income
  21. Take online courses: There are so many free online courses that you can take now (e.g., Harvard)
  22. Learn lock picking: Ever left your keys somewhere and need to get home?

What hobbies and to-learn skills are on your list?

R function to adjust p.values and show only thresholded correlation coefficients in matrix

# function
# takes a data frame, calculates correlations, adjusts p.values, outputs p.values in a matrix
# also provides a thresholded matrix of correlation coefficients (r)
# requires rcor.test() from the ltm package
library(ltm)

adj.cor = function(df, p.adjust = FALSE, p.adjust.method = "none", threshold = 1){
  cor_test = rcor.test(df, p.adjust = p.adjust, p.adjust.method = p.adjust.method)
  r.mat = cor_test$cor.mat     # matrix of coefficient values
  p.list = cor_test$p.values   # p.list has 3 columns: var1, var2, p value

  # initiate empty matrix for the p values only (one row and column per variable)
  n.vars = ncol(df)
  MAT = matrix(NA, nrow = n.vars, ncol = n.vars)
  diag(MAT) = 0
  rownames(MAT) = names(df)
  colnames(MAT) = names(df)

  # convert p.list to a p.values matrix
  for(ind in seq_len(nrow(p.list))){
    var1 = p.list[ind, 1] # var1
    var2 = p.list[ind, 2] # var2
    p.value = p.list[ind, 3] # p value
    MAT[var1, var2] = p.value
    MAT[var2, var1] = p.value
  }
  # At this point, MAT has the p.values in a matrix

  # keep only the coefficients with p values < threshold (e.g., 0.05)
  thresh.r = ifelse(MAT < threshold, r.mat, NA)
  rownames(thresh.r) = names(df)
  colnames(thresh.r) = names(df)

  output = list(adj.p.values = MAT, threshold.r = thresh.r)
  return(output)
}

# in the calls below, mydata stands for your own data frame

### No Corrections ####
cor = adj.cor(mydata, p.adjust = FALSE, p.adjust.method = "none", threshold = 0.05)
pval = cor$adj.p.values  # shows the p.values in a matrix format
thresR = cor$threshold.r  # shows only the significant r's, p < 0.05 (if you don't want any thresholding, put threshold = 1 above)
# plotting thresholded r's in matrix with cor.plot from the psych package
cor.plot(thresR, show.legend = TRUE, main = "p < 0.05", numbers = FALSE)

### Bonferroni ####
# p.adjust must be TRUE, otherwise rcor.test() returns unadjusted p.values
cor = adj.cor(mydata, p.adjust = TRUE, p.adjust.method = "bonferroni", threshold = 0.05)
pval = cor$adj.p.values  # shows the adjusted p.values in a matrix format
thresR = cor$threshold.r  # shows only the r's with adjusted p value < 0.05
cor.plot(thresR, show.legend = TRUE, main = "bonferroni correction", numbers = FALSE)

### FDR – BH ####
cor = adj.cor(mydata, p.adjust = TRUE, p.adjust.method = "BH", threshold = 0.05)
pval = cor$adj.p.values  # shows the adjusted p.values in a matrix format
thresR = cor$threshold.r  # shows only the r's with q value < 0.05
cor.plot(thresR, show.legend = TRUE, main = "BH correction", numbers = FALSE)
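
As a side note, the loop that turns the three-column p.list into a square matrix can also be written with matrix indexing. This is only a sketch: the small p.list below is made up for illustration and simply mimics the var1 / var2 / p value layout described above.

```r
# Made-up example of a 3-column p.list (var1 index, var2 index, p value)
p.list = cbind(var1 = c(1, 1, 2), var2 = c(2, 3, 3), pvals = c(0.01, 0.20, 0.04))
n.vars = 3

MAT = matrix(NA_real_, nrow = n.vars, ncol = n.vars)
diag(MAT) = 0

idx = p.list[, 1:2]             # each row is a (row, column) position
MAT[idx] = p.list[, 3]          # fill the [var1, var2] cells
MAT[idx[, 2:1]] = p.list[, 3]   # fill the mirrored [var2, var1] cells

MAT
```

Indexing a matrix with a two-column matrix assigns one value per (row, column) pair, so no explicit loop is needed.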

Check out other useful R tips and tricks here!

What is the False Discovery Rate?

When correcting for the multiple testing problem, you’re probably familiar with the stringent Bonferroni correction. You’ve also probably heard of the False Discovery Rate (FDR), but what is the main difference between the two corrections?

I was browsing around for a simple explanation and came across the totallab website (see below).

I quote:

[FDR] controls the number of false discoveries in those tests that result in a discovery (i.e. a significant result). Because of this, it is less conservative than the Bonferroni approach and has greater ability (i.e. power) to find truly significant results.

Another way to look at the difference is that a p-value of 0.05 implies that 5% of all tests will result in false positives. An FDR adjusted p-value (or q-value) of 0.05 implies that 5% of significant tests will result in false positives. The latter is clearly a far smaller quantity.
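
To see the difference in practice, here is a small sketch using base R's p.adjust(). The eight p-values are made up for illustration; "BH" is the Benjamini-Hochberg FDR method.

```r
# Made-up p-values from 8 hypothetical tests
p = c(0.001, 0.008, 0.012, 0.03, 0.04, 0.2, 0.5, 0.9)

p.bonf = p.adjust(p, method = "bonferroni")   # Bonferroni-adjusted p-values
p.bh   = p.adjust(p, method = "BH")           # FDR-adjusted p-values (q-values)

sum(p < 0.05)        # 5 tests significant with no correction
sum(p.bonf < 0.05)   # 1 test survives Bonferroni
sum(p.bh < 0.05)     # 3 tests survive the FDR correction
```

With these numbers, the FDR correction keeps more results than Bonferroni but fewer than no correction at all, which matches the "less conservative" description in the quote.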


Selecting and visualizing only significant correlation coefficients in matrix

You have:
1) a matrix of correlation coefficients (e.g., matrix A)
2) a matrix of their p-values (e.g., matrix B)

You want to:
1) visualize the correlation coefficients in a correlogram
2) visualize the coefficients with only significant p-values

What to do:
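output = corr.test(rawData)   # corr.test() is from the psych package
names(output)  # to take a look at the available output statistics
A = output$r    # matrix A here contains the correlation coefficients
B = output$p   # matrix B here contains the corresponding p-values
# note: in psych, the p-values above the diagonal of B are adjusted for multiple tests; those below are unadjusted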

# first, to visualize the entire matrix in a correlogram
corrgram(A)  # visualizing the correlation coefficients
corrgram(B)  # visualizing the p-values

# But, you also want to visualize the correlation coefficients with significant p-values!
# to do that, you need to select only the matrix elements with significant p-values
# (if a p-value is above 0.05, i.e., not significant, the element is replaced with NA)
sig_element = ifelse(B < 0.05, A, NA)

# can plot the new matrix
corrgram(sig_element)  # this displays only the correlation coefficients that have significant p-values

Note: If you have NA’s in your matrix, sometimes corrgram might not be able to visualize your matrix. This may be due to the NA’s that are present in the diagonals. Replace the NA’s in the diagonals with 1’s and you might be able to fix that issue (e.g., diag(A) = 1)

Better Alternative for plotting:
Plot with cor.plot, also from the psych package. I realized that it is more flexible, as it handles missing values (NAs) better than corrgram().
cor.plot(sig_element, show.legend = TRUE, main = "title", numbers = TRUE, labels = colnames(A))

CAUTION:  Please be careful when calculating multiple correlation coefficients.  Please correct for multiple comparisons when appropriate!
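
As a rough illustration of that caution, the sketch below corrects the p-values before thresholding, using only base R. The built-in mtcars data, the four chosen columns, and the BH method are my assumptions for the example; swap in your own data and correction.

```r
# Illustrative data: four columns of the built-in mtcars data set
dat = mtcars[, c("mpg", "hp", "wt", "qsec")]
A = cor(dat)                      # correlation coefficients
n.vars = ncol(dat)

# Raw p-values for every pair of variables
B = matrix(0, n.vars, n.vars, dimnames = dimnames(A))
for (i in 1:(n.vars - 1)) {
  for (j in (i + 1):n.vars) {
    p = cor.test(dat[[i]], dat[[j]])$p.value
    B[i, j] = p
    B[j, i] = p
  }
}

# Adjust each unique p-value once (upper triangle), then mirror to the lower triangle
B.adj = B
B.adj[upper.tri(B.adj)] = p.adjust(B[upper.tri(B)], method = "BH")
B.adj[lower.tri(B.adj)] = t(B.adj)[lower.tri(B.adj)]

# Keep only the coefficients that are still significant after correction
sig_element = ifelse(B.adj < 0.05, A, NA)
sig_element
```

Adjusting only the upper triangle avoids counting each pair twice, which would make the correction stricter than intended.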


Missing Data? Try Multiple Imputation

It is common for researchers to exclude participants with missing data — but what are some ways to keep the participants and analyse the data in unbiased ways?

Multiple imputation replaces missing data with plausible values using a Monte Carlo technique. Here, each missing value is replaced by several simulated versions.

1. you have missing data in a data set
2. missing data are simulated with different versions –> several simulated data sets
3. analyze data set (version #1) as if it were a complete data set
4. repeat step 3 with other data set versions (e.g., #2, #3, …, #N). [for low rates of missing data, only 3-10 simulated data sets are needed.]
5. combine (average) the results to produce single point estimates and confidence intervals (or p-values) that incorporate missing-data uncertainty.
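
The five steps above can be sketched in base R. This is a toy illustration under stated assumptions, not what Amelia does internally: the data are simulated, the "imputation" is a simple regression prediction plus random noise, and the results are pooled with Rubin's rules (average the estimates, then combine within- and between-imputation variance).

```r
set.seed(1)

# Step 1: a simulated data set with missing values in y
n = 200
x = rnorm(n)
y = 2 + 0.5 * x + rnorm(n)
y[sample(n, 40)] = NA
dat = data.frame(x = x, y = y)
miss = is.na(dat$y)

# Toy imputation model fitted on the observed cases (an assumption for this sketch)
fit.obs = lm(y ~ x, data = dat[!miss, ])
sigma.obs = summary(fit.obs)$sigma

m = 5                 # number of simulated data sets
est = numeric(m)      # slope estimate from each data set
se2 = numeric(m)      # squared standard error from each data set

for (k in 1:m) {
  # Step 2: create one simulated version of the missing values
  imp = dat
  imp$y[miss] = predict(fit.obs, newdata = dat[miss, ]) +
    rnorm(sum(miss), sd = sigma.obs)

  # Steps 3 and 4: analyze each version as if it were complete
  fit = lm(y ~ x, data = imp)
  est[k] = coef(fit)["x"]
  se2[k] = vcov(fit)["x", "x"]
}

# Step 5: combine the results (Rubin's rules)
pooled.est = mean(est)
W = mean(se2)                          # within-imputation variance
B = var(est)                           # between-imputation variance
pooled.se = sqrt(W + (1 + 1 / m) * B)  # includes missing-data uncertainty

c(estimate = pooled.est, se = pooled.se)
```

The pooled standard error is never smaller than the average within-imputation standard error, which is how the extra uncertainty from the missing data shows up.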


How do I generate imputations for the missing values?
The imputation model:
Impose a probability model on the complete data (observed and missing values).

Key points on what to include in the imputation model: [~ 30 min into the Amelia I video (link below)]
1. Include all the variables that you want to include in your analysis model
e.g., age, education, ideology, income … have to include all the variables you will need later in analysis stage

2. Include variables that are highly predictive of the variables you are going to analyse
e.g., Voter turnout analysis: ideology — e.g., include views on homelessness, abortion (predictors of variables you’re interested in)

3. Include variables that are highly predictive of the missingness of your data
e.g., if income predicts missingness, throw it in the model as well

“…Because you are throwing in a lot of variables in the imputation model than the variables you would be looking at the analysis stage and that’s OK!”

*Note: this method assumes MAR (Missing At Random)

– “imputation model should be compatible with the analyses to be performed on the imputed datasets… In general, any association that may be important in subsequent analyses should be in the imputation model” … that means include those relevant variables in the imputation model.
– On the other hand, you don’t necessarily have to examine those variables in your final analyses (unless it’s of interest pertaining to the outcome).

“When working with binary or ordered categorical variables, it’s often acceptable to impute under a normality assumption and then round off the continuous imputed values to the nearest category. Variables whose distributions are heavily skewed may be transformed (e.g., logs) to approximate normality and then transformed back to their original scale after imputation.” – the multiple imputation FAQ page

Other notes:
– if you have ordinal variables – code them to as close to an interval scale as possible
– include any non-linear relationships in your imputation model, e.g., if your voter turnout analysis uses both age and age-squared, include age-squared in the imputation model too
– if you will look at any interactions, throw those terms in your imputation model as well.

Software for Multiple Imputation:
– R (Amelia II, mi, etc.)
– Stata (mi, ice, mim, etc.)
– …

There are many software packages for multiple imputation, but if you use R, you can check out Amelia II
– the user guide is very clear and helpful – can run through the example dataset and code.
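
As a rough sketch of what a run looks like, the snippet below imputes the freetrade example data that ships with the Amelia package and fits the same regression on each imputed data set. The choice of m = 5, the regression formula, and the requireNamespace() guard are my assumptions for illustration; the user guide covers the full workflow.

```r
# Sketch only: runs if the Amelia package is installed
if (requireNamespace("Amelia", quietly = TRUE)) {
  library(Amelia)
  data(freetrade)   # example data set shipped with Amelia

  # ts = time variable, cs = cross-section (unit) variable, p2s = 0 silences the console output
  a.out = amelia(freetrade, m = 5, ts = "year", cs = "country", p2s = 0)

  # a.out$imputations is a list of m completed data sets
  fits = lapply(a.out$imputations,
                function(d) lm(tariff ~ polity + pop + gdp.pc, data = d))

  # one coefficient estimate per imputed data set, ready to be combined
  sapply(fits, function(f) coef(f)["polity"])
}
```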

Additional Resources:

video – Innovation in Amelia I (explanation of multiple imputation in general): http://vimeo.com/18534025

the multiple imputation FAQ page: http://sites.stat.psu.edu/~jls/mifaq.html


How to Calculate R-squared Change in R

In R, anova.lm() does not report the R-squared change when you compare regression models.
It does provide the F Change, df1, df2, and Sig F Change in the output.
Sure, you can calculate the R-squared change yourself, but there’s a package for it! The output is also more intuitive!

# first, remember to install and load the package.

# specify the regression models
model1 = lm(var1 ~ var2, data = dataset)
model2 = lm(var1 ~ var2 + var3, data = dataset)

# compare the two models!
# to get the R-squared change and other stats
lm.deltaR2(model1, model2)
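
For reference, the do-it-yourself calculation mentioned above can be sketched in base R. The built-in mtcars data and the chosen predictors are stand-ins I picked for illustration; they are not from the original example.

```r
# Built-in mtcars data stands in for "dataset" (illustrative choice)
model1 = lm(mpg ~ wt, data = mtcars)
model2 = lm(mpg ~ wt + hp, data = mtcars)

# R-squared change from adding hp to the model
r2.1 = summary(model1)$r.squared
r2.2 = summary(model2)$r.squared
delta.r2 = r2.2 - r2.1

# F change and its p-value come from the model comparison
comparison = anova(model1, model2)
f.change = comparison$F[2]
p.change = comparison$`Pr(>F)`[2]

c(delta.r2 = delta.r2, F.change = f.change, p = p.change)
```

The R-squared change is simply the difference between the two models' R-squared values, and its significance test is the F change that anova() already reports.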

You can check out this forum for more information: http://stats.stackexchange.com/questions/37785/does-lm-use-partial-correlation-r-squared-change

Check if at least two or more booleans or conditions are True in R and Debugging in R

What do you do when you want to find out whether at least two of the conditions are true?

Take advantage of Boolean math! In R, TRUE counts as 1 and FALSE counts as 0, so you can add conditions together. Here’s how:

# for at least two of the conditions being true

if ( ((condA) + (condB) + (condC) + (condD)) >= 2 ) {

# something happens here

}

For more, check out: https://psycnotes.wordpress.com/at-least-2-or-more-boolean-conditions/
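
Here is a small runnable sketch of the same idea. The value of x and the four conditions are made up for illustration.

```r
x = 7
condA = x > 5          # TRUE
condB = x %% 2 == 0    # FALSE
condC = x < 10         # TRUE
condD = x == 0         # FALSE

# TRUE counts as 1 and FALSE as 0, so the sum is the number of true conditions
n.true = sum(condA, condB, condC, condD)

if (n.true >= 2) {
  result = "at least two conditions are true"
} else {
  result = "fewer than two conditions are true"
}
print(result)
```

Changing the 2 to any other number gives an "at least k" check without writing out every combination of && and ||.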

Now, I also have another post about Debugging in R, which uses the debug()