There are always trade-offs between simplicity, transparency, length of code, computational efficiency … The simplest way to do this would be something like:

```r
set.seed(101)  # for reproducibility
```

- You should always use `set.seed()` before running computations involving randomizations; it resets the random-number stream to a specified place.
- `for` loops are a way to repeat computations many times: e.g. see here for an introduction.
- `transform()` is a base-R analog of tidyverse `mutate()`.
- `sample()` is a general-purpose tool: by default it samples a specified number of values without replacement, which means that it generates a re-ordering of the numbers; permuting `colonies` above scrambles the response variable with respect to the predictors (in this case, the “field” vs. “forest” location).

A picture of the results:

```r
hist(res, col="gray", las=1, main="")
```

Some alternative recipes for computing the difference in the means: (1) base R with `aggregate()` … either of these could be substituted for the `mean(bdat.)` line in the code above. If you want to be very fancy/tidyverse-ish, check out `purrr::map_dbl(1:n, ~ . %>% pull(colonies))` (`pull()` extracts a single column).

Since there aren’t actually that many possible outcomes, we could plot them this way instead of using a histogram:

```r
par(las=1, bty="l")
```

If we want a two-tailed test, we have to decide whether we are doubling the observed value or counting the area in both tails:

```r
2*mean(res>=obs)          # doubling (as suggested by JD)
# 0.0466
mean(abs(res)>=abs(obs))  # count both tails: matches lmPerm
# 0.0374
```

If `x` is a logical vector (such as `res>=obs`), then `mean(x)` first converts `FALSE` values to 0 and `TRUE` values to 1, then computes the mean; this calculates the proportion of the values that are `TRUE`. (It’s equivalent to `sum(x==TRUE)/length(x)`.)

In place of the difference between means computed above, we could use the test statistic from a standard statistical test. The standard parametric test to use here would be a \(t\) test, or equivalently a 1-way ANOVA (as done by `lm()`). In this case this should give the same answer …

```r
(tt <- t.test(colonies~place, data=ants, var.equal=TRUE))
# alternative hypothesis: true difference in means is not equal to 0
#  mean in group field mean in group forest
```

The test statistic here is not the difference between the means, as we used above, but the difference divided by the standard error. For some reason R’s t-test seems to use opposite signs for the effect size (i.e. it reports a positive value, “field minus forest”, rather than a negative value as we did above), but this doesn’t really matter. If you want to use this in your testing code you would use

```r
tt <- t.test(colonies~place, data=bdat, var.equal=TRUE)
```

Although we’re using the same test statistic, we’re not assuming that the values of the test statistic are \(t\)-distributed, which would require the assumptions we want to avoid.

R has a software package (lmPerm) that automatically fits linear models and computes p-values by permutations; the `lmp()` function is the permutation analog of the `lm()` (linear model) function in base R.

```r
summary(lmp(colonies~place, data=ants))
# "Settings:  unique SS "
# lmp(formula = colonies ~ place, data = ants)
# Residual standard error: 1.961 on 8 degrees of freedom
```

We’ll talk more about specifying linear models next week, but the formula is `response ~ predictor`; in this case `place` is the predictor (this is again equivalent to a t-test or a one-way ANOVA with two groups). `lmp()` seems to automatically change the contrast settings from the default treatment contrasts to sum-to-zero contrasts, so that the reported effect size is half what it was (3.75/2), because it is computing the difference between the (unweighted) average of the two groups and the first group (field).

The coin package is big and complicated and powerful.
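The full permutation loop referred to above did not survive in this copy. A minimal sketch of the approach, using made-up colony counts (the variable and column names `ants`, `place`, `colonies`, `res`, `obs` follow the text, but the numbers here are hypothetical, not the course data):

```r
# hypothetical stand-in for the course's 'ants' data frame
ants <- data.frame(
  place    = rep(c("field", "forest"), each = 5),
  colonies = c(12, 9, 12, 10, 9, 11, 11, 14, 7, 14)  # made-up counts
)

set.seed(101)          # for reproducibility
nsim <- 9999
res <- numeric(nsim)   # holds the permuted differences in means
for (i in seq_len(nsim)) {
  perm <- sample(nrow(ants))  # a random re-ordering of the row indices
  # scramble the response with respect to the predictor
  bdat <- transform(ants, colonies = colonies[perm])
  res[i] <- mean(bdat$colonies[bdat$place == "field"]) -
            mean(bdat$colonies[bdat$place == "forest"])
}
obs <- mean(ants$colonies[ants$place == "field"]) -
       mean(ants$colonies[ants$place == "forest"])
```

Under the null hypothesis the permuted differences in `res` should be centred near zero, and `obs` is compared against that distribution.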
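The default behaviour of `sample()` (a permutation of its input) can be seen directly; this toy demonstration is not from the original notes:

```r
set.seed(1)             # make the example reproducible
x <- sample(5)          # a random re-ordering of the numbers 1 to 5
sort(x)                 # sorting recovers 1:5 -- no values gained or lost
sample(c("a", "b", "c"))  # works on any vector, not just integers
```

Because every value appears exactly once, permuting a response column breaks its association with the predictors while preserving its marginal distribution.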
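One way the `aggregate()` recipe for the difference in group means might look; the data frame `bdat` and its values here are hypothetical stand-ins for the permuted data described above:

```r
# hypothetical stand-in for a permuted 'bdat' (made-up values)
bdat <- data.frame(
  place    = rep(c("field", "forest"), each = 3),
  colonies = c(10, 12, 11, 7, 8, 9)
)
# group means via aggregate(), then take their difference
m <- aggregate(colonies ~ place, data = bdat, FUN = mean)
m$colonies[m$place == "field"] - m$colonies[m$place == "forest"]  # 11 - 8 = 3
```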
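The two two-tailed recipes (doubling vs. counting both tails) can disagree, as the 0.0466 vs. 0.0374 values above show. A toy example with made-up numbers (not the course data) makes the mechanics concrete:

```r
# eight pretend "permuted" statistics and a pretend observed value
res <- c(-2.5, -2, -1.8, -0.5, 0, 0.5, 1, 1.5)
obs <- 1.5
mean(res >= obs)               # upper tail only: 1/8 = 0.125
2 * mean(res >= obs)           # doubling recipe: 0.25
mean(abs(res) >= abs(obs))     # both-tails recipe: 4/8 = 0.5
sum(res >= obs) / length(res)  # same as mean() of the logical vector
```

The two recipes differ whenever the permutation distribution is asymmetric, as it is here.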
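To use the \(t\) statistic as the permutation statistic, it has to be pulled out of the object that `t.test()` returns. A sketch with made-up numbers standing in for a permuted `bdat`:

```r
# made-up two-group data standing in for a permuted 'bdat'
bdat <- data.frame(
  place    = rep(c("field", "forest"), each = 4),
  colonies = c(10, 11, 9, 10, 13, 14, 12, 13)
)
tt <- t.test(colonies ~ place, data = bdat, var.equal = TRUE)
tt$statistic   # the t statistic: (difference in means)/(standard error)
tt$estimate    # the two group means ("mean in group field", "mean in group forest")
```

In a permutation loop, `tt$statistic` would be stored for each scrambled data set, and `tt$p.value` (the classical, t-distribution-based p-value) would be ignored.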
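The contrast-coding point about `lmp()` halving the effect size can be checked in base R with `lm()` alone, using made-up two-group data (not the ants data): under sum-to-zero contrasts the reported coefficient is the distance from a group mean to the unweighted grand mean, i.e. half the between-group difference.

```r
# made-up two-group data to illustrate the contrast point
dat <- data.frame(
  place = factor(rep(c("field", "forest"), each = 4)),
  y     = c(10, 11, 9, 10, 13, 14, 12, 13)
)
# default treatment contrasts: coefficient = forest mean - field mean
b_trt <- coef(lm(y ~ place, data = dat))[["placeforest"]]
# sum-to-zero contrasts: coefficient = field mean - grand (unweighted) mean
b_sum <- coef(lm(y ~ place, data = dat,
                 contrasts = list(place = "contr.sum")))[["place1"]]
c(b_trt, b_sum)  # the sum-to-zero effect is half the size (sign aside)
```

Here the group means are 10 and 13, so treatment contrasts report 3 while sum-to-zero contrasts report -1.5, mirroring the 3.75 vs. 3.75/2 discrepancy described above.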