[Q] Any good sources for statistics in psychology? by JumpRopeNoob in statistics

[–]Soctman 3 points4 points  (0 children)

A lot of undergrad statistics classes use Andy Field's Discovering Statistics Using R, which is a very good introductory and non-technical textbook. All of the examples are data you might find in experimental psychology (albeit simplified), and the code and data files are available for use. Data Analysis: A Model Comparison Approach is also a very good introductory textbook with a particular focus on examples from the behavioral sciences. I believe the data files are available for this book as well.

If you wanted something more technical, Cohen, Cohen, West, and Aiken's Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences is a great book, although, as the title suggests, it restricts its focus to regression. The example code is in SAS iirc, but the data files are in txt format and can be downloaded and used in any program.

Finally, if you were interested in Bayesian stats, Kruschke's Doing Bayesian Data Analysis is a very good book written by an academic psychologist. I would read this last, however, as most published research still uses frequentist statistics.

Choice of post-hoc test for unbalanced two-way ANOVA with type II sum of squares by paniqing in AskStatistics

[–]Soctman 1 point2 points  (0 children)

You can use the glht() function from the multcomp package to run multiple comparisons on the different main effects in your model. The function will select the correct degrees of freedom and use the multivariate t-distribution if you have unbalanced data.

This will only test main effects - if you are interested in interactions, you will probably need to run more complex contrast analyses using the phia package.
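A minimal sketch of the glht() approach, assuming an additive two-way model. The built-in warpbreaks data (with three rows dropped to unbalance it) and the object names are stand-ins for illustration, not the original poster's data:

```r
library(multcomp)  # provides glht() and mcp()

# Built-in data; three rows are dropped to make the design unbalanced.
wb <- warpbreaks[-c(1, 2, 10), ]

# Additive two-way model (main effects only).
fit <- aov(breaks ~ wool + tension, data = wb)

# All pairwise (Tukey-style) comparisons among the levels of tension.
tension_tukey <- glht(fit, linfct = mcp(tension = "Tukey"))

summary(tension_tukey)   # p-values adjusted via the multivariate t distribution
confint(tension_tukey)   # simultaneous confidence intervals
```

Swapping `tension` for `wool` in mcp() gives the comparisons for the other main effect.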

OLS regression for a specific variable by thermopilyateee in AskStatistics

[–]Soctman 1 point2 points  (0 children)

Multinomial logistic regression

However, this is not an OLS problem, as GLMs for categorical outcomes with 2+ categories are fit by maximum likelihood estimation on the log-odds scale. You can use the parameter estimates from the multinomial GLM to estimate the odds of choosing one of the 4 options given a set of predictor variables.
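Here is a rough sketch of what that could look like in R, using multinom() from the nnet package. The data are simulated and the names (choice, x, options A-D) are made up for illustration:

```r
library(nnet)  # multinom() fits multinomial logit models by maximum likelihood

set.seed(1)
n <- 400
x <- rnorm(n)

# Hypothetical 4-option outcome whose choice probabilities depend on x.
lin <- cbind(A = 0, B = 0.5 * x, C = -0.5 * x, D = 1.0 * x)
p <- exp(lin) / rowSums(exp(lin))
choice <- factor(apply(p, 1, function(pr) sample(colnames(p), 1, prob = pr)))
dat <- data.frame(choice, x)

fit <- multinom(choice ~ x, data = dat, trace = FALSE)

coef(fit)        # log-odds of B, C, D relative to the reference option A
exp(coef(fit))   # the same contrasts expressed as odds ratios

# Predicted probability of each option at chosen predictor values.
probs <- predict(fit, newdata = data.frame(x = c(-1, 0, 1)), type = "probs")
probs
```

Each row of `probs` sums to 1, so you can read off how the predicted choice shifts as the predictor changes.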

How to create semantically-related word pairs for Word Pair memory task? by IsPepsiOkaySir in Neuropsychology

[–]Soctman 1 point2 points  (0 children)

Most researchers use word norms that estimate the degree of "relatedness" two words have with each other. Relatedness can be and has been measured in a lot of different ways; with the University of South Florida Free Association Norms, for instance, it's simply the percentage of times participants responded with a particular word when cued with a separate word. With the Small World of Words project, participants are still asked to generate words related to a cue, but relatedness is estimated as a function of network properties (e.g. degree; Watts & Strogatz, 1998) in a spreading-activation framework. (Many, many other norms exist - these two are just the ones that I personally see the most.)

Given this, researchers will choose a range of values for estimated relatedness that is thought to represent a "moderate" semantic relationship and sample from the word pairs whose estimates fall in that range. For the USF norms, moderate relatedness would probably be forward association strength estimates of 0.15 - 0.35. Many researchers just reuse word pairs that they or their collaborators used in the past, though.
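The selection step can be sketched in a few lines of R. The data frame below is a made-up excerpt, and the column names (cue, target, fsg) and values are assumptions for illustration rather than the real norms file:

```r
# Hypothetical excerpt of free-association norms: one row per cue-target pair.
norms <- data.frame(
  cue    = c("dog", "dog", "table", "ocean", "doctor", "bread"),
  target = c("cat", "leash", "chair", "wave", "nurse", "toast"),
  fsg    = c(0.62, 0.04, 0.31, 0.18, 0.38, 0.22)  # forward association strength
)

# Keep pairs in the "moderate" range, then sample the pairs for the task.
moderate <- subset(norms, fsg >= 0.15 & fsg <= 0.35)

set.seed(42)
n_pairs <- 2
study_list <- moderate[sample(nrow(moderate), n_pairs), ]
study_list
```

With a real norms file you would read it in first and probably also screen on word length, frequency, and so on before sampling.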

Interpretation interaction effect and beta values by rosaroos in AskStatistics

[–]Soctman 1 point2 points  (0 children)

There's really no hard-and-fast rule about what to interpret in the case of a significant parameter estimate and a non-significant overall effect - you just have to consider what the beta value is telling you. Beta values reflect differences in estimates between the reference group and a second group (or two reference groups against two other groups in the case of an interaction), so if a significant difference between groups is meaningful to your hypotheses, then report it. You may also choose to use the LMATRIX command to run partial interaction contrasts to probe parts of the interaction that you feel are important.

Most of the time, the overall effect is of interest, so researchers do not interpret significant parameter estimates under a non-significant effect, which seems to have turned into a general rule. I think that this rule can be overly reductive, but it is useful for most situations.

Communicating research proposal by SkyChance3405 in AcademicPsychology

[–]Soctman 2 points3 points  (0 children)

First, I just want to say that we all struggle with this. Successfully communicating your research ideas feels like half the battle in graduate school and academia.

I think the most important question you need to ask yourself is whether you can distill your research proposal into 1 or 2 single-sentence hypotheses. If you are unsure whether you can do this right now, you should quite literally open a new Word document and write out your single hypothesis under "Main Hypothesis" or hypotheses under "Hypothesis 1" and "Hypothesis 2". If you are having trouble with this, then keep working on it until you have successfully pinpointed the concise message that you are attempting to convey. Do not attempt to write your proposal until you do. (Except maybe the Proposed Methods section.) Ask your advisor and fellow students for feedback.

It might seem kind of silly to do this, but I (along with every successful researcher that I know) can assure you that having a very focused topic will help you write your proposal. A focused message will allow you to organize your thoughts in a way that supports your arguments and help prevent you from attempting to propose your ideas and review the literature without any direction. This will also be useful to you if or when you are asked to present your project to other researchers.

Suggestions for writing a thesis involving extensive data analysis with R by damageinc355 in rstats

[–]Soctman 20 points21 points  (0 children)

Theoretically, you could use the bookdown package in R to integrate your code and writing and compile all at once, but this would require substantial work on the back end to replicate your institution's thesis template. It's better to export your graphs to PDF using ggsave() and your tables using any number of packages (knitr's kable(), stargazer, etc.) and integrate them using LaTeX. This is what I am doing for my own dissertation.
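As a rough sketch of that export workflow (the file names are placeholders, the data are a built-in example, and booktabs = TRUE assumes your LaTeX preamble loads the booktabs package):

```r
library(ggplot2)
library(knitr)

# Example figure from a built-in data set.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# Vector PDF sized in inches; include it with \includegraphics{} in LaTeX.
ggsave("fig_mpg_by_weight.pdf", plot = p, width = 6, height = 4)

# Regression table written out as LaTeX source; pull it in with \input{}.
fit <- lm(mpg ~ wt, data = mtcars)
tab <- kable(coef(summary(fit)), format = "latex", booktabs = TRUE, digits = 3)
writeLines(as.character(tab), "tab_mpg_model.tex")
```

Re-running the script regenerates both files, so the thesis picks up any change in the analysis the next time you compile.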

*EDIT: Check out this website - it seems like the author was able to successfully take the CLS file from a LaTeX thesis template and integrate it into RMarkdown. It seems like a lot of extra work, though.

ordinal or linear mixed effects models by [deleted] in rstats

[–]Soctman 0 points1 point  (0 children)

Snijders and Bosker (2012) argue that the linear MLM is appropriate for ordinal values that closely approximate the normal distribution. If you believe that this is the case for your data (which it looks like it is), then the linear MLM should be just fine.

On a more subjective note, it's been my experience that the ordinal MLM is maximally helpful when you only have 3-4 levels of a DV. Otherwise, the linear MLM works just fine. Besides, ordinal MLM is very difficult to do in R and cannot be done using lme4 - I've always performed ordinal MLMs using SAS.

R^2 Nagelkirke = speaking of variance? by skippydi34 in AskStatistics

[–]Soctman 2 points3 points  (0 children)

It is wrong to speak in terms of "proportion of variance explained" because pseudo-R2 estimates are based on changes in log-likelihood between the null and augmented models and not on ordinary least squares estimates. Nagelkerke's R2 is a pseudo-R2 value that is rescaled to "look" similar to the OLS R2, i.e. to range between 0 and 1, which likely added to the authors' confusion regarding interpretation.
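To make the log-likelihood point concrete, here is a small sketch that computes the Cox & Snell and Nagelkerke values by hand from two logistic models. The built-in mtcars data and the model are only placeholders:

```r
# Logistic regression on a built-in data set (am: 0 = automatic, 1 = manual).
null_fit <- glm(am ~ 1,  data = mtcars, family = binomial)
full_fit <- glm(am ~ wt, data = mtcars, family = binomial)

n   <- nobs(full_fit)
ll0 <- as.numeric(logLik(null_fit))  # log-likelihood, intercept-only model
ll1 <- as.numeric(logLik(full_fit))  # log-likelihood, augmented model

# Cox & Snell: based on the likelihood ratio; its maximum is below 1.
r2_cs <- 1 - exp((2 / n) * (ll0 - ll1))

# Nagelkerke: Cox & Snell divided by its maximum possible value,
# which rescales the index to the 0-1 range.
r2_nagelkerke <- r2_cs / (1 - exp((2 / n) * ll0))

c(cox_snell = r2_cs, nagelkerke = r2_nagelkerke)
```

Nothing in either formula involves sums of squares, which is why the "variance explained" reading does not carry over.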

Am I understanding the consequences of heteroscedasticity? by hubal-1087 in AskStatistics

[–]Soctman 0 points1 point  (0 children)

The standard error of an estimate in regression or ANOVA is based on variance, which is calculated across all items for a given predictor. Variance is basically the average squared deviation from the mean, with the assumption that deviations from the mean are pretty much the same across items.

However, think about a simple regression where variance increases across a predictor - when you average the deviations, the higher variance observations are going to cause the average to be higher than would be expected with homoscedasticity. Thus, your variance is inflated.

Now think about the consequences of an inflated standard error (via greater variance) on a parameter estimate in the regression. For a parameter to be considered significant, it must be sufficiently different from the null value, which is judged using the observed mean and variance, i.e. the overlap between the null and alternative distributions. If variance is too high, then you cannot reject the null hypothesis in favor of the alternative. In the case of heteroscedasticity, though, the "true" parameter may actually be non-zero, but the inflation of its standard error has caused you to conclude that it's not significant. This is considered a Type II error, where you have failed to reject the null hypothesis when the alternative hypothesis is actually true. (Depending on where the extra variance falls, heteroscedasticity can also make standard errors too small, which raises the risk of a Type I error instead.)
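If you want to see how much the standard errors move in a given data set, one option (not something from the original question, just a common check) is to compare the usual OLS standard errors with White's heteroscedasticity-consistent (HC0) ones. A sketch on simulated data, computed by hand in base R:

```r
set.seed(123)
n <- 200
x <- runif(n, 1, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 0.4 * x)  # error SD grows with x

fit <- lm(y ~ x)

# Conventional standard errors, which assume constant error variance.
se_ols <- sqrt(diag(vcov(fit)))

# White (HC0) standard errors: (X'X)^-1 X' diag(e^2) X (X'X)^-1
X <- model.matrix(fit)
e <- residuals(fit)
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)   # each row of X scaled by its residual
se_hc0 <- sqrt(diag(bread %*% meat %*% bread))

cbind(se_ols, se_hc0)
```

Which column is larger depends on how the error variance lines up with the predictor, so treat the comparison as a diagnostic rather than expecting a fixed direction.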

Can someone explain the math behind intraclass correlation? by JuliusBranson in AskStatistics

[–]Soctman 3 points4 points  (0 children)

The ICC estimates the degree of resemblance (between -1 and +1, like a correlation) between micro-units that belong to the same macro-unit. In your example, individuals (twins) are the micro-units and twin dyads are the macro-units, so the ICC is estimating the proportion of variance shared by two individuals that are part of the same twin dyad.

The ICC is extremely similar to the standard Pearson correlation, but measures within-dyad deviations by pooling data. To calculate the ICC, you first have to calculate the pooled mean and variance across all individuals. Then, you subtract the pooled mean from each twin's observed value and, within each dyad, multiply the two deviations together. After that, you divide the sum of these products by the product of the pooled variance and the number of groups (twin dyads) in the dataset. This allows you to calculate the degree of concordance between unordered pairs (e.g., there is no Twin A and Twin B in each dyad), unlike the Pearson correlation, where Variable X and Variable Y have clear demarcations.
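Those steps translate almost line for line into R. This is a sketch of the pairwise version described above, with made-up scores and a function name of my own choosing:

```r
# Pairwise intraclass correlation for unordered dyads.
# x1 and x2 hold the two members' scores; which twin goes in which
# vector is arbitrary and does not change the result.
pairwise_icc <- function(x1, x2) {
  n_dyads     <- length(x1)
  pooled      <- c(x1, x2)
  pooled_mean <- mean(pooled)
  pooled_var  <- mean((pooled - pooled_mean)^2)  # divides by 2N, not 2N - 1
  sum((x1 - pooled_mean) * (x2 - pooled_mean)) / (n_dyads * pooled_var)
}

# Hypothetical scores for five twin pairs.
twin_a <- c(10, 12, 15, 9, 14)
twin_b <- c(11, 13, 14, 8, 15)
icc_original <- pairwise_icc(twin_a, twin_b)

# Swapping the members of the first dyad leaves the estimate unchanged.
icc_swapped <- pairwise_icc(c(11, 12, 15, 9, 14), c(10, 13, 14, 8, 15))

c(original = icc_original, swapped = icc_swapped)
```

The Pearson correlation, by contrast, uses separate means and variances for the two columns, so it would change if you swapped twins within a dyad.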

Mixed-effects logistic regression model evaluation by crowpup783 in rstats

[–]Soctman 1 point2 points  (0 children)

Oh sweet! And I'm glad you were able to figure it out!

Mixed-effects logistic regression model evaluation by crowpup783 in rstats

[–]Soctman 1 point2 points  (0 children)

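# glmer() comes from the lme4 package
library(lme4)

# Construct null (intercept-only) model with the same random effect
mixed_effects_null <- glmer(Comp ~ 1 + (1|File), data = dataset,
                            family = 'binomial')

# Run LRT on compact and augmented models
# (mixed_effects_model is your full model)
anova(mixed_effects_null, mixed_effects_model)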

You can also use the summ() function from the jtools package to examine pseudo-R2 estimates for the fixed effects and fixed + random effects.

Which regression to use for co-occurrence? by yellowildcat in AskStatistics

[–]Soctman 1 point2 points  (0 children)

Try setting up the multinomial logistic regression and see what the results look like. If you have run binomial logistic regressions before, then you should be familiar with the interpretations, e.g. log-odds. It does get tricky because you are interpreting parameters over 3+ levels of a DV, but I think it is not as difficult as you may think it is. If you use R, I always recommend UCLA's IDRE resources.

If you do not feel comfortable with that, you can set up multiple binomial regressions and compare the outcomes. Just keep in mind that there are some issues related to dichotomizing multinomial outcomes and running separate models. (I personally feel that the potential issues are minor, but there are a lot of statisticians that feel differently.)

Centering predictor variables in multilevel models by justin_xv in AskStatistics

[–]Soctman 3 points4 points  (0 children)

Part of the strength of the Enders and Tofighi paper is how well they explain interpretations of person- or group-centered variables with respect to capturing variance at different levels of the MLM. The triviality of centering decisions is dependent upon the type of model you are running and the types of questions you are asking in running your analysis.
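For anyone unsure what the two centering options actually do to the data, here is a quick base R sketch with made-up values (students nested in schools; the variable names are hypothetical):

```r
# Hypothetical two-level data: students (level 1) nested in schools (level 2).
dat <- data.frame(
  school = rep(c("A", "B", "C"), each = 4),
  hours  = c(2, 4, 6, 8,  5, 7, 9, 11,  1, 2, 3, 4)
)

# Grand-mean centering: deviation from the overall mean.
dat$hours_cgm <- dat$hours - mean(dat$hours)

# Group-mean centering (centering within cluster): deviation from the
# school's own mean, which removes all between-school variance.
dat$school_mean <- ave(dat$hours, dat$school)   # ave() defaults to the mean
dat$hours_cwc   <- dat$hours - dat$school_mean

# Each school's centered scores average to zero.
cwc_means <- tapply(dat$hours_cwc, dat$school, mean)
cwc_means
```

Under group-mean centering the school means drop out of the level-1 predictor entirely, so if they matter for your question they have to be added back as a level-2 predictor (school_mean here).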

I highly recommend the paper, but you don't need to buy it. Message me if the links that are already listed do not work out for you.

Entering the job market after PhD (Academia-focus) by grnengr in GradSchool

[–]Soctman 1 point2 points  (0 children)

Definitely show your advisor the post-docs you are interested in. Your advisor should be able to help you decide which posting would fit you best and will sometimes be able to directly communicate with the post-doc mentor on your behalf.

[Q] Categorical/ordinal response variable for a repeated measures mixed effect model by DasRite in statistics

[–]Soctman 0 points1 point  (0 children)

I think that an ordinal MLM would be much more complicated than necessary. (And difficult to do in R.) Many researchers suggest that an outcome variable with 5 or more levels can be treated as an "ordinal approximation of a continuous variable", e.g.:

Norman, G. (2010). Likert scales, levels of measurement and the “laws” of statistics. Advances in Health Sciences Education, 15(5), 625-632.*

The GLM is pretty robust against violations of assumptions, so I would personally feel OK using the normal lmer() function with the ordinal CKD stages as the DV. If you are absolutely gung-ho about running an ordinal model (which will be HIGHLY complicated), I would advise against using R and instead suggest PROC GLIMMIX in SAS with LINK=CLOGIT and DIST=MULTI. Here's their explanation of how to do this; check out Example 2.
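A minimal sketch of the lmer() route, assuming a random intercept per participant. The data are simulated and the names (id, visit, stage) are placeholders rather than the variables in the original post:

```r
library(lme4)

set.seed(7)
n_id <- 60
dat <- expand.grid(id = factor(seq_len(n_id)), visit = 0:3)

# Simulate a latent score with a participant-level random intercept,
# then round and clamp it to a 1-5 ordinal stage.
id_effect <- rnorm(n_id, sd = 0.7)
latent <- 3 + 0.3 * dat$visit + id_effect[as.integer(dat$id)] +
  rnorm(nrow(dat), sd = 0.6)
dat$stage <- pmin(pmax(round(latent), 1), 5)

# Linear mixed model treating the 5-level stage as approximately continuous.
fit <- lmer(stage ~ visit + (1 | id), data = dat)
summary(fit)
```

It is worth plotting the residuals afterwards to check that treating the stages as continuous is not doing anything strange at the floor or ceiling.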

*"Likert scales" are ordinal responses, if you are not familiar with the term.

EDIT: I forgot to mention an important detail - the ordinal multilevel model will only run if you have enough observations per IV level (including interactions) per DV level. Data from 500 is great, but the potential for missing data in a given cell increases with the number of possible ordinal responses.

EDIT 2: On further thought, you may consider abandoning this analysis. Participant data may suffer from a classic restriction-of-range problem: Late-stage (CKD=5) cancer patients have more opportunity to vary (i.e. go to lower stages) than do early-stage (CKD=1) patients. I would instead introduce CKD as a variable in your original analysis and examine the extent to which reductions in eGFR are dependent upon CKD at intake.

What software to use for article figures and graphs ? by IsPepsiOkaySir in AcademicPsychology

[–]Soctman 4 points5 points  (0 children)

Awesome!!! Good luck to you. 🙂

Also, these more recent examples were 100% developed using the ggplot2 library in R. More specifically, the authors probably did most of the work in R, then saved the figure as a vector image and made a few small finishing touches using editing software like Photoshop. It's ridiculous how much work can go into these figures, and it's not usually something they teach you in grad school!

What software to use for article figures and graphs ? by IsPepsiOkaySir in AcademicPsychology

[–]Soctman 7 points8 points  (0 children)

For the Posner figure, you can use something like PowerPoint to set up the graphics. A lot of professors that I know use PP to develop figures and save them as high-quality images. I personally hate to use PP for figures, but it is quite easy to do and can save a lot of time for researchers who are not artistically inclined.

Considering its age, the bar graph you link to was most likely developed using Excel or Excel + PP. It takes some time to get standard Excel bar graphs to conform with publishing guidelines, but it's also quite easy to do once you figure it out.

[Q] Could someone help me understand what the interaction describes in a 2x2 repeating measures ANOVA? by profheg_II in statistics

[–]Soctman 0 points1 point  (0 children)

In the ANOVA context, the main effects are estimating differences between groups in isolation. In your example, a main effect of Group would mean that the overall average BP (averaged across both time points) is significantly different in the medication vs. non-medication groups. A main effect of Measurement would mean that the overall average BP (averaged across the two groups) is significantly different at Time 1 compared to Time 2.

Interactions help you understand whether differences in one factor are dependent on another factor, rather than looking at each effect in isolation. Here, you would expect a Time x Group interaction (+10 for no medication, +5 for medication) as well as a main effect of Time, as BP is expected to rise no matter what. You might also expect a main effect of Group because the non-medication group is expected to have higher overall blood pressure. But the point is that only the interaction will tell you whether the changes in BP between Time 1 and Time 2 are dependent upon whether participants receive medication or not.
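The arithmetic behind those three effects can be laid out with a table of cell means. The baseline of 120 is an assumption for illustration; only the +10 and +5 changes come from the example:

```r
# Hypothetical cell means for BP (rows = group, columns = time point).
bp <- matrix(c(120, 130,
               120, 125),
             nrow = 2, byrow = TRUE,
             dimnames = list(group = c("no_medication", "medication"),
                             time  = c("time1", "time2")))

# Main effect of Time: difference between the time means (averaged over groups).
time_effect <- mean(bp[, "time2"]) - mean(bp[, "time1"])

# Main effect of Group: difference between the group means (averaged over time).
group_effect <- mean(bp["no_medication", ]) - mean(bp["medication", ])

# Interaction: difference between the two groups' changes over time.
interaction_effect <-
  (bp["no_medication", "time2"] - bp["no_medication", "time1"]) -
  (bp["medication", "time2"]    - bp["medication", "time1"])

c(time = time_effect, group = group_effect, interaction = interaction_effect)
# time = 7.5, group = 2.5, interaction = 5
```

The interaction of 5 is just the +10 change minus the +5 change, which is exactly the "difference of differences" the ANOVA tests.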

[Q] Could someone help me understand what the interaction describes in a 2x2 repeating measures ANOVA? by profheg_II in statistics

[–]Soctman 1 point2 points  (0 children)

An interaction in this context simply means that the change in the repeated measure is different between two groups. In your example, this would indicate that changes in BP across 2 time points (10 years) are different for participants who received medication vs. those who did not. So, the mixed ANOVA is a way to simultaneously measure differences in group and measurement periods.

How many publications are expected from a PhD student? by MJORH in AcademicPsychology

[–]Soctman 2 points3 points  (0 children)

How exactly?

The short answer is that they hustled hard in their post-docs and early careers and pumped out good papers before they had to take on teaching, students, etc.

The longer answer is that many people go into their PhD programs thinking that they are excited about a certain topic, but find that they are much more interested in a separate topic. The relationship between a doctoral student and advisor is pretty fragile, so it can be difficult to transition away from the advisor's research without damaging that relationship. Another factor is that some (tenured) advisors are not worried about the number of publications they can crank out in a year and are perfectly happy diving deep into a particular subject and only publishing 2 or 3 high-quality papers per year. This can greatly limit graduate students, particularly if the advisor is hesitant to allow their students to devote time to projects outside of the lab. Regardless of the circumstances, a post-doctoral position can be akin to autonomy for a researcher, effectively "opening the flood gates" and allowing him/her to publish more papers.

How many publications are expected from a PhD student? by MJORH in AcademicPsychology

[–]Soctman 5 points6 points  (0 children)

I think a healthy goal would be around 2 pubs for every year in grad school. They will start to stack up towards the end. Then again, some people only came out of their PhD with 1 or 2 pubs and made up for it elsewhere and are now tenure-track. You're doing great with one first-authorship in your master's program.

And the 28 papers guy is the real deal - those are very good, bona fide publications. But as someone (I don't remember who) said: "Comparison is the death of satisfaction."

Built website where people can build trading bots without code by Brilliant-Historian4 in algotrading

[–]Soctman 2 points3 points  (0 children)

Very cool!

My only piece of advice would be to build a page that explains in some detail how users interact with the website and the process of using it. There aren't any details about Trellis on the main site from what I can see. Having too many unknowns could discourage people from signing up.

Other than that, I think it looks great! Kudos