
displays pvalues for all the covariates in the model #21

Open
jazberna wants to merge 8 commits into iansealy:master from jazberna:master


@jazberna

Contributor

No description provided.

@iansealy

Owner

Thanks for this. At the moment, this will all be ignored because there's nothing in DETCT::Misc::R to handle the extra p values. Have you got any thoughts on how best to display this to users? Should we just add the extra p values as extra columns in all.tsv, all.csv, etc.? I worry that users will just ignore the extra columns. Is there any way we can combine them so we still have one p value per region?

@jazberna

jazberna commented Aug 6, 2014

Contributor Author

The reported p values (before and after FDR) are obtained using a likelihood ratio test that compares the full model (which contains the interaction between condition and group) against the intercept-only model.
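For illustration, a minimal sketch of that kind of likelihood ratio test in base R. The counts are made up, and a Poisson `glm` stands in for the negative binomial model that DESeq2 fits, so this is not the code in the PR. In DESeq2 itself the equivalent call would presumably be along the lines of `DESeq(dds, test = "LRT", reduced = ~ 1)`, but the diff is not shown here, so treat that as an assumption.

```r
# Toy counts for one region across 8 samples (made-up numbers, illustration only)
counts    <- c(10, 12, 30, 33, 11, 9, 15, 14)
condition <- factor(rep(c("sib", "sib", "mut", "mut"), times = 2))
group     <- factor(rep(c("g1", "g2"), each = 4))

# Full model: main effects plus the group:condition interaction
full    <- glm(counts ~ group * condition, family = poisson())
# Reduced model: intercept only
reduced <- glm(counts ~ 1, family = poisson())

# Likelihood ratio statistic = drop in residual deviance between the two models
lr_stat <- deviance(reduced) - deviance(full)
lr_df   <- df.residual(reduced) - df.residual(full)
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(c(statistic = lr_stat, df = lr_df, p.value = lr_p))
```

The single p value answers "does the model with group and condition fit better than no covariates at all?", which is a question about the model as a whole rather than about any one covariate.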

@jazberna jazberna closed this Aug 6, 2014
@iansealy iansealy reopened this Aug 6, 2014
@iansealy

iansealy commented Aug 6, 2014

Owner

Two things:

  1. The code as is won't work if there's only one factor (i.e. there are conditions, but not groups).
  2. It sounds, from what you said in your email, that simplifying this to one p value per region isn't appropriate. So maybe we should go back to presenting all the numbers and just give some guidance to users about how to interpret them. That is, add a bunch more columns to all.tsv (and the other output files). What do you think? Would you be able to write the kind of guidance that anyone in the lab could use to decipher their data?

@jazberna

jazberna commented Aug 6, 2014

Contributor Author

Hi Ian.

On 6 Aug 2014, at 13:20, Ian Sealy notifications@github.com wrote:

> Two things:
>
> The code as is won't work if there's only one factor (i.e. there are conditions, but not groups).

I will run it with one factor to see if it crashes, but I see that the saturated model is only used when there are two factors:

# Create DESeqDataSet (with design according to number of factors)
dds <- DESeqDataSetFromMatrix(countData, samples, design = ~ condition)
if (numFactors == 2) {
    design(dds) <- formula(~ group * condition) # <--- the asterisk only appears when there are two factors
}
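To make the one-factor concern concrete, here is a small base R sketch of which coefficients each design produces. `design_for` is a hypothetical helper, and the sample table is invented; only the two formulas come from the snippet above.

```r
# Hypothetical helper mirroring the snippet above: choose the design formula
# from the number of factors
design_for <- function(numFactors) {
  if (numFactors == 2) {
    ~ group * condition
  } else {
    ~ condition
  }
}

# Invented sample table, for illustration only
samples <- data.frame(
  condition = factor(c("sib", "mut", "sib", "mut")),
  group     = factor(c("g1", "g1", "g2", "g2"))
)

# Coefficient names the model would estimate under each design
one_factor  <- colnames(model.matrix(design_for(1), samples))
two_factors <- colnames(model.matrix(design_for(2), samples))

print(one_factor)
print(two_factors)
```

With one factor there is no group coefficient and no interaction coefficient, so any downstream code that extracts p values by those names needs a separate branch for that case.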

> It sounds, from what you said in your email, that simplifying this to one p value per region isn't appropriate.

Yes, the thing is that the LR test is for model selection. In my last commit, that single p value tells you whether the model with condition and group is significantly better than a model without the information coming from condition and group. But since you are already interested in the condition and want to control for group, that's your model; no selection is needed.

> So maybe we should go back to presenting all the numbers and just give some guidance to users about how to interpret them. That is, add a bunch more columns to all.tsv (and the other output files). What do you think?

Yep, if I were the user I would at least want to see the p values for each factor in the model. Also, I know we all know this, but if the interaction is significant, the region counts even if the condition itself is not significant, in the same way that all the stats books tell you not to remove main effects from the model if the interaction is significant, e.g. from this link:
http://www.ssc.wisc.edu/sscc/pubs/sfr-stats.htm
"It's almost always a mistake to include interactions in a regression without the main effects,.."
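As a rough sketch of what "one p value per term" could look like as extra output columns, the following base R example pulls the per-coefficient p values from a fitted model. It uses made-up counts and a Poisson `glm` as a stand-in for DESeq2, and the `pval_` column prefix is an invented naming choice, not something from the PR.

```r
# Toy counts for one region across 8 samples (made-up numbers, illustration only)
counts    <- c(10, 12, 30, 33, 11, 9, 15, 14)
condition <- factor(rep(c("sib", "sib", "mut", "mut"), times = 2))
group     <- factor(rep(c("g1", "g2"), each = 4))

# Model with both main effects and their interaction
fit <- glm(counts ~ group * condition, family = poisson())

# Wald p value for every coefficient in the model
pvals <- coef(summary(fit))[, "Pr(>|z|)"]

# Drop the intercept and prefix the names so they can serve as column headers
extra_columns <- setNames(pvals[-1], paste0("pval_", names(pvals)[-1]))

print(extra_columns)
```

A region would then be of interest if either the condition term or the interaction term is significant, which is why the main effects stay in the model even when only the interaction stands out.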

> Would you be able to write the kind of guidance that anyone in the lab could use to decipher their data?

Yes, I can write some notes explaining a worked example.



Jorge

