You are performing logistic regression in SPSS and you see a warning message that tells you that "There are 235 (13.7%) cells (i.e., dependent variable levels by subpopulations) with zero frequencies." What is this warning message trying to tell you?
a. There are some combinations of your variables that cancel each other out.
b. There are some combinations of your variables that have no observations.
c. There are some combinations of your variables that produced measurements with a value of zero.
d. There are some combinations of your variables that occurred far too frequently to be accurately measured.

When the warning is triggered, the deviance (-2 times the log of the likelihood ratio) can no longer be assumed to have a chi-sq distribution, so it can no longer be used as an overall goodness-of-fit indicator.

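To make the "cells" in the warning concrete, here is a minimal Python sketch of the count. It assumes a subpopulation is one observed combination of explanatory-variable values and a cell is one subpopulation crossed with one level of the dependent variable. The function name `zero_cell_summary` and the toy data are made up for illustration, and this is not SPSS's own code.

```python
from collections import Counter

def zero_cell_summary(rows, outcome_levels):
    """Count empty cells in the (subpopulation x outcome level) table.

    rows: iterable of (covariate_pattern, outcome) pairs, both hashable.
    Returns (number of zero cells, total cells, percent zero).
    """
    counts = Counter(rows)                           # frequency of each cell
    subpops = {pattern for pattern, _ in counts}     # observed covariate patterns
    n_cells = len(subpops) * len(outcome_levels)
    n_zero = sum(
        1 for p in subpops for y in outcome_levels if counts[(p, y)] == 0
    )
    return n_zero, n_cells, 100.0 * n_zero / n_cells

# Hypothetical data: (sex, age) is the covariate pattern, 0/1 is the outcome.
rows = [
    (("male", 30), 1), (("male", 30), 0),
    (("female", 41), 1),
    (("female", 52), 0), (("female", 52), 0),
]
n_zero, n_cells, pct = zero_cell_summary(rows, outcome_levels=(0, 1))
print(f"There are {n_zero} ({pct:.1f}%) cells with zero frequencies out of {n_cells}.")
```

With a continuous predictor such as age, nearly every case is its own subpopulation, so close to half the cells of a binary outcome will be empty.
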
More strictly speaking, if the number of "settings" (combinations of values of explanatory variables) is large (or dependent on the sample, as it will be if continuous variables are present), you can't use the -2LR test for goodness of fit. If so, the Hosmer-Lemeshow statistic is an apparently effective fudge. As I recall, it splits the sample into deciles according to predicted probability and makes a calculation based on residuals in these groups. Details should be available in SPSS, Hosmer and Lemeshow's own book, and Agresti's _Intro to Categ Data Analysis_, none of which I have to hand ATM.

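For what it's worth, the decile calculation can be sketched in a few lines of Python. This is the usual textbook form (a Pearson-type sum over groups of observed versus expected events and non-events, referred to a chi-sq distribution on groups - 2 df), written from memory rather than taken from SPSS. The function name `hosmer_lemeshow` is an illustrative choice, and the sketch assumes at least as many cases as groups and fitted probabilities strictly between 0 and 1.

```python
import random

def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, df) from 0/1 outcomes y and fitted probabilities p."""
    pairs = sorted(zip(p, y))                 # order cases by fitted probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        # Near-equal-sized groups ("deciles of risk" when groups == 10)
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(yi for _, yi in chunk)  # observed events in the group
        expected = sum(pi for pi, _ in chunk)  # expected events in the group
        # Contributions from events and from non-events
        stat += (observed - expected) ** 2 / expected
        stat += (observed - expected) ** 2 / (size - expected)
    return stat, groups - 2

# Simulated, well-calibrated data (hypothetical, for illustration only)
random.seed(0)
p = [random.uniform(0.05, 0.95) for _ in range(500)]
y = [1 if random.random() < pi else 0 for pi in p]
stat, df = hosmer_lemeshow(y, p)
print(f"Hosmer-Lemeshow statistic = {stat:.2f} on {df} df")
```

A large statistic relative to a chi-sq distribution on that df suggests poor calibration. How ties in the fitted probabilities are grouped varies between packages, so results need not match SPSS exactly.
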
Logistic regression with grouped data has a fixed number of settings (the N cells in the implied crosstabulation), so as long as there are few cells with low expected values, the asymptotics are satisfied.

In any case, the parameter estimates and their SEs, and a chi-sq test on nested pairs of models (X2 = Dev1 - Dev2, df = df1 - df2; H0: none of the added variables improves the model), are not affected by this problem.
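
The nested-model test above is just arithmetic on two deviances, so here is a small Python sketch of it. It assumes Dev1 and df1 are the deviance and residual df of the smaller model and Dev2 and df2 those of the larger one. The helper `chi2_sf` is a hand-rolled upper-tail probability for integer df (built from the standard recurrence for the incomplete gamma function) so that the example needs only the standard library. Both function names and the numbers in the example are made up.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)               # Q(1, h) for even df
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))    # Q(1/2, h) for odd df
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(-h) * h ** a / math.gamma(a + 1.0)
        a += 1.0
    return q

def nested_deviance_test(dev1, df1, dev2, df2):
    """Compare a smaller model (dev1, df1) with a larger nested one (dev2, df2)."""
    x2 = dev1 - dev2          # drop in deviance from adding variables
    df = df1 - df2            # number of parameters added
    return x2, df, chi2_sf(x2, df)

# Hypothetical deviances and residual df for two nested models
x2, df, pval = nested_deviance_test(120.0, 97, 110.0, 95)
print(f"X2 = {x2:.2f}, df = {df}, p = {pval:.4f}")
```

A small p-value is evidence against H0, i.e. at least one of the added variables improves the model. In practice a library routine for the chi-sq tail would replace `chi2_sf`, which is only meant for modest df.
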