Logistic regression for ordered dependent variable with more than 2 levels

Multinomial Logistic Regression Models

January 1, 2013 ©Arup Guha - Indian Institute of Foreign Trade - New Delhi, India

 Logistic regression CAN handle dependent variables
with more than two categories
 It is important to note whether the response variable
is ordinal (consisting of ordered categories like young,
middle-aged, old) or nominal (consisting of unordered
categories like red, blue, black)
 Some multinomial logistic models are appropriate only
for an ordered response
 It is not mathematically necessary to consider the
natural ordering when modeling an ordinal response, but
considering the natural ordering
   Leads to a more parsimonious model
   Increases the power to detect relationships with other variables


 Applying logistic regression considering the natural
order is done using a modeling technique called the
“Proportional Odds Model”
 Say the dependent variable Y has 4 states measuring
the impact of radiation on the human body: fine,
sick, serious, dead
 Let p1 = probability of fine, p2 = probability of sick,
p3 = probability of serious, p4 = probability of dead
 Let us define a baseline category: fine, since this is
the normal state (we shall see why we need this
later)

 What if we break up the modeling of the 4-level
ordered dependent variable into 3 binary logistic
situations: 1 – (fine, sick), 2 – (fine, serious), 3 –
(fine, dead)?
 Then we would have 3 logit equations, each comparing
one category with the baseline:
 log(p2/p1)=B11+B12X1+B13X2
 log(p3/p1)=B21+B22X1+B23X2
 log(p4/p1)=B31+B32X1+B33X2
 The degree of radiation has 3 levels, so it is
broken into 2 binary dummies, X1 and X2
 So, 9 parameters are to be estimated
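
To make the three logit equations above concrete, here is a minimal Python sketch that turns them back into the four category probabilities. The coefficient values are invented purely for illustration and are not estimates from any data; the function name is also an assumption of this sketch.

```python
import math

def baseline_category_probs(x1, x2, coefs):
    """Return [p1, p2, p3, p4] with category 1 (fine) as the baseline.

    coefs holds one (intercept, slope_x1, slope_x2) triple for each of
    log(p2/p1), log(p3/p1) and log(p4/p1).
    """
    etas = [b0 + b1 * x1 + b2 * x2 for (b0, b1, b2) in coefs]
    denom = 1.0 + sum(math.exp(eta) for eta in etas)
    p1 = 1.0 / denom  # baseline probability
    return [p1] + [math.exp(eta) / denom for eta in etas]

# Hypothetical coefficients (B11..B33), made up for this example only.
coefs = [(-0.5, 0.8, 1.5), (-1.0, 1.0, 2.0), (-2.0, 1.2, 3.0)]

n_params = sum(len(c) for c in coefs)  # 3 equations x 3 parameters each
probs = baseline_category_probs(1, 0, coefs)  # X1 = 1, X2 = 0

print("parameters:", n_params)
print("p1..p4:", [round(p, 4) for p in probs])
```

The baseline is needed because the four probabilities must sum to one, so only three of them can be modeled freely.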


 Now consider an alternative model for the same
situation
 Cumulative logit model:
 L1=log(p1/(p2+p3+p4))
 L2=log((p1+p2)/(p3+p4))
 L3=log((p1+p2+p3)/p4)
 The obvious way to introduce covariates is
 L1=B11+B12X1+B13X2
 L2=B21+B22X1+B23X2
 L3=B31+B32X1+B33X2


 Let us simplify the model by specifying that
the slope parameters are identical across the
logit equations. Then,
 L1=A1+B1X1+B2X2
 L2=A2+B1X1+B2X2
 L3=A3+B1X1+B2X2
 This is the proportional odds cumulative logit
model
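
As a rough illustration of the proportional odds model above, the sketch below computes the three cumulative probabilities P(Y ≤ j) from Lj = Aj + B1X1 + B2X2. The intercepts and slopes are hypothetical numbers chosen only for the example; note that the shared slopes cut the parameter count from 9 to 5.

```python
import math

def cumulative_probs(x1, x2, intercepts, slopes):
    """P(Y <= j) for j = 1..3 under Lj = Aj + B1*X1 + B2*X2."""
    b1, b2 = slopes
    return [1.0 / (1.0 + math.exp(-(a + b1 * x1 + b2 * x2)))
            for a in intercepts]

# Hypothetical values, not fitted to any data.
intercepts = [-1.0, 0.5, 2.0]  # A1 < A2 < A3 keeps P(Y <= j) increasing in j
slopes = (-0.7, -1.4)          # B1, B2 shared by all three equations

n_params = len(intercepts) + len(slopes)
cum = cumulative_probs(1, 0, intercepts, slopes)  # X1 = 1, X2 = 0

print("parameters:", n_params)
print("P(Y<=1), P(Y<=2), P(Y<=3):", [round(c, 4) for c in cum])
```

Because only the intercepts differ between equations, the three cumulative curves are parallel on the logit scale and can never cross.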


 Suppose that the categorical outcome is actually a
categorized version of an unobservable (latent)
continuous variable Z which has a logistic distribution
 The continuous scale is divided into regions by
cut-points which are determined by nature; for example,
four cut-points c1, c2, c3, c4 give five regions (our
four-level example would need three cut-points)
 If Z ≤ c1 we observe Y = 1; if c1 < Z ≤ c2 we observe Y =
2; and so on
 Suppose that Z is related to the X’s through a linear
regression
 Then the coarsened categorical variable Y is
related to the X’s by a proportional-
odds cumulative logit model
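
The latent-variable argument can be written out in a few lines. The symbols β1, β2 (regression slopes of Z) and ε (a standard logistic error) are notation assumed here for the sketch; they do not appear in the slides.

```latex
% Assumed notation: Z = \beta_1 X_1 + \beta_2 X_2 + \varepsilon,
% where \varepsilon is standard logistic with CDF F(t) = 1/(1+e^{-t}).
\begin{aligned}
P(Y \le j \mid X) &= P(Z \le c_j \mid X)
  = P(\varepsilon \le c_j - \beta_1 X_1 - \beta_2 X_2) \\
  &= F(c_j - \beta_1 X_1 - \beta_2 X_2), \\[4pt]
L_j = \log\frac{P(Y \le j \mid X)}{P(Y > j \mid X)}
  &= c_j - \beta_1 X_1 - \beta_2 X_2 .
\end{aligned}
```

Under these assumptions the cut-points play the role of the intercepts (Aj = cj) and the slopes are the same in every equation (Bk = −βk), which is exactly the proportional odds structure.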

 Let us go back to the model
 L1=A1+B1X1+B2X2
 L2=A2+B1X1+B2X2
 L3=A3+B1X1+B2X2
 Note that Lj is the log-odds of falling into or below category j
versus falling above it
 Aj is the log-odds of falling into or below category j when X1 =
X2 = 0
 Bk is the increase in log-odds of falling into or below any
category associated with a one-unit increase in Xk, holding all
the other X-variables constant.
 Therefore, a positive slope indicates a tendency for the
response level to decrease as the variable increases
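
A small numeric check of this interpretation, using hypothetical intercepts and slopes: raising X1 by one unit multiplies the odds of falling into or below category j by exp(B1), and the multiplier is the same for every j.

```python
import math

def cumulative_odds(x1, x2, a_j, b1, b2):
    """Odds of falling into or below category j, i.e. exp(Lj)."""
    return math.exp(a_j + b1 * x1 + b2 * x2)

# Hypothetical parameter values, for illustration only.
intercepts = [-1.0, 0.5, 2.0]  # A1, A2, A3
b1, b2 = 0.6, -0.3             # B1 > 0 in this example

# Odds ratio for X1 = 1 versus X1 = 0 (X2 held at 0), at each cut-point.
odds_ratios = [
    cumulative_odds(1, 0, a, b1, b2) / cumulative_odds(0, 0, a, b1, b2)
    for a in intercepts
]

print([round(r, 4) for r in odds_ratios], "vs exp(B1) =", round(math.exp(b1), 4))
```

With B1 positive the ratio exceeds 1, so a larger X1 makes the lower categories more likely.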

 Our example of 4 levels of impact of radiation
corresponding to 3 levels of radiation

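proc logistic data=radiation_impact;
   freq count;                                        /* count holds the frequency of each row */
   class radiation / order=data param=ref ref=first;  /* dummy coding, first level as reference */
   model sickness (order=data descending) = radiation /
         link=logit                                   /* cumulative logit for an ordered response */
         aggregate=(radiation) scale=none;            /* fit statistics by radiation subpopulation */
run;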

 Freq count
 This is important for specifying grouped data
 Count is the variable that contains the frequency of
occurrence of each observation
 In its absence, each row would be treated as a
single observation
 Class radiation
 Specifies that radiation is a classification variable to
be used in the analysis
 With the param=ref option, SAS automatically generates
n-1 binary dummies for the n categories of radiation

 Order=data
 Simply tells SAS to arrange levels in the order in which
they occur in the input data; in the class statement this
applies to the levels of radiation, and in the model
statement to the response categories 1,2,3,4
 Param=ref
 This implies that there is going to be dummy coding
for the classification variable ‘radiation’ listed in class
 Ref=first
 Designates the first ordered level of the classification
variable radiation as the reference level for the dummies
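
The idea behind reference (dummy) coding can be sketched in Python as follows. This only mimics the coding scheme, not SAS internals, and the level names low, medium and high are hypothetical because the slides do not name the radiation levels.

```python
def reference_dummies(values, levels):
    """Dummy-code values against levels, using levels[0] as the reference.

    Returns one row of len(levels) - 1 indicators per value; the
    reference level is coded as all zeros.
    """
    non_reference = levels[1:]
    return [[1 if v == lvl else 0 for lvl in non_reference] for v in values]

# Hypothetical level names, listed in the order they appear in the data.
levels = ["low", "medium", "high"]
observed = ["low", "medium", "high", "medium"]

rows = reference_dummies(observed, levels)
for value, row in zip(observed, rows):
    print(value, row)
```

Three levels give two dummy columns, matching the X1 and X2 used in the logit equations.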


 Order=data descending
 This tells SAS to reverse the order of the response
levels, and hence of the cumulative logits
 So, instead of the cumulative logit model being
 L1=log(p1/(p2+p3+p4))
 L2=log((p1+p2)/(p3+p4))
 L3=log((p1+p2+p3)/p4), it becomes
 L1=log(p4/(p1+p2+p3))
 L2=log((p4+p3)/(p1+p2))
 L3=log((p4+p3+p2)/p1)
 Now, a positive B1 indicates that a higher value of X1
leads to a greater chance of more severe radiation sickness
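
The effect of reversing the order can be checked numerically. In this sketch the probabilities p1..p4 are hypothetical; it shows that each descending logit is the negative of an ascending one, which is why the slope signs flip.

```python
import math

def cumulative_logits(probs):
    """log(P(first j categories) / P(remaining ones)) for j = 1..len(probs)-1."""
    logits = []
    running = 0.0
    for p in probs[:-1]:
        running += p
        logits.append(math.log(running / (1.0 - running)))
    return logits

probs = [0.4, 0.3, 0.2, 0.1]  # hypothetical p1..p4 (fine, sick, serious, dead)

ascending = cumulative_logits(probs)         # log(p1/(p2+p3+p4)), ...
descending = cumulative_logits(probs[::-1])  # log(p4/(p1+p2+p3)), ...

for j in range(3):
    print(round(descending[j], 4), "=", round(-ascending[2 - j], 4))
```

The fitted probabilities are the same either way; only the direction in which the coefficients are read changes.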

 Link=logit
 Fits the cumulative logit model when there are more
than two response categories
 Aggregate=(radiation)
 Indicates that the goodness-of-fit statistics are to be
calculated on the subpopulations defined by the variable
radiation
 Scale=none
 No correction is applied for the dispersion parameter
 To understand this, read up on overdispersion: it arises
when the goodness-of-fit statistic substantially exceeds its
degrees of freedom and then needs to be corrected for

 When we fit this model, the first output we
see:
Score Test for the Proportional Odds Assumption
Chi-Square    DF    Pr > ChiSq
   17.2866    21        0.6936

 The null hypothesis is that the current proportional-odds
cumulative logit model is true
 With a p-value of 0.6936 we fail to reject the null, and so can
proceed to the rest of the output under the current assumption
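
For readers who want to see where the p-value comes from, here is a minimal standard-library sketch of the chi-square upper tail probability, applied to the statistic and degrees of freedom reported above. The function name and the series approach are choices made for this sketch; SAS computes the value internally.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability P(X > x) for a chi-square variable with df
    degrees of freedom, via the series for the lower incomplete gamma."""
    a = df / 2.0
    y = x / 2.0
    # Series: gamma(a, y) = y^a e^-y * sum_{n>=0} y^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= y / (a + n)
        total += term
    lower = total * math.exp(a * math.log(y) - y - math.lgamma(a))
    return 1.0 - lower

p_value = chi2_sf(17.2866, 21)
print("p-value:", round(p_value, 4))
print("reject proportional odds at 5%?", p_value < 0.05)
```

A large p-value here means the data give no evidence against equal slopes across the cumulative logits.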


 Ultimately we are interested in the predicted
probabilities
OUTPUT <OUT=SAS-data-set> <options>
 PREDICTED=
 For a cumulative model, this is the predicted cumulative
probability (that is, the probability that the response
variable is less than or equal to the value of _LEVEL_)
 PREDPROBS=I or C
 INDIVIDUAL | I requests the predicted probability of each
response level
 CUMULATIVE | C requests the cumulative predicted
probability of each response level
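
The two kinds of predicted probability are linked by simple differencing. The sketch below uses hypothetical cumulative probabilities for one covariate pattern, listed in the order of the response levels, with the last level always having cumulative probability 1.

```python
def individual_from_cumulative(cumulative):
    """Difference P(Y <= level) values into P(Y = level)."""
    individual = []
    previous = 0.0
    for c in cumulative:
        individual.append(c - previous)
        previous = c
    return individual

# Hypothetical cumulative probabilities for a 4-level response.
cumulative = [0.40, 0.70, 0.90, 1.00]
individual = individual_from_cumulative(cumulative)

print("individual:", [round(p, 4) for p in individual])
```

Going the other way, a running sum of the individual probabilities recovers the cumulative ones.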
