See the introductory post for the idea behind these posts and some important disclaimers.

### Today: How does the probability that a binary variable equals 1 change if we move an independent variable from its minimum to its maximum while keeping all other independent variables at their means?

First, let's look at a simple example: Paul Whiteley, Harold D. Clarke, David Sanders and Marianne C. Stewart examine why some voters in the UK voted in favor of electoral reform and some didn't. You can find the article here or here. The dependent variable is "vote Yes"; the model is a simple binomial logit. The usual way to present the results is the familiar table of coefficients, standard errors, significance levels, and so on.

However, as Whiteley et al. put it on p. 315: "Since the binomial logit functional form is non-linear, the substantive impact of statistically significant predictor variables is not readily apparent from the coefficients in Table 3. To provide intuition about the influence of various predictors, Figure 4 shows the effect on the probability of voting Yes of increasing a given predictor from its minimum to its maximum value while holding the other variables constant at their means."
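The non-linearity point is easy to verify yourself: in a logit model, the same coefficient moves the probability by different amounts depending on where the baseline probability sits. A quick sketch in Python (the coefficient and baselines are invented, purely for illustration):

```python
import math

def logistic(z):
    """Inverse logit: maps a linear predictor onto a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# The same one-unit increase in a predictor (coefficient 0.5) shifts the
# probability by different amounts depending on the baseline linear predictor:
shift_in_tail = logistic(-3.0 + 0.5) - logistic(-3.0)   # baseline p is about 0.05
shift_at_center = logistic(0.0 + 0.5) - logistic(0.0)   # baseline p is exactly 0.5
```

This is why a table of logit coefficients alone does not tell you the substantive effect, and why min-to-max probability changes are a useful complement.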

Thus (if you cannot download the original article): What we want is a figure that looks roughly like this:

Now, that would be a helpful figure. We can discern how the probability of voting Yes changes as each independent variable moves from its minimum to its maximum, with all other independent variables held constant.
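Mechanically, each bar in such a figure is just the difference between two predicted probabilities. A minimal sketch with invented coefficients and covariate values (only the recipe matters, not the numbers):

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Invented logit coefficients and covariate values, just for illustration.
intercept, beta_moved, beta_control = -0.5, 0.8, 0.3
control_at_mean = 2.0             # the variable we hold at its mean
moved_min, moved_max = 0.0, 4.0   # range of the variable we move

# Predicted probability with the moved variable at its minimum and maximum:
p_at_min = logistic(intercept + beta_moved * moved_min + beta_control * control_at_mean)
p_at_max = logistic(intercept + beta_moved * moved_max + beta_control * control_at_mean)
change = p_at_max - p_at_min      # this is the quantity that gets plotted
```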

Let's try to produce a similar graph with our example dataset. The example dataset has nothing to do with voting yes or no; its sole purpose is to illustrate how we can generate a graph like Whiteley et al.'s.

The dataset covers 21 OECD countries from 1980 to 2008 and contains mostly macro-level data characterizing these countries. Most of the data are from the dataset by Klaus Armingeon, David Weisstanner, Sarah Engler, Panajotis Potolidis and Marlène Gerber, some are from my own work.

Again, remember the disclaimer: the econometrics are awful; the purpose is just to show how to produce these graphs.

OK. Let's start by generating our dependent variable. The variable needs to be binary, so let's use TSO to generate one. TSO stands for telecoms state ownership; the variable denotes how much of the dominant telecoms operator is owned by the government: 100 is 100%, 0 is 0%, 51 is 51%, and so on. Thus, we can generate a simple binary variable denoting whether a telecoms operator is state-owned or not:

```
generate stateowned = 0
replace stateowned = 1 if tso == 100
```
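If you want to check the recoding logic outside Stata, the same transformation in Python (the tso values here are made up):

```python
# Hypothetical tso values: percent of the telecoms operator owned by the state.
tso = [100, 51, 0, 100, 30]

# stateowned is 1 only for full (100%) state ownership, 0 otherwise --
# the same logic as the generate/replace pair above.
stateowned = [1 if share == 100 else 0 for share in tso]
print(stateowned)  # → [1, 0, 0, 1, 0]
```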

Then we estimate a simple model: which variables predict whether a given country in a given year still owns its telecoms operator? How about: the percentage of left parties in government (GOVLEFT), Lijphart's executive-parties dimension (EXPART), Lijphart's federal-unitary dimension (FEDUN) and public deficits (DEFICIT)?

The result looks like this:

```
logit stateowned fedun deficit expart govleft

Iteration 0:   log likelihood = -399.00501
Iteration 1:   log likelihood = -346.51031
Iteration 2:   log likelihood = -346.13927
Iteration 3:   log likelihood = -346.13866
Iteration 4:   log likelihood = -346.13866

Logistic regression                               Number of obs   =        583
                                                  LR chi2(4)      =     105.73
                                                  Prob > chi2     =     0.0000
Log likelihood = -346.13866                       Pseudo R2       =     0.1325

------------------------------------------------------------------------------
  stateowned |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       fedun |  -.2183449   .0812929    -2.69   0.007     -.377676   -.0590138
     deficit |  -.2822931   .0362184    -7.79   0.000    -.3532799   -.2113062
      expart |   .5477356   .0984081     5.57   0.000     .3548593     .740612
     govleft |   .0058022   .0023995     2.42   0.016     .0010993    .0105051
       _cons |  -.4860532   .1372772    -3.54   0.000    -.7551116   -.2169949
------------------------------------------------------------------------------
```

Now, how about some clarifying? What do these coefficients mean? A figure that tells us how the probability of state ownership changes as each independent variable moves from its min to its max would be nice.

```
*** clarify is a simulation-based tool. If you want replicable results,
*** you have to set the same seed
set seed 1695378

*** basic clarify command No. 1: estsimp. Estimates the model and
*** simulates its parameters
estsimp logit stateowned fedun deficit expart govleft

*** Now would be a good time to have a look at your dataset if you're not
*** familiar with clarify. estsimp has generated 1000 simulations of the
*** model parameters, denoted b1 to b5. They are right at the beginning of
*** your dataset. For example, the point estimate for the coefficient of
*** FEDUN is -.2183449, and in the new variable b1 you can find the
*** simulated values for this parameter. b2 is the coefficient for
*** deficit, and so on.
*** We can forget these values for now; what we really want is to simulate
*** quantities of interest, i.e.: "How much does the probability of having
*** a state-owned telecoms operator change if we change the value of our
*** independent variable from its min to its max?"

*** OK. Now we generate empty variables in which we will store our
*** quantities of interest.
*** We will first do it the hard way and make the changes by hand.
*** Later on, we can use one of the built-in wrappers of clarify.
*** Three variables: iv_min (predicted probability if our independent
*** variable is at its minimum), iv_max (predicted probability if our
*** independent variable is at its maximum), changepredictedprobminmax
*** (the difference between the two).
*** Note: I always like to generate these "empty" variables beforehand,
*** so that I know what I need.
generate iv_min = .
generate iv_max = .
generate changepredictedprobminmax = .

*** we need a counter variable later on, so let's just generate it
generate n = _n

*** basic clarify command No. 2: setx. Sets values for the independent
*** variables. Here: sets all independent variables to their mean.
setx mean

*** Now, the process is quite similar for all the independent variables.
*** We could do it with a nice loop (forvalues or while), but to get the
*** idea, we'll do it by hand.

*** Independent variable No. 1: fedun
*** set fedun to its minimum
setx fedun min

*** basic clarify command No. 3: simqi. Simulates quantities of interest.
*** Translates as: simulate the probability that the dependent variable
*** takes on the value 1, and save the simulated probabilities in the
*** variable pi.
simqi, prval(1) genpr(pi)

*** Now that we have our quantity of interest, we save it in our
*** pre-arranged variable. Here, our counter variable is important: we
*** only want one value of our variable overwritten, namely the first one
*** (first independent variable, first value of iv_min). After that, pi
*** can be dropped.
replace iv_min = pi if n == 1
drop pi

*** Now the same procedure, only with fedun set to its max
setx fedun max
simqi, prval(1) genpr(pi)
replace iv_max = pi if n == 1
drop pi
replace changepredictedprobminmax = iv_max - iv_min if n == 1

*** I hope you have got the logic? Now we do the exact same procedure for
*** the other independent variables. Don't forget to setx the independent
*** variables to their means first; otherwise, e.g. fedun will remain at
*** its max. And don't forget to increment the counter variable.

*** deficit
setx mean
setx deficit min
simqi, prval(1) genpr(pi)
replace iv_min = pi if n == 2
drop pi
setx deficit max
simqi, prval(1) genpr(pi)
replace iv_max = pi if n == 2
drop pi
replace changepredictedprobminmax = iv_max - iv_min if n == 2

*** expart
setx mean
setx expart min
simqi, prval(1) genpr(pi)
replace iv_min = pi if n == 3
drop pi
setx expart max
simqi, prval(1) genpr(pi)
replace iv_max = pi if n == 3
drop pi
replace changepredictedprobminmax = iv_max - iv_min if n == 3

*** govleft
setx mean
setx govleft min
simqi, prval(1) genpr(pi)
replace iv_min = pi if n == 4
drop pi
setx govleft max
simqi, prval(1) genpr(pi)
replace iv_max = pi if n == 4
drop pi
replace changepredictedprobminmax = iv_max - iv_min if n == 4
```
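For readers without clarify at hand, the logic behind estsimp/setx/simqi can be sketched in a few lines of Python. The coefficients and standard errors below are taken from the logit output above; the covariate means, minima and maxima are invented stand-ins (in the real analysis they come from the data via setx), and, as a simplification, the coefficients are drawn independently rather than jointly from the full multivariate normal that clarify uses:

```python
import math
import random

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(1695378)  # same idea as `set seed` above: replicable draws

# Point estimates and standard errors from the logit output above.
coefs = {
    "_cons":   (-0.4860532, 0.1372772),
    "fedun":   (-0.2183449, 0.0812929),
    "deficit": (-0.2822931, 0.0362184),
    "expart":  (0.5477356, 0.0984081),
    "govleft": (0.0058022, 0.0023995),
}

# Invented covariate summaries -- in the real analysis these come from the
# data (setx mean, setx ... min / max).
means  = {"fedun": 0.0,  "deficit": -3.0,  "expart": 0.0,  "govleft": 40.0}
minima = {"fedun": -2.0, "deficit": -16.0, "expart": -2.0, "govleft": 0.0}
maxima = {"fedun": 2.0,  "deficit": 7.0,   "expart": 2.0,  "govleft": 100.0}

def simulated_minmax_change(moved, n_sims=1000):
    """Average simulated change in P(stateowned = 1) when `moved` goes from
    its minimum to its maximum, all other covariates held at their means.
    Simplification: coefficients are drawn independently; clarify draws them
    jointly from the estimated multivariate normal."""
    changes = []
    for _ in range(n_sims):
        b = {name: random.gauss(est, se) for name, (est, se) in coefs.items()}
        def prob(x_moved):
            xb = b["_cons"] + sum(
                b[v] * (x_moved if v == moved else means[v]) for v in means)
            return logistic(xb)
        changes.append(prob(maxima[moved]) - prob(minima[moved]))
    return sum(changes) / len(changes)

# Deficit has a negative coefficient, so moving it from min to max should
# lower the probability of state ownership.
change_deficit = simulated_minmax_change("deficit")
```

Looping over the four variables in one function corresponds to the forvalues loop mentioned in the comments above as the more elegant alternative to doing everything by hand.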

OK. Now we have our quantities of interest. If you would like to look them up in the dataset: they're the first four values of CHANGEPREDICTEDPROBMINMAX. The first value is the change in the probability of having a state-owned enterprise when FEDUN moves from its minimum to its maximum, all other independent variables held at their means; the second value is the same information for DEFICIT, and so on.

Now we can easily plot these values. That's pretty much standard bar-graph code.

```
*** we could type the labels by hand, but we will simply generate a
*** variable containing our variable names
generate variablename = _n
label define independent_variables 1 "Federal-Unitary Dimension" ///
    2 "Deficit" 3 "Executive-Parties Dimension" ///
    4 "Left Parties in Government"
label values variablename independent_variables

graph hbar (asis) changepredictedprobminmax, ///
    over(variablename, label(labsize(vsmall))) nofill ///
    blabel(bar, size(vsmall) position(outside) format(%6.5g)) ///
    ytitle(Change in probability of having a state-owned enterprise) ///
    yscale(range(-1 1)) ylabel(-1(.25)1) ///
    title(, size(vsmall)) graphregion(fcolor(white))
```

And the results look like this:

OK. That's it. Nice and informative. You can surely improve the graph in many regards (nicer bars, a vertical line at zero, putting "change of independent variable from min to max" in the subtitle…), but that's the basic way to generate this type of graph.

And we haven't even started on many of the tricks of the trade of using clarify (wrappers, depicting uncertainty, interaction terms, nonlinear relations, using loops…), so stay with me. When (or rather if) I've got some spare time, I'll post some more code.
