Solved
Contributor
Posts: 40

# Create monthly dummy variables

I have data on about 30000 records, each record can have up to 15 admission dates (Admission1-Admission15), up to 15 release dates (release1-release15) and up to 15 codes decribing the release type (street1-street15- for whether or not the immate was released to the streets).  Dates cover Jan 2011-Dec 2014

WW       01/10/2011      03/01/2013       01/03/2014      01/01/2013    11/01/2013   02/15/2014       Y         N          Y

So now I want to create 48 dummy variables: Month1-Month48 covering the 4 months of the data- Jan 2011-Dec 2014, where Month1 is January 2011 and dummy variable will be 1 only if the person is in prison and the release was not to the street. If the variable street1-street15 is Y that means if was a "true" release and the dummy variable will be 0 for that month and forward until the next admission.

So using this example record, Month1 and Month2-Month24 are all 1. Month25 is 0, Month26 get admitted again so its in prison that month and this should be 1.... and so forth.

How can I write a macro to do all of this- create all of these 48 dummy variables depending on admission date, release date and street, all which can have up to 15 items per person,

thanks

Accepted Solutions
Solution
‎05-17-2016 11:47 AM
Super User
Posts: 23,776

## Re: Create monthly dummy variables

Your correct in that if your missing values it generates an error. So create an if condition around the NInterval and do loop calculations.

If if you can't get it to work post you code.

All Replies
Super User
Posts: 23,776

## Re: Create monthly dummy variables

malena wrote:

How can I write a macro to do all of this- create all of these 48 dummy variables depending on admission date, release date and street, all which can have up to 15 items per person,

thanks

Why do you want a macro?

Super User
Posts: 23,776

## Re: Create monthly dummy variables

Here's an incredibly inefficient solution. But it's not a macro and its at a basic programming level.

I'm going to guess, since you only presented one case, that others will break the logic, so hopefully you'll be able to debug it yourself.

You could probably do this in a single data step using ARRAYS more efficiently, but it may be harder to understand...or not

At any rate, here's one long winded solution:

``````*Create fake data;
data have;
cards;
WW       01/10/2011      03/01/2013       01/03/2014      01/01/2013    11/01/2013   02/15/2014       Y         N          Y
;
run;

*Flip to long structure;
data have_flipped;
set have;
array release(3) release1-release3;
array street(3) street1-street3;

date_release=release(i);
release_type=street(i);
output;

end;
format date_: date9.;

run;

*Create a record for each month based on admission and release date;
data have_monthly;
set have_flipped;
do i=1 to N_interval+1;
Status=1;

if i=(N_interval+1) and release_type='Y' then status=0;

output;
end;

format month yymon7.;

keep ID month status;
run;

/*This section of code creates the empty data set required for the final data structure
one record per month, per ID
*/

*Create months;
data empty_months;
start = '01Dec2010'd;
do i=1 to 48;
Month=intnx('month', start, i, 'b');
Monthly_Number = catt("Month", put(i, z2.));
output;
end;
format month yymon7.;
keep month Monthly_Number;
run;

*Get list of IDs;
proc sql;
create table ids as
select distinct id from have;
quit;

*Create monthly by ID table with no status;
proc sql;
create table empty_status as
select a.id, b.*
from empty_months as b
cross join ids as a
order by 1, 2;
quit;

*Add in status from previous calculation;
proc sql;
create table status as
select a.*, b.status
from empty_status as a
left join have_monthly as b
on a.id=b.id
and a.month=b.month
order by 1,2;
quit;

*fill down values so values are retained across time;
data status_filled;
length status 8.;
set status;
by id;
retain final_status;
if first.id then final_status=status;
else if not missing(status) then final_status=status;
status=final_status;

drop final_status;
run;

*Transpose back to final wide data structure;
proc transpose data=status_Filled out=want(drop= _name_);
by id;
var status;
id monthly_number;
idlabel month;
run;
``````
Contributor
Posts: 40

## Re: Create monthly dummy variables

thanks, this looks good- exactly what I want and easier to follow since I am sharing it with others who do not know much of SAS. But I do get an error when I run this code- probably because not all records have all 15 admission/release dates

ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the

BY expression is missing, zero, or invalid.

*Create a record for each month based on admission and release date;

data have_monthly;

set have_flipped;

do i=1 to N_interval+1;

Status=1;

if i=(N_interval+1) and release_type='Y' then status=0;

output;

end;

format month yymon7.;

keep ID month status;

run;

Solution
‎05-17-2016 11:47 AM
Super User
Posts: 23,776

## Re: Create monthly dummy variables

Your correct in that if your missing values it generates an error. So create an if condition around the NInterval and do loop calculations.

If if you can't get it to work post you code.

Contributor
Posts: 40

## Re: Create monthly dummy variables

what kind of loop condition do I need?

Super User
Posts: 23,776

## Re: Create monthly dummy variables

You need an IF condition around the loop already present.

The logic is, if admission date is missing then we don't enter the loop.

*code that executes if date_admission is not missig/

end;

Contributor
Posts: 40

## Re: Create monthly dummy variables

following the creation of these monthly variables DOC1-DOC48 and only having month_death and year_death on deaths- how could I run a survival analysis? comparing inmmates vs noninmmates?

Super User
Posts: 9,599

## Re: Create monthly dummy variables

Several things here.  Firstly can you post some test data - in the form of a datastep, which covers one complete case, and what the output should look like.  Secondly, why do you need loads of per month variables, this sounds like an ineffcient method of doing whatever it is you are doing as there are plenty of functions to check if something is within time windows.  Thirdly think about your data, how do you want to work with it, not how is it supposed to look like at the end.  The data you program with can look completely different from a report output, this is because programming works differently.  For instance why not have a normalised structure:

```data have;
id="WW";
rel_date="01jan2013"d;
street="Y"; output;
rel_date="01nov2013"d;
street="N"; output;
rel_date="15feb2014"d;
street="Y"; output;
run;```

You will then have all the information about one event (admission/release/street) on one observation - much easier to work with.  You can use agregates and functions to go down the rows quite siplpy as the data strucure doesn't change.

🔒 This topic is solved and locked.