Calcite | Level 5

Create a Dummy Variable based on values in multiple different rows

Hello, I'd very much appreciate a certain amount of help with this.
This is my current data:

data WORK.HAVE;
infile datalines dsd truncover;
input Thing:32. Day:32.;
datalines;
0 1
0 4
0 11
0 14
0 19
0 28
0 35
0 41
1 48
1 55
;;;;

I want to create an Early_Warning dummy Variable that will take the value of 1, if Thing is happening with the next 30 days.
So in this case this is what the result should look like:

data WORK.WANT;
infile datalines dsd truncover;
input Thing:32. Day:32. Early_Warning:32.;
datalines;
0 1 0
0 4 0
0 11 0
0 14 0
0 19 1
0 28 1
0 35 1
0 41 1
1 48 1
1 55 1
;;;;

Lastly, if we can't be sure whether Thing is happening within the next 30 days, I want the Early_Warning to be null and the data should look like this.

data WORK.WANT2;
infile datalines dsd truncover;
input Thing:32. Day:32. Early_Warning:32.;
datalines;
0 1 0
0 4 0
0 11 0
0 14 0
0 19 0
0 28 .
0 35 .
0 41 .
0 48 .
0 55 .
;;;;

1 ACCEPTED SOLUTION

Accepted Solutions
Tourmaline | Level 20

Re: Create a Dummy Variable based on values in multiple different rows

Here is an illustration of what I mean in my previous message:

``````
data WORK.HAVE;
infile datalines truncover;
input Thing  Day ;
datalines;
0 1
0 4
0 11
0 14
0 19
0 28
0 35
0 41
1 48
1 55
;;;;
/*Test with thing positive 1*/
proc sql;
create table want(drop=d) as
select distinct a.*,ifn((a.day<=b.day<=a.day+30)=0,d,a.day<=b.day<=a.day+30) as Early_Warning
from (select *,ifn(max(day)>day+30,0,.) as d from have) a left join have(where=(thing=1)) b
on a.day<=b.day<=a.day+30
order by thing,a.day;
quit;
/*Test with all Thing 0*/
data WORK.HAVE;
infile datalines truncover;
input Thing  Day ;
datalines;
0 1
0 4
0 11
0 14
0 19
0 28
0 35
0 41
0 48
0 55
;;;;
proc sql;
create table want(drop=d) as
select distinct a.*,ifn((a.day<=b.day<=a.day+30)=0,d,a.day<=b.day<=a.day+30) as Early_Warning
from (select *,ifn(max(day)>day+30,0,.) as d from have) a left join have(where=(thing=1)) b
on a.day<=b.day<=a.day+30
order by thing,a.day;
quit;``````
9 REPLIES 9
PROC Star

Re: Create a Dummy Variable based on values in multiple different rows

Let's say you know

1. Data are sorted by DAY
2. DAY is always a positive integer that will never exceed 1000.

Then you could:

1. Initialize a 1,000-element array with zeroes.
2. Read in all the records with thing=1, and set to 1 all the the array elements corresponding to DAY-1 through DAY.
3. After reading in all the thing=1 records, set all the elements from last DAY + 1 to 1000 as missing values.
4. Reread the  entire HAVE dataset and pull a value from the array that corresponds to DAY.

``````data WORK.HAVE;
infile datalines truncover;
input Thing  Day ;
datalines;
0 1
0 4
0 11
0 14
0 19
0 28
0 35
0 41
1 48
1 55
;;;;

data want;
array possible_days {1:1000} _temporary_  (1000*0);
do until (end_of_thing_eq_1);
set have (where=(thing=1)) end=end_of_thing_eq_1;
do d=day-30 to day;
possible_days{d}=1;
end;
end;
do d=day+1 to 1000;
possible_days{d}=.;
end;
drop d;
do until (end_of_have);
set have end=end_of_have;
early_warning=possible_days{day};
output;
end;
run;``````

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tourmaline | Level 20

Re: Create a Dummy Variable based on values in multiple different rows

Hi @dcd   , @mkeintz 's solution is full proof.

I played mine as I was bored.

Anyways,  for want2 ,  if we can't be sure whether Thing is happening within the next 30 days,

then how come earlywarning is zero upto 0 19 0 assuming all records of thing is 0 as your want2 indicates. What and where am i missing?

0 1 0
0 4 0
0 11 0
0 14 0
0 19 0

``````
data WORK.HAVE;
infile datalines truncover;
input Thing  Day ;
datalines;
0 1
0 4
0 11
0 14
0 19
0 28
0 35
0 41
1 48
1 55
;;;;

proc sql;
create table want as
select distinct a.*,(a.day<=b.day<=a.day+30) as Early_Warning
from have a left join have(where=(thing=1)) b
on a.day<=b.day<=a.day+30
order by thing,a.day;
quit;``````
Calcite | Level 5

Re: Create a Dummy Variable based on values in multiple different rows

In want2 early_warning is 0 up to day 19 because we do have info up to day 55 -that is more than 30 days later- that there is no triggering event. In day 28 we don't have info for 30 days -day 55 is only 27 days away- so we can't be sure, hence I'd like a null there.

PROC Star

Re: Create a Dummy Variable based on values in multiple different rows

@dcd    Are you aware that my program is intended to produce results satisfying BOTH of your objectives?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Calcite | Level 5

Re: Create a Dummy Variable based on values in multiple different rows

I was just clarifying to novinosrin. I only now got the chance to try what you suggested but I'm getting an empty table when the Thing is always 0. Any idea why is that?

Also there is an out of subscript error when Thing happens on say day 10 but that's easy to fix.

Tourmaline | Level 20

Re: Create a Dummy Variable based on values in multiple different rows

Hi @dcd  Thank you for clarifying. I am afraid while proc sql syntax can be made to look simple, that condition might warrant another pass or a Cartesian for that matter which would make it inefficient in my opinion. Well I am pleased and glad you have got a datastep solution though

Tourmaline | Level 20

Re: Create a Dummy Variable based on values in multiple different rows

Here is an illustration of what I mean in my previous message:

``````
data WORK.HAVE;
infile datalines truncover;
input Thing  Day ;
datalines;
0 1
0 4
0 11
0 14
0 19
0 28
0 35
0 41
1 48
1 55
;;;;
/*Test with thing positive 1*/
proc sql;
create table want(drop=d) as
select distinct a.*,ifn((a.day<=b.day<=a.day+30)=0,d,a.day<=b.day<=a.day+30) as Early_Warning
from (select *,ifn(max(day)>day+30,0,.) as d from have) a left join have(where=(thing=1)) b
on a.day<=b.day<=a.day+30
order by thing,a.day;
quit;
/*Test with all Thing 0*/
data WORK.HAVE;
infile datalines truncover;
input Thing  Day ;
datalines;
0 1
0 4
0 11
0 14
0 19
0 28
0 35
0 41
0 48
0 55
;;;;
proc sql;
create table want(drop=d) as
select distinct a.*,ifn((a.day<=b.day<=a.day+30)=0,d,a.day<=b.day<=a.day+30) as Early_Warning
from (select *,ifn(max(day)>day+30,0,.) as d from have) a left join have(where=(thing=1)) b
on a.day<=b.day<=a.day+30
order by thing,a.day;
quit;``````
Calcite | Level 5

Re: Create a Dummy Variable based on values in multiple different rows

This works perfectly, thanks!

Calcite | Level 5

Re: Create a Dummy Variable based on values in multiple different rows

I tried adding a counter to make it work in both cases, instead it stopped working in either one.

``````data want;
array possible_days {1:1000} _temporary_  (1000*0);
events=0;
do until (end_of_have);
set have end=end_of_have;
if Thing=1 then do;
events+1;
d=max(day-30, 1) to day;
possible_days{d}=1;
end;
end;
if events=0 then do;
d=day-29 to 1000;
possible_days{d}=.;
end;
drop d;
drop events;
do until (end_of_have);
set have end=end_of_have;
early_warning=possible_days{day};
output;
end;
run;``````

However the code seems like it should work? (more likely I'm just clueless I guess)

The log locates the errors in the

``````d=max(day-30, 1) to day;

d=day-29 to 1000;``````

lines saying 'syntax error' and 'expecting arithmetic operator' respectively.

Discussion stats
• 9 replies
• 1174 views
• 0 likes
• 3 in conversation