Hello all,
I have code that allows me to reorganize my data (companies observed each year 2006-2015) based on a variable. The variable is an event-based variable that takes on values from 0-1 (1 if it happened). When (and/or when) the variable takes on a 1, it collapses the rows and sum the values on other relevant variables. It also creates different indicators where: (1) one variable indicates when time starts (2) one variable that indicates when the time ends, and (3) a variable that indicates 0/1 whether an event took place at all. Essentially, each observation of a case becomes a start-stop as to when the event-based variable was observed.
I Illustrate below what I have:
Have:
Time | ID | County | FocalEvent | X1mva | X2police |
1 | 222 | ABC | 1 | 10 | 15 |
2 | 222 | ABC | 0 | 5 | 10 |
3 | 222 | ABC | 0 | 15 | 5 |
4 | 222 | ABC | 0 | 2 | 4 |
5 | 222 | ABC | 1 | 5 | 10 |
6 | 222 | ABC | 0 | 10 | 5 |
7 | 222 | ABC | 0 | 5 | 10 |
8 | 222 | ABC | 0 | 4 | 4 |
9 | 222 | ABC | 0 | 5 | 5 |
10 | 222 | ABC | 1 | 10 | 10 |
1 | 333 | BBB | 0 | 3 | 4 |
2 | 333 | BBB | 0 | 2 | 1 |
3 | 333 | BBB | 0 | 5 | 5 |
4 | 333 | BBB | 1 | 10 | 5 |
5 | 333 | BBB | 0 | 15 | 15 |
6 | 333 | BBB | 0 | 10 | 10 |
7 | 333 | BBB | 0 | 10 | 10 |
8 | 333 | BBB | 1 | 5 | 5 |
9 | 333 | BBB | 0 | 10 | 10 |
10 | 333 | BBB | 0 | 15 | 15 |
1 | 444 | CCC | 0 | 1 | 1 |
2 | 444 | CCC | 0 | 5 | 5 |
3 | 444 | CCC | 0 | 4 | 4 |
4 | 444 | CCC | 0 | 5 | 5 |
5 | 444 | CCC | 0 | 1 | 1 |
6 | 444 | CCC | 1 | 5 | 5 |
7 | 444 | CCC | 0 | 10 | 10 |
8 | 444 | CCC | 0 | 5 | 5 |
9 | 444 | CCC | 0 | 5 | 5 |
10 | 444 | CCC | 0 | 5 | 5 |
1 | 555 | DDD | 0 | 1 | 1 |
2 | 555 | DDD | 0 | 1 | 1 |
3 | 555 | DDD | 0 | 1 | 1 |
Want:
ID | County | X1mva | X2police | Time1 | Time2 | Event |
222 | ABC | 10 | 15 | 1 | 1 | 1 |
222 | ABC | 27 | 29 | 1 | 5 | 1 |
222 | ABC | 34 | 34 | 5 | 10 | 1 |
333 | BBB | 20 | 15 | 1 | 4 | 1 |
333 | BBB | 40 | 40 | 4 | 8 | 1 |
333 | BBB | 25 | 25 | 8 | 10 | 0 |
444 | CCC | 21 | 21 | 1 | 6 | 1 |
444 | CCC | 25 | 25 | 6 | 10 | 0 |
555 | DDD | 3 | 3 | 1 | 3 | 0 |
Code:
data have;
input Time ID County :$ FocalEvent X1mva X2police;
cards;
1 222 ABC 1 10 15
2 222 ABC 0 5 10
3 222 ABC 0 15 5
4 222 ABC 0 2 4
5 222 ABC 1 5 10
6 222 ABC 0 10 5
7 222 ABC 0 5 10
8 222 ABC 0 4 4
9 222 ABC 0 5 5
10 222 ABC 1 10 10
1 333 BBB 0 3 4
2 333 BBB 0 2 1
3 333 BBB 0 5 5
4 333 BBB 1 10 5
5 333 BBB 0 15 15
6 333 BBB 0 10 10
7 333 BBB 0 10 10
8 333 BBB 1 5 5
9 333 BBB 0 10 10
10 333 BBB 0 15 15
1 444 CCC 0 1 1
2 444 CCC 0 5 5
3 444 CCC 0 4 4
4 444 CCC 0 5 5
5 444 CCC 0 1 1
6 444 CCC 1 5 5
7 444 CCC 0 10 10
8 444 CCC 0 5 5
9 444 CCC 0 5 5
10 444 CCC 0 5 5
1 555 DDD 0 1 1
2 555 DDD 0 1 1
3 555 DDD 0 1 1
;
run;
proc sort data = have;
by ID County Time;
run;
data want1;
set have;
by ID;
category + first.ID;
output;
category + FocalEvent;
run;
proc sql;
create table want2 as
select
ID, County,
sum(X1mva) as X1mva,
sum(X2police) as X2police,
max(min(Time)-1,1) as Time1,
max(Time) as Time2,
max(FocalEvent) as Event
from want1
group by
ID, County, category
order by
ID, County, Time1, Time2
;
quit;
proc print;
run;
In practice, the variables (e.g., X1MVA, X2POLICE) that I sum have a lot of zeros on them and the time based events are spread out and more sparse. Is there a way I can add to my code such that I can count the time since the latest occurrence of the X1MVA or X2POLICE variables that I sum. For instance:
Have:
Time | ID | County | FocalEvent | X1mva | X2police |
1 | 222 | ABC | 1 | 0 | 1 |
2 | 222 | ABC | 0 | 1 | 0 |
3 | 222 | ABC | 0 | 0 | 1 |
4 | 222 | ABC | 0 | 0 | 0 |
5 | 222 | ABC | 1 | 0 | 0 |
6 | 222 | ABC | 0 | 0 | 1 |
7 | 222 | ABC | 0 | 0 | 0 |
8 | 222 | ABC | 0 | 0 | 0 |
9 | 222 | ABC | 0 | 1 | 0 |
10 | 222 | ABC | 1 | 1 | 0 |
1 | 333 | BBB | 0 | 0 | 1 |
2 | 333 | BBB | 0 | 0 | 1 |
3 | 333 | BBB | 0 | 1 | 0 |
4 | 333 | BBB | 1 | 0 | 0 |
5 | 333 | BBB | 0 | 1 | 0 |
6 | 333 | BBB | 0 | 0 | 0 |
7 | 333 | BBB | 0 | 0 | 0 |
8 | 333 | BBB | 1 | 0 | 1 |
9 | 333 | BBB | 0 | 0 | 1 |
10 | 333 | BBB | 0 | 1 | 0 |
Want:
ID | County | X1mva | X2police | timesinceX1 | timesinceX2 | Time1 | Time2 | Event |
222 | ABC | 0 | 1 | 0 | 0 | 1 | 1 | 1 |
222 | ABC | 1 | 1 | 3 | 2 | 1 | 5 | 1 |
222 | ABC | 2 | 1 | 0 | 4 | 5 | 10 | 1 |
333 | BBB | 1 | 2 | 1 | 2 | 1 | 4 | 1 |
333 | BBB | 1 | 1 | 3 | 0 | 4 | 8 | 1 |
333 | BBB | 1 | 1 | 0 | 1 | 8 | 10 | 0 |
Thanks in advance and I greatly appreciate you taking a look.
Your first want dataset can be easily achieved in a single data step:
data want;
set have (rename=(X1mva=_X1mva X2police=_X2police));
by id;
retain
X1mva
X2police
time1
;
if first.id
then do;
X1mva = 0;
X2police = 0;
time1 = time;
end;
X1mva + _X1mva;
X2police + _X2police;
if focalevent = 1 or last.id
then do;
time2 = time;
event = focalevent;
output;
X1mva = 0;
X2police = 0;
time1 = time;
end;
drop _: time focalevent;
run;
For the second part, something like this:
if X1mva = 1
then timesinceX1 = 0;
else timesinceX1 + 1;
if X2police = 1
then timesinceX2 = 0;
else timesinceX2 + 1;
Please post your example data in a working data step with datalines, so it's easy to recreate your dataset for testing.
Thank you for your response. Greatly appreciated. I set up the code like so:
data test.agmagtest;
set test.agmag2 (rename=(X1mva=_X1mva X2police=_X2police));
by id;
retain
X1mva
X2police
time1
;
if first.id
then do;
X1mva = 0;
X2police = 0;
time1 = time;
end;
X1mva + _X1mva;
X2police + _X2police;
if focalevent = 1 or last.id
then do;
time2 = time;
event = focalevent;
output;
X1mva = 0;
X2police = 0;
time1 = time;
end;
if X1mva = 1
then timesinceX1 = 0;
else timesinceX1 + 1;
if X2police = 1
then timesinceX2 = 0;
else timesinceX2 + 1;
drop _: time focalevent;
run;
A couple issues I am having:
(1) I don't get sums, mins. and maxes for the x1 and x2 variables
(2) The timesinceX1 and timesinceX2 variables are producing impossible values. I only want it to tell me the time since the X1 or the time since the last X2 for that particular ID.
Could you please advise? Thanks a lot.
I missed to use underlines, and you positioned the additional code in the wrong place:
data have;
input Time ID $ County $ FocalEvent X1mva X2police;
datalines;
1 222 ABC 1 0 1
2 222 ABC 0 1 0
3 222 ABC 0 0 1
4 222 ABC 0 0 0
5 222 ABC 1 0 0
6 222 ABC 0 0 1
7 222 ABC 0 0 0
8 222 ABC 0 0 0
9 222 ABC 0 1 0
10 222 ABC 1 1 0
;
data want;
set have (rename=(X1mva=_X1mva X2police=_X2police));
by id;
retain
X1mva
X2police
timesinceX1
timesinceX2
time1
;
if first.id
then do;
X1mva = 0;
X2police = 0;
time1 = time;
end;
X1mva + _X1mva;
X2police + _X2police;
if _X1mva = 1 or first.id
then timesinceX1 = 0;
else timesinceX1 + 1;
if _X2police = 1 or first.id
then timesinceX2 = 0;
else timesinceX2 + 1;
if focalevent = 1 or last.id
then do;
time2 = time;
event = focalevent;
output;
X1mva = 0;
X2police = 0;
time1 = time;
end;
drop _: time focalevent;
run;
Kurt, thanks for your help on this. I am still having issues getting it just right. Specifically, the problem is that 'the want' spits out time1 and time2 values that aren't centered at zero. For example,
time1 time2
8 26
26 42
instead of
time1 time2
0 18
18 34
I wrote some code that spits out what I want better, but I still don't know how to insert in the timesince variables that I want. Could you advise?
proc sort data=have out=have_sorted;
by ID County Time FocalEvent;
run;
data have_group;
do i=1 until (last.FocalEvent);
set have_sorted;
by ID County FocalEvent notsorted;
if lag(FocalEvent)=1 then group+1;
if first.ID then group=1;
end;
run;
data have_group2;
set have_group;
by ID County group notsorted;
if first.County then nobs=1;
else nobs+1;
run;
proc means data=have_group2 noprint;
var FocalEvent X1mva X2police nobs;
class ID County group;
ways 3;
output out=have_sum sum= min= max= / autoname;
run;
data want;
set have_sum (keep=ID County FocalEvent_Sum X1mva_Sum X2police_Sum nobs_Min nobs_Max);
if nobs_Min ne 1 then Time1 = nobs_Min-1;
else Time1 = nobs_Min;
rename FocalEvent_Sum = Event X1mva_Sum = X1mva X2police_Sum = X2police nobs_Max = Time2;
run;
To "center on zero", save the first value of a group to a retained variable, and subtract that from all the values you calculate later.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.