Hello all,
I have data on companies observed each year 2000-2015 (16 years) where I am looking to reorganize it based on a variable. This variable is an event-based variable that takes on values from 0-1 (1 if it happened). If (and/or when) the variable takes on a 1, I would like to collapse the rows and sum the values on other relevant variables. I would also like to create different indicators based on this: (1) one variable indicates when time starts (2) one variable that indicates when the time ends, and (3) a variable that indicates 0/1 whether an event took place at all. Essentially, each observation of a case becomes a start-stop as to when the event-based variable was observed.
I Illustrate below what I have and then want.
Have:
Time | ID | County | FocalEvent | X1mva | X2police |
1 | 222 | ABC | 1 | 10 | 15 |
2 | 222 | ABC | 0 | 5 | 10 |
3 | 222 | ABC | 0 | 15 | 5 |
4 | 222 | ABC | 0 | 2 | 4 |
5 | 222 | ABC | 1 | 5 | 10 |
6 | 222 | ABC | 0 | 10 | 5 |
7 | 222 | ABC | 0 | 5 | 10 |
8 | 222 | ABC | 0 | 4 | 4 |
9 | 222 | ABC | 0 | 5 | 5 |
10 | 222 | ABC | 1 | 10 | 10 |
1 | 333 | BBB | 0 | 3 | 4 |
2 | 333 | BBB | 0 | 2 | 1 |
3 | 333 | BBB | 0 | 5 | 5 |
4 | 333 | BBB | 1 | 10 | 5 |
5 | 333 | BBB | 0 | 15 | 15 |
6 | 333 | BBB | 0 | 10 | 10 |
7 | 333 | BBB | 0 | 10 | 10 |
8 | 333 | BBB | 1 | 5 | 5 |
9 | 333 | BBB | 0 | 10 | 10 |
10 | 333 | BBB | 0 | 15 | 15 |
1 | 444 | CCC | 0 | 1 | 1 |
2 | 444 | CCC | 0 | 5 | 5 |
3 | 444 | CCC | 0 | 4 | 4 |
4 | 444 | CCC | 0 | 5 | 5 |
5 | 444 | CCC | 0 | 1 | 1 |
6 | 444 | CCC | 1 | 5 | 5 |
7 | 444 | CCC | 0 | 10 | 10 |
8 | 444 | CCC | 0 | 5 | 5 |
9 | 444 | CCC | 0 | 5 | 5 |
10 | 444 | CCC | 0 | 5 | 5 |
1 | 555 | DDD | 0 | 1 | 1 |
2 | 555 | DDD | 0 | 1 | 1 |
3 | 555 | DDD | 0 | 1 | 1 |
Want:
ID | County | X1mva | X2police | Time1 | Time2 | Event |
222 | ABC | 10 | 15 | 1 | 1 | 1 |
222 | ABC | 27 | 34 | 1 | 5 | 1 |
222 | ABC | 34 | 34 | 5 | 10 | 1 |
333 | BBB | 20 | 15 | 1 | 4 | 1 |
333 | BBB | 40 | 40 | 4 | 8 | 1 |
333 | BBB | 25 | 25 | 8 | 10 | 0 |
444 | CCC | 21 | 21 | 1 | 6 | 1 |
444 | CCC | 25 | 25 | 6 | 10 | 0 |
555 | DDD | 3 | 3 | 1 | 3 | 0 |
Thanks for your help.
R
Hi,
try this one:
data have;
input Time ID County :$ FocalEvent X1mva X2police;
cards;
1 222 ABC 1 10 15
2 222 ABC 0 5 10
3 222 ABC 0 15 5
4 222 ABC 0 2 4
5 222 ABC 1 5 10
6 222 ABC 0 10 5
7 222 ABC 0 5 10
8 222 ABC 0 4 4
9 222 ABC 0 5 5
10 222 ABC 1 10 10
1 333 BBB 0 3 4
2 333 BBB 0 2 1
3 333 BBB 0 5 5
4 333 BBB 1 10 5
5 333 BBB 0 15 15
6 333 BBB 0 10 10
7 333 BBB 0 10 10
8 333 BBB 1 5 5
9 333 BBB 0 10 10
10 333 BBB 0 15 15
1 444 CCC 0 1 1
2 444 CCC 0 5 5
3 444 CCC 0 4 4
4 444 CCC 0 5 5
5 444 CCC 0 1 1
6 444 CCC 1 5 5
7 444 CCC 0 10 10
8 444 CCC 0 5 5
9 444 CCC 0 5 5
10 444 CCC 0 5 5
1 555 DDD 0 1 1
2 555 DDD 0 1 1
3 555 DDD 0 1 1
;
run;
proc sort data = have;
by ID County Time;
run;
data want1;
set have;
by ID;
category + first.ID;
output;
category + FocalEvent;
run;
proc sql;
create table want2 as
select
ID, County,
sum(X1mva) as X1mva,
sum(X2police) as X2police,
max(min(Time)-1,1) as Time1,
max(Time) as Time2,
max(FocalEvent) as Event
from want1
group by
ID, County, category
order by
ID, County, Time1, Time2
;
quit;
proc print;
run;
Just one question, are you 100% sure that time periods should overlap (i.e. for 222 you want [1,1], [1,5],[5,10] but the sum seems to be for [1,1],[2,5][6,10])
All the best
Bart
Hi,
try this one:
data have;
input Time ID County :$ FocalEvent X1mva X2police;
cards;
1 222 ABC 1 10 15
2 222 ABC 0 5 10
3 222 ABC 0 15 5
4 222 ABC 0 2 4
5 222 ABC 1 5 10
6 222 ABC 0 10 5
7 222 ABC 0 5 10
8 222 ABC 0 4 4
9 222 ABC 0 5 5
10 222 ABC 1 10 10
1 333 BBB 0 3 4
2 333 BBB 0 2 1
3 333 BBB 0 5 5
4 333 BBB 1 10 5
5 333 BBB 0 15 15
6 333 BBB 0 10 10
7 333 BBB 0 10 10
8 333 BBB 1 5 5
9 333 BBB 0 10 10
10 333 BBB 0 15 15
1 444 CCC 0 1 1
2 444 CCC 0 5 5
3 444 CCC 0 4 4
4 444 CCC 0 5 5
5 444 CCC 0 1 1
6 444 CCC 1 5 5
7 444 CCC 0 10 10
8 444 CCC 0 5 5
9 444 CCC 0 5 5
10 444 CCC 0 5 5
1 555 DDD 0 1 1
2 555 DDD 0 1 1
3 555 DDD 0 1 1
;
run;
proc sort data = have;
by ID County Time;
run;
data want1;
set have;
by ID;
category + first.ID;
output;
category + FocalEvent;
run;
proc sql;
create table want2 as
select
ID, County,
sum(X1mva) as X1mva,
sum(X2police) as X2police,
max(min(Time)-1,1) as Time1,
max(Time) as Time2,
max(FocalEvent) as Event
from want1
group by
ID, County, category
order by
ID, County, Time1, Time2
;
quit;
proc print;
run;
Just one question, are you 100% sure that time periods should overlap (i.e. for 222 you want [1,1], [1,5],[5,10] but the sum seems to be for [1,1],[2,5][6,10])
All the best
Bart
Bart,
Thank you for your help! Going through this now and hoping to ask a follow-up. Yes, you were correct that I needed the time periods like so (i.e. for 222 you want [1,1], [1,5],[5,10] and the sums like so [1,1],[2,5][6,10]) .
So far this code works great, but how can I set the code such that time1 on the first observation of each set of observations (or IDs) code has a 0 instead of a 1? See below for reference:
Time | ID | County | FocalEvent | X1mva | X2police |
1 | 222 | ABC | 1 | 10 | 15 |
2 | 222 | ABC | 0 | 5 | 10 |
3 | 222 | ABC | 0 | 15 | 5 |
4 | 222 | ABC | 0 | 2 | 4 |
5 | 222 | ABC | 1 | 5 | 10 |
6 | 222 | ABC | 0 | 10 | 5 |
7 | 222 | ABC | 0 | 5 | 10 |
8 | 222 | ABC | 0 | 4 | 4 |
9 | 222 | ABC | 0 | 5 | 5 |
10 | 222 | ABC | 1 | 10 | 10 |
Want:
ID | County | X1mva | X2police | Time1 | Time2 | Event |
222 | ABC | 10 | 15 | 0 | 1 | 1 |
222 | ABC | 27 | 29 | 1 | 5 | 1 |
222 | ABC | 34 | 34 | 5 | 10 | 1 |
Thanks again,
R
I think that changing from
max(min(Time)-1,1) as Time1,
to
max(min(Time)-1,0) as Time1,
should do the job.
Bart
Hi @r4321
Here is a complimentary approach using PROC MEANS:
proc sort data=have out=have_sorted;
by ID County Time FocalEvent;
run;
data have_group;
do i=1 until (last.FocalEvent);
set have_sorted;
by ID County FocalEvent notsorted;
if lag(FocalEvent)=1 then group+1;
if first.ID then group=1;
end;
run;
data have_group2;
set have_group;
by ID County group notsorted;
if first.County then nobs=1;
else nobs+1;
run;
proc means data=have_group2 noprint;
var FocalEvent X1mva X2police nobs;
class ID County group;
ways 3;
output out=have_sum sum= min= max= / autoname;
run;
data want;
set have_sum (keep=ID County FocalEvent_Sum X1mva_Sum X2police_Sum nobs_Min nobs_Max);
if nobs_Min ne 1 then Time1 = nobs_Min-1;
else Time1 = nobs_Min;
rename FocalEvent_Sum = Event X1mva_Sum = X1mva X2police_Sum = X2police nobs_Max = Time2;
run;
Best,
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.