Hello all,
I have data on companies observed each year 2000-2015 (16 years) where I am looking to reorganize it based on a variable. This variable is an event-based variable that takes on values from 0-1 (1 if it happened). If (and/or when) the variable takes on a 1, I would like to collapse the rows and sum the values on other relevant variables. I would also like to create different indicators based on this: (1) one variable indicates when time starts (2) one variable that indicates when the time ends, and (3) a variable that indicates 0/1 whether an event took place at all. Essentially, each observation of a case becomes a start-stop as to when the event-based variable was observed.
I Illustrate below what I have and then want.
Have:
Time | ID | County | FocalEvent | X1mva | X2police |
1 | 222 | ABC | 1 | 10 | 15 |
2 | 222 | ABC | 0 | 5 | 10 |
3 | 222 | ABC | 0 | 15 | 5 |
4 | 222 | ABC | 0 | 2 | 4 |
5 | 222 | ABC | 1 | 5 | 10 |
6 | 222 | ABC | 0 | 10 | 5 |
7 | 222 | ABC | 0 | 5 | 10 |
8 | 222 | ABC | 0 | 4 | 4 |
9 | 222 | ABC | 0 | 5 | 5 |
10 | 222 | ABC | 1 | 10 | 10 |
1 | 333 | BBB | 0 | 3 | 4 |
2 | 333 | BBB | 0 | 2 | 1 |
3 | 333 | BBB | 0 | 5 | 5 |
4 | 333 | BBB | 1 | 10 | 5 |
5 | 333 | BBB | 0 | 15 | 15 |
6 | 333 | BBB | 0 | 10 | 10 |
7 | 333 | BBB | 0 | 10 | 10 |
8 | 333 | BBB | 1 | 5 | 5 |
9 | 333 | BBB | 0 | 10 | 10 |
10 | 333 | BBB | 0 | 15 | 15 |
1 | 444 | CCC | 0 | 1 | 1 |
2 | 444 | CCC | 0 | 5 | 5 |
3 | 444 | CCC | 0 | 4 | 4 |
4 | 444 | CCC | 0 | 5 | 5 |
5 | 444 | CCC | 0 | 1 | 1 |
6 | 444 | CCC | 1 | 5 | 5 |
7 | 444 | CCC | 0 | 10 | 10 |
8 | 444 | CCC | 0 | 5 | 5 |
9 | 444 | CCC | 0 | 5 | 5 |
10 | 444 | CCC | 0 | 5 | 5 |
1 | 555 | DDD | 0 | 1 | 1 |
2 | 555 | DDD | 0 | 1 | 1 |
3 | 555 | DDD | 0 | 1 | 1 |
Want:
ID | County | X1mva | X2police | Time1 | Time2 | Event |
222 | ABC | 10 | 15 | 1 | 1 | 1 |
222 | ABC | 27 | 34 | 1 | 5 | 1 |
222 | ABC | 34 | 34 | 5 | 10 | 1 |
333 | BBB | 20 | 15 | 1 | 4 | 1 |
333 | BBB | 40 | 40 | 4 | 8 | 1 |
333 | BBB | 25 | 25 | 8 | 10 | 0 |
444 | CCC | 21 | 21 | 1 | 6 | 1 |
444 | CCC | 25 | 25 | 6 | 10 | 0 |
555 | DDD | 3 | 3 | 1 | 3 | 0 |
Thanks for your help.
R
Hi,
try this one:
data have;
input Time ID County :$ FocalEvent X1mva X2police;
cards;
1 222 ABC 1 10 15
2 222 ABC 0 5 10
3 222 ABC 0 15 5
4 222 ABC 0 2 4
5 222 ABC 1 5 10
6 222 ABC 0 10 5
7 222 ABC 0 5 10
8 222 ABC 0 4 4
9 222 ABC 0 5 5
10 222 ABC 1 10 10
1 333 BBB 0 3 4
2 333 BBB 0 2 1
3 333 BBB 0 5 5
4 333 BBB 1 10 5
5 333 BBB 0 15 15
6 333 BBB 0 10 10
7 333 BBB 0 10 10
8 333 BBB 1 5 5
9 333 BBB 0 10 10
10 333 BBB 0 15 15
1 444 CCC 0 1 1
2 444 CCC 0 5 5
3 444 CCC 0 4 4
4 444 CCC 0 5 5
5 444 CCC 0 1 1
6 444 CCC 1 5 5
7 444 CCC 0 10 10
8 444 CCC 0 5 5
9 444 CCC 0 5 5
10 444 CCC 0 5 5
1 555 DDD 0 1 1
2 555 DDD 0 1 1
3 555 DDD 0 1 1
;
run;
proc sort data = have;
by ID County Time;
run;
data want1;
set have;
by ID;
category + first.ID;
output;
category + FocalEvent;
run;
proc sql;
create table want2 as
select
ID, County,
sum(X1mva) as X1mva,
sum(X2police) as X2police,
max(min(Time)-1,1) as Time1,
max(Time) as Time2,
max(FocalEvent) as Event
from want1
group by
ID, County, category
order by
ID, County, Time1, Time2
;
quit;
proc print;
run;
Just one question, are you 100% sure that time periods should overlap (i.e. for 222 you want [1,1], [1,5],[5,10] but the sum seems to be for [1,1],[2,5][6,10])
All the best
Bart
Hi,
try this one:
data have;
input Time ID County :$ FocalEvent X1mva X2police;
cards;
1 222 ABC 1 10 15
2 222 ABC 0 5 10
3 222 ABC 0 15 5
4 222 ABC 0 2 4
5 222 ABC 1 5 10
6 222 ABC 0 10 5
7 222 ABC 0 5 10
8 222 ABC 0 4 4
9 222 ABC 0 5 5
10 222 ABC 1 10 10
1 333 BBB 0 3 4
2 333 BBB 0 2 1
3 333 BBB 0 5 5
4 333 BBB 1 10 5
5 333 BBB 0 15 15
6 333 BBB 0 10 10
7 333 BBB 0 10 10
8 333 BBB 1 5 5
9 333 BBB 0 10 10
10 333 BBB 0 15 15
1 444 CCC 0 1 1
2 444 CCC 0 5 5
3 444 CCC 0 4 4
4 444 CCC 0 5 5
5 444 CCC 0 1 1
6 444 CCC 1 5 5
7 444 CCC 0 10 10
8 444 CCC 0 5 5
9 444 CCC 0 5 5
10 444 CCC 0 5 5
1 555 DDD 0 1 1
2 555 DDD 0 1 1
3 555 DDD 0 1 1
;
run;
proc sort data = have;
by ID County Time;
run;
data want1;
set have;
by ID;
category + first.ID;
output;
category + FocalEvent;
run;
proc sql;
create table want2 as
select
ID, County,
sum(X1mva) as X1mva,
sum(X2police) as X2police,
max(min(Time)-1,1) as Time1,
max(Time) as Time2,
max(FocalEvent) as Event
from want1
group by
ID, County, category
order by
ID, County, Time1, Time2
;
quit;
proc print;
run;
Just one question, are you 100% sure that time periods should overlap (i.e. for 222 you want [1,1], [1,5],[5,10] but the sum seems to be for [1,1],[2,5][6,10])
All the best
Bart
Bart,
Thank you for your help! Going through this now and hoping to ask a follow-up. Yes, you were correct that I needed the time periods like so (i.e. for 222 you want [1,1], [1,5],[5,10] and the sums like so [1,1],[2,5][6,10]) .
So far this code works great, but how can I set the code such that time1 on the first observation of each set of observations (or IDs) code has a 0 instead of a 1? See below for reference:
Time | ID | County | FocalEvent | X1mva | X2police |
1 | 222 | ABC | 1 | 10 | 15 |
2 | 222 | ABC | 0 | 5 | 10 |
3 | 222 | ABC | 0 | 15 | 5 |
4 | 222 | ABC | 0 | 2 | 4 |
5 | 222 | ABC | 1 | 5 | 10 |
6 | 222 | ABC | 0 | 10 | 5 |
7 | 222 | ABC | 0 | 5 | 10 |
8 | 222 | ABC | 0 | 4 | 4 |
9 | 222 | ABC | 0 | 5 | 5 |
10 | 222 | ABC | 1 | 10 | 10 |
Want:
ID | County | X1mva | X2police | Time1 | Time2 | Event |
222 | ABC | 10 | 15 | 0 | 1 | 1 |
222 | ABC | 27 | 29 | 1 | 5 | 1 |
222 | ABC | 34 | 34 | 5 | 10 | 1 |
Thanks again,
R
I think that changing from
max(min(Time)-1,1) as Time1,
to
max(min(Time)-1,0) as Time1,
should do the job.
Bart
Hi @r4321
Here is a complimentary approach using PROC MEANS:
proc sort data=have out=have_sorted;
by ID County Time FocalEvent;
run;
data have_group;
do i=1 until (last.FocalEvent);
set have_sorted;
by ID County FocalEvent notsorted;
if lag(FocalEvent)=1 then group+1;
if first.ID then group=1;
end;
run;
data have_group2;
set have_group;
by ID County group notsorted;
if first.County then nobs=1;
else nobs+1;
run;
proc means data=have_group2 noprint;
var FocalEvent X1mva X2police nobs;
class ID County group;
ways 3;
output out=have_sum sum= min= max= / autoname;
run;
data want;
set have_sum (keep=ID County FocalEvent_Sum X1mva_Sum X2police_Sum nobs_Min nobs_Max);
if nobs_Min ne 1 then Time1 = nobs_Min-1;
else Time1 = nobs_Min;
rename FocalEvent_Sum = Event X1mva_Sum = X1mva X2police_Sum = X2police nobs_Max = Time2;
run;
Best,
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.