BookmarkSubscribeRSS Feed
r4321
Pyrite | Level 9

Hello all, 

I have code that allows me to reorganize my data (companies observed each year 2006-2015) based on a variable. The variable is an event-based variable that takes on values from 0-1 (1 if it happened). When (and/or when) the variable takes on a 1, it collapses the rows and sum the values on other relevant variables. It also creates  different indicators where: (1) one variable indicates when time starts (2) one variable that indicates when the time ends, and (3) a variable that indicates 0/1 whether an event took place at all. Essentially, each observation of a case becomes a start-stop as to when the event-based variable was observed.

I Illustrate below what I have: 

 

Have:

Time

ID

County

FocalEvent

X1mva

X2police

1

222

ABC

1

10

15

2

222

ABC

0

5

10

3

222

ABC

0

15

5

4

222

ABC

0

2

4

5

222

ABC

1

5

10

6

222

ABC

0

10

5

7

222

ABC

0

5

10

8

222

ABC

0

4

4

9

222

ABC

0

5

5

10

222

ABC

1

10

10

1

333

BBB

0

3

4

2

333

BBB

0

2

1

3

333

BBB

0

5

5

4

333

BBB

1

10

5

5

333

BBB

0

15

15

6

333

BBB

0

10

10

7

333

BBB

0

10

10

8

333

BBB

1

5

5

9

333

BBB

0

10

10

10

333

BBB

0

15

15

1

444

CCC

0

1

1

2

444

CCC

0

5

5

3

444

CCC

0

4

4

4

444

CCC

0

5

5

5

444

CCC

0

1

1

6

444

CCC

1

5

5

7

444

CCC

0

10

10

8

444

CCC

0

5

5

9

444

CCC

0

5

5

10

444

CCC

0

5

5

1

555

DDD

0

1

1

2

555

DDD

0

1

1

3

555

DDD

0

1

1

 

 

Want:

ID

County

X1mva

X2police

Time1

Time2

Event

222

ABC

10

15

1

1

1

222

ABC

27

29

1

5

1

222

ABC

34

34

5

10

1

333

BBB

20

15

1

4

1

333

BBB

40

40

4

8

1

333

BBB

25

25

8

10

0

444

CCC

21

21

1

6

1

444

CCC

25

25

6

10

0

555

DDD

3

3

1

3

0

 

 

Code: 

data have;
input Time ID County :$ FocalEvent X1mva X2police;
cards;
1 222 ABC 1 10 15
2 222 ABC 0 5 10
3 222 ABC 0 15 5
4 222 ABC 0 2 4
5 222 ABC 1 5 10
6 222 ABC 0 10 5
7 222 ABC 0 5 10
8 222 ABC 0 4 4
9 222 ABC 0 5 5
10 222 ABC 1 10 10
1 333 BBB 0 3 4
2 333 BBB 0 2 1
3 333 BBB 0 5 5
4 333 BBB 1 10 5
5 333 BBB 0 15 15
6 333 BBB 0 10 10
7 333 BBB 0 10 10
8 333 BBB 1 5 5
9 333 BBB 0 10 10
10 333 BBB 0 15 15
1 444 CCC 0 1 1
2 444 CCC 0 5 5
3 444 CCC 0 4 4
4 444 CCC 0 5 5
5 444 CCC 0 1 1
6 444 CCC 1 5 5
7 444 CCC 0 10 10
8 444 CCC 0 5 5
9 444 CCC 0 5 5
10 444 CCC 0 5 5
1 555 DDD 0 1 1
2 555 DDD 0 1 1
3 555 DDD 0 1 1
;
run;
proc sort data = have;
  by ID County Time;
run;

data want1;
  set have;
  by ID;
  category + first.ID;
  output;
  category + FocalEvent; 
run;

proc sql;
  create table want2 as
  select 
    ID, County, 
    sum(X1mva) as X1mva,
    sum(X2police) as X2police,
    max(min(Time)-1,1) as Time1,
    max(Time) as Time2,
    max(FocalEvent) as Event
  from want1
  group by
    ID, County, category 
  order by 
    ID, County, Time1, Time2
;
quit;
proc print;
run;

 

In practice, the variables (e.g., X1MVA, X2POLICE) that I sum have a lot of zeros on them and the time based events are spread out and more sparse. Is there a way I can add to my code such that I can count the time since the latest occurrence of the X1MVA or X2POLICE variables that I sum. For instance: 

 

Have: 

 

Time

ID

County

FocalEvent

X1mva

X2police

1

222

ABC

1

0

1

2

222

ABC

0

1

0

3

222

ABC

0

0

1

4

222

ABC

0

0

0

5

222

ABC

1

0

0

6

222

ABC

0

0

1

7

222

ABC

0

0

0

8

222

ABC

0

0

0

9

222

ABC

0

1

0

10

222

ABC

1

1

0

1

333

BBB

0

0

1

2

333

BBB

0

0

1

3

333

BBB

0

1

0

4

333

BBB

1

0

0

5

333

BBB

0

1

0

6

333

BBB

0

0

0

7

333

BBB

0

0

0

8

333

BBB

1

0

1

9

333

BBB

0

0

1

10

333

BBB

0

1

0

 

 

 

Want:

 

ID

County

X1mva

X2police

timesinceX1

timesinceX2

Time1

Time2

Event

222

ABC

0

1

0

0

1

1

1

222

ABC

1

1

3

2

1

5

1

222

ABC

2

1

0

4

5

10

1

333

BBB

1

2

1

2

1

4

1

333

BBB

1

1

3

0

4

8

1

333

BBB

1

1

0

1

8

10

0

 

 

Thanks in advance and I greatly appreciate you taking a look. 

6 REPLIES 6
Kurt_Bremser
Super User

Your first want dataset can be easily achieved in a single data step:

data want;
set have (rename=(X1mva=_X1mva X2police=_X2police));
by id;
retain
  X1mva
  X2police
  time1
;
if first.id
then do;
  X1mva = 0;
  X2police = 0;
  time1 = time;
end;
X1mva + _X1mva;
X2police + _X2police;
if focalevent = 1 or last.id
then do;
  time2 = time;
  event = focalevent;
  output;
  X1mva = 0;
  X2police = 0;
  time1 = time;
end;
drop _: time focalevent;
run;
Kurt_Bremser
Super User

For the second part, something like this:

if X1mva = 1
then timesinceX1 = 0;
else timesinceX1 + 1;
if X2police = 1
then timesinceX2 = 0;
else timesinceX2 + 1;

Please post your example data in a working data step with datalines, so it's easy to recreate your dataset for testing.

r4321
Pyrite | Level 9

Thank you for your response. Greatly appreciated. I set up the code like so: 

 


data test.agmagtest;
set test.agmag2 (rename=(X1mva=_X1mva X2police=_X2police));
by id;
retain
  X1mva
  X2police
  time1
;
if first.id
then do;
  X1mva = 0;
  X2police = 0;
  time1 = time;
end;
X1mva + _X1mva;
X2police + _X2police;
if focalevent = 1 or last.id
then do;
  time2 = time;
  event = focalevent;
  output;
  X1mva = 0;
  X2police = 0;
  time1 = time;
end;
if X1mva = 1
then timesinceX1 = 0;
else timesinceX1 + 1;
if X2police = 1
then timesinceX2 = 0;
else timesinceX2 + 1;
drop _: time focalevent;
run;

A couple issues I am having:

(1) I don't get sums, mins. and maxes for the x1 and x2 variables

(2) The timesinceX1 and timesinceX2 variables are producing impossible values. I only want it to tell me the time since the X1 or the time since the last X2 for that particular ID.

 

Could you please advise? Thanks a lot. 

Kurt_Bremser
Super User

I missed to use underlines, and you positioned the additional code in the wrong place:

data have;
input Time ID $ County $ FocalEvent X1mva X2police;
datalines;
1 222 ABC 1 0 1
2 222 ABC 0 1 0
3 222 ABC 0 0 1
4 222 ABC 0 0 0
5 222 ABC 1 0 0
6 222 ABC 0 0 1
7 222 ABC 0 0 0
8 222 ABC 0 0 0
9 222 ABC 0 1 0
10 222 ABC 1 1 0
;

data want;
set have (rename=(X1mva=_X1mva X2police=_X2police));
by id;
retain
  X1mva
  X2police
  timesinceX1
  timesinceX2
  time1
;
if first.id
then do;
  X1mva = 0;
  X2police = 0;
  time1 = time;
end;
X1mva + _X1mva;
X2police + _X2police;
if _X1mva = 1 or first.id
then timesinceX1 = 0;
else timesinceX1 + 1;
if _X2police = 1 or first.id
then timesinceX2 = 0;
else timesinceX2 + 1;
if focalevent = 1 or last.id
then do;
  time2 = time;
  event = focalevent;
  output;
  X1mva = 0;
  X2police = 0;
  time1 = time;
end;
drop _: time focalevent;
run;
r4321
Pyrite | Level 9

Kurt, thanks for your help on this. I am still having issues getting it just right. Specifically, the problem is that 'the want' spits out time1 and time2 values that aren't centered at zero. For example, 

 

time1 time2

8         26

26       42

 

instead of 

 

time1 time2

0         18

18       34

 

 

I wrote some code that spits out what I want better, but I still don't know how to insert in the timesince variables that I want. Could you advise? 

 

 

 

proc sort data=have out=have_sorted;
	by ID County Time FocalEvent;
run;

data have_group;
	do i=1 until (last.FocalEvent);
		set have_sorted;
		by ID County FocalEvent notsorted;
		if lag(FocalEvent)=1 then group+1;
		if first.ID then group=1;
	end;
run;

data have_group2;
	set have_group;
	by ID County group notsorted;
	if first.County then nobs=1;
	else nobs+1;
run;

proc means data=have_group2 noprint;
	var FocalEvent X1mva X2police nobs;
	class ID County group;
	ways 3;
	output out=have_sum sum= min= max= / autoname;
run;

data want;
	set have_sum (keep=ID County FocalEvent_Sum X1mva_Sum X2police_Sum nobs_Min nobs_Max);
	if nobs_Min ne 1 then Time1 = nobs_Min-1;
	else Time1 = nobs_Min;
	rename FocalEvent_Sum = Event X1mva_Sum = X1mva X2police_Sum = X2police  nobs_Max = Time2;
run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1423 views
  • 2 likes
  • 2 in conversation