BookmarkSubscribeRSS Feed
r4321
Pyrite | Level 9

Hello all, 

I have code that allows me to reorganize my data (companies observed each year 2006-2015) based on a variable. The variable is an event-based variable that takes on values from 0-1 (1 if it happened). When (and/or when) the variable takes on a 1, it collapses the rows and sum the values on other relevant variables. It also creates  different indicators where: (1) one variable indicates when time starts (2) one variable that indicates when the time ends, and (3) a variable that indicates 0/1 whether an event took place at all. Essentially, each observation of a case becomes a start-stop as to when the event-based variable was observed.

I Illustrate below what I have: 

 

Have:

Time

ID

County

FocalEvent

X1mva

X2police

1

222

ABC

1

10

15

2

222

ABC

0

5

10

3

222

ABC

0

15

5

4

222

ABC

0

2

4

5

222

ABC

1

5

10

6

222

ABC

0

10

5

7

222

ABC

0

5

10

8

222

ABC

0

4

4

9

222

ABC

0

5

5

10

222

ABC

1

10

10

1

333

BBB

0

3

4

2

333

BBB

0

2

1

3

333

BBB

0

5

5

4

333

BBB

1

10

5

5

333

BBB

0

15

15

6

333

BBB

0

10

10

7

333

BBB

0

10

10

8

333

BBB

1

5

5

9

333

BBB

0

10

10

10

333

BBB

0

15

15

1

444

CCC

0

1

1

2

444

CCC

0

5

5

3

444

CCC

0

4

4

4

444

CCC

0

5

5

5

444

CCC

0

1

1

6

444

CCC

1

5

5

7

444

CCC

0

10

10

8

444

CCC

0

5

5

9

444

CCC

0

5

5

10

444

CCC

0

5

5

1

555

DDD

0

1

1

2

555

DDD

0

1

1

3

555

DDD

0

1

1

 

 

Want:

ID

County

X1mva

X2police

Time1

Time2

Event

222

ABC

10

15

1

1

1

222

ABC

27

29

1

5

1

222

ABC

34

34

5

10

1

333

BBB

20

15

1

4

1

333

BBB

40

40

4

8

1

333

BBB

25

25

8

10

0

444

CCC

21

21

1

6

1

444

CCC

25

25

6

10

0

555

DDD

3

3

1

3

0

 

 

Code: 

data have;
input Time ID County :$ FocalEvent X1mva X2police;
cards;
1 222 ABC 1 10 15
2 222 ABC 0 5 10
3 222 ABC 0 15 5
4 222 ABC 0 2 4
5 222 ABC 1 5 10
6 222 ABC 0 10 5
7 222 ABC 0 5 10
8 222 ABC 0 4 4
9 222 ABC 0 5 5
10 222 ABC 1 10 10
1 333 BBB 0 3 4
2 333 BBB 0 2 1
3 333 BBB 0 5 5
4 333 BBB 1 10 5
5 333 BBB 0 15 15
6 333 BBB 0 10 10
7 333 BBB 0 10 10
8 333 BBB 1 5 5
9 333 BBB 0 10 10
10 333 BBB 0 15 15
1 444 CCC 0 1 1
2 444 CCC 0 5 5
3 444 CCC 0 4 4
4 444 CCC 0 5 5
5 444 CCC 0 1 1
6 444 CCC 1 5 5
7 444 CCC 0 10 10
8 444 CCC 0 5 5
9 444 CCC 0 5 5
10 444 CCC 0 5 5
1 555 DDD 0 1 1
2 555 DDD 0 1 1
3 555 DDD 0 1 1
;
run;
proc sort data = have;
  by ID County Time;
run;

data want1;
  set have;
  by ID;
  category + first.ID;
  output;
  category + FocalEvent; 
run;

proc sql;
  create table want2 as
  select 
    ID, County, 
    sum(X1mva) as X1mva,
    sum(X2police) as X2police,
    max(min(Time)-1,1) as Time1,
    max(Time) as Time2,
    max(FocalEvent) as Event
  from want1
  group by
    ID, County, category 
  order by 
    ID, County, Time1, Time2
;
quit;
proc print;
run;

 

In practice, the variables (e.g., X1MVA, X2POLICE) that I sum have a lot of zeros on them and the time based events are spread out and more sparse. Is there a way I can add to my code such that I can count the time since the latest occurrence of the X1MVA or X2POLICE variables that I sum. For instance: 

 

Have: 

 

Time

ID

County

FocalEvent

X1mva

X2police

1

222

ABC

1

0

1

2

222

ABC

0

1

0

3

222

ABC

0

0

1

4

222

ABC

0

0

0

5

222

ABC

1

0

0

6

222

ABC

0

0

1

7

222

ABC

0

0

0

8

222

ABC

0

0

0

9

222

ABC

0

1

0

10

222

ABC

1

1

0

1

333

BBB

0

0

1

2

333

BBB

0

0

1

3

333

BBB

0

1

0

4

333

BBB

1

0

0

5

333

BBB

0

1

0

6

333

BBB

0

0

0

7

333

BBB

0

0

0

8

333

BBB

1

0

1

9

333

BBB

0

0

1

10

333

BBB

0

1

0

 

 

 

Want:

 

ID

County

X1mva

X2police

timesinceX1

timesinceX2

Time1

Time2

Event

222

ABC

0

1

0

0

1

1

1

222

ABC

1

1

3

2

1

5

1

222

ABC

2

1

0

4

5

10

1

333

BBB

1

2

1

2

1

4

1

333

BBB

1

1

3

0

4

8

1

333

BBB

1

1

0

1

8

10

0

 

 

Thanks in advance and I greatly appreciate you taking a look. 

6 REPLIES 6
Kurt_Bremser
Super User

Your first want dataset can be easily achieved in a single data step:

data want;
set have (rename=(X1mva=_X1mva X2police=_X2police));
by id;
retain
  X1mva
  X2police
  time1
;
if first.id
then do;
  X1mva = 0;
  X2police = 0;
  time1 = time;
end;
X1mva + _X1mva;
X2police + _X2police;
if focalevent = 1 or last.id
then do;
  time2 = time;
  event = focalevent;
  output;
  X1mva = 0;
  X2police = 0;
  time1 = time;
end;
drop _: time focalevent;
run;
Kurt_Bremser
Super User

For the second part, something like this:

if X1mva = 1
then timesinceX1 = 0;
else timesinceX1 + 1;
if X2police = 1
then timesinceX2 = 0;
else timesinceX2 + 1;

Please post your example data in a working data step with datalines, so it's easy to recreate your dataset for testing.

r4321
Pyrite | Level 9

Thank you for your response. Greatly appreciated. I set up the code like so: 

 


data test.agmagtest;
set test.agmag2 (rename=(X1mva=_X1mva X2police=_X2police));
by id;
retain
  X1mva
  X2police
  time1
;
if first.id
then do;
  X1mva = 0;
  X2police = 0;
  time1 = time;
end;
X1mva + _X1mva;
X2police + _X2police;
if focalevent = 1 or last.id
then do;
  time2 = time;
  event = focalevent;
  output;
  X1mva = 0;
  X2police = 0;
  time1 = time;
end;
if X1mva = 1
then timesinceX1 = 0;
else timesinceX1 + 1;
if X2police = 1
then timesinceX2 = 0;
else timesinceX2 + 1;
drop _: time focalevent;
run;

A couple issues I am having:

(1) I don't get sums, mins. and maxes for the x1 and x2 variables

(2) The timesinceX1 and timesinceX2 variables are producing impossible values. I only want it to tell me the time since the X1 or the time since the last X2 for that particular ID.

 

Could you please advise? Thanks a lot. 

Kurt_Bremser
Super User

I missed to use underlines, and you positioned the additional code in the wrong place:

data have;
input Time ID $ County $ FocalEvent X1mva X2police;
datalines;
1 222 ABC 1 0 1
2 222 ABC 0 1 0
3 222 ABC 0 0 1
4 222 ABC 0 0 0
5 222 ABC 1 0 0
6 222 ABC 0 0 1
7 222 ABC 0 0 0
8 222 ABC 0 0 0
9 222 ABC 0 1 0
10 222 ABC 1 1 0
;

data want;
set have (rename=(X1mva=_X1mva X2police=_X2police));
by id;
retain
  X1mva
  X2police
  timesinceX1
  timesinceX2
  time1
;
if first.id
then do;
  X1mva = 0;
  X2police = 0;
  time1 = time;
end;
X1mva + _X1mva;
X2police + _X2police;
if _X1mva = 1 or first.id
then timesinceX1 = 0;
else timesinceX1 + 1;
if _X2police = 1 or first.id
then timesinceX2 = 0;
else timesinceX2 + 1;
if focalevent = 1 or last.id
then do;
  time2 = time;
  event = focalevent;
  output;
  X1mva = 0;
  X2police = 0;
  time1 = time;
end;
drop _: time focalevent;
run;
r4321
Pyrite | Level 9

Kurt, thanks for your help on this. I am still having issues getting it just right. Specifically, the problem is that 'the want' spits out time1 and time2 values that aren't centered at zero. For example, 

 

time1 time2

8         26

26       42

 

instead of 

 

time1 time2

0         18

18       34

 

 

I wrote some code that spits out what I want better, but I still don't know how to insert in the timesince variables that I want. Could you advise? 

 

 

 

proc sort data=have out=have_sorted;
	by ID County Time FocalEvent;
run;

data have_group;
	do i=1 until (last.FocalEvent);
		set have_sorted;
		by ID County FocalEvent notsorted;
		if lag(FocalEvent)=1 then group+1;
		if first.ID then group=1;
	end;
run;

data have_group2;
	set have_group;
	by ID County group notsorted;
	if first.County then nobs=1;
	else nobs+1;
run;

proc means data=have_group2 noprint;
	var FocalEvent X1mva X2police nobs;
	class ID County group;
	ways 3;
	output out=have_sum sum= min= max= / autoname;
run;

data want;
	set have_sum (keep=ID County FocalEvent_Sum X1mva_Sum X2police_Sum nobs_Min nobs_Max);
	if nobs_Min ne 1 then Time1 = nobs_Min-1;
	else Time1 = nobs_Min;
	rename FocalEvent_Sum = Event X1mva_Sum = X1mva X2police_Sum = X2police  nobs_Max = Time2;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1309 views
  • 2 likes
  • 2 in conversation