BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ANKH1
Pyrite | Level 9

In SAS I have two datasets.
Dataset1: age_event variable is the age at which each ID reported an event. One same ID can have one or more than one event. Not all IDs from the sample reported events. IDs that did not reported events were not included in this dataset.

data ds1;
input ID age_event;
datalines;
a1	32
b2	54
b2 67 c3 34 c3 45 c3 78 ; run;


Dataset2: All IDs of the sample are reported. This dataset contains one row per ID. variable "last_agerecorded" is the age at which each reported their last record for the whole study.

data main;
input ID last_agerecorded;
datalines;
a1	56
a2	67
b1	68
b2	72
b3	132
c2	121
c3	124
c4	58
d1	89
d2	95
e2	74
;
run;

We would like to create a count_event variable that counts the number of events per ID. However, if an ID reports more than one event, count_event variable will need to add the total of events up to the last "age_event" reported. See cases for ID=b2 and c3 below. Each event has one row and the "count_event" variable sums the number of event at the last age_event. If the ID did not report any event at all, then age_event should be equal to "last_agerecorded" from the main dataset and the count_event should be equal to zero.

This is the required output:

IDage_eventcount_event
a1321
a2670
b1680
b2541
b2672
b31320
c21210
c3341
c3452
c3783
c4580
d1890
d2950
e2740

 

 

How can we get the required output? Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

 

Assuming the data are sorted by ID (main) or ID/age_event (DS1), you can use the interleaving process of SET with BY, as in:

 

data ds1;
input ID $ age_event;
datalines;
a1 32
b2 54
b2 67
c3 34
c3 45
c3 78
;
run;

data main;
input ID $ last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;

data want;
  set main (rename=last_agerecorded=age_event) ds1 (in=inds1);
  by id;
  if first.id then count=0;
  count+inds1;
  if inds1=1 or last.id;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
ChanceTGardener
SAS Employee
data ds1;
input ID $ age_event;
datalines;
a1 32
b2 54
b2 67
c3 34
c3 45
c3 78
;
run;

data main;
input ID $ last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;
run;

proc sort data=ds1;
 by ID age_event;
run;

data ds1;
 set ds1; 
  by ID;
   if first.ID then count_event = 0;
    count_event + 1;
run;

proc sql;
 create table want as 
  select ID
  ,	     age_event
  ,	     count_event
  from ds1
   outer union corresponding
  select ID
  ,	     last_agerecorded as age_event
  ,      0 as count_event
  from main
  where ID not in(select distinct ID from ds1)
 order by ID, count_event;
quit;
mkeintz
PROC Star

 

Assuming the data are sorted by ID (main) or ID/age_event (DS1), you can use the interleaving process of SET with BY, as in:

 

data ds1;
input ID $ age_event;
datalines;
a1 32
b2 54
b2 67
c3 34
c3 45
c3 78
;
run;

data main;
input ID $ last_agerecorded;
datalines;
a1 56
a2 67
b1 68
b2 72
b3 132
c2 121
c3 124
c4 58
d1 89
d2 95
e2 74
;

data want;
  set main (rename=last_agerecorded=age_event) ds1 (in=inds1);
  by id;
  if first.id then count=0;
  count+inds1;
  if inds1=1 or last.id;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 554 views
  • 2 likes
  • 3 in conversation