BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Lapis Lazuli | Level 10

Hi guys, 

suppose to have the following DB: 

 

data DB;
input ID :$20. Age Age_class Event;
cards;
0001 45 3 1
0001 45 3 1
0002 50 4 1
0002 50 4 0
0003 55 5 1
0003 55 5 0
0003 55 5 0
0003 55 5 1
0003 55 5 1
0004 60 6 0
0004 60 6 0
0004 60 6 0
run;

The variables are the age, the age_class and a verified event. 

Is there a way to get the following? 

 

data DB1;
input ID :$20. Age Age_class Event;
cards;
0001 45 3 1
0001 45 3 0
0002 50 4 1
0002 50 4 0
0003 55 5 1
0003 55 5 0
0003 55 5 0
0003 55 5 0
0003 55 5 0
0004 60 6 2
0004 60 6 0
0004 60 6 0
run;

In other words if the event verified (i.e., =1) in the same age/age class more than once for each ID, it must be reported only once. The first ID would be ok to report the occurrence (i.e., to set Event = 1) in the final DB1. If the ID never had the event a new index = 2 should be added to finally count how many IDs never had the event. The rule of the repeated age class should be maintained as for the cases where the event verified.

 

Thank you in advance

 

Best

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data DB;
input ID :$20. Age Age_class Event;
cards;
0001 45 3 1
0001 45 3 1
0002 50 4 1
0002 50 4 0
0003 55 5 1
0003 55 5 0
0003 55 5 0
0003 55 5 1
0003 55 5 1
0004 60 6 0
0004 60 6 0
0004 60 6 0
;

proc sort data=DB out=temp;
by id age age_class desending event;
run;
data want;
 set temp;
 by id age age_class;
 _event=0;
 if first.age_class then do;
   if event=1 then _event=1;
    else if event=0 then _event=2;
 end;
 drop event;
 run;

View solution in original post

4 REPLIES 4
ballardw
Super User

It is much easier to place the EVENT=2 (not INDEX you don't show a variable named index) on the LAST observation of the age_class.

 

data db1;
   set db;
   by id  age_class;
   retain found_event;
   if first.age_class then found_event=0;
   if event=1 then do;
      if found_event=0 then found_event=1;
      else if found_event=1 then event=0;
   end;
   if last.age_class and found_event=0 then event=2;
   drop found_event;
run;

Assumes the data is sorted by ID and age_class. I have a sneaking feeling that it is possible to have more than one AGE in a given AGE_CLASS even though  you don't show any.

The BY statement in a data step creates automatic variables FIRST.<variable> and LAST.<variable> for each variable on the by statement. These are numeric 1/0 or True/False for indicating if the current observation is the first or the last of a group. So you can test for that condition.

The Retain statement creates variables that will keep their value across the data step boundary so can be used to keep track of "found yet".

The first time an event is found the kept value of Found_event is set to indicate it has been found. After that with found_event=1 for any other events within the age_class event is set to 0.

If you get to the end of the age_class and found_event is still 0 then set the Event to 2 as requested.

 

Personally I would create a different variable then use that 2 in the Event but I don't know how you intend to use this data.

 

 

 


@NewUsrStat wrote:

Hi guys, 

suppose to have the following DB: 

 

data DB;
input ID :$20. Age Age_class Event;
cards;
0001 45 3 1
0001 45 3 1
0002 50 4 1
0002 50 4 0
0003 55 5 1
0003 55 5 0
0003 55 5 0
0003 55 5 1
0003 55 5 1
0004 60 6 0
0004 60 6 0
0004 60 6 0
run;

The variables are the age, the age_class and a verified event. 

Is there a way to get the following? 

 

data DB1;
input ID :$20. Age Age_class Event;
cards;
0001 45 3 1
0001 45 3 0
0002 50 4 1
0002 50 4 0
0003 55 5 1
0003 55 5 0
0003 55 5 0
0003 55 5 0
0003 55 5 0
0004 60 6 2
0004 60 6 0
0004 60 6 0
run;

In other words if the event verified (i.e., =1) in the same age/age class more than once for each ID, it must be reported only once. The first ID would be ok to report the occurrence (i.e., to set Event = 1) in the final DB1. If the ID never had the event a new index = 2 should be added to finally count how many IDs never had the event. The rule of the repeated age class should be maintained as for the cases where the event verified.

 

Thank you in advance

 

Best


 

NewUsrStat
Lapis Lazuli | Level 10
Hi, sorry, yest it is true. Different ages are in the same age class in a range of course. The key rule is: if there are many "1" for the same age then report them once.
Ksharp
Super User
data DB;
input ID :$20. Age Age_class Event;
cards;
0001 45 3 1
0001 45 3 1
0002 50 4 1
0002 50 4 0
0003 55 5 1
0003 55 5 0
0003 55 5 0
0003 55 5 1
0003 55 5 1
0004 60 6 0
0004 60 6 0
0004 60 6 0
;

proc sort data=DB out=temp;
by id age age_class desending event;
run;
data want;
 set temp;
 by id age age_class;
 _event=0;
 if first.age_class then do;
   if event=1 then _event=1;
    else if event=0 then _event=2;
 end;
 drop event;
 run;
Kurt_Bremser
Super User

It seems that age and age_class are redundant, as they never change for any ID.

This also means that you only need to keep one observation per ID, value-of-information-wise.

data db1;
_event = 2;
do until (last.id);
  set db;
  by id;
  if event then _event = 1;
end;
event = _event;
drop _event;
run;

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1242 views
  • 1 like
  • 4 in conversation