BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jk2018
Calcite | Level 5

I need to find when the first row within a group, grouping by case_id, has a value of unknown for the variable country but any of the other rows within a group have a known value.  Below is an example of the data that I would be starting with. 

 

Case_id

Country

12456789

unknown

12456789

Mexico

12456789

unknown

56789

unknown

56789

Cuba

56789

USA

 

output I would want:

Case_id

Country

12456789

Mexico

56789

Cuba

56789

USA

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20


data have;
input (Case_id	Country) (:$15.);
cards;
12456789	unknown
12456789	Mexico
12456789	unknown
56789	unknown
56789	Cuba
56789	USA
777 Mexico
777 Mexico
777 unknown
;


data want;
 set have;
 by case_id;
 retain t;
 if first.case_id then if country='unknown' then t=1;
 else t=.;
 if t and  country ne 'unknown';
 drop t;
run;

View solution in original post

4 REPLIES 4
Reeza
Super User

That seems to just filter out all unknowns. 

Is there a case where you'd keep a different record?

 

data want;
set have;
where country ne 'unknown';
run;

@jk2018 wrote:

I need to find when the first row within a group, grouping by case_id, has a value of unknown for the variable country but any of the other rows within a group have a known value.  Below is an example of the data that I would be starting with. 

 

Case_id

Country

12456789

unknown

12456789

Mexico

12456789

unknown

56789

unknown

56789

Cuba

56789

USA

 

output I would want:

Case_id

Country

12456789

Mexico

56789

Cuba

56789

USA



 

jk2018
Calcite | Level 5

If the first observation for each case_id does not equal unknown then I need all the observations for that case_id to be excluded from the output.  For example, if a record like this was in the dataset I would not want it in the final dataset:

Case_id

Country

777

Mexico

777

Mexico

777

unknown

novinosrin
Tourmaline | Level 20


data have;
input (Case_id	Country) (:$15.);
cards;
12456789	unknown
12456789	Mexico
12456789	unknown
56789	unknown
56789	Cuba
56789	USA
777 Mexico
777 Mexico
777 unknown
;


data want;
 set have;
 by case_id;
 retain t;
 if first.case_id then if country='unknown' then t=1;
 else t=.;
 if t and  country ne 'unknown';
 drop t;
run;
Tom
Super User Tom
Super User

Delete the unknown observations when they are not the only observation.

data have;
  input Case_id :$10. Country :$20.;
cards;
12456789 unknown
12456789 Mexico
12456789 unknown
56789 unknown
56789 Cuba
56789 USA
9999 unknown
;

data want ;
  set have;
  by case_id ;
  if country='unknown' and not (first.case_id and last.case_id) then delete;
run;
Obs    Case_id     Country

 1     12456789    Mexico
 2     56789       Cuba
 3     56789       USA
 4     9999        unknown


hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1329 views
  • 0 likes
  • 4 in conversation