BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jk2018
Calcite | Level 5

I need to find when the first row within a group, grouping by case_id, has a value of unknown for the variable country but any of the other rows within a group have a known value.  Below is an example of the data that I would be starting with. 

 

Case_id

Country

12456789

unknown

12456789

Mexico

12456789

unknown

56789

unknown

56789

Cuba

56789

USA

 

output I would want:

Case_id

Country

12456789

Mexico

56789

Cuba

56789

USA

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20


data have;
input (Case_id	Country) (:$15.);
cards;
12456789	unknown
12456789	Mexico
12456789	unknown
56789	unknown
56789	Cuba
56789	USA
777 Mexico
777 Mexico
777 unknown
;


data want;
 set have;
 by case_id;
 retain t;
 if first.case_id then if country='unknown' then t=1;
 else t=.;
 if t and  country ne 'unknown';
 drop t;
run;

View solution in original post

4 REPLIES 4
Reeza
Super User

That seems to just filter out all unknowns. 

Is there a case where you'd keep a different record?

 

data want;
set have;
where country ne 'unknown';
run;

@jk2018 wrote:

I need to find when the first row within a group, grouping by case_id, has a value of unknown for the variable country but any of the other rows within a group have a known value.  Below is an example of the data that I would be starting with. 

 

Case_id

Country

12456789

unknown

12456789

Mexico

12456789

unknown

56789

unknown

56789

Cuba

56789

USA

 

output I would want:

Case_id

Country

12456789

Mexico

56789

Cuba

56789

USA



 

jk2018
Calcite | Level 5

If the first observation for each case_id does not equal unknown then I need all the observations for that case_id to be excluded from the output.  For example, if a record like this was in the dataset I would not want it in the final dataset:

Case_id

Country

777

Mexico

777

Mexico

777

unknown

novinosrin
Tourmaline | Level 20


data have;
input (Case_id	Country) (:$15.);
cards;
12456789	unknown
12456789	Mexico
12456789	unknown
56789	unknown
56789	Cuba
56789	USA
777 Mexico
777 Mexico
777 unknown
;


data want;
 set have;
 by case_id;
 retain t;
 if first.case_id then if country='unknown' then t=1;
 else t=.;
 if t and  country ne 'unknown';
 drop t;
run;
Tom
Super User Tom
Super User

Delete the unknown observations when they are not the only observation.

data have;
  input Case_id :$10. Country :$20.;
cards;
12456789 unknown
12456789 Mexico
12456789 unknown
56789 unknown
56789 Cuba
56789 USA
9999 unknown
;

data want ;
  set have;
  by case_id ;
  if country='unknown' and not (first.case_id and last.case_id) then delete;
run;
Obs    Case_id     Country

 1     12456789    Mexico
 2     56789       Cuba
 3     56789       USA
 4     9999        unknown


sas-innovate-2026-white.png



April 27 – 30 | Gaylord Texan | Grapevine, Texas

Registration is open

Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!

Register now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1420 views
  • 0 likes
  • 4 in conversation