BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jk2018
Calcite | Level 5

I need to find when the first row within a group, grouping by case_id, has a value of unknown for the variable country but any of the other rows within a group have a known value.  Below is an example of the data that I would be starting with. 

 

Case_id

Country

12456789

unknown

12456789

Mexico

12456789

unknown

56789

unknown

56789

Cuba

56789

USA

 

output I would want:

Case_id

Country

12456789

Mexico

56789

Cuba

56789

USA

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20


data have;
input (Case_id	Country) (:$15.);
cards;
12456789	unknown
12456789	Mexico
12456789	unknown
56789	unknown
56789	Cuba
56789	USA
777 Mexico
777 Mexico
777 unknown
;


data want;
 set have;
 by case_id;
 retain t;
 if first.case_id then if country='unknown' then t=1;
 else t=.;
 if t and  country ne 'unknown';
 drop t;
run;

View solution in original post

4 REPLIES 4
Reeza
Super User

That seems to just filter out all unknowns. 

Is there a case where you'd keep a different record?

 

data want;
set have;
where country ne 'unknown';
run;

@jk2018 wrote:

I need to find when the first row within a group, grouping by case_id, has a value of unknown for the variable country but any of the other rows within a group have a known value.  Below is an example of the data that I would be starting with. 

 

Case_id

Country

12456789

unknown

12456789

Mexico

12456789

unknown

56789

unknown

56789

Cuba

56789

USA

 

output I would want:

Case_id

Country

12456789

Mexico

56789

Cuba

56789

USA



 

jk2018
Calcite | Level 5

If the first observation for each case_id does not equal unknown then I need all the observations for that case_id to be excluded from the output.  For example, if a record like this was in the dataset I would not want it in the final dataset:

Case_id

Country

777

Mexico

777

Mexico

777

unknown

novinosrin
Tourmaline | Level 20


data have;
input (Case_id	Country) (:$15.);
cards;
12456789	unknown
12456789	Mexico
12456789	unknown
56789	unknown
56789	Cuba
56789	USA
777 Mexico
777 Mexico
777 unknown
;


data want;
 set have;
 by case_id;
 retain t;
 if first.case_id then if country='unknown' then t=1;
 else t=.;
 if t and  country ne 'unknown';
 drop t;
run;
Tom
Super User Tom
Super User

Delete the unknown observations when they are not the only observation.

data have;
  input Case_id :$10. Country :$20.;
cards;
12456789 unknown
12456789 Mexico
12456789 unknown
56789 unknown
56789 Cuba
56789 USA
9999 unknown
;

data want ;
  set have;
  by case_id ;
  if country='unknown' and not (first.case_id and last.case_id) then delete;
run;
Obs    Case_id     Country

 1     12456789    Mexico
 2     56789       Cuba
 3     56789       USA
 4     9999        unknown


How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1349 views
  • 0 likes
  • 4 in conversation