I need to find when the first row within a group, grouping by case_id, has a value of unknown for the variable country but any of the other rows within a group have a known value. Below is an example of the data that I would be starting with.
Case_id | Country |
12456789 | unknown |
12456789 | Mexico |
12456789 | unknown |
56789 | unknown |
56789 | Cuba |
56789 | USA |
output I would want:
Case_id | Country |
12456789 | Mexico |
56789 | Cuba |
56789 | USA |
data have;
input (Case_id Country) (:$15.);
cards;
12456789 unknown
12456789 Mexico
12456789 unknown
56789 unknown
56789 Cuba
56789 USA
777 Mexico
777 Mexico
777 unknown
;
data want;
set have;
by case_id;
retain t;
if first.case_id then if country='unknown' then t=1;
else t=.;
if t and country ne 'unknown';
drop t;
run;
That seems to just filter out all unknowns.
Is there a case where you'd keep a different record?
data want;
set have;
where country ne 'unknown';
run;
@jk2018 wrote:
I need to find when the first row within a group, grouping by case_id, has a value of unknown for the variable country but any of the other rows within a group have a known value. Below is an example of the data that I would be starting with.
Case_id
Country
12456789
unknown
12456789
Mexico
12456789
unknown
56789
unknown
56789
Cuba
56789
USA
output I would want:
Case_id
Country
12456789
Mexico
56789
Cuba
56789
USA
If the first observation for each case_id does not equal unknown then I need all the observations for that case_id to be excluded from the output. For example, if a record like this was in the dataset I would not want it in the final dataset:
Case_id | Country |
777 | Mexico |
777 | Mexico |
777 | unknown |
data have;
input (Case_id Country) (:$15.);
cards;
12456789 unknown
12456789 Mexico
12456789 unknown
56789 unknown
56789 Cuba
56789 USA
777 Mexico
777 Mexico
777 unknown
;
data want;
set have;
by case_id;
retain t;
if first.case_id then if country='unknown' then t=1;
else t=.;
if t and country ne 'unknown';
drop t;
run;
Delete the unknown observations when they are not the only observation.
data have;
input Case_id :$10. Country :$20.;
cards;
12456789 unknown
12456789 Mexico
12456789 unknown
56789 unknown
56789 Cuba
56789 USA
9999 unknown
;
data want ;
set have;
by case_id ;
if country='unknown' and not (first.case_id and last.case_id) then delete;
run;
Obs Case_id Country 1 12456789 Mexico 2 56789 Cuba 3 56789 USA 4 9999 unknown
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.