I need to find when the first row within a group, grouping by case_id, has a value of unknown for the variable country but any of the other rows within a group have a known value. Below is an example of the data that I would be starting with.
Case_id | Country |
12456789 | unknown |
12456789 | Mexico |
12456789 | unknown |
56789 | unknown |
56789 | Cuba |
56789 | USA |
output I would want:
Case_id | Country |
12456789 | Mexico |
56789 | Cuba |
56789 | USA |
data have;
input (Case_id Country) (:$15.);
cards;
12456789 unknown
12456789 Mexico
12456789 unknown
56789 unknown
56789 Cuba
56789 USA
777 Mexico
777 Mexico
777 unknown
;
data want;
set have;
by case_id;
retain t;
if first.case_id then if country='unknown' then t=1;
else t=.;
if t and country ne 'unknown';
drop t;
run;
That seems to just filter out all unknowns.
Is there a case where you'd keep a different record?
data want;
set have;
where country ne 'unknown';
run;
@jk2018 wrote:
I need to find when the first row within a group, grouping by case_id, has a value of unknown for the variable country but any of the other rows within a group have a known value. Below is an example of the data that I would be starting with.
Case_id
Country
12456789
unknown
12456789
Mexico
12456789
unknown
56789
unknown
56789
Cuba
56789
USA
output I would want:
Case_id
Country
12456789
Mexico
56789
Cuba
56789
USA
If the first observation for each case_id does not equal unknown then I need all the observations for that case_id to be excluded from the output. For example, if a record like this was in the dataset I would not want it in the final dataset:
Case_id | Country |
777 | Mexico |
777 | Mexico |
777 | unknown |
data have;
input (Case_id Country) (:$15.);
cards;
12456789 unknown
12456789 Mexico
12456789 unknown
56789 unknown
56789 Cuba
56789 USA
777 Mexico
777 Mexico
777 unknown
;
data want;
set have;
by case_id;
retain t;
if first.case_id then if country='unknown' then t=1;
else t=.;
if t and country ne 'unknown';
drop t;
run;
Delete the unknown observations when they are not the only observation.
data have;
input Case_id :$10. Country :$20.;
cards;
12456789 unknown
12456789 Mexico
12456789 unknown
56789 unknown
56789 Cuba
56789 USA
9999 unknown
;
data want ;
set have;
by case_id ;
if country='unknown' and not (first.case_id and last.case_id) then delete;
run;
Obs Case_id Country 1 12456789 Mexico 2 56789 Cuba 3 56789 USA 4 9999 unknown
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.