Solved: Re: Find when first row in a group doesn't match other rows

jk2018 · Posted 09-15-2021 12:33 PM

I need to find when the first row within a group, grouping by case_id, has a value of unknown for the variable country but any of the other rows within a group have a known value. Below is an example of the data that I would be starting with.

Case_id	Country
12456789	unknown
12456789	Mexico
12456789	unknown
56789	unknown
56789	Cuba
56789	USA

output I would want:

Case_id	Country
12456789	Mexico
56789	Cuba
56789	USA

novinosrin · Posted 09-15-2021 02:50 PM



data have;
input (Case_id	Country) (:$15.);
cards;
12456789	unknown
12456789	Mexico
12456789	unknown
56789	unknown
56789	Cuba
56789	USA
777 Mexico
777 Mexico
777 unknown
;


data want;
 set have;
 by case_id;
 retain t;
 if first.case_id then if country='unknown' then t=1;
 else t=.;
 if t and  country ne 'unknown';
 drop t;
run;

View solution in original post

Reeza · Posted 09-15-2021 12:36 PM

That seems to just filter out all unknowns.

Is there a case where you'd keep a different record?

data want;
set have;
where country ne 'unknown';
run;

@jk2018 wrote:

I need to find when the first row within a group, grouping by case_id, has a value of unknown for the variable country but any of the other rows within a group have a known value. Below is an example of the data that I would be starting with.

Case_id

Country

12456789

unknown

12456789

Mexico

12456789

unknown

56789

unknown

56789

Cuba

56789

USA

output I would want:

Case_id

Country

12456789

Mexico

56789

Cuba

56789

USA

jk2018 · Posted 09-15-2021 02:19 PM

If the first observation for each case_id does not equal unknown then I need all the observations for that case_id to be excluded from the output. For example, if a record like this was in the dataset I would not want it in the final dataset:

Case_id	Country
777	Mexico
777	Mexico
777	unknown

novinosrin · Posted 09-15-2021 02:50 PM



data have;
input (Case_id	Country) (:$15.);
cards;
12456789	unknown
12456789	Mexico
12456789	unknown
56789	unknown
56789	Cuba
56789	USA
777 Mexico
777 Mexico
777 unknown
;


data want;
 set have;
 by case_id;
 retain t;
 if first.case_id then if country='unknown' then t=1;
 else t=.;
 if t and  country ne 'unknown';
 drop t;
run;

Tom · Posted 09-15-2021 03:31 PM

Delete the unknown observations when they are not the only observation.

data have;
  input Case_id :$10. Country :$20.;
cards;
12456789 unknown
12456789 Mexico
12456789 unknown
56789 unknown
56789 Cuba
56789 USA
9999 unknown
;

data want ;
  set have;
  by case_id ;
  if country='unknown' and not (first.case_id and last.case_id) then delete;
run;

Obs    Case_id     Country

 1     12456789    Mexico
 2     56789       Cuba
 3     56789       USA
 4     9999        unknown

Find when first row in a group doesn't match other rows

Re: Find when first row in a group doesn't match other rows

Re: Find when first row in a group doesn't match other rows

Re: Find when first row in a group doesn't match other rows

Re: Find when first row in a group doesn't match other rows

Re: Find when first row in a group doesn't match other rows

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!