Hi,
The following is an example of my dataset.
data have; input ric $ price n; datalines; abc 5 . abc 6 . bcd 100 10 efg 30 20 cba 40 10 hba 10 10 hba 12 . cab 6 20 cab 7 20 run;
I would like to remove the ric group when all the n for the ric is missing. For example, abc should be removed but not hba (not all n is missing).
The output would look like the following-
data want; input ric $ price n; datalines; bcd 100 10 efg 30 20 cba 40 10 hba 10 10 hba 12 . cab 6 20 cab 7 20 run;
This is my first post. Any help is appreciated. Thanks.
As others have said, there could be RIC groups, in which all obs except the last are missing. Such an RIC should be kept, according to your criteria. But you have to read the entire RIC group to determine that outcome.
The program below does that. It then rereads each group and outputs it depending on the results of the first pass of the RIC group.
data have;
input ric $ price n;
datalines;
abc 5 .
abc 6 .
bcd 100 10
efg 30 20
cba 40 10
hba 10 10
hba 12 .
cab 6 20
cab 7 20
run;
data want (drop=_:);
do until (last.ric);
set have;
by ric notsorted;
if n^=. then _keepdummy=1;
end;
do until (last.ric);
set have;
by ric notsorted;
if _keepdummy=1 then output;
end;
run;
I actually solved it. Someone may find it useful. This is the code that I have adapted (reference community link )
data want; set have; by ric; retain flag; if first.ric then call missing(flag); if first.ric and n^=. then flag=1; if flag; drop flag; run;
That is not a general solution. It worked for your sample because non of the groups to be kept had a missing value on the first observations, just on later observations.
If you sorted
by ric descending n;
then your logic would work.
In general you need to check all of the observations in the group before deciding whether or not to delete the observations for that group.
You could use SQL code
proc sql;
create table want as
select * from have
group by ric
having not missing(max(n))
;
quit;
Or if the data is sorted a double DOW loop.
data want;
do until (last.ric);
set have;
by ric;
if not missing(n) then flag=1;
end;
do until (last.ric);
set have;
by ric;
if flag then output;
end;
run;
proc sql;
create table want as
select *
from have
where cats(ric) not in
(select cats(ric) from have where n =.);
quit;
I didn't tested may be you can for this approach.
I find it easiest to identify the values in a separate step, then merge to include or exclude the ones you want. Here is an example:
proc sort data=have;
by ric;
** Select all distinct values of ric that have at least one nonmissing value for n **;
proc sort data=have (where=(^missing(n))) out=nonmissing (keep=ric) nodupkey;
by ric;
** Restrict 'have' to only include those ric values with at least one nonmissing value for n **;
data want;
merge have (in=in1)
nonmissing (in=in2)
;
by ric;
** Technically you only need 'in2' here, but as a general practice I like to be explicit with these statements **;
if in1 & in2;
run;
As others have said, there could be RIC groups, in which all obs except the last are missing. Such an RIC should be kept, according to your criteria. But you have to read the entire RIC group to determine that outcome.
The program below does that. It then rereads each group and outputs it depending on the results of the first pass of the RIC group.
data have;
input ric $ price n;
datalines;
abc 5 .
abc 6 .
bcd 100 10
efg 30 20
cba 40 10
hba 10 10
hba 12 .
cab 6 20
cab 7 20
run;
data want (drop=_:);
do until (last.ric);
set have;
by ric notsorted;
if n^=. then _keepdummy=1;
end;
do until (last.ric);
set have;
by ric notsorted;
if _keepdummy=1 then output;
end;
run;
So this makes more sense and the code seems to work well. You are right (I have missed that part in my explanation). Thank you.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.