Hello all,
I am managing health insurance data so I got many records by individuals.
I'd like only one observation to remain per person with some conditions.
Here are the conditions below:
1) has a (two kinds of) specific classification number for the first time,
2) otherwise, has the latest record
For example, here's the sample of the data:
Data play;
input id $class1 $class2 yyyymm
datalines;
1 A12 C33 200105
1 A12 C32 200109
1 A12 C33 200203
1 A13 C34 200302
2 F12 D33 200105
2 F12 D32 200109
2 F12 D33 200203
3 S23 F32 200103
3 S23 F33 200105
3 A12 F32 200204
3 A12 C33 200205
3 A13 C34 200208
4 Q12 C33 200102
4 Q12 A12 200112
4 G12 A12 200207
4 G13 C32 200309
5 D12 G33 200105
5 D12 A32 200108
5 D12 A33 200111
;
run;
With this data, I want a row remained by an individual in which $class1 or $class2 has either "A12" or "A" for the first time. It otherwise has the latest record.
And here's the expected result below:
1 A12 C33 200105 (<- "A12" contained in $class1)
2 F12 D33 200203 (<- nothing contained, so "the latest record")
3 A12 F32 200204 (<- "A12" contained in $class1)
4 Q12 A12 200112 (<- "A12" contained in $class2)
5 D12 A32 200108 (<- "A" contained in $class2)
How can I make the data like this?
I would much appreciate all your possible answers. Thank you!
Maybe this will do:
proc sql;
create view temp as
select
*,
case
when class1 = "A12" or class2 = "A12" then 1
when first(class1) = "A" or first(class2) = "A" then 2
else 3
end as order
from play
order by id, order, yyyymm;
quit;
data want;
set temp; by id;
if first.id;
drop order;
run;
Edit -- Sorry, this code doesn't always give the correct answer. See below.
Would the result be the same if the last obs was
5 D12 A12 200111
And either way, why?
Oh I just wanted to differentiate A12 from all values which contain "A" because I set "A12" as a case and other "A"s as censored.
this may work
proc sql;
/* geting latest records other than A12 or A*/
select id, class1, class2, yyyymm from play
where id not in
(select id from play
where class1 = "A12"
or class2 = "A"
group by id
having yyyymm = min(yyyymm))
group by id
having yyyymm = min(yyyymm)
union
/* geting latest records for A12 or A*/
select id, class1, class2, yyyymm from play
where class1 = "A12"
or class2 = "A"
group by id
having yyyymm = min(yyyymm);
Thank your for your reply! I tried it as you wrote and it looks like doing well but I cannot confirm that because of a resource problem. Besides, how can I save the new data set which was made throughout the process?
Hi
Here are the steps:
1. Sort the data by id and yyyymm
proc sort data = play;
by id yyyymm;
run;
2. Extract the rows containing 'A' in either class1 or class2
data playClassA;
set play;
if substr(class1,1,1)='A' | substr(class2,1,1)='A';
run;
3. Get the first record
data dsOut1;
set playClassA;
by id;
if first.id then output;
run;
4. Extract the ids containing no 'A' in either class1 or class2
data playClassABar;
merge play
dsOut1 (in=a keep=id);
by id;
run;
5. Get the last record
data dsOut2;
set playClassA;
by id;
if last.id then output;
run;
6. Combine the results
data dsOut;
set dsOut1 dsOut2;
run;
Hope this will help.
This solution is almost similar to what I was trying to. It's kind of a classical approach so I did it, but it was complicated to use. I will also try your solution. Thank you!
Something along the line of below should work:
data want(drop=_:);
set have;
by id;
retain _done;
if first.id then _done=0;
if _done=0 then
do;
if findc(class1,'A','i') or findc(class2,'A','i') then
do;
output;
_done=1;
end;
else if last.id then output;
end;
run;
Maybe this will do:
proc sql;
create view temp as
select
*,
case
when class1 = "A12" or class2 = "A12" then 1
when first(class1) = "A" or first(class2) = "A" then 2
else 3
end as order
from play
order by id, order, yyyymm;
quit;
data want;
set temp; by id;
if first.id;
drop order;
run;
Edit -- Sorry, this code doesn't always give the correct answer. See below.
I perfectly resolved the problem thanks to your solution! Thank you very much! I really love such a laconic style of your coding.
Sorry. My suggestion above does NOT always give the expected answer in the case where A12 or A is not found..This, however will:
data want;
lookfor = "---";
do until(last.id);
set play; by id;
if class1 = "A12" or class2 = "A12" then lookfor = "A12";
if lookfor ne "A12" and (first(class1) = "A" or first(class2) = "A") then lookfor = "A";
end;
do until(last.id);
set play; by id;
if class1 =: trim(lookfor) or class2 =: trim(lookfor) then do;
output;
lookfor = "+++";
end;
end;
if lookfor = "---" then output;
drop lookfor;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.