Hello all,
I am managing health insurance data so I got many records by individuals.
I'd like only one observation to remain per person with some conditions.
Here are the conditions below:
1) has a (two kinds of) specific classification number for the first time,
2) otherwise, has the latest record
For example, here's the sample of the data:
Data play;
input id $class1 $class2 yyyymm
datalines;
1 A12 C33 200105
1 A12 C32 200109
1 A12 C33 200203
1 A13 C34 200302
2 F12 D33 200105
2 F12 D32 200109
2 F12 D33 200203
3 S23 F32 200103
3 S23 F33 200105
3 A12 F32 200204
3 A12 C33 200205
3 A13 C34 200208
4 Q12 C33 200102
4 Q12 A12 200112
4 G12 A12 200207
4 G13 C32 200309
5 D12 G33 200105
5 D12 A32 200108
5 D12 A33 200111
;
run;
With this data, I want a row remained by an individual in which $class1 or $class2 has either "A12" or "A" for the first time. It otherwise has the latest record.
And here's the expected result below:
1 A12 C33 200105 (<- "A12" contained in $class1)
2 F12 D33 200203 (<- nothing contained, so "the latest record")
3 A12 F32 200204 (<- "A12" contained in $class1)
4 Q12 A12 200112 (<- "A12" contained in $class2)
5 D12 A32 200108 (<- "A" contained in $class2)
How can I make the data like this?
I would much appreciate all your possible answers. Thank you!
Maybe this will do:
proc sql;
create view temp as
select
*,
case
when class1 = "A12" or class2 = "A12" then 1
when first(class1) = "A" or first(class2) = "A" then 2
else 3
end as order
from play
order by id, order, yyyymm;
quit;
data want;
set temp; by id;
if first.id;
drop order;
run;
Edit -- Sorry, this code doesn't always give the correct answer. See below.
Would the result be the same if the last obs was
5 D12 A12 200111
And either way, why?
Oh I just wanted to differentiate A12 from all values which contain "A" because I set "A12" as a case and other "A"s as censored.
this may work
proc sql;
/* geting latest records other than A12 or A*/
select id, class1, class2, yyyymm from play
where id not in
(select id from play
where class1 = "A12"
or class2 = "A"
group by id
having yyyymm = min(yyyymm))
group by id
having yyyymm = min(yyyymm)
union
/* geting latest records for A12 or A*/
select id, class1, class2, yyyymm from play
where class1 = "A12"
or class2 = "A"
group by id
having yyyymm = min(yyyymm);
Thank your for your reply! I tried it as you wrote and it looks like doing well but I cannot confirm that because of a resource problem. Besides, how can I save the new data set which was made throughout the process?
Hi
Here are the steps:
1. Sort the data by id and yyyymm
proc sort data = play;
by id yyyymm;
run;
2. Extract the rows containing 'A' in either class1 or class2
data playClassA;
set play;
if substr(class1,1,1)='A' | substr(class2,1,1)='A';
run;
3. Get the first record
data dsOut1;
set playClassA;
by id;
if first.id then output;
run;
4. Extract the ids containing no 'A' in either class1 or class2
data playClassABar;
merge play
dsOut1 (in=a keep=id);
by id;
run;
5. Get the last record
data dsOut2;
set playClassA;
by id;
if last.id then output;
run;
6. Combine the results
data dsOut;
set dsOut1 dsOut2;
run;
Hope this will help.
This solution is almost similar to what I was trying to. It's kind of a classical approach so I did it, but it was complicated to use. I will also try your solution. Thank you!
Something along the line of below should work:
data want(drop=_:);
set have;
by id;
retain _done;
if first.id then _done=0;
if _done=0 then
do;
if findc(class1,'A','i') or findc(class2,'A','i') then
do;
output;
_done=1;
end;
else if last.id then output;
end;
run;
Maybe this will do:
proc sql;
create view temp as
select
*,
case
when class1 = "A12" or class2 = "A12" then 1
when first(class1) = "A" or first(class2) = "A" then 2
else 3
end as order
from play
order by id, order, yyyymm;
quit;
data want;
set temp; by id;
if first.id;
drop order;
run;
Edit -- Sorry, this code doesn't always give the correct answer. See below.
I perfectly resolved the problem thanks to your solution! Thank you very much! I really love such a laconic style of your coding.
Sorry. My suggestion above does NOT always give the expected answer in the case where A12 or A is not found..This, however will:
data want;
lookfor = "---";
do until(last.id);
set play; by id;
if class1 = "A12" or class2 = "A12" then lookfor = "A12";
if lookfor ne "A12" and (first(class1) = "A" or first(class2) = "A") then lookfor = "A";
end;
do until(last.id);
set play; by id;
if class1 =: trim(lookfor) or class2 =: trim(lookfor) then do;
output;
lookfor = "+++";
end;
end;
if lookfor = "---" then output;
drop lookfor;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.