Hello,
I have a multiple claims from 2016 to 2021. I want to find the total records by (patid) in all the claims without counting any patid twice so I can find the descriptive analysis. For example, In 2016, I want patid=2, In 2017, patid=3, In 2018,patid=3, In 2019, patid=4, In 2020, patid=5. For the Total N, I dont want to count a patid twice so Total N=5 for patid2016-2021
I am not sure how to do this in SAS. Please help. Thanks
Data Test2016;
input patid $ clmid $ dtstart :YYMMDD10.;
format dtstart YYMMDD10. ;
datalines;
001 abc 2017-01-01
001 def 2017-02-01
001 ghi 2017-05-01
002 xvu 2018-01-01
002 cde 2018-02-20
;
run;
Data Test2017;
input patid $ clmid $ dtstart :YYMMDD10.;
format dtstart YYMMDD10. ;
datalines;
001 abc 2017-01-01
001 def 2017-02-01
001 ghi 2017-05-01
002 xvu 2018-01-01
002 cde 2018-02-20
003 adt 2020-03-25
003 tyh 2021-01-15
;
run;
Data Test2018;
input patid $ clmid $ dtstart :YYMMDD10.;
format dtstart YYMMDD10. ;
datalines;
001 abc 2017-01-01
001 def 2017-02-01
001 ghi 2017-05-01
002 xvu 2018-01-01
002 cde 2018-02-20
003 adt 2020-03-25
003 tyh 2021-01-15
;
run;
Data Test2019;
input patid $ clmid $ dtstart :YYMMDD10.;
format dtstart YYMMDD10. ;
datalines;
001 abc 2017-01-01
001 def 2017-02-01
001 ghi 2017-05-01
002 xvu 2018-01-01
002 cde 2018-02-20
003 adt 2020-03-25
003 tyh 2021-01-15
004 wse 2020-03-25
004 def 2021-01-15
;
run;
Data Test2020;
input patid $ clmid $ dtstart :YYMMDD10.;
format dtstart YYMMDD10. ;
datalines;
001 abc 2017-01-01
001 def 2017-02-01
001 ghi 2017-05-01
002 xvu 2018-01-01
002 cde 2018-02-20
003 adt 2020-03-25
003 tyh 2021-01-15
004 wse 2020-03-25
004 def 2021-01-15
005 aaa 2020-03-25
005 bbb 2021-01-15
;
run;
Below will write the first occurence of an ID to table WANT. The total of 5 is then simply the number of rows in table want.
data want;
if _n_=1 then
do;
dcl hash h1();
h1.defineKey('patid');
h1.defineDone();
end;
set test2016 - test2020;
if h1.check() ne 0 then
do;
output;
h1.add();
end;
run;
proc print data=want;
run;
So you want desired output to be a dataset or a report ?
data have;
set test2016 - test2020 indsname=indsn;
dsn=indsn;
run;
proc sql;
select dsn,count(distinct patid) as n_patid from have group by dsn
union all
select 'Total',count(distinct patid) from have;
quit;
Along the lines of @Ksharp :
data need /view=need;
set test: (keep=patid) indsname=indsn;
dsname=indsn;
if patid^=lag(patid) or dsname^=lag(dsname);
run;
proc freq data=need ;
tables dsname;
run;
Do you want them counted by the year that is implied by the NAME of the dataset?
Assuming the dataset are already sorted by PATID you could do this:
data counts;
if eof then output;
merge
test2016(in=flag2016)
test2017(in=flag2017)
test2018(in=flag2018)
test2019(in=flag2019)
test2020(in=flag2020)
end=eof
;
by patid;
if first.patid;
countall+1;
array flags [2016:2020] flag: ;
array counts[2016:2020] count2016-count2020;
do year=2016 to 2020;
counts[year]+flags[year];
end;
keep count: ;
run;
Results
Obs countall count2016 count2017 count2018 count2019 count2020 1 5 2 3 3 4 5
Or the YEAR that is implied by the DATE variable in the dataset? Then use SET instead of MERGE and add DTSTART to the BY statement to find their first date.
data counts;
if eof then output;
set test2016-test2020 end=eof;
by patid dtstart;
countall+first.patid;
array counts[2016:2021] count2016-count2021 (6*0);
retain count:;
if first.patid or year(dtstart) ne lag(year(dtstart)) then
counts[year(dtstart)]+1
;
keep count: ;
run;
Result
Obs countall count2016 count2017 count2018 count2019 count2020 count2021 1 5 0 1 1 0 3 3
Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.
Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.