finding total count in multiple claims record

CathyVI · Posted 11-16-2024 04:53 PM

Hello,

I have a multiple claims from 2016 to 2021. I want to find the total records by (patid) in all the claims without counting any patid twice so I can find the descriptive analysis. For example, In 2016, I want patid=2, In 2017, patid=3, In 2018,patid=3, In 2019, patid=4, In 2020, patid=5. For the Total N, I dont want to count a patid twice so Total N=5 for patid2016-2021

I am not sure how to do this in SAS. Please help. Thanks

Data Test2016;  
input patid $ clmid $ dtstart :YYMMDD10.;
format  dtstart YYMMDD10. ; 
datalines;
001 abc 2017-01-01
001 def 2017-02-01 
001 ghi 2017-05-01 
002 xvu 2018-01-01 
002 cde 2018-02-20 
;
run;
Data Test2017;  
input patid $ clmid $ dtstart :YYMMDD10.;
format  dtstart YYMMDD10. ; 
datalines;
001 abc 2017-01-01
001 def 2017-02-01 
001 ghi 2017-05-01 
002 xvu 2018-01-01 
002 cde 2018-02-20 
003 adt 2020-03-25 
003 tyh 2021-01-15 
;
run;
Data Test2018;  
input patid $ clmid $ dtstart :YYMMDD10.;
format  dtstart YYMMDD10. ; 
datalines;
001 abc 2017-01-01
001 def 2017-02-01 
001 ghi 2017-05-01 
002 xvu 2018-01-01 
002 cde 2018-02-20 
003 adt 2020-03-25 
003 tyh 2021-01-15 
;
run;
Data Test2019;  
input patid $ clmid $ dtstart :YYMMDD10.;
format  dtstart YYMMDD10. ; 
datalines;
001 abc 2017-01-01
001 def 2017-02-01 
001 ghi 2017-05-01 
002 xvu 2018-01-01 
002 cde 2018-02-20 
003 adt 2020-03-25 
003 tyh 2021-01-15 
004 wse 2020-03-25 
004 def 2021-01-15 
;
run;
Data Test2020;  
input patid $ clmid $ dtstart :YYMMDD10.;
format  dtstart YYMMDD10. ; 
datalines;
001 abc 2017-01-01
001 def 2017-02-01 
001 ghi 2017-05-01 
002 xvu 2018-01-01 
002 cde 2018-02-20 
003 adt 2020-03-25 
003 tyh 2021-01-15
004 wse 2020-03-25 
004 def 2021-01-15
005 aaa 2020-03-25 
005 bbb 2021-01-15  
;
run;

Patrick · Posted 11-16-2024 08:04 PM

Below will write the first occurence of an ID to table WANT. The total of 5 is then simply the number of rows in table want.

data want;
  if _n_=1 then
    do;
      dcl hash h1();
      h1.defineKey('patid');
      h1.defineDone();
    end;

  set test2016 - test2020;

  if h1.check() ne 0 then
    do;
      output;
      h1.add();
    end;
run;

proc print data=want;
run;

Ksharp · Posted 11-16-2024 08:16 PM

So you want desired output to be a dataset or a report ?


data have;
set test2016 - test2020 indsname=indsn;
dsn=indsn;
run;
proc sql;
select dsn,count(distinct patid) as n_patid from have group by dsn
union all
select 'Total',count(distinct patid) from have;
quit;

mkeintz · Posted 11-21-2024 08:52 PM

Along the lines of @Ksharp :

data need /view=need;
  set test: (keep=patid)  indsname=indsn;
  dsname=indsn;
  if patid^=lag(patid) or dsname^=lag(dsname);
run;
proc freq data=need ;
  tables dsname;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Tom · Posted 11-22-2024 08:00 PM

Do you want them counted by the year that is implied by the NAME of the dataset?

Assuming the dataset are already sorted by PATID you could do this:

data counts;
  if eof then output;
  merge
    test2016(in=flag2016)
    test2017(in=flag2017)
    test2018(in=flag2018)
    test2019(in=flag2019)
    test2020(in=flag2020)
    end=eof
  ;
  by patid;
  if first.patid;
  countall+1;
  array flags [2016:2020] flag: ;
  array counts[2016:2020] count2016-count2020;
  do year=2016 to 2020;
    counts[year]+flags[year];
  end;
  keep count: ;
run;

Results

Obs    countall    count2016    count2017    count2018    count2019    count2020

 1         5           2            3            3            4            5

Or the YEAR that is implied by the DATE variable in the dataset? Then use SET instead of MERGE and add DTSTART to the BY statement to find their first date.

data counts;
  if eof then output;
  set test2016-test2020 end=eof;
  by patid dtstart;
  countall+first.patid;
  array counts[2016:2021] count2016-count2021 (6*0);
  retain count:;
  if first.patid or year(dtstart) ne lag(year(dtstart)) then 
    counts[year(dtstart)]+1
  ;
  keep count: ;
run;

Result

Obs    countall    count2016    count2017    count2018    count2019    count2020    count2021

 1         5           0            1            1            0            3            3

finding total count in multiple claims record

Re: finding total count in multiple claims record

Re: finding total count in multiple claims record

Re: finding total count in multiple claims record

Re: finding total count in multiple claims record

finding total count in multiple claims record

Re: finding total count in multiple claims record

Re: finding total count in multiple claims record

Re: finding total count in multiple claims record

Re: finding total count in multiple claims record

Registration is open

SAS Training: Just a Click Away