BookmarkSubscribeRSS Feed
CathyVI
Pyrite | Level 9

Hello,

I have a multiple claims from 2016 to 2021. I want to find the total records by (patid) in all the claims without counting any patid twice so I can find the descriptive analysis. For example, In 2016, I want patid=2, In 2017, patid=3, In 2018,patid=3, In 2019, patid=4, In 2020, patid=5.  For the Total N, I dont want to count a patid twice so Total N=5 for patid2016-2021

I am not sure how to do this in SAS. Please help. Thanks

Data Test2016;  
input patid $ clmid $ dtstart :YYMMDD10.;
format  dtstart YYMMDD10. ; 
datalines;
001 abc 2017-01-01
001 def 2017-02-01 
001 ghi 2017-05-01 
002 xvu 2018-01-01 
002 cde 2018-02-20 
;
run;
Data Test2017;  
input patid $ clmid $ dtstart :YYMMDD10.;
format  dtstart YYMMDD10. ; 
datalines;
001 abc 2017-01-01
001 def 2017-02-01 
001 ghi 2017-05-01 
002 xvu 2018-01-01 
002 cde 2018-02-20 
003 adt 2020-03-25 
003 tyh 2021-01-15 
;
run;
Data Test2018;  
input patid $ clmid $ dtstart :YYMMDD10.;
format  dtstart YYMMDD10. ; 
datalines;
001 abc 2017-01-01
001 def 2017-02-01 
001 ghi 2017-05-01 
002 xvu 2018-01-01 
002 cde 2018-02-20 
003 adt 2020-03-25 
003 tyh 2021-01-15 
;
run;
Data Test2019;  
input patid $ clmid $ dtstart :YYMMDD10.;
format  dtstart YYMMDD10. ; 
datalines;
001 abc 2017-01-01
001 def 2017-02-01 
001 ghi 2017-05-01 
002 xvu 2018-01-01 
002 cde 2018-02-20 
003 adt 2020-03-25 
003 tyh 2021-01-15 
004 wse 2020-03-25 
004 def 2021-01-15 
;
run;
Data Test2020;  
input patid $ clmid $ dtstart :YYMMDD10.;
format  dtstart YYMMDD10. ; 
datalines;
001 abc 2017-01-01
001 def 2017-02-01 
001 ghi 2017-05-01 
002 xvu 2018-01-01 
002 cde 2018-02-20 
003 adt 2020-03-25 
003 tyh 2021-01-15
004 wse 2020-03-25 
004 def 2021-01-15
005 aaa 2020-03-25 
005 bbb 2021-01-15  
;
run;

 

4 REPLIES 4
Patrick
Opal | Level 21

Below will write the first occurence of an ID to table WANT. The total of 5 is then simply the number of rows in table want.

data want;
  if _n_=1 then
    do;
      dcl hash h1();
      h1.defineKey('patid');
      h1.defineDone();
    end;

  set test2016 - test2020;

  if h1.check() ne 0 then
    do;
      output;
      h1.add();
    end;
run;

proc print data=want;
run;

Patrick_0-1731805434085.png

 

 

Ksharp
Super User

So you want desired output to be a dataset or a report ?

 


data have;
set test2016 - test2020 indsname=indsn;
dsn=indsn;
run;
proc sql;
select dsn,count(distinct patid) as n_patid from have group by dsn
union all
select 'Total',count(distinct patid) from have;
quit;

Ksharp_0-1731806166421.png

 

mkeintz
PROC Star

Along the lines of @Ksharp :

 

data need /view=need;
  set test: (keep=patid)  indsname=indsn;
  dsname=indsn;
  if patid^=lag(patid) or dsname^=lag(dsname);
run;
proc freq data=need ;
  tables dsname;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

Do you want them counted by the year that is implied by the NAME of the dataset?  

Assuming the dataset are already sorted by PATID you could do this:

data counts;
  if eof then output;
  merge
    test2016(in=flag2016)
    test2017(in=flag2017)
    test2018(in=flag2018)
    test2019(in=flag2019)
    test2020(in=flag2020)
    end=eof
  ;
  by patid;
  if first.patid;
  countall+1;
  array flags [2016:2020] flag: ;
  array counts[2016:2020] count2016-count2020;
  do year=2016 to 2020;
    counts[year]+flags[year];
  end;
  keep count: ;
run;

Results

Obs    countall    count2016    count2017    count2018    count2019    count2020

 1         5           2            3            3            4            5

Or the YEAR that is implied by the DATE variable in the dataset?  Then use SET instead of MERGE and add DTSTART to the BY statement to find their first date.

data counts;
  if eof then output;
  set test2016-test2020 end=eof;
  by patid dtstart;
  countall+first.patid;
  array counts[2016:2021] count2016-count2021 (6*0);
  retain count:;
  if first.patid or year(dtstart) ne lag(year(dtstart)) then 
    counts[year(dtstart)]+1
  ;
  keep count: ;
run;

Result

Obs    countall    count2016    count2017    count2018    count2019    count2020    count2021

 1         5           0            1            1            0            3            3

 

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 774 views
  • 2 likes
  • 5 in conversation