Can anyone recommend a way to get a dataset of claims and diagnosis, into rows with no more than 10 diagnosis per row?
The number of diagnosis per claim varies and I never know what the max number of diags on a claim will be.
The code below will create a simple example of the raw dataset I'm working with. I know it has to transposed (in some way) but I'm not sure about the best way to get it to transpose by groups of 10 diags, and be able to do that with various number of diags per claim.
Any help is appreciated, thanks.
data diags;
input claimid $12. diag $;
datalines;
097389172812 4500
097389172812 13
097389172812 4587
097389172812 458
097389172812 9678
097389172812 128
097389172812 986
854356214255 7895
854356214255 45
854356214255 9635
854356214255 574
854356214255 5913
854356214255 974
854356214255 2685
854356214255 857
854356214255 356
854356214255 851
854356214255 5214
854356214255 69
854356214255 111
854356214255 999
854356214255 5857
854356214255 674
854356214255 853
854356214255 276
854356214255 4975
854356214255 363
854356214255 661
854356214255 454
854356214255 751
854356214255 902
854356214255 024
854356214255 422
854356214255 190
006285475124 0052
006285475124 5264
006285475124 886
006285475124 55
006285475124 941
854356214255 504
854356214255 450
854356214255 5020
854356214255 7164
854356214255 251
854356214255 994
854356214255 574
854356214255 6637
854356214255 905
854356214255 843
854356214255 4674
854356214255 8012
854356214255 832
;
run;
Your data is not sorted by CLAIMID. You should either sort it or add the NOTSORTED option to the BY statement in the code below.
data want ;
if 0 then set diags;
seq+1;
array d $5 diag1-diag10;
do _n_=1 to 10 until (last.claimid);
set diags;
by claimid ;
if first.claimid then seq=1;
d(_n_)=diag;
end;
drop diag;
run;
Tom - absolutely perfect! I suspected a do-loop was involved, just don't have the experience with them.
Thanks again.
C. Sargent
Tom's solution is fine if you really need the specified structure (no more than 10 DIAGs per obs. and possibly multiple obs. per CLAIMID). It's good for a compact dump as generated by
proc print data=want ;
by claimid notsorted ;
id claimid ;
run ;
But for any kind of subsequent processing you are likely better served by either the given structure (one CLAIMID/DIAG pair per obs.) or a complete transpose (one obs. per CLAIMID). You can get SAS to discover the maximum number of DIAG values to be accommodated. Try
proc transpose data=diags out=wide prefix=diag ;
by claimid notsorted ;
var diag ;
run ;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Join us for two new fee-based courses: Administrative Healthcare Data and SAS via Live Web Monday-Thursday, April 24-27 from 1:00 to 4:30 PM ET each day. And Administrative Healthcare Data and SAS: Hands-On Programming Workshop via Live Web on Friday, April 28 from 9:00 AM to 5:00 PM ET.