About Sazed

Sazed · ‎06-29-2022

Thanks @Reeza. Your code executed fine and the format of the output is what I'm after; however, I get lower counts compared with @andreas_lds's code and when spot-checking against my own code. I'm not sure whether this syntax is scanning all diagnosis fields or just the first diagnosis field.
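One way to test whether only the first diagnosis field is being scanned is to count a single category both ways and compare. The following is only a sketch: it reuses the dataset name ICD and the 25 diagnosis variables from my original question, picks F10 purely as an example, and the names spot_check, F10_first and F10_any are made up for illustration.

```sas
/* Sketch: compare "first field only" vs "any of the 25 fields" for one category. */
data spot_check;
  set ICD;                                  /* dataset name from the original question */
  array dc{*} diagnosis ediag1-ediag20 ecode1-ecode4;

  F10_first = (diagnosis =: 'F10');         /* 1 if the first field starts with F10 */
  F10_any   = 0;
  do i = 1 to dim(dc);
    if dc{i} =: 'F10' then F10_any = 1;     /* 1 if any field starts with F10 */
  end;
  drop i;
run;

/* Cross-tabulate both flags against exposure. */
proc freq data=spot_check;
  tables exposure * (F10_first F10_any) / norow nopercent;
run;
```

If the lower counts line up with F10_first rather than F10_any, that would suggest only the first field is being read.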

Sazed · ‎06-29-2022

Here is some dummy data containing ID, DIAGNOSIS, EDIAG1-5, and EXPOSURE for 1000 obs. Cheers

Sazed · ‎06-28-2022

OK, I think it starts from 65 because the BYTE function maps the values 65 to 90 to A-Z in the ASCII collating sequence.
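A quick way to confirm this is to print the mapping. This is a minimal sketch that assumes an ASCII session; the variable names code and letter are just for illustration.

```sas
/* Sketch: BYTE(n) returns the character at position n in the collating sequence. */
/* RANK is the inverse, so RANK('A') gives the starting position (65 in ASCII).   */
data _null_;
  do code = rank('A') to rank('Z');
    letter = byte(code);
    put code= letter=;                      /* writes each position and its letter to the log */
  end;
run;
```

Writing the loop bounds as rank('A') and rank('Z') instead of 65 and 90 makes it clearer where the numbers come from.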

Sazed · ‎06-28-2022

Thanks Andreas. The ICD_flags data step works perfectly and executed in about 5 sec! I then just ran a proc tab by exposure to compare counts. One question, if you don't mind: where does the '65' in the DO loop come from?

Sazed · ‎06-27-2022

Thanks very much for your informative suggestions. I will test out the syntax and report back. Cheers!

Sazed · ‎06-27-2022

Hi All,

I want to determine the proportion of each ICD10 category that appears in a dataset of approx. 100,000 records containing 25 diagnosis variables (character), and compare these between exposed and non-exposed individuals. This would allow me to determine, for example, that X% of exposed people and Y% of non-exposed people have an F10 diagnosis (or any other diagnosis of interest).

I'm currently doing this by writing a line of code for each new variable (A to Z by 1 to 99), which equates to 26 x 99 = 2574 new ICD variables (see code excerpt below). This method is proving to be very slow and takes a lot of processing time!

    data ICD_flags;
      set ICD;
      array dc{25} diagnosis ediag1-ediag20 ecode1-ecode4;

      /* flag occurrences of all ICD categories */
      DO i=1 to 25;
        if dc{i} in : ('A01') then A01=1;
        if dc{i} in : ('A02') then A02=1;
        if dc{i} in : ('A03') then A03=1;
        if dc{i} in : ('A04') then A04=1;
        if dc{i} in : ('A05') then A05=1;
        if dc{i} in : ('A06') then A06=1;
        if dc{i} in : ('A07') then A07=1;
        if dc{i} in : ('A08') then A08=1;
        if dc{i} in : ('A09') then A09=1;
        if dc{i} in : ('A10') then A10=1;
        if dc{i} in : ('A11') then A11=1;
        if dc{i} in : ('A12') then A12=1;
        if dc{i} in : ('A13') then A13=1;
        if dc{i} in : ('A14') then A14=1;
        if dc{i} in : ('A15') then A15=1;
        if dc{i} in : ('A16') then A16=1;
        ...
        ...
        if dc{i} in : ('Z97') then Z97=1;
        if dc{i} in : ('Z98') then Z98=1;
        if dc{i} in : ('Z99') then Z99=1;
      END;
    RUN;

Is there a more efficient way of doing this? I thought an alternative could be creating a new two-dimensional array where each cell represents a diagnosis category. For example, each row represents an ICD letter (A to Z) and each column represents a code category (1 to 99). But I don't know how to do this.

I hope this makes sense, thanks in advance!
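For illustration, the two-dimensional array idea could look roughly like the sketch below. It is not a tested solution and rests on some assumptions: an ASCII session (so the letters A-Z are contiguous for RANK), codes that start with a letter followed by two digits, and diagnosis variables at least three characters long. The names flag, r and c are made up for the example, and the 0/1 initialisation is an addition so that means can be read as proportions.

```sas
data ICD_flags;
  set ICD;
  array dc{25} diagnosis ediag1-ediag20 ecode1-ecode4;

  /* 26 rows (A-Z) by 99 columns (01-99); SAS fills the array row by row. */
  array flag{26,99}
    A01-A99 B01-B99 C01-C99 D01-D99 E01-E99 F01-F99 G01-G99 H01-H99 I01-I99
    J01-J99 K01-K99 L01-L99 M01-M99 N01-N99 O01-O99 P01-P99 Q01-Q99 R01-R99
    S01-S99 T01-T99 U01-U99 V01-V99 W01-W99 X01-X99 Y01-Y99 Z01-Z99;

  /* Start every flag at 0 so that the mean of a flag is a proportion. */
  do r = 1 to 26;
    do c = 1 to 99;
      flag{r,c} = 0;
    end;
  end;

  do i = 1 to 25;
    /* Row from the leading letter (ASCII assumption), column from the two digits. */
    r = rank(upcase(char(dc{i}, 1))) - rank('A') + 1;
    c = input(substr(dc{i}, 2, 2), ?? 2.);  /* ?? suppresses invalid-data notes */
    if 1 <= r <= 26 and 1 <= c <= 99 then flag{r,c} = 1;
  end;

  drop i r c;
run;

/* Example: proportion with an F10 diagnosis by exposure group. */
proc means data=ICD_flags mean;
  class exposure;
  var F10;
run;
```

Each record is touched 25 times instead of 25 x 2574 times, which is where the saving would come from. Like the original code, this covers categories 01 to 99 only, so a code ending in 00 is not flagged.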


Re: Need efficient method for creating new variables to count how ofte...

Need efficient method for creating new variables to count how often ea...

Re: How to set all missing values to zero for all variables?
