BookmarkSubscribeRSS Feed
exbalterate
Calcite | Level 5
Hi all,
I am trying to do a couple of things with the sample below and have search online a lot for it but it
isn't working. Will it be possible to do using data step?

1. I want to find the top two diagnosis(diag) with a table showing the count from the entire data set
per patient or simply from diag1 if not possible. Something like AA|3, BB|4.
Instead of a full list of all the diagnosis which will be thousands in a large data set.
Tried various freq and count commands but it doesn't work.

2. Couldn't figure this out but is it possible to find associated diagnosis per patient?
For example patients diagnosed with 'AA' also have 'JI' and patients with 'BB' also have 'GE'.

3. In a large data set will it be possible to link agegroups to diag1 , to see which diagnosis is common to each agegroup?

Thank you



data sample;
input MYDOB ADMIN$ SEX$ ID DIAG1$ DIAG2$ DIAG3$ DIAG4$ ;
DATALINES;
111924 KL M 1 AA JI GE .
121926 OH M 2 BB AA . GE
121982 KL F 3 BB . . AA
101980 KL M 4 AA . . .
111979 OH F 5 JI . AA .
101959 KL M 6 GE . . .
121999 OH F 7 BB . . .
102008 KL M 8 AA JI . .
112001 KL F 9 JI . . AA
102016 OH M 10 BB . GE .
RUN;


data age2;
set SAMPLE;
newdate1=mdy(12,31,2016);
format newdate1 mmyyn.;
run;


/* Calculate age*/
data cleaned (keep= ADMIN SEX ID DIAG1 diag2 diag3 diag4 AGE agegroup);
retain ID AGE ADMIN SEX DIAG1 DIAG2 DIAG3 DIAG4 agegroup; /* rearrange*/
set age2;
OPTIONS YEARCUTOFF=1917;
a = put(MYDOB,6.);
NEWDOB = input (a, mmddyy6.);
format NEWDOB MMYYN.;
age = INT(YRDIF(NEWDOB, newdate1,'ACTUAL'));
/*create age group*/
if age<=18 then agegroup ='Young';
else if 19<=age<64 then agegroup='Intemediate';
else if age >64 then agegroup= 'Retired';
run;

 

4 REPLIES 4
Reeza
Super User

All of that's possible. What exactly is your question here though? 

 

Spoiler

@exbalterate wrote:
Hi all,
I am trying to do a couple of things with the sample below and have search online a lot for it but it
isn't working. Will it be possible to do using data step?

1. I want to find the top two diagnosis(diag) with a table showing the count from the entire data set
per patient or simply from diag1 if not possible. Something like AA|3, BB|4.
Instead of a full list of all the diagnosis which will be thousands in a large data set.
Tried various freq and count commands but it doesn't work.

2. Couldn't figure this out but is it possible to find associated diagnosis per patient?
For example patients diagnosed with 'AA' also have 'JI' and patients with 'BB' also have 'GE'.

3. In a large data set will it be possible to link agegroups to diag1 , to see which diagnosis is common to each agegroup?

Thank you



data sample;
input MYDOB ADMIN$ SEX$ ID DIAG1$ DIAG2$ DIAG3$ DIAG4$ ;
DATALINES;
111924 KL M 1 AA JI GE .
121926 OH M 2 BB AA . GE
121982 KL F 3 BB . . AA
101980 KL M 4 AA . . .
111979 OH F 5 JI . AA .
101959 KL M 6 GE . . .
121999 OH F 7 BB . . .
102008 KL M 8 AA JI . .
112001 KL F 9 JI . . AA
102016 OH M 10 BB . GE .
RUN;


data age2;
set SAMPLE;
newdate1=mdy(12,31,2016);
format newdate1 mmyyn.;
run;


/* Calculate age*/
data cleaned (keep= ADMIN SEX ID DIAG1 diag2 diag3 diag4 AGE agegroup);
retain ID AGE ADMIN SEX DIAG1 DIAG2 DIAG3 DIAG4 agegroup; /* rearrange*/
set age2;
OPTIONS YEARCUTOFF=1917;
a = put(MYDOB,6.);
NEWDOB = input (a, mmddyy6.);
format NEWDOB MMYYN.;
age = INT(YRDIF(NEWDOB, newdate1,'ACTUAL'));
/*create age group*/
if age<=18 then agegroup ='Young';
else if 19<=age<64 then agegroup='Intemediate';
else if age >64 then agegroup= 'Retired';
run;

 


exbalterate
Calcite | Level 5

How to do it? I cannot figure it out

 

Thanks

Reeza
Super User

I would suggest simplifying it in several ways then. 

Please only include one question at a time. 

 

So it looks like you have medical data and are looking for the top diagnosis. Based on the sample data you've provided initially, and thank you for that, what would you expect as output?


It would also help if you include anything you've tried so far. We're happy to help but when you post your requirements and data but no specific questions on what you need help with it comes across as someone asking others to do the work. 

Shmuel
Garnet | Level 18

You asked for:

1) I want to find the top two diagnosis(diag) ... per patient

2) find associated diagnosis per patient

It is not clear for me:

  1)  what do you mean by top two diagnosis(diag) ... per patient ? there is only one observation (in the sample) per ID !

  2)  do you mean by associated diagnosis  as most frequent couple of diagnosis in patiants

 

I assume next code may give you a hint to achieve your target:

%let nwdate = 31DEC2016;
data sample; retain newdate1 "&nwdate"d; format newdate1 mmyyn.; input MYDOB ADMIN$ SEX$ ID DIAG1$ DIAG2$ DIAG3$ DIAG4$ ; DATALINES; 111924 KL M 1 AA JI GE . 121926 OH M 2 BB AA . GE 121982 KL F 3 BB . . AA 101980 KL M 4 AA . . . 111979 OH F 5 JI . AA . 101959 KL M 6 GE . . . 121999 OH F 7 BB . . . 102008 KL M 8 AA JI . . 112001 KL F 9 JI . . AA 102016 OH M 10 BB . GE . RUN; data cleaned (keep= ADMIN SEX ID DIAG AGE agegroup); retain ID AGE ADMIN SEX DIAG agegroup; /* rearrange*/ set sample; OPTIONS YEARCUTOFF=1917; a = put(MYDOB,6.); NEWDOB = input (a, mmddyy6.); format NEWDOB MMYYN.; age = INT(YRDIF(NEWDOB, newdate1,'ACTUAL')); /*create age group*/ if age<=18 then agegroup ='Young'; else if 19<=age<64 then agegroup='Intemediate'; else if age >64 then agegroup= 'Retired'; array dg $ diag1-diag4; do i=1 to dim(dg); diag = dg(i); if not missing(diag) then output; end; run; proc sort data=cleaned; by sex agegroup id; run; proc freq data=cleaned; by sex; /* optional */ table agegroup * diag; run;

pay attenstion:

  - I have combined your first two steps into one, while creating the sample dataset within the newdate1 variable.

  - The newdate is given as a macro variable assigned, preceeding  thr code.

  - I made the wide dataset into a long one, each observation with one DIAG field (instead diag1-diag4)

  - I believe there are better procedurs then PROC FREQ that I have used to calculate frequency and find

    couples of associated diagnosis, but I am not familiar with those procedures.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1074 views
  • 3 likes
  • 3 in conversation