BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jcsimmo
Fluorite | Level 6

Hi there,

 

I have a large healthcare dataset. Each row is a patient encounter. Each patient has a unique ID (patientID). Each encounter has a unique encounter number (visitID). In each row, the patient has a list of diagnoses flags, eg diabetes can equal 0 or 1 depending on if that diagnosis is present ON THAT VISIT.

 

In the dataset, a patient may have a diagnosis of diabetes on one visit but not on the rest. How can i create a flag 'historyofdiabetes' if the variable 'diabetes' is seen in any visit.

 

eg:

visitID

patientID

diabetes

1

AR1

0

2

JS55

1

3

EX2

0

4

AR1

1

5

BC99

0

6

JS55

0

7

AR1

1

 

In the example, Patient AR1 has three visits and JS55 has two visits. In at least one visit, a diagnosis of diabetes is seen for each patient (Visit #4/#7 for AR1 and Visit#2 for JS55) so we know both of these patients have a history of diabetes.

 

I would like a new column 'historyofdiabetes' where all the rows with JS55 and AR1 would have a 1.

 

visitID

patientID

diabetes

historyofdiabetes

1

AR1

0

1

2

JS55

1

1

3

EX2

0

0

4

AR1

1

1

5

BC99

0

0

6

JS55

0

1

7

AR1

1

1

 

 

The only thing is this database has a ton of patients. I have been creating a list of patientID that have diabetes then having the program add a flag historyofdiabetes=1 if patientID = x or y or z but SAS has a lot of difficulty because of the number of patients in each list.

 

My code step 1:

proc freq data=kid.dataset(where=(diabetes=1)) ORDER=FREQ;
table patientID;
run;

 

My code step 2:

data kid.dataset (compress=yes);
    set kid.dataset;
if patientID in('JS55',
'AR1', etc) then historyofdiabetes=1; else historyofdiabetes=0;
run;

 

Is there a way to temporarily store a list of patientIDs who have diabetes and then have the code read that list and add the historyofdiabetes flag all in one, non-memory intensive, step

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

This problem is a natural application for the hash object, which is a memory-resident "table" that allows quick access to dataitems (table "rows") based on a key - patientid in your case.  The tactic below is to create such an object containing ONLY patientid's with diabetes=1 - hash object named DIAB in this code.  Then, if the patientid is in DIAB, history_of_diabetes should be set to 1:

 

data want;
  set kid.dataset;
  if _n_=1 then do;
    declare hash diab(dataset:'kid.dataset (keep=patientid diabetes where=(diabetes=1))');
    diab.definekey('patientid');
    diab.definedata('patientid');
    diab.definedone();
  end;
  history_of_diabetes=(diab.find()=0);
run;

 

This program effectively reads the kid.dataset twice, once to populate the hash table diab, and the second to read all the observations and see if they are in the diab hash table.  Note the the diab.find() method returns a zero when successful.  That's why the assignment statement for history_of_diabetes might at first look counter-intuitive, because it returns a 1 or 0 depending on whether diab.find() is successful.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

8 REPLIES 8
learsaas
Quartz | Level 8

the best method is:one data step double set can finish your task。

but i writed the another stupid method:

 

data temp;
	set a;
	n=_n_;
run;
proc sort data=temp;
	by patientID descending diabetes;
run;
data temp;
	set temp;
	by patientID descending diabetes;
	retain historyofdiabetes;
	if first.patientID then historyofdiabetes=diabetes;
run;
proc sort data=temp out=result(drop=n);
	by n;
run;	
Reeza
Super User

Why not just sort?

 

If you sort so that 1 is first (descending) the you can set all other records to the flag if you want. 

 

proc sort data=have;
by id descending flag;
run;

data want;
set have;
by id;
retain Flag;

if first.ID then newFlag=flag;

run;

@jcsimmo wrote:

Hi there,

 

I have a large healthcare dataset. Each row is a patient encounter. Each patient has a unique ID (patientID). Each encounter has a unique encounter number (visitID). In each row, the patient has a list of diagnoses flags, eg diabetes can equal 0 or 1 depending on if that diagnosis is present ON THAT VISIT.

 

In the dataset, a patient may have a diagnosis of diabetes on one visit but not on the rest. How can i create a flag 'historyofdiabetes' if the variable 'diabetes' is seen in any visit.

 

eg:

visitID

patientID

diabetes

1

AR1

0

2

JS55

1

3

EX2

0

4

AR1

1

5

BC99

0

6

JS55

0

7

AR1

1

 

In the example, Patient AR1 has three visits and JS55 has two visits. In at least one visit, a diagnosis of diabetes is seen for each patient (Visit #4/#7 for AR1 and Visit#2 for JS55) so we know both of these patients have a history of diabetes.

 

I would like a new column 'historyofdiabetes' where all the rows with JS55 and AR1 would have a 1.

 

visitID

patientID

diabetes

historyofdiabetes

1

AR1

0

1

2

JS55

1

1

3

EX2

0

0

4

AR1

1

1

5

BC99

0

0

6

JS55

0

1

7

AR1

1

1

 

 

The only thing is this database has a ton of patients. I have been creating a list of patientID that have diabetes then having the program add a flag historyofdiabetes=1 if patientID = x or y or z but SAS has a lot of difficulty because of the number of patients in each list.

 

My code step 1:

proc freq data=kid.dataset(where=(diabetes=1)) ORDER=FREQ;
table patientID;
run;

 

My code step 2:

data kid.dataset (compress=yes);
    set kid.dataset;
if patientID in('JS55',
'AR1', etc) then historyofdiabetes=1; else historyofdiabetes=0;
run;

 

Is there a way to temporarily store a list of patientIDs who have diabetes and then have the code read that list and add the historyofdiabetes flag all in one, non-memory intensive, step

 




mkeintz
PROC Star

This problem is a natural application for the hash object, which is a memory-resident "table" that allows quick access to dataitems (table "rows") based on a key - patientid in your case.  The tactic below is to create such an object containing ONLY patientid's with diabetes=1 - hash object named DIAB in this code.  Then, if the patientid is in DIAB, history_of_diabetes should be set to 1:

 

data want;
  set kid.dataset;
  if _n_=1 then do;
    declare hash diab(dataset:'kid.dataset (keep=patientid diabetes where=(diabetes=1))');
    diab.definekey('patientid');
    diab.definedata('patientid');
    diab.definedone();
  end;
  history_of_diabetes=(diab.find()=0);
run;

 

This program effectively reads the kid.dataset twice, once to populate the hash table diab, and the second to read all the observations and see if they are in the diab hash table.  Note the the diab.find() method returns a zero when successful.  That's why the assignment statement for history_of_diabetes might at first look counter-intuitive, because it returns a 1 or 0 depending on whether diab.find() is successful.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
jcsimmo
Fluorite | Level 6

This actually worked perfectly. Thank you

jcsimmo
Fluorite | Level 6

Can I ask one other question since I am having a difficult time understanding how the hash tables work.

 

say there is another variable in the dataset that is patientsmomid for the patient's mother's patientID, can I use the hash table to find all the patientsmomid that have this new diabetes flag and then find the patientIDs of the those mothers and flag them with a momofpatientwithdiabetes flag.

 

My guess would be that

 

data want;
  set kid.dataset;
  if _n_=1 then do;
    declare hash diab(dataset:'kid.dataset (keep=patientid patientsmomid history_of_diabetes where=(history_of_diabetes=1))');
    diab.definekey('patientid');
    diab.definedata('patientsmomid');
    diab.definedone();
  end;
  momofpatientwithdiabetes =(diab.find()=0);
run;

 

mkeintz
PROC Star

I presume both moms and patients are in the dataset you refer to.  So, for instance, if there is a record of a diabetes patient with ID=333, and MOMID=111, then if there is another records with ID=111, you want to set the flag of momofpatientwithdiab=1.

 

  for each patient with diabetes, you want to

     1. get their mother's id, and put it in a lookup table

     2. find that id as patient id in the dataset and add a flag variable momofpatientwithdiab=1

 

data want;
  set kid.dataset;
  declare hash momids(dataset:'kid.dataset (keep=patientsmomid history_of_diabetes                                                  
                                                 where=(history_of_diabetes=1))');
    momids.definekey('patientsmomid');
    momids.definedata();
    momids.definedone();
  end;
  momofpatientwithdiabetes=(momids.check(key:patientid)=0);
run;

 

Unlike the FIND method, the CHECK method does not transfer data from the hash object to the data set program data vector.  It just tells the users whether a key with that value is in the hash object.  As a result, while the hash object is keyed on the select patientsmomid, the program running the check compares the incoming record patientid against all the patientsmomid in the table.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
jcsimmo
Fluorite | Level 6

You have

  momids.definedata();

blank.

 

Would patientid be in this field? (and also in the keep field as well).

 

Jon

novinosrin
Tourmaline | Level 20
data have;
input visitID	patientID $	diabetes ;
cards;
1	AR1	0
2	JS55	1
3	EX2	0
4	AR1	1
5	BC99	0
6	JS55	0
7	AR1	1
;

proc sql;
create table want as 
select *, max(diabetes) as historyofdiabetes
from have
group by patientid
order by visitid;
quit;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 6968 views
  • 4 likes
  • 5 in conversation