Solved: How to add a flag on multiple records if a flag is present on just one...

jcsimmo · Posted 11-27-2018 07:49 PM

Hi there,

I have a large healthcare dataset. Each row is a patient encounter. Each patient has a unique ID (patientID). Each encounter has a unique encounter number (visitID). In each row, the patient has a list of diagnoses flags, eg diabetes can equal 0 or 1 depending on if that diagnosis is present ON THAT VISIT.

In the dataset, a patient may have a diagnosis of diabetes on one visit but not on the rest. How can i create a flag 'historyofdiabetes' if the variable 'diabetes' is seen in any visit.

eg:

visitID	patientID	diabetes
1	AR1	0
2	JS55	1
3	EX2	0
4	AR1	1
5	BC99	0
6	JS55	0
7	AR1	1

In the example, Patient AR1 has three visits and JS55 has two visits. In at least one visit, a diagnosis of diabetes is seen for each patient (Visit #4/#7 for AR1 and Visit#2 for JS55) so we know both of these patients have a history of diabetes.

I would like a new column 'historyofdiabetes' where all the rows with JS55 and AR1 would have a 1.

visitID	patientID	diabetes	historyofdiabetes
1	AR1	0	1
2	JS55	1	1
3	EX2	0	0
4	AR1	1	1
5	BC99	0	0
6	JS55	0	1
7	AR1	1	1

The only thing is this database has a ton of patients. I have been creating a list of patientID that have diabetes then having the program add a flag historyofdiabetes=1 if patientID = x or y or z but SAS has a lot of difficulty because of the number of patients in each list.

My code step 1:

proc freq data=kid.dataset(where=(diabetes=1)) ORDER=FREQ;
table patientID;
run;

My code step 2:

data kid.dataset (compress=yes);
set kid.dataset;
if patientID in('JS55',
'AR1', etc) then historyofdiabetes=1; else historyofdiabetes=0;
run;

Is there a way to temporarily store a list of patientIDs who have diabetes and then have the code read that list and add the historyofdiabetes flag all in one, non-memory intensive, step

mkeintz · Posted 11-27-2018 08:52 PM

This problem is a natural application for the hash object, which is a memory-resident "table" that allows quick access to dataitems (table "rows") based on a key - patientid in your case. The tactic below is to create such an object containing ONLY patientid's with diabetes=1 - hash object named DIAB in this code. Then, if the patientid is in DIAB, history_of_diabetes should be set to 1:

data want;
  set kid.dataset;
  if _n_=1 then do;
    declare hash diab(dataset:'kid.dataset (keep=patientid diabetes where=(diabetes=1))');
    diab.definekey('patientid');
    diab.definedata('patientid');
    diab.definedone();
  end;
  history_of_diabetes=(diab.find()=0);
run;

This program effectively reads the kid.dataset twice, once to populate the hash table diab, and the second to read all the observations and see if they are in the diab hash table. Note the the diab.find() method returns a zero when successful. That's why the assignment statement for history_of_diabetes might at first look counter-intuitive, because it returns a 1 or 0 depending on whether diab.find() is successful.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

learsaas · Posted 11-27-2018 08:39 PM

the best method is：one data step double set can finish your task。

but i writed the another stupid method：

data temp;
	set a;
	n=_n_;
run;
proc sort data=temp;
	by patientID descending diabetes;
run;
data temp;
	set temp;
	by patientID descending diabetes;
	retain historyofdiabetes;
	if first.patientID then historyofdiabetes=diabetes;
run;
proc sort data=temp out=result(drop=n);
	by n;
run;

Reeza · Posted 11-27-2018 08:47 PM

Why not just sort?

If you sort so that 1 is first (descending) the you can set all other records to the flag if you want.

proc sort data=have;
by id descending flag;
run;

data want;
set have;
by id;
retain Flag;

if first.ID then newFlag=flag;

run;

@jcsimmo wrote:

Hi there,

I have a large healthcare dataset. Each row is a patient encounter. Each patient has a unique ID (patientID). Each encounter has a unique encounter number (visitID). In each row, the patient has a list of diagnoses flags, eg diabetes can equal 0 or 1 depending on if that diagnosis is present ON THAT VISIT.

In the dataset, a patient may have a diagnosis of diabetes on one visit but not on the rest. How can i create a flag 'historyofdiabetes' if the variable 'diabetes' is seen in any visit.

eg:

visitID

patientID

diabetes

1

AR1

0

2

JS55

1

3

EX2

0

4

AR1

1

5

BC99

0

6

JS55

0

7

AR1

1

In the example, Patient AR1 has three visits and JS55 has two visits. In at least one visit, a diagnosis of diabetes is seen for each patient (Visit #4/#7 for AR1 and Visit#2 for JS55) so we know both of these patients have a history of diabetes.

I would like a new column 'historyofdiabetes' where all the rows with JS55 and AR1 would have a 1.

visitID

patientID

diabetes

historyofdiabetes

1

AR1

0

1

2

JS55

1

1

3

EX2

0

0

4

AR1

1

1

5

BC99

0

0

6

JS55

0

1

7

AR1

1

1

The only thing is this database has a ton of patients. I have been creating a list of patientID that have diabetes then having the program add a flag historyofdiabetes=1 if patientID = x or y or z but SAS has a lot of difficulty because of the number of patients in each list.

My code step 1:

proc freq data=kid.dataset(where=(diabetes=1)) ORDER=FREQ;
table patientID;
run;

My code step 2:

data kid.dataset (compress=yes);
set kid.dataset;
if patientID in('JS55',
'AR1', etc) then historyofdiabetes=1; else historyofdiabetes=0;
run;

Is there a way to temporarily store a list of patientIDs who have diabetes and then have the code read that list and add the historyofdiabetes flag all in one, non-memory intensive, step

mkeintz · Posted 11-27-2018 08:52 PM

This problem is a natural application for the hash object, which is a memory-resident "table" that allows quick access to dataitems (table "rows") based on a key - patientid in your case. The tactic below is to create such an object containing ONLY patientid's with diabetes=1 - hash object named DIAB in this code. Then, if the patientid is in DIAB, history_of_diabetes should be set to 1:

data want;
  set kid.dataset;
  if _n_=1 then do;
    declare hash diab(dataset:'kid.dataset (keep=patientid diabetes where=(diabetes=1))');
    diab.definekey('patientid');
    diab.definedata('patientid');
    diab.definedone();
  end;
  history_of_diabetes=(diab.find()=0);
run;

This program effectively reads the kid.dataset twice, once to populate the hash table diab, and the second to read all the observations and see if they are in the diab hash table. Note the the diab.find() method returns a zero when successful. That's why the assignment statement for history_of_diabetes might at first look counter-intuitive, because it returns a 1 or 0 depending on whether diab.find() is successful.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

jcsimmo · Posted 12-07-2018 07:30 PM

This actually worked perfectly. Thank you

jcsimmo · Posted 12-09-2018 09:26 PM

Can I ask one other question since I am having a difficult time understanding how the hash tables work.

say there is another variable in the dataset that is patientsmomid for the patient's mother's patientID, can I use the hash table to find all the patientsmomid that have this new diabetes flag and then find the patientIDs of the those mothers and flag them with a momofpatientwithdiabetes flag.

My guess would be that

data want;
  set kid.dataset;
  if _n_=1 then do;
    declare hash diab(dataset:'kid.dataset (keep=patientid patientsmomid history_of_diabetes where=(history_of_diabetes=1))');
    diab.definekey('patientid');
    diab.definedata('patientsmomid');
    diab.definedone();
  end;
  momofpatientwithdiabetes =(diab.find()=0);
run;

mkeintz · Posted 12-10-2018 02:17 AM

I presume both moms and patients are in the dataset you refer to. So, for instance, if there is a record of a diabetes patient with ID=333, and MOMID=111, then if there is another records with ID=111, you want to set the flag of momofpatientwithdiab=1.

for each patient with diabetes, you want to

1. get their mother's id, and put it in a lookup table

2. find that id as patient id in the dataset and add a flag variable momofpatientwithdiab=1

data want;
  set kid.dataset;
  declare hash momids(dataset:'kid.dataset (keep=patientsmomid history_of_diabetes                                                  
                                                 where=(history_of_diabetes=1))');
    momids.definekey('patientsmomid');
    momids.definedata();
    momids.definedone();
  end;
  momofpatientwithdiabetes=(momids.check(key:patientid)=0);
run;

Unlike the FIND method, the CHECK method does not transfer data from the hash object to the data set program data vector. It just tells the users whether a key with that value is in the hash object. As a result, while the hash object is keyed on the select patientsmomid, the program running the check compares the incoming record patientid against all the patientsmomid in the table.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

jcsimmo · Posted 12-10-2018 03:29 PM

You have

  momids.definedata();

blank.

Would patientid be in this field? (and also in the keep field as well).

Jon

novinosrin · Posted 11-27-2018 09:08 PM

data have;
input visitID	patientID $	diabetes ;
cards;
1	AR1	0
2	JS55	1
3	EX2	0
4	AR1	1
5	BC99	0
6	JS55	0
7	AR1	1
;

proc sql;
create table want as 
select *, max(diabetes) as historyofdiabetes
from have
group by patientid
order by visitid;
quit;

How to add a flag on multiple records if a flag is present on just one of those records

Re: How to add a flag on multiple records if a flag is present on just one of those records

Re: How to add a flag on multiple records if a flag is present on just one of those records

Re: How to add a flag on multiple records if a flag is present on just one of those records

Re: How to add a flag on multiple records if a flag is present on just one of those records

Re: How to add a flag on multiple records if a flag is present on just one of those records

Re: How to add a flag on multiple records if a flag is present on just one of those records

Re: How to add a flag on multiple records if a flag is present on just one of those records

Re: How to add a flag on multiple records if a flag is present on just one of those records

Re: How to add a flag on multiple records if a flag is present on just one of those records

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away