Hi there,
I have a large healthcare dataset. Each row is a patient encounter. Each patient has a unique ID (patientID). Each encounter has a unique encounter number (visitID). In each row, the patient has a list of diagnoses flags, eg diabetes can equal 0 or 1 depending on if that diagnosis is present ON THAT VISIT.
In the dataset, a patient may have a diagnosis of diabetes on one visit but not on the rest. How can i create a flag 'historyofdiabetes' if the variable 'diabetes' is seen in any visit.
eg:
visitID | patientID | diabetes |
1 | AR1 | 0 |
2 | JS55 | 1 |
3 | EX2 | 0 |
4 | AR1 | 1 |
5 | BC99 | 0 |
6 | JS55 | 0 |
7 | AR1 | 1 |
In the example, Patient AR1 has three visits and JS55 has two visits. In at least one visit, a diagnosis of diabetes is seen for each patient (Visit #4/#7 for AR1 and Visit#2 for JS55) so we know both of these patients have a history of diabetes.
I would like a new column 'historyofdiabetes' where all the rows with JS55 and AR1 would have a 1.
visitID | patientID | diabetes | historyofdiabetes |
1 | AR1 | 0 | 1 |
2 | JS55 | 1 | 1 |
3 | EX2 | 0 | 0 |
4 | AR1 | 1 | 1 |
5 | BC99 | 0 | 0 |
6 | JS55 | 0 | 1 |
7 | AR1 | 1 | 1 |
The only thing is this database has a ton of patients. I have been creating a list of patientID that have diabetes then having the program add a flag historyofdiabetes=1 if patientID = x or y or z but SAS has a lot of difficulty because of the number of patients in each list.
My code step 1:
proc freq data=kid.dataset(where=(diabetes=1)) ORDER=FREQ;
table patientID;
run;
My code step 2:
data kid.dataset (compress=yes);
set kid.dataset;
if patientID in('JS55',
'AR1', etc) then historyofdiabetes=1; else historyofdiabetes=0;
run;
Is there a way to temporarily store a list of patientIDs who have diabetes and then have the code read that list and add the historyofdiabetes flag all in one, non-memory intensive, step
This problem is a natural application for the hash object, which is a memory-resident "table" that allows quick access to dataitems (table "rows") based on a key - patientid in your case. The tactic below is to create such an object containing ONLY patientid's with diabetes=1 - hash object named DIAB in this code. Then, if the patientid is in DIAB, history_of_diabetes should be set to 1:
data want;
set kid.dataset;
if _n_=1 then do;
declare hash diab(dataset:'kid.dataset (keep=patientid diabetes where=(diabetes=1))');
diab.definekey('patientid');
diab.definedata('patientid');
diab.definedone();
end;
history_of_diabetes=(diab.find()=0);
run;
This program effectively reads the kid.dataset twice, once to populate the hash table diab, and the second to read all the observations and see if they are in the diab hash table. Note the the diab.find() method returns a zero when successful. That's why the assignment statement for history_of_diabetes might at first look counter-intuitive, because it returns a 1 or 0 depending on whether diab.find() is successful.
the best method is:one data step double set can finish your task。
but i writed the another stupid method:
data temp;
set a;
n=_n_;
run;
proc sort data=temp;
by patientID descending diabetes;
run;
data temp;
set temp;
by patientID descending diabetes;
retain historyofdiabetes;
if first.patientID then historyofdiabetes=diabetes;
run;
proc sort data=temp out=result(drop=n);
by n;
run;
Why not just sort?
If you sort so that 1 is first (descending) the you can set all other records to the flag if you want.
proc sort data=have;
by id descending flag;
run;
data want;
set have;
by id;
retain Flag;
if first.ID then newFlag=flag;
run;
@jcsimmo wrote:
Hi there,
I have a large healthcare dataset. Each row is a patient encounter. Each patient has a unique ID (patientID). Each encounter has a unique encounter number (visitID). In each row, the patient has a list of diagnoses flags, eg diabetes can equal 0 or 1 depending on if that diagnosis is present ON THAT VISIT.
In the dataset, a patient may have a diagnosis of diabetes on one visit but not on the rest. How can i create a flag 'historyofdiabetes' if the variable 'diabetes' is seen in any visit.
eg:
visitID
patientID
diabetes
1
AR1
0
2
JS55
1
3
EX2
0
4
AR1
1
5
BC99
0
6
JS55
0
7
AR1
1
In the example, Patient AR1 has three visits and JS55 has two visits. In at least one visit, a diagnosis of diabetes is seen for each patient (Visit #4/#7 for AR1 and Visit#2 for JS55) so we know both of these patients have a history of diabetes.
I would like a new column 'historyofdiabetes' where all the rows with JS55 and AR1 would have a 1.
visitID
patientID
diabetes
historyofdiabetes
1
AR1
0
1
2
JS55
1
1
3
EX2
0
0
4
AR1
1
1
5
BC99
0
0
6
JS55
0
1
7
AR1
1
1
The only thing is this database has a ton of patients. I have been creating a list of patientID that have diabetes then having the program add a flag historyofdiabetes=1 if patientID = x or y or z but SAS has a lot of difficulty because of the number of patients in each list.
My code step 1:
proc freq data=kid.dataset(where=(diabetes=1)) ORDER=FREQ;
table patientID;
run;
My code step 2:
data kid.dataset (compress=yes);
set kid.dataset;
if patientID in('JS55',
'AR1', etc) then historyofdiabetes=1; else historyofdiabetes=0;
run;
Is there a way to temporarily store a list of patientIDs who have diabetes and then have the code read that list and add the historyofdiabetes flag all in one, non-memory intensive, step
This problem is a natural application for the hash object, which is a memory-resident "table" that allows quick access to dataitems (table "rows") based on a key - patientid in your case. The tactic below is to create such an object containing ONLY patientid's with diabetes=1 - hash object named DIAB in this code. Then, if the patientid is in DIAB, history_of_diabetes should be set to 1:
data want;
set kid.dataset;
if _n_=1 then do;
declare hash diab(dataset:'kid.dataset (keep=patientid diabetes where=(diabetes=1))');
diab.definekey('patientid');
diab.definedata('patientid');
diab.definedone();
end;
history_of_diabetes=(diab.find()=0);
run;
This program effectively reads the kid.dataset twice, once to populate the hash table diab, and the second to read all the observations and see if they are in the diab hash table. Note the the diab.find() method returns a zero when successful. That's why the assignment statement for history_of_diabetes might at first look counter-intuitive, because it returns a 1 or 0 depending on whether diab.find() is successful.
This actually worked perfectly. Thank you
Can I ask one other question since I am having a difficult time understanding how the hash tables work.
say there is another variable in the dataset that is patientsmomid for the patient's mother's patientID, can I use the hash table to find all the patientsmomid that have this new diabetes flag and then find the patientIDs of the those mothers and flag them with a momofpatientwithdiabetes flag.
My guess would be that
data want;
set kid.dataset;
if _n_=1 then do;
declare hash diab(dataset:'kid.dataset (keep=patientid patientsmomid history_of_diabetes where=(history_of_diabetes=1))');
diab.definekey('patientid');
diab.definedata('patientsmomid');
diab.definedone();
end;
momofpatientwithdiabetes =(diab.find()=0);
run;
I presume both moms and patients are in the dataset you refer to. So, for instance, if there is a record of a diabetes patient with ID=333, and MOMID=111, then if there is another records with ID=111, you want to set the flag of momofpatientwithdiab=1.
for each patient with diabetes, you want to
1. get their mother's id, and put it in a lookup table
2. find that id as patient id in the dataset and add a flag variable momofpatientwithdiab=1
data want;
set kid.dataset;
declare hash momids(dataset:'kid.dataset (keep=patientsmomid history_of_diabetes
where=(history_of_diabetes=1))');
momids.definekey('patientsmomid');
momids.definedata();
momids.definedone();
end;
momofpatientwithdiabetes=(momids.check(key:patientid)=0);
run;
Unlike the FIND method, the CHECK method does not transfer data from the hash object to the data set program data vector. It just tells the users whether a key with that value is in the hash object. As a result, while the hash object is keyed on the select patientsmomid, the program running the check compares the incoming record patientid against all the patientsmomid in the table.
You have
momids.definedata();
blank.
Would patientid be in this field? (and also in the keep field as well).
Jon
data have;
input visitID patientID $ diabetes ;
cards;
1 AR1 0
2 JS55 1
3 EX2 0
4 AR1 1
5 BC99 0
6 JS55 0
7 AR1 1
;
proc sql;
create table want as
select *, max(diabetes) as historyofdiabetes
from have
group by patientid
order by visitid;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.