Hi,
I am trying to create a binary variable (eg. diabetes) satisfying the condition - at least 2 outpatient visits >= 7 days apart then binary variable is 1, else 0. I made it work for 1 variable here - o_diab (code below). However I need it to loop through 27 variables and create 27 flag variables. Basically the o_diab needs to be looped to include all 27 variables and (1/0) variables such as diabetes needs to be created. Can someone please help? Thanks!
(27 variables are = o_angina o_anemia o_alc o_cad o_coag_def o_chf ...............................................)
proc sql;
create table want as
select distinct id, index_date,
min(case when inpatient=0 and o_diab=1 then clm_from else . end) as firstDT_o_diab format date9.,
max(case when inpatient=0 and o_diab=1 then clm_from else . end) as lastDT_o_diab format date9.,
(case when calculated lastDT_o_diab - calculated firstDT_o_diab>=7 then 1 else 0 end) as diabetes
from raw;
quit;
You could do this relatively easily in a DATA step using ARRAYs.
If you have to do this in PROC SQL, you would need a macro to perform the looping.
Could you please share how I can do it as an array?
data want;
set raw;
array o o_angina o_anemia ... ;
array bin o_angina_bin o_anemia_bin ...;
array firstdt_o firstdt_o_angina firstdt_o_anemia ...;
array lastdt_o lastdt_o_angina lastdt_o_anemia ...;
do i=1 to dim(o);
if inpatient=0 and o[i]=1 then firstdt_o[i]=clm_from;
else firstdt_o[i]=.;
if inpatient=0 and o[i]=1 then clm_from;
else lastdt_o[i]=.;
when lastdt_o[i]-firstdt_o[i]>=7 then bin=1; else bin=0;
end;
drop i;
run;
This is UNTESTED CODE
If you don't need to have separate firstdt_o and lastdt_o for each of the 27 conditions, these don't have to be array variables.
Unfortunately,that did not work ....maybe because there are multiple rows per id and I need to take min and max vertically? The when statement in proc SQL helped with that...attaching an example table of raw and want if it helps. Maybe a macro example around the proc sql that did work (first post) is easier? Thank you!!
Raw | |||||||||
id | clm_from | o_diab | o_angina | o_anemia | |||||
1 | jan 1 2019 | 0 | 1 | 1 | |||||
1 | jan 5 2019 | 1 | 1 | 1 | |||||
1 | jan 10 2019 | 0 | 0 | 0 | |||||
1 | jan 15 2019 | 1 | 0 | 0 | |||||
1 | Jan 20 2019 | 0 | 1 | 0 | |||||
Want | |||||||||
id | firstdt_o_diab | lastdt_o_diab | diabetes | firstdt_o_angina | lastdt_o_diab | angina | firstdt_o_anemia | lastdt_o_anemia | anemia |
1 | jan 5 2019 | jan 15 2019 | 1 | jan 1 2019 | Jan 20 2019 | 1 | jan 1 2019 | jan 5 2019 | 0 |
I do not download Excel files as they are security risks. THe proper way to provide data is this:
You can just provide the data for say 10 patients and 3 conditions, that ought to be enough.
Sorry, here you go..pasted below. I think I created all possible scenarios using 2 fictitious ID's and 3 conditions. The Raw data is what it looks like right now and the want data is what I would like it to look like. Thanks!!
Raw | |||||||||
id | clm_from | o_diab | o_angina | o_anemia | |||||
1 | jan 1 2019 | 0 | 1 | 1 | |||||
1 | jan 5 2019 | 1 | 1 | 1 | |||||
1 | jan 10 2019 | 0 | 0 | 0 | |||||
1 | jan 15 2019 | 1 | 0 | 0 | |||||
1 | Jan 20 2019 | 0 | 1 | 0 | |||||
2 | mar 5 2019 | 1 | 0 | 1 | |||||
2 | mar 10 2019 | 0 | 0 | 1 | |||||
2 | apr 5 2019 | 0 | 0 | 0 | |||||
Want | |||||||||
id | firstdt_o_diab | lastdt_o_diab | diabetes | firstdt_o_angina | lastdt_o_angina | angina | firstdt_o_anemia | lastdt_o_anemia | anemia |
1 | jan 5 2019 | jan 15 2019 | 1 | jan 1 2019 | Jan 20 2019 | 1 | jan 1 2019 | jan 5 2019 | 0 |
2 | mar 5 2019 | . | 0 | . | . | 0 | mar 5 2019 | mar 10 2019 | 0 |
I would like the data provided as SAS data step code, according to the instructions that I linked to above.
data raw;
input ID clm_from o_diab o_angina o_anemia;
informat clm_from ddmmyy10.;
format clm_from ddmmyy10.;
datalines;
1 01/01/2019 0 1 1
1 05/01/2019 1 1 1
1 10/01/2019 0 0 0
1 15/01/2019 1 0 0
1 20/01/2019 0 1 0
2 05/03/2019 1 0 1
2 10/03/2019 0 0 1
2 05/04/2019 0 0 0
;
RUN;
This doesn't match the original problem statement. There is no variable named INPATIENT.
Oh yes, added it in.
data raw;
input ID clm_from inpatient o_diab o_angina o_anemia;
informat clm_from ddmmyy10.;
format clm_from ddmmyy10.;
datalines;
1 01/01/2019 0 0 1 1
1 05/01/2019 0 1 1 1
1 10/01/2019 0 0 0 0
1 15/01/2019 0 1 0 0
1 20/01/2019 0 0 1 0
2 05/03/2019 0 1 0 1
2 10/03/2019 0 0 0 1
2 05/04/2019 0 0 0 0
;
RUN;
Ok, I will use PROC SUMMARY to avoid some looping here, there are also two output data sets WANT and WANT2, as I'm not sure which one you really want.
proc summary nway data=raw(where=(inpatient=0));
class id o_diab o_angina o_anemia;
types id*(o_diab o_angina o_anemia);
var clm_from;
output out=minmax min=min max=max;
run;
data want;
set minmax;
array o o_diab o_angina o_anemia;
array r r_diab r_angina r_anemia;
do i=1 to dim(o);
if o(i)=1 and max-min>=7 then do;
r(i)=1;
output;
end;
else if o(i)=1 then do;
r(i)=0;
output;
end;
end;
drop i;
run;
proc summary data=want nway;
class id;
var r_diab r_angina r_anemia;
output out=want2 max=;
run;
Now that may seem still like a lot of typing, you can use a macro variable as the first line of the program to avoid some repetitive typing.
%let o_names=o_diab o_angina o_anemia;
and then replace the all the variable names with &o_names such as in line 2 above
class id &o_names;
Hi @Sv_kini
I have created the below code. Please test it to see whether it corresponds to your need and let me know.
NB: I don't understand why you want to retrieve first and last date for a disease when the "flag" indicating that there were at least 2 outpatient visits >= 7 days apart and each one satisfying the condition is equal to 0.
So the below output doesn't show that.
Best regards,
data raw;
input ID clm_from inpatient o_diab o_angina o_anemia;
informat clm_from ddmmyy10.;
format clm_from ddmmyy10.;
datalines;
1 01/01/2019 0 0 1 1
1 05/01/2019 0 1 1 1
1 10/01/2019 0 0 0 0
1 15/01/2019 0 1 0 0
1 20/01/2019 0 0 1 0
2 05/03/2019 0 1 0 1
2 10/03/2019 0 0 0 1
2 05/04/2019 0 0 0 0
;
run;
/* Retrieve the max number of observation for an ID in macrovariable &count_obs */
proc sql noprint;
select max(count_obs) into: count_obs from (select count(*) as count_obs from raw group by id);
quit;
/* Retrieve the list of diseases = diab, angina, ... in the dataset. raw_col */
proc contents data=raw out=raw_col (keep = name where=(lowcase(name) like 'o_%')) noprint;
run;
proc sort data=raw out=list_id (keep=id) nodupkey;
by ID;
run;
/* Put the list of disease in macrovariable &list_dis */
proc sql noprint;
select dis into:list_dis separated by " " from list_disease;
quit;
/* Macro to create for a specific disease the table containing the flag, the first and the last date */
%macro loop (disease);
/* Identify cases where flag = 1 (<=> >7 days between 2 visits) + the last date */
data dst_&disease._last (keep= id &disease lastdt_o_&disease);
set raw;
by ID;
array _lagID (&count_obs);
array _lago_&disease (&count_obs);
array _lagCLM_&disease. (&count_obs);
%do i=1 %to &count_obs;
_lagID(&i) = lag&i.(ID);
_lago_&disease.(&i) = lag&i.(o_&disease);
_lagCLM_&disease.(&i.) = clm_from - lag&i.(clm_from);
%end;
%do i=1 %to &count_obs;
if _lagID(&i) ne ID or (o_&disease. <1 or _lago_&disease.(&i) <1) then _lagCLM_&disease.(&i) = 0;
%end;
if max(of _lagCLM_&disease.(*)) >=7 then do;
&disease = 1;
output;
end;
rename clm_from = lastdt_o_&disease;
run;
/* Identify the first date */
data dst_&disease._first (keep=id clm_from rename=(clm_from = firstdt_o_&disease));
set raw;
by ID;
where o_&disease = 1;
if first.ID then output;
run;
/* Merge */
proc sql;
create table dataset_&disease as
select a.ID, b.firstdt_o_&disease, a.lastdt_o_&disease, a.&disease
from dst_&disease._last as a left join dst_&disease._first as b
on a.id = b.id;
quit;
%mend;
/* Execute the macro %loop for each disease specified in the raw_col dataset */
data list_disease;
set raw_col;
dis = compress(name,"o_");
rc=dosubl(cats('%loop(',dis,')'));
run;
/* Merge all datasets by ID + put the disease flag to 0 when not equal to 1 */
data want;
merge list_id dataset:;
by ID;
array _dis (*) &list_dis;
do i=1 to dim(_dis);
if _dis(i) = . then _dis(i) = 0;
end;
drop i;
run;
proc print data=want;
id id;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.