BookmarkSubscribeRSS Feed
Sv_kini
Calcite | Level 5

Hi,

 

I am trying to create a binary variable (eg. diabetes) satisfying the condition - at least 2 outpatient visits >= 7 days apart then binary variable is 1, else 0. I made it work for 1 variable here - o_diab (code below). However I need it to loop through 27 variables and create 27 flag variables. Basically the o_diab needs to be looped to include all 27 variables and (1/0) variables such as diabetes needs to be created. Can someone please help? Thanks!

 

(27 variables are = o_angina o_anemia o_alc o_cad o_coag_def o_chf ...............................................)

 

proc sql;
create table want as
select distinct id, index_date,
min(case when inpatient=0 and o_diab=1 then clm_from else . end) as firstDT_o_diab format date9.,
max(case when inpatient=0 and o_diab=1 then clm_from else . end) as lastDT_o_diab format date9.,
(case when calculated lastDT_o_diab - calculated firstDT_o_diab>=7 then 1 else 0 end) as diabetes
from raw;
quit;

13 REPLIES 13
PaigeMiller
Diamond | Level 26

You could do this relatively easily in a DATA step using ARRAYs.

 

If you have to do this in PROC SQL, you would need a macro to perform the looping.

--
Paige Miller
Sv_kini
Calcite | Level 5

Could you please share how I can do it as an array?

PaigeMiller
Diamond | Level 26
data want;
    set raw;
    array o o_angina o_anemia ... ;
    array bin o_angina_bin o_anemia_bin ...;
    array firstdt_o firstdt_o_angina firstdt_o_anemia ...;
    array lastdt_o lastdt_o_angina lastdt_o_anemia ...;
    do i=1 to dim(o);
        if inpatient=0 and o[i]=1 then firstdt_o[i]=clm_from; 
        else firstdt_o[i]=.;
        if inpatient=0 and o[i]=1 then clm_from;
        else lastdt_o[i]=.;
        when lastdt_o[i]-firstdt_o[i]>=7 then bin=1; else bin=0;
    end;
    drop i;
run;

This is UNTESTED CODE

 

If you don't need to have separate firstdt_o and lastdt_o for each of the 27 conditions, these don't have to be array variables.

--
Paige Miller
Sv_kini
Calcite | Level 5

Unfortunately,that did not work ....maybe because there are multiple rows per id and I need to take min and max vertically? The when statement in proc SQL helped with that...attaching an example table of raw and want if it helps. Maybe a macro example around the proc sql that did work (first post) is easier? Thank you!!

Sv_kini
Calcite | Level 5
Raw         
idclm_fromo_diabo_anginao_anemia     
1jan 1 2019011     
1jan 5 2019111     
1jan 10 2019000     
1jan 15 2019100     
1Jan 20 2019010     
          
Want         
idfirstdt_o_diablastdt_o_diabdiabetesfirstdt_o_anginalastdt_o_diabanginafirstdt_o_anemialastdt_o_anemiaanemia
1jan 5 2019jan 15 20191jan 1 2019Jan 20 20191jan 1 2019jan 5 20190
PaigeMiller
Diamond | Level 26

I do not download Excel files as they are security risks. THe proper way to provide data is this: 

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

You can just provide the data for say 10 patients and 3 conditions, that ought to be enough.

--
Paige Miller
Sv_kini
Calcite | Level 5

Sorry, here you go..pasted below. I think I created all possible scenarios using 2 fictitious ID's and 3 conditions. The Raw data is what it looks like right now and the want data is what I would like it to look like. Thanks!!

 

Raw         
idclm_fromo_diabo_anginao_anemia     
1jan 1 2019011     
1jan 5 2019111     
1jan 10 2019000     
1jan 15 2019100     
1Jan 20 2019010     
2mar 5 2019101     
2mar 10 2019001     
2apr 5 2019000     
          
Want         
idfirstdt_o_diablastdt_o_diabdiabetesfirstdt_o_anginalastdt_o_anginaanginafirstdt_o_anemialastdt_o_anemiaanemia
1jan 5 2019jan 15 20191jan 1 2019Jan 20 20191jan 1 2019jan 5 20190
2mar 5 2019.0..0mar 5 2019mar 10 20190
PaigeMiller
Diamond | Level 26

I would like the data provided as SAS data step code, according to the instructions that I linked to above.

--
Paige Miller
Sv_kini
Calcite | Level 5

data raw;
input ID clm_from o_diab o_angina o_anemia;
informat clm_from ddmmyy10.;
format clm_from ddmmyy10.;
datalines;
1 01/01/2019 0 1 1
1 05/01/2019 1 1 1
1 10/01/2019 0 0 0
1 15/01/2019 1 0 0
1 20/01/2019 0 1 0
2 05/03/2019 1 0 1
2 10/03/2019 0 0 1
2 05/04/2019 0 0 0
;
RUN;

PaigeMiller
Diamond | Level 26

This doesn't match the original problem statement. There is no variable named INPATIENT.

--
Paige Miller
Sv_kini
Calcite | Level 5

Oh yes, added it in. 

 

data raw;
input ID clm_from inpatient o_diab o_angina o_anemia;
informat clm_from ddmmyy10.;
format clm_from ddmmyy10.;
datalines;
1 01/01/2019 0 0 1 1
1 05/01/2019 0 1 1 1
1 10/01/2019 0 0 0 0
1 15/01/2019 0 1 0 0
1 20/01/2019 0 0 1 0
2 05/03/2019 0 1 0 1
2 10/03/2019 0 0 0 1
2 05/04/2019 0 0 0 0
;
RUN;

PaigeMiller
Diamond | Level 26

Ok, I will use PROC SUMMARY to avoid some looping here, there are also two output data sets WANT and WANT2, as I'm not sure which one you really want.

 

proc summary nway data=raw(where=(inpatient=0));
    class id o_diab o_angina o_anemia;
    types id*(o_diab o_angina o_anemia);
    var clm_from;
    output out=minmax min=min max=max;
run;
data want;
    set minmax;
    array o o_diab o_angina o_anemia;
    array r r_diab r_angina r_anemia;
    do i=1 to dim(o);
        if o(i)=1 and max-min>=7 then do;
            r(i)=1;
            output;
        end;
        else if o(i)=1 then do;
           r(i)=0;
           output;
        end;
    end;
    drop i;
run;
proc summary data=want nway;
    class id;
    var r_diab r_angina r_anemia;
    output out=want2 max=;
run;

Now that may seem still like a lot of typing, you can use a macro variable as the first line of the program to avoid some repetitive typing.

 

%let o_names=o_diab o_angina o_anemia;

 

and then replace the all the variable names with &o_names such as in line 2 above

 

class id &o_names; 

 

--
Paige Miller
ed_sas_member
Meteorite | Level 14

Hi @Sv_kini 

 

I have created the below code. Please test it to see whether it corresponds to your need and let me know.

NB: I don't understand why you want to retrieve first and last date for a disease when the "flag" indicating that there were at least 2 outpatient visits >= 7 days apart and each one satisfying the condition is equal to 0.

So the below output doesn't show that.

 

Best regards,

 

data raw;
	input ID clm_from inpatient o_diab o_angina o_anemia;
	informat clm_from ddmmyy10.;
	format clm_from ddmmyy10.;
	datalines;
1 01/01/2019 0 0 1 1
1 05/01/2019 0 1 1 1
1 10/01/2019 0 0 0 0
1 15/01/2019 0 1 0 0
1 20/01/2019 0 0 1 0
2 05/03/2019 0 1 0 1
2 10/03/2019 0 0 0 1
2 05/04/2019 0 0 0 0
	;
run;

/* Retrieve the max number of observation for an ID in macrovariable &count_obs */
proc sql noprint;
	select max(count_obs) into: count_obs from (select count(*) as count_obs from raw group by id);
quit;

/* Retrieve the list of diseases = diab, angina, ... in the dataset. raw_col */
proc contents data=raw out=raw_col (keep = name where=(lowcase(name) like 'o_%')) noprint;
run;

proc sort data=raw out=list_id (keep=id) nodupkey;
	by ID;
run;

/* Put the list of disease in macrovariable &list_dis */
proc sql noprint;
	select dis into:list_dis separated by " " from list_disease;
quit;

/* Macro to create for a specific disease the table containing the flag, the first and the last date */

%macro loop (disease);

	/* Identify cases where flag = 1 (<=> >7 days between 2 visits) + the last date */
	data dst_&disease._last (keep= id &disease lastdt_o_&disease);
		set raw;
		by ID;

		array _lagID (&count_obs);
		array _lago_&disease (&count_obs);
		array _lagCLM_&disease. (&count_obs);
	
		%do i=1 %to &count_obs;
			_lagID(&i) = lag&i.(ID);
			_lago_&disease.(&i) = lag&i.(o_&disease);
			_lagCLM_&disease.(&i.) = clm_from - lag&i.(clm_from);
		%end;
	
		%do i=1 %to &count_obs;
			if _lagID(&i) ne ID or (o_&disease. <1 or _lago_&disease.(&i) <1) then _lagCLM_&disease.(&i) = 0;
		%end;
	
		if max(of _lagCLM_&disease.(*)) >=7 then do;
			&disease = 1;
			output;
		end;
		rename clm_from = lastdt_o_&disease;
	run;

	/* Identify the first date */
	data dst_&disease._first (keep=id clm_from rename=(clm_from = firstdt_o_&disease));
		set raw;
		by ID;
		where o_&disease = 1;
		if first.ID then output;
	run;

	/* Merge */
	proc sql;
		create table dataset_&disease as
		select a.ID, b.firstdt_o_&disease, a.lastdt_o_&disease, a.&disease
		from dst_&disease._last as a left join dst_&disease._first as b
		on a.id = b.id;
	quit;

%mend;

/* Execute the macro %loop for each disease specified in the raw_col dataset */
data list_disease;
	set raw_col;
	dis = compress(name,"o_");
	rc=dosubl(cats('%loop(',dis,')'));
run;

/* Merge all datasets by ID + put the disease flag to 0 when not equal to 1 */
data want;
	merge list_id dataset:;
	by ID;
	array _dis (*) &list_dis;
	do i=1 to dim(_dis);
		if _dis(i) = . then _dis(i) = 0;
	end;
	drop i;
run;





proc print data=want;
	id id;
run;

Dataset WANT: outputDataset WANT: output

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1975 views
  • 1 like
  • 3 in conversation