BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sandrube
Fluorite | Level 6

I have patients with multiple rows and variables diabetes (1=yes and 2=no) and hypertension (1=yes and 2=no). For each patient, if it is 1 in any row, it should be considered as 1. Please see the initial data set:

data have;
input pt_id	diabetes hypertension;
datalines;
1 0 0
1 1 0
1 1 0
1 0 0
2 0 0
2 0 0
2 0 0
2 0 0
3 1 1
3 0 1
3 0 1
run;

Final data set should look like this:

 

data have;
input pt_id	diabetes hypertension;
datalines;
1 1 0
2 0 0
3 1 1
run;

Thank you for your help!

 

1 ACCEPTED SOLUTION

Accepted Solutions
RichardDeVen
Barite | Level 11

The question subject is misstated as "Long to wide".  The question is really one of "Max over Group", or more generically "Result over Group"

 

The result over group for this question "presence within group".

A "DOW" loop is very effective for processing data sorted into BY groups, and outputting one row per group. (The technique can also be used to apply the computed result to each row in the group).

Example:

 

data want(keep=pt_id diabetes hypertension);
  do until (last.pt_id);
    SET have (rename=(diabetes=_dia hypertension=_htn));
    BY pt_id;

    diabetes = _dia OR diabetes;
    hypertension = _htn OR hypertension;
  end;
run;

The secret to the DOW loop is placing the SET and BY statements inside a DO LOOP.  An implicit OUTPUT occurs at the end of the step with the desired computed results.  The rename= is necessary because the programmer wants the aggregate result variables (diabetes/hypertension) to be the same name as the individual flag variables.

 

View solution in original post

7 REPLIES 7
unison
Lapis Lazuli | Level 10

Try proc sql -- you desire the max of each flag for each pt_id:

data have;
	input pt_id diabetes hypertension;
	datalines;
1 0 0
1 1 0
1 1 0
1 0 0
2 0 0
2 0 0
2 0 0
2 0 0
3 1 1
3 0 1
3 0 1
;
run;

data desired;
	input pt_id diabetes hypertension;
	datalines;
1 1 0
2 0 0
3 1 1
;
run;

*Solution;
proc sql;
	create table want as 
	select 
		pt_id
		, max(diabetes) as diabetes
		, max(hypertension) as hypertension 
	from have 
	group by pt_id 
	order by pt_id
	;
quit;

*Compare want to desired;
proc compare data=desired comp=want;
run;
-unison
sandrube
Fluorite | Level 6
Thank you very much!
andreas_lds
Jade | Level 19

Data and description don't match, please clarify: do you have 1=yes and 0=no or 1=yes and 2=no?

The follow step works if  0=no:

 

data want;
	set have;
	by pt_id;
	
	retain _diabetes _hypertension 0;
	
	if first.pt_id then do;
		_diabetes = diabetes;
		_hypertension = hypertension;
	end;
	else do;
		_diabetes = diabetes or _diabetes;
		_hypertension = hypertension or _hypertension;
	end;
	
	if last.pt_id then do;		
		output;
	end;
	
	drop diabetes hypertension;
	rename 
		_diabetes = diabetes
		_hypertension = hypertension
	;
run;
sandrube
Fluorite | Level 6
Than you very much!
Jagadishkatam
Amethyst | Level 16

The simple code below will work

 

proc sort data=have ;
by pt_id descending diabetes descending hypertension;
run;

data want;
set have;
by pt_id descending diabetes descending hypertension;
if first.pt_id;
run;
Thanks,
Jag
RichardDeVen
Barite | Level 11

The question subject is misstated as "Long to wide".  The question is really one of "Max over Group", or more generically "Result over Group"

 

The result over group for this question "presence within group".

A "DOW" loop is very effective for processing data sorted into BY groups, and outputting one row per group. (The technique can also be used to apply the computed result to each row in the group).

Example:

 

data want(keep=pt_id diabetes hypertension);
  do until (last.pt_id);
    SET have (rename=(diabetes=_dia hypertension=_htn));
    BY pt_id;

    diabetes = _dia OR diabetes;
    hypertension = _htn OR hypertension;
  end;
run;

The secret to the DOW loop is placing the SET and BY statements inside a DO LOOP.  An implicit OUTPUT occurs at the end of the step with the desired computed results.  The rename= is necessary because the programmer wants the aggregate result variables (diabetes/hypertension) to be the same name as the individual flag variables.

 

sandrube
Fluorite | Level 6
Thank you very much!!!

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1190 views
  • 2 likes
  • 5 in conversation