Solved: Long to wide format

sandrube · Posted 04-22-2020 12:05 AM

I have patients with multiple rows and variables diabetes (1=yes and 2=no) and hypertension (1=yes and 2=no). For each patient, if it is 1 in any row, it should be considered as 1. Please see the initial data set:

data have;
input pt_id	diabetes hypertension;
datalines;
1 0 0
1 1 0
1 1 0
1 0 0
2 0 0
2 0 0
2 0 0
2 0 0
3 1 1
3 0 1
3 0 1
run;

Final data set should look like this:

data have;
input pt_id	diabetes hypertension;
datalines;
1 1 0
2 0 0
3 1 1
run;

Thank you for your help!

RichardDeVen · Posted 04-24-2020 12:36 PM

The question subject is misstated as "Long to wide". The question is really one of "Max over Group", or more generically "Result over Group"

The result over group for this question "presence within group".

A "DOW" loop is very effective for processing data sorted into BY groups, and outputting one row per group. (The technique can also be used to apply the computed result to each row in the group).

Example:

data want(keep=pt_id diabetes hypertension);
  do until (last.pt_id);
    SET have (rename=(diabetes=_dia hypertension=_htn));
    BY pt_id;

    diabetes = _dia OR diabetes;
    hypertension = _htn OR hypertension;
  end;
run;

The secret to the DOW loop is placing the SET and BY statements inside a DO LOOP. An implicit OUTPUT occurs at the end of the step with the desired computed results. The rename= is necessary because the programmer wants the aggregate result variables (diabetes/hypertension) to be the same name as the individual flag variables.

View solution in original post

unison · Posted 04-22-2020 02:01 AM

Try proc sql -- you desire the max of each flag for each pt_id:

data have;
	input pt_id diabetes hypertension;
	datalines;
1 0 0
1 1 0
1 1 0
1 0 0
2 0 0
2 0 0
2 0 0
2 0 0
3 1 1
3 0 1
3 0 1
;
run;

data desired;
	input pt_id diabetes hypertension;
	datalines;
1 1 0
2 0 0
3 1 1
;
run;

*Solution;
proc sql;
	create table want as 
	select 
		pt_id
		, max(diabetes) as diabetes
		, max(hypertension) as hypertension 
	from have 
	group by pt_id 
	order by pt_id
	;
quit;

*Compare want to desired;
proc compare data=desired comp=want;
run;

-unison

sandrube · Posted 04-24-2020 08:27 AM

Thank you very much!

andreas_lds · Posted 04-22-2020 02:59 AM

Data and description don't match, please clarify: do you have 1=yes and 0=no or 1=yes and 2=no?

The follow step works if 0=no:

data want;
	set have;
	by pt_id;
	
	retain _diabetes _hypertension 0;
	
	if first.pt_id then do;
		_diabetes = diabetes;
		_hypertension = hypertension;
	end;
	else do;
		_diabetes = diabetes or _diabetes;
		_hypertension = hypertension or _hypertension;
	end;
	
	if last.pt_id then do;		
		output;
	end;
	
	drop diabetes hypertension;
	rename 
		_diabetes = diabetes
		_hypertension = hypertension
	;
run;

sandrube · Posted 04-24-2020 08:27 AM

Than you very much!

Jagadishkatam · Posted 04-22-2020 04:27 AM

The simple code below will work

proc sort data=have ;
by pt_id descending diabetes descending hypertension;
run;

data want;
set have;
by pt_id descending diabetes descending hypertension;
if first.pt_id;
run;

Thanks,
Jag

RichardDeVen · Posted 04-24-2020 12:36 PM

The question subject is misstated as "Long to wide". The question is really one of "Max over Group", or more generically "Result over Group"

The result over group for this question "presence within group".

A "DOW" loop is very effective for processing data sorted into BY groups, and outputting one row per group. (The technique can also be used to apply the computed result to each row in the group).

Example:

data want(keep=pt_id diabetes hypertension);
  do until (last.pt_id);
    SET have (rename=(diabetes=_dia hypertension=_htn));
    BY pt_id;

    diabetes = _dia OR diabetes;
    hypertension = _htn OR hypertension;
  end;
run;

The secret to the DOW loop is placing the SET and BY statements inside a DO LOOP. An implicit OUTPUT occurs at the end of the step with the desired computed results. The rename= is necessary because the programmer wants the aggregate result variables (diabetes/hypertension) to be the same name as the individual flag variables.

sandrube · Posted 04-26-2020 01:21 PM

Thank you very much!!!

Long to wide format

Re: Long to wide format

Re: Long to wide format

Re: Long to wide format

Re: Long to wide format

Re: Long to wide format

Re: Long to wide format

Re: Long to wide format

Re: Long to wide format

SAS Innovate 2025: Call for Content