data reshaping

nbora · Posted 12-12-2024 03:27 PM

Hi,

my have data is the following:

data have;
     input col_a col_b $ col_c :date9.;
datalines;
123 123456789ABC . 
123 123456789DEF 15AUG2014
123 123456789GHI 30SEP2014 
456 987654321ABC 20DEC2013
456 987654321DEF 31JAN2014
456 987654321GHI 02FEB2014
;

I would like to reshape this data where the values of col_b are transposed to new columns and the minimum of the date column per col_a is taken as date column. I tried multiple solution suggestions I found here in the communities but they didn't work out.

data want;
     input col_a col_b_1 $ col_b_2 $ col_b_3 $ col_c :date9.;
datalines;
123 123456789ABC 123456789DEF 123456789GHI 15AUG2014
456 987654321ABC 987654321DEF 987654321GHI 20DEC2013
;

Any help would be highly appreciated.

ballardw · Posted 12-12-2024 03:48 PM

First thing is do you really need that new data set? As in, how do you intend to use the result?

Often changing from "long" to "wide" is not a good choice for most analysis or reporting in SAS.

This works for your example data:

Proc transpose data=have out=trans (drop=_name_)
   prefix=Col_b_;
   by col_a;
   var col_b;
run;
proc summary data=have nway;
   by col_a;
   var col_c;
   output out=summary(drop=_:) min=;
run;

data want;
   merge trans summary ;
   by col_a;
run;

However there might be some issues around desired results if any of the Col_b are duplicated within the value of col_a of if the order of the Col_b values in the output needs to be in a specific order different than order of appearance in the Have dataset.

The Proc summary has a drop dataset option because there would be other variables included in the output set

PaigeMiller · Posted 12-12-2024 04:32 PM

Reshaping the data like this only make programming the next step(s) harder. Don't do it.

Tell us what you are going to do with this data after you re-shape it, and we can give you better ideas about how to handle this data.

--
Paige Miller

Tom · Posted 12-12-2024 06:06 PM

PROC SUMMARY can do that directly (as long as the number of values per BY group is small enough).

proc summary data=have ;
  by col_a;
  output out=want(drop=_type_ _freq_)  idgroup(out[3] (col_b)=) min(col_c)=;
run;

If you don't know if 3 is the right number of new COL_B variables then count first and put the number into macro variable.

proc sql noprint;
 select max(nobs) into :nobs trimmed
   from (select count(*) as nobs from have group by col_a)
 ;
quit;

proc summary data=have ;
  by col_a;
  output out=want(drop=_type_ _freq_)  idgroup(out[&nobs] (col_b)=) min(col_c)=;
run;

Read this classic paper: https://support.sas.com/resources/papers/proceedings10/102-2010.pdf

data reshaping

Re: data reshaping

Re: data reshaping

Re: data reshaping

data reshaping

Re: data reshaping

Re: data reshaping

Re: data reshaping

Registration is open