Solved: Re: transpose long data to wide while keeping numeric data

eawhit10 · Posted 12-06-2021 09:18 AM

I have a rather large dataset in long format that I need wide. I tried proc transpose, but all my numeric variables were changed to character. 1) I don't even know how to change them back and don't want to do so individually anyway as there are a lot of variables 2) is there a way to transpose without this happening?

Please see data (in reality there are about 63 variables, character and numeric, like vascular measures, yes/no questions, more physical measures, etc). Any solution I find involving an array seems to require typing out all variables....

event_name	stid	income	educ	sex	race	healthp	ht	wt	sbp	dbp	smoke
baseline	100	1	2	1	1	5	66	140.1	144	98	1
year_2	100	1	2	1	1	4	66	150.1	144	98	1
baseline	200	3	3	2	4	3	62.5	130.2	121	73	2
year_2	200	3	3	2	4	2	62.5	125.8	120	86	2
baseline	300	1	4	1	2	1	64	150.1	129	77	2
year_2	300	1	4	1	2	3	64	145.6	114	74	2

/* trying proc transpose*/

proc transpose data=longsort out=out1;
by stID event_name;
var income -- dbp;
run;
proc transpose data=out1 delimiter=_ out=new2(drop=_name_);
by stid;
var col1;
id _name_ event_name;
run;
/* proc contents shows all are now character variables even the ones supposed to be numeric */

Tom · Posted 12-06-2021 10:40 AM

So to go from semi-wide to really wide you can use a double transpose. If you have mixed numeric/character data then do it in two part and then merge them.

data have;
  input event_name :$32. stid $ income $ educ $ sex $ race $ healthp $ ht wt sbp dbp smoke $;
cards;
baseline 100 1 2 1 1 5 66 140.1 144 98 1
year_2 100 1 2 1 1 4 66 150.1 144 98 1
baseline 200 3 3 2 4 3 62.5 130.2 121 73 2
year_2 200 3 3 2 4 2 62.5 125.8 120 86 2
baseline 300 1 4 1 2 1 64 150.1 129 77 2
year_2 300 1 4 1 2 3 64 145.6 114 74 2
;
proc sort data=have;
  by stid event_name;
run;

proc transpose data=have out=num;
  by stid event_name ; 
  var _numeric_;
run;
proc transpose data=num out=num_wide delimiter=_;
  by stid ;
  where lowcase(_name_) not in ('stid'  'event_name');
  id _name_ event_name;
  var col1;
run;

proc transpose data=have out=char;
  by stid event_name ; 
  var _character_;
run;
proc transpose data=char out=char_wide delimiter=_;
  by stid ;
  where lowcase(_name_) not in ('stid'  'event_name');
  id _name_ event_name;
  var col1;
run;

data want;
  merge num_wide char_wide ;
  by stid ; 
  drop _name_;
run;

View solution in original post

PaigeMiller · Posted 12-06-2021 09:47 AM

Are you showing us the transposed data or the un-transposed data?

I have a rather large dataset in long format that I need wide

Usually converting long to wide is not a good idea, although there are exceptions. Why do you need it to be wide?

--
Paige Miller

eawhit10 · Posted 12-06-2021 11:09 AM

I thought that in order to run some longitudinal tests, I needed it in wide data? A simple example would be like to compare weight from baseline and year 2, I'd need 2 columns, instead of observation at baseline in 1 row and row 2 is the observation at year 2.

eawhit10 · Posted 12-06-2021 11:11 AM

I'm showing un-transposed. I wanted new columns with like ht_baseline, ht_year_2, bp_baseline, bp_year_2, etc. But the transpose code i originally posted changes all numeric vars to character.

data_null__ · Posted 12-06-2021 09:53 AM

var firstvar-character-lastvar;
var firstvar-numeric-lastvar;

One TRANSPOSE for character and another for numeric. Read up on "SAS Variable Lists".

Tom · Posted 12-06-2021 10:29 AM

But you are already starting with a WIDE dataset.

What is it you want to create? Something WIDER? How?

Tom · Posted 12-06-2021 10:40 AM

So to go from semi-wide to really wide you can use a double transpose. If you have mixed numeric/character data then do it in two part and then merge them.

data have;
  input event_name :$32. stid $ income $ educ $ sex $ race $ healthp $ ht wt sbp dbp smoke $;
cards;
baseline 100 1 2 1 1 5 66 140.1 144 98 1
year_2 100 1 2 1 1 4 66 150.1 144 98 1
baseline 200 3 3 2 4 3 62.5 130.2 121 73 2
year_2 200 3 3 2 4 2 62.5 125.8 120 86 2
baseline 300 1 4 1 2 1 64 150.1 129 77 2
year_2 300 1 4 1 2 3 64 145.6 114 74 2
;
proc sort data=have;
  by stid event_name;
run;

proc transpose data=have out=num;
  by stid event_name ; 
  var _numeric_;
run;
proc transpose data=num out=num_wide delimiter=_;
  by stid ;
  where lowcase(_name_) not in ('stid'  'event_name');
  id _name_ event_name;
  var col1;
run;

proc transpose data=have out=char;
  by stid event_name ; 
  var _character_;
run;
proc transpose data=char out=char_wide delimiter=_;
  by stid ;
  where lowcase(_name_) not in ('stid'  'event_name');
  id _name_ event_name;
  var col1;
run;

data want;
  merge num_wide char_wide ;
  by stid ; 
  drop _name_;
run;

eawhit10 · Posted 12-06-2021 11:39 AM

Thank you! This is what I was looking for. I know it seems excessively wide, but I didn't know how else to prepare the data to compare baseline measures vs year 2.

Tom · Posted 12-06-2021 12:08 PM

@eawhit10 wrote:

Thank you! This is what I was looking for. I know it seems excessively wide, but I didn't know how else to prepare the data to compare baseline measures vs year 2.

Change from baseline calculations are easier in the original format.

data want;
  set have;
  by id;
  if visit='Baseline' then wt_baseline=wt;
  if first.id the call missing(wt_baseline);
  retain wt_baseline;
  if not missing(wt_baseline) then wt_change=wt-wt_baseline;
run;

Ksharp · Posted 12-07-2021 06:44 AM

data have;
  input event_name :$32. stid $ income $ educ $ sex $ race $ healthp $ ht wt sbp dbp smoke $;
cards;
baseline 100 1 2 1 1 5 66 140.1 144 98 1
year_2 100 1 2 1 1 4 66 150.1 144 98 1
baseline 200 3 3 2 4 3 62.5 130.2 121 73 2
year_2 200 3 3 2 4 2 62.5 125.8 120 86 2
baseline 300 1 4 1 2 1 64 150.1 129 77 2
year_2 300 1 4 1 2 3 64 145.6 114 74 2
;
proc sort data=have;
  by stid event_name;
run;
proc transpose data=have(obs=0) out=vname;
var _all_;
run;
proc sql;
create table level as
select *
 from (select distinct event_name from have),
 (select * from vname where lowcase(_name_) not in ('stid'  'event_name'))
 order by 1;
quit;
data _null_;
 set level end=last;
 by event_name;
 if _n_=1 then call execute('data want;merge ');
 if first.event_name then call execute(catt('have(where=(event_name="',event_name,'")  rename=('));
 call execute(catt(_name_,'=',_name_,'_',event_name));
 if last.event_name then call execute('))');
 if last then call execute(';by stid;drop event_name;run;');
run;

Registration is open

SAS Training: Just a Click Away