I have a rather large dataset in long format that I need wide. I tried proc transpose, but all my numeric variables were changed to character. 1) I don't even know how to change them back and don't want to do so individually anyway as there are a lot of variables 2) is there a way to transpose without this happening?
Please see data (in reality there are about 63 variables, character and numeric, like vascular measures, yes/no questions, more physical measures, etc). Any solution I find involving an array seems to require typing out all variables....
event_name | stid | income | educ | sex | race | healthp | ht | wt | sbp | dbp | smoke |
baseline | 100 | 1 | 2 | 1 | 1 | 5 | 66 | 140.1 | 144 | 98 | 1 |
year_2 | 100 | 1 | 2 | 1 | 1 | 4 | 66 | 150.1 | 144 | 98 | 1 |
baseline | 200 | 3 | 3 | 2 | 4 | 3 | 62.5 | 130.2 | 121 | 73 | 2 |
year_2 | 200 | 3 | 3 | 2 | 4 | 2 | 62.5 | 125.8 | 120 | 86 | 2 |
baseline | 300 | 1 | 4 | 1 | 2 | 1 | 64 | 150.1 | 129 | 77 | 2 |
year_2 | 300 | 1 | 4 | 1 | 2 | 3 | 64 | 145.6 | 114 | 74 | 2 |
/* trying proc transpose*/
proc transpose data=longsort out=out1;
by stID event_name;
var income -- dbp;
run;
proc transpose data=out1 delimiter=_ out=new2(drop=_name_);
by stid;
var col1;
id _name_ event_name;
run;
/* proc contents shows all are now character variables even the ones supposed to be numeric */
So to go from semi-wide to really wide you can use a double transpose. If you have mixed numeric/character data then do it in two part and then merge them.
data have;
input event_name :$32. stid $ income $ educ $ sex $ race $ healthp $ ht wt sbp dbp smoke $;
cards;
baseline 100 1 2 1 1 5 66 140.1 144 98 1
year_2 100 1 2 1 1 4 66 150.1 144 98 1
baseline 200 3 3 2 4 3 62.5 130.2 121 73 2
year_2 200 3 3 2 4 2 62.5 125.8 120 86 2
baseline 300 1 4 1 2 1 64 150.1 129 77 2
year_2 300 1 4 1 2 3 64 145.6 114 74 2
;
proc sort data=have;
by stid event_name;
run;
proc transpose data=have out=num;
by stid event_name ;
var _numeric_;
run;
proc transpose data=num out=num_wide delimiter=_;
by stid ;
where lowcase(_name_) not in ('stid' 'event_name');
id _name_ event_name;
var col1;
run;
proc transpose data=have out=char;
by stid event_name ;
var _character_;
run;
proc transpose data=char out=char_wide delimiter=_;
by stid ;
where lowcase(_name_) not in ('stid' 'event_name');
id _name_ event_name;
var col1;
run;
data want;
merge num_wide char_wide ;
by stid ;
drop _name_;
run;
Are you showing us the transposed data or the un-transposed data?
I have a rather large dataset in long format that I need wide
Usually converting long to wide is not a good idea, although there are exceptions. Why do you need it to be wide?
I thought that in order to run some longitudinal tests, I needed it in wide data? A simple example would be like to compare weight from baseline and year 2, I'd need 2 columns, instead of observation at baseline in 1 row and row 2 is the observation at year 2.
I'm showing un-transposed. I wanted new columns with like ht_baseline, ht_year_2, bp_baseline, bp_year_2, etc. But the transpose code i originally posted changes all numeric vars to character.
var firstvar-character-lastvar;
var firstvar-numeric-lastvar;
One TRANSPOSE for character and another for numeric.
Read up on "SAS Variable Lists".
But you are already starting with a WIDE dataset.
What is it you want to create? Something WIDER? How?
So to go from semi-wide to really wide you can use a double transpose. If you have mixed numeric/character data then do it in two part and then merge them.
data have;
input event_name :$32. stid $ income $ educ $ sex $ race $ healthp $ ht wt sbp dbp smoke $;
cards;
baseline 100 1 2 1 1 5 66 140.1 144 98 1
year_2 100 1 2 1 1 4 66 150.1 144 98 1
baseline 200 3 3 2 4 3 62.5 130.2 121 73 2
year_2 200 3 3 2 4 2 62.5 125.8 120 86 2
baseline 300 1 4 1 2 1 64 150.1 129 77 2
year_2 300 1 4 1 2 3 64 145.6 114 74 2
;
proc sort data=have;
by stid event_name;
run;
proc transpose data=have out=num;
by stid event_name ;
var _numeric_;
run;
proc transpose data=num out=num_wide delimiter=_;
by stid ;
where lowcase(_name_) not in ('stid' 'event_name');
id _name_ event_name;
var col1;
run;
proc transpose data=have out=char;
by stid event_name ;
var _character_;
run;
proc transpose data=char out=char_wide delimiter=_;
by stid ;
where lowcase(_name_) not in ('stid' 'event_name');
id _name_ event_name;
var col1;
run;
data want;
merge num_wide char_wide ;
by stid ;
drop _name_;
run;
Thank you! This is what I was looking for. I know it seems excessively wide, but I didn't know how else to prepare the data to compare baseline measures vs year 2.
@eawhit10 wrote:
Thank you! This is what I was looking for. I know it seems excessively wide, but I didn't know how else to prepare the data to compare baseline measures vs year 2.
Change from baseline calculations are easier in the original format.
data want;
set have;
by id;
if visit='Baseline' then wt_baseline=wt;
if first.id the call missing(wt_baseline);
retain wt_baseline;
if not missing(wt_baseline) then wt_change=wt-wt_baseline;
run;
data have;
input event_name :$32. stid $ income $ educ $ sex $ race $ healthp $ ht wt sbp dbp smoke $;
cards;
baseline 100 1 2 1 1 5 66 140.1 144 98 1
year_2 100 1 2 1 1 4 66 150.1 144 98 1
baseline 200 3 3 2 4 3 62.5 130.2 121 73 2
year_2 200 3 3 2 4 2 62.5 125.8 120 86 2
baseline 300 1 4 1 2 1 64 150.1 129 77 2
year_2 300 1 4 1 2 3 64 145.6 114 74 2
;
proc sort data=have;
by stid event_name;
run;
proc transpose data=have(obs=0) out=vname;
var _all_;
run;
proc sql;
create table level as
select *
from (select distinct event_name from have),
(select * from vname where lowcase(_name_) not in ('stid' 'event_name'))
order by 1;
quit;
data _null_;
set level end=last;
by event_name;
if _n_=1 then call execute('data want;merge ');
if first.event_name then call execute(catt('have(where=(event_name="',event_name,'") rename=('));
call execute(catt(_name_,'=',_name_,'_',event_name));
if last.event_name then call execute('))');
if last then call execute(';by stid;drop event_name;run;');
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.