- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
I am trying to use Proc Transpose to restructure my dataset. I've been given an Excel file that contains dates, blood pressures and notes (fictitious data). The data read the long way (across) but I need to get the file to read down. The same date appears three times to because each blood pressure is recorded as systolic, diastolic and notes.
Name | 22-Apr | 22-Apr | 22-Apr | 22-May | 22-May | May-22 |
SBP | DBP | TI | SBP | DBP | TI | |
AAA | 120 | 60 | ||||
BBB | 130 | 70 | 140 | 80 | ACT | |
CCC | 165 | 90 | start ACE | |||
DDD | 115 | 70 | ||||
EEE | 135 | 75 | BCT |
This is how I'd like the dataset to look.
Name | Date | SBP |
Name | Date | DBP |
Name | Date | TI |
Name | Date | SBP |
Name | Date | DBP |
Name | Date | TI |
I'm a bit stuck because of the data headers being in two rows. I've been looking at numerous examples but can't figure out how to do this. And help would be appreciated. Thank you.
- Tags:
- excel
- proc transpose
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data want;
set have;
array _dates(*) startVar--endVar; /*start of dates and end of dates*/
length measure $15.;
do i=1 to dim(_dates) by 3;
date = vname(_dates(i));
measure = "Systolic";
value = _dates(i); output;
measure = "Diastolic";
value = _dates(i+1); output;
measure = "Notes"; values=_dates(I+2); output;
end;
run;
I would probably actually recommend this structure myself so you're storing numeric and character data in different columns and can do calculations within a measurement.
data want;
set have;
array _dates(*) startVar -- endVar;
do i=1 to dim(_dates) by 3;
Date = vname(_dates(I));
Systolic = _dates(i);
Diastolic = _dates(i+1);
Notes = _dates(i+2);
output;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Rename the variables in the file. Then transpose it.
If you can trust that it's always SBP, DBP, TI in that order you can simplify this a bit but I'm not a trusting person when it comes to badly formatted data.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for you response. Assuming the SBP, DBP and TI are in the same order, how can I simplify this?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data want;
set have;
array _dates(*) startVar--endVar; /*start of dates and end of dates*/
length measure $15.;
do i=1 to dim(_dates) by 3;
date = vname(_dates(i));
measure = "Systolic";
value = _dates(i); output;
measure = "Diastolic";
value = _dates(i+1); output;
measure = "Notes"; values=_dates(I+2); output;
end;
run;
I would probably actually recommend this structure myself so you're storing numeric and character data in different columns and can do calculations within a measurement.
data want;
set have;
array _dates(*) startVar -- endVar;
do i=1 to dim(_dates) by 3;
Date = vname(_dates(I));
Systolic = _dates(i);
Diastolic = _dates(i+1);
Notes = _dates(i+2);
output;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Reeza. I used your second suggestion and it worked nicely. Appreciate the help.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
An actual XLSX file? Or a text file like a CSV file? If the later it is much easier as you can then just read it in the way you want to start with.
With an XLSX file read the file twice.
Once to get the first row with the dates. If you know how many columns (or can at least put an upper bound on how many you will accept) you can use the RANGE option to limit it to reading just the first row. For example to get the first 26 columns use
proc import file="&fname" dbms=xlsx out=dates_wide replace;
range='$A1:Z1' ;
getnames=NO;
run;
To read the first 702 columns use $A1:ZZ1. To get the first 18,278 columns use $A1:ZZZ1.
The second time to get the actual data.
proc import file="&fname" dbms=xlsx out=wide replace;
range='$A2:' ;
getnames=YES;
run;
Now transpose both of them:
proc transpose data=dates_wide out=dates(keep=col1 rename=(col1=date));
var _all_;
run;
proc transpose data=wide out=tall(rename=(col1=value)) ;
by A ;
var _all_;
run;
Now you can merge them together copying the DATE value from the first row onto every observation in the TALL dataset by using the POINT= option of the SET statement.
data want;
set tall;
by A ;
point+1;
if first.A then point=1;
set dates point=point;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you Tom. I'm going to give this one a try as well.