I have a raw data file that contains data in row-form that I need as columns. The file has 23 columns and 61 rows as-is. The first two are identifiers, which I need to keep as they are. The third column contains each of the column names I need post-transposition and the rest of the columns contain the values that correspond with the post-transposition variable found in the third column of each row. I need to transpose this entire file so that instead of having 23 columns and 61 rows, it has 61 columns and 20 rows (the two identifiers and one variable name column would not be observations), but I don't think SAS will let me read the data and do it with a PROC TRANSPOSE because the formats of each column as-is is inconsistent. I've attached a sample of the data with a few rows as reference.
I am not entirely sure I understand the exact format by which you expect your output to conform, but here is what my guess is that you are trying to achieve.
data have;
infile cards dlm='09'x;
input id (name var data1-data20) (:$20.);
cards;
7647 Dummy Date 11-Jun 11-Mar 10-Dec 10-Sep 10-Jun 10-Mar 9-Dec 9-Sep 9-Jun 9-Mar 8-Dec 8-Sep 8-Jun 8-Mar 7-Dec 7-Sep 7-Jun 7-Mar 6-Dec 6-Sep
7647 Dummy Cash 374,999.00 355,305.00 344,476.00 421,880.00 419,419.00 362,088.00 335,474.00 363,165.00 350,761.00 350,637.00 124,905.00 138,088.00 153,846.00 169,608.00 183,856.00 178,935.00 180,308.00 182,232.00 185,859.00 178,840.00
7647 Dummy Credit 903,945.00 892,582.00 898,555.00 890,329.00 910,922.00 929,207.00 862,928.00 878,434.00 908,463.00 947,960.00 908,375.00 922,330.00 853,334.00 858,979.00 864,756.00 784,002.00 749,575.00 714,901.00 697,474.00 660,277.00
7647 Dummy Supplies 22,567.00 27,059.00 37,032.00 35,536.00 40,547.00 37,694.00 46,079.00 42,035.00 52,795.00 41,942.00 33,281.00 29,246.00 593 494 351 192 108 89 69 76
7647 Dummy Other @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA
7647 Dummy Total @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA
7647 Dummy Equipment @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA
;
run;
proc transpose
data=have
out=want;
by id name var;
id var;
var data1-data20;
run;
Obs id name _NAME_ Date Cash Credit Supplies Other Total Equipment
1 7647 Dummy data1 11-Jun 374,999.00 903,945.00 22,567.00 @NA @NA @NA
2 7647 Dummy data2 11-Mar 355,305.00 892,582.00 27,059.00 @NA @NA @NA
3 7647 Dummy data3 10-Dec 344,476.00 898,555.00 37,032.00 @NA @NA @NA
4 7647 Dummy data4 10-Sep 421,880.00 890,329.00 35,536.00 @NA @NA @NA
5 7647 Dummy data5 10-Jun 419,419.00 910,922.00 40,547.00 @NA @NA @NA
6 7647 Dummy data6 10-Mar 362,088.00 929,207.00 37,694.00 @NA @NA @NA
7 7647 Dummy data7 9-Dec 335,474.00 862,928.00 46,079.00 @NA @NA @NA
8 7647 Dummy data8 9-Sep 363,165.00 878,434.00 42,035.00 @NA @NA @NA
9 7647 Dummy data9 9-Jun 350,761.00 908,463.00 52,795.00 @NA @NA @NA
10 7647 Dummy data10 9-Mar 350,637.00 947,960.00 41,942.00 @NA @NA @NA
11 7647 Dummy data11 8-Dec 124,905.00 908,375.00 33,281.00 @NA @NA @NA
12 7647 Dummy data12 8-Sep 138,088.00 922,330.00 29,246.00 @NA @NA @NA
13 7647 Dummy data13 8-Jun 153,846.00 853,334.00 593 @NA @NA @NA
14 7647 Dummy data14 8-Mar 169,608.00 858,979.00 494 @NA @NA @NA
15 7647 Dummy data15 7-Dec 183,856.00 864,756.00 351 @NA @NA @NA
16 7647 Dummy data16 7-Sep 178,935.00 784,002.00 192 @NA @NA @NA
17 7647 Dummy data17 7-Jun 180,308.00 749,575.00 108 @NA @NA @NA
18 7647 Dummy data18 7-Mar 182,232.00 714,901.00 89 @NA @NA @NA
19 7647 Dummy data19 6-Dec 185,859.00 697,474.00 69 @NA @NA @NA
20 7647 Dummy data20 6-Sep 178,840.00 660,277.00 76 @NA @NA @NA
That's the way I want the output to show up, but I want to read-in the file instead of creating a dataset in SAS. Using an infile statement or Proc Import will suffice. Will the fact that reading the data in before transposing it will result in mixed variable formatting cause problems?
The proc import procedure or whatever other method you would utilize would likely determine the input column formats as character, just as I have here. The fact that they are mixed on the input file is why. To adjust to the proper formats you would need to perform additional actions in a datastep after the transpose to fix it.
filename tmp temp;
data _null_;
infile cards missover;
input data $512.;
file tmp;
put _infile_;
cards;
7647 Dummy Date 11-Jun 11-Mar 10-Dec 10-Sep 10-Jun 10-Mar 9-Dec 9-Sep 9-Jun 9-Mar 8-Dec 8-Sep 8-Jun 8-Mar 7-Dec 7-Sep 7-Jun 7-Mar 6-Dec 6-Sep
7647 Dummy Cash 374,999.00 355,305.00 344,476.00 421,880.00 419,419.00 362,088.00 335,474.00 363,165.00 350,761.00 350,637.00 124,905.00 138,088.00 153,846.00 169,608.00 183,856.00 178,935.00 180,308.00 182,232.00 185,859.00 178,840.00
7647 Dummy Credit 903,945.00 892,582.00 898,555.00 890,329.00 910,922.00 929,207.00 862,928.00 878,434.00 908,463.00 947,960.00 908,375.00 922,330.00 853,334.00 858,979.00 864,756.00 784,002.00 749,575.00 714,901.00 697,474.00 660,277.00
7647 Dummy Supplies 22,567.00 27,059.00 37,032.00 35,536.00 40,547.00 37,694.00 46,079.00 42,035.00 52,795.00 41,942.00 33,281.00 29,246.00 593 494 351 192 108 89 69 76
7647 Dummy Other @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA
7647 Dummy Total @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA
7647 Dummy Equipment @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA @NA
;
run;
proc import
datafile=tmp
out=have
dbms=dlm
replace;
getnames=no;
delimiter='09'x;
run;
proc transpose
data=have
out=want(drop=_:)
prefix=xyz;
by var1 var2;
id var3;
var var4-var23;
run;
data want;
set want(rename=( var1=id var2=name xyzother=other xyztotal=total xyzequipment=equipment));
cash=input(xyzcash,comma10.);
credit=input(xyzcredit,comma10.);
supplies=input(xyzsupplies,comma10.);
xyzdate=strip(xyzdate) || '-2010';
date=input(xyzdate,date11.);
format cash credit supplies dollar12.2;
drop xyz:;
run;
If you table is small enough then why not just rotate it in Excel before reading it into SAS?
Tom,
I never thought I'd ever be saying something nice about Excel (other than I really do like the simplicity of its graph routines), but I had never seen its transpose feature before. Definitely a nice routine.
Avocado,
The one thing that the Excel transpose doesn't take care of is the first two columns. They don't have variable names and there aren't enough of them to spread across the various rows. Are they simply to be duplicated as done in FriedEgg's code?
It seems like doing it in Excel is probably the most efficient approach, as this will only have to be done a few times. The transpose feature in Excel basically does exactly what I need, so no reason to over-complicate things.
If the data are like the excel demo but in a plain CSV file, a DATA STEP can deliver the required result with little code. (not easy to create demo on iphone) - more later
my mis-judgement:
1
using @ and # to read from multiple lines would only be effective for a fixed layout (all raw data columns having the same width)
2
for a straightforward data step variabvle types and lengths must be able to be defined before runtime, so inforrmation only collected with input statements cannot affect the structure (names, types, lengths) of the output data set
so at least two steps would be needed
and so it will require more than a "little code"
peter
Hi ... here's another idea, transpose the spreadsheet using the libname engine then use SQL to assign variable names and types ...
libname x 'z:\rawdata.xls' getnames=no scantime=no;
proc transpose data=x.'sheet1$'n out=new;
var f4-f20;
by f1 f2;
run;
libname x clear;
proc sql;
create table xyz as
select f1 as id, f2 as name, col7 as date,
input(col6,comma10.) as cash,
input(col5,comma10.) as credit,
input(col4,comma10.) as supplies,
col3 as other, col2 as total, col1 as equipment
from new;
quit;
If those "@NA" variables are really supposed to be missing numeric data ...
proc format;
invalue fix @NA = . other=[comma10.];
value fixed . = '@NA' other=[10.];
run;
proc sql;
create table xyz as
select f1 as id, f2 as name, col7 as date,
input(col6,comma10.) as cash,
input(col5,comma10.) as credit,
input(col4,comma10.) as supplies,
input(col3,fix.) as other format=fixed.,
input(col2,fix.) as total format=fixed.,
input(col1,fix.) as equipment format=fixed.
from new;
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.