Hello,
I want to transform the dataset HAVE into WANT.
I've tried using the following Array, but keep getting errors. I'm not very familiar with arrays and so having issues trying to troubleshoot the error message.
I'm not bound by the array, so any other method you can think off would be helpful. Only doing an array as it seems efficient given the datasets I will be working on will be quite large.
Cheers,
data HAVE;
input ord_veh _name_ $ Col1 $ Col2 $ Col3 $ Col4 $ Col5 $ ;
datalines;
1 Capacity . . . . .
1 Weight . . . . .
2 Capacity . . . . .
2 Weight . . . . .
3 Capacity A A A A A
3 Weight B B B B B
;
run;
data want;
input ord_veh Capacity1 $ Capacity2 $ Capacity3 $ Capacity4 $ Capacity5 $ Weight1 $ Weight2 $ Weight3 $ Weight4 $ Weight5 $;
datalines;
1 . . . . . . . . . .
2 . . . . . . . . . .
3 A A A A A B B B B B
;
run;
data WANT;
set HAVE;
by ord_veh;
array capacity_array{5} $ Capacity1-Capacity5;
array weight_array{5} $ Weight1-Weight5;
if first.ord_veh then do;
do i = 1 to 5;
capacity_array{i} = .;
weight_array{i} = .;
end;
end;
if _name_ = "Capacity" then do;
capacity_array{_n_} = Col1;
end;
else if _name_ = "Weight" then do;
weight_array{_n_} = Col1;
end;
drop Col1 Col2 Col3 Col4 Col5 i;
run;
ERROR: Array subscript out of range at line 44 column 7.
Holy POOR DATA LAYOUT, Batman!!!
Here's how I would approach this (warning: I do not think your data set named WANT is the best way to go, this is explained below):
data want1;
set have(where=(_name_='Capacity'));
rename col1-col5=capacity1-capacity5;
run;
You would create want2 similarly, by doing similar to the above for _name_='Weight' (this step left to you to program)
Then merge want1 and want2 to produce your data set named want (this step left to you to program)
BUT — I STRONGLY ADVISE AGAINST THIS LAYOUT OF DATA
For almost all SAS work, this is a better layout of the data:
data want;
input ord_veh time_period capacity $ weight $ ;
cards;
1 1 . .
1 2 . .
1 3 . .
1 4 . .
1 5 . .
3 1 A B
3 2 A B
3 3 A B
3 4 A B
3 5 A B
;
I have left out Ord_veh=2 to save myself some typing. I have assumed variables col1-col5 are the values for five different time periods, but if they represent something else (not time), the layout of the data above is the same, just change time_period to some other variable name.
This is a much better arrangement of data for most SAS purposes. Almost all SAS PROCs are designed to use data in a long narrow layout rather than a wide layout. This layout of the data also eliminates the need to program arrays.
I have to say that I am a little curious as to what a Weight of B represents.
Is your original data always two lines per id? Does it come as a text file that you read to create your existing data structure?
You can read the data as desired, at least for this example:
data HAVE; input ord_veh @; array capacity(5) $; array Weigth (5) $; input +8 @; do i=1 to 5; input capacity[i] @; end; input; input @9 @; do i=1 to 5; input Weigth[i] @; end; input; drop i; datalines; 1 Capacity . . . . . 1 Weight . . . . . 2 Capacity . . . . . 2 Weight . . . . . 3 Capacity A A A A A 3 Weight B B B B B ; run;
Some tricks with @ to hold the input row during input. Then adjusting columns to start reading past the words "Capacity" or "Weight", an Input; to advance the read pointer to the next row. Then more input to read the second block.
Nothing is written to the data set until all of the input statements finish (and any other code that may exist). So this reads data from two rows before output.
HOWEVER if there are one or 3 lines for some reason per Ord_veh won't work.
If you have longer values of Ord_veh then the input +8 or @9 would need to be adjusted to account for those columns or read the types into a dummy variable discarded.
Note: Do not separate ERROR messages from the remainder of the code. Include the code from the proc or data step along with the errors from the log so things like the LINE numbers are included. We have no way to know which is your "line 44" in the error.
_n_ is an automatic counter of the iterations of the data step and as such is typically a poor variable to use for array index as it can easily exceed the defined size of an array. Your HAVE data set has 6 observations, so when it gets to ord_veh 3 and Weight then _n_ = 6 and both of your arrays are limited to an index value of 5. Perhaps you meant to use ORD_VEH as the index instead of _n_ .
The proposed data structure is still a bit awkward for many purposes.
Just use PROC TRANSPOSE and you don't need to worry about using ARRAYs.
First convert your not-wide and not-all dataset into a TALL dataset.
data HAVE;
input ord_veh _name_ $ Col1 $ Col2 $ Col3 $ Col4 $ Col5 $ ;
datalines;
1 Capacity . . . . .
1 Weight . . . . .
2 Capacity . . . . .
2 Weight . . . . .
3 Capacity A A A A A
3 Weight B B B B B
;
proc transpose data=have out=tall name=variable;
by ord_veh _name_ ;
var col1-col5;
run;
You could probably just stop here as this TALL structure is a much more useful format to work with.
But to get to your wanted dataset just add another TRANSPOSE.
proc transpose data=tall out=want(drop=_name_);
by ord_veh;
id _name_ variable;
var col1;
run;
If you don't like that the variables are named WEIGHTCOL`1 instead of WEIGHT1 then just add a data step to remove the text COL from the value of variable named VARIABLE.
data tall
set tall;
variable =substr(variable,4);
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.