I have a dataset that I am trying to use a multi-dimensional array to transpose the data from 2 variables, however I am having issues because the array not only needs to be dynamic, but also allow me to group certain records together using multiple variables.
This is my source table:
id | action_date | status | status_date |
1 | 01/01/2024 | A | 01/01/2024 |
1 | 01/01/2024 | A | 02/01/2024 |
1 | 01/01/2024 | B | 03/01/2024 |
1 | 01/01/2024 | C | 04/01/2024 |
1 | 01/01/2024 | D | 05/01/2024 |
2 | 01/01/2024 | A | 01/01/2024 |
2 | 01/01/2024 | A | 02/01/2024 |
2 | 01/01/2024 | A | 03/01/2024 |
2 | 01/01/2024 | A | 04/01/2024 |
2 | 01/01/2024 | A | 05/01/2024 |
1 | 25/01/2024 | A | 25/01/2024 |
1 | 25/01/2024 | A | 26/01/2024 |
1 | 25/01/2024 | A | 27/01/2024 |
1 | 25/01/2024 | A | 28/01/2024 |
1 | 25/01/2024 | A | 29/01/2024 |
55 | 26/01/2024 | A | 26/01/2024 |
55 | 26/01/2024 | D | 27/01/2024 |
55 | 26/01/2024 | D | 28/01/2024 |
As you will see:
ID1 has 10 rows with 2 x different action_dates
ID2 has 5 rows with 1 x action date
ID55 only has 3 rows all with the same action date.
* Note: status and status_date are the variables I want to transpose into adjacent columns (i.e. status1, status_date1, status2, status_date2, status3, status_date3 etc.)
* Note: The same ID can appear more than once - I want these to be treated independently, grouping by the action_date for that ID.
* Note: Some ID's dont always have 5 rows which is why I need the array to be dynamic to ensure the output is consistent. (As displayed in the example below - if the ID only has 3 rows out of the maximum 5 I want it to populate the first 3 columns and leave the last 2 blank.)
This is my desired output:
id | action_date | status1 | status_date1 | status2 | status_date2 | status3 | status_date3 | status4 | status_date4 | status5 | status_date5 |
1 | 01/01/2024 | A | 01/01/2024 | A | 02/01/2024 | B | 03/01/2024 | C | 04/01/2024 | D | 05/01/2024 |
2 | 01/01/2024 | A | 01/01/2024 | A | 02/01/2024 | A | 03/01/2024 | A | 04/01/2024 | A | 05/01/2024 |
1 | 25/01/2024 | A | 25/01/2024 | A | 26/01/2024 | A | 27/01/2024 | A | 28/01/2024 | A | 28/01/2024 |
55 | 26/01/2024 | A | 26/01/2024 | D | 27/01/2024 | D | 28/01/2024 |
This is my starting array code.
data output;
set input;
array status[5] $2. status_1-status_5;
array status_day[5] status_day_1-status_day_5;
format status_day_1-status_day_5 date9.;
do i = 1 to 5;
status[i] = status;
status_day[i] = status_day;
end;
run;
The most safe way (most slow way) is using DOUBLE proc transpose . Of course,I tend to use it if you don't have a big table.
data have;
input id action_date :ddmmyy10. status $ status_date :ddmmyy10.;
format action_date status_date ddmmyy10. ;
datalines;
1 01/01/2024 A 01/01/2024
1 01/01/2024 A 02/01/2024
1 01/01/2024 B 03/01/2024
1 01/01/2024 C 04/01/2024
1 01/01/2024 D 05/01/2024
2 01/01/2024 A 01/01/2024
2 01/01/2024 A 02/01/2024
2 01/01/2024 A 03/01/2024
2 01/01/2024 A 04/01/2024
2 01/01/2024 A 05/01/2024
1 25/01/2024 A 25/01/2024
1 25/01/2024 A 26/01/2024
1 25/01/2024 A 27/01/2024
1 25/01/2024 A 28/01/2024
1 25/01/2024 A 29/01/2024
55 26/01/2024 A 26/01/2024
55 26/01/2024 D 27/01/2024
55 26/01/2024 D 28/01/2024
;
data temp;
set have;
by id action_date notsorted;
if first.action_date then n=0;
n+1;
run;
proc transpose data=temp out=temp2;
by id action_date n notsorted;
var status status_date;
run;
proc transpose data=temp2 out=want(drop=_name_);
by id action_date notsorted;
var col1;
id _name_ n;
run;
What is Status_day supposed to represent? You do not show it in your Input data set and then attempt to assign values of Status_day to other variables?
You can't have an array named Status as you already have a variable named Status.
data have; input id action_date :ddmmyy10. status $ status_date :ddmmyy10.; format action_date status_date ddmmyy10. ; datalines; 1 01/01/2024 A 01/01/2024 1 01/01/2024 A 02/01/2024 1 01/01/2024 B 03/01/2024 1 01/01/2024 C 04/01/2024 1 01/01/2024 D 05/01/2024 2 01/01/2024 A 01/01/2024 2 01/01/2024 A 02/01/2024 2 01/01/2024 A 03/01/2024 2 01/01/2024 A 04/01/2024 2 01/01/2024 A 05/01/2024 1 25/01/2024 A 25/01/2024 1 25/01/2024 A 26/01/2024 1 25/01/2024 A 27/01/2024 1 25/01/2024 A 28/01/2024 1 25/01/2024 A 29/01/2024 55 26/01/2024 A 26/01/2024 55 26/01/2024 D 27/01/2024 55 26/01/2024 D 28/01/2024 ; proc sort data=have; by id action_date status_date; run; data want; set have; by id action_date; array st[5] $2. status_1-status_5; array status_day[5] status_day_1-status_day_5; format status_day_1-status_day_5 date9.; retain status_1-status_5 status_day_1-status_day_5 counter; if first.action_date then counter=1; else counter+1; st[counter]=status; status_day[counter]=status_date; if last.action_date then do; output; call missing(of status_1-status_5, of status_day_1-status_day_5); end; keep id action_date status_1-status_5 status_day_1-status_day_5; run;
Arrays only exist for a single observation. Since you want to have the same array values across observations then all the values need to be RETAINed so they will persist. Which also means that you have to take responsibility for cleaning up the array values.
I sorted the data so there wasn't any ambiguity about ID Action_date combinations.
When you use a BY statement in a data step then SAS creates automatic variables, referenced with First.variablename and Last.Variable name that have values 1 (true) and 0 (false) indicating whether the current observation is the first or last of a by group. So those can be used to reset values at a change (the first of an Action_date group in this case for the variable counter used to indicate which row of data in the group is being processed) and to write out the result when the Last.Action_date as well as reset the arrays to missing after writing to the output set.
Now, what can you do with the second data set that you can't do with the first? Since the array had to be "dynamic" , isn't actually as you set the size to 5. Fine if you know that there will never be more than 5 status_date for an action_date but will require recoding everytime that changes.
More "dynamic"
proc transpose data=have out=transstatus(drop=_name_) prefix=Status_; by id action_date; var status; run; proc transpose data=have out=transdate(drop=_name_) prefix=Status_day_; by id action_date; var Status_date; run; data dynamicwant; merge transstatus transdate; by id action_date; run;
Long beats wide. Maxim 19
For most SAS purposes, the long data set is a better layout than the wide data set. Why? Because most SAS data analysis PROCs work better on long data sets, and work poorly or not at all on the wide data set. And because as pointed out above, the result has to be re-programmed if you have a different number of columns.
So what is the benefit of this transpose? Both @ballardw and I would like to know.
The most safe way (most slow way) is using DOUBLE proc transpose . Of course,I tend to use it if you don't have a big table.
data have;
input id action_date :ddmmyy10. status $ status_date :ddmmyy10.;
format action_date status_date ddmmyy10. ;
datalines;
1 01/01/2024 A 01/01/2024
1 01/01/2024 A 02/01/2024
1 01/01/2024 B 03/01/2024
1 01/01/2024 C 04/01/2024
1 01/01/2024 D 05/01/2024
2 01/01/2024 A 01/01/2024
2 01/01/2024 A 02/01/2024
2 01/01/2024 A 03/01/2024
2 01/01/2024 A 04/01/2024
2 01/01/2024 A 05/01/2024
1 25/01/2024 A 25/01/2024
1 25/01/2024 A 26/01/2024
1 25/01/2024 A 27/01/2024
1 25/01/2024 A 28/01/2024
1 25/01/2024 A 29/01/2024
55 26/01/2024 A 26/01/2024
55 26/01/2024 D 27/01/2024
55 26/01/2024 D 28/01/2024
;
data temp;
set have;
by id action_date notsorted;
if first.action_date then n=0;
n+1;
run;
proc transpose data=temp out=temp2;
by id action_date n notsorted;
var status status_date;
run;
proc transpose data=temp2 out=want(drop=_name_);
by id action_date notsorted;
var col1;
id _name_ n;
run;
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.