BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
sasheadache
Obsidian | Level 7

I have a dataset that I am trying to use a multi-dimensional array to transpose the data from 2 variables, however I am having issues because the array not only needs to be dynamic, but also allow me to group certain records together using multiple variables.

 

This is my source table:

id  action_date  status  status_date 
1 01/01/2024 A  01/01/2024 
1 01/01/2024 A  02/01/2024 
1 01/01/2024 B  03/01/2024
1 01/01/2024 C  04/01/2024
1 01/01/2024 D  05/01/2024
2 01/01/2024 A  01/01/2024 
2 01/01/2024 A  02/01/2024 
2 01/01/2024 A  03/01/2024
2 01/01/2024 A  04/01/2024
2 01/01/2024 A  05/01/2024
1 25/01/2024 A  25/01/2024 
1 25/01/2024 A  26/01/2024 
1 25/01/2024 A  27/01/2024
1 25/01/2024 A  28/01/2024
1 25/01/2024 A  29/01/2024
55 26/01/2024 A 26/01/2024
55 26/01/2024 D 27/01/2024
55 26/01/2024 D 28/01/2024

 

As you will see:

ID1 has 10 rows with 2 x different action_dates

ID2 has 5 rows with 1 x action date

ID55 only has 3 rows all with the same action date.

 

* Note: status and status_date are the variables I want to transpose into adjacent columns (i.e. status1, status_date1, status2, status_date2, status3, status_date3 etc.)  

* Note: The same ID can appear more than once - I want these to be treated independently, grouping by the action_date for that ID.

* Note: Some ID's dont always have 5 rows which is why I need the array to be dynamic to ensure the output is consistent. (As displayed in the example below - if the ID only has 3 rows out of the maximum 5 I want it to populate the first 3 columns and leave the last 2 blank.)

 

This is my desired output:

id  action_date  status1  status_date1 status2  status_date2 status3  status_date3 status4  status_date4 status5  status_date5
1 01/01/2024 A  01/01/2024 A  02/01/2024  B  03/01/2024 C 04/01/2024 D 05/01/2024
2 01/01/2024 A  01/01/2024 A  02/01/2024  A  03/01/2024 A 04/01/2024 A 05/01/2024
1 25/01/2024 A 25/01/2024 A 26/01/2024 A 27/01/2024 A 28/01/2024 A 28/01/2024
55 26/01/2024 A 26/01/2024 D 27/01/2024 D 28/01/2024        

 

This is my starting array code.

 

data output; 
	set input; 
	array status[5] $2. status_1-status_5; 
	array status_day[5] status_day_1-status_day_5; 
	format status_day_1-status_day_5 date9.;

	do i = 1 to 5; 
		status[i] = status;
		status_day[i] = status_day;
	end; 

run; 
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

The most safe way (most slow way) is using DOUBLE proc transpose . Of course,I tend to use it if you don't have a big table.

data have;
  input id  	action_date :ddmmyy10. 	status $ 	 status_date :ddmmyy10.;
  format action_date  status_date ddmmyy10. ;
datalines;
1 	01/01/2024 	A 	 01/01/2024 
1 	01/01/2024 	A 	 02/01/2024 
1 	01/01/2024 	B 	 03/01/2024
1 	01/01/2024 	C 	 04/01/2024
1 	01/01/2024 	D 	 05/01/2024
2 	01/01/2024 	A 	 01/01/2024 
2 	01/01/2024 	A 	 02/01/2024 
2 	01/01/2024 	A 	 03/01/2024
2 	01/01/2024 	A 	 04/01/2024
2 	01/01/2024 	A 	 05/01/2024
1 	25/01/2024 	A 	 25/01/2024 
1 	25/01/2024 	A 	 26/01/2024 
1 	25/01/2024 	A 	 27/01/2024
1 	25/01/2024 	A 	 28/01/2024
1 	25/01/2024 	A 	 29/01/2024
55 	26/01/2024 	A 	26/01/2024
55 	26/01/2024 	D 	27/01/2024
55 	26/01/2024 	D 	28/01/2024
;
data temp;
 set have;
by id action_date notsorted;
if first.action_date then n=0;
n+1;
run;
proc transpose data=temp out=temp2;
by id action_date n notsorted;
var 	status 	 status_date;
run;
proc transpose data=temp2 out=want(drop=_name_);
by id action_date notsorted;
var col1;
id _name_ n;
run;

View solution in original post

4 REPLIES 4
ballardw
Super User

What is Status_day supposed to represent? You do not show it in your Input data set and then attempt to assign values of Status_day to other variables?

 

You can't have an array named Status as  you already have a variable named Status.

 

data have;
  input id  	action_date :ddmmyy10. 	status $ 	 status_date :ddmmyy10.;
  format action_date  status_date ddmmyy10. ;
datalines;
1 	01/01/2024 	A 	 01/01/2024 
1 	01/01/2024 	A 	 02/01/2024 
1 	01/01/2024 	B 	 03/01/2024
1 	01/01/2024 	C 	 04/01/2024
1 	01/01/2024 	D 	 05/01/2024
2 	01/01/2024 	A 	 01/01/2024 
2 	01/01/2024 	A 	 02/01/2024 
2 	01/01/2024 	A 	 03/01/2024
2 	01/01/2024 	A 	 04/01/2024
2 	01/01/2024 	A 	 05/01/2024
1 	25/01/2024 	A 	 25/01/2024 
1 	25/01/2024 	A 	 26/01/2024 
1 	25/01/2024 	A 	 27/01/2024
1 	25/01/2024 	A 	 28/01/2024
1 	25/01/2024 	A 	 29/01/2024
55 	26/01/2024 	A 	26/01/2024
55 	26/01/2024 	D 	27/01/2024
55 	26/01/2024 	D 	28/01/2024
;

proc sort data=have;
  by id action_date status_date;
run;

data want; 
	set have; 
   by id action_date;
	array st[5] $2. status_1-status_5; 
	array status_day[5] status_day_1-status_day_5; 
	format status_day_1-status_day_5 date9.;
   retain  status_1-status_5  status_day_1-status_day_5 counter;
   if first.action_date then counter=1;
   else counter+1;
   st[counter]=status;
   status_day[counter]=status_date;
   if last.action_date then do; 
      output;
      call missing(of status_1-status_5, of status_day_1-status_day_5);
   end;
   keep id action_date  status_1-status_5  status_day_1-status_day_5;
run;

Arrays only exist for a single observation. Since you want to have the same array values across observations then all the values need to be RETAINed so they will persist. Which also means that you have to take responsibility for cleaning up the array values.

 

I sorted the data so there wasn't any ambiguity about ID Action_date combinations.

When you use a BY statement in a data step then SAS creates automatic variables, referenced with First.variablename and Last.Variable name that have values 1 (true) and 0 (false) indicating whether the current observation is the first or last of a by group. So those can be used to reset values at a change (the first of an Action_date group in this case for the variable counter used to indicate which row of data in the group is being processed) and to write out the result when the Last.Action_date as well as reset the arrays to missing after writing to the output set.

 

Now, what can you do with the second data set that you can't do with the first? Since the array had to be "dynamic" , isn't actually as you set the size to 5. Fine if you know that there will never be more than 5 status_date for an action_date but will require recoding everytime that changes.

More "dynamic"

proc transpose data=have out=transstatus(drop=_name_)
      prefix=Status_;
   by id action_date;
   var status;
run;


proc transpose data=have out=transdate(drop=_name_)
      prefix=Status_day_;
   by id action_date;
   var Status_date;
run;


data dynamicwant;
   merge transstatus  transdate;
   by id action_date;
run;
PaigeMiller
Diamond | Level 26

Long beats wide. Maxim 19

 

For most SAS purposes, the long data set is a better layout than the wide data set. Why? Because most SAS data analysis PROCs work better on long data sets, and work poorly or not at all on the wide data set. And because as pointed out above, the result has to be re-programmed if you have a different number of columns.

 

So what is the benefit of this transpose? Both @ballardw and I would like to know.

--
Paige Miller
Ksharp
Super User

The most safe way (most slow way) is using DOUBLE proc transpose . Of course,I tend to use it if you don't have a big table.

data have;
  input id  	action_date :ddmmyy10. 	status $ 	 status_date :ddmmyy10.;
  format action_date  status_date ddmmyy10. ;
datalines;
1 	01/01/2024 	A 	 01/01/2024 
1 	01/01/2024 	A 	 02/01/2024 
1 	01/01/2024 	B 	 03/01/2024
1 	01/01/2024 	C 	 04/01/2024
1 	01/01/2024 	D 	 05/01/2024
2 	01/01/2024 	A 	 01/01/2024 
2 	01/01/2024 	A 	 02/01/2024 
2 	01/01/2024 	A 	 03/01/2024
2 	01/01/2024 	A 	 04/01/2024
2 	01/01/2024 	A 	 05/01/2024
1 	25/01/2024 	A 	 25/01/2024 
1 	25/01/2024 	A 	 26/01/2024 
1 	25/01/2024 	A 	 27/01/2024
1 	25/01/2024 	A 	 28/01/2024
1 	25/01/2024 	A 	 29/01/2024
55 	26/01/2024 	A 	26/01/2024
55 	26/01/2024 	D 	27/01/2024
55 	26/01/2024 	D 	28/01/2024
;
data temp;
 set have;
by id action_date notsorted;
if first.action_date then n=0;
n+1;
run;
proc transpose data=temp out=temp2;
by id action_date n notsorted;
var 	status 	 status_date;
run;
proc transpose data=temp2 out=want(drop=_name_);
by id action_date notsorted;
var col1;
id _name_ n;
run;
sasheadache
Obsidian | Level 7
Thanks! (Turns out I could have used proc transpose in the way I initially wanted before I ventured into Array's)

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 300 views
  • 1 like
  • 4 in conversation