BookmarkSubscribeRSS Feed
Alexander2
Fluorite | Level 6

This one is a doozy, at least for me. I want to create a variable called closed_reason_1 for the most recent date within each product_id. Then I want to create variables called closed_reason_2, closed_reason_3 etc. for the second most recent date within that product_id, 3rd, etc. Here is example data:

 

product_id, date, closed_reason_c

1, 1/1/19, Traffic

1, 1/2/19, Bad day

1, 1/3/19, weather

2, 1/1/19, Traffic

2, 1/2/19, Bad day

2, 1/3/19, weather

 

So imagine a dataset that looks something like this, only there are millions of rows, with millions of product_ids, and millions of closed_reason_cs. So far I know I will need to create an array with a do loop but am stuck beyond that. How do I tell SAS how to find the most recent, second most recent, 3rd most recent, 4th most recent, etc. within each product_id, and have it go until it reaches the least recent date, and retrieve each respective closed_reason_c for those dates? Any help would be greatly appreciated. This is with SAS version 9.4.

3 REPLIES 3
Tom
Super User Tom
Super User

Why not just make a new counter variable instead.

data want ;
  set have ;
  by product descending date ;
  if first.product then reason_no=0;
  reason_no+1;
run;

Then if you did want to roll up into that WIDE format you could use PROC TRANSPOSE.

proc transpose data=want out=wide prefix=closed_reason_;
  by product ;
  id reason_no;
  var closed_reason_c;
run;
  
Reeza
Super User

In general, this isn't a great way to store data sets because now everything will likely need to be a loop and it's harder to identify things, like how many closed because of "Bad Weather" which is a very easy task to do in the long format. In the wide format, you first need to identify which records have bad weather since it can be in any and then sort. OR to figure out which months are the worse for bad weather closures. All easy to do in a long format and hard in a wide format. 

end rant.

 

Here's how you can do it though:

 

data have;
infile cards dlm=',';
informat dat ddmmyy10.;
format dat date9.;
input product_id $ dat closed_reason_c $;
cards;
1, 1/1/19, Traffic
1, 1/2/19, Bad day
1, 1/3/19, weather
2, 1/1/19, Traffic
2, 1/2/19, Bad day
2, 1/3/19, weather
;;;;
run;

*sort for dates so that most recent date is first;
proc sort data=have;
by product_id descending dat;
run;

*transpose dates;
proc transpose data=have out=wide_dates  prefix=Date_Close_;
by product_id;
var dat;
run;

*transpose reasons;
proc transpose data=have out=wide_reason prefix=Reason_Close_;
by product_id;
var closed_reason_c;
run;

data final;
merge wide_dates (drop=_name_) wide_reason (drop=_name_);
by product_id;
run;

*clean up and delete wide data sets;
/* proc sql; */
/* drop table wide_dates, wide_reason; */
/* quit; */

Full tutorial on this approach is outlined here. 

https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

There's also a custom written SAS macro that could do this in one step but if you're not familiar with macros this is easier for now.


Edit; added link


@Alexander2 wrote:

This one is a doozy, at least for me. I want to create a variable called closed_reason_1 for the most recent date within each product_id. Then I want to create variables called closed_reason_2, closed_reason_3 etc. for the second most recent date within that product_id, 3rd, etc. Here is example data:

 

product_id, date, closed_reason_c

1, 1/1/19, Traffic

1, 1/2/19, Bad day

1, 1/3/19, weather

2, 1/1/19, Traffic

2, 1/2/19, Bad day

2, 1/3/19, weather

 

So imagine a dataset that looks something like this, only there are millions of rows, with millions of product_ids, and millions of closed_reason_cs. So far I know I will need to create an array with a do loop but am stuck beyond that. How do I tell SAS how to find the most recent, second most recent, 3rd most recent, 4th most recent, etc. within each product_id, and have it go until it reaches the least recent date, and retrieve each respective closed_reason_c for those dates? Any help would be greatly appreciated. This is with SAS version 9.4.


 

ballardw
Super User

Can you describe exactly how you will use the resulting data set?

 

It may be that you don't need to re-arrange data in such a manner.

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 372 views
  • 2 likes
  • 4 in conversation