BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
JenMMerc
Fluorite | Level 6

Good morning,

I have a line list of data (HAVE) and want to transform it (merge, modify?) so that there is a single personID with associated info (WANT, below). I know there has got to be a simple way to do this. Value of the Incident number is not always chronologically associated with the Incident Date and when the data transpose there will need to be multiple Incident and Incident Date vars. Please advise. Thank you, Jenny

HAVE  
Person IDIncidentIncident Date
A1016/3/2020
A1028/3/2020
A10110/2/2020
B2008/5/2020
C3008/12/2020
D4006/10/2020
D4016/15/2020
WANT      
Person IDIncident1Incident Date1Incident2Incident Date2Incident3Incident Date3
A1016/3/20201028/3/202010110/2/2020
B2008/5/2020    
C3008/12/2020    
D4006/10/20204016/15/2020  
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@JenMMerc wrote:

It's not for an analysis. I am trying to identify individuals and their speciifc incidents and time of incidents from a large data set. I need to determine the time period from the initial incident to the most current incident. The data are not entered into the database consistently (for any PersonID a new incident could be entered under the old incident number with a new date or be entered as a new incident number) and I am trying to account for that. Each separate incident (whether the number is the same as a previous incident number or a new incident number) needs to be associated with the PersonID. Does that make sense?

 


How does a wide format help with the highlighted text?

You need to carefully define some terms. Your first person has two incident ids, the latest date is associated with the same incident id as the first with a different id sandwiched between the two dates.

So is incidentid = 102 the "most current" , or do you just want to determine the intervals between the first record and each following record for the same person?

That is much easier in long form. The following assumes your data is sorted appropriately and that your dates are actually SAS numeric values with a format such as mmddyy10 applied. (Really suspect you are NOT showing us SAS data but from a spreadsheet or other document because the SAS versions would have leading zeroes for day of the month and the month number when the value is less than 10 and variables would not have spaces in the names).

data want;
   set have;
   by personid;
   retain firstdate;
   format firstdate date9.;
   if first.personid then firstdate=incidentdate;
   else daysfromfirst = incidentdate-firstdate;
run;

If you want an interval other than days that is possible as well with the intck function.

 

 

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

My usual advice is to leave your data set long and not to transform it. Why? Because most analyses in SAS are easier to program from a long data set, and tedious and difficult to program if you make the data set wide like you are asking.


What analysis are you going to do after you do this transformation?

--
Paige Miller
JenMMerc
Fluorite | Level 6

It's not for an analysis. I am trying to identify individuals and their speciifc incidents and time of incidents from a large data set. I need to determine the time period from the initial incident to the most current incident. The data are not entered into the database consistently (for any PersonID a new incident could be entered under the old incident number with a new date or be entered as a new incident number) and I am trying to account for that. Each separate incident (whether the number is the same as a previous incident number or a new incident number) needs to be associated with the PersonID. Does that make sense?

 

Reeza
Super User
Wouldn't that be the largest date minus the smallest date, which is the Range of a variable? You can calculate that automatically without aggregating or doing any manipulations for each personID. If you need a different grouping you can modify the BY or CLASS statement.

proc means data=have range noprint;
by personID;
var incidentDate;
output out=want range=duration;
run;
PaigeMiller
Diamond | Level 26

It's not for an analysis. I am trying to identify individuals and their speciifc incidents and time of incidents from a large data set. I need to determine the time period from the initial incident to the most current incident

This is very easy without transforming your data (as explained by @Reeza ) And by the way, this is an analysis.

 

The data are not entered into the database consistently (for any PersonID a new incident could be entered under the old incident number with a new date or be entered as a new incident number) and I am trying to account for that.

This data cleaning can be done without transforming the data, and I don't see how transforming the data improves the situation. 

--
Paige Miller
ballardw
Super User

@JenMMerc wrote:

It's not for an analysis. I am trying to identify individuals and their speciifc incidents and time of incidents from a large data set. I need to determine the time period from the initial incident to the most current incident. The data are not entered into the database consistently (for any PersonID a new incident could be entered under the old incident number with a new date or be entered as a new incident number) and I am trying to account for that. Each separate incident (whether the number is the same as a previous incident number or a new incident number) needs to be associated with the PersonID. Does that make sense?

 


How does a wide format help with the highlighted text?

You need to carefully define some terms. Your first person has two incident ids, the latest date is associated with the same incident id as the first with a different id sandwiched between the two dates.

So is incidentid = 102 the "most current" , or do you just want to determine the intervals between the first record and each following record for the same person?

That is much easier in long form. The following assumes your data is sorted appropriately and that your dates are actually SAS numeric values with a format such as mmddyy10 applied. (Really suspect you are NOT showing us SAS data but from a spreadsheet or other document because the SAS versions would have leading zeroes for day of the month and the month number when the value is less than 10 and variables would not have spaces in the names).

data want;
   set have;
   by personid;
   retain firstdate;
   format firstdate date9.;
   if first.personid then firstdate=incidentdate;
   else daysfromfirst = incidentdate-firstdate;
run;

If you want an interval other than days that is possible as well with the intck function.

 

 

Reeza
Super User

Both of the tutorials below have examples for what you're trying to do. You can use either the data step or transpose approach. For a beginner, I suspect the transpose approach is much easier to understand and follow though a bit tedious. 

 

Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/



@JenMMerc wrote:

Good morning,

I have a line list of data (HAVE) and want to transform it (merge, modify?) so that there is a single personID with associated info (WANT, below). I know there has got to be a simple way to do this. Value of the Incident number is not always chronologically associated with the Incident Date and when the data transpose there will need to be multiple Incident and Incident Date vars. Please advise. Thank you, Jenny

HAVE    
Person ID Incident Incident Date
A 101 6/3/2020
A 102 8/3/2020
A 101 10/2/2020
B 200 8/5/2020
C 300 8/12/2020
D 400 6/10/2020
D 401 6/15/2020
WANT            
Person ID Incident1 Incident Date1 Incident2 Incident Date2 Incident3 Incident Date3
A 101 6/3/2020 102 8/3/2020 101 10/2/2020
B 200 8/5/2020        
C 300 8/12/2020        
D 400 6/10/2020 401 6/15/2020    




SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 7 replies
  • 1316 views
  • 1 like
  • 5 in conversation