Good morning,
I have a line list of data (HAVE) and want to transform it (merge, modify?) so that there is a single personID with associated info (WANT, below). I know there has got to be a simple way to do this. Value of the Incident number is not always chronologically associated with the Incident Date and when the data transpose there will need to be multiple Incident and Incident Date vars. Please advise. Thank you, Jenny
HAVE | ||
Person ID | Incident | Incident Date |
A | 101 | 6/3/2020 |
A | 102 | 8/3/2020 |
A | 101 | 10/2/2020 |
B | 200 | 8/5/2020 |
C | 300 | 8/12/2020 |
D | 400 | 6/10/2020 |
D | 401 | 6/15/2020 |
WANT | ||||||
Person ID | Incident1 | Incident Date1 | Incident2 | Incident Date2 | Incident3 | Incident Date3 |
A | 101 | 6/3/2020 | 102 | 8/3/2020 | 101 | 10/2/2020 |
B | 200 | 8/5/2020 | ||||
C | 300 | 8/12/2020 | ||||
D | 400 | 6/10/2020 | 401 | 6/15/2020 |
@JenMMerc wrote:
It's not for an analysis. I am trying to identify individuals and their speciifc incidents and time of incidents from a large data set. I need to determine the time period from the initial incident to the most current incident. The data are not entered into the database consistently (for any PersonID a new incident could be entered under the old incident number with a new date or be entered as a new incident number) and I am trying to account for that. Each separate incident (whether the number is the same as a previous incident number or a new incident number) needs to be associated with the PersonID. Does that make sense?
How does a wide format help with the highlighted text?
You need to carefully define some terms. Your first person has two incident ids, the latest date is associated with the same incident id as the first with a different id sandwiched between the two dates.
So is incidentid = 102 the "most current" , or do you just want to determine the intervals between the first record and each following record for the same person?
That is much easier in long form. The following assumes your data is sorted appropriately and that your dates are actually SAS numeric values with a format such as mmddyy10 applied. (Really suspect you are NOT showing us SAS data but from a spreadsheet or other document because the SAS versions would have leading zeroes for day of the month and the month number when the value is less than 10 and variables would not have spaces in the names).
data want; set have; by personid; retain firstdate; format firstdate date9.; if first.personid then firstdate=incidentdate; else daysfromfirst = incidentdate-firstdate; run;
If you want an interval other than days that is possible as well with the intck function.
My usual advice is to leave your data set long and not to transform it. Why? Because most analyses in SAS are easier to program from a long data set, and tedious and difficult to program if you make the data set wide like you are asking.
What analysis are you going to do after you do this transformation?
It's not for an analysis. I am trying to identify individuals and their speciifc incidents and time of incidents from a large data set. I need to determine the time period from the initial incident to the most current incident. The data are not entered into the database consistently (for any PersonID a new incident could be entered under the old incident number with a new date or be entered as a new incident number) and I am trying to account for that. Each separate incident (whether the number is the same as a previous incident number or a new incident number) needs to be associated with the PersonID. Does that make sense?
It's not for an analysis. I am trying to identify individuals and their speciifc incidents and time of incidents from a large data set. I need to determine the time period from the initial incident to the most current incident
This is very easy without transforming your data (as explained by @Reeza ) And by the way, this is an analysis.
The data are not entered into the database consistently (for any PersonID a new incident could be entered under the old incident number with a new date or be entered as a new incident number) and I am trying to account for that.
This data cleaning can be done without transforming the data, and I don't see how transforming the data improves the situation.
@JenMMerc wrote:
It's not for an analysis. I am trying to identify individuals and their speciifc incidents and time of incidents from a large data set. I need to determine the time period from the initial incident to the most current incident. The data are not entered into the database consistently (for any PersonID a new incident could be entered under the old incident number with a new date or be entered as a new incident number) and I am trying to account for that. Each separate incident (whether the number is the same as a previous incident number or a new incident number) needs to be associated with the PersonID. Does that make sense?
How does a wide format help with the highlighted text?
You need to carefully define some terms. Your first person has two incident ids, the latest date is associated with the same incident id as the first with a different id sandwiched between the two dates.
So is incidentid = 102 the "most current" , or do you just want to determine the intervals between the first record and each following record for the same person?
That is much easier in long form. The following assumes your data is sorted appropriately and that your dates are actually SAS numeric values with a format such as mmddyy10 applied. (Really suspect you are NOT showing us SAS data but from a spreadsheet or other document because the SAS versions would have leading zeroes for day of the month and the month number when the value is less than 10 and variables would not have spaces in the names).
data want; set have; by personid; retain firstdate; format firstdate date9.; if first.personid then firstdate=incidentdate; else daysfromfirst = incidentdate-firstdate; run;
If you want an interval other than days that is possible as well with the intck function.
Both of the tutorials below have examples for what you're trying to do. You can use either the data step or transpose approach. For a beginner, I suspect the transpose approach is much easier to understand and follow though a bit tedious.
Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/
https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/
@JenMMerc wrote:
Good morning,
I have a line list of data (HAVE) and want to transform it (merge, modify?) so that there is a single personID with associated info (WANT, below). I know there has got to be a simple way to do this. Value of the Incident number is not always chronologically associated with the Incident Date and when the data transpose there will need to be multiple Incident and Incident Date vars. Please advise. Thank you, Jenny
HAVE Person ID Incident Incident Date A 101 6/3/2020 A 102 8/3/2020 A 101 10/2/2020 B 200 8/5/2020 C 300 8/12/2020 D 400 6/10/2020 D 401 6/15/2020
WANT Person ID Incident1 Incident Date1 Incident2 Incident Date2 Incident3 Incident Date3 A 101 6/3/2020 102 8/3/2020 101 10/2/2020 B 200 8/5/2020 C 300 8/12/2020 D 400 6/10/2020 401 6/15/2020
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.