Solved: TRANSFORMING DATA

JenMMerc · Posted 12-08-2020 12:57 PM

Good morning,

I have a line list of data (HAVE) and want to transform it (merge, modify?) so that there is a single personID with associated info (WANT, below). I know there has got to be a simple way to do this. Value of the Incident number is not always chronologically associated with the Incident Date and when the data transpose there will need to be multiple Incident and Incident Date vars. Please advise. Thank you, Jenny

HAVE
Person ID	Incident	Incident Date
A	101	6/3/2020
A	102	8/3/2020
A	101	10/2/2020
B	200	8/5/2020
C	300	8/12/2020
D	400	6/10/2020
D	401	6/15/2020

WANT
Person ID	Incident1	Incident Date1	Incident2	Incident Date2	Incident3	Incident Date3
A	101	6/3/2020	102	8/3/2020	101	10/2/2020
B	200	8/5/2020
C	300	8/12/2020
D	400	6/10/2020	401	6/15/2020

ballardw · Posted 12-08-2020 02:11 PM

@JenMMerc wrote:

It's not for an analysis. I am trying to identify individuals and their speciifc incidents and time of incidents from a large data set. I need to determine the time period from the initial incident to the most current incident. The data are not entered into the database consistently (for any PersonID a new incident could be entered under the old incident number with a new date or be entered as a new incident number) and I am trying to account for that. Each separate incident (whether the number is the same as a previous incident number or a new incident number) needs to be associated with the PersonID. Does that make sense?

How does a wide format help with the highlighted text?

You need to carefully define some terms. Your first person has two incident ids, the latest date is associated with the same incident id as the first with a different id sandwiched between the two dates.

So is incidentid = 102 the "most current" , or do you just want to determine the intervals between the first record and each following record for the same person?

That is much easier in long form. The following assumes your data is sorted appropriately and that your dates are actually SAS numeric values with a format such as mmddyy10 applied. (Really suspect you are NOT showing us SAS data but from a spreadsheet or other document because the SAS versions would have leading zeroes for day of the month and the month number when the value is less than 10 and variables would not have spaces in the names).

data want;
   set have;
   by personid;
   retain firstdate;
   format firstdate date9.;
   if first.personid then firstdate=incidentdate;
   else daysfromfirst = incidentdate-firstdate;
run;

If you want an interval other than days that is possible as well with the intck function.

View solution in original post

PaigeMiller · Posted 12-08-2020 01:01 PM

My usual advice is to leave your data set long and not to transform it. Why? Because most analyses in SAS are easier to program from a long data set, and tedious and difficult to program if you make the data set wide like you are asking.

What analysis are you going to do after you do this transformation?

--
Paige Miller

JenMMerc · Posted 12-08-2020 01:11 PM

It's not for an analysis. I am trying to identify individuals and their speciifc incidents and time of incidents from a large data set. I need to determine the time period from the initial incident to the most current incident. The data are not entered into the database consistently (for any PersonID a new incident could be entered under the old incident number with a new date or be entered as a new incident number) and I am trying to account for that. Each separate incident (whether the number is the same as a previous incident number or a new incident number) needs to be associated with the PersonID. Does that make sense?

Reeza · Posted 12-08-2020 01:18 PM

Wouldn't that be the largest date minus the smallest date, which is the Range of a variable? You can calculate that automatically without aggregating or doing any manipulations for each personID. If you need a different grouping you can modify the BY or CLASS statement.

proc means data=have range noprint;
by personID;
var incidentDate;
output out=want range=duration;
run;

PaigeMiller · Posted 12-08-2020 01:19 PM

It's not for an analysis. I am trying to identify individuals and their speciifc incidents and time of incidents from a large data set. I need to determine the time period from the initial incident to the most current incident

This is very easy without transforming your data (as explained by @Reeza ) And by the way, this is an analysis.

The data are not entered into the database consistently (for any PersonID a new incident could be entered under the old incident number with a new date or be entered as a new incident number) and I am trying to account for that.

This data cleaning can be done without transforming the data, and I don't see how transforming the data improves the situation.

--
Paige Miller

ballardw · Posted 12-08-2020 02:11 PM

@JenMMerc wrote:

It's not for an analysis. I am trying to identify individuals and their speciifc incidents and time of incidents from a large data set. I need to determine the time period from the initial incident to the most current incident. The data are not entered into the database consistently (for any PersonID a new incident could be entered under the old incident number with a new date or be entered as a new incident number) and I am trying to account for that. Each separate incident (whether the number is the same as a previous incident number or a new incident number) needs to be associated with the PersonID. Does that make sense?

How does a wide format help with the highlighted text?

You need to carefully define some terms. Your first person has two incident ids, the latest date is associated with the same incident id as the first with a different id sandwiched between the two dates.

So is incidentid = 102 the "most current" , or do you just want to determine the intervals between the first record and each following record for the same person?

That is much easier in long form. The following assumes your data is sorted appropriately and that your dates are actually SAS numeric values with a format such as mmddyy10 applied. (Really suspect you are NOT showing us SAS data but from a spreadsheet or other document because the SAS versions would have leading zeroes for day of the month and the month number when the value is less than 10 and variables would not have spaces in the names).

data want;
   set have;
   by personid;
   retain firstdate;
   format firstdate date9.;
   if first.personid then firstdate=incidentdate;
   else daysfromfirst = incidentdate-firstdate;
run;

If you want an interval other than days that is possible as well with the intck function.

Reeza · Posted 12-08-2020 01:06 PM

Both of the tutorials below have examples for what you're trying to do. You can use either the data step or transpose approach. For a beginner, I suspect the transpose approach is much easier to understand and follow though a bit tedious.

Transposing data tutorials:
Long to Wide:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-long-to-wide-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-long-to-wide-using-the-data-step/

@JenMMerc wrote:

Good morning,

I have a line list of data (HAVE) and want to transform it (merge, modify?) so that there is a single personID with associated info (WANT, below). I know there has got to be a simple way to do this. Value of the Incident number is not always chronologically associated with the Incident Date and when the data transpose there will need to be multiple Incident and Incident Date vars. Please advise. Thank you, Jenny

HAVE

Person ID Incident Incident Date

A 101 6/3/2020

A 102 8/3/2020

A 101 10/2/2020

B 200 8/5/2020

C 300 8/12/2020

D 400 6/10/2020

D 401 6/15/2020

WANT

Person ID Incident1 Incident Date1 Incident2 Incident Date2 Incident3 Incident Date3

A 101 6/3/2020 102 8/3/2020 101 10/2/2020

B 200 8/5/2020

C 300 8/12/2020

D 400 6/10/2020 401 6/15/2020

Ksharp · Posted 12-09-2020 08:40 AM

"Transpose Datset by MERGE"
http://support.sas.com/resources/papers/proceedings15/2785-2015.pdf

TRANSFORMING DATA

Re: TRANSFORMING DATA

Re: TRANSFORMING DATA

Re: TRANSFORMING DATA

Re: TRANSFORMING DATA

Re: TRANSFORMING DATA

Re: TRANSFORMING DATA

Re: TRANSFORMING DATA

Re: TRANSFORMING DATA

Registration is open