01-09-2013 10:03 AM
I have yearly data including a variable for dates of Event1. The same person could have experienced Event1 multiple times and on different dates. Right now the data are like this:
Person A Date1
Person B Date1
Person A Date2
Person A Date3
Person C Date1
Person B Date2
However, I'd like it to be like this:
Person A Date1 Date2 Date3
Person B Date1 Date2 -------
Person C Date 1 ------- --------
Any help is much appreciated!
01-09-2013 10:32 AM
INPUT participant $ date: mmddyy8.;
FORMAT date date9.;
PROC SORT DATA = have;
BY participant date;
PROC TRANSPOSE DATA = have OUT = have2;
SET have2 (RENAME = (COL1 = date1 COL2 = Date2 COL3 = Date3));
01-09-2013 10:43 AM
I think the code below should work, I havent tested it so some adjustment may be required:
*find max number of dates a person can have;
create table temp as
select person, count(*) as cnt
group by person
select into max(cnt) into :maxcnt
*sort data by person and date;
proc sort data=data; by person date;
*use an array to collapse observations;
by person date;
array dates (&maxcnt) date1 - date&maxcnt ;
if first.person then do i = 1 to &maxcnt;
dates(i) = .;
row = i;
01-09-2013 11:41 AM
Just a few notes ...
Either approach is viable.
Katie's solution works as is, although it could eliminate the final step by adding prefix=DATE to the PROC TRANSPOSE.
Esjackso's solution has the right idea, but needs a tiny bit of tweaking. Instead of row=i, it should read row=1. And there should be another statement added after the END statement: else row + 1;
Finally, you may not need to do this at all. Depending on what you are ultimately trying to accomplish, there are plenty of programming techniques that will let you work with the data in its current form. In fact, if all you need is a report, you should be able to use PROC REPORT with (I think?) the ACROSS feature. If you are willing to spell out a little more about where you are headed after transposing the data, I'm sure you'll get plenty of suggestions.
01-09-2013 01:14 PM
Here are a couple of ideas to consider. While you might not use them this time around, these are basic, necessary tools to know in the long run.
If only the first and last dates are important, you can get them into a summary data set:
proc summary data=have nway;
output out=stats min=earliest_date max=latest_date;
Studying up on CLASS, NWAY, and the structure of the output data set, will be an important part of your learning.
If you need to examine each pair of dates, from earliest to latest:
proc sort data=have;
by participant date;
by participant date;
prior_date = lag(date);
if first.participant=0 then do;
* add statements to make a comparison;
More studying: the impact of a BY statement in a DATA step (vital), the LAG function (not as important, but useful in this case and trickier than it looks).