Re: creating new variables with dates

sophia_SAS · Posted 08-07-2012 03:12 PM

I would like to create new variables ("First_A, Second_A, Third_A, etc") that records the date of when the student received his/her first "A" (and second, third, etc). For example I have the below dataset.

data have;

input ID A_grade date_grade mmddyyy.; /*where A=1 and not-A=0*/

data lines;

1 1 05/06/2012

1 1 06/09/2010

1 0 06/11/2011

1 1 08/10/2012

2 1 02/16/2010

2 0 12/03/2011

3 0 19/11/2012

4 1 01/10/2010

4 1 07/28/2012

;;

run;

My final dataset should contain the following entries:

ID First_A Second_A Third_A

1 06/09/2010 05/06/2012 08/10/2012

2 02/16/2010 . .

4 01/10/2010 07/28/2012 .

Thank you!

Linlin · Posted 08-07-2012 03:21 PM

You need to rename the variables.

data have;

input ID A_grade date_grade mmddyy10.; /*where A=1 and not-A=0*/

cards;

1 1 05/06/2012

1 1 06/09/2010

1 0 06/11/2011

1 1 08/10/2012

2 1 02/16/2010

2 0 12/03/2011

3 0 19/11/2012

4 1 01/10/2010

4 1 07/28/2012

;

proc sort data=have(where=(a_grade=1));

by id date_grade;

run;

proc transpose data=have out=want(drop=_:) prefix=a_ ;

by id;

var date_grade;

format a_: mmddyy10.;

run;

proc print;run;

mkeintz · Posted 08-07-2012 03:23 PM

The tried and true way: two steps;

** Sort all the "A" grades by id and date **;

proc sort data=have out=have_a;

by id date;

where a=1;

run;

** Use DOW loop to put dates of A's into the corresponding A_DATE variable **;

data want (keep=id adate1-adate8);

do I=1 by 1 until (last.id);

set have_a;
by id;

array a_date {8} ;

format a_date: mmddyys10. ;

a_date{I}=date;

end;

run;

This program assumes no one has more than 8 A's, and that you don't mind having no records for students with no A's.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Linlin · Posted 08-07-2012 03:39 PM

or using array:

data have;
input ID A_grade date_grade mmddyy10.; /*where A=1 and not-A=0*/
format date_grade mmddyy10.;
cards;
1 1 05/06/2012
1 1 06/09/2010
1 0 06/11/2011
1 1 08/10/2012
2 1 02/16/2010
2 0 12/03/2011
3 0 19/11/2012
4 1 01/10/2010
4 1 07/28/2012
;
proc sort data=have(where=(a_grade=1));
by id date_grade;
run;

data temp(keep=id First_A Second_A Third_A);
   set have;
   retain First_A Second_A Third_A;
   by id;
   array _grade(*) First_A Second_A Third_A;
    if first.id then count=1;
   _grade(count)=date_grade;
   count+1;
   if last.id then do;output;
   call missing (of _grade(*));
   end;
   format First_A Second_A Third_A mmddyy10.;
proc print;run;
                       Obs    ID       First_A      Second_A       Third_A

                        1      1    06/09/2010    05/06/2012    08/10/2012
                        2      2    02/16/2010             .             .
                        3      4    01/10/2010    07/28/2012             .

Haikuo · Posted 08-07-2012 04:03 PM

If you don't want any restrictions on the quantity of 'A's that one ID can get, and you don't care about too much of naming convention 'first, second' etc, here is another approach based on Mark's code:

proc sort data=have(where=(a_grade=1));

by id date_grade;

run;

proc sql noprint;

select cats(max(ct)) into :ct from (select count(*) as ct from have group by id);

quit;

data want;

do _n_=1 by 1 until (last.id);

set have;

by id;

array a a_1-a_&ct;

format a_: mmddyy10.;

a(_n_)=date_grade;

end;

drop a_grade date_grade;

run;

Haikuo

creating new variables with dates