BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Barkat
Pyrite | Level 9

Hello,

I have the following dataset

 

data have;
infile datalines dsd dlm=',' ;
input name :$3. id :$3. date1 :MMDDYY10. date2 :MMDDYY10. date3 :MMDDYY10. date4 :MMDDYY10. date5 :MMDDYY10.;
datalines;
abc,010,01/10/2017,03/20/2016,.,01/10/2017,.
def,023,.,.,04/20/2018,.,04/20/2018
pqr,125,08/10/2013,.,05/20/2012,.,.
xyz,200,01/12/2016,01/21/2016,05/10/2017,09/01/2018,03/02/2016
;
run;

 

I would like to have an output from the above that will look like the output of the following dataset. Please note that there will be no duplicate dates for an observation, and the remaining dates will be sorted and shifted to the left columns in ascending or descending order. Missing values will be shifted to the right columns.

 

I would also like to add a column that will count the number of dates of each observation. Suppose number of dates for abc is 2, number of dates for def is 1, number of dates for pqr is 2, and number of dates for xyz is 5.

 

data have;
infile datalines dsd dlm=',' ;
input name :$3. id :$3. date1 :MMDDYY10. date2 :MMDDYY10. date3 :MMDDYY10. date4 :MMDDYY10. date5 :MMDDYY10.;
datalines;
abc,010,03/20/2016,01/10/2017,.,.,.
def,023,04/20/2018,.,.,.,.,.
pqr,125,05/20/2012,08/10/2013,.,.,.
xyz,200,01/12/2016,01/21/2016,03/02/2016,05/10/2017,09/01/2018
;
run;

 

Thanks a lot.

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You can declare an array to hold sorted values of the original dates, then traverse that array copying non-duplicate values back to the original data varables:

 

data have;
infile datalines dsd dlm=',' ;
input name :$3. id :$3. date1 :MMDDYY10. date2 :MMDDYY10. date3 :MMDDYY10. date4 :MMDDYY10. date5 :MMDDYY10.;
format date:  mmddyy10. ;
datalines;
abc,010,01/10/2017,03/20/2016,.,01/10/2017,.
def,023,.,.,04/20/2018,.,04/20/2018
pqr,125,08/10/2013,.,05/20/2012,.,.
xyz,200,01/12/2016,01/21/2016,05/10/2017,09/01/2018,03/02/2016
run;

data want (drop=d t nd tmpdats: );
  set have;
  array dats {*} date: ;
  array tmpdats  {20} ;

  nd=n(of dats{*});
  if nd>0 then do;
    do d=1 to nd;
      tmpdats{d}=smallest(d,of dats{*});
    end;    
    call missing(of dats{*});

    dats{1}=tmpdats{1};
	d=1;
    if nd>1 then do t=2 to nd;
      if tmpdats{t}=tmpdats{t-1} then continue;
      d=d+1;
      dats{d}=tmpdats{t};
    end;
  end;
run;

The choice of 20 for the size of the tmpdats array is arbitrary.  Just chose a number as least as large as the number of original data variables.

 

The "if tmpdats{t}=tmpdats{t-1} then continue" check for consecutive duplicates.  If they are duplicate, then "CONTINUE" says to go to the next iteration of the do loop, thereby incrementing only t (indexing tmpdats), but not d (indexing the result array DATS).  d is incremented only when non-duplicates are found.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

5 REPLIES 5
andreas_lds
Jade | Level 19

Try:

proc transpose data=have out=transposed(drop=_name_);
	by name id;
	var date:;
run;

proc sort data=transposed(where=(not missing(col1))) out=sorted nodupkey;
	by name id col1;
run;

proc transpose data=sorted out=pre_want(drop=_name_) prefix=date;
	by name id;
	var col1;
run;

data want;
	set pre_want;
	
	array dates date:;	
	non_missing_dates = dim(dates) - nmiss(of dates[*]);
run;

Please note that there could be less than five date-variables in dataset "want", if not at least one name/id has five not missing and unique dates in dataset "have".

 

EDIT: Fixed missing creation of the variable containing the number of non-missing-dates.

mkeintz
PROC Star

You can declare an array to hold sorted values of the original dates, then traverse that array copying non-duplicate values back to the original data varables:

 

data have;
infile datalines dsd dlm=',' ;
input name :$3. id :$3. date1 :MMDDYY10. date2 :MMDDYY10. date3 :MMDDYY10. date4 :MMDDYY10. date5 :MMDDYY10.;
format date:  mmddyy10. ;
datalines;
abc,010,01/10/2017,03/20/2016,.,01/10/2017,.
def,023,.,.,04/20/2018,.,04/20/2018
pqr,125,08/10/2013,.,05/20/2012,.,.
xyz,200,01/12/2016,01/21/2016,05/10/2017,09/01/2018,03/02/2016
run;

data want (drop=d t nd tmpdats: );
  set have;
  array dats {*} date: ;
  array tmpdats  {20} ;

  nd=n(of dats{*});
  if nd>0 then do;
    do d=1 to nd;
      tmpdats{d}=smallest(d,of dats{*});
    end;    
    call missing(of dats{*});

    dats{1}=tmpdats{1};
	d=1;
    if nd>1 then do t=2 to nd;
      if tmpdats{t}=tmpdats{t-1} then continue;
      d=d+1;
      dats{d}=tmpdats{t};
    end;
  end;
run;

The choice of 20 for the size of the tmpdats array is arbitrary.  Just chose a number as least as large as the number of original data variables.

 

The "if tmpdats{t}=tmpdats{t-1} then continue" check for consecutive duplicates.  If they are duplicate, then "CONTINUE" says to go to the next iteration of the do loop, thereby incrementing only t (indexing tmpdats), but not d (indexing the result array DATS).  d is incremented only when non-duplicates are found.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Barkat
Pyrite | Level 9
@mkeintz:
I was trying to understand the purpose of each statements in this program. Could you share some documents/materials, so I can learn this type of programming, please.
Kurt_Bremser
Super User
proc transpose
  data=have
  out=want1 (
    drop=_name_
    rename=(col1=date)
    where=(date ne .)
  )
;
by name id;
var date:;
run;

proc sort data=want1 nodupkey;
by name id date;
run;

proc sql;
create table want2 as
  select
    name,
    id,
    count(*) as count
  from want1
  group by name, id
;
quit;

I did not re-transpose to wide on purpose; the "long" dataset layout is always easier to work with, and it avoids waste of storage for missing values.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 987 views
  • 3 likes
  • 4 in conversation