Hello,
I have the following dataset
data have;
infile datalines dsd dlm=',' ;
input name :$3. id :$3. date1 :MMDDYY10. date2 :MMDDYY10. date3 :MMDDYY10. date4 :MMDDYY10. date5 :MMDDYY10.;
datalines;
abc,010,01/10/2017,03/20/2016,.,01/10/2017,.
def,023,.,.,04/20/2018,.,04/20/2018
pqr,125,08/10/2013,.,05/20/2012,.,.
xyz,200,01/12/2016,01/21/2016,05/10/2017,09/01/2018,03/02/2016
;
run;
I would like to have an output from the above that will look like the output of the following dataset. Please note that there will be no duplicate dates for an observation, and the remaining dates will be sorted and shifted to the left columns in ascending or descending order. Missing values will be shifted to the right columns.
I would also like to add a column that will count the number of dates of each observation. Suppose number of dates for abc is 2, number of dates for def is 1, number of dates for pqr is 2, and number of dates for xyz is 5.
data have;
infile datalines dsd dlm=',' ;
input name :$3. id :$3. date1 :MMDDYY10. date2 :MMDDYY10. date3 :MMDDYY10. date4 :MMDDYY10. date5 :MMDDYY10.;
datalines;
abc,010,03/20/2016,01/10/2017,.,.,.
def,023,04/20/2018,.,.,.,.,.
pqr,125,05/20/2012,08/10/2013,.,.,.
xyz,200,01/12/2016,01/21/2016,03/02/2016,05/10/2017,09/01/2018
;
run;
Thanks a lot.
You can declare an array to hold sorted values of the original dates, then traverse that array copying non-duplicate values back to the original data varables:
data have;
infile datalines dsd dlm=',' ;
input name :$3. id :$3. date1 :MMDDYY10. date2 :MMDDYY10. date3 :MMDDYY10. date4 :MMDDYY10. date5 :MMDDYY10.;
format date: mmddyy10. ;
datalines;
abc,010,01/10/2017,03/20/2016,.,01/10/2017,.
def,023,.,.,04/20/2018,.,04/20/2018
pqr,125,08/10/2013,.,05/20/2012,.,.
xyz,200,01/12/2016,01/21/2016,05/10/2017,09/01/2018,03/02/2016
run;
data want (drop=d t nd tmpdats: );
set have;
array dats {*} date: ;
array tmpdats {20} ;
nd=n(of dats{*});
if nd>0 then do;
do d=1 to nd;
tmpdats{d}=smallest(d,of dats{*});
end;
call missing(of dats{*});
dats{1}=tmpdats{1};
d=1;
if nd>1 then do t=2 to nd;
if tmpdats{t}=tmpdats{t-1} then continue;
d=d+1;
dats{d}=tmpdats{t};
end;
end;
run;
The choice of 20 for the size of the tmpdats array is arbitrary. Just chose a number as least as large as the number of original data variables.
The "if tmpdats{t}=tmpdats{t-1} then continue" check for consecutive duplicates. If they are duplicate, then "CONTINUE" says to go to the next iteration of the do loop, thereby incrementing only t (indexing tmpdats), but not d (indexing the result array DATS). d is incremented only when non-duplicates are found.
Try:
proc transpose data=have out=transposed(drop=_name_);
by name id;
var date:;
run;
proc sort data=transposed(where=(not missing(col1))) out=sorted nodupkey;
by name id col1;
run;
proc transpose data=sorted out=pre_want(drop=_name_) prefix=date;
by name id;
var col1;
run;
data want;
set pre_want;
array dates date:;
non_missing_dates = dim(dates) - nmiss(of dates[*]);
run;
Please note that there could be less than five date-variables in dataset "want", if not at least one name/id has five not missing and unique dates in dataset "have".
EDIT: Fixed missing creation of the variable containing the number of non-missing-dates.
You can declare an array to hold sorted values of the original dates, then traverse that array copying non-duplicate values back to the original data varables:
data have;
infile datalines dsd dlm=',' ;
input name :$3. id :$3. date1 :MMDDYY10. date2 :MMDDYY10. date3 :MMDDYY10. date4 :MMDDYY10. date5 :MMDDYY10.;
format date: mmddyy10. ;
datalines;
abc,010,01/10/2017,03/20/2016,.,01/10/2017,.
def,023,.,.,04/20/2018,.,04/20/2018
pqr,125,08/10/2013,.,05/20/2012,.,.
xyz,200,01/12/2016,01/21/2016,05/10/2017,09/01/2018,03/02/2016
run;
data want (drop=d t nd tmpdats: );
set have;
array dats {*} date: ;
array tmpdats {20} ;
nd=n(of dats{*});
if nd>0 then do;
do d=1 to nd;
tmpdats{d}=smallest(d,of dats{*});
end;
call missing(of dats{*});
dats{1}=tmpdats{1};
d=1;
if nd>1 then do t=2 to nd;
if tmpdats{t}=tmpdats{t-1} then continue;
d=d+1;
dats{d}=tmpdats{t};
end;
end;
run;
The choice of 20 for the size of the tmpdats array is arbitrary. Just chose a number as least as large as the number of original data variables.
The "if tmpdats{t}=tmpdats{t-1} then continue" check for consecutive duplicates. If they are duplicate, then "CONTINUE" says to go to the next iteration of the do loop, thereby incrementing only t (indexing tmpdats), but not d (indexing the result array DATS). d is incremented only when non-duplicates are found.
Go here, and select "Data Step Programming".
proc transpose
data=have
out=want1 (
drop=_name_
rename=(col1=date)
where=(date ne .)
)
;
by name id;
var date:;
run;
proc sort data=want1 nodupkey;
by name id date;
run;
proc sql;
create table want2 as
select
name,
id,
count(*) as count
from want1
group by name, id
;
quit;
I did not re-transpose to wide on purpose; the "long" dataset layout is always easier to work with, and it avoids waste of storage for missing values.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.