BookmarkSubscribeRSS Feed
iced_tea
Obsidian | Level 7

Hello Community,

I have a following example table:

iced_tea_1-1617825349036.png

 

I want to use this dataset to create a new one, where Obs 3 and 4 are combined and output on the same row because they were on the same date, and Obs 6 and 7 are on the same row because again, they were on the same date.

/*Example of initial dataset.*/
data example;
input employee_id $ date :DATE8. AdvancedCourse IntermediaryCourse BeginnerCourse 8.;
format date DATE8.;
format AdvancedCourse IntermediaryCourse BeginnerCourse 8.;
datalines;
9781 29MAR20 1 . .
9781 09APR20 . . 1
9781 21JUN20 1 . .
9781 21JUN20 . . 1
9781 26JUN20 . . 1
1234 12MAR19 . . 1
1234 12MAR19 . 1 .
1234 15MAR19 1 . .
;
run;
proc print data=example;
run;

/*Example of eventual dataset.*/
data want;
input employee_id $ date :DATE8. AdvancedCourse IntermediaryCourse BeginnerCourse 8.;
format date DATE8.;
format AdvancedCourse IntermediaryCourse BeginnerCourse 8.;
datalines;
9781 29MAR20 1 . .
9781 09APR20 . . 1
9781 21JUN20 1 . 1
9781 26JUN20 . . 1
1234 12MAR19 . 1 1
1234 15MAR19 1 . .
;
run;
proc print data=want;
run;

Any suggestions?

Thank you.

 

1 REPLY 1
mkeintz
PROC Star

If (1) you are ok with re-ordered the data by employee_id/date, and (2) you want to keep the latest non-missing values for each employee_id/date, then UPDATE does what you want:

 

data example;
input employee_id $ date :DATE8. AdvancedCourse IntermediaryCourse BeginnerCourse 8.;
format date DATE9.;
format AdvancedCourse IntermediaryCourse BeginnerCourse 8.;
datalines;
9781 29MAR20 1 . .
9781 09APR20 . . 1
9781 21JUN20 1 . .
9781 21JUN20 . . 1
9781 26JUN20 . . 1
1234 12MAR19 . . 1
1234 12MAR19 . 1 .
1234 15MAR19 1 . .
;
run;

proc sort data=example out=need equals;
  by employee_id date;
run;
data want;
  update need (obs=0) need;
  by employee_id date;
run;

The "equals" keeps all ties in the sort keys in original order, although that shouldn't be a problem here.

 

Think of the UPDATE statement as updating a masterfile with transactions - except the only transactions that are recorded are the non-missing values - no transaction can modify a variable from non-missing to missing.  I use the "obs=0" in the left side of the update statement, because the initial "master" file (the first NEED in the update statement has multiple records per byvars, but we only want one such record in the result.

 

Unlike the above, UPDATE was originally intended for such separate master and transaction data sets, as in:

data newmaster;
  update master   transgrp1 ;
  by unique_id;
urn;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 1 reply
  • 771 views
  • 0 likes
  • 2 in conversation