Dear Team,
Please provide the output like for this data.
data ds;
infile datalines;
input id date1: $10. date2: $10.;
datalines;
100 10Jun2020 .
100 . 10Jul2021
101 10Jun2021 .
101 . 10May2021
;
run;
output required:
id date1 date2
100 10Jun2020 10Jul2021
101 10Jun2021 10May2021
The first dataset in the UPDATE statement, ds(obs=0), is an empty (zero-observation) version of the dataset ds, i.e., it has all the variables, but does not contain any values. It is used as the "master file" (as it is called in the documentation). The second dataset in the UPDATE statement, ds, i.e., the full version of your input data with four observations containing missing or non-missing values of the variables, serves as the "transaction dataset" in the update process.
This means: For each BY group (i.e., group of observations with the same value of variable id) the first observation read from the transaction dataset becomes the basis for an observation in the output data set WANT because there is no observation in the master dataset. (Otherwise, the existing observation in the master dataset for that BY group would be "updated" with the first observation of the transaction dataset.) The update process then continues with the second observation of the transaction dataset: Non-missing values, here: the value of variable date2, replace existing values in the same variable (here: the missing value of date2 from the first observation). Missing values, however, do not overwrite existing values. This is why the missing value of date1 in the second observation of each BY group leaves the existing non-missing value (that was copied from the first observation) unchanged. After the last observation of a BY group has been processed, the observation created by the update process is written to dataset WANT. So, for each ID, variable date1 contains the last non-missing date1 value read from the transaction dataset and, similarly, date2 the last non-missing date2 value.
See DATA Step Processing with the UPDATE Statement for more details and also the examples in the documentation of the UPDATE statement.
Hello @u59166072 and welcome to the SAS Support Communities!
Try the UPDATE statement:
data want;
update ds(obs=0) ds;
by id;
run;
Before we change the input data, let's first run my suggested DATA step on your original data and a PROC PRINT step to show dataset WANT.
Log:
11 data want; 12 update ds(obs=0) ds; 13 by id; 14 run; NOTE: There were 0 observations read from the data set WORK.DS. NOTE: There were 4 observations read from the data set WORK.DS. NOTE: The data set WORK.WANT has 2 observations and 3 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.01 seconds 15 16 proc print data=want noobs; 17 run; NOTE: There were 2 observations read from the data set WORK.WANT. NOTE: PROCEDURE PRINT used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
Output:
id date1 date2 100 10Jun2020 10Jul2021 101 10Jun2021 10May2021
Now compare this to the "output required":
@u59166072 wrote:
output required:
id date1 date2
100 10Jun2020 10Jul2021
101 10Jun2021 10May2021
Using the new data (copied into your DATA step) ...
data ds;
infile datalines;
input id date1: $10. date2: $10.;
datalines;
101 10Jun2020 .
101 . 11July2021
102 30May2020 .
102 . 10May2021
;
... the log looks the same and the output is:
id date1 date2 101 10Jun2020 11July2021 102 30May2020 10May2021
The first dataset in the UPDATE statement, ds(obs=0), is an empty (zero-observation) version of the dataset ds, i.e., it has all the variables, but does not contain any values. It is used as the "master file" (as it is called in the documentation). The second dataset in the UPDATE statement, ds, i.e., the full version of your input data with four observations containing missing or non-missing values of the variables, serves as the "transaction dataset" in the update process.
This means: For each BY group (i.e., group of observations with the same value of variable id) the first observation read from the transaction dataset becomes the basis for an observation in the output data set WANT because there is no observation in the master dataset. (Otherwise, the existing observation in the master dataset for that BY group would be "updated" with the first observation of the transaction dataset.) The update process then continues with the second observation of the transaction dataset: Non-missing values, here: the value of variable date2, replace existing values in the same variable (here: the missing value of date2 from the first observation). Missing values, however, do not overwrite existing values. This is why the missing value of date1 in the second observation of each BY group leaves the existing non-missing value (that was copied from the first observation) unchanged. After the last observation of a BY group has been processed, the observation created by the update process is written to dataset WANT. So, for each ID, variable date1 contains the last non-missing date1 value read from the transaction dataset and, similarly, date2 the last non-missing date2 value.
See DATA Step Processing with the UPDATE Statement for more details and also the examples in the documentation of the UPDATE statement.
proc sql;
create table want as
select
id,
max(date1) as date1 format=date9.,
max(date2) as date2 format=date9.
from ds
group by id
;
quit;
or
proc summary data=ds;
by id;
var date1 date2
output out=want (drop=_:) max()=;
run;
@u59166072 wrote:
Dear Team,
Please provide the output like for this data.
data ds;
infile datalines;
input id date1: $10. date2: $10.;
datalines;
100 10Jun2020 .
100 . 10Jul2021
101 10Jun2021 .
101 . 10May2021
;
run;
output required:
id date1 date2
100 10Jun2020 10Jul2021
101 10Jun2021 10May2021
Generic comment on dates: "Date" values as character, as you do above, almost always leads to extra work if you ever expect to use the values.
Character values do not sort properly 10Jul2021 will come before 01Jan2021 for example.
All of the functions that let you do things like check if a date is less than, greater than or between one or more dates won't work.
The Formats that allow you to create groups of related values such as Month of the year, Month and Year, Year and calendar quarter and such won't work either. Which means you may have to spend a lot of time adding additional not needed variables than if they were proper date values.
https://communities.sas.com/t5/SAS-Communities-Library/Working-with-Dates-and-Times-in-SAS-Tutorial/... has a PDF with much information about dates.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.