BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
u59166072
Calcite | Level 5

Dear Team,

Please provide the output like for this data.

data ds;
infile datalines;
input id date1: $10. date2: $10.;
datalines;
100 10Jun2020 .
100 . 10Jul2021
101 10Jun2021 .
101 . 10May2021
;
run;

 


output required:


id date1 date2
100 10Jun2020 10Jul2021
101 10Jun2021 10May2021

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

The first dataset in the UPDATE statement, ds(obs=0), is an empty (zero-observation) version of the dataset ds, i.e., it has all the variables, but does not contain any values. It is used as the "master file" (as it is called in the documentation). The second dataset in the UPDATE statement, ds, i.e., the full version of your input data with four observations containing missing or non-missing values of the variables, serves as the "transaction dataset" in the update process.

 

This means: For each BY group (i.e., group of observations with the same value of variable id) the first observation read from the transaction dataset becomes the basis for an observation in the output data set WANT because there is no observation in the master dataset. (Otherwise, the existing observation in the master dataset for that BY group would be "updated" with the first observation of the transaction dataset.) The update process then continues with the second observation of the transaction dataset: Non-missing values, here: the value of variable date2, replace existing values in the same variable (here: the missing value of date2 from the first observation). Missing values, however, do not overwrite existing values. This is why the missing value of date1 in the second observation of each BY group leaves the existing non-missing value (that was copied from the first observation) unchanged. After the last observation of a BY group has been processed, the observation created by the update process is written to dataset WANT. So, for each ID, variable date1 contains the last non-missing date1 value read from the transaction dataset and, similarly, date2 the last non-missing date2 value.

 

See DATA Step Processing with the UPDATE Statement for more details and also the examples in the documentation of the UPDATE statement.

View solution in original post

7 REPLIES 7
FreelanceReinh
Jade | Level 19

Hello @u59166072 and welcome to the SAS Support Communities!

 

Try the UPDATE statement:

data want;
update ds(obs=0) ds;
by id;
run;
u59166072
Calcite | Level 5
There is no data found the variables to date1 and date2. Please help me in this.
data below:

101 10Jun2020 .
101 . 11July2021
102 30May2020 .
102 . 10May2021
FreelanceReinh
Jade | Level 19

Before we change the input data, let's first run my suggested DATA step on your original data and a PROC PRINT step to show dataset WANT.

 

Log:

11   data want;
12   update ds(obs=0) ds;
13   by id;
14   run;

NOTE: There were 0 observations read from the data set WORK.DS.
NOTE: There were 4 observations read from the data set WORK.DS.
NOTE: The data set WORK.WANT has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


15
16   proc print data=want noobs;
17   run;

NOTE: There were 2 observations read from the data set WORK.WANT.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Output:

 id      date1        date2

100    10Jun2020    10Jul2021
101    10Jun2021    10May2021

Now compare this to the "output required":

@u59166072 wrote:

output required:


id date1 date2
100 10Jun2020 10Jul2021
101 10Jun2021 10May2021


Using the new data (copied into your DATA step) ...

data ds;
infile datalines;
input id date1: $10. date2: $10.;
datalines;
101 10Jun2020 .
101 . 11July2021
102 30May2020 .
102 . 10May2021
;

... the log looks the same and the output is:

 id      date1        date2

101    10Jun2020    11July2021
102    30May2020    10May2021
u59166072
Calcite | Level 5
could you plz explain me in detail.
FreelanceReinh
Jade | Level 19

The first dataset in the UPDATE statement, ds(obs=0), is an empty (zero-observation) version of the dataset ds, i.e., it has all the variables, but does not contain any values. It is used as the "master file" (as it is called in the documentation). The second dataset in the UPDATE statement, ds, i.e., the full version of your input data with four observations containing missing or non-missing values of the variables, serves as the "transaction dataset" in the update process.

 

This means: For each BY group (i.e., group of observations with the same value of variable id) the first observation read from the transaction dataset becomes the basis for an observation in the output data set WANT because there is no observation in the master dataset. (Otherwise, the existing observation in the master dataset for that BY group would be "updated" with the first observation of the transaction dataset.) The update process then continues with the second observation of the transaction dataset: Non-missing values, here: the value of variable date2, replace existing values in the same variable (here: the missing value of date2 from the first observation). Missing values, however, do not overwrite existing values. This is why the missing value of date1 in the second observation of each BY group leaves the existing non-missing value (that was copied from the first observation) unchanged. After the last observation of a BY group has been processed, the observation created by the update process is written to dataset WANT. So, for each ID, variable date1 contains the last non-missing date1 value read from the transaction dataset and, similarly, date2 the last non-missing date2 value.

 

See DATA Step Processing with the UPDATE Statement for more details and also the examples in the documentation of the UPDATE statement.

Kurt_Bremser
Super User
proc sql;
create table want as
  select
    id,
    max(date1) as date1 format=date9.,
    max(date2) as date2 format=date9.
  from ds
  group by id
;
quit;

or

proc summary data=ds;
by id;
var date1 date2
output out=want (drop=_:) max()=;
run;
ballardw
Super User

@u59166072 wrote:

Dear Team,

Please provide the output like for this data.

data ds;
infile datalines;
input id date1: $10. date2: $10.;
datalines;
100 10Jun2020 .
100 . 10Jul2021
101 10Jun2021 .
101 . 10May2021
;
run;

 


output required:


id date1 date2
100 10Jun2020 10Jul2021
101 10Jun2021 10May2021


Generic comment on dates: "Date" values as character, as you do above, almost always leads to extra work if you ever expect to use the values.

Character values do not sort properly  10Jul2021 will come before 01Jan2021 for example.

All of the functions that let you do things like check if a date is less than, greater than or between one or more dates won't work.

The Formats that allow you to create groups of related values such as Month of the year, Month and Year, Year and calendar quarter and such won't work either. Which means you may have to spend a lot of time adding additional not needed variables than if they were proper date values.

 

https://communities.sas.com/t5/SAS-Communities-Library/Working-with-Dates-and-Times-in-SAS-Tutorial/... has a PDF with much information about dates.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1032 views
  • 0 likes
  • 4 in conversation