Solved: Base sas

u59166072 · Posted 08-26-2021 03:19 PM

Dear Team,

Please provide the output like for this data.

data ds;
infile datalines;
input id date1: $10. date2: $10.;
datalines;
100 10Jun2020 .
100 . 10Jul2021
101 10Jun2021 .
101 . 10May2021
;
run;

output required:

id date1 date2
100 10Jun2020 10Jul2021
101 10Jun2021 10May2021

FreelanceReinh · Posted 08-26-2021 05:39 PM

The first dataset in the UPDATE statement, ds(obs=0), is an empty (zero-observation) version of the dataset ds, i.e., it has all the variables, but does not contain any values. It is used as the "master file" (as it is called in the documentation). The second dataset in the UPDATE statement, ds, i.e., the full version of your input data with four observations containing missing or non-missing values of the variables, serves as the "transaction dataset" in the update process.

This means: For each BY group (i.e., group of observations with the same value of variable id) the first observation read from the transaction dataset becomes the basis for an observation in the output data set WANT because there is no observation in the master dataset. (Otherwise, the existing observation in the master dataset for that BY group would be "updated" with the first observation of the transaction dataset.) The update process then continues with the second observation of the transaction dataset: Non-missing values, here: the value of variable date2, replace existing values in the same variable (here: the missing value of date2 from the first observation). Missing values, however, do not overwrite existing values. This is why the missing value of date1 in the second observation of each BY group leaves the existing non-missing value (that was copied from the first observation) unchanged. After the last observation of a BY group has been processed, the observation created by the update process is written to dataset WANT. So, for each ID, variable date1 contains the last non-missing date1 value read from the transaction dataset and, similarly, date2 the last non-missing date2 value.

See DATA Step Processing with the UPDATE Statement for more details and also the examples in the documentation of the UPDATE statement.

View solution in original post

FreelanceReinh · Posted 08-26-2021 03:55 PM

Hello @u59166072 and welcome to the SAS Support Communities!

Try the UPDATE statement:

data want;
update ds(obs=0) ds;
by id;
run;

u59166072 · Posted 08-26-2021 04:21 PM

There is no data found the variables to date1 and date2. Please help me in this.
data below:

101 10Jun2020 .
101 . 11July2021
102 30May2020 .
102 . 10May2021

FreelanceReinh · Posted 08-26-2021 04:41 PM

Before we change the input data, let's first run my suggested DATA step on your original data and a PROC PRINT step to show dataset WANT.

Log:

11   data want;
12   update ds(obs=0) ds;
13   by id;
14   run;

NOTE: There were 0 observations read from the data set WORK.DS.
NOTE: There were 4 observations read from the data set WORK.DS.
NOTE: The data set WORK.WANT has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


15
16   proc print data=want noobs;
17   run;

NOTE: There were 2 observations read from the data set WORK.WANT.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

Output:

 id      date1        date2

100    10Jun2020    10Jul2021
101    10Jun2021    10May2021

Now compare this to the "output required":

@u59166072 wrote:

output required:

id date1 date2
100 10Jun2020 10Jul2021
101 10Jun2021 10May2021

Using the new data (copied into your DATA step) ...

data ds;
infile datalines;
input id date1: $10. date2: $10.;
datalines;
101 10Jun2020 .
101 . 11July2021
102 30May2020 .
102 . 10May2021
;

... the log looks the same and the output is:

 id      date1        date2

101    10Jun2020    11July2021
102    30May2020    10May2021

u59166072 · Posted 08-26-2021 04:49 PM

could you plz explain me in detail.

FreelanceReinh · Posted 08-26-2021 05:39 PM

The first dataset in the UPDATE statement, ds(obs=0), is an empty (zero-observation) version of the dataset ds, i.e., it has all the variables, but does not contain any values. It is used as the "master file" (as it is called in the documentation). The second dataset in the UPDATE statement, ds, i.e., the full version of your input data with four observations containing missing or non-missing values of the variables, serves as the "transaction dataset" in the update process.

This means: For each BY group (i.e., group of observations with the same value of variable id) the first observation read from the transaction dataset becomes the basis for an observation in the output data set WANT because there is no observation in the master dataset. (Otherwise, the existing observation in the master dataset for that BY group would be "updated" with the first observation of the transaction dataset.) The update process then continues with the second observation of the transaction dataset: Non-missing values, here: the value of variable date2, replace existing values in the same variable (here: the missing value of date2 from the first observation). Missing values, however, do not overwrite existing values. This is why the missing value of date1 in the second observation of each BY group leaves the existing non-missing value (that was copied from the first observation) unchanged. After the last observation of a BY group has been processed, the observation created by the update process is written to dataset WANT. So, for each ID, variable date1 contains the last non-missing date1 value read from the transaction dataset and, similarly, date2 the last non-missing date2 value.

See DATA Step Processing with the UPDATE Statement for more details and also the examples in the documentation of the UPDATE statement.

Kurt_Bremser · Posted 08-26-2021 04:01 PM

proc sql;
create table want as
  select
    id,
    max(date1) as date1 format=date9.,
    max(date2) as date2 format=date9.
  from ds
  group by id
;
quit;

or

proc summary data=ds;
by id;
var date1 date2
output out=want (drop=_:) max()=;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

ballardw · Posted 08-26-2021 04:57 PM

@u59166072 wrote:

Dear Team,

Please provide the output like for this data.

data ds;
infile datalines;
input id date1: $10. date2: $10.;
datalines;
100 10Jun2020 .
100 . 10Jul2021
101 10Jun2021 .
101 . 10May2021
;
run;

output required:

id date1 date2
100 10Jun2020 10Jul2021
101 10Jun2021 10May2021

Generic comment on dates: "Date" values as character, as you do above, almost always leads to extra work if you ever expect to use the values.

Character values do not sort properly 10Jul2021 will come before 01Jan2021 for example.

All of the functions that let you do things like check if a date is less than, greater than or between one or more dates won't work.

The Formats that allow you to create groups of related values such as Month of the year, Month and Year, Year and calendar quarter and such won't work either. Which means you may have to spend a lot of time adding additional not needed variables than if they were proper date values.

https://communities.sas.com/t5/SAS-Communities-Library/Working-with-Dates-and-Times-in-SAS-Tutorial/... has a PDF with much information about dates.

Base sas

Re: Base sas

Re: Base sas

Re: Base sas

Re: Base sas

Re: Base sas

Re: Base sas

Re: Base sas

Re: Base sas

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away