Solved: merging

Terekhin · Posted 10-16-2018 09:34 AM

HI!

SAS 9.4

i'm trying to merge 2 datasets.

When i merges this tables i want to change column "name" with condition "st = "POMPLE" and DATE < "01MAR2017"d" , but program changes all observation. Why it's happen? I know that if i do it in a new data step, everything will be executed correctly. I want to undenstand how SAS creating PDV in merging.

data first_data;
input st $ date date9. ;
attrib date format=date9.;
datalines;
POMPLE 28FEB2017
POMPLE 28FEB2017
POMPLE 01MAR2017
POMPLE 01MAR2017
POMPLE 02MAR2017
POMPLE 02MAR2017
MINR 28FEB2017
MINR 02MAR2017
;
run;
data sec_data;
input st $ name $;
datalines;
POMPLE ANNA
MINR GERRY
;
run;
proc sort data=first_data;
by st;
run;
proc sort data=sec_data;
by st;
run;
data merges_clon;
merge first_data(in=a) sec_data(in=b);
by st;
run;
data merges;
merge first_data(in=a) sec_data(in=b);
by st;
IF st = "POMPLE" and DATE < "01MAR2017"d THEN
name = "MOLLY";
run;

Kurt_Bremser · Posted 10-16-2018 10:06 AM

@Terekhin wrote:

HI!

SAS 9.4

i'm trying to merge 2 datasets.

When i merges this tables i want to change column "name" with condition "st = "POMPLE" and DATE < "01MAR2017"d" , but program changes all observation. Why it's happen? I know that if i do it in a new data step, everything will be executed correctly. I want to undenstand how SAS creating PDV in merging.

data first_data;
input st $ date date9. ;
attrib date format=date9.;
datalines;
POMPLE 28FEB2017
POMPLE 28FEB2017
POMPLE 01MAR2017
POMPLE 01MAR2017
POMPLE 02MAR2017
POMPLE 02MAR2017
MINR 28FEB2017
MINR 02MAR2017
;
run;
data sec_data;
input st $ name $;
datalines;
POMPLE ANNA
MINR GERRY
;
run;
proc sort data=first_data;
by st;
run;
proc sort data=sec_data;
by st;
run;
data merges_clon;
merge first_data(in=a) sec_data(in=b);
by st;
run;
data merges;
merge first_data(in=a) sec_data(in=b);
by st;
IF st = "POMPLE" and DATE < "01MAR2017"d THEN
name = "MOLLY";
run;

This is part of the mechanics of the data step.

Variables from input datasets are always retained (not set to missing), so what happens is this:

In the first iteration of the data step, both datasets are read, the condition is found to be true, and name is set to MOLLY. From the second iteration on, no further observations for st = 'POMPLE' are present in the second dataset, so the value 'MOLLY' is retained and ends up in all output rows.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

V_27 · Posted 10-16-2018 09:59 AM

Hi @Terekhin

Is this what you are trying to do?

data first_data;
input st $ date date9. ;
attrib date format=date9.;
datalines;
POMPLE 28FEB2017
POMPLE 28FEB2017
POMPLE 01MAR2017
POMPLE 01MAR2017
POMPLE 02MAR2017
POMPLE 02MAR2017
MINR 28FEB2017
MINR 02MAR2017
;
run;

data sec_data;
input st $ name $;
datalines;
POMPLE ANNA
MINR GERRY
;
run;

proc sort data=first_data;
by st;
run;
proc sort data=sec_data;
by st;
run;
data merges_clon;
merge first_data(in=a where=(st="POMPLE" and date<"01MAR2017"d)) sec_data(in=b);
by st;
run;

Astounding · Posted 10-16-2018 10:02 AM

Believe it or not, that's a standard result from MERGE. The problem is this.

SAS reads each observation only once as it merges the data. That means it reads each observation from SEC_DATA only once. It retains the values it has, which is why NAME repeats in the merged data.

Once you change "ANNA" to "MOLLY" SAS retains the value "MOLLY". It never goes back and re-reads "ANNA" from the incoming data.

There are several ways to fix this. The easiest way is to create a new variable (NAME2) based on NAME plus your other conditions.

Kurt_Bremser · Posted 10-16-2018 10:06 AM

@Terekhin wrote:

HI!

SAS 9.4

i'm trying to merge 2 datasets.

When i merges this tables i want to change column "name" with condition "st = "POMPLE" and DATE < "01MAR2017"d" , but program changes all observation. Why it's happen? I know that if i do it in a new data step, everything will be executed correctly. I want to undenstand how SAS creating PDV in merging.

data first_data;
input st $ date date9. ;
attrib date format=date9.;
datalines;
POMPLE 28FEB2017
POMPLE 28FEB2017
POMPLE 01MAR2017
POMPLE 01MAR2017
POMPLE 02MAR2017
POMPLE 02MAR2017
MINR 28FEB2017
MINR 02MAR2017
;
run;
data sec_data;
input st $ name $;
datalines;
POMPLE ANNA
MINR GERRY
;
run;
proc sort data=first_data;
by st;
run;
proc sort data=sec_data;
by st;
run;
data merges_clon;
merge first_data(in=a) sec_data(in=b);
by st;
run;
data merges;
merge first_data(in=a) sec_data(in=b);
by st;
IF st = "POMPLE" and DATE < "01MAR2017"d THEN
name = "MOLLY";
run;

This is part of the mechanics of the data step.

Variables from input datasets are always retained (not set to missing), so what happens is this:

In the first iteration of the data step, both datasets are read, the condition is found to be true, and name is set to MOLLY. From the second iteration on, no further observations for st = 'POMPLE' are present in the second dataset, so the value 'MOLLY' is retained and ends up in all output rows.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

merging

Re: merging

Re: merging

Re: merging

Re: merging

merging

Re: merging

Re: merging

Re: merging

Re: merging

The 2025 SAS Hackathon has begun!