BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Terekhin
Calcite | Level 5

HI! 

SAS 9.4

 

i'm trying to merge 2 datasets. 

When i merges this tables i want to change column "name" with condition "st = "POMPLE" and DATE < "01MAR2017"d" , but program changes all observation. Why it's happen? I know that if i do it in a new data step, everything will be executed correctly. I want to undenstand how SAS creating PDV in merging.

 

 

 

data first_data;
input st $ date date9. ;
attrib date format=date9.;
datalines;
POMPLE 28FEB2017
POMPLE 28FEB2017
POMPLE 01MAR2017
POMPLE 01MAR2017
POMPLE 02MAR2017
POMPLE 02MAR2017
MINR 28FEB2017
MINR 02MAR2017
;
run;
data sec_data;
input st $ name $;
datalines;
POMPLE ANNA
MINR GERRY
;
run;
proc sort data=first_data;
by st;
run;
proc sort data=sec_data;
by st;
run;
data merges_clon;
merge first_data(in=a) sec_data(in=b);
by st;
run;
data merges;
merge first_data(in=a) sec_data(in=b);
by st;
IF st = "POMPLE" and DATE < "01MAR2017"d THEN
name = "MOLLY";
run;

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

@Terekhin wrote:

HI! 

SAS 9.4

 

i'm trying to merge 2 datasets. 

When i merges this tables i want to change column "name" with condition "st = "POMPLE" and DATE < "01MAR2017"d" , but program changes all observation. Why it's happen? I know that if i do it in a new data step, everything will be executed correctly. I want to undenstand how SAS creating PDV in merging.

 

 

 

data first_data;
input st $ date date9. ;
attrib date format=date9.;
datalines;
POMPLE 28FEB2017
POMPLE 28FEB2017
POMPLE 01MAR2017
POMPLE 01MAR2017
POMPLE 02MAR2017
POMPLE 02MAR2017
MINR 28FEB2017
MINR 02MAR2017
;
run;
data sec_data;
input st $ name $;
datalines;
POMPLE ANNA
MINR GERRY
;
run;
proc sort data=first_data;
by st;
run;
proc sort data=sec_data;
by st;
run;
data merges_clon;
merge first_data(in=a) sec_data(in=b);
by st;
run;
data merges;
merge first_data(in=a) sec_data(in=b);
by st;
IF st = "POMPLE" and DATE < "01MAR2017"d THEN
name = "MOLLY";
run;


This is part of the mechanics of the data step.

Variables from input datasets are always retained (not set to missing), so what happens is this:

In the first iteration of the data step, both datasets are read, the condition is found to be true, and name is set to MOLLY. From the second iteration on, no further observations for st = 'POMPLE' are present in the second dataset, so the value 'MOLLY' is retained and ends up in all output rows.

View solution in original post

3 REPLIES 3
V_27
Obsidian | Level 7

 

 

 Hi @Terekhin

 

Is this what you are trying to do?

data first_data;
input st $ date date9. ;
attrib date format=date9.;
datalines;
POMPLE 28FEB2017
POMPLE 28FEB2017
POMPLE 01MAR2017
POMPLE 01MAR2017
POMPLE 02MAR2017
POMPLE 02MAR2017
MINR 28FEB2017
MINR 02MAR2017
;
run;

data sec_data;
input st $ name $;
datalines;
POMPLE ANNA
MINR GERRY
;
run;

proc sort data=first_data;
by st;
run;
proc sort data=sec_data;
by st;
run;
data merges_clon;
merge first_data(in=a where=(st="POMPLE" and date<"01MAR2017"d)) sec_data(in=b);
by st;
run;

 

Astounding
PROC Star

Believe it or not, that's a standard result from MERGE.  The problem is this.

 

SAS reads each observation only once as it merges the data.  That means it reads each observation from SEC_DATA only once.  It retains the values it has, which is why NAME repeats in the merged data.

 

Once you change "ANNA" to "MOLLY" SAS retains the value "MOLLY".  It never goes back and re-reads "ANNA" from the incoming data.

 

There are several ways to fix this.  The easiest way is to create a new variable (NAME2) based on NAME plus your other conditions.

Kurt_Bremser
Super User

@Terekhin wrote:

HI! 

SAS 9.4

 

i'm trying to merge 2 datasets. 

When i merges this tables i want to change column "name" with condition "st = "POMPLE" and DATE < "01MAR2017"d" , but program changes all observation. Why it's happen? I know that if i do it in a new data step, everything will be executed correctly. I want to undenstand how SAS creating PDV in merging.

 

 

 

data first_data;
input st $ date date9. ;
attrib date format=date9.;
datalines;
POMPLE 28FEB2017
POMPLE 28FEB2017
POMPLE 01MAR2017
POMPLE 01MAR2017
POMPLE 02MAR2017
POMPLE 02MAR2017
MINR 28FEB2017
MINR 02MAR2017
;
run;
data sec_data;
input st $ name $;
datalines;
POMPLE ANNA
MINR GERRY
;
run;
proc sort data=first_data;
by st;
run;
proc sort data=sec_data;
by st;
run;
data merges_clon;
merge first_data(in=a) sec_data(in=b);
by st;
run;
data merges;
merge first_data(in=a) sec_data(in=b);
by st;
IF st = "POMPLE" and DATE < "01MAR2017"d THEN
name = "MOLLY";
run;


This is part of the mechanics of the data step.

Variables from input datasets are always retained (not set to missing), so what happens is this:

In the first iteration of the data step, both datasets are read, the condition is found to be true, and name is set to MOLLY. From the second iteration on, no further observations for st = 'POMPLE' are present in the second dataset, so the value 'MOLLY' is retained and ends up in all output rows.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1304 views
  • 0 likes
  • 4 in conversation