- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Posted 04-15-2011 07:11 AM
(950 views)
I ran the two pieces of code below and they give me different outputs.
When I execute Code A when it starts processing data from the second table the first observation calculates the variable 'customer_id' correctly, however subsequent observations results in the 'customer_id' being retained for all remaining observations.
Code B produces values for customer_id_3 as I'd expect without retaining values.
Is there a fundamental reason why Code A retains values for the second table however code B does not? Thanks in advance!
/* Code A */
data acc_cust_match_test;
set table1 (keep= customer_id)
table2 (keep= customer_id rename=(customer_id = customer_id_2));
if customer_id = '' then customer_id = '000'||substr(customer_id_2,4);
run;
/* Code B */
data acc_cust_match_test2;
set table1 (keep= customer_id probeacctnumber)
table2 (keep= customer_id rename=(customer_id = customer_id_2));
length customer_id3 $20.;
if customer_id = '' then customer_id_3 = '000'||substr(customer_id_2,4);else
customer_id_3 = customer_id;
run;
Note: all customer_id variables and variants are character variables of length 20.
When I execute Code A when it starts processing data from the second table the first observation calculates the variable 'customer_id' correctly, however subsequent observations results in the 'customer_id' being retained for all remaining observations.
Code B produces values for customer_id_3 as I'd expect without retaining values.
Is there a fundamental reason why Code A retains values for the second table however code B does not? Thanks in advance!
/* Code A */
data acc_cust_match_test;
set table1 (keep= customer_id)
table2 (keep= customer_id rename=(customer_id = customer_id_2));
if customer_id = '' then customer_id = '000'||substr(customer_id_2,4);
run;
/* Code B */
data acc_cust_match_test2;
set table1 (keep= customer_id probeacctnumber)
table2 (keep= customer_id rename=(customer_id = customer_id_2));
length customer_id3 $20.;
if customer_id = '' then customer_id_3 = '000'||substr(customer_id_2,4);else
customer_id_3 = customer_id;
run;
Note: all customer_id variables and variants are character variables of length 20.
3 REPLIES 3
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
With your SET statement, you are concatenating the two input files -- note that you have a RENAME= only on the second file, so for those contributing observations the customer_id variable will not be initialized from the SET so the value will always be blank. And with that said, your second code piece will never assign customer_id as in the first code, because you have assigning customer_id_3, not customer_id, within the DATA step.
Suggest you can do some self-diagnosis with your DATA step, using this line:
PUTLOG _ALL_;
Also, be careful using the reference "retain" because it is more accurate that your code is doing a SAS variable "assignment" -- SAS language has a RETAIN statement which is used for other purposes that what is demonstrated in your code.
Scott Barry
SBBWorks, Inc.
Suggest you can do some self-diagnosis with your DATA step, using this line:
PUTLOG _ALL_;
Also, be careful using the reference "retain" because it is more accurate that your code is doing a SAS variable "assignment" -- SAS language has a RETAIN statement which is used for other purposes that what is demonstrated in your code.
Scott Barry
SBBWorks, Inc.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Let's make the code a bit simpler.
/* test data sets */
data table1;
input id :$8. @@;
cards;
1 2 3
;
run;
data table2;
input id :$8. @@;
cards;
4 5
;
run;
data a;
set table1 table2(rename=(id=id2));
if missing(id) then id = "000" || id2;
run;
proc print; run;
/* on lst
Obs id id2
1 1
2 2
3 3
4 0004 4
5 0004 5 <- id is not reset here
*/
The question is: why the id for the fifth obs "0004", not "0005"?
This is because when the SET statement is reading observations from TABLE2, it does not reset ID variable to missing (because it is not in the TABLE2 due to RENAME) after the initial pdv resetting as the SET starts handling TABLE2. Thus once the ID is assigned "0004", it remains "0004". Since "0004" is not missing, the IF condition evaluates to FALSE and no further assignments are done.
For more detailed explanation on how data step works, see Howard Schreier's excellent SUGI30 paper titled "Let Your Data Power Your DATA Step: Making Effective
Use of the SET, MERGE, UPDATE, and MODIFY Stat....
/* test data sets */
data table1;
input id :$8. @@;
cards;
1 2 3
;
run;
data table2;
input id :$8. @@;
cards;
4 5
;
run;
data a;
set table1 table2(rename=(id=id2));
if missing(id) then id = "000" || id2;
run;
proc print; run;
/* on lst
Obs id id2
1 1
2 2
3 3
4 0004 4
5 0004 5 <- id is not reset here
*/
The question is: why the id for the fifth obs "0004", not "0005"?
This is because when the SET statement is reading observations from TABLE2, it does not reset ID variable to missing (because it is not in the TABLE2 due to RENAME) after the initial pdv resetting as the SET starts handling TABLE2. Thus once the ID is assigned "0004", it remains "0004". Since "0004" is not missing, the IF condition evaluates to FALSE and no further assignments are done.
For more detailed explanation on how data step works, see Howard Schreier's excellent SUGI30 paper titled "Let Your Data Power Your DATA Step: Making Effective
Use of the SET, MERGE, UPDATE, and MODIFY Stat....
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
There are some information I maybe want to say.
After reading the first dataset.SAS will set all the variables in PDV missing,
So 4 will be missing.and you set it 004.
After that, because id is from first dataset ,so it will be retain in PDV (any variable in dataset will be retain ) ,so 5 will not be missing ,it is also 004.
I do not know whether it is the reason to make so strange result.
Ksharp
After reading the first dataset.SAS will set all the variables in PDV missing,
So 4 will be missing.and you set it 004.
After that, because id is from first dataset ,so it will be retain in PDV (any variable in dataset will be retain ) ,so 5 will not be missing ,it is also 004.
I do not know whether it is the reason to make so strange result.
Ksharp