Hello ,
I have 2 different datasets, listed below. I have requirement - for every account in dataset1 it should iterate through each row in dataset2 and check if dataset1.groupid=dataset2.groupid and if it matches update dataset2.status_code value in dataset1.reject_code. Here only the clause reject_code should be updated with first matching row of dataset2 and then it can break the dataset2 loop and skip to next account in dataset1.
Dataset1
Group_Id Account_Number Reject_code
1 8561234
1 8881246
2 9412381
4 8718912
Dataset2
Group_Id Reject_Code Status_Code
1 8561234 c01
1 8881246 d85
2 48981211 e48
Here is the pseudo code am thinking of but couldn't do working implementation in sas.
loopcnt is no of rows of dataset1 and loopcntr is no of rows in dataset2. Please ignore syntax mistakes and treat it as pseudo code.
Can you please advise on how to implement this as am stuck with it ?
let %i=1;
let %j=1;
%do %until (&i < &loopcnt);
%do %until (&j <&loopcntr)
if dataset1[i].group_id=dataset2[j].group_id
then update dataset1.reject_code=dataset.status_code.
%end
%end
If those are the input data sets, what is the expected output? Since you mention that only the first one should be updated, please include a few cases where a record may not be updated or there are multiple records to consider and only the first should be chosen.
@sasuser_01 wrote:
Hello ,
I have 2 different datasets, listed below. I have requirement - for every account in dataset1 it should iterate through each row in dataset2 and check if dataset1.groupid=dataset2.groupid and if it matches update dataset2.status_code value in dataset1.reject_code. Here only the clause reject_code should be updated with first matching row of dataset2 and then it can break the dataset2 loop and skip to next account in dataset1.
Dataset1
Group_Id Account_Number Reject_code
1 8561234
1 8881246
2 9412381
4 8718912
Dataset2
Group_Id Reject_Code Status_Code
1 8561234 c01
1 8881246 d85
2 48981211 e48
Here is the pseudo code am thinking of but couldn't do working implementation in sas.
loopcnt is no of rows of dataset1 and loopcntr is no of rows in dataset2. Please ignore syntax mistakes and treat it as pseudo code.
Can you please advise on how to implement this as am stuck with it ?
let %i=1;
let %j=1;
%do %until (&i < &loopcnt);
%do %until (&j <&loopcntr)if dataset1[i].group_id=dataset2[j].group_id
then update dataset1.reject_code=dataset.status_code.
%end
%end
If I understand your requirement correctly, your desired result looks like this, right?
Group_Id Account_Number Reject_code
1 8561234 c01
1 8881246 c01
2 9412381 e48
4 8718912
In which case, try this
data Dataset1;
input Group_Id Account_Number Reject_code $;
infile datalines missover;
datalines;
1 8561234
1 8881246
2 9412381
4 8718912
;
data Dataset2;
input Group_Id Account_Number Status_Code $;
datalines;
1 8561234 c01
1 8881246 d85
2 48981211 e48
;
data want;
if _N_ = 1 then do;
dcl hash h(dataset : 'Dataset2(rename = Status_Code = Reject_code)');
h.definekey('Group_Id');
h.definedata('Reject_code');
h.definedone();
end;
set Dataset1;
_N_ = h.find();
run;
Yes that's correct. Here is how output dataset , dataset1 & datset2 looks like . Please let me know
output dataset:
Group_Id Account_Number Reject_Code
1 8561234 c01
1 8881246 c01
2 9412381 e48
4 8718912 g9
Dataset1
Group_Id Account_Number
1 8561234
1 8881246
2 9412381
4 8718912
Dataset2:
Group_Id Reject_Code Status_Code
1 reject1 c01
1 reject2 d85
2 reject3 e48
2 reject4 f12
4 reject5 g9
Did you try my code above? Also, the data you posted just not is not the same as the original data from your post.
Merge works fine for that example.
First let's create some actual datasets from the listings you posted.
data expect;
input Group_Id Account_Number Reject_Code $;
cards;
1 8561234 c01
1 8881246 c01
2 9412381 e48
4 8718912 g9
;
data accounts;
input Group_Id Account_Number;
cards;
1 8561234
1 8881246
2 9412381
4 8718912
;
data codes ;
input Group_Id Reject_Code $ Status_Code $;
cards;
1 reject1 c01
1 reject2 d85
2 reject3 e48
2 reject4 f12
4 reject5 g9
;
And make sure they are sorted in the proper order. I assume the REJECT_CODE variable is the ordering variable in the CODES dataset.
proc sort data=codes ;
by group_id reject_code ;
run;
proc sort data=accounts;
by group_id account_number;
run;
Now just merge them and keep track of the first value of STATUS_CODE. Since the variable name in your expected result is already being used in the CODES dataset let's just use the name WANT in the data step. We can rename the two variables when it is written out. You can drop the REJECT_CODE and STATUS_CODE variables read from the CODES dataset if you want to get expected output exactly.
data want;
merge accounts(in=in1) codes(in=in2);
by group_id ;
retain found want ;
if first.group_id then found=0;
if in2 and not found then do;
found=1;
want=status_code ;
end;
if in1 then output;
in1=0;
rename reject_code = reject_codeB want=reject_code;
drop found;
run;
Results:
Account_ reject_ Status_ reject_ Obs Group_Id Number codeB Code code 1 1 8561234 reject1 c01 c01 2 1 8881246 reject2 d85 c01 3 2 9412381 reject3 e48 e48 4 4 8718912 reject5 g9 g9
Of course it would be even easier to just eliminate the unwanted codes before the merge.
data codes2;
set codes;
by group_id;
if first.group_id;
run;
data want;
merge accounts(in=in1) codes2;
by group_id;
if in1;
run;
proc print;
run;
Again if you want you can DROP the REJECT_CODE variable and rename the STATUS_CODE variable to REJECT_CODE.
Since it looks like you are working with actual data in datasets this does not look like something where macro code would be of any value.
What is wrong with simply merging the two datasets by the common key variable?
data one;
input Group_Id Account_Number;
cards;
1 8561234
1 8881246
2 9412381
4 8718912
;
data two;
input Group_Id Account_Number Status_Code $ ;
cards;
1 8561234 c01
1 8881246 d85
2 48981211 e48
;
data want;
merge one two;
by group_id account_number;
run;
Result:
Account_ Status_ Obs Group_Id Number Code 1 1 8561234 c01 2 1 8881246 d85 3 2 9412381 4 2 48981211 e48 5 4 8718912
Issue with merging is :
For every account number in dataset1 it should iterate through dataset2 and get the first matching status_code ( based on dataset1.group_id =dataset2.group_id) . I have updated how output dataset, dataset1 and dataset2 should look like in other response. You can refer to that.
By merging, it doesn't check first matching condition and it just merges based on clause of groupid and account number. Also we want every account_number in datset1 even though corresponding group id is not present in dataset2.
@sasuser_01 wrote:
Issue with merging is :
For every account number in dataset1 it should iterate through dataset2 and get the first matching status_code ( based on dataset1.group_id =dataset2.group_id) . I have updated how output dataset, dataset1 and dataset2 should look like in other response. You can refer to that.
By merging, it doesn't check first matching condition and it just merges based on clause of groupid and account number. Also we want every account_number in datset1 even though corresponding group id is not present in dataset2.
That does not make any sense. Every observation in dataset1 will obviously be included unless you add something to remove them.
Please show some data that actually reflect the issues you are having with merging.
If you want to match by just GROUP_ID then do that. If you only want the first match then do that.
So this step will take the first value of STATUS_CODE and store it into the variable WANT. It will output every distinct value of GROUP_ID that appears in the ACCOUNT dataset.
data want;
merge account(in=in1) codes(in=in2);
by group_id;
retain found ;
if first.group_id then found=0;
if in2 and not found then do;
found=1;
want = status_code;
retain want;
end;
if last.group_id and in1;
run;
@sasuser_01 wrote:
Issue with merging is :
For every account number in dataset1 it should iterate through dataset2 and get the first matching status_code ( based on dataset1.group_id =dataset2.group_id) . I have updated how output dataset, dataset1 and dataset2 should look like in other response. You can refer to that.
By merging, it doesn't check first matching condition and it just merges based on clause of groupid and account number. Also we want every account_number in datset1 even though corresponding group id is not present in dataset2.
Can you include an example of that scenario in your sample data
Hi @sasuser_01,
This thread has gone a while with no activity. Were you able to attempt the solutions provided from other community members? Are you able to provide the requested data?
Thanks,
Joe
Join us for SAS Community Trivia
SAS Bowl XLVIII, All Things Models
Wednesday, February 19, 2024, at 10:00 a.m. ET | #SASBowl
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.