Iterate over two datasets and update the flag

sasuser_01 · Posted 11-24-2022 01:10 PM

Hello ,

I have 2 different datasets, listed below. I have requirement - for every account in dataset1 it should iterate through each row in dataset2 and check if dataset1.groupid=dataset2.groupid and if it matches update dataset2.status_code value in dataset1.reject_code. Here only the clause reject_code should be updated with first matching row of dataset2 and then it can break the dataset2 loop and skip to next account in dataset1.

Dataset1

Group_Id Account_Number Reject_code

1 8561234

1 8881246

2 9412381

4 8718912

Dataset2

Group_Id Reject_Code Status_Code

1 8561234 c01

1 8881246 d85

2 48981211 e48

Here is the pseudo code am thinking of but couldn't do working implementation in sas.

loopcnt is no of rows of dataset1 and loopcntr is no of rows in dataset2. Please ignore syntax mistakes and treat it as pseudo code.

Can you please advise on how to implement this as am stuck with it ?

let %i=1;

let %j=1;

%do %until (&i < &loopcnt);
%do %until (&j <&loopcntr)

if dataset1[i].group_id=dataset2[j].group_id

then update dataset1.reject_code=dataset.status_code.

%end

Reeza · Posted 11-24-2022 01:14 PM

If those are the input data sets, what is the expected output? Since you mention that only the first one should be updated, please include a few cases where a record may not be updated or there are multiple records to consider and only the first should be chosen.

@sasuser_01 wrote:

Hello ,

I have 2 different datasets, listed below. I have requirement - for every account in dataset1 it should iterate through each row in dataset2 and check if dataset1.groupid=dataset2.groupid and if it matches update dataset2.status_code value in dataset1.reject_code. Here only the clause reject_code should be updated with first matching row of dataset2 and then it can break the dataset2 loop and skip to next account in dataset1.

Dataset1

Group_Id Account_Number  Reject_code

1 8561234

1 8881246

2 9412381

4 8718912

Dataset2

Group_Id Reject_Code Status_Code

1 8561234 c01

1 8881246 d85

2 48981211 e48

Here is the pseudo code am thinking of but couldn't do working implementation in sas.

loopcnt is no of rows of dataset1 and loopcntr is no of rows in dataset2. Please ignore syntax mistakes and treat it as pseudo code.

Can you please advise on how to implement this as am stuck with it ?

let %i=1;

let %j=1;

%do %until (&i < &loopcnt);
%do %until (&j <&loopcntr)

if dataset1[i].group_id=dataset2[j].group_id

then update dataset1.reject_code=dataset.status_code.

%end

%end

PeterClemmensen · Posted 11-24-2022 01:25 PM

If I understand your requirement correctly, your desired result looks like this, right?

Group_Id Account_Number  Reject_code
1        8561234         c01
1        8881246         c01
2        9412381         e48
4        8718912

The DATA to DATA Step Macro
Blog: SASnrd

PeterClemmensen · Posted 11-24-2022 01:37 PM

In which case, try this

data Dataset1;
input Group_Id Account_Number Reject_code $;
infile datalines missover;
datalines;
1 8561234 
1 8881246 
2 9412381 
4 8718912 
;

data Dataset2;
input Group_Id Account_Number Status_Code $;
datalines;
1 8561234  c01 
1 8881246  d85 
2 48981211 e48 
;

data want;

   if _N_ = 1 then do;
      dcl hash h(dataset : 'Dataset2(rename = Status_Code = Reject_code)');
      h.definekey('Group_Id');
      h.definedata('Reject_code');
      h.definedone();
   end; 

   set Dataset1;

   _N_ = h.find();
   
run;

The DATA to DATA Step Macro
Blog: SASnrd

sasuser_01 · Posted 11-24-2022 01:48 PM

Yes that's correct. Here is how output dataset , dataset1 & datset2 looks like . Please let me know

output dataset:
Group_Id Account_Number Reject_Code
1 8561234 c01
1 8881246 c01
2 9412381 e48
4 8718912 g9

Dataset1
Group_Id Account_Number
1 8561234
1 8881246
2 9412381
4 8718912

Dataset2:
Group_Id Reject_Code Status_Code
1 reject1 c01
1 reject2 d85
2 reject3 e48
2 reject4 f12
4 reject5 g9

PeterClemmensen · Posted 11-24-2022 02:34 PM

Did you try my code above? Also, the data you posted just not is not the same as the original data from your post.

The DATA to DATA Step Macro
Blog: SASnrd

Tom · Posted 11-24-2022 03:04 PM

Merge works fine for that example.

First let's create some actual datasets from the listings you posted.

data expect;
  input Group_Id Account_Number Reject_Code $;
cards;
1 8561234 c01
1 8881246 c01
2 9412381 e48
4 8718912 g9
;

data accounts;
  input Group_Id Account_Number;
cards;
1 8561234
1 8881246
2 9412381
4 8718912
;

data codes ;
  input Group_Id Reject_Code $ Status_Code $;
cards;
1 reject1 c01
1 reject2 d85
2 reject3 e48
2 reject4 f12
4 reject5 g9
;

And make sure they are sorted in the proper order. I assume the REJECT_CODE variable is the ordering variable in the CODES dataset.

proc sort data=codes ;
  by group_id reject_code ;
run;

proc sort data=accounts;
  by group_id account_number;
run;

Now just merge them and keep track of the first value of STATUS_CODE. Since the variable name in your expected result is already being used in the CODES dataset let's just use the name WANT in the data step. We can rename the two variables when it is written out. You can drop the REJECT_CODE and STATUS_CODE variables read from the CODES dataset if you want to get expected output exactly.

data want;
  merge accounts(in=in1) codes(in=in2);
  by group_id ;
  retain found want ;
  if first.group_id then found=0;
  if in2 and not found then do;
    found=1;
    want=status_code ;
  end;
  if in1 then output;
  in1=0;
  rename reject_code = reject_codeB want=reject_code;
  drop found;
run;

Results:

                   Account_    reject_    Status_    reject_
Obs    Group_Id     Number      codeB      Code       code

 1         1        8561234    reject1      c01        c01
 2         1        8881246    reject2      d85        c01
 3         2        9412381    reject3      e48        e48
 4         4        8718912    reject5      g9         g9

Of course it would be even easier to just eliminate the unwanted codes before the merge.

data codes2;
  set codes;
  by group_id;
  if first.group_id;
run;

data want;
  merge accounts(in=in1) codes2;
  by group_id;
  if in1;
run;

proc print;
run;

Again if you want you can DROP the REJECT_CODE variable and rename the STATUS_CODE variable to REJECT_CODE.

Tom · Posted 11-24-2022 01:28 PM

Since it looks like you are working with actual data in datasets this does not look like something where macro code would be of any value.

What is wrong with simply merging the two datasets by the common key variable?

data one;
  input Group_Id Account_Number;
cards;
1 8561234
1 8881246
2 9412381
4 8718912
;

data two;
  input Group_Id Account_Number Status_Code $ ;
cards;
1 8561234  c01
1 8881246  d85
2 48981211 e48
;

data want;
  merge one two;
  by group_id account_number;
run;

Result:

                   Account_    Status_
Obs    Group_Id     Number      Code

 1         1        8561234      c01
 2         1        8881246      d85
 3         2        9412381
 4         2       48981211      e48
 5         4        8718912

sasuser_01 · Posted 11-24-2022 02:06 PM

Issue with merging is :

For every account number in dataset1 it should iterate through dataset2 and get the first matching status_code ( based on dataset1.group_id =dataset2.group_id) . I have updated how output dataset, dataset1 and dataset2 should look like in other response. You can refer to that.

By merging, it doesn't check first matching condition and it just merges based on clause of groupid and account number. Also we want every account_number in datset1 even though corresponding group id is not present in dataset2.

Tom · Posted 11-24-2022 02:46 PM

@sasuser_01 wrote:

Issue with merging is :

For every account number in dataset1 it should iterate through dataset2 and get the first matching status_code ( based on dataset1.group_id =dataset2.group_id) . I have updated how output dataset, dataset1 and dataset2 should look like in other response. You can refer to that.

By merging, it doesn't check first matching condition and it just merges based on clause of groupid and account number. Also we want every account_number in datset1 even though corresponding group id is not present in dataset2.

That does not make any sense. Every observation in dataset1 will obviously be included unless you add something to remove them.

Please show some data that actually reflect the issues you are having with merging.

If you want to match by just GROUP_ID then do that. If you only want the first match then do that.

So this step will take the first value of STATUS_CODE and store it into the variable WANT. It will output every distinct value of GROUP_ID that appears in the ACCOUNT dataset.

data want;
  merge account(in=in1) codes(in=in2);
  by group_id;
  retain found ;
  if first.group_id then found=0;
  if in2 and not found then do;
    found=1;
    want = status_code;
    retain want;
  end;
  if last.group_id and in1;
run;

Reeza · Posted 11-24-2022 02:59 PM

@sasuser_01 wrote:

Issue with merging is :

For every account number in dataset1 it should iterate through dataset2 and get the first matching status_code ( based on dataset1.group_id =dataset2.group_id) . I have updated how output dataset, dataset1 and dataset2 should look like in other response. You can refer to that.

By merging, it doesn't check first matching condition and it just merges based on clause of groupid and account number. Also we want every account_number in datset1 even though corresponding group id is not present in dataset2.

Can you include an example of that scenario in your sample data

joeFurbee · Posted 01-04-2023 03:26 PM

Hi @sasuser_01,

This thread has gone a while with no activity. Were you able to attempt the solutions provided from other community members? Are you able to provide the requested data?

Thanks,

Joe

Join us for SAS Community Trivia
SAS Bowl XLIII, The New SAS Developer Portal
Wednesday, August 14, 2024, at 10 a.m. ET | #SASBowl

AlanC · Posted 11-24-2022 04:16 PM

See my example on my tips and tricks page: https://github.com/savian-net/SasTipsTricks#use-a-dataset-as-a-lookup-in-another-dataset

https://github.com/savian-net

Iterate over two datasets and update the flag

Re: Iterate over two datasets and update the flag

Re: Iterate over two datasets and update the flag

Re: Iterate over two datasets and update the flag

Re: Iterate over two datasets and update the flag

Re: Iterate over two datasets and update the flag

Re: Iterate over two datasets and update the flag

Re: Iterate over two datasets and update the flag

Re: Iterate over two datasets and update the flag

Re: Iterate over two datasets and update the flag

Re: Iterate over two datasets and update the flag

Re: Iterate over two datasets and update the flag

Re: Iterate over two datasets and update the flag

SAS Innovate 2025: Call for Content

Classroom Training Available!