BookmarkSubscribeRSS Feed
sasuser_01
Fluorite | Level 6

Hello ,

 

I have 2 different datasets, listed below. I have requirement - for every account in dataset1 it should iterate through each row in dataset2 and check if dataset1.groupid=dataset2.groupid and if it matches update  dataset2.status_code value in dataset1.reject_code. Here only the clause reject_code should be updated with first matching row of dataset2 and then it can break the dataset2 loop and skip to next account in dataset1.

 

Dataset1

Group_Id Account_Number  Reject_code

1               8561234                

1               8881246                

2               9412381                

4               8718912                

 

Dataset2

Group_Id Reject_Code Status_Code 

1               8561234                c01

1               8881246                d85

2               48981211               e48

 

Here is the pseudo code am thinking of but couldn't do working implementation in sas.

loopcnt is no of rows of dataset1 and loopcntr is no of rows in dataset2. Please ignore syntax mistakes and treat it as pseudo code.


Can you please advise on how to implement this as am stuck with it ?

 

let %i=1;

let %j=1;

%do %until (&i < &loopcnt);
        %do %until (&j <&loopcntr)

            if dataset1[i].group_id=dataset2[j].group_id

             then update dataset1.reject_code=dataset.status_code.

         %end

%end

 

12 REPLIES 12
Reeza
Super User

If those are the input data sets, what is the expected output? Since you mention that only the first one should be updated, please include a few cases where a record may not be updated or there are multiple records to consider and only the first should be chosen. 

 


@sasuser_01 wrote:

Hello ,

 

I have 2 different datasets, listed below. I have requirement - for every account in dataset1 it should iterate through each row in dataset2 and check if dataset1.groupid=dataset2.groupid and if it matches update  dataset2.status_code value in dataset1.reject_code. Here only the clause reject_code should be updated with first matching row of dataset2 and then it can break the dataset2 loop and skip to next account in dataset1.

 

Dataset1

Group_Id Account_Number  Reject_code

1               8561234                

1               8881246                

2               9412381                

4               8718912                

 

Dataset2

Group_Id Reject_Code Status_Code 

1               8561234                c01

1               8881246                d85

2               48981211               e48

 

Here is the pseudo code am thinking of but couldn't do working implementation in sas.

loopcnt is no of rows of dataset1 and loopcntr is no of rows in dataset2. Please ignore syntax mistakes and treat it as pseudo code.


Can you please advise on how to implement this as am stuck with it ?

 

let %i=1;

let %j=1;

%do %until (&i < &loopcnt);
        %do %until (&j <&loopcntr)

            if dataset1[i].group_id=dataset2[j].group_id

             then update dataset1.reject_code=dataset.status_code.

         %end

%end

 


 

PeterClemmensen
Tourmaline | Level 20

If I understand your requirement correctly, your desired result looks like this, right?

 

Group_Id Account_Number  Reject_code
1        8561234         c01
1        8881246         c01
2        9412381         e48
4        8718912          
PeterClemmensen
Tourmaline | Level 20

In which case, try this

 

data Dataset1;
input Group_Id Account_Number Reject_code $;
infile datalines missover;
datalines;
1 8561234 
1 8881246 
2 9412381 
4 8718912 
;

data Dataset2;
input Group_Id Account_Number Status_Code $;
datalines;
1 8561234  c01 
1 8881246  d85 
2 48981211 e48 
;

data want;

   if _N_ = 1 then do;
      dcl hash h(dataset : 'Dataset2(rename = Status_Code = Reject_code)');
      h.definekey('Group_Id');
      h.definedata('Reject_code');
      h.definedone();
   end; 

   set Dataset1;

   _N_ = h.find();
   
run;
sasuser_01
Fluorite | Level 6


Yes that's correct. Here is how output dataset , dataset1 & datset2 looks like . Please let me know

output dataset:
Group_Id Account_Number Reject_Code
1              8561234                c01
1              8881246                c01
2              9412381                e48
4              8718912                g9

Dataset1
Group_Id Account_Number
1                8561234
1                8881246
2                9412381
4                8718912

Dataset2:
Group_Id Reject_Code Status_Code
1               reject1           c01
1               reject2           d85
2               reject3           e48
2               reject4           f12
4               reject5           g9

PeterClemmensen
Tourmaline | Level 20

Did you try my code above? Also, the data you posted just not is not the same as the original data from your post.

Tom
Super User Tom
Super User

Merge works fine for that example.

First let's create some actual datasets from the listings you posted.

data expect;
  input Group_Id Account_Number Reject_Code $;
cards;
1 8561234 c01
1 8881246 c01
2 9412381 e48
4 8718912 g9
;

data accounts;
  input Group_Id Account_Number;
cards;
1 8561234
1 8881246
2 9412381
4 8718912
;

data codes ;
  input Group_Id Reject_Code $ Status_Code $;
cards;
1 reject1 c01
1 reject2 d85
2 reject3 e48
2 reject4 f12
4 reject5 g9
;

And make sure they are sorted in the proper order.  I assume the REJECT_CODE variable is the ordering variable in the CODES dataset.

proc sort data=codes ;
  by group_id reject_code ;
run;

proc sort data=accounts;
  by group_id account_number;
run;

Now just merge them and keep track of the first value of STATUS_CODE.  Since the variable name in your expected result is already being used in the CODES dataset let's just use the name WANT in the data step.  We can rename the two variables when it is written out.  You can drop the REJECT_CODE and STATUS_CODE variables read from the CODES dataset if you want to get expected output exactly.

data want;
  merge accounts(in=in1) codes(in=in2);
  by group_id ;
  retain found want ;
  if first.group_id then found=0;
  if in2 and not found then do;
    found=1;
    want=status_code ;
  end;
  if in1 then output;
  in1=0;
  rename reject_code = reject_codeB want=reject_code;
  drop found;
run;

Results:

                   Account_    reject_    Status_    reject_
Obs    Group_Id     Number      codeB      Code       code

 1         1        8561234    reject1      c01        c01
 2         1        8881246    reject2      d85        c01
 3         2        9412381    reject3      e48        e48
 4         4        8718912    reject5      g9         g9

Of course it would be even easier to just eliminate the unwanted codes before the merge.

data codes2;
  set codes;
  by group_id;
  if first.group_id;
run;

data want;
  merge accounts(in=in1) codes2;
  by group_id;
  if in1;
run;

proc print;
run;

Again if you want you can DROP the REJECT_CODE variable and rename the STATUS_CODE variable to REJECT_CODE.

Tom
Super User Tom
Super User

Since it looks like you are working with actual data in datasets this does not look like something where macro code would be of any value.

 

What is wrong with simply merging the two datasets by the common key variable?

data one;
  input Group_Id Account_Number;
cards;
1 8561234
1 8881246
2 9412381
4 8718912
;

data two;
  input Group_Id Account_Number Status_Code $ ;
cards;
1 8561234  c01
1 8881246  d85
2 48981211 e48
;

data want;
  merge one two;
  by group_id account_number;
run;

Result:

                   Account_    Status_
Obs    Group_Id     Number      Code

 1         1        8561234      c01
 2         1        8881246      d85
 3         2        9412381
 4         2       48981211      e48
 5         4        8718912

sasuser_01
Fluorite | Level 6

 

Issue with merging is : 

For every account number in dataset1 it should iterate through dataset2 and get the first matching status_code  ( based on dataset1.group_id =dataset2.group_id) . I have updated how output dataset, dataset1 and dataset2 should look like in other response. You can refer to that.

 

By merging, it doesn't check first matching condition and it just merges based on clause of groupid and account number. Also we want every account_number in datset1 even though corresponding group id is not present in dataset2.

 

 

 

 

 

 

 

 

 

Tom
Super User Tom
Super User

@sasuser_01 wrote:

 

Issue with merging is : 

For every account number in dataset1 it should iterate through dataset2 and get the first matching status_code  ( based on dataset1.group_id =dataset2.group_id) . I have updated how output dataset, dataset1 and dataset2 should look like in other response. You can refer to that.

 

By merging, it doesn't check first matching condition and it just merges based on clause of groupid and account number. Also we want every account_number in datset1 even though corresponding group id is not present in dataset2.


That does not make any sense.  Every observation in dataset1 will obviously be included unless you add something to remove them.

 

Please show some data that actually reflect the issues you are having with merging.

 

If you want to match by just GROUP_ID then do that.  If you only want the first match then do that.

So this step will take the first value of STATUS_CODE and store it into the variable WANT.  It will output every distinct value of GROUP_ID that appears in the ACCOUNT dataset.

data want;
  merge account(in=in1) codes(in=in2);
  by group_id;
  retain found ;
  if first.group_id then found=0;
  if in2 and not found then do;
    found=1;
    want = status_code;
    retain want;
  end;
  if last.group_id and in1;
run;
  

 

Reeza
Super User

@sasuser_01 wrote:

 

Issue with merging is : 

For every account number in dataset1 it should iterate through dataset2 and get the first matching status_code  ( based on dataset1.group_id =dataset2.group_id) . I have updated how output dataset, dataset1 and dataset2 should look like in other response. You can refer to that.

 

By merging, it doesn't check first matching condition and it just merges based on clause of groupid and account number. Also we want every account_number in datset1 even though corresponding group id is not present in dataset2.

 

 


Can you include an example of that scenario in your sample data

joeFurbee
Community Manager

Hi @sasuser_01,

This thread has gone a while with no activity. Were you able to attempt the solutions provided from other community members? Are you able to provide the requested data? 

 

Thanks,

Joe


Join us for SAS Community Trivia
SAS Bowl XL, SAS Innovate 2024 Recap
Wednesday, May 15, 2024, at 10 a.m. ET | #SASBowl

AlanC
Barite | Level 11
See my example on my tips and tricks page: https://github.com/savian-net/SasTipsTricks#use-a-dataset-as-a-lookup-in-another-dataset
https://github.com/savian-net

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 1907 views
  • 0 likes
  • 6 in conversation