BookmarkSubscribeRSS Feed
Gloveman71
Calcite | Level 5

I have a sas dataset used as a master file that is accessed with a set of 4 variables to form a key. Each record in the dataset is unique based on the 4 variables, but there can be multiple records with the same values for some of the key variables:  EX:

Key Variables:

 Host, port, PluginID, CVE

   abc   1        1             1

   abc   1        1             2

   abc   1        2             2

 

I have a 2nd file that has some additional variables I wish to add to the master file. The key in this file is just  'HOST' and each record is unique.  EX:

 Host    NewVar1      NewVar2      NewVar3

   abc         111               222              333

   aaa         111               333              444

 

I have tried many different ideas on how to apply the updates from file 2 to ALL matching records in file 1     Merge, Update, Modify.  All of these only apply the update to the 1st occurance in the master file.  All other matching occurrances in the master file are not updated.

 

Question(Finally):  How can I accomplish the above task of updating ALL of the matching records in the master file with the 2nd file?

 

Thanx for the assist,

Jim Glover

13 REPLIES 13
Kurt_Bremser
Super User

This done with a simple data step merge:

data master;
input host $ port plugin cve;
cards;
abc 1 1 1
abc 1 1 2
abc 1 2 2
;
run;

data updt;
input host $ newvar1 newvar2 newvar3;
cards;
abc 111 222 333
aaa 111 333 444
;
run;

proc sort data=updt;
by host;
run;

proc sort data=master;
by host port plugin cve;
run;

data want;
merge
  master (in=a)
  updt
;
by host;
if a;
run;

proc print data=want noobs;
run;

Result:

host    port    plugin    cve    newvar1    newvar2    newvar3

abc       1        1       1       111        222        333  
abc       1        1       2       111        222        333  
abc       1        2       2       111        222        333  
Gloveman71
Calcite | Level 5

Sorry,

This solution works the same as all the ones I tried.  It only updates the 1st occurrance of the key match in the master file. The rest of the entries in the master with the same Host key were not updated.

 

Thanx,

Jim Glover

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post test data in the form of a datastep in the body of the post using the code window - {i} above post window.  Also show what you want as output so it is clear.  Not sure from what you have posted what is to be updated where, perhaps a simple merge of table 2 onto table 1 by host, and then if first.host check the merge data and update the others:

data want;
  merge base (in=a) upd (in=b);
  by host;
  if a;
  if first.host then do;
    post=char(newvar1,1);
    plugin=char(newvar1,2);
    cve=char(newvar1,3);
  end;
run;
Gloveman71
Calcite | Level 5

Sorry, the output example was not listed,  Please see below:

Key Variables - Master File  are Host,  port, Pluginid, CVE

 Host, port, PluginID, CVE   Var1    Var2  Varx

   abc   1        1             1

   abc   1        1             2

   abc   1        2             2

 

I have a 2nd file that has some additional variables I wish to add to the master file. The key in this file is just  'HOST' and each record is unique.  EX:

Key variable  -  Update File    Host Only

 Host    NewVar1      NewVar2      NewVar3

   abc         111               222              333

   aaa         111               333              444

 

Output for Updated Master:

Host    Port   Pluginid   CVE   Var1   Var2   Varx  NewVar1   NewVar2   NewVar3

abc       1           1            1        A        B         C        111            222           333

abc       1           1            2        A        B         C        111            222           333

abc       1           2            2        A        B         C        111            222           333

aaa       1           2           2        A         B         C        111            222           333

 

As you see, there are move than 1 occurrance of Host("ABC") in the master file, all need to be updated with the data from file2.  All the techniques I try only update the 1st occurrance of "ABC" in the master file.

 

Hope this explains it better.

 

Thanx,

Jim Glover

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Please post examples as datasteps so that we can run it on our machines.  For the example you give, @Kurt_Bremser has provided you the solution.  You merge the two datasets based on the key variable host, and the default is many to many join, then you apply the if a condition to restrict it only to the rows in the main table.  This is exactly how it should work, therefore please supply examples of exactly what data it is and what you are doing, as you can't be following that logic, or your data doesn't look as provided.

Gloveman71
Calcite | Level 5

Here is the code I used:

dm 'clear log;';

filename mycode "/sasdata/home/jagc/sasuser.v94/sas_apm/SASCODE1" ;

libname secr "/sasdata/win_shares/corpfs01/SHARED/CAPACITY/SAS/SAS_APM/Security" ;

data work.master_101017_update ;

merge

WORK.MASTER_101017_SORT1_NEW (in=a)

WORK.OWNER_SORT1

;

by host;

if a;

run;

 

File 2 Contains:

Host           NewVar1      NewVar2       NewVar310.4.66.182     Jimmy        Tommy         Bobby    

 

Here is a sample of the output:

   Host     Port  Pluginid  CVE          Var1  Var2    Var3     Var4       NewVar1    NewVar2    NewVar3

10.4.66.182   0   35453                 OpenN 570461 10/10/17 ESS WINDOWS  Jimmy      Tommy       Bobby
10.4.66.182  445  11457                 Open 244202  09/15/17    
10.4.66.182  445  21725                 Open 244203  09/15/17 ESS WINDOWS   
10.4.66.182  445  25371 CVE-2007-0328   Open 244204  09/15/17 ESS WINDOWS   
10.4.66.18   445  27599 CVE-2007-5660   Open 244205  09/15/17 ESS WINDOWS   

 

You can see that the host(10.4.66.182) has multiple entries in the master file, but only the 1st occurrance of that Host was updated with the new variables(NewVar1,NewVar2,NewVar3) The other occurrances of that host were not updated

 

Does this help?

Jim Glover

RW9
Diamond | Level 26 RW9
Diamond | Level 26

The below code works correctly for me.  Unless you can provide your data as a datastep (see my example), then I can only guess.  Perhaps there are spaces in there, or special characters, maybe the data isn't like that at all?

data main;
  length host port plugin cve var1-var4 $200;
  input host $ port $ plugin $ cve $ var1 $ var2 $ var3 $ var4 $;
datalines;
10.4.66.182 0 35453 _ OpenN 570461 10/10/17 ESS_WINDOWS
10.4.66.182 445 11457 _ Open 244202 09/15/17 _
10.4.66.182 445 21725 _ Open 244202 9/15/17 ESS_WINDOWS   
10.4.66.182 445 25371 CVE-2007-0328 Open 244204 09/15/17 ESS_WINDOWS   
10.4.66.18 445 27599 CVE-2007-5660 Open 244205 09/15/17 ESS_WINDOWS 
;
run;

data upd;
  length host newvar1 newvar2 newvar3 $200;
  input host $ newvar1 $ newvar2 $ newvar3 $;
datalines;
10.4.66.182 Jimmy Tommy Bobby
;
run;

proc sort data=main;
  by host;
run;

data want;
  merge main (in=a) upd;
  by host;
  if a;
run;
Gloveman71
Calcite | Level 5
Can we talk on the phone, I can setup a Webex to show you what I am doing. Please let me know, and I will provide the info.
Thanx,
Jim Glover
ballardw
Super User

Sounds like you want a left join in SQL:

 

data master;
input host $ port plugin cve;
cards;
abc 1 1 1
abc 1 1 2
abc 1 2 2
aaa 2 3 4
;
run;

data updt;
input host $ newvar1 newvar2 newvar3;
cards;
abc 111 222 333
aaa 111 333 444
;
run;

proc sql;
   create table want as
   select a.*, b.newvar1, b.newvar2, b.newvar3
   from master as a
        left join
        updt as b
        on a.host=b.host
   ;
run;
Gloveman71
Calcite | Level 5

Awesome, Looks like you solution work as I need.  Thanx very much for all your help.

 

Thanx again,

Jim Glover

Tom
Super User Tom
Super User

I assume that you problem is that you are trying to change values of EXISTING variables in the "master" dataset.  If you use a data step MERGE to combine a 1 to Many situation then the "1" record is only read once.  So you need to merge on NEW variables from the "1" table if you want the values to be retained onto all of the "many" records from the other table.

 

If you want to update all of the records then just drop the old variables.  So if your UPDATES table has just A,VAR1 and VAR2 you could write like this.  If there are other variables add them to the drop list.

data want ;
  merge master(drop=var1 var2) updates ;
  by A;
run;

If the updates are only for some of the observations in MASTER and you want the others to keep their existing values then you need to work a little harder.  Rename the version coming from UPDATES dataset and then conditionally replace the values in the original variable VAR1 with the value from that renamed variable. Drop the extra column(s).

data want ;
  merge master(in=in1) updates(in=in2 rename=(var1=newvar1 var2=newvar2));
  by A;
  if in2 then do;
    var1=newvar1;
    var2=newvar2;
  end;
  drop newvar1 newvar2;
run;

If you want to update in place the you probable can figure out how to do it with MODIFY or you could switch to SQL.

proc sql ;
update master
set var1=(select var1 from updates where updates.A=master.A)
  , var2=(select var2 from updates where updates.A=master.A)
;
quit;

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 4555 views
  • 3 likes
  • 5 in conversation