BookmarkSubscribeRSS Feed
bourdeax
Fluorite | Level 6

Hello Everyone,

 

SAS 9.4

 

I have missing data in Table A, and Table B contains that missing data. How do I go about taking data from Table B and populating it into Table A? I have attached an over-simplified Excel file to help visualize what I am talking about. Table A is the original data source, Table B is the data source I want to use to populate the missing data in Table A, and Table C is what I would like the end result to be.

 

To be more specific, I am missing some StudentID's from my original data source (Table A). I want to use another data source (Table B) to populate the missing StudentID's in Table A by using a shared column between the two tables, PersonID.

 

Here is what I have tried so far, and I don't think it's working as I want it to.

 

 

proc sort data=TableA out=TableA_sort;
     by PersonID;
run;

proc sort data=TableB out=TableB_sort;
     by PersonID;
run;

data TableC;
     merge TableA_sort (in=inTableA_sort)
TableB_sort (in=inTableB_sort);
by PersonID; if inTableA_sort=1 and inTableB_sort=1; run;

 

 

Thank you

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20

You're close 🙂

 


data TableA;
input StudentID PersonID;
datalines;
. 32974
. 79961
. 76233
. 21620
12564 82765
64836 36324
63206 40631
74116 12069
67036 40832
47612 11502
;

data TableB;
input StudentID PersonID;
datalines;
24973 32974
91362 79961
10051 76233
19188 21620
;

proc sort data=TableA out=TableA_sort;
     by PersonID;
run;

proc sort data=TableB out=TableB_sort;
     by PersonID;
run;

data TableC;
     merge TableA_sort (in=inTableA_sort)
           TableB_sort (in=inTableB_sort);
     by PersonID;
run;
    
PeterClemmensen
Tourmaline | Level 20

Or

 

data TableC(drop=rc);
    if _N_=1 then do;
        declare hash h(dataset:'TableB');
        h.definekey('PersonID');
        h.definedata('StudentID');
        h.definedone();
    end;

    set TableA;

    rc=h.find();
run;
Tom
Super User Tom
Super User

Looks like a straight forward application of the UPDATE statement.  TABLEB is your master table. TABLEA is your set of transactions to use to make the updates, some of which have missing values that will be ignored.  PERSONID is your key variable.

data want;
 update tableb tablea;
 by personid;
run;

If you only want to keep the person's listed in the transaction dataset then use the IN= option and a subsetting if.

data want;
 update tableb tablea(in=intrans);
 by personid;
 if intrans;
run;

PS  Not clear why some feel it is easier to post spreadsheets instead of data steps to show sample data. I find it much easier to type data into lines of text that navigate a spreadsheet. See the first reply to your question for an example.

bourdeax
Fluorite | Level 6

Thanks for the response. I tried your method and got a dozen warnings stating "WARNING: The MASTER data set contains more than one observation for a BY group." As I am still relatively new to SAS, I am not sure what this means.. Your help is appreciated. 

 

PS tomayto, tomahto in my opinion. 

Tom
Super User Tom
Super User

That means your table has multiple values STUDENT ids for the same PERSON id. Which also means that it is useless for your original purpose of using it as the source for the right value of STUDENT id to use to replace the missing values in TABLEA.
What value do you want to use when the person has multiple values?

Perhaps there is some other variable like STATUS or DATE that you can use to filter it down to the one true value for STUDENT id per PERSON id?

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 1888 views
  • 0 likes
  • 3 in conversation