Solved: Merging/Joining 2 Datasets Where 1 Column has Blanks

billsut1961 · Posted 03-20-2023 10:31 AM

Hello. I'm using SAS 9.4. I have 2 datasets with an example shown below.

Dataset 1 Dataset 2

Person Lang_SourceA Person Lang_SourceB

Bob RU Bob SP

Bob SP Bob RU

Bob Bob GE

Bob Bob UK

Sam Sam CM

Sam CM Sam GE

Sam Sam UK

Sally RU Sally SP

Sally Sally UK

Sally Sally RU

I want the final merged dataset to look like this. The order of Lang_SourceB must stay the same. Lang_SourceA needs to be reordered within each person to match the value in Lang_SourceB:

Person Lang_SourceA Lang_SourceB

Bob SP SP

Bob RU RU

Bob GE

Bob UK

Sam CM CM

Sam GE

Sam UK

Sally SP

Sally UK

Sally RU RU

I have tried many variations of merge and Proc SQL with no good results. I would greatly appreciate any suggestions you can provide.

Thank you in advance.

billsut1961 · Posted 03-21-2023 01:28 PM

I apologize for taking so long to respond. I was under pressure to get this completed so I went to another guru in my building. He gave me a good start and the result is below. Thank you for trying to help but I don't think I articulated the problem well enough to you.

PROC SQL;

CREATE TABLE LANGUAGE_COMBINED AS
SELECT CASE WHEN A.ASC IS NOT NULL AND B.ASC IS NOT NULL THEN 'MATCH'
WHEN A.ASC IS NOT NULL THEN 'PULL_OUT'
WHEN A.ASC IS NULL AND B.ASC IS NULL THEN 'DELETE'
ELSE 'REFRESH'
END AS MATCH_STATUS
,COALESCE(A.ASC,B.ASC) AS ASC
,COALESCE(A.LANGUAGE_CODE,B.LANGUAGE) AS CODE
,A.*
,B.*
FROM PULL_OUT_DATA as a
FULL OUTER JOIN REFRESH_FIXED_OUTPUT AS B
ON A.ASC EQ B.ASC
AND A.LANGUAGE_CODE EQ B.LANGUAGE

;

QUIT;

View solution in original post

PaigeMiller · Posted 03-20-2023 10:41 AM

Since the order of Lang_Source2 in data set 2 cannot be changed, you could assign a consecutive integer to the values in data set two. Then do the merge, and sort by this consecutive integer value.

--
Paige Miller

Tom · Posted 03-20-2023 10:49 AM

Are there other variables in the LEFT dataset? If not then just drop the empty records and merge by both variables.

Is the RIGHT dataset already sorted by PERSON and LANG_SOURCE? If not is there another variable that can be used to determine the order? If not then make one first.

Your example has the exact some number of observations in each group. If this is true then a simple merge will work.

data want;
  merge left right ;
  by person;
run;

If it is possible that some PERSON's have a different number of observations in one dataset versus the other then you will need to do more work to prevent the last value of the extra variables from the shorter one from being carried forward.

data want;
  merge left right ;
  by person;
  output;
  call missing(of _all_);
run;

billsut1961 · Posted 03-20-2023 11:12 AM

There are more variables in both datasets. I only showed these for simplification. I have to keep the blank values which is why I said I need my output to look like the merged dataset. Each person has a different number of records. I need the values in Lang_SourceA to line up with the values in Lang_sourceB without changing the order in Lang_SourceB because these are in order for a reason. Dataset 1 is input from the customer and the order they used was just random. I'm now trying to come up with code that will loop through each group/person. I'm sure I'll need to use arrays but haven't figured it out yet.

PaigeMiller · Posted 03-20-2023 11:24 AM

@billsut1961 wrote:

I have to keep the blank values which is why I said I need my output to look like the merged dataset.

I don't think this is true. You can delete the blank values, and still get the output to look like the merged data set.

Both Tom and I have suggested ways to do this, have you tried either?

--
Paige Miller

Tom · Posted 03-20-2023 11:25 AM

I don't understand what real world problem you are trying to solve here.

If the order of observations in the "B" dataset is important then you should really have a variable that indicates the order.

Either an absolute order.

data b_fixed;
  set b ;
  row + 1;
run;

Or one within a person.

data b_fixed;
  set b ;
  by person;
  row + 1;
  if first.person then row=1;
run;

Now you can use ROW to help you put the output back into the order you want after you have figured out how to combine the datasets.

Are you trying to match the observations based on both PERSON and LANG_SOURCE? If so how do you want to match those that do not have a lang_source of one of the datasets?

billsut1961 · Posted 03-21-2023 01:28 PM

I apologize for taking so long to respond. I was under pressure to get this completed so I went to another guru in my building. He gave me a good start and the result is below. Thank you for trying to help but I don't think I articulated the problem well enough to you.

PROC SQL;

CREATE TABLE LANGUAGE_COMBINED AS
SELECT CASE WHEN A.ASC IS NOT NULL AND B.ASC IS NOT NULL THEN 'MATCH'
WHEN A.ASC IS NOT NULL THEN 'PULL_OUT'
WHEN A.ASC IS NULL AND B.ASC IS NULL THEN 'DELETE'
ELSE 'REFRESH'
END AS MATCH_STATUS
,COALESCE(A.ASC,B.ASC) AS ASC
,COALESCE(A.LANGUAGE_CODE,B.LANGUAGE) AS CODE
,A.*
,B.*
FROM PULL_OUT_DATA as a
FULL OUTER JOIN REFRESH_FIXED_OUTPUT AS B
ON A.ASC EQ B.ASC
AND A.LANGUAGE_CODE EQ B.LANGUAGE

;

QUIT;

Ksharp · Posted 03-22-2023 07:39 AM

data dataset1;
infile cards truncover;
input Person $ Lang_SourceA  $ ;
cards;
Bob       RU           
Bob       SP           
Bob                    
Bob                    
Sam                    
Sam      CM            
Sam                    
Sally      RU          
Sally                  
Sally                  
;

data dataset2;
infile cards truncover;
input Person  $ Lang_SourceB $;
cards;
           Bob       SP
           Bob       RU
              Bob       GE
              Bob      UK
             Sam     CM
         Sam     GE
             Sam     UK
            Sally    SP
               Sally    UK
               Sally    RU
;

data want;
 if _n_=1 then do;
  if 0 then set dataset1;
  declare hash h(dataset:'dataset1');
  h.definekey('Person','Lang_SourceA');
  h.definedone();
 end;
 set dataset2;
call missing(Lang_SourceA);
if h.check(key:Person,key:Lang_SourceB)=0 then Lang_SourceA=Lang_SourceB;
 run;

Merging/Joining 2 Datasets Where 1 Column has Blanks

Re: Merging/Joining 2 Datasets Where 1 Column has Blanks

Re: Merging/Joining 2 Datasets Where 1 Column has Blanks

Re: Merging/Joining 2 Datasets Where 1 Column has Blanks

Re: Merging/Joining 2 Datasets Where 1 Column has Blanks

Re: Merging/Joining 2 Datasets Where 1 Column has Blanks

Re: Merging/Joining 2 Datasets Where 1 Column has Blanks

Re: Merging/Joining 2 Datasets Where 1 Column has Blanks

Re: Merging/Joining 2 Datasets Where 1 Column has Blanks

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away