BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
billsut1961
Fluorite | Level 6

Hello.  I'm using SAS 9.4.  I have 2 datasets with an example shown below.

 

Dataset 1                                                                 Dataset 2

 

Person  Lang_SourceA                                          Person  Lang_SourceB

Bob       RU                                                             Bob       SP

Bob       SP                                                             Bob       RU

Bob                                                                         Bob       GE

Bob                                                                         Bob      UK

Sam                                                                        Sam     CM

Sam      CM                                                            Sam     GE

Sam                                                                        Sam     UK

Sally      RU                                                             Sally    SP

Sally                                                                        Sally    UK

Sally                                                                        Sally    RU

 

I want the final merged dataset to look like this.  The order of Lang_SourceB must stay the same.  Lang_SourceA needs to be reordered within each person to match the value in Lang_SourceB:

 

Person  Lang_SourceA  Lang_SourceB

Bob       SP                     SP

Bob        RU                    RU

Bob                                 GE

Bob                                 UK

Sam      CM                    CM

Sam                               GE

Sam                               UK

Sally                               SP

Sally                               UK

Sally     RU                     RU

 

I have tried many variations of merge and Proc SQL with no good results.  I would greatly appreciate any suggestions you can provide.

 

Thank you in advance.

1 ACCEPTED SOLUTION

Accepted Solutions
billsut1961
Fluorite | Level 6

I apologize for taking so long to respond.  I was under pressure to get this completed so I went to another guru in my building.  He gave me a good start and the result is below.  Thank you for trying to help but I don't think I articulated the problem well enough to you.

 

PROC SQL;

CREATE TABLE LANGUAGE_COMBINED AS
SELECT CASE WHEN A.ASC IS NOT NULL AND B.ASC IS NOT NULL THEN 'MATCH'
WHEN A.ASC IS NOT NULL THEN 'PULL_OUT'
WHEN A.ASC IS NULL AND B.ASC IS NULL THEN 'DELETE'
ELSE 'REFRESH'
END AS MATCH_STATUS
,COALESCE(A.ASC,B.ASC) AS ASC
,COALESCE(A.LANGUAGE_CODE,B.LANGUAGE) AS CODE
,A.*
,B.*
FROM PULL_OUT_DATA as a
FULL OUTER JOIN REFRESH_FIXED_OUTPUT AS B
ON A.ASC EQ B.ASC
AND A.LANGUAGE_CODE EQ B.LANGUAGE

;

QUIT;

View solution in original post

7 REPLIES 7
PaigeMiller
Diamond | Level 26

Since the order of Lang_Source2 in data set 2 cannot be changed, you could assign a consecutive integer to the values in data set two. Then do the merge, and sort by this consecutive integer value.

--
Paige Miller
Tom
Super User Tom
Super User

Are there other variables in the LEFT dataset?  If not then just drop the empty records and merge by both variables. 

 

Is the RIGHT dataset already sorted by PERSON and LANG_SOURCE? If not is there another variable that can be used to determine the order?  If not then make one first.

 

Your example has the exact some number of observations in each group.  If this is true then a simple merge will work.

data want;
  merge left right ;
  by person;
run;

If it is possible that some PERSON's have a different number of observations in one dataset versus the other then you will need to do more work to prevent the last value of the extra variables from the shorter one from being carried forward.

 

data want;
  merge left right ;
  by person;
  output;
  call missing(of _all_);
run;

 

 

 

 

billsut1961
Fluorite | Level 6

There are more variables in both datasets.  I only showed these for simplification.  I have to keep the blank values which is why I said I need my output to look like the merged dataset.  Each person has a different number of records.  I need the values in Lang_SourceA to line up with the values in Lang_sourceB without changing the order in Lang_SourceB because these are in order for a reason.  Dataset 1 is input from the customer and the order they used was just random.  I'm now trying to come up with code that will loop through each group/person.  I'm sure I'll need to use arrays but haven't figured it out yet.

PaigeMiller
Diamond | Level 26

@billsut1961 wrote:

I have to keep the blank values which is why I said I need my output to look like the merged dataset.  


I don't think this is true. You can delete the blank values, and still get the output to look like the merged data set.

 

Both Tom and I have suggested ways to do this, have you tried either?

--
Paige Miller
Tom
Super User Tom
Super User

I don't understand what real world problem you are trying to solve here.

 

If the order of observations in the "B" dataset is important then you should really have a variable that indicates the order.

Either an absolute order.

data b_fixed;
  set b ;
  row + 1;
run;

Or one within a person.

data b_fixed;
  set b ;
  by person;
  row + 1;
  if first.person then row=1;
run;

Now you can use ROW to help you put the output back into the order you want after you have figured out how to combine the datasets.

 

Are you trying to match the observations based on both PERSON and LANG_SOURCE?  If so how do you want to match those that do not have a lang_source of one of the datasets?

 

billsut1961
Fluorite | Level 6

I apologize for taking so long to respond.  I was under pressure to get this completed so I went to another guru in my building.  He gave me a good start and the result is below.  Thank you for trying to help but I don't think I articulated the problem well enough to you.

 

PROC SQL;

CREATE TABLE LANGUAGE_COMBINED AS
SELECT CASE WHEN A.ASC IS NOT NULL AND B.ASC IS NOT NULL THEN 'MATCH'
WHEN A.ASC IS NOT NULL THEN 'PULL_OUT'
WHEN A.ASC IS NULL AND B.ASC IS NULL THEN 'DELETE'
ELSE 'REFRESH'
END AS MATCH_STATUS
,COALESCE(A.ASC,B.ASC) AS ASC
,COALESCE(A.LANGUAGE_CODE,B.LANGUAGE) AS CODE
,A.*
,B.*
FROM PULL_OUT_DATA as a
FULL OUTER JOIN REFRESH_FIXED_OUTPUT AS B
ON A.ASC EQ B.ASC
AND A.LANGUAGE_CODE EQ B.LANGUAGE

;

QUIT;

Ksharp
Super User
data dataset1;
infile cards truncover;
input Person $ Lang_SourceA  $ ;
cards;
Bob       RU           
Bob       SP           
Bob                    
Bob                    
Sam                    
Sam      CM            
Sam                    
Sally      RU          
Sally                  
Sally                  
;

data dataset2;
infile cards truncover;
input Person  $ Lang_SourceB $;
cards;
           Bob       SP
           Bob       RU
              Bob       GE
              Bob      UK
             Sam     CM
         Sam     GE
             Sam     UK
            Sally    SP
               Sally    UK
               Sally    RU
;

data want;
 if _n_=1 then do;
  if 0 then set dataset1;
  declare hash h(dataset:'dataset1');
  h.definekey('Person','Lang_SourceA');
  h.definedone();
 end;
 set dataset2;
call missing(Lang_SourceA);
if h.check(key:Person,key:Lang_SourceB)=0 then Lang_SourceA=Lang_SourceB;
 run;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2536 views
  • 0 likes
  • 4 in conversation