Hello. I'm using SAS 9.4. I have 2 datasets with an example shown below.
Dataset 1 Dataset 2
Person Lang_SourceA Person Lang_SourceB
Bob RU Bob SP
Bob SP Bob RU
Bob Bob GE
Bob Bob UK
Sam Sam CM
Sam CM Sam GE
Sam Sam UK
Sally RU Sally SP
Sally Sally UK
Sally Sally RU
I want the final merged dataset to look like this. The order of Lang_SourceB must stay the same. Lang_SourceA needs to be reordered within each person to match the value in Lang_SourceB:
Person Lang_SourceA Lang_SourceB
Bob SP SP
Bob RU RU
Bob GE
Bob UK
Sam CM CM
Sam GE
Sam UK
Sally SP
Sally UK
Sally RU RU
I have tried many variations of merge and Proc SQL with no good results. I would greatly appreciate any suggestions you can provide.
Thank you in advance.
I apologize for taking so long to respond. I was under pressure to get this completed so I went to another guru in my building. He gave me a good start and the result is below. Thank you for trying to help but I don't think I articulated the problem well enough to you.
PROC SQL;
CREATE TABLE LANGUAGE_COMBINED AS
SELECT CASE WHEN A.ASC IS NOT NULL AND B.ASC IS NOT NULL THEN 'MATCH'
WHEN A.ASC IS NOT NULL THEN 'PULL_OUT'
WHEN A.ASC IS NULL AND B.ASC IS NULL THEN 'DELETE'
ELSE 'REFRESH'
END AS MATCH_STATUS
,COALESCE(A.ASC,B.ASC) AS ASC
,COALESCE(A.LANGUAGE_CODE,B.LANGUAGE) AS CODE
,A.*
,B.*
FROM PULL_OUT_DATA as a
FULL OUTER JOIN REFRESH_FIXED_OUTPUT AS B
ON A.ASC EQ B.ASC
AND A.LANGUAGE_CODE EQ B.LANGUAGE
;
QUIT;
Since the order of Lang_Source2 in data set 2 cannot be changed, you could assign a consecutive integer to the values in data set two. Then do the merge, and sort by this consecutive integer value.
Are there other variables in the LEFT dataset? If not then just drop the empty records and merge by both variables.
Is the RIGHT dataset already sorted by PERSON and LANG_SOURCE? If not is there another variable that can be used to determine the order? If not then make one first.
Your example has the exact some number of observations in each group. If this is true then a simple merge will work.
data want;
merge left right ;
by person;
run;
If it is possible that some PERSON's have a different number of observations in one dataset versus the other then you will need to do more work to prevent the last value of the extra variables from the shorter one from being carried forward.
data want;
merge left right ;
by person;
output;
call missing(of _all_);
run;
There are more variables in both datasets. I only showed these for simplification. I have to keep the blank values which is why I said I need my output to look like the merged dataset. Each person has a different number of records. I need the values in Lang_SourceA to line up with the values in Lang_sourceB without changing the order in Lang_SourceB because these are in order for a reason. Dataset 1 is input from the customer and the order they used was just random. I'm now trying to come up with code that will loop through each group/person. I'm sure I'll need to use arrays but haven't figured it out yet.
@billsut1961 wrote:
I have to keep the blank values which is why I said I need my output to look like the merged dataset.
I don't think this is true. You can delete the blank values, and still get the output to look like the merged data set.
Both Tom and I have suggested ways to do this, have you tried either?
I don't understand what real world problem you are trying to solve here.
If the order of observations in the "B" dataset is important then you should really have a variable that indicates the order.
Either an absolute order.
data b_fixed;
set b ;
row + 1;
run;
Or one within a person.
data b_fixed;
set b ;
by person;
row + 1;
if first.person then row=1;
run;
Now you can use ROW to help you put the output back into the order you want after you have figured out how to combine the datasets.
Are you trying to match the observations based on both PERSON and LANG_SOURCE? If so how do you want to match those that do not have a lang_source of one of the datasets?
I apologize for taking so long to respond. I was under pressure to get this completed so I went to another guru in my building. He gave me a good start and the result is below. Thank you for trying to help but I don't think I articulated the problem well enough to you.
PROC SQL;
CREATE TABLE LANGUAGE_COMBINED AS
SELECT CASE WHEN A.ASC IS NOT NULL AND B.ASC IS NOT NULL THEN 'MATCH'
WHEN A.ASC IS NOT NULL THEN 'PULL_OUT'
WHEN A.ASC IS NULL AND B.ASC IS NULL THEN 'DELETE'
ELSE 'REFRESH'
END AS MATCH_STATUS
,COALESCE(A.ASC,B.ASC) AS ASC
,COALESCE(A.LANGUAGE_CODE,B.LANGUAGE) AS CODE
,A.*
,B.*
FROM PULL_OUT_DATA as a
FULL OUTER JOIN REFRESH_FIXED_OUTPUT AS B
ON A.ASC EQ B.ASC
AND A.LANGUAGE_CODE EQ B.LANGUAGE
;
QUIT;
data dataset1;
infile cards truncover;
input Person $ Lang_SourceA $ ;
cards;
Bob RU
Bob SP
Bob
Bob
Sam
Sam CM
Sam
Sally RU
Sally
Sally
;
data dataset2;
infile cards truncover;
input Person $ Lang_SourceB $;
cards;
Bob SP
Bob RU
Bob GE
Bob UK
Sam CM
Sam GE
Sam UK
Sally SP
Sally UK
Sally RU
;
data want;
if _n_=1 then do;
if 0 then set dataset1;
declare hash h(dataset:'dataset1');
h.definekey('Person','Lang_SourceA');
h.definedone();
end;
set dataset2;
call missing(Lang_SourceA);
if h.check(key:Person,key:Lang_SourceB)=0 then Lang_SourceA=Lang_SourceB;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.