I am required to merge 5 data sets into one. Not all data sets had the same variables so I have linked the datasets as best as I can. I am now having issues with a couple of the observations, for example:
Bahamas | 60 | 9 | ||||||||
Bahamas (the) | BS | BHS | 2.9 | 367168 | ||||||
Bahamas, The | BHS | 29930 | High Income |
This is clearly all data relating to the same country, however it is not in one row.
How can I make the data look like this?
Bahamas | BS | BHS | 29930 | High Income | 2.9 | 367168 | 60 | 9 |
Thanks
If you have different countries, convert the following code into a macro.
data original;
input country$14. val1$ val2$ val3 val4$11. val5$ val6$ val7$ val8$ val9$ val10$;
infile datalines missover dsd dlm="";
lines;
Bahamas 60 9
Bahamas (the) BS BHS 2.9 367168
Bahamas, The BHS 29930 High Income
;
run;
data _NULL_;
set original;
if not missing(val1) then call symputx("val1",val1);
else if not missing(val2) then call symputx("val2",val2);
else if not missing(val3) then call symputx("val3",val3);
else if not missing(val4) then call symputx("val4",val4);
else if not missing(val5) then call symputx("val5",val5);
else if not missing(val6) then call symputx("val6",val6);
else if not missing(val7) then call symputx("val7",val7);
else if not missing(val8) then call symputx("val8",val8);
else if not missing(val9) then call symputx("val9",val9);
else if not missing(val10) then call symputx("val10",val10);
run;
data final;
country = "Bahamas";
val1 = "&val1";
val2 = "&val2";
val3 = "&val3";
val4 = "&val4";
val5 = "&val5";
val6 = "&val6";
val7 = "&val7";
val8 = "&val8";
val9 = "&val9";
val10 = "&val10";
run;
You need to standardize the columns you're using to merge the data. SAS got products for this (DataFlux/Data Quality Server) but for your case it's likely good enough if you just analyse the result after your first go (as you already did) and then implement standardization as below. Then use the standardized column for merging.
proc format;
invalue $country_stdz(upcase)
'BAHAMAS' = 'Bahamas'
'BAHAMAS (THE)' = 'Bahamas'
'BAHAMAS, THE' = 'Bahamas'
;
run;
data demo;
infile datalines truncover;
input country $20.;
length country_stdz $20.;
country_stdz=input(country, $country_stdz.);
datalines;
Bahamas
Bahamas (the)
Bahamas, The
;
proc print;
run;
If you're reading your data from an external file then you can use the informat also directly in the input statement.
proc format;
invalue $country_stdz(upcase)
'BAHAMAS' = 'Bahamas'
'BAHAMAS (THE)' = 'Bahamas'
'BAHAMAS, THE' = 'Bahamas'
;
run;
data demo;
infile datalines truncover;
input country $country_stdz20.;
datalines;
Bahamas
Bahamas (the)
Bahamas, The
Other country
;
proc print;
run;
data original;
input country$14. val1$ val2$ val3 val4$11. val5$ val6$ val7$ val8$ val9$ val10$;
infile datalines missover dsd dlm="";
cards;
Bahamas 60 9
Bahamas (the) BS BHS 2.9 367168
Bahamas, The BHS 29930 High Income
;
run;
data temp;
set original;
id=scan(country,1,,'ka');
run;
data want;
update temp(obs=0) temp;
by id;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.