I am required to merge 5 data sets into one. Not all data sets had the same variables so I have linked the datasets as best as I can. I am now having issues with a couple of the observations, for example:
Bahamas | 60 | 9 | ||||||||
Bahamas (the) | BS | BHS | 2.9 | 367168 | ||||||
Bahamas, The | BHS | 29930 | High Income |
This is clearly all data relating to the same country, however it is not in one row.
How can I make the data look like this?
Bahamas | BS | BHS | 29930 | High Income | 2.9 | 367168 | 60 | 9 |
Thanks
If you have different countries, convert the following code into a macro.
data original;
input country$14. val1$ val2$ val3 val4$11. val5$ val6$ val7$ val8$ val9$ val10$;
infile datalines missover dsd dlm="";
lines;
Bahamas 60 9
Bahamas (the) BS BHS 2.9 367168
Bahamas, The BHS 29930 High Income
;
run;
data _NULL_;
set original;
if not missing(val1) then call symputx("val1",val1);
else if not missing(val2) then call symputx("val2",val2);
else if not missing(val3) then call symputx("val3",val3);
else if not missing(val4) then call symputx("val4",val4);
else if not missing(val5) then call symputx("val5",val5);
else if not missing(val6) then call symputx("val6",val6);
else if not missing(val7) then call symputx("val7",val7);
else if not missing(val8) then call symputx("val8",val8);
else if not missing(val9) then call symputx("val9",val9);
else if not missing(val10) then call symputx("val10",val10);
run;
data final;
country = "Bahamas";
val1 = "&val1";
val2 = "&val2";
val3 = "&val3";
val4 = "&val4";
val5 = "&val5";
val6 = "&val6";
val7 = "&val7";
val8 = "&val8";
val9 = "&val9";
val10 = "&val10";
run;
You need to standardize the columns you're using to merge the data. SAS got products for this (DataFlux/Data Quality Server) but for your case it's likely good enough if you just analyse the result after your first go (as you already did) and then implement standardization as below. Then use the standardized column for merging.
proc format;
invalue $country_stdz(upcase)
'BAHAMAS' = 'Bahamas'
'BAHAMAS (THE)' = 'Bahamas'
'BAHAMAS, THE' = 'Bahamas'
;
run;
data demo;
infile datalines truncover;
input country $20.;
length country_stdz $20.;
country_stdz=input(country, $country_stdz.);
datalines;
Bahamas
Bahamas (the)
Bahamas, The
;
proc print;
run;
If you're reading your data from an external file then you can use the informat also directly in the input statement.
proc format;
invalue $country_stdz(upcase)
'BAHAMAS' = 'Bahamas'
'BAHAMAS (THE)' = 'Bahamas'
'BAHAMAS, THE' = 'Bahamas'
;
run;
data demo;
infile datalines truncover;
input country $country_stdz20.;
datalines;
Bahamas
Bahamas (the)
Bahamas, The
Other country
;
proc print;
run;
data original;
input country$14. val1$ val2$ val3 val4$11. val5$ val6$ val7$ val8$ val9$ val10$;
infile datalines missover dsd dlm="";
cards;
Bahamas 60 9
Bahamas (the) BS BHS 2.9 367168
Bahamas, The BHS 29930 High Income
;
run;
data temp;
set original;
id=scan(country,1,,'ka');
run;
data want;
update temp(obs=0) temp;
by id;
run;
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.