BookmarkSubscribeRSS Feed
breebreebree
Calcite | Level 5

I am required to merge 5 data sets into one. Not all data sets had the same variables so I have linked the datasets as best as I can. I am now having issues with a couple of the observations, for example: 

 

Bahamas       609 
Bahamas (the)BSBHS   2.9367168   
Bahamas, The BHS29930High Income      

 

This is clearly all data relating to the same country, however it is not in one row. 

 

How can I make the data look like this?

 

BahamasBSBHS29930High Income 2.9367168609 

 

Thanks

3 REPLIES 3
VENKATAMAHESH
Calcite | Level 5

If you have different countries, convert the following code into a macro.

data original;
input country$14. val1$ val2$ val3 val4$11. val5$ val6$ val7$ val8$ val9$ val10$;
infile datalines missover dsd dlm="";
lines;
Bahamas                        60 9 
Bahamas (the) BS BHS              2.9 367168
Bahamas, The   BHS 29930 High Income
;
run;

data _NULL_;
set original;
if not missing(val1) then call symputx("val1",val1);
else if not missing(val2) then call symputx("val2",val2);
else if not missing(val3) then call symputx("val3",val3);
else if not missing(val4) then call symputx("val4",val4);
else if not missing(val5) then call symputx("val5",val5);
else if not missing(val6) then call symputx("val6",val6);
else if not missing(val7) then call symputx("val7",val7);
else if not missing(val8) then call symputx("val8",val8);
else if not missing(val9) then call symputx("val9",val9);
else if not missing(val10) then call symputx("val10",val10);
run;

data final;
country = "Bahamas";
val1 = "&val1";
val2 = "&val2";
val3 = "&val3";
val4 = "&val4";
val5 = "&val5";
val6 = "&val6";
val7 = "&val7";
val8 = "&val8";
val9 = "&val9";
val10 = "&val10";
run;
Patrick
Opal | Level 21

You need to standardize the columns you're using to merge the data. SAS got products for this (DataFlux/Data Quality Server) but for your case it's likely good enough if you just analyse the result after your first go (as you already did) and then implement standardization as below. Then use the standardized column for merging.

proc format;
  invalue $country_stdz(upcase)
    'BAHAMAS'       = 'Bahamas'     
    'BAHAMAS (THE)' = 'Bahamas'
    'BAHAMAS, THE'  = 'Bahamas'
  ;
run;

data demo;
  infile datalines truncover;
  input country $20.;
  length country_stdz $20.;
  country_stdz=input(country, $country_stdz.);
  datalines;
Bahamas      
Bahamas (the)
Bahamas, The 
;

proc print;
run;

 

If you're reading your data from an external file then you can use the informat also directly in the input statement.

proc format;
  invalue $country_stdz(upcase)
    'BAHAMAS'       = 'Bahamas'     
    'BAHAMAS (THE)' = 'Bahamas'
    'BAHAMAS, THE'  = 'Bahamas'
  ;
run;

data demo;
  infile datalines truncover;
  input country $country_stdz20.;
  datalines;
Bahamas      
Bahamas (the)
Bahamas, The 
Other country
;

proc print;
run;

Patrick_0-1591579642172.png

 

Ksharp
Super User
data original;
input country$14. val1$ val2$ val3 val4$11. val5$ val6$ val7$ val8$ val9$ val10$;
infile datalines missover dsd dlm="";
cards;
Bahamas                        60 9 
Bahamas (the) BS BHS              2.9 367168
Bahamas, The   BHS 29930 High Income
;
run;
data temp;
set original;
id=scan(country,1,,'ka');
run;
data want;
 update temp(obs=0) temp;
 by id;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 659 views
  • 3 likes
  • 4 in conversation