Combining Observations

breebreebree · Posted 06-07-2020 12:00 AM

I am required to merge 5 data sets into one. Not all data sets had the same variables so I have linked the datasets as best as I can. I am now having issues with a couple of the observations, for example:

Bahamas							60	9
Bahamas (the)	BS	BHS			2.9	367168
Bahamas, The		BHS	29930	High Income

This is clearly all data relating to the same country, however it is not in one row.

How can I make the data look like this?

Bahamas

BS

BHS

29930

High Income

2.9

367168

60

9

Thanks

VENKATAMAHESH · Posted 06-07-2020 01:38 AM

If you have different countries, convert the following code into a macro.

data original;
input country$14. val1$ val2$ val3 val4$11. val5$ val6$ val7$ val8$ val9$ val10$;
infile datalines missover dsd dlm="";
lines;
Bahamas                        60 9 
Bahamas (the) BS BHS              2.9 367168
Bahamas, The   BHS 29930 High Income
;
run;

data _NULL_;
set original;
if not missing(val1) then call symputx("val1",val1);
else if not missing(val2) then call symputx("val2",val2);
else if not missing(val3) then call symputx("val3",val3);
else if not missing(val4) then call symputx("val4",val4);
else if not missing(val5) then call symputx("val5",val5);
else if not missing(val6) then call symputx("val6",val6);
else if not missing(val7) then call symputx("val7",val7);
else if not missing(val8) then call symputx("val8",val8);
else if not missing(val9) then call symputx("val9",val9);
else if not missing(val10) then call symputx("val10",val10);
run;

data final;
country = "Bahamas";
val1 = "&val1";
val2 = "&val2";
val3 = "&val3";
val4 = "&val4";
val5 = "&val5";
val6 = "&val6";
val7 = "&val7";
val8 = "&val8";
val9 = "&val9";
val10 = "&val10";
run;

Patrick · Posted 06-07-2020 03:20 AM

You need to standardize the columns you're using to merge the data. SAS got products for this (DataFlux/Data Quality Server) but for your case it's likely good enough if you just analyse the result after your first go (as you already did) and then implement standardization as below. Then use the standardized column for merging.

proc format;
  invalue $country_stdz(upcase)
    'BAHAMAS'       = 'Bahamas'     
    'BAHAMAS (THE)' = 'Bahamas'
    'BAHAMAS, THE'  = 'Bahamas'
  ;
run;

data demo;
  infile datalines truncover;
  input country $20.;
  length country_stdz $20.;
  country_stdz=input(country, $country_stdz.);
  datalines;
Bahamas      
Bahamas (the)
Bahamas, The 
;

proc print;
run;

If you're reading your data from an external file then you can use the informat also directly in the input statement.

proc format;
  invalue $country_stdz(upcase)
    'BAHAMAS'       = 'Bahamas'     
    'BAHAMAS (THE)' = 'Bahamas'
    'BAHAMAS, THE'  = 'Bahamas'
  ;
run;

data demo;
  infile datalines truncover;
  input country $country_stdz20.;
  datalines;
Bahamas      
Bahamas (the)
Bahamas, The 
Other country
;

proc print;
run;

Ksharp · Posted 06-07-2020 07:38 AM

data original;
input country$14. val1$ val2$ val3 val4$11. val5$ val6$ val7$ val8$ val9$ val10$;
infile datalines missover dsd dlm="";
cards;
Bahamas                        60 9 
Bahamas (the) BS BHS              2.9 367168
Bahamas, The   BHS 29930 High Income
;
run;
data temp;
set original;
id=scan(country,1,,'ka');
run;
data want;
 update temp(obs=0) temp;
 by id;
run;

Combining Observations

Re: Combining Observations

Re: Combining Observations

Re: Combining Observations

Combining Observations

Re: Combining Observations

Re: Combining Observations

Re: Combining Observations

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!