DATA Step, Macro, Functions and more

How to merge three datasets

Reply
Occasional Contributor
Posts: 8

How to merge three datasets

[ Edited ]

Actually, I got three datasets related to the healthcare data of the 58 counties in California. But in every dataset, the county name of the 58 counties are somewhat different. For example, Los Angelas can be input as "Los Angel" or "Los Angelas". So how am I supposed to merge all the datasets by county name? The sas code and the csv files have been attached.

Thank you all !

Here is my sas code:

/*REHABILITATION*/
libname PROJECT 'C:\Users\Administrator\Desktop\SAS\proposal';
FILENAME REFFILE 'C:\Users\Administrator\Desktop\SAS\proposal\Vocational_Rehabilitation_Successful_Closures_Average_Hours_Worked_By_County__SFY_2014.csv';

PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
REPLACE
OUT=PROJECT.REHABILITATION;
GETNAMES=YES;
RUN;

PROC CONTENTS DATA=PROJECT.REHABILITATION VARNUM; RUN;

proc print data=project.REHABILITATION (obs = 50); run;

DATA REHABILITATION2;
SET PROJECT.REHABILITATION;
County_name=upcase(County);
run;
proc print data= REHABILITATION2 (obs=50) ;run;

proc means data = REHABILITATION2 noprint;
class County_name;
var Average_Hours_Worked_Per_Week;
output out = REHABILITATION3 mean=ave_hour sum=total_hour;
run;
proc print data= REHABILITATION3(where = (_TYPE_ = 1));
run;

 

/*DISCHARGE*/
libname PROJECT 'C:\Users\Administrator\Desktop\SAS\proposal';
FILENAME REFFILE 'C:\Users\Administrator\Desktop\SAS\proposal\Patient_Discharge_Data_By_Disposition.csv';

PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
REPLACE
OUT=PROJECT.DISCHARGE;
GETNAMES=YES;
RUN;

PROC CONTENTS DATA=PROJECT.DISCHARGE VARNUM; RUN;

proc print data=project.discharge (obs = 50); run;

data discharge2 ;
informat County_Name $20.;
set project.discharge;
where year =2014;
keep Year County_Name Count;
drop OSHPD_ID Facility_Name Type_of_Control Disposition ;
run;

proc contents data = discharge2 varnum; run;
proc print data=discharge2 (obs = 50); run;

proc freq data=discharge2;
tables County_Name year;
run;

proc means data = discharge2 noprint ;
class County_Name;
var Count;
output out =discharge3 mean=ave_discharge_count sum=total_discharge_count;
run;
proc print data = discharge3 (where = (_TYPE_ = 1) ) ; run;

 

/*EMERGENCY*/

libname PROJECT 'C:\Users\Administrator\Desktop\SAS\proposal';
FILENAME REFFILE 'C:\Users\Administrator\Desktop\SAS\proposal\Emergency_Department_Data_By_Expected_Payer_Source.csv';

PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
REPLACE
OUT=PROJECT.EMERGENCY;
GETNAMES=YES;
RUN;

PROC CONTENTS DATA=PROJECT.EMERGENCY VARNUM; RUN;

proc print data=project.emergency (obs = 50); run;


data emergency2 ;
set project.emergency;
where year = 2014;
keep Year County_Name Count;
drop OSHPD_ID Facility_Name Type_of_Control Disposition ;
run;

proc contents data = emergency2 varnum; run;
proc print data=emergency2 (obs = 50); run;

proc freq data=emergency2;
tables County_Name year;
run;

proc means data = emergency2 noprint ;
class County_Name;
var Count;
output out =emergency3 mean=ave_emergency_count sum=total_emergency_count;
run;
proc print data= emergency3 (where = (_TYPE_ = 1)) ; run;

/*merge*/

/*sort out the dataset*/
PROC SORT DATA=REHABILITATION3 OUT=REHABILITATION4;
BY COUNTY_Name;
RUN;
proc print data= REHABILITATION4(where = (_TYPE_ = 1));
run;


PROC SORT DATA=DISCHARGE3 OUT=DISCHARGE4;
BY COUNTY_Name;
RUN;
proc print data= DISCHARGE4(where = (_TYPE_ = 1));
run;


PROC SORT DATA=EMERGENCY3 OUT=EMERGENCY4;
BY COUNTY_Name;
RUN;
proc print data= EMERGENCY4(where = (_TYPE_ = 1));
run;

 

DATA MERGE_ALL;
merge REHABILITATION4(in=fredo) DISCHARGE4 (in=county) EMERGENCY4 (in=Phoebe) ;
by county_Name;
if fredo and county ;
run;
proc print data=merge_all;
run;

Attachment
PROC Star
Posts: 741

Re: How to merge three datasets

Posted in reply to Phoebelee

Please post you data in the form of a datastep, much easier to help that way and most users will not unzip/download files with unknown data.

Super User
Posts: 7,781

Re: How to merge three datasets

[ Edited ]
Posted in reply to Phoebelee

Your .zip contains csv files. Please supply the code you used to import those into SAS.

 

Edit: Belay that. The code is also in the .zip.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 7,781

Re: How to merge three datasets

[ Edited ]
Posted in reply to Phoebelee

Step 1: get away from using proc import. Take the data steps created by proc import (you find it in the log), and make sure that the variable county has sufficient and equal length in all three steps. Then (step 2) convert county to uppercase in Vocational_Rehabilitation_Successful_Closures_Average_Hours_Worked_By_County__SFY_2014.csv, and you should be fine.

As far as i can see, LOS ANGELES is consistently used, so I guess you have a length issue stemming from the guessing of proc import.

 

See Maxim 22.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 3 replies
  • 159 views
  • 1 like
  • 3 in conversation