BookmarkSubscribeRSS Feed
Phoebelee
Fluorite | Level 6

Actually, I got three datasets related to the healthcare data of the 58 counties in California. But in every dataset, the county name of the 58 counties are somewhat different. For example, Los Angelas can be input as "Los Angel" or "Los Angelas". So how am I supposed to merge all the datasets by county name? The sas code and the csv files have been attached.

Thank you all !

Here is my sas code:

/*REHABILITATION*/
libname PROJECT 'C:\Users\Administrator\Desktop\SAS\proposal';
FILENAME REFFILE 'C:\Users\Administrator\Desktop\SAS\proposal\Vocational_Rehabilitation_Successful_Closures_Average_Hours_Worked_By_County__SFY_2014.csv';

PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
REPLACE
OUT=PROJECT.REHABILITATION;
GETNAMES=YES;
RUN;

PROC CONTENTS DATA=PROJECT.REHABILITATION VARNUM; RUN;

proc print data=project.REHABILITATION (obs = 50); run;

DATA REHABILITATION2;
SET PROJECT.REHABILITATION;
County_name=upcase(County);
run;
proc print data= REHABILITATION2 (obs=50) ;run;

proc means data = REHABILITATION2 noprint;
class County_name;
var Average_Hours_Worked_Per_Week;
output out = REHABILITATION3 mean=ave_hour sum=total_hour;
run;
proc print data= REHABILITATION3(where = (_TYPE_ = 1));
run;

 

/*DISCHARGE*/
libname PROJECT 'C:\Users\Administrator\Desktop\SAS\proposal';
FILENAME REFFILE 'C:\Users\Administrator\Desktop\SAS\proposal\Patient_Discharge_Data_By_Disposition.csv';

PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
REPLACE
OUT=PROJECT.DISCHARGE;
GETNAMES=YES;
RUN;

PROC CONTENTS DATA=PROJECT.DISCHARGE VARNUM; RUN;

proc print data=project.discharge (obs = 50); run;

data discharge2 ;
informat County_Name $20.;
set project.discharge;
where year =2014;
keep Year County_Name Count;
drop OSHPD_ID Facility_Name Type_of_Control Disposition ;
run;

proc contents data = discharge2 varnum; run;
proc print data=discharge2 (obs = 50); run;

proc freq data=discharge2;
tables County_Name year;
run;

proc means data = discharge2 noprint ;
class County_Name;
var Count;
output out =discharge3 mean=ave_discharge_count sum=total_discharge_count;
run;
proc print data = discharge3 (where = (_TYPE_ = 1) ) ; run;

 

/*EMERGENCY*/

libname PROJECT 'C:\Users\Administrator\Desktop\SAS\proposal';
FILENAME REFFILE 'C:\Users\Administrator\Desktop\SAS\proposal\Emergency_Department_Data_By_Expected_Payer_Source.csv';

PROC IMPORT DATAFILE=REFFILE
DBMS=CSV
REPLACE
OUT=PROJECT.EMERGENCY;
GETNAMES=YES;
RUN;

PROC CONTENTS DATA=PROJECT.EMERGENCY VARNUM; RUN;

proc print data=project.emergency (obs = 50); run;


data emergency2 ;
set project.emergency;
where year = 2014;
keep Year County_Name Count;
drop OSHPD_ID Facility_Name Type_of_Control Disposition ;
run;

proc contents data = emergency2 varnum; run;
proc print data=emergency2 (obs = 50); run;

proc freq data=emergency2;
tables County_Name year;
run;

proc means data = emergency2 noprint ;
class County_Name;
var Count;
output out =emergency3 mean=ave_emergency_count sum=total_emergency_count;
run;
proc print data= emergency3 (where = (_TYPE_ = 1)) ; run;

/*merge*/

/*sort out the dataset*/
PROC SORT DATA=REHABILITATION3 OUT=REHABILITATION4;
BY COUNTY_Name;
RUN;
proc print data= REHABILITATION4(where = (_TYPE_ = 1));
run;


PROC SORT DATA=DISCHARGE3 OUT=DISCHARGE4;
BY COUNTY_Name;
RUN;
proc print data= DISCHARGE4(where = (_TYPE_ = 1));
run;


PROC SORT DATA=EMERGENCY3 OUT=EMERGENCY4;
BY COUNTY_Name;
RUN;
proc print data= EMERGENCY4(where = (_TYPE_ = 1));
run;

 

DATA MERGE_ALL;
merge REHABILITATION4(in=fredo) DISCHARGE4 (in=county) EMERGENCY4 (in=Phoebe) ;
by county_Name;
if fredo and county ;
run;
proc print data=merge_all;
run;

3 REPLIES 3
PeterClemmensen
Tourmaline | Level 20

Please post you data in the form of a datastep, much easier to help that way and most users will not unzip/download files with unknown data.

Kurt_Bremser
Super User

Step 1: get away from using proc import. Take the data steps created by proc import (you find it in the log), and make sure that the variable county has sufficient and equal length in all three steps. Then (step 2) convert county to uppercase in Vocational_Rehabilitation_Successful_Closures_Average_Hours_Worked_By_County__SFY_2014.csv, and you should be fine.

As far as i can see, LOS ANGELES is consistently used, so I guess you have a length issue stemming from the guessing of proc import.

 

See Maxim 22.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1163 views
  • 1 like
  • 3 in conversation