DATA Step, Macro, Functions and more

merging datasets

Accepted Solution Solved
Reply
Regular Contributor
Posts: 162
Accepted Solution

merging datasets

[ Edited ]

Dear all,

 

I would like to merge two datasets with different informat of a variable. I tried to change the variable "STATE" from a numeric variable to a character variable using PUT function prior to merging but I failed. 

 

Could anyone recommend a better way to solve the problem?

 

Thank you.

 

MSPAK

 


Accepted Solutions
Solution
‎09-25-2015 06:23 AM
Valued Guide
Posts: 765

Re: merging datasets

[ Edited ]

Hi.  I think that you want to CONCATENATE (not MERGE) the two data sets you posted.

 

STATE is a state name in data set SET1 and a state code in data set SET2.  

So ... make two new variables for the new data set, STATE_CODE and STATE_NAME.

 

NAME has a name and state in data set SET1 but only a name in data set SET2.

So ... get rid of the state portion of NAME (each observation will have a STATE_CODE and STATE_NAME)

 

data set SET1 has no county numbers

So ... use the FIPS code to add a county number

 

 Assuming that I'm correct, try this ...

 

* first ... remove the formats and informats from both data sets (useless);
proc datasets lib=z;
modify set1;
format _all_;
informat _all_;
modify set2;
format _all_;
informat _all_;
quit;

 

* second ... combine (CONCATENATE) the two data sets with SET statements;

data one_two;
length name $36;
format state_code z2. county z3. fips z5.;
set z.set1 (in=one rename=(state=state_name)) z.set2 (rename=(state=state_code));
if one then do;
   state_code = input(substr(put(fips,z5.),1,2),2.);
   county = input(substr(put(fips,z5.),3,3),3.);
end;
else do;
   state_name = fipname(state_code);
   name = scan(name,1,',');
end;
run;

 

The combined data set ONE_TWO has 15,714 observations.  PROC CONTENTS shows ...

 

# Variable   Type Len Format Label

7 FYEAR      Num   8
6 INHERENT   Num   8
3 county     Num   8   Z3.  3-digit county fips number
4 fips       Num   8   Z5.  FIPS Code
1 name       Char 36        County Name
2 state_code Num   8   Z2.  State Census Code
5 state_name Char 20        State Name

 

View solution in original post


All Replies
Super Contributor
Posts: 298

Re: merging datasets

Unable to understand your question. Better to show the program and the Log. 

Super User
Super User
Posts: 7,046

Re: merging datasets

The INFORMAT attached to a variable has no impact on whether you can merge by it.

If the variables are different TYPE then you will not be able combine the datasets at all, let alone merge by that variable.

You cannot change the type of variable. You can make a new variable and rename it.

newvar=put(state,z2.);
rename newvar=state state=oldstate;

You could also merge directly using SQL.

select * from a,b 
where a.state = put(b.state,Z2.)
;
Regular Contributor
Posts: 162

Re: merging datasets

Hi all,

 

These two files attached. I try to merge these two files together but the format of the variable "STATE" are different.

 

TQ.

Attachment
Super User
Posts: 7,782

Re: merging datasets

With which SAS version and on which platform did you create those datasets? I can't open them with 9.2 on Win64.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Regular Contributor
Posts: 162

Re: merging datasets

Posted in reply to KurtBremser

Hi,

 

I am using SAS9.4 Windows64. The SAS files uploaded were in imported from (stata) dta files.

SAS Employee
Posts: 7

Re: merging datasets

This is how I would solve the problem assuming that you want to get all of the inherent and fyear records from both set1 and set2.

 

/* get dataset of distinct states and counties */
proc sql;
 create table state_county as
 select distinct a.fips, a.county as county_num, b.state, b.name as county_name
 from set2 as a join
	set1 as b on a.fips = b.fips;
run;

/* concatenate the tables together to get inherent and fyear data from each set */
/* keep state and county information                                            */
proc sql;
 create table combined as
 select a.*, b.inherent, b.fyear
 from state_county as a join
 	set2 as b on a.fips = b.fips
union
select a.*, b.inherent, b.fyear
 from state_county as a join
 	set1 as b on a.fips = b.fips
	;
quit;

/* sort for fips and year */
proc sort data=combined;
 by fips fyear;
run;
Super User
Posts: 19,792

Re: merging datasets

[ Edited ]

SET1:

STATE - Type=TEXT, $20

SET2:

STATE - Type=NUMERIC, Format=STATE.

 

In one data set you have a character variable for state and in the second you have a numeric variable that is mapped using a format.

@KurtBremser you can use the option noftmerr to view the datasets.

 

@mspak something like the following should work - rename old variable state to state_code and create a new variable state that is character. Then merge the datasets. This won't work exactly as I suspect you have more than STATE in your merge condition but it's a starting point.

 

Hope this helps.

 

Data SET2_NEW;
set SET2(rename=state=state_code);
state=put(state_code, state.);
run;

*Appropriate sorts...;

data want;
merge set1 set2;
by state;
run;

 

Solution
‎09-25-2015 06:23 AM
Valued Guide
Posts: 765

Re: merging datasets

[ Edited ]

Hi.  I think that you want to CONCATENATE (not MERGE) the two data sets you posted.

 

STATE is a state name in data set SET1 and a state code in data set SET2.  

So ... make two new variables for the new data set, STATE_CODE and STATE_NAME.

 

NAME has a name and state in data set SET1 but only a name in data set SET2.

So ... get rid of the state portion of NAME (each observation will have a STATE_CODE and STATE_NAME)

 

data set SET1 has no county numbers

So ... use the FIPS code to add a county number

 

 Assuming that I'm correct, try this ...

 

* first ... remove the formats and informats from both data sets (useless);
proc datasets lib=z;
modify set1;
format _all_;
informat _all_;
modify set2;
format _all_;
informat _all_;
quit;

 

* second ... combine (CONCATENATE) the two data sets with SET statements;

data one_two;
length name $36;
format state_code z2. county z3. fips z5.;
set z.set1 (in=one rename=(state=state_name)) z.set2 (rename=(state=state_code));
if one then do;
   state_code = input(substr(put(fips,z5.),1,2),2.);
   county = input(substr(put(fips,z5.),3,3),3.);
end;
else do;
   state_name = fipname(state_code);
   name = scan(name,1,',');
end;
run;

 

The combined data set ONE_TWO has 15,714 observations.  PROC CONTENTS shows ...

 

# Variable   Type Len Format Label

7 FYEAR      Num   8
6 INHERENT   Num   8
3 county     Num   8   Z3.  3-digit county fips number
4 fips       Num   8   Z5.  FIPS Code
1 name       Char 36        County Name
2 state_code Num   8   Z2.  State Census Code
5 state_name Char 20        State Name

 

Regular Contributor
Posts: 162

Re: merging datasets

Dear all,

 

Thank you very much for all the solutions and suggestions provided. I truly appreciate your contributions. I finally manage to combine 2 datasets.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 496 views
  • 2 likes
  • 7 in conversation