Hi all,
I am having issues with an inner join by two variables, FIPS and year. When I join by year in addition to FIPS, I get no output (zero rows). The year variable in both datasets is type num, length 8, but the year in one of them has a year4. format. I have checked the data and both contain years in my desired range (1997-2014) so the issue is not that there are no matches. Most of the year variable entries in one dataset (soccap3) are the result of using proc expand to extrapolate a variable (sk) to missing years, and I suspect this may be part of the issue because the output from proc expand assigns the same VAR number to multiple years. I have attached a screenshot of the output after proc expand for reference and the relevant code is below. Some things I have tried: deleting the year4. format on the soccap3 year, adding a year4. format to the funda_4 year, deleting b.year from the merge statement. None of these have worked, and I suspect, again, that this is because there is a problem with the year variable from using proc expand. Any help would be appreciated! Thank you.
/* create social capital values for missing years*/
PROC EXPAND DATA=soccap2 out=soccap3 from=year;
by fips;
id year;
RUN;
PROC SORT DATA=soccap3;
BY fips year;
RUN;
/*intermediate steps of importing funda data, adding variables omitted*/
/* merge fundamentals and social capital data*/
PROC SQL;
create table funda_final as
SELECT a.*, b.sk, b.year from funda_4 as a, soccap3 as b
WHERE a.fips=b.fips & a.year=b.year;
quit;"from=year" in your Proc Expand tells us that the year variable must contain a SAS Date value (count of days since 1/1/1960). The year4. format then instructs SAS to print such a value as a 4 digit year.
For any comparison, merge or join SAS will use the internal values and not the formatted ones. I assume that in your other data set "funda_4" the value in the year variable is just a 4 digit number and not a SAS Date - and if so then the values won't match of course.
If my theory is correct then you need to convert the year value in one of the tables so it can match the year value in the other table. Below SQL demonstrates how this could be done (untested as you haven't provided a data step creating sample data).
proc sql;
create table funda_final as
select
a.*,
b.sk,
b.year
from
funda_4 as a,
soccap3 as b
where
a.fips=b.fips
and a.year=input(put(b.year,year4.),best32.);
quit;
"from=year" in your Proc Expand tells us that the year variable must contain a SAS Date value (count of days since 1/1/1960). The year4. format then instructs SAS to print such a value as a 4 digit year.
For any comparison, merge or join SAS will use the internal values and not the formatted ones. I assume that in your other data set "funda_4" the value in the year variable is just a 4 digit number and not a SAS Date - and if so then the values won't match of course.
If my theory is correct then you need to convert the year value in one of the tables so it can match the year value in the other table. Below SQL demonstrates how this could be done (untested as you haven't provided a data step creating sample data).
proc sql;
create table funda_final as
select
a.*,
b.sk,
b.year
from
funda_4 as a,
soccap3 as b
where
a.fips=b.fips
and a.year=input(put(b.year,year4.),best32.);
quit;
The SAS year() function provides another way to convert a SAS date value to a year number:
year(b.year)
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.