So, instead of me guessing, could you explain table 1, because it looks very different than the previous table 1.
Because you have replicates of some names in dataset1 (for example, LN appears 3 times in table 1), I identify them separately as LN1, LN2 and LN3, etc. Also, in dataset2, variable CQ shows up as character, so I have omitted that, you can fix it in your original data if necessary.
data dataset1;
set dataset1;
name=substr(eventid,1,2);
startyear=input(substr(eventid,9,4),4.);
endyear=input(substr(eventid,15,4),4.);
run;
proc sort data=dataset1;
by name;
Run;
data dataset1a;
set dataset1;
by name;
if first.name then counter=0;
counter+1;
name=cats(name,counter);
drop counter;
run;
data dataset2a;
set dataset2;
length id $ 3;
array v bj--zj;
do i=1 to dim(v);
value=v(i);
id=vname(v(i));
output;
end;
drop i bj--zj cq;
run;
proc sql;
create table dataset3 as select d1.name,mean(d2.value) as mean from dataset2a d2
right join dataset1a d1 on substr(d1.name,1,2)=d2.id
where d2.year>=d1.startyear and d2.year<=d1.endyear
group by d1.name;
quit;
@cx2019 wrote:
Excel books t1 and t2 are attached.
In t1, first two letters are the province and last twelve digits are startyear-month and endyear-month.
In t2, there are gdp growth rates.
calculation is the same(simple arithmetic mean).
I am sorry to trouble you.
Looking at the data your YEAR_MONTH values are not valid. Many have a month of 99.
But since your growth rates are for YEAR only we can just ignore the month from T1.
First convert these XLS files into datasets.
proc import dbms=xls out=t1 replace datafile="&path\t1.xls";
run;
proc import dbms=xls out=t2 replace datafile="&path\t2.xls";
run;
You might want to examine the files. The first one appears to have three extra empty columns. The last one seems to have some rows without YEAR value.
Let's transform the files to make them usable. For the T2 file we need to get the data out of the column names and into a variable. We can use PROC TRANSPOSE. I added the NOTSORTED to deal with the missing value for YEAR at the end of the file.
proc transpose data=t2 out=t2t ;
by year notsorted;
run;
For the other we need pull out the variables that are trapped in the EVENTID field (and get rid of the empty variables).
data t1_fixed;
set t1(keep=eventid);
length from_yr to_yr 8;
from_yr=input(substr(eventid,9),4.);
to_yr=input(substr(eventid,15),4.);
run;
Now we can combine the two and calculate the statistic.
proc sql ;
create table want_tall as
select a.eventid,a.from_yr,a.to_yr,b._name_,mean(b.col1) as average
from t1_fixed a
left join t2t b
on b.year between a.from_yr and a.to_yr
group by 1,2,3,4
;
quit;
If you want you can convert back to the wide format using PROC TRANSPOSE.
proc transpose data=want_tall out=want ;
by eventid from_yr to_yr ;
id _name_;
var average;
run;
HI @cx2019 Would help more if you could post a sample of the expected output. Thank you!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.