Re: calculation between two tables

PaigeMiller · Posted 04-16-2019 06:41 AM

So, instead of me guessing, could you explain table 1, because it looks very different than the previous table 1.

--
Paige Miller

cx2019 · Posted 04-16-2019 06:58 AM

1st-2 nd Name
9th-12th Startyear
15th-18th Endyear

PaigeMiller · Posted 04-16-2019 08:19 AM

Because you have replicates of some names in dataset1 (for example, LN appears 3 times in table 1), I identify them separately as LN1, LN2 and LN3, etc. Also, in dataset2, variable CQ shows up as character, so I have omitted that, you can fix it in your original data if necessary.

data dataset1;
	set dataset1;
	name=substr(eventid,1,2);
	startyear=input(substr(eventid,9,4),4.);
	endyear=input(substr(eventid,15,4),4.);
run;
proc sort data=dataset1;
	by name;
Run;
data dataset1a;
	set dataset1;
	by name;
	if first.name then counter=0;
	counter+1;
	name=cats(name,counter);
	drop counter;
run;

data dataset2a;
	set dataset2;
        length id $ 3;
	array v bj--zj;
	do i=1 to dim(v);
	    value=v(i);
		id=vname(v(i));
		output;
	end;
	drop i bj--zj cq;
run;

proc sql;
	create table dataset3 as select d1.name,mean(d2.value) as mean from dataset2a d2 
            right join dataset1a d1 on substr(d1.name,1,2)=d2.id 
            where d2.year>=d1.startyear and d2.year<=d1.endyear
			group by d1.name;
quit;

--
Paige Miller

Tom · Posted 04-16-2019 10:09 AM

@cx2019 wrote:

Excel books t1 and t2 are attached.

In t1, first two letters are the province and last twelve digits are startyear-month and endyear-month.

In t2, there are gdp growth rates.

calculation is the same(simple arithmetic mean).

I am sorry to trouble you.

Looking at the data your YEAR_MONTH values are not valid. Many have a month of 99.

But since your growth rates are for YEAR only we can just ignore the month from T1.

First convert these XLS files into datasets.

proc import dbms=xls out=t1 replace datafile="&path\t1.xls";
run;
proc import dbms=xls out=t2 replace datafile="&path\t2.xls";
run;

You might want to examine the files. The first one appears to have three extra empty columns. The last one seems to have some rows without YEAR value.

Let's transform the files to make them usable. For the T2 file we need to get the data out of the column names and into a variable. We can use PROC TRANSPOSE. I added the NOTSORTED to deal with the missing value for YEAR at the end of the file.

proc transpose data=t2 out=t2t ;
  by year notsorted;
run;

For the other we need pull out the variables that are trapped in the EVENTID field (and get rid of the empty variables).

data t1_fixed;
  set t1(keep=eventid);
  length from_yr to_yr 8;
  from_yr=input(substr(eventid,9),4.);
  to_yr=input(substr(eventid,15),4.);
run;

Now we can combine the two and calculate the statistic.

proc sql ;
create table want_tall as 
  select a.eventid,a.from_yr,a.to_yr,b._name_,mean(b.col1) as average
  from t1_fixed a 
  left join t2t b
  on b.year between a.from_yr and a.to_yr
  group by 1,2,3,4
;
quit;

If you want you can convert back to the wide format using PROC TRANSPOSE.

proc transpose data=want_tall out=want ;
  by eventid from_yr to_yr ;
  id _name_;
  var average;
run;

cx2019 · Posted 04-17-2019 04:08 AM

What should I do if I can not change the order in t1?
How to deal with the problem with array?

novinosrin · Posted 04-15-2019 09:07 AM

HI @cx2019 Would help more if you could post a sample of the expected output. Thank you!