BookmarkSubscribeRSS Feed
PaigeMiller
Diamond | Level 26

So, instead of me guessing, could you explain table 1, because it looks very different than the previous table 1.

--
Paige Miller
cx2019
Obsidian | Level 7
1st-2 nd Name
9th-12th Startyear
15th-18th Endyear
PaigeMiller
Diamond | Level 26

Because you have replicates of some names in dataset1 (for example, LN appears 3 times in table 1), I identify them separately as LN1, LN2 and LN3, etc. Also, in dataset2, variable CQ shows up as character, so I have omitted that, you can fix it in your original data if necessary.

 

 

data dataset1;
	set dataset1;
	name=substr(eventid,1,2);
	startyear=input(substr(eventid,9,4),4.);
	endyear=input(substr(eventid,15,4),4.);
run;
proc sort data=dataset1;
	by name;
Run;
data dataset1a;
	set dataset1;
	by name;
	if first.name then counter=0;
	counter+1;
	name=cats(name,counter);
	drop counter;
run;

data dataset2a;
	set dataset2;
        length id $ 3;
	array v bj--zj;
	do i=1 to dim(v);
	    value=v(i);
		id=vname(v(i));
		output;
	end;
	drop i bj--zj cq;
run;

proc sql;
	create table dataset3 as select d1.name,mean(d2.value) as mean from dataset2a d2 
            right join dataset1a d1 on substr(d1.name,1,2)=d2.id 
            where d2.year>=d1.startyear and d2.year<=d1.endyear
			group by d1.name;
quit;

 

--
Paige Miller
Tom
Super User Tom
Super User

@cx2019 wrote:

Excel books t1 and t2 are attached.

In t1, first two letters are the province and last twelve digits are startyear-month and endyear-month.

In t2, there are gdp growth rates.

calculation is the same(simple arithmetic mean).

I am sorry to trouble you.


Looking at the data your YEAR_MONTH values are not valid.  Many have a month of 99.

But since your growth rates are for YEAR only we can just ignore the month from T1.

First convert these XLS files into datasets.  

proc import dbms=xls out=t1 replace datafile="&path\t1.xls";
run;
proc import dbms=xls out=t2 replace datafile="&path\t2.xls";
run;

You might want to examine the files. The first one appears to have three extra empty columns.  The last one seems to have some rows without YEAR value.

Let's transform the files to make them usable. For the T2 file we need to get the data out of the column names and into a variable. We can use PROC TRANSPOSE.  I added the NOTSORTED to deal with the missing value for YEAR at the end of the file.

proc transpose data=t2 out=t2t ;
  by year notsorted;
run;

For the other we need pull out the variables that are trapped in the EVENTID field (and get rid of the empty variables).

data t1_fixed;
  set t1(keep=eventid);
  length from_yr to_yr 8;
  from_yr=input(substr(eventid,9),4.);
  to_yr=input(substr(eventid,15),4.);
run;

Now we can combine the two and calculate the statistic.

proc sql ;
create table want_tall as 
  select a.eventid,a.from_yr,a.to_yr,b._name_,mean(b.col1) as average
  from t1_fixed a 
  left join t2t b
  on b.year between a.from_yr and a.to_yr
  group by 1,2,3,4
;
quit;

If you want you can convert back to the wide format using PROC TRANSPOSE.

proc transpose data=want_tall out=want ;
  by eventid from_yr to_yr ;
  id _name_;
  var average;
run;
cx2019
Obsidian | Level 7
What should I do if I can not change the order in t1?
How to deal with the problem with array?
novinosrin
Tourmaline | Level 20

HI @cx2019   Would help more if you could post a sample of the expected output. Thank you!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 20 replies
  • 5025 views
  • 1 like
  • 4 in conversation