BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
TobbeNord
Obsidian | Level 7

I have 18 numerical variables pm25_total2000 to pm25_total2018

Each person have a starting year between 2013 and 2018, we can call that variable "reqyear".

 

Now I want to calculate mean for each persons 10 years before the starting year. 

 

For example if a person have starting year 2015 I want mean(of pm25_total2006-pm25_total2015) 

Or if a person have starting year 2013 I want mean(of pm25_total2004-pm25_total2013)

 

How to do this? 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@TobbeNord wrote:

I have 18 numerical variables pm25_total2000 to pm25_total2018


If those variables are numbered sequential there are 19 variables. If you only have 18 then you need to tell us which one is missing.

If you are going to throw at problem that has 20+ variables you should provide an example data set as making us dummy of that much data is asking a bit.

 

Here is one way to get started. I am too lazy to make up that many values so I am using 10 "years" and much shorter variable names.

data have;
  input v2000 - v2009 reqyear;
datalines;
1 2 3 4 5 6 7 8 9 10 2005
11 22 33 44 55 66 77 88 99 110 2008
;

data want;
  set have;
  array v(2000:2009) v2000-v2009;
  array t(2000:2009) _temporary_;
  /* reset as temporary arrays hold values across data step)*/
  call missing(of t(*));
  /*load temporary array*/
  do i=(reqyear-3) to reqyear;
     t[i]=v[i];
  end;
  Average= mean(of t(*));
  drop i;
run;

The key to my approach are two arrays. One that hold the full set of values and one that we copy the values to for the mean  function to use. If you have not used arrays much you may not recognize the array statement with two values separated by a colon. That is used to indicate the low and high index values instead of the default 1 to number of elements in the array. This is often useful when you have some other variable such as your Reqyear to extract values by index.

The T array is temporary, meaning that the contents are not written to the result data set. If you want to see what they might be remove the _temporary_.

One thing is that while the values of _temporary_ arrays aren't written to the output data set the values are retained. So we need to start fresh with T at each step.

Since I only used 10 variables I am using 3 previous "years" which is seen on the Do I= loop to extract the desired  values into the temporary array.

 

View solution in original post

7 REPLIES 7
Oligolas
Barite | Level 11

Hi,

while performing the calculation in a datastep, I realized it became quickly diffcult to read, so I did the same using a macro step.

There must be an easier way I can't think of right now

 


%** Create test data;
DATA have;
   length person $10 reqyear pm25_total2000-pm25_total2018 8;
   array pm pm25_total2000-pm25_total2018;

   do i=1 to 10;
      person=cats('Person',put(i,best.));
      do over pm;
         pm=int(ranuni(0)*100);
      end;
      reqyear=2000 + floor((1+2018-2000)*rand("uniform"));
      output;
   end;
   drop i;
RUN;


%** 1 - Using datastep;
DATA want;
   set _NULL_;
RUN;
DATA _NULL_;
   length pmname pm_start pm_end $32;
   SET have;
   ARRAY PMS pm25_total2000-pm25_total2018;
   do i=1 to dim(PMS);
      pmname    =strip(upcase(vname(PMS[i])));
      pm_start  =strip(upcase(cats('pm25_total',put(reqyear-9,best.))));
      pm_end    =strip(upcase(cats('pm25_total',put(reqyear,best.))));
   end;

   put 'N' 'OTE:' Person= reqyear= pm_start= pm_end=;
   if pm_start GE vname(PMS[1]) then do;
      call execute('DATA _hlp; LENGTH mean 8 analyzed $70; SET have(FIRSTOBS='||strip(put(_N_,best.))||' OBS='||strip(put(_N_,best.))||');');
      call execute('ARRAY pms_mean {*} '||strip(pm_start)||'--'||strip(pm_end)||';');
      call execute('mean=mean(of pms_mean[*]); analyzed="mean of '||strip(pm_start)||'--'||strip(pm_end)||'";');
      call execute('RUN;');
      call execute('DATA want; SET want _hlp; RUN; PROC DATASETS lib=work nolist; delete _:; RUN;QUIT;');
   end;
   else do;
      call execute('DATA _hlp; LENGTH mean 8  analyzed $70; SET have(FIRSTOBS='||strip(put(_N_,best.))||' OBS='||strip(put(_N_,best.))||');');
      call execute('mean=.; analyzed="mean since '||strip(pm_start)||' not calculable";');
      call execute('RUN;');
      call execute('DATA want; SET want _hlp; RUN; PROC DATASETS lib=work nolist; delete _:; RUN;QUIT;');
   end;
RUN;


%** 2 - Using Macro;
DATA want2;
   set _NULL_;
RUN;
%macro calcMean(inDS=,outDS=,n=,Miss=,startVar=,endVar=);
   %if &Miss. ne 1 %then %do;
      DATA _hlp; 
         LENGTH mean 8 analyzed $70; 
         SET have(FIRSTOBS=&n. OBS=&n.);
         ARRAY pms_mean {*} &startVar.--&endVar.;
         mean=mean(of pms_mean[*]); 
         analyzed="Mean of &startVar.--&endVar.";
      RUN;
   %end;
   %else %do;
      DATA _hlp; 
         LENGTH mean 8 analyzed $70; 
         SET have(FIRSTOBS=&n. OBS=&n.);
         mean=.; 
         analyzed="Mean since &startVar. not calculable";
      RUN;
   %end;

   DATA &outDS.; 
      SET &outDS. _hlp; 
   RUN; 

   PROC DATASETS lib=work nolist; delete _:; RUN;QUIT;
%mend calcMean;

DATA _NULL_;
   length pmname pm_start pm_end $32;
   SET have;
   ARRAY PMS pm25_total2000-pm25_total2018;
   do i=1 to dim(PMS);
      pmname    =strip(upcase(vname(PMS[i])));
      pm_start  =strip(upcase(cats('pm25_total',put(reqyear-9,best.))));
      pm_end    =strip(upcase(cats('pm25_total',put(reqyear,best.))));
   end;

   put 'N' 'OTE:' Person= reqyear= pm_start= pm_end=;
   if pm_start GE vname(PMS[1]) then 
         call execute('%nrstr(%calcMean(inDS=want,outDS=want2,n='||strip(put(_N_,best.))||'       ,startVar='||strip(pm_start)||',endVar='||strip(pm_end)||');)');
   else  call execute('%nrstr(%calcMean(inDS=want,outDS=want2,n='||strip(put(_N_,best.))||',Miss=1,startVar='||strip(pm_start)||',endVar='||strip(pm_end)||');)');
RUN;
________________________

- Cheers -

TobbeNord
Obsidian | Level 7
Wow! That was really complicated to follow. Thanks for all the effort
ballardw
Super User

@TobbeNord wrote:

I have 18 numerical variables pm25_total2000 to pm25_total2018


If those variables are numbered sequential there are 19 variables. If you only have 18 then you need to tell us which one is missing.

If you are going to throw at problem that has 20+ variables you should provide an example data set as making us dummy of that much data is asking a bit.

 

Here is one way to get started. I am too lazy to make up that many values so I am using 10 "years" and much shorter variable names.

data have;
  input v2000 - v2009 reqyear;
datalines;
1 2 3 4 5 6 7 8 9 10 2005
11 22 33 44 55 66 77 88 99 110 2008
;

data want;
  set have;
  array v(2000:2009) v2000-v2009;
  array t(2000:2009) _temporary_;
  /* reset as temporary arrays hold values across data step)*/
  call missing(of t(*));
  /*load temporary array*/
  do i=(reqyear-3) to reqyear;
     t[i]=v[i];
  end;
  Average= mean(of t(*));
  drop i;
run;

The key to my approach are two arrays. One that hold the full set of values and one that we copy the values to for the mean  function to use. If you have not used arrays much you may not recognize the array statement with two values separated by a colon. That is used to indicate the low and high index values instead of the default 1 to number of elements in the array. This is often useful when you have some other variable such as your Reqyear to extract values by index.

The T array is temporary, meaning that the contents are not written to the result data set. If you want to see what they might be remove the _temporary_.

One thing is that while the values of _temporary_ arrays aren't written to the output data set the values are retained. So we need to start fresh with T at each step.

Since I only used 10 variables I am using 3 previous "years" which is seen on the Do I= loop to extract the desired  values into the temporary array.

 

TobbeNord
Obsidian | Level 7
Thank you very much!
Ksharp
Super User
data have;
call streaminit(123);
array x{*} pm25_total2000-pm25_total2018;
do id=1 to 10;
  start_year=rand('integer',2013,2018);
  do i=1 to dim(x);
    x{i}=rand('integer',1,10);
  end;
  output;
end;
drop i;
run;


data want;
 set have;
 sum=0;
 do year=start_year-9 to start_year;
   sum+vvaluex(cats('pm25_total',year));
 end;
 mean=sum/10;  
 drop sum year;
run;
TobbeNord
Obsidian | Level 7
Thank you, that solution was smooth.
Oligolas
Barite | Level 11

easy 😅 thanks

________________________

- Cheers -

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2170 views
  • 5 likes
  • 4 in conversation