DATA Step, Macro, Functions and more

How to define an array by values for each observation?

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

How to define an array by values for each observation?

Hi everyone,

 

I have a dataset that contains startdates end enddates for enrollment in educational programmes. I now wish to make variables for each month in the process, that show, for each record, if the person has attended education in that month.

 

I have tried to solve this by creating a number of month_variables (all initially set to the value of 0), in the form of month_2016_1 (for january 2016 etc.), and variables for the monthpart and yearpart for start and enddate, and then tried to use these start and enddates as macros to be used in an array, but it doesn't work. Presumably because the macros are not defined for each observation, but for the dataset as a whole? 

 

 What I have tried looks something like this:

 

data want;
set have;

%let macro_start_month=startmonth;

%let macro_start_year=startyear;

%let macro_end_month=endmonth;

%let macro_end_year=endyear;

 

array omk1 month_&macro_start_year._&macro_start_month - month_&macro_end_year._&macro_end_month;
.......

run;

 

Any help solving this problem is greatly appreciated!

 

Best regards,

Jacob Hornnes 

 

 


Accepted Solutions
Solution
‎01-04-2018 04:46 AM
Super User
Super User
Posts: 9,407

Re: How to define an array by values for each observation?

I totally agree with @KurtBremser, having a long data format is far easier to work with from a programming perspective.  I would just add, that in some instances it is necessary - for an output file, to transpose the data.  Now you could do this with arrays also, not saying this is a good way, but it is possible.  As dates are numeric total number of elements is:

(year_end-year_start * 12) + (month_end-month_start) + 1

Note, not checked that, as haven't anything to mess about with at the moment.

E.g.

data want;
  set have;
  elements=(year_end-year_start * 12) + (month_end-month_start) + 1;
  array abc{elements} $5 ...;
run;

Then you would have to work out how to give them labels.  

Still its easier to create the long format then transpose up if need be.

 

View solution in original post


All Replies
Valued Guide
Posts: 531

Re: How to define an array by values for each observation?


JacobH wrote:

Presumably because the macros are not defined for each observation, but for the dataset as a whole? 

 

 


Hi Jacob,

 

It's how the SAS macro processor works. SAS code is first handled by that macro processor before the data step runs. By the time the macro processor is has done its job all macro references are removed, either by executing statements (%let ...) or by resolving variable references (&macro_start_...). Be careful to distinguish what happens when. This is a common mistake and I suggest you read up on macro processing in general. The resolution of your actual challenge, creating variable names based on data values, will be much more evident after that. 

 

On a side note I confess that variables that contain data values are my main pet peeve. They tend to make reporting and analysis very hard.

 

Hope this helps,

-- Jan

Super User
Posts: 9,886

Re: How to define an array by values for each observation?

You have a misconception about the working of the macro PREprocessor.

data want;
set have;

%let macro_start_month=startmonth;
%let macro_start_year=startyear;
%let macro_end_month=endmonth;
%let macro_end_year=endyear;

After this, your macro variables will have the following TEXT(!) values:

&macro_start_month: startmonth

&macro_start_year: startyear

&macro_end_month: endmonth

&macro_end_year: endyear

 

So the next line of code will look like this after the preprocessor has resolved the macro variables:

array omk1 month_startyear_startmonth - month_endyear_endmonth;

You can see that the (date) values you expected are not present. The macro preprocessor does its work before the data step is compiled and can never have access to the values of data step variables (only to their names).

 

You also cannot create an array individually and dynamically for each observation in the input dataset. The array is declared once and created when the data step is compiled, and will stay that way throughout the execution of the data step. Only the values of its contained variables change.

What you want to do is best achieved by creating individual observations for each month in the output dataset (what is called a "long" format) with a do loop.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 9

Re: How to define an array by values for each observation?

Posted in reply to KurtBremser

Thanks for the answer! It was beginning to dawn on me, that I was using the macros in a completely wrong way Smiley Happy

 

Do you have some code, or a reference to get me started regarding creating the long format with a do loop? 

 

Best regards,

Jacob 

Super User
Posts: 9,886

Re: How to define an array by values for each observation?

This might look like that:

data want (drop=startmonth startyear endmonth endyear);
set have;
format cur_month date9.;
cur_month = mdy(startmonth,1,startyear);
do while (curmonth le mdy(endmonth,1,endyear));
  output;
  cur_month = intnx('month',cur_month,1,'b');
end;
run;

You will probably add additional logic in the do loop to create/calculate month-specific values.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 9

Re: How to define an array by values for each observation?

Posted in reply to KurtBremser

Thank you SO much for your help! Your program does the trick very neatly! Smiley Happy 

Solution
‎01-04-2018 04:46 AM
Super User
Super User
Posts: 9,407

Re: How to define an array by values for each observation?

I totally agree with @KurtBremser, having a long data format is far easier to work with from a programming perspective.  I would just add, that in some instances it is necessary - for an output file, to transpose the data.  Now you could do this with arrays also, not saying this is a good way, but it is possible.  As dates are numeric total number of elements is:

(year_end-year_start * 12) + (month_end-month_start) + 1

Note, not checked that, as haven't anything to mess about with at the moment.

E.g.

data want;
  set have;
  elements=(year_end-year_start * 12) + (month_end-month_start) + 1;
  array abc{elements} $5 ...;
run;

Then you would have to work out how to give them labels.  

Still its easier to create the long format then transpose up if need be.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 151 views
  • 4 likes
  • 4 in conversation