Solved: macro or array: loop inside loop to recode of repeating sets of variab...

rchung · Posted 05-24-2023 02:20 PM

Hello.

I have a survey data from 27 agencies (n=27). Each agency can enter information of up to 25 employees. For each employee, there are 17 variables, including: employment start date, employment end date, name, email, phone, position title, and FTE. I show you only 7 of them in this post.

I have transformed this agency-level data into agency-quarter-level data. In other words, each row of agency is duplicated for 40 quarters. For example, the row of agency1 is duplicated for 40 quarters from 1910-Q1 through 1919-Q4.

Now, I have to remove the values if the staff was not employed on the last day [YYYYQ_end] of the quarter. For example:

if start_dt_1 > YYYYQ_end or end_dt_1 < YYYYQ_end then do;

name_1=" ";

email_1=" ";

phone_1=" ';

role_1=" ";

fte=.;

end;

I have to do this for up to 25 staff members per agency, and for 15 variables per staff member. What would be the best way to get this job done?

Thank you so much!

Rakkoo

ballardw · Posted 05-24-2023 03:19 PM

If Name_1 and all the variables ending in 1 relate to one employee, and Name_2 and the variables ending in 2 to a second employee then your data structure is poor. You would be better off with one observation per employee and variables with names like Name, start_dt, end_dt.

Really.

As it is you need to define 17 arrays, one for each "variable". The : list builder says 'use all the variable names that start with the text before the : " in a list of variables.

data want;
   set have;
   array n (*) Name_: ;
   array start (*) start_dt: ;
   array end  (*) end_dt: ;
   /* follow the obvious pattern for all the variables*/
   do i=1 to dim(n);
      if not ( start[i] le yyyyqend le end[i]) then call missing(n[i],email[i],phone[i]);
   end;
run;

The function CALL Missing works with multiple variables to set them missing. That is all it does. After you have all of the arrays defined use all of them following the pattern to include them in the call missing.

Just because someone may give you data in a stupid structure with 25 people across doesn't mean that you have to keep it.

View solution in original post

ballardw · Posted 05-24-2023 03:19 PM

If Name_1 and all the variables ending in 1 relate to one employee, and Name_2 and the variables ending in 2 to a second employee then your data structure is poor. You would be better off with one observation per employee and variables with names like Name, start_dt, end_dt.

Really.

As it is you need to define 17 arrays, one for each "variable". The : list builder says 'use all the variable names that start with the text before the : " in a list of variables.

data want;
   set have;
   array n (*) Name_: ;
   array start (*) start_dt: ;
   array end  (*) end_dt: ;
   /* follow the obvious pattern for all the variables*/
   do i=1 to dim(n);
      if not ( start[i] le yyyyqend le end[i]) then call missing(n[i],email[i],phone[i]);
   end;
run;

The function CALL Missing works with multiple variables to set them missing. That is all it does. After you have all of the arrays defined use all of them following the pattern to include them in the call missing.

Just because someone may give you data in a stupid structure with 25 people across doesn't mean that you have to keep it.

PaigeMiller · Posted 05-24-2023 04:50 PM

Agreeing with @ballardw, this is much easier to handle in a long format. Maxim 19:

Long beats wide.

(Don't keep data in structure)

In the world of spreadsheets, people tend to line up data side-by-side, and put data items (dates, categories, …) into column headers. This runs counter to all the methods available in SAS for group processing, and makes programming difficult, as one has variable column names and has to resort to creating dynamic code (with macros and/or call execute) where such is not necessary at all if categories were represented in their own column and data aligned vertically.
There are times where a wide format is needed, eg when preparing data for regression analysis. But for the processing and storing of data, long formats are always to be preferred.

Dynamic variable names force unnecessary dynamic code.

--
Paige Miller

rchung · Posted 05-25-2023 12:11 AM

It is very helpful. Thank you so much! 🙂

macro or array: loop inside loop to recode of repeating sets of variables

Re: macro or array: loop inside loop to recode of repeating sets of variables

Re: macro or array: loop inside loop to recode of repeating sets of variables

Re: macro or array: loop inside loop to recode of repeating sets of variables

Long beats wide.

Re: macro or array: loop inside loop to recode of repeating sets of variables

macro or array: loop inside loop to recode of repeating sets of variables

Re: macro or array: loop inside loop to recode of repeating sets of variables

Re: macro or array: loop inside loop to recode of repeating sets of variables

Re: macro or array: loop inside loop to recode of repeating sets of variables

Long beats wide.

Re: macro or array: loop inside loop to recode of repeating sets of variables

Registration is open

SAS Training: Just a Click Away