Solved: Re: Dynamic Field Creation and Naming

DavidPhillips2 · Posted 07-19-2021 03:58 PM

Is there a way to automatically do the below;

if Year1 = 1 and academic_period = '200810' then Fall _2008 =1;	Year1	1	200810	Fall _2008
if Year2 = 1 and academic_period = '200710' then Fall _2008 =1;	Year2	1	200710	Fall _2008
if Year3 = 1 and academic_period = '200610' then Fall _2008 =1;	Year3	1	200610	Fall _2008
if Year4 = 1 and academic_period = '200510' then Fall _2008 =1;	Year4	1	200510	Fall _2008
if Year5 = 1 and academic_period = '200410' then Fall _2008 =1;	Year5	1	200410	Fall _2008
if Year6 = 1 and academic_period = '200310' then Fall _2008 =1;	Year6	1	200310	Fall _2008
if Year1 = 1 and academic_period = '200910' then Fall_2009 =1;	Year1	1	200910	Fall_2009
if Year2 = 1 and academic_period = '200810' then Fall_2009 =1;	Year2	1	200810	Fall_2009
if Year3 = 1 and academic_period = '200710' then Fall_2009 =1;	Year3	1	200710	Fall_2009
if Year4 = 1 and academic_period = '200610' then Fall_2009 =1;	Year4	1	200610	Fall_2009
if Year5 = 1 and academic_period = '200510' then Fall_2009 =1;	Year5	1	200510	Fall_2009
if Year6 = 1 and academic_period = '200410' then Fall_2009 =1;	Year6	1	200410	Fall_2009
if Year1 = 1 and academic_period = '201010' then Fall_2010 =1;	Year1	1	201010	Fall_2010
if Year2 = 1 and academic_period = '200910' then Fall_2010 =1;	Year2	1	200910	Fall_2010
if Year3 = 1 and academic_period = '200810' then Fall_2010 =1;	Year3	1	200810	Fall_2010
if Year4 = 1 and academic_period = '200710' then Fall_2010 =1;	Year4	1	200710	Fall_2010
if Year5 = 1 and academic_period = '200610' then Fall_2010 =1;	Year5	1	200610	Fall_2010
if Year6 = 1 and academic_period = '200510' then Fall_2010 =1;	Year6	1	200510	Fall_2010

I'm trying to dynamically create fields based on a formula. The probability of human error in the future is too high if I just use one SAS line per statement here.

The part that I want is the dynamic Fall_XXXX = 1 part based on a formula.

I experimented with *catx(' ', 'Fall', sum(input(substr(academic_period, 1, 4), 4.), 0)) = 1; But I'm missing the syntax to make this line work.

Tom · Posted 07-20-2021 11:04 AM

To make it truly dynamic use PROC TRANSPOSE.

data step1;
  set temp;
  calendar_year=input(acaedemic_period_base,4.)+input(substr(time_variable,5),32.)-1;
run;

proc transpose data=step1 prefix=FALL_ out=want(drop=_name_);
  by time_variable acaedemic_period_base notsorted;
  id calendar_year;
  var measure1;
run;

                   acaedemic_
        time_       period_      FALL_    FALL_    FALL_
Obs    variable       base        2008     2009     2010

  1     Year1        200810        1        .        .
  2     Year2        200710        1        .        .
  3     Year3        200610        1        .        .
  4     Year4        200510        1        .        .
  5     Year5        200410        1        .        .
  6     Year6        200310        0        .        .
  7     Year1        200910        .        1        .
  8     Year2        200810        .        1        .
  9     Year3        200710        .        1        .
 10     Year4        200610        .        0        .
 11     Year5        200510        .        0        .
 12     Year6        200410        .        0        .
 13     Year1        201010        .        .        1
 14     Year2        200910        .        .        0
 15     Year3        200810        .        .        0
 16     Year4        200710        .        .        0
 17     Year5        200610        .        .        0
 18     Year6        200510        .        .        0

View solution in original post

ballardw · Posted 07-19-2021 04:15 PM

I am not sure exactly what your STARTING data looks like.

I am very surprised that "academic_period" values from what appear to be 6 calendar year like values all get assigned the same Fall_XXXX value. Are you sure there are no typos in there?

What was this supposed to do?

sum(input(substr(academic_period, 1, 4), 4.), 0))

You are summing some value with zero. So why bother?

It may help to tell why you have "year1" etc to begin with and what the underlying rules are. This is one case where a working example may not actually provide sufficient information for a general algorithm such as what to do "next year" when Year7=1 and some academic period.

I suspect that there may be a possible solution with Intnx, your academic period and something extracted from the Year value but I am not going to expend much effort trying to derive obscure rules from an example.

I believe we have indicated a preference for starting data as data step code to give us something to work with before this.

DavidPhillips2 · Posted 07-19-2021 04:48 PM


proc sql;
	create table temp (time_variable varchar(50),
	acaedemic_period_base varchar(6),
	want varchar (50)
);
quit;
proc sql;insert into temp(time_variable, acaedemic_period_base, want) values ('Year1', '200810', 'Fall _2008 =1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year2', '200710', 'Fall _2008 =1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year3', '200610', 'Fall _2008 =1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year4', '200510', 'Fall _2008 =1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year5', '200410', 'Fall _2008 =1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year6', '200310', 'Fall _2008 =1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year1', '200910', 'Fall_2009 = 1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year2', '200810', 'Fall_2009 = 1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year3', '200710', 'Fall_2009 = 1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year4', '200610', 'Fall_2009 = 1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year5', '200510', 'Fall_2009 = 1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year6', '200410', 'Fall_2009 = 1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year1', '201010', 'Fall_2010 = 1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year2', '200910', 'Fall_2010 = 1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year3', '200810', 'Fall_2010 = 1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year4', '200710', 'Fall_2010 = 1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year5', '200610', 'Fall_2010 = 1');
insert into temp(time_variable, acaedemic_period_base, want) values ('Year6', '200510', 'Fall_2010 = 1');

quit;

DavidPhillips2 · Posted 07-19-2021 04:53 PM

The formula I pasted was to dynamically create fields like Fall 2010 and set the value equal to 1 based on a formula; I'm trying to figure out how to use any formula to create a dynamically named field. The fields will be used in a wide table.

Tom · Posted 07-19-2021 05:29 PM

You need to describe better what you are trying to do. If you want to dynamically name a variable you have two choices. Use PROC TABULATE with its ID statement to tell it which variable to use as the name. Or use some code generation.

But perhaps you just want to keep the data in a tall skinny structure instead of creating new variables. You could always create a report that converts it to a more rectangular layoff for viewing.

DavidPhillips2 · Posted 07-19-2021 09:57 PM

I'm trying to dynamically create a wide table off of a formula that defines many variables.

Tom · Posted 07-20-2021 12:27 AM

@DavidPhillips2 wrote:

I'm trying to dynamically create a wide table off of a formula that defines many variables.

PLEASE PROVIDE EXAMPLE DATA.

Show what SAS code you want to create from that data.

If you want to generate code from DATA the easiest way is to use the PUT statement in a data step to write the code to a file. You can then use %INCLUDE to execute the generated code.

Since you haven't really provided any workable example data or code let's just make a test case.

Say you have a dataset name META with two variables: VAR and CODE. The first has the name to use for a new variable and the second has the expression to use to assign a value to that new variable. Then you can generate a series of assignment statement to create the new variables.

filename code temp;
data _null_;
  set meta;
   file print;
   put var '=(' code ');' ;
run;

Now if have another dataset named HAVE that contains the actual data (the variables that are referenced by the CODE values) then you can use the generated code to create new variables by including it into a data step.

data want;
  set have;
%include code / source2;
run;

ChrisNZ · Posted 07-19-2021 06:24 PM

> You are summing some value with zero. So why bother?

This prevents returning missing values. Whether that's a good thing here remains to be seen.

High-Performance SAS Coding - Third Edition

ChrisNZ · Posted 07-19-2021 06:31 PM

> I experimented with *catx(' ', 'Fall', sum(input(substr(academic_period, 1, 4), 4.), 0)) = 1; But I'm missing the syntax to make this line work.

This line will not work. Variable names are all defined when the data step is compiled. You can't have functions to the left of the = sign (exceptions apply).

Your choices are:

- A macro that writes the code creating the variables you want. This code is then compiled as part of the data step.

- Creating variables that contain the values you need and then transposing the table to convert these values into variable names.

High-Performance SAS Coding - Third Edition

DavidPhillips2 · Posted 07-19-2021 09:54 PM

Do you mean I need to create intermediary variables and then save them to the final variable. I'm trying to save the result of the formula string to a data step variable. The logic will create 200 variables for a wide table.

ChrisNZ · Posted 07-19-2021 09:57 PM

I am saying all the variable names must be known when the data step compiles, so before it runs. You can't add variables after compilation.

High-Performance SAS Coding - Third Edition

DavidPhillips2 · Posted 07-19-2021 10:07 PM

Can I define them with a length statement and then populate them dynamically like with the catx statement?

ChrisNZ · Posted 07-19-2021 10:13 PM

If you want to go this route (rather than transposing) then you can define the variables in an array statement, and then refer to either the array index (position) or the variable name associated with that index. Like this:

data t;
  array A[3] VAR1 VAR2 VAR3;
  A[1]=1;
  VAR2=2;
run;

High-Performance SAS Coding - Third Edition

DavidPhillips2 · Posted 07-19-2021 10:17 PM

I'm trying to identify the name of the variable backward, the array method predefines the name. Since I'm working backward it might make sense to transpose. I've used transpose before, I'll look into options with it tomorrow.

ChrisNZ · Posted 07-19-2021 10:26 PM

If you don't know the variable names in advance, then transposing is probably better. The other option would be to build the names in a macro and then use the macro to define the array. That's more convoluted.

Having said that, if the names are random, how can you build a program that uses them?

High-Performance SAS Coding - Third Edition

Registration is open

SAS Training: Just a Click Away