Solved: Performing operation by keywords in varaible name

lydiawawa · Posted 02-15-2019 02:52 PM

Hi All,

I have a dataset such as the following:

  login_loading_time    check_closing_time    login_closing_time    check_loading_time
    400                      415                   456                    421
    405                      214                   413                    411
    410                      412                   400                    403  
    450                      423                   451                    401
    454                      411                   441                    421
    458                      401                   433                    411
    750                      410                   412                    410
    760                      710                   451                    421
    770                      798                   413                    433

I'm interested in finding out the difference between xxx_loading_time and xxx_closing_time (xxx_loading_time - xxx_closing_time). Here xxx can be 'check' or 'login', and the outcome dataset should look like the following:

check_diff login_diff
  6             -56
 197            -8
  .              .
  .              .

In the actual dataset there are large number of xxx each paired with _closing_time and _loading_time.

Is there a way to perform arithmetic operation in SAS based on variable name? What will be the best way to achieve the outcome?

Any help is much appriciated.

ErikLund_Jensen · Posted 02-16-2019 10:14 AM

Hi @lydiawawa

I have tried to give a solution based on your input data. As @Reeza suggests, reformatting data may be a better approach in the long run, but it is an interesting challenge to solve it your way.

First part is getting some test data; I don't know what your real data looks like, so I made data with pairs of loading and closing times, different combinations of missing pairs and one case, where half of a pair is missing.

Next part is to define macro variables for each pair. You have a list of possible prefixes, and I just wrote the values in macro variables instead of making a dynamic prefix list. This is not from laziness, but because further use of data is much simpler, if output data has the same variables in same order from run to run regardles of actual input. Please note that the list has a prefix yyy not present in input data.

Third part is to use the macro variables in a data step. This is an example of using a macro to generate data step statements and then calling the macro in a data step to execute the generated statements. Please note that in obs. 5, login_loading_time is present, but login_closing_time is missing, so login_diff is set to missing, because a value cannot be computed.

Because the prefix list is not generated from the actual input, the macro generates statements for each possible prefix including yyy, so diff_variables are generated for all prefixes in same order as the numbering of macro variables.

* get some test data;
data have;
	infile datalines dlm=' ' truncover;
	input 
		xxx_closing_time 
		login_loading_time 
		check_closing_time 
		xxx_loading_time 
		check_loading_time 
		login_closing_time ;
	datalines;
473 411 462 413 412 451
473 . 462 413 412 .
. 411 462 . 412 451
. . 462 . 412 .
473 411 462 413 412 .
. 411 462 . 412 451
473 411 . 413 . 451
473 411 462 413 412 451
473 . . 413 . .
473 411 462 413 412 451
	;
run;

* Define macro variables with prefix values;
%let pref1 = xxx;
%let pref2 = check;
%let pref3 = login;
%let pref4 = yyy;
%let prefcnt = 4;

* Generate keep- and compute statements for each prefix defined;
%macro m;
	%do i = 1 %to &prefcnt;
		%let vp = &&pref&i;
		keep &vp._diff;
		&vp._diff = &vp._closing_time  - &vp._loading_time;
	%end;
%mend;

* Run computation of differences;
data want; set have;
	%m;
run;

result:

View solution in original post

Reeza · Posted 02-15-2019 03:34 PM

It depends, but overall your structure doesn't lend itself well to any of the options.

I would recommend setting up two arrays, one with the opening and one with closing times.

data want;
set have;

array sTime (*) check_start_time login_start_time;
array eTime(*) check_closing_time login_closing_time;

array dTime(*) diff_check diff_login

do i=1 to dim(sTime);
dtime(i) = sTime(i) - etime(i);
end;

run;

A much better approach is to reformat your data to be in the form of:

Type loading_Time closing_time
Login 400 456
Check 415 421
Login 405 413
Check 214 411
....
....

I suspect in the long run that may be a better approach overall.

@lydiawawa wrote:

Hi All,

I have a dataset such as the following:
  login_loading_time    check_closing_time    login_closing_time    check_loading_time
    400                      415                   456                    421
    405                      214                   413                    411
    410                      412                   400                    403  
    450                      423                   451                    401
    454                      411                   441                    421
    458                      401                   433                    411
    750                      410                   412                    410
    760                      710                   451                    421
    770                      798                   413                    433
I'm interested in finding out the difference between xxx_loading_time and xxx_closing_time (xxx_loading_time - xxx_closing_time). Here xxx can be 'check' or 'login', and the outcome dataset should look like the following:
check_diff login_diff
  6             -56
 197            -8
  .              .
  .              .  
In the actual dataset there are large number of xxx each paired with _closing_time and _loading_time.

Is there a way to perform arithmetic operation in SAS based on variable name? What will be the best way to achieve the outcome?

Any help is much appriciated.

ballardw · Posted 02-15-2019 03:36 PM

One way would be to use arrays such as:

data want;
   set have;
   array load login_loading_time check_loading_time ;
   array close login_closing_time check_closing_time ;
   array d{2}  login_diff check_diff;
   do i= 1 to dim(load);
      d[i] = load[i]-close[i];
   end;
run;

You would need to make sure that the variable name order matches between each of the array definition lists so that the correct load and close time are aligned. Since you do not have existing diff variables you need to specify the number of difference variables, the number of parentheses after D in the array statement matches the number of pairs that you have.

Another approach would be create a data set that values like:

TYPE loadtime close

where type would have values such as LOGIN or CHECK or the other prefixes to your variable names. Without knowing more about your data and source there could be different ways to accomplish this. Likely I would be looking at a data step to read the data from a text file as a first step.

lydiawawa · Posted 02-15-2019 03:48 PM

Is there a way to index variable name by xxx, such that xxx_loading_time or xxx_closing_time. I have a list of possible xxx, and I don't think the varialbes are in certain order...

Reeza · Posted 02-15-2019 03:50 PM

Not easily, you can pull them from SASHELP.VCOLUMN or PROC DATASETS and then sort them, but you need to have the same number of variables for sure.

@lydiawawa wrote:

Is there a way to index variable name by xxx, such that xxx_loading_time or xxx_closing_time. I have a list of possible xxx, and I don't think the varialbes are in certain order...

Patrick · Posted 02-15-2019 06:52 PM

@lydiawawa

In the following some sample code creating this dynamic variable list as @Reeza suggested.

data have;
  infile datalines dlm=' ' truncover;
  input login_loading_time check_closing_time LOGIN_CLOSING_TIME CHECK_LOADING_TIME;
  some_other_var=1;
  other1_loading_time=1;
  other2_closing_time=1;
  datalines;
400 415 456 421
405 214 413 411
410 412 400 403 
450 423 451 401
454 411 441 421
458 401 433 411
750 410 412 410
760 710 451 421
770 798 413 433
;
run;

/** 
  populate macro vars to be used for array definition 
**/

/* only select variable names which come in pairs (closing/loading) and follow naming pattern */
proc sql;
  create view V_col_names as
    select name, upcase(scan(name,1,'_')) as type length=32
    from dictionary.columns
    where libname='WORK' and memname='HAVE'
      and prxmatch('/\w+_(closing|loading)_time/oi',name)>0
    group by scan(calculated type,1,'_')
    having count(*)=2
    order by calculated type, upcase(scan(name,2,'_'))
  ;
quit;

/* create and populate macro variables */
%let source_varlist=;
%let target_varlist=;
data _null_;
  set V_col_names end=last;
  by type;
  length source_varlist target_varlist $32767;
  retain source_varlist target_varlist;
  source_varlist=catx(' ',source_varlist,name);
  if first.type then
    do;
      target_varlist=catx(' ',target_varlist,cats(scan(name,1,'_'),'_diff'));
    end;
  if last then
    do;
      call symputx('source_varlist',source_varlist);
      call symputx('target_varlist',target_varlist);
    end;
run;
%put &=source_varlist;
%put &=target_varlist;

/** 
  create desired result 
**/
data want(drop=_:);
  set have;
  array source_vars {*} &source_varlist;
  array target_vars {*} &target_varlist;
  do _i=2 to dim(source_vars) by 2;
    target_vars[_i/2]=source_vars[_i-1]-source_vars[_i];
/*    formula_used=cats(vname(target_vars[_i/2]),'=',vname(source_vars[_i-1]),'-',vname(source_vars[_i]));*/
/*    output;*/
  end;
run;

ErikLund_Jensen · Posted 02-16-2019 10:14 AM

Hi @lydiawawa

I have tried to give a solution based on your input data. As @Reeza suggests, reformatting data may be a better approach in the long run, but it is an interesting challenge to solve it your way.

First part is getting some test data; I don't know what your real data looks like, so I made data with pairs of loading and closing times, different combinations of missing pairs and one case, where half of a pair is missing.

Next part is to define macro variables for each pair. You have a list of possible prefixes, and I just wrote the values in macro variables instead of making a dynamic prefix list. This is not from laziness, but because further use of data is much simpler, if output data has the same variables in same order from run to run regardles of actual input. Please note that the list has a prefix yyy not present in input data.

Third part is to use the macro variables in a data step. This is an example of using a macro to generate data step statements and then calling the macro in a data step to execute the generated statements. Please note that in obs. 5, login_loading_time is present, but login_closing_time is missing, so login_diff is set to missing, because a value cannot be computed.

Because the prefix list is not generated from the actual input, the macro generates statements for each possible prefix including yyy, so diff_variables are generated for all prefixes in same order as the numbering of macro variables.

* get some test data;
data have;
	infile datalines dlm=' ' truncover;
	input 
		xxx_closing_time 
		login_loading_time 
		check_closing_time 
		xxx_loading_time 
		check_loading_time 
		login_closing_time ;
	datalines;
473 411 462 413 412 451
473 . 462 413 412 .
. 411 462 . 412 451
. . 462 . 412 .
473 411 462 413 412 .
. 411 462 . 412 451
473 411 . 413 . 451
473 411 462 413 412 451
473 . . 413 . .
473 411 462 413 412 451
	;
run;

* Define macro variables with prefix values;
%let pref1 = xxx;
%let pref2 = check;
%let pref3 = login;
%let pref4 = yyy;
%let prefcnt = 4;

* Generate keep- and compute statements for each prefix defined;
%macro m;
	%do i = 1 %to &prefcnt;
		%let vp = &&pref&i;
		keep &vp._diff;
		&vp._diff = &vp._closing_time  - &vp._loading_time;
	%end;
%mend;

* Run computation of differences;
data want; set have;
	%m;
run;

result:

lydiawawa · Posted 02-18-2019 10:34 AM

@Patrick @ErikLund_Jensen

Thank you for the responses, I'm not currently with the computer that has SAS. Will let you know as soon as I tested out the code. Thank you so much for the detailed explanations!

Performing operation by keywords in varaible name

Re: Performing operation by keywords in varaible name

Re: Performing operation by keywords in varaible name

Re: Performing operation by keywords in varaible name

Re: Performing operation by keywords in varaible name

Re: Performing operation by keywords in varaible name

Re: Performing operation by keywords in varaible name

Re: Performing operation by keywords in varaible name

Re: Performing operation by keywords in varaible name

SAS Innovate 2025: Call for Content

Classroom Training Available!