Hi All,
I have a dataset such as the following:
login_loading_time check_closing_time login_closing_time check_loading_time
400 415 456 421 405 214 413 411 410 412 400 403 450 423 451 401 454 411 441 421 458 401 433 411 750 410 412 410 760 710 451 421 770 798 413 433
I'm interested in finding out the difference between xxx_loading_time and xxx_closing_time (xxx_loading_time - xxx_closing_time). Here xxx can be 'check' or 'login', and the outcome dataset should look like the following:
check_diff login_diff 6 -56 197 -8 . . . .
In the actual dataset there are large number of xxx each paired with _closing_time and _loading_time.
Is there a way to perform arithmetic operation in SAS based on variable name? What will be the best way to achieve the outcome?
Any help is much appriciated.
Hi @lydiawawa
I have tried to give a solution based on your input data. As @Reeza suggests, reformatting data may be a better approach in the long run, but it is an interesting challenge to solve it your way.
First part is getting some test data; I don't know what your real data looks like, so I made data with pairs of loading and closing times, different combinations of missing pairs and one case, where half of a pair is missing.
Next part is to define macro variables for each pair. You have a list of possible prefixes, and I just wrote the values in macro variables instead of making a dynamic prefix list. This is not from laziness, but because further use of data is much simpler, if output data has the same variables in same order from run to run regardles of actual input. Please note that the list has a prefix yyy not present in input data.
Third part is to use the macro variables in a data step. This is an example of using a macro to generate data step statements and then calling the macro in a data step to execute the generated statements. Please note that in obs. 5, login_loading_time is present, but login_closing_time is missing, so login_diff is set to missing, because a value cannot be computed.
Because the prefix list is not generated from the actual input, the macro generates statements for each possible prefix including yyy, so diff_variables are generated for all prefixes in same order as the numbering of macro variables.
* get some test data;
data have;
	infile datalines dlm=' ' truncover;
	input 
		xxx_closing_time 
		login_loading_time 
		check_closing_time 
		xxx_loading_time 
		check_loading_time 
		login_closing_time ;
	datalines;
473 411 462 413 412 451
473 . 462 413 412 .
. 411 462 . 412 451
. . 462 . 412 .
473 411 462 413 412 .
. 411 462 . 412 451
473 411 . 413 . 451
473 411 462 413 412 451
473 . . 413 . .
473 411 462 413 412 451
	;
run;
* Define macro variables with prefix values;
%let pref1 = xxx;
%let pref2 = check;
%let pref3 = login;
%let pref4 = yyy;
%let prefcnt = 4;
* Generate keep- and compute statements for each prefix defined;
%macro m;
	%do i = 1 %to &prefcnt;
		%let vp = &&pref&i;
		keep &vp._diff;
		&vp._diff = &vp._closing_time  - &vp._loading_time;
	%end;
%mend;
* Run computation of differences;
data want; set have;
	%m;
run;result:
It depends, but overall your structure doesn't lend itself well to any of the options.
I would recommend setting up two arrays, one with the opening and one with closing times.
data want; set have; array sTime (*) check_start_time login_start_time; array eTime(*) check_closing_time login_closing_time; array dTime(*) diff_check diff_login do i=1 to dim(sTime); dtime(i) = sTime(i) - etime(i); end; run;
A much better approach is to reformat your data to be in the form of:
Type loading_Time closing_time Login 400 456 Check 415 421 Login 405 413 Check 214 411 .... ....
I suspect in the long run that may be a better approach overall.
@lydiawawa wrote:
Hi All,
I have a dataset such as the following:
login_loading_time check_closing_time login_closing_time check_loading_time
400 415 456 421 405 214 413 411 410 412 400 403 450 423 451 401 454 411 441 421 458 401 433 411 750 410 412 410 760 710 451 421 770 798 413 433I'm interested in finding out the difference between xxx_loading_time and xxx_closing_time (xxx_loading_time - xxx_closing_time). Here xxx can be 'check' or 'login', and the outcome dataset should look like the following:
check_diff login_diff 6 -56 197 -8 . . . .In the actual dataset there are large number of xxx each paired with _closing_time and _loading_time.
Is there a way to perform arithmetic operation in SAS based on variable name? What will be the best way to achieve the outcome?
Any help is much appriciated.
One way would be to use arrays such as:
data want;
   set have;
   array load login_loading_time check_loading_time ;
   array close login_closing_time check_closing_time ;
   array d{2}  login_diff check_diff;
   do i= 1 to dim(load);
      d[i] = load[i]-close[i];
   end;
run;
You would need to make sure that the variable name order matches between each of the array definition lists so that the correct load and close time are aligned. Since you do not have existing diff variables you need to specify the number of difference variables, the number of parentheses after D in the array statement matches the number of pairs that you have.
Another approach would be create a data set that values like:
TYPE loadtime close
where type would have values such as LOGIN or CHECK or the other prefixes to your variable names. Without knowing more about your data and source there could be different ways to accomplish this. Likely I would be looking at a data step to read the data from a text file as a first step.
Is there a way to index variable name by xxx, such that xxx_loading_time or xxx_closing_time. I have a list of possible xxx, and I don't think the varialbes are in certain order...
Not easily, you can pull them from SASHELP.VCOLUMN or PROC DATASETS and then sort them, but you need to have the same number of variables for sure.
@lydiawawa wrote:
Is there a way to index variable name by xxx, such that xxx_loading_time or xxx_closing_time. I have a list of possible xxx, and I don't think the varialbes are in certain order...
In the following some sample code creating this dynamic variable list as @Reeza suggested.
data have;
  infile datalines dlm=' ' truncover;
  input login_loading_time check_closing_time LOGIN_CLOSING_TIME CHECK_LOADING_TIME;
  some_other_var=1;
  other1_loading_time=1;
  other2_closing_time=1;
  datalines;
400 415 456 421
405 214 413 411
410 412 400 403 
450 423 451 401
454 411 441 421
458 401 433 411
750 410 412 410
760 710 451 421
770 798 413 433
;
run;
/** 
  populate macro vars to be used for array definition 
**/
/* only select variable names which come in pairs (closing/loading) and follow naming pattern */
proc sql;
  create view V_col_names as
    select name, upcase(scan(name,1,'_')) as type length=32
    from dictionary.columns
    where libname='WORK' and memname='HAVE'
      and prxmatch('/\w+_(closing|loading)_time/oi',name)>0
    group by scan(calculated type,1,'_')
    having count(*)=2
    order by calculated type, upcase(scan(name,2,'_'))
  ;
quit;
/* create and populate macro variables */
%let source_varlist=;
%let target_varlist=;
data _null_;
  set V_col_names end=last;
  by type;
  length source_varlist target_varlist $32767;
  retain source_varlist target_varlist;
  source_varlist=catx(' ',source_varlist,name);
  if first.type then
    do;
      target_varlist=catx(' ',target_varlist,cats(scan(name,1,'_'),'_diff'));
    end;
  if last then
    do;
      call symputx('source_varlist',source_varlist);
      call symputx('target_varlist',target_varlist);
    end;
run;
%put &=source_varlist;
%put &=target_varlist;
/** 
  create desired result 
**/
data want(drop=_:);
  set have;
  array source_vars {*} &source_varlist;
  array target_vars {*} &target_varlist;
  do _i=2 to dim(source_vars) by 2;
    target_vars[_i/2]=source_vars[_i-1]-source_vars[_i];
/*    formula_used=cats(vname(target_vars[_i/2]),'=',vname(source_vars[_i-1]),'-',vname(source_vars[_i]));*/
/*    output;*/
  end;
run;
Hi @lydiawawa
I have tried to give a solution based on your input data. As @Reeza suggests, reformatting data may be a better approach in the long run, but it is an interesting challenge to solve it your way.
First part is getting some test data; I don't know what your real data looks like, so I made data with pairs of loading and closing times, different combinations of missing pairs and one case, where half of a pair is missing.
Next part is to define macro variables for each pair. You have a list of possible prefixes, and I just wrote the values in macro variables instead of making a dynamic prefix list. This is not from laziness, but because further use of data is much simpler, if output data has the same variables in same order from run to run regardles of actual input. Please note that the list has a prefix yyy not present in input data.
Third part is to use the macro variables in a data step. This is an example of using a macro to generate data step statements and then calling the macro in a data step to execute the generated statements. Please note that in obs. 5, login_loading_time is present, but login_closing_time is missing, so login_diff is set to missing, because a value cannot be computed.
Because the prefix list is not generated from the actual input, the macro generates statements for each possible prefix including yyy, so diff_variables are generated for all prefixes in same order as the numbering of macro variables.
* get some test data;
data have;
	infile datalines dlm=' ' truncover;
	input 
		xxx_closing_time 
		login_loading_time 
		check_closing_time 
		xxx_loading_time 
		check_loading_time 
		login_closing_time ;
	datalines;
473 411 462 413 412 451
473 . 462 413 412 .
. 411 462 . 412 451
. . 462 . 412 .
473 411 462 413 412 .
. 411 462 . 412 451
473 411 . 413 . 451
473 411 462 413 412 451
473 . . 413 . .
473 411 462 413 412 451
	;
run;
* Define macro variables with prefix values;
%let pref1 = xxx;
%let pref2 = check;
%let pref3 = login;
%let pref4 = yyy;
%let prefcnt = 4;
* Generate keep- and compute statements for each prefix defined;
%macro m;
	%do i = 1 %to &prefcnt;
		%let vp = &&pref&i;
		keep &vp._diff;
		&vp._diff = &vp._closing_time  - &vp._loading_time;
	%end;
%mend;
* Run computation of differences;
data want; set have;
	%m;
run;result:
Thank you for the responses, I'm not currently with the computer that has SAS. Will let you know as soon as I tested out the code. Thank you so much for the detailed explanations!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
