Hello, I have a question on how to run a query in a better way. I am thinking an array may work here but am stumped at how.
I have a data set that has 5 code values. I need to use a combination of codes to create a new category.
I have a dataset called have below:
data have;
length service_type $10;
input ID $ code1 code2 code3 code4 code5 ;
datalines;
1 11 7 13 986 0
2 17 309 13 0 0
3 11 922 357 999 999
4 18 300 0 0 999
5 20 7 0 0
6 749 984 0 0 0
7 750 751 0 0 999
8 34 15 4 0 0
9 999 1 0 0
10 13 0 0 0 0
11 11 0 0 13 999
12 18 0 1 0 0
13 17 0 0 0 0
14 16 1 0 0 0
15 300 0 0 1 0
16 749 0 0 751 0
run;
I can use the below if-then-else statement to create my categories, where I have multiple 'or' statements, but is there a better way to run the below code. In this example, I only showed 5 codes, but in reality, I have a dataset with about 25 codes. Where code1 remains the same, what changes are codes 2 to 25.
data want;
set have;
if (code1 in (11,17,18) and code2 =7) or (code1 in (11,17,18) and code3 =7)
or (code1 in (11,17,18) and code4 =7)
or (code1 in (11,17,18) and code5 =7) then cars='1'; else cars='0';
if (code1 in (11,17,18) and code2 =13) or (code1 in (11,17,18) and code3 =13)
or (code1 in (11,17,18) and code4 =13)
or (code1 in (11,17,18) and code5 =13) then trucks='1'; else trucks='0';
if (code1 = (749) and code2 in (984,751)) or (code1 = (749) and code3 in (984,751))
or (code1 = (749) and code4 in (984,751))
or (code1 = (749) and code5 in (984,751)) then pickup='1'; else pickup='0';
run;
Thank you in advance!
Or here another coding option.
%macro codecheck(outvar, code1Val, code2to5Val);
&outvar='0';
if code1 in (&code1Val) then
do _i=1 to dim(codes2to5);
if codes2to5[_i] in (&code2to5Val) then
do;
&outvar='1';
leave;
end;
end;
drop _i;
%mend;
data want;
set have;
array codes2to5 {*} code2 - code5;
%codecheck(cars, 11 17 18, 7)
%codecheck(trucks, 11 17 18, 13)
%codecheck(pickup, 749, 984 751)
run;
This looks pretty good to me.
You might be able to reduce the volume of code a bit, but that would be at the cost of legibility. So I wouldn't unless you have other reasons.
If anything, I'd make the logic even clearer by aligning better.
data want;
set have;
if (code1 in (11,17,18) and code2 =7)
or (code1 in (11,17,18) and code3 =7)
or (code1 in (11,17,18) and code4 =7)
or (code1 in (11,17,18) and code5 =7) then cars='1'; else cars='0';
if (code1 in (11,17,18) and code2 =13)
or (code1 in (11,17,18) and code3 =13)
or (code1 in (11,17,18) and code4 =13)
or (code1 in (11,17,18) and code5 =13) then trucks='1'; else trucks='0';
if (code1 = (749) and code2 in (984,751))
or (code1 = (749) and code3 in (984,751))
or (code1 = (749) and code4 in (984,751))
or (code1 = (749) and code5 in (984,751)) then pickup='1'; else pickup='0';
run;
If you want to centralise values, this is fine too
%let car_val1 =11,17,18 ; %let car_val2 =7 ;
%let trk_val1 =11,17,18 ; %let trk_val2 =13 ;
%let pik_val1 =749 ; %let pik_val2 =984,751 ;
data want;
set have;
if (code1 in (&car_val1) and code2 in (&car_val2) )
or (code1 in (&car_val1) and code3 in (&car_val2) )
or (code1 in (&car_val1) and code4 in (&car_val2) )
or (code1 in (&car_val1 and code5 in (&car_val2) ) then cars='1'; else cars='0';
if (code1 in (&trk_val1) and code2 in (&trk_val2) )
or (code1 in (&trk_val1) and code3 in (&trk_val2) )
or (code1 in (&trk_val1) and code4 in (&trk_val2) )
or (code1 in (&trk_val1) and code5 in (&trk_val2) ) then trucks='1'; else trucks='0';
if (code1 in (&pik_val1) and code2 in (&pik_val2) )
or (code1 in (&pik_val1) and code3 in (&pik_val2) )
or (code1 in (&pik_val1) and code4 in (&pik_val2) )
or (code1 in (&pik_val1) and code5 in (&pik_val2) ) then pickup='1'; else pickup='0';
run;
Thank you for your response!
The problem is that I have about 30 codes, and need to create 10 categories. For simplicity I showed 5 codes with 3 categories.
But with 30 codes I would have 30 rows each and doing it 10 times. So I would have 300 rows of repeat using the or statement.
So I was hoping an array or a macro example where it would know to repeat the statements 30 times for each category.
Hope that helps.
Thanks!
Maybe something like this then?
(untested, can't use SAS atm, so there might be a typo)
%let cars_val1 =11,17,18 ; %let cars_val2 =7 ;
%let trucks_val1 =11,17,18 ; %let trucks_val2 =13 ;
%let pickup_val1 =749 ; %let pickup_val2 =984,751 ;
%let vehicles= cars trucks pickups;
data want;
set have;
%macro loop;
%do i=1 %to %sysfunc(countw(&vehicles));
%let vehicle=%scan(&vehicles,&i);
if (code1 in (&&&vehicle._val1) and code2 in (&&&vehicle._val2) )
or (code1 in (&&&vehicle._val1) and code3 in (&&&vehicle._val2) )
or (code1 in (&&&vehicle._val1) and code4 in (&&&vehicle._val2) )
or (code1 in (&&&vehicle._val1) and code5 in (&&&vehicle._val2) ) then &vehicle='1'; else &vehicle='0';
%end;
%mend;
%loop
run;
Or, to be a bit more efficient:
%macro loop;
%do i=1 %to %sysfunc(countw(&vehicles));
%let vehicle=%scan(&vehicles,&i);
if code1 in (&&&vehicle._val1) and
( code2 in (&&&vehicle._val2)
| code3 in (&&&vehicle._val2)
| code4 in (&&&vehicle._val2)
| code5 in (&&&vehicle._val2) ) then &vehicle='1'; else &vehicle='0';
%end;
%mend;
%loop
Thank you!
I tried running the below and am getting some errors (pasted after the code), will try to figure out how the macro is working. If this works then that would be great!!!
data have;
length service_type $10;
input ID $ code1 code2 code3 code4 code5 ;
datalines;
1 11 7 13 986 0
2 17 309 13 0 0
3 11 922 357 999 999
4 18 300 0 0 999
5 20 7 0 0
6 749 984 0 0 0
7 750 751 0 0 999
8 34 15 4 0 0
9 999 1 0 0
10 13 0 0 0 0
11 11 0 0 13 999
12 18 0 1 0 0
13 17 0 0 0 0
14 16 1 0 0 0
15 300 0 0 1 0
16 749 0 0 751 0
run;
%let cars_val1 =11,17,18 ; %let cars_val2 =7 ;
%let trucks_val1 =11,17,18 ; %let trucks_val2 =13 ;
%let pickup_val1 =749 ; %let pickup_val2 =984,751 ;
%let vehicle= cars trucks pickups;
data want;
set have;
%macro loop;
%do i=1 %to %sysfunc(countw(&vehicle));
%let vehicle=%scan(&vehicle,&i);
if code1 in (&&&vehicle._val1) and
( code2 in (&&&vehicle._val2)
| code3 in (&&&vehicle._val2)
| code4 in (&&&vehicle._val2)
| code5 in (&&&vehicle._val2) ) then &vehicle='1'; else &vehicle='0';
%end;
%mend;
%loop
run;
%let cars_val1 =11,17,18 ; %let cars_val2 =7 ; 3123 %let trucks_val1 =11,17,18 ; %let trucks_val2 =13 ; 3124 %let pickup_val1 =749 ; %let pickup_val2 =984,751 ; 3125 3126 %let vehicle= cars trucks pickups; 3127 3128 data want; 3129 set have; 3130 3131 %macro loop; 3132 %do i=1 %to %sysfunc(countw(&vehicle)); 3133 %let vehicle=%scan(&vehicle,&i); 3134 if code1 in (&&&vehicle._val1) and 3135 ( code2 in (&&&vehicle._val2) 3136 | code3 in (&&&vehicle._val2) 3137 | code4 in (&&&vehicle._val2) 3138 | code5 in (&&&vehicle._val2) ) then &vehicle='1'; else &vehicle='0'; 3139 %end; 3140 %mend; 3141 %loop NOTE: Line generated by the macro variable "VEHICLE". 1 &_val1 - 22 200 WARNING: Apparent symbolic reference _VAL1 not resolved. NOTE: Line generated by the macro variable "VEHICLE". 1 &_val2 - 22 76 WARNING: Apparent symbolic reference _VAL2 not resolved. WARNING: Apparent symbolic reference _VAL2 not resolved. WARNING: Apparent symbolic reference _VAL2 not resolved. WARNING: Apparent symbolic reference _VAL2 not resolved. NOTE: Line generated by the macro variable "VEHICLE". 1 &_val1 - 22 200 WARNING: Apparent symbolic reference _VAL1 not resolved. NOTE: Line generated by the macro variable "VEHICLE". 1 &_val2 - 22 76 WARNING: Apparent symbolic reference _VAL2 not resolved. WARNING: Apparent symbolic reference _VAL2 not resolved. WARNING: Apparent symbolic reference _VAL2 not resolved. WARNING: Apparent symbolic reference _VAL2 not resolved. ERROR 22-322: Syntax error, expecting one of the following: a quoted string, a numeric constant, a datetime constant, a missing value, iterator, (. ERROR 200-322: The symbol is not recognized and will be ignored. ERROR 76-322: Syntax error, statement will be ignored. 3142 run; NOTE: The SAS System stopped processing this step because of errors. WARNING: The data set WORK.WANT may be incomplete. When this step was stopped there were 0 observations and 9 variables. WARNING: Data set WORK.WANT was not replaced because this step was stopped. NOTE: DATA statement used (Total process time): real time 0.02 seconds cpu time 0.01 seconds
At least for the cases where you've got only to search for one value in code2 - code5 the whichn() function could help.
if code1 in (11,17,18) and whichn(7, code2, code3, code4, code5)>0 then cars='1';
else cars='0';
And if you have to check multiple values in code2 to code5 then something like below might do the job.
array codes2to5 {*} code2 - code5;
pickup='0';
if code1 = 749 then
do _i=1 to dim(codes2to5);
if codes2to5[_i] in (904,751) then
do;
pickup='1';
leave;
end;
end;
Or here another coding option.
%macro codecheck(outvar, code1Val, code2to5Val);
&outvar='0';
if code1 in (&code1Val) then
do _i=1 to dim(codes2to5);
if codes2to5[_i] in (&code2to5Val) then
do;
&outvar='1';
leave;
end;
end;
drop _i;
%mend;
data want;
set have;
array codes2to5 {*} code2 - code5;
%codecheck(cars, 11 17 18, 7)
%codecheck(trucks, 11 17 18, 13)
%codecheck(pickup, 749, 984 751)
run;
YES!!!!!!! THANK YOU!!!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.