Hi SAS users,
I have a wide-format datafile that consists of groups 20 of variables like:
vl1 vl2 vl3 ..... vl20
ob1 1000 240 3004 ..... 10
ob2 50 50 50 ..... 20
ob3 40 100 150 ..... 30
ob4 243 230 . ..... .
...
ob1200
So I have variables grouped by visits like vl1-vl20, bp1-bp20, kidney1-kidney20, cd4_1-cd4_20, etc.
I need to create an additional variable based on each group of 20, in case at least one of the variables satisfies a certain condition. for example for variables vl1-vl20, I need to create a new variable vl_success that take a value of 1 if vl1 < 50 or vl2 <50 or vl3 < 50...or vl20<50 etc.
Since I need to do this for each of the group of variables (and I have many), I don't want to write it all by hand like here:
data repeated2;
set repeated1;
if (vl1 < 50 and vl1 ne .) or (vl2 < 50 and vl2 ne .) or (vl3 < 50 and vl3 ne .) ... or (vl20 < 50 and vl20 ne .) then
vl_success = 1;
else vl_success = 0;
run;
Could you please help me with this?
For conditions as simple as posted in your sample you could go with the approach @Astounding proposed here:
For your example something like below should work.
data have;
array vl {20} 8 (20*100);
vl5=.;
output;
vl6=49;
output;
stop;
run;
data want;
set have;
vl_success= ( min(of vl1-vl20)<50 );
run;
proc print data=want;
run;
Missings aren't an issue in your case because:
The MIN function returns a missing value (.) only if all arguments are missing.
You'd have to expand your question with some more detail, but some starting points for you:
1. Arrays to loop instead of multiple statements
2. arrays to hold diagnosis as well
3. Or using WHICHC()/WHICHN() to search an array at once.
@Dinurik wrote:
Hi SAS users,
I have groups of 20 variables (for 20 repeated measurements) and now I need to create one new variable that takes a value of 1 if at least one of the 20 variables satisfy the condition. I know that I can do it with:
if (cd41<= 200 and cd41 ne . ) or (cd42 <=200 and cd42 ne .) or ... or (cd420 <= 200 and cd420 ne .) then imf = 1;
else imf = 0;
run;
But I have many such groups of variables (20 cd4 counts, 20 viral loads, 20 bp measurements, etc.) and am looking for a fast way to do the same.
Could you please help me with this?
Hi Reeza,
Thanks. I tried this:
array vl(20) vl1-vl20;
if . < vl(20) < 50 then vl_success = 1; else vl-success = 0;
run;
But it seems that SAS interpreted it as if (. < vl1 < 50) AND (. < vl2 < 50) ... AND (. < vl20 <50); so basically all values for vl_success are 0. I am not sure how to use other methods. WHICHN sounds promising though. I will try it.
See if this gives you hint.
data example; input v1 v2 v3 v4; array v v1-v4; imf = (.< min(of v(*)) le 200); datalines; . . . . 1 2 3 4 220 240 400 600 20 . 300 120 ; run;
The min function only returns missing when all values are missing. So if any of the values are between . and 200 the min will be in that range of values. The comparison returns a 1 when true and 0 when false. SAS allows comparisons like a<x<b so you need not use that "and ne to ."
The "of v(*)" says to use all of the elements of the array for the calculation of the min value.
You would need to get into something a bit more complex if you need different values compared for different variables. But if all of the values are considering the same boundary then MIN, or if you were looking for >400 for example, then MAX.
Thanks for the explanation! Very helpful.
For conditions as simple as posted in your sample you could go with the approach @Astounding proposed here:
For your example something like below should work.
data have;
array vl {20} 8 (20*100);
vl5=.;
output;
vl6=49;
output;
stop;
run;
data want;
set have;
vl_success= ( min(of vl1-vl20)<50 );
run;
proc print data=want;
run;
Missings aren't an issue in your case because:
The MIN function returns a missing value (.) only if all arguments are missing.
It worked! Thank you so much!
Thank you. Didn't know about this function. It's so useful!
You have got solutions. The Minimum function has been used to get the Indicator (1 or 0). Since minimum function must inherently need to check all elements of the array, a slightly better approach for this particular problem is to stop the checking on the first occasion when the condition is met. The output below shows how many array elements are compared before exiting from the do loop (see I).
data have; input vl1 - vl5; datalines; 100 34 49 60 200 200 60 10 70 80 10 80 20 40 50 100 60 60 70 80 ; run; data want; set have; array v[*] vl1 - vl5; do i = 1 to dim(v) until(ind = 1); ind = (v[i] < 50); end; run; Obs vl1 vl2 vl3 vl4 vl5 i ind 1 100 34 49 60 200 2 1 2 200 60 10 70 80 3 1 3 10 80 20 40 50 1 1 4 100 60 60 70 80 6 0
Thank you! I like this logic, I will definitely use it for other tasks too.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.