Solved: Where Statement with multiple variables

hannah_hortlik · Posted 11-29-2019 10:59 AM

Hey guys, hope you're all well.

I'm puzzled by a new problem I have.

In my data I have 89 variables (called code1, code2, ...), and each variable has a different value. For example the value "E123". There is always a letter and then a number, sometimes just two numbers, sometimes more.

I want a table for which I look through all 89 variables and count how often the values "E123", "E124", ... up to "E200" occur.

I tried:

proc freq data = codes
tables (names) *(code: ) / norow nocol cumcol nocum plots=none;
where ('E123'<= code: <='E200');
run;

But SAS doesn't accept that ":"; it's always an error. I also tried:

proc freq data = codes
tables (names) *(code: ) / norow nocol cumcol nocum plots=none;
where ('E123'<= code1-89 <='E200'); / where ('E123'<= code1 - code89 <='E200');
run;

Then it says that a numeric value is needed for the WHERE statement. But when I just say where code1='E123', it works, so...

Hope you understand my problem and can help me,

thanks a lot!

PaigeMiller · Posted 11-29-2019 11:09 AM

If I am understanding this properly, I would do this in an ARRAY in a DATA step.

data want;
    set codes;
    array code code:;
    do i=1 to dim(code);
        if code(i) in ('E123','E124','E125', ... ,'E200') then count+1;
        /* in the above statement, you have to type in all of the values like 'E123' you want to test for */
    end;
    drop i;
run;

If you really want to get fancy, you could have a macro variable that contains the values 'E123' to 'E200' and then you wouldn't have to do the typing.

--
Paige Miller

hannah_hortlik · Posted 11-29-2019 11:32 AM

Well, but I don't want to type in all of the values because that will be many values 😄
How do I code a macro variable in this case?
And does that show me how often the values are in the code1-89 variables?
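
One possible way to build such a macro variable, as a rough sketch rather than a definitive answer: a DATA _NULL_ step generates the quoted list 'E123','E124',...,'E200' and stores it with CALL SYMPUTX. The macro variable name codelist, the array name c, and the end=eof logic that prints the total to the log are assumptions added for illustration; the dataset name codes comes from the original question.

```sas
/* Build the list 'E123','E124',...,'E200' without typing it by hand. */
data _null_;
  length list $ 1000;                 /* 78 values x 7 characters fits easily */
  do n = 123 to 200;
    list = catx(',', list, cats("'E", n, "'"));
  end;
  call symputx('codelist', list);     /* hypothetical macro variable name */
run;

%put &=codelist;                      /* show the generated list in the log */

/* Count how often any of those values occurs in the code variables. */
data want;
  set codes end=eof;
  array c [*] code:;                  /* all variables whose names start with "code" */
  do i = 1 to dim(c);
    if c[i] in (&codelist) then count + 1;
  end;
  if eof then put count=;             /* running total after the last observation */
  drop i;
run;
```

Because the sum statement count + 1 keeps its value across observations, the total is the value of count on the last observation, which the PUT statement writes to the log.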

Astounding · Posted 11-29-2019 11:16 AM

Regardless of the syntax, the concept will not work. There is no way to apply one WHERE subset to one of the tables, and a different WHERE subset to another table. You will need to build all tables from the same set of observations.

There are a few ways you could proceed. You could use macro language to generate a separate PROC FREQ for each variable. But here is a quick and dirty method just in case it helps.

proc format;
   value $e low-'E122' = 'Lower'   'E201'-high = 'Higher';
run;

proc freq data=codes;
   tables (names) * (code: ) / norow nocol cumcol nocum plots=none;
   format code: $e.;
run;

Note that these tables will be inconvenient to read. You might want to add the LIST option:

... plots=none list;

If you add that, you might want to remove the table options after the "/".

And the "quick and dirty" method does get you the extra two rows that you can ignore.

hannah_hortlik · Posted 11-29-2019 11:44 AM

I'm sorry, I don't understand what I get from that.

In my result table I still see the single values like "E123", but I only want to see how many there are, not which ones. And I don't understand what I get out of that value line.

Maybe it would be easier to create a new variable, called "new", that includes all the codes from code1-code89? And then say where new = "E123" - "E200"?

Or is it not at all possible with the where statement?

Astounding · Posted 11-29-2019 12:07 PM

Looks like I didn't understand the problem fully. Based on that description, try modifying the format:

proc format;
   value $e 
   low-'E122' = 'Lower'   
   'E123'-'E200' = 'E123 to E200' 
   'E201'-high = 'Higher';
run;

That will collapse the various values into a single row of the table.

hannah_hortlik · Posted 11-30-2019 01:43 PM

Am I understanding it right?

I get a table where I see how many values are between E123 and E200, and how many are higher? Because those are the names of the 2 columns of my table.

I don't know why, but the result is simply wrong: for code_1 I have 2 values (E125 and E150), but in the table there is just "1". And I have 20 columns, but it says "Higher = 12" and "total = 13".

I don't know where these numbers are coming from.

Tom · Posted 11-29-2019 11:34 AM

Do you really care whether E123 appears in CODE1 instead of CODE89 ?

Transpose and count. If you have variables that uniquely identify the rows in your dataset you can use PROC TRANSPOSE:

proc transpose data=have out=tall ;
  by keys ;
  var code1-code89 ;
run;
proc freq data=tall;
  where col1 between 'E123' and 'E200' ;
   tables col1;
run;

If not then transpose with a data step. In that case you could keep TALL smaller by only writing the codes you want to count. You could even make it a view instead.

data tall / view=tall;
  set have;
  array code [89];
  do index=1 to dim(code);
     if 'E123' <= code[index]  <= 'E200' then do;
         a_code = code[index];
         output;
    end;
  end;
  keep /* key variables here */ a_code ;
run;
proc freq data=tall;
  tables a_code;
run;

hannah_hortlik · Posted 11-29-2019 11:59 AM

No, I don't care where exactly E123-E200 occur in the variables code1-code89; I just want to know how often.

When I look at your first solution, I cannot figure out what the

where col1 between 'E123' and 'E200' ;

part means - what is col1?

When I use your second code, in the line where you say:

keep /* key variables here */ a_code ;

Which key variables do you mean? Like all the other variables I want to keep from the data?

The problem is that I don't see any results. No error, which is good, but no results.

Tom · Posted 11-29-2019 12:09 PM

If you don't tell PROC TRANSPOSE what prefix to use when naming the resulting columns, it will name them COL1, COL2, ... So the value from the first observation in a group goes into COL1, the value from the second into COL2, etc.

That is why, for your problem, you need BY variables to make sure that only one variable is created.

If you use the data step to transpose, then make sure to use the variable you create in that step in the PROC FREQ step. If you want the tall dataset/view to be useful, include any other variables that are useful. Instead of KEEP you could also use DROP to remove the original 89 code variables.
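
To make the COL1 behaviour concrete, here is a small self-contained sketch of the transpose-and-count approach. The three-variable dataset, the key variable id, and the sample values are made up for illustration; with the real data the VAR list would be code1-code89 and the BY variables would be whatever uniquely identifies a row.

```sas
/* Hypothetical sample data: one key variable and three code variables. */
data have;
  input id code1 $ code2 $ code3 $;
  datalines;
1 E123 E150 A10
2 E200 B20 E124
;

/* Each BY group is a single observation, so only COL1 is created.
   It is renamed to a_code here to match the data step version. */
proc transpose data=have out=tall(rename=(col1=a_code));
  by id;
  var code1-code3;
run;

/* Count only the codes in the range of interest. */
proc freq data=tall;
  where 'E123' <= a_code <= 'E200';
  tables a_code;
run;
```

The output dataset tall has one row per original row and code variable, with the variable name in _NAME_ and the code value in a_code.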

hannah_hortlik · Posted 11-30-2019 01:59 PM

Okay so I use this:

data new ;
  set have;
  array code [89];
  do index=1 to dim(code);
     if 'E123' <= code[index]  <= 'E200' then do;
         a_code = code[index];
         output;
    end;
  end;
  keep /* key variables here */ a_code ;
run;
proc freq data=new;
  tables a_code;
run;

And I get a table with a new column called "a_code" that contains only the codes I defined (E123-E200). As the result of the PROC FREQ code I see how often E123 occurs, how often E124, ...

But now let's say, to make it easier, I define E123 - E124 and there are 5 occurrences of E123 and 6 of E124. I want to know how often the values between E123 and E124 occur (11 times in this case), but not separated, just together.

Not like this:

a_code    Frequency
E123      5
E124      6

Just:

a_code    Frequency
E123-124  11

That is just an example! So I want to know the total number of times values between E123 and E200 occur, but I don't want to see how often each value occurs individually. I hope you get what I want.

But thanks a lot already!

Tom · Posted 11-30-2019 02:05 PM

The simplest way is to just count in the data step instead of writing out the full list and then using PROC FREQ to count.

data count;
  set have end=eof;
  array code [89];
  do index=1 to dim(code);
     if 'E123' <= code[index]  <= 'E200' then count+1;
  end;  
  if eof then do;
    put count=;
    output;
  end;
  keep count;
run;

The more general way would be to make a format that maps all of those values to the same string.

proc format;
  /* map every value in the character range to one label */
  value $mygroup
    'E123' - 'E200' = 'E123 - E200'
  ;
run;
proc freq data=new;
  tables a_code;
  format a_code $mygroup.;
run;

hannah_hortlik · Posted 12-01-2019 07:07 AM

Well, you just saved my ass. And day and week. Thank you so much, that is exactly what I want.

FreelanceReinh · Posted 11-29-2019 01:47 PM

@hannah_hortlik wrote:

(...) For example the value "E123". There is always a letter and then a number, sometimes just two numbers, sometimes more.

Do you mean two or more digits, i.e., the variables in question may contain codes like "E15" or "E1234"? Then you need to be careful with range specifications: The condition "E123" <= code <= "E200" (applied to a sufficiently long character variable code) would include codes E13 - E20 as well as E1230 - E1999, E12300 - E19999, etc. (and E2). Same with "between ... and" and with the range 'E123'-'E200' in a format definition.

To avoid this, you can rewrite the condition as

code=:'E' & 123<=input(substr(code,2),32.)<=200

(i.e., the value of code is an "E" followed by a number between 123 and 200).
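
Combined with the counting data step from earlier in the thread, this condition could be used roughly as follows. This is only a sketch: it assumes the dataset is called have with character variables code1-code89, as in the posts above. The helper variable num and the ?? modifier (which suppresses invalid-data notes when the part after the letter is not a number) are additions for illustration.

```sas
data count;
  set have end=eof;
  array code [89];                     /* refers to code1-code89 */
  do index = 1 to dim(code);
    if code[index] =: 'E' then do;     /* value starts with the letter E */
      /* read everything after the first character as a number;
         ?? keeps the log quiet if that part is not numeric */
      num = input(substr(code[index], 2), ?? 32.);
      if 123 <= num <= 200 then count + 1;
    end;
  end;
  if eof then do;
    put count=;                        /* write the total to the log */
    output;
  end;
  keep count;
run;
```

With this numeric comparison, codes such as E13 or E1230 are no longer counted, because 13 and 1230 fall outside 123 to 200.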
