BookmarkSubscribeRSS Feed
hannah_hortlik
Obsidian | Level 7

Hey guys, hope you're doing well.

 

In my data I have 89 variables (called code_sd1, code_sd2, code_sd3, ...) and one variable (called main_3) , and each variable has a different value. For example the value "E123". There is always a letter and then a number, sometimes just two numbers, sometimes more.

 

Now I want to make a new variable, called "NEW" which includes on the one hand all 89 variables (code_sd1, ...), and that one variable (main_3). So basically i tried this:

 

 

proc format;
value dummy 0 = "no"
1 = "yes";
run;

NEW = 0; if (main_3 EQ 'E123' or code_sd1 EQ 'E123') then NEW = 1; else NEW = 0; ; format NEW dummy.;
run;

 

 

And then I create a table with:

proc freq data = example;
tables sex;
where NEW = 1;
run;

 

In that table I want to see, how many times main_3 or all 89 codes have the value "E123".

As you may have seen, in my code I wrote "if (main_3 EQ 'E123' or code_sd1 EQ 'E123') then NEW = 1;" Because I do not know how to write it, that all 89 code_sd variables are included.

If I write code_sd1-89 it doesn't work. Well there is no error, but the numbers in the tables are wrong.

If I write code_sd: it doesn't work either (ERROR 388-185: Expecting an arithmetic operator. ERROR 200-322: The symbol is not recognized and will be ignored. ERROR 76-322: Syntax error, statement will be ignored.).

 

I could write down:

"if (main_3 EQ 'E123' or code_sd1 EQ 'E123' or code_sd2 EQ 'E123' or code_sd3 EQ 'E123') then NEW = 1;"

But it's not smart. I hope you understand my problem and have a solution. Thanks for your time!

 

13 REPLIES 13
Reeza
Super User

Use WHICHC() if you're searching for an exact match.

 

data want;
set have;

array vars_list(*) code_sd1-code_sd89 main_3;

new = 0;
if whichc('E123', of vars_list(*)) then new =1;

run;

 

In general, here's a reference that illustrates how to refer to variables and datasets in a short cut list:
https://blogs.sas.com/content/iml/2018/05/29/6-easy-ways-to-specify-a-list-of-variables-in-sas.html

 


@hannah_hortlik wrote:

Hey guys, hope you're doing well.

 

In my data I have 89 variables (called code_sd1, code_sd2, code_sd3, ...) and one variable (called main_3) , and each variable has a different value. For example the value "E123". There is always a letter and then a number, sometimes just two numbers, sometimes more.

 

Now I want to make a new variable, called "NEW" which includes on the one hand all 89 variables (code_sd1, ...), and that one variable (main_3). So basically i tried this:

 

 

proc format;
value dummy 0 = "no"
1 = "yes";
run;

NEW = 0; if (main_3 EQ 'E123' or code_sd1 EQ 'E123') then NEW = 1; else NEW = 0; ; format NEW dummy.;
run;

 

 

And then I create a table with:

proc freq data = example;
tables sex;
where NEW = 1;
run;

 

In that table I want to see, how many times main_3 or all 89 codes have the value "E123".

As you may have seen, in my code I wrote "if (main_3 EQ 'E123' or code_sd1 EQ 'E123') then NEW = 1;" Because I do not know how to write it, that all 89 code_sd variables are included.

If I write code_sd1-89 it doesn't work. Well there is no error, but the numbers in the tables are wrong.

If I write code_sd: it doesn't work either (ERROR 388-185: Expecting an arithmetic operator. ERROR 200-322: The symbol is not recognized and will be ignored. ERROR 76-322: Syntax error, statement will be ignored.).

 

I could write down:

"if (main_3 EQ 'E123' or code_sd1 EQ 'E123' or code_sd2 EQ 'E123' or code_sd3 EQ 'E123') then NEW = 1;"

But it's not smart. I hope you understand my problem and have a solution. Thanks for your time!

 


 

JeffMaggio
Obsidian | Level 7

I'm not sure if main_3 is directly before the codes 1-89, but assuming 1 through 89 are next to each other in the dataset you could do something like this with arrays:

data want;
set have;
array codes{*} code_sd1--code_sd89;
new = 0;
if main_3 = 'E123' then NEW=1;
do i=1 to dim(codes);
    if codes{i} = 'E123' then NEW=1;
end;
run;
FreelanceReinh
Jade | Level 19

Hi @hannah_hortlik,

 

Also the IN operator could be used for a solution as short as this:

data want;
set have;
array a main_3 code_sd:;
new='E123' in a;
run;

If the search term ('E123') is stored in a character variable (say b), it can be used as well:

new=b in a;

 

hannah_hortlik
Obsidian | Level 7

Hello,

 

thanks for all your solutions. It worked! But I tried again this way cause for me it's mor simple:

 

NEW = 0;
if (main_3 EQ 'E123' or code_sd1-89 EQ 'E123') then NEW = 1;
else NEW = 0;

And the "code_sd1-89" part does work. I controlled it in one table and it was right.

 

But I have got a new question, and maybe therefor I need one of your solutions. Now I want to check multiple values, so not only the value 'E123', but also 'E000', 'E001', up to 'E999'.

 

NEW = 0;

if (main_3 EQ 'E000' - 'E999' or code_sd1-89 EQ 'E000' - 'E999') then NEW = 1;

else NEW = 0;

 

I know this Code does not work, but it's for showing you more exactly what I need. Thank you!

 

@Reeza @FreelanceReinh @JeffMaggio 

Reeza
Super User

If you're checking two lists it gets more complicated. Are you searching fore exactly those codes and do you need to consider the decimal portions? These are diagnosis codes, correct?

 

If so, one better method may be to flip both data sets to a long format (via PROC TRANSPOSE) and then use a SQL query against a list of values you want.

proc sql;
create table want as
select *
from long_list
where diag in (select diag_code from diag_list);
quit;

Transposing data tutorials:

Wide to Long:
https://stats.idre.ucla.edu/sas/modules/how-to-reshape-data-wide-to-long-using-proc-transpose/

https://stats.idre.ucla.edu/sas/modules/reshaping-data-wide-to-long-using-a-data-step/

Or you would use an array with two arrays, one for you list of variables and one with the list of diagnosis you're looking for and loop through it as demonstrated with @JeffMaggio solution.

 


@hannah_hortlik wrote:

Hello,

 

thanks for all your solutions. It worked! But I tried again this way cause for me it's mor simple:

 

NEW = 0;
if (main_3 EQ 'E123' or code_sd1-89 EQ 'E123') then NEW = 1;
else NEW = 0;

And the "code_sd1-89" part does work. I controlled it in one table and it was right.

 

But I have got a new question, and maybe therefor I need one of your solutions. Now I want to check multiple values, so not only the value 'E123', but also 'E000', 'E001', up to 'E999'.

 

NEW = 0;

if (main_3 EQ 'E000' - 'E999' or code_sd1-89 EQ 'E000' - 'E999') then NEW = 1;

else NEW = 0;

 

I know this Code does not work, but it's for showing you more exactly what I need. Thank you!

 

@Reeza @FreelanceReinh @JeffMaggio 


 


@hannah_hortlik wrote:

Hello,

 

thanks for all your solutions. It worked! But I tried again this way cause for me it's mor simple:

 

NEW = 0;
if (main_3 EQ 'E123' or code_sd1-89 EQ 'E123') then NEW = 1;
else NEW = 0;

And the "code_sd1-89" part does work. I controlled it in one table and it was right.

 

But I have got a new question, and maybe therefor I need one of your solutions. Now I want to check multiple values, so not only the value 'E123', but also 'E000', 'E001', up to 'E999'.

 

NEW = 0;

if (main_3 EQ 'E000' - 'E999' or  OF code_sd1-89 EQ 'E000' - 'E999') then NEW = 1;

else NEW = 0;

 

I know this Code does not work, but it's for showing you more exactly what I need. Thank you!

 

@Reeza @FreelanceReinh @JeffMaggio 


 

hannah_hortlik
Obsidian | Level 7

I tried your array code now, and it works. I get a new variable, and the results are right. But I need more than one new variable, but when I write down one of your array code after another, it doesn't work. What i need is:

NewVariable 1: checking the presence of E123

NewVariable 2: presence of Z90

(This helps me with me first problem. Cause FreelanceReinha told me my code_sd1-89 part doesn't work.)

 

 

And to your newest answer: Yes, these are diagnosis. I want to creat a new variable, which shows me how many people have the Diagnosis X, doesn't matter if it's a main (main_3) or one of the secondary (code_sd1-89) diagnosis.

So the first step: I need multiple new variables, which show me, how often one diagnosis is existing.

Second step: I need new variables, which show me, how often diagnosis in the range from e.g. E000-E999 are existing

Reeza
Super User
Transpose your data and use PROC FREQ. It's then a two step process that will scale for many other analysis.
Tom
Super User Tom
Super User

What did you think code_sd1-89 meant?  In that context the hyphen is a the subtraction operator.

Or are you trying to reference a variable named code_sd1_89 and just typed a hyphen instead of an underscore?

 

If you want to see if a character value falls in a range you can use:

'E000' <= main_3 <= 'E999'

Note that this is lexicographical ordering.  So 'E1' will fall into that range. As will 'E0notfound' or other gibberish values. 

if ('E000' <= main_3 <= 'E999') or ('E000' <= code_sd1_89 <= 'E999') then NEW = 1;

 

hannah_hortlik
Obsidian | Level 7

I thought that code_sd1-89 meant that sas will go threw all 89 variables checking if the value is in there.

And I meant the "-" symbol to say: from code_sd1 to code_sd89, you know?

 

But I guess it's like the others said: it doesn't work. So your second code doesn't work because what does

code_sd1_89

stand for?

But thanks for the range tip!

Tom
Super User Tom
Super User

For X - Y to specify a variable list the two words have to both end in a numeric suffix and the prefix has to match.

Examples:

code_sd1 - code_sd89
var9-var20
something230-something189 

Otherwise you are asking for X minus Y.

You cannot use a variable list in the middle of a comparison like (1 <= A <= 10) (or any other type of expression).

 

You can use them where you could type a normal space delimited list of variables.  Like in an array definition, a VAR statement, a KEEP statement, after the keyword OF in a function call to a function that takes an open ended number of arguments.

 

FreelanceReinh
Jade | Level 19

@hannah_hortlik wrote:
NEW = 0;
if (main_3 EQ 'E123' or code_sd1-89 EQ 'E123') then NEW = 1;
else NEW = 0;

And the "code_sd1-89" part does work. I controlled it in one table and it was right.

 

Nocode_sd1-89 is invalid syntax. It is definitely not interpreted as a variable list, but SAS would try to subtract 89 from the value of code_sd1, which doesn't make sense and requires an automatic character-to-numeric conversion ... The log must have contained one or more notes which have indicated this.

 

I agree with Reeza that using loops or a transposed data structure would be typical ways to deal with the new requirement. Another option might be using a regular expression applied to a concatenated string:

data want;
set have;
new=^^prxmatch('/~E\d{3}~/', '~'||catx('~',main_3,of code_sd:)||'~');
run;

This would (internally) create a long string of the form ~value1~value2~...~value90~, where value1, ..., value90 are the values of variables main_3, code_sd1, code_sd2, ..., code_sd89 (without leading or trailing blanks; assuming that there are no other variables whose names start with "code_sd"). It is important that the tilde character (~) is not contained in any of these 90 variables. (Otherwise use a different special character, but avoid conflicts with PRX metacharacters.) Then the PRXMATCH function searches for the pattern ~Eddd~, where "d" stands for a digit, and returns the position of the first hit or 0 if the pattern is not found. Finally the double NOT operator (^) transforms the result to 1 or 0 if the pattern was found (1) or not (0). If the pattern was found, we can conclude that at least one of the 90 variables must contain one of the 1000 values 'E000', 'E001', ..., 'E999'. Note that a value such as 'E45a' would not satisfy the condition (unlike the range condition mentioned by Tom).

 

Edit: Removed the redundant, hence unnecessary "?" metacharacter after "{3}" in the regular expression.

hannah_hortlik
Obsidian | Level 7

Thank you for telling me that code_sd1-89 is invalid syntax and not interpreted as variable list.

I tried your code and got a new table with the new variable (called new) but even if the code is in there, "NEW" is always 0.

And no the "~" is not part of the variables.

But when I change it to

new=^^prxmatch('/~E\d{2}~/', '~'||catx('~',main_3,of code_sd:)||'~');

 then it worked correct, cause the codes in the table I tried it out only have 2 numbers (So E12, E14, ...). But in other data the codes have 3 numbers (E123) and the ones with 2 numbers are not count when I use your code.

So using your code, only exact the values with 3 numbers are included. But I also want to count those with 2 numbers. (E123, E15, E12, E155, all of these and then as result get the number 4, cause EXXX is there 4 times) Hope you understand. Thanks!

FreelanceReinh
Jade | Level 19

@hannah_hortlik wrote:

But when I change it to

new=^^prxmatch('/~E\d{2}~/', '~'||catx('~',main_3,of code_sd:)||'~');

 then it worked correct, cause the codes in the table I tried it out only have 2 numbers (So E12, E14, ...). But in other data the codes have 3 numbers (E123) and the ones with 2 numbers are not count when I use your code.

So using your code, only exact the values with 3 numbers are included. But I also want to count those with 2 numbers. (E123, E15, E12, E155, all of these and then as result get the number 4, cause EXXX is there 4 times)


I'm not surprised. Let's just review the specifications (highlighting added) ...

@hannah_hortlik wrote:

Now I want to check multiple values, so not only the value 'E123', but also 'E000', 'E001', up to 'E999'.

... and the claim I made regarding my code (again, highlighting added):

@FreelanceReinh wrote:

Then the PRXMATCH function searches for the pattern ~Eddd~, where "d" stands for a digit, (...) If the pattern was found, we can conclude that at least one of the 90 variables must contain one of the 1000 values 'E000', 'E001', ..., 'E999'. Note that a value such as 'E45a' would not satisfy the condition (unlike the range condition mentioned by Tom).


Thanks to the flexibility of Perl regular expressions it's very easy to adapt the search pattern to the new specification "match codes of the forms Edd and Eddd, where d stands for a digit (0, 1, ..., 9)":

/~E\d{2,3}~/

Similarly, you could change it to /~E\d{1,3}~/ in order to include the ten codes E0, E1, ..., E9 (in addition to the 1100 codes E00, ..., E99, E000, ..., E999). The two numbers in the curly braces specify the minimum and maximum number of digits to be matched, respectively. So, it's up to you to adjust these parameters to your needs.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1334 views
  • 3 likes
  • 5 in conversation