BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wj2
Quartz | Level 8 wj2
Quartz | Level 8

Hello Community, 

 

I have a very large list (n=2,500) of character values of a variable that I need to define in an IF THEN statement. For example:

 

IF varA in ('a', 'b', 'c', 'd' .... n=2,500) THEN newvar=1;

ELSE newvar=0;

 

Is there a shortcut that I can use to define all 2,500 character values in my statement WITHOUT having to add quotation marks around every single value? 

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So I answered that already, but let's spell it out in more detail.  Take your list from WORD and put in into a text file. Or you could even just paste it into your program as in-line data.  Then you can read it into a dataset and use the dataset to match to your list of subjects.

data list ;
  length disease $100;
  infile cards dsd flowover ;
  input disease @@;
  if disease ne ' ';
cards4;
headache,gout,knee pain
cancer,diabetes
;;;;

proc sql noprint ;
create table want as 
select subjectID
     , max( diagnosis in (select disease from list) ) as FLAG
from have
group by subjectid
;
quit;

You might need to clean up data you have pulled from WORD to make it acceptable as data.  Perhaps they have commas in the middle of some of the values?  WORD could have converted some of the characters in the strings to "pretty" versions so that they will no longer match the values in your data. 

 

You will probably need to do a little work to make sure that the values in your list actually match the values in your data.  Perhaps you want to force everything to uppercase to make matching easier?

upcase(diagnosis) in (select upcase(disease) from list)

Or once you have the data in dataset the other answers with examples of how to pull the values out into macro variables will also work.

proc sql noprint;
select quote(trim(disease)) into :mylist separated by ' '
from list ;
quit;

data want;
  set have;
  flag = diagnosis in (&mylist);
run;

View solution in original post

13 REPLIES 13
PeterClemmensen
Tourmaline | Level 20

So the 2500 character values in question are in a separate data set or?

Jagadishkatam
Amethyst | Level 16

Please try below approach

 

I used sashelp.class dataset with name variable

 

proc sql;
select cats("'",name,"'") into: nam separated ',' from sashelp.class;
quit;

%put &nam;

data want;
set sashelp.class;
if name in (&nam) then flag=1;
else flag=.;
run;
Thanks,
Jag
wj2
Quartz | Level 8 wj2
Quartz | Level 8

Hi @Jagadishkatam, this looks to be a very good approach! However, how/where would I list the specific values of the name variable (e.g., Alice, John, Jane) that I am interested in defining for the variable flag? Thanks!

Tom
Super User Tom
Super User

@wj2 wrote:

Hi @Jagadishkatam, this looks to be a very good approach! However, how/where would I list the specific values of the name variable (e.g., Alice, John, Jane) that I am interested in defining for the variable flag? Thanks!


I really don't understand what you are asking for.

Where is the list now? Isn't already in a dataset?
If not then make one.

data mylist;
  infile cards truncover ;
  input word $100.;
cards;
Alice
John
Jane
Bill Smith
;

Of course once you have it in a dataset then no need for quotes at all to create your new variable.

proc sql ;
create table want as
select *,name in (select word from mylist) as flag
from sashelp.class
;
quit;
Jagadishkatam
Amethyst | Level 16

Please replace the dataset sashelp.class and name variable with your dataset and variable and follow the code logic

Thanks,
Jag
ballardw
Super User

@wj2 wrote:

Hi @Jagadishkatam, this looks to be a very good approach! However, how/where would I list the specific values of the name variable (e.g., Alice, John, Jane) that I am interested in defining for the variable flag? Thanks!


Make a data set with the values. If the values already exist in a list somewhere that should not be too difficult. I suspect that you are not making this list up as you go along.

 

 

Once you have a data set there are numerous ways such as @Jagadishkatam , create a custom format or informat and assign the new value using Put or Input, a Proc sql join

wj2
Quartz | Level 8 wj2
Quartz | Level 8

To clarify, I have one variable (e.g., varA) of a data set that contains all the values that I am interested in. These values are already in character format ($254.).

 

I need to create a new variable (e.g., newvar) that contains only select values from varA. The issue is that there are 2,500 select values of varA that I need to include in newvar but to my knowledge, you need to put quotations around each character value of in IF THEN statement. Therefore, I'm wondering if there is a way around this to save time and avoid error. Also, I need to create this variable for each subject of a data set, so my coding framework is as follows.

 

data want;
set have;
by subjectID;
if first.subjectID then do;
   newvar=0; 
end;
retain newvar;
if varA in ('a','b','c',.....) /* My question is how do I get around manually adding quotation marks here because there is a list of 2,500 values of varA that I need to add here. I already have the list of the 2,500 values that can be copy and pasted here*/
then newvar=1;
if last.subjectID;
run;

Using @Jagadishkatam example with sashelp.class:

proc sql;
select cats("'",name,"'") into: nam separated ',' from sashelp.class;
quit;

%put &nam;

data want;
set sashelp.class;
if name in (Alice,John,Jane) then flag=1; /*How do I get around needing to add quotation marks to the values here?*/
else flag=.;
run;

 

Jagadishkatam
Amethyst | Level 16

replace the (Alice,John,Jane)  with &nam macro variable we are creating, it carries the quotation and values. 

 

Thanks,
Jag
wj2
Quartz | Level 8 wj2
Quartz | Level 8

Thank you @Jagadishkatam. I tried the code; however, SAS is still expecting a quoted string of values where indicated. Any suggestions would be much appreciated. 

 

proc sql;
select cats("'",name,"'") into: nam separated ',' from sashelp.class;
quit;

%put &nam;

data want;
set sashelp.class;
if name in (&nam) then flag=1;
else flag=.;
run;

data want2;
set want;
if name in (Alice,John,Jane) then newvar=1; /*SAS is still expecting a quoted string here. How can I define these character values without using quotations?*/
else newvar=0;
run;

proc freq data=want2;
tables newvar;
run;
Tom
Super User Tom
Super User

This is really beginning to sound like an XY problem.  http://xyproblem.info/

Or perhaps more of a chicken-egg problem.  https://en.wikipedia.org/wiki/Chicken_or_the_egg

 

How do you propose to teach SAS which values should result in FLAG being set?

You either have to list them out or provide it with the list in some other form.

You could download your list to Excel or a text file and then either eliminate the records you don't want. Or add a new column with the flag value. 

 

But there is not really any need to ever use IN with a list of values.  But if you did then for a short lists of words that need to be quoted I use this %QLIST() macro. https://github.com/sasutils/macros/blob/master/qlist.sas

data want;
  set sashelp.class ;
  newvar= name in %qlist(Alice John Jane);
run;

If your list has embedded spaces you can use some other character as the delimiter, like |.

data want;
  set sashelp.class ;
  newvar= name in %qlist(Alice Jones|John Smith|Jane Doe,delimit=|);
run;
wj2
Quartz | Level 8 wj2
Quartz | Level 8

@Tom, Yes, thank you for your insight. Essentially I am trying to create a binary variable (yes/no) indicating whether or not a patient has a certain diagnosis. If the patient has ANY one diagnosis out my selected list of 2,500 diagnoses then newvar=1. I already have the list of 2,500 diagnoses of interest compiled (i.e., in a Word document) and they are simply listed together separated by commas (e.g., a,b,c,d,e,f,....). I am wondering how I can just simply copy and paste that list in a code in order to create this variable. 

 

My current coding approach is as shown below; however, this approach requires that I add quotations around each value (since it is a character variable), which would be too time consuming and prone to error. Now that I have provided a better description of the problem, any suggestions would be GREATLY appreciated. 

 

data want;
set have;
by subjectID;
if first.subjectID then do;
   newvar=0; 
end;
retain newvar;
if varA in ('a','b','c',...) /* This approach requires that I add quotations around each value. Is there an alternate approach where I can simply copy and past my list of 2,500 values (in format a,b,c,d, etc.) into a code to create the variable?*/
then newvar=1;
if last.subjectID;
run;

 

Tom
Super User Tom
Super User

So I answered that already, but let's spell it out in more detail.  Take your list from WORD and put in into a text file. Or you could even just paste it into your program as in-line data.  Then you can read it into a dataset and use the dataset to match to your list of subjects.

data list ;
  length disease $100;
  infile cards dsd flowover ;
  input disease @@;
  if disease ne ' ';
cards4;
headache,gout,knee pain
cancer,diabetes
;;;;

proc sql noprint ;
create table want as 
select subjectID
     , max( diagnosis in (select disease from list) ) as FLAG
from have
group by subjectid
;
quit;

You might need to clean up data you have pulled from WORD to make it acceptable as data.  Perhaps they have commas in the middle of some of the values?  WORD could have converted some of the characters in the strings to "pretty" versions so that they will no longer match the values in your data. 

 

You will probably need to do a little work to make sure that the values in your list actually match the values in your data.  Perhaps you want to force everything to uppercase to make matching easier?

upcase(diagnosis) in (select upcase(disease) from list)

Or once you have the data in dataset the other answers with examples of how to pull the values out into macro variables will also work.

proc sql noprint;
select quote(trim(disease)) into :mylist separated by ' '
from list ;
quit;

data want;
  set have;
  flag = diagnosis in (&mylist);
run;
Ksharp
Super User

PROC FORMAT or Hash Table .

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 5223 views
  • 1 like
  • 6 in conversation