BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
myname
Calcite | Level 5

Hello everyone,

 

I am currently facing the problem of subsetting a dataset by words listed in a wordlist.

The dataset I have is quite large and includes the variables word and collection. I want to subset this dataset based on the observations in the variable word. For this I have several wordlists that include up to 60 words. As I do not want to write them all in a data step (especially as the word lists may change later on) I wrote myself a macro:

OPTIONS MPRINT;
%macro subset(cat);
proc sql ;
select distinct word 
into :word_list_&cat separated by " "
from work.&cat;
quit;

data work.subset_&cat;
set work.insurance;
%do i=1 %to 50;
%if word="qscan(&word_list_&cat,&i)" %then output;
%end;
run;
%mend subset;


%subset(fire)

I am quite new to working with macros, so this does not work, obviously. The error I get is

"WARNING: The quoted string currently being processed has become more than 262 bytes long. You might have unbalanced quotation

marks."

which I understand, SAS obviously reads the whole word list into the quotes, but I only want the first, second, third, and so on at one time.

 

I would be very grateful if someone could help me out with this problem.

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

This would be simpler:

 

OPTIONS MPRINT;
%macro subset(cat);

proc sql;
create table work.subset_&cat as
select * 
from  work.insurance
where word in (select word from work.&cat);
quit;

%mend subset;

%subset(fire)

(untested)

PG

View solution in original post

7 REPLIES 7
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Please post test data (as a datastep), it is very hard to visualise data we can't see.  At a guess:

data _null_;
  set catds;
  call execute(cats('data want_',word,'; set work.insuarance (where=(word="',word,'")); run;'));
run;

This will  create a datastep for each observation in catds which has a where for the word on that observation.  Quite simple.  however I would also advise against creating lots of datasets - it really is not a good way of working.  SAS is setup to use by group processing, so create a variable in your data whith this grouping, and then use that as a by group - far simpler coding. 

PGStats
Opal | Level 21

This would be simpler:

 

OPTIONS MPRINT;
%macro subset(cat);

proc sql;
create table work.subset_&cat as
select * 
from  work.insurance
where word in (select word from work.&cat);
quit;

%mend subset;

%subset(fire)

(untested)

PG
myname
Calcite | Level 5

 

data work.insurance;
infile datalines dsd;
input word:$25. category: $11.;
datalines;
inferno,earthquake
flames,earthquake
burning,earthquake
combustion,earthquake
burn,earthquake
shaking,earthquake
shake,earthquake
vibrate,earthquake
tremble,earthquake
magnitude,earthquake
epicenter,earthquake
fire,fire
blaze,fire
conflagration,fire
inferno,fire
firestorm,fire
burning,fire
combustion,fire
ignite,fire
;

In this case the wordlist for fire would look something like: fire, inferno, combustion, burning, firestorm.

I don't just want the observations in the fire category, but also the ones with the same words  in the other categories.

 

Thanks a lot for the help.

 

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ok, little change then:

data _null_;
  set insurance;
  by category;
  if first.category then call execute(cats('data want_',category,'; set have where=(word in ("',word,'"'));
  call execute(cats(',"',word,'"'));
  if last.category then call execute(')); run;');
run;

This will create one datastep per category with a where clause with each of the words in that category.

myname
Calcite | Level 5
I really tried to make that code work because I think it will help me a lot in further problems, but I couldn't.
SAS kept on giving out errors because of the comma in the second execute statement. I tried exchanging it with = or OR, both of which did not have the result I wanted. Is there another way to make it a list and not a condition?
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Was a typo in there thats why, the where needs brackets around it.  Basically call execute is generating a string - this string is being sent to the compliler which tries to read it as code, just like the code you submitted.  So the text generated has to be valida SAS syntax.

 

data insurance;
  word="abc"; category="cat1"; output;
  word="def"; category="cat1"; output;
run;
data have;
  word="abc"; output;
run;

data _null_;
  set insurance;
  by category;
  if first.category then call execute(cats('data want_',category,'; set have (where=(word in ("',word,'"'));
  else call execute(cats(',"',word,'"'));
  if last.category then call execute('))); run;');
run;
myname
Calcite | Level 5
Thank you so much, this works perfectly!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1137 views
  • 0 likes
  • 3 in conversation