Hello, I have what appears to be an easy question but am getting an error. Perhaps you can help:
Below is an example--I want to subset dataset x based on pt variable common only in dataset y. I want to do this using a macro variable (I realize there are many ways to do this but I just want to do it this way)...Can you help with the error?
data x;
input pt ht;
datalines;
1001 100
1002 32
1003 42
1004 93
;
data y;
input pt;
datalines;
1003
1003
1001
;
proc sql noprint;
select distinct(pt) into :ids separated by ','
from y;
quit;
data x2;
set x;
if pt in &ids then output;
run;
I get an error message for the last datastep...Any ideas why?
Thanks very much for reading!
Just need to make a slight change in
data x2;
set x;
if pt in (&ids) then output;
run;
Thanks very much stat@sas...that was precisely what I was looking for...Didn't realize that the parenthesis are essential but it makes sense...macros are just text substitution so thx for your answer. It works!
Just keep in mind that the maximum number of characters that can be stored in a Macro Variable is 65,534. If you have a sizable dataset you may exceed this limitation and get an error.
Perhaps you could use a join, a merge or another type of lookup to ensure you don't run into this issue.
This is untested, so give it a shot and see how you go.
PROC SQL;
CREATE TABLE Y1 AS
SELECT X.PT, X.HT
FROM X WHERE X.PT IN
(SELECT DISTINCT(Y.PT)
FROM Y)
;
QUIT;
A Hash Table might also be a valid solution. The cool thing about the default functionality of the Hash Table is that it only contains unique values, so there is no need to dedupe or undertake a seperate SQL query to obtain the distinct values.
DATA HASHLOOKUP;
IF _N_ = 1 THEN DO;
DECLARE HASH HH(DATASET:'Y');
HH.DEFINEKEY('PT');
HH.DEFINEDONE();
END;
DO UNTIL (DONE);
SET X END=DONE;
RC = HH.CHECK();
IF RC = 0 THEN OUTPUT;
END;
RUN;
RC = HH.CHECK(); checks to see if the key appears in the Hash Table and where it does it RC is populated with 0 (if not it is populated with a number other than 0). If the values of RC is 0 then we output the values from the X dataset.
Thanks Scott...I really appreciate your response! Thank you so much. I didn't know you could use SQL that way to do the subset (not requiring say an inner join)...very cool for teaching me a new technique.
Thanks Scott for the Hash Table method....Unfortunatly I don't know much about hashes so your code is outside my limited brain capacity...However, I shall learn to use hashes & once I am comfortable enough shall revisit your solution in the future.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.