BookmarkSubscribeRSS Feed
Longduration
Calcite | Level 5
Hello

I have a column in a data set that contains multiple values. I'd like to filter this column for x number of different values. my code below works but it only does it for the last value. I've tried to use DO etc to cycle through each value but I've failed..


PROC SQL;
SELECT cOUNT (*) INTO COUNT
FROM TEST.DATABASE.
QUIT;

PROC SQL;
SELECT VARIABLES
INTO :VARIABLES1-:VARIABLES%TRIM(%LEFT&COUNT))
FROM TEST.DATABASE;
QUIT;

%MACRO TESTING;

DATA TEST.FILTERED;
SET TEST.ORIGINAL;

WHERE (VARIABLES CONTAINS "&VARIABLES");
RUN;
%MEND TESTING;
%TESTING
16 REPLIES 16
SPR
Quartz | Level 8 SPR
Quartz | Level 8
Hello Longdiratuon,
If I understood you correctly this is a solution:
[pre]
PROC SQL;
SELECT VARIABLES
INTO :VALUELIST separated by ","
FROM TEST.DATABASE
;QUIT;
DATA TEST.FILTERED;
SET TEST.ORIGINAL;
WHERE (VARIABLES in (&VALUELIST);
RUN;
[/pre]
Sincerely,
SPR
art297
Opal | Level 21
I read your request slightly differently than SPR did. Does the following example match the scenario you describe?:
[pre]

data database;
input variables $;
cards;
James
Judy
Robert
;

PROC SQL noprint;
SELECT "variables contains "||quote(strip(VARIABLES))
INTO :VARIABLES separated by " or "
FROM DATABASE;
QUIT;

data testfile;
length variables $30;
SET sashelp.class (rename=(name=variables));
if variables eq "Jeffrey" then
variables="Jeffrey Al James";
else if variables eq "Philip" then
variables="Judy Robert Philip";
run;

DATA FILTERED;
SET testfile;
WHERE &VARIABLES;
RUN;
[/pre]
Art
chang_y_chung_hotmail_com
Obsidian | Level 7

Not sure if the so-called global macro variable array is necessary. You can filter it simply with the where statement like below:


proc sql;
   /* a la TEST.DATABASE */
   create table names as
   select name
   from sashelp.class
   where name eqt 'A';

   /* a la  TEST.FILTERED */
   select *
   from  sashelp.class
   where name in (select name from names);
quit;
/* on lst
   Name      Sex       Age    Height    Weight
   Alfred    M          14        69     112.5
   Alice     F          13      56.5        84
*/

Longduration
Calcite | Level 5
Thanks for your responses.

I think SPR you're along the lines of what Im trying to do. But with your code, similar to the other suggestions, only seem to work with one variable (?).

I would like to automise it so as not to manually type in the code WHERE NAME =xx etc. The person running the program just has to change which variables they want in the external file.

By luck, Ive found a sample code that does what I want but I can't change the output to my library rather than 'work' . It splits the data set into individual files corresponding to each variable thats specified.


%MACRO SPLIT.
PROC SQL.
SELECT VARIABLE FROM TEST.DATABASE;
SELECT VARIABLE
INTO: VARIABLE1 -:VARIABLE&SQLOBS.
FROM TEST.DATABASE;
QUIT;

DATA
%DO i=1 %TO &SQLOBS; &&VARIABLE&i %END; ;
SET TEST.ORIGINAL;
SELECT (COLUMNx);

%DO j=1 %TO &SQLOBS;
WHEN ("&&VARIABLE&j") OUTPUT &&VARIABLE&j;
%END;
OTHERWISE;
END;

RUN;

%MEND;
%SPLIT;
Longduration
Calcite | Level 5
Well I managed to solve my problem, using an example in the sas documentation:

proc sql noprint;
select count(distinct type)
into :n
from sql.features;
select distinct type
into :type1 - :type%left(&n)
from sql.features;
quit;

%macro makeds;
%do i=1 %to &n;
data &&type&i (drop=type);
set sql.features;
if type="&&type&i";
run;
%end;
%mend makeds;
%makeds;
art297
Opal | Level 21
Don't take this wrong, but I'm totally confused as to which problem you solved other than that you figured out how to use the combination of proc sql and macro code to create multiple files for each unique value in a dataset.

That was not how you had originally described the problem you needed to solve and I don't know how obtaining those files solves your original problem.

In your initial post you stated: "I have a column in a data set that contains multiple values. I'd like to filter this column for x number of different values."

You then provided an example where you had two files, labeled test.database and test.original, and you indicated that you were trying to create a third file labeled test.filtered.

I rewrote my original proposed solution so that it creates and uses an include file instead of a macro variable, but I'm not going to post it unless it actually addresses the problem you are trying to solve.

I'd be interested in finding out the true nature of the problem you feel that you've solved and how obtaining the files you described solves it.

Art
Longduration
Calcite | Level 5
Apologies if my problem wasn't clear. Hopefully this is better:

----------------------------------------------------------------------------------------------------------
Original file that is to be filtered by column "TYPE" using the variables in TEST.DATABASE:

TEST.ORIGINAL

Column1 Column2 TYPE Column4
x x A x
x x G x
x x B x
x x D x
x x C x

----------------------------------------------------------------------------------------------------------
File with variables that are used to filter the column "TYPE" in TEST.ORIGINAL:

TEST.DATABASE

Variables
A,
B,
C,

----------------------------------------------------------------------------------------------------------
Resulting in a file with observations only where "TYPE" contains the variables specified earlier:

TEST.FILTERED

Column1, Column2, TYPE, Column3
x,x,A,x
x,x,B,x
x,x,C,x

----------------------------------------------------------------------------------------------------------

The following code solves this except it produces an individual file (TEST.A
TEST.B and TEST.C for each variable and not a single file as I had originally wanted (DATA.FILTERED):

----------------------------------------------------------------------------------------------------------

PROC SQL
SELECT COUNT (VARIABLES)
INTO :n
FROM TEST.DATABASE;
QUIT;

%MACRO FORUM;

%DO i=1 %TO &n;
DATA TEST.&&VARIABLES&i;
SET TEST.ORIGINAL;
IF TYPE="&&VARIABLES&i";
RUN;

%END;
%MEND;
%FORUM;

----------------------------------------------------------------------------------------------------------

Art, if you have alternative solution I would still appreciate reading it.

Thanks
art297
Opal | Level 21
Longduration,

First, a question about test.database. What does it really look like? From your example, it might have three records which contain 3 values separated by commas, of it might have three records, each with a value and, possibly, a comma. Not clear.

From your original post, I had thought that test.database looked like:
[pre]
data test.database;
informat variables $10.;
input variables &;
datalines;
A B C
E
G H
;
run;

In order to provide a solution, we have to know for sure what that file actually looks like.

Art
> Apologies if my problem wasn't clear. Hopefully this
> is better:
>
> ------------------------------------------------------
> ----------------------------------------------------
> Original file that is to be filtered by column "TYPE"
> using the variables in TEST.DATABASE:
>
> TEST.ORIGINAL
>
> Column1 Column2 TYPE Column4
> x x A x
> x x G x
> x x B x
> x x D x
> x x C x
>
> ------------------------------------------------------
> ----------------------------------------------------
> File with variables that are used to filter the
> column "TYPE" in TEST.ORIGINAL:
>
> TEST.DATABASE
>
> Variables
> A,
> B,
> C,
>
> ------------------------------------------------------
> ----------------------------------------------------
> Resulting in a file with observations only where
> "TYPE" contains the variables specified earlier:
>
> TEST.FILTERED
>
> Column1, Column2, TYPE, Column3
> x,x,A,x
> x,x,B,x
> x,x,C,x
>
> ------------------------------------------------------
> ----------------------------------------------------
>
> The following code solves this except it produces an
> individual file (TEST.A
> TEST.B and TEST.C for each variable and not a single
> file as I had originally wanted (DATA.FILTERED):
>
> ------------------------------------------------------
> ----------------------------------------------------
>
> PROC SQL
> SELECT COUNT (VARIABLES)
> INTO :n
> FROM TEST.DATABASE;
> QUIT;
>
> %MACRO FORUM;
>
> %DO i=1 %TO &n;
> DATA TEST.&&VARIABLES&i;
> SET TEST.ORIGINAL;
> IF TYPE="&&VARIABLES&i";
> RUN;
>
> %END;
> %MEND;
> %FORUM;
>
> ------------------------------------------------------
> ----------------------------------------------------
>
> Art, if you have alternative solution I would still
> appreciate reading it.
>
> Thanks
Longduration
Calcite | Level 5
The file does actually look exactly as I've posted. It has the column name (Variables), with values separated by a comma. The number of values in this column can be changed (A,B,C,D etc) but they remain separated by a comma :

e.g

Variables
A,
B,
C,

It is simply just these lines, in order for a person to put what values they would want to filter in this external file without having to go into the code to specify them.
art297
Opal | Level 21
Longduration,

It's still not crystal clear, but I think answering one question will clarify it. Which of the following two examples best describes the contents of your test.database file:

(a) one character variable with the name Variables that has three records that have the values:
A,
B,
C,

(b) one character variable with the name Variables that has one record that has a value like:
A, B, C,


(c) one character variable with the name Variables that has more than one record that has values like:
A, B, C,
D,
E, F,

or (d) something else and, if so, what?

Art
art297
Opal | Level 21
Longduration,

I lied! Not on purpose but, when I looked at my solution, I realized that what I need to know is what the file test.original looks like.

I know it has a variable in it called Variables, but do the records in the file most closely approximate:

(a) one value, e.g.:
A
B
C

(b) one or more values, e.g.:
A B C
D
E F

or (c) something else and, if so, what?

Art
art297
Opal | Level 21
Then any of the suggestions would have worked. My original, and the following, will work for either a or b. My example uses b:
[pre]
libname test "c:\art";

*create sample test.original file;
data test.original;
informat variables $10.;
input variables &;
cards;
A
B C
D
E F G
H
;
run;

data test.database;
input variables $;
cards;
C,
D,
F,
;
run;

* use a data step to write the necessary SAS code, using
the compress function to delete the unnecessary commas;
filename name_chk temp;
data _null_;
set test.database end=eof;
file name_chk;
variables=compress(variables,",");
if _n_ eq 1 then text =
cat("where variables contains ",quote(strip(VARIABLES)));
else text=
cat("or variables contains ",quote(strip(VARIABLES)));
put text;
if eof then put ";";
run;

* select the desired records;
data test.FILTERED;
set test.original;
%include name_chk / source2;;
run;
[/pre]

Art

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 2367 views
  • 0 likes
  • 5 in conversation