DATA Step, Macro, Functions and more

How to filter a dataset with an IF condition

Reply
Occasional Contributor
Posts: 13

How to filter a dataset with an IF condition

[ Edited ]

Hi all,

 

I am attempting to randomly select 5 ALL and 10 CML cancer patients from a list from each institution from an excel file. I have attached the excel file. 


This is the sas code i have so far that is not working:

 

data p3.new3;
set new3;
run;

data tempp;
set new3;
if inst_num= cancer institute 1;

data temp1;
set tempp;

if leu_type= Acute Lymphocytic Leukemia (ALL);
x= ranuni(0);
output;

proc sort data temp1;
by x;
run;

data one;
set temp1 (obs=2);
run;

 

can anyone see what I am doing wrong? Also trying to do the same thing using Macros which I really have no idea how to do.

Respected Advisor
Posts: 3,247

Re: sas coding problems

What results are you getting? Show us. What is wrong? Explain!!

 

Show us the SASLOG too.

--
Paige Miller
Occasional Contributor
Posts: 13

Re: sas coding problems

Posted in reply to PaigeMiller

Ok sorry about that, here are the errors in the log:

 

16 data temp1;
17 set tempp;
18
19 if leu_type= Acute Lymphocytic Leukemia (ALL);
-----------
388
76
ERROR 388-185: Expecting an arithmetic operator.

ERROR 76-322: Syntax error, statement will be ignored.

20 x= ranuni(0);
21 output;
22

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEMP1 may be incomplete. When this step was stopped there were 0
observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds

 

23 proc sort data temp1;
-----
73
ERROR 73-322: Expecting an =.
24 by x;
25 run;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.05 seconds
cpu time 0.00 seconds

 

Super User
Posts: 23,928

Re: sas coding problems

if inst_num= cancer institute 1;

data temp1;
set tempp;

if leu_type= Acute Lymphocytic Leukemia (ALL);

 

These are wrong. 

 

To test for a character comparison you need to enclose the value in quotes and it needs to be an exact match, ie cancer institute 1, Cancer Institute 1 and Cancer institute 1 are all different values. Or you upcase them all to make the comparison. 

 

if upcase(inst_num)= "CANCER INSTITUTE 1";

@BWDAWG wrote:

Hi all,

 

I am attempting to randomly select 5 ALL and 10 CML cancer patients from a list from each institution from an excel file. I have attached the excel file. 


This is the sas code i have so far that is not working:

 

data p3.new3;
set new3;
run;

data tempp;
set new3;
if inst_num= cancer institute 1;

data temp1;
set tempp;

if leu_type= Acute Lymphocytic Leukemia (ALL);
x= ranuni(0);
output;

proc sort data temp1;
by x;
run;

data one;
set temp1 (obs=2);
run;

 

can anyone see what I am doing wrong? Also trying to do the same thing using Macros which I really have no idea how to do.


 

Occasional Contributor
Posts: 13

Re: sas coding problems

Thank you Reeza, that was very helpful. I am having one other issue. When I try to print my data, it only prints the data from Cancer institute 1 and leaves out cancer institute 2-5. Do you see anything else that I am doing wrong?

 

 

 

data tempp;
set buy;
if upcase(inst_num)= "CANCER INSTITUTE 1" ;

data temp1;
set tempp;

if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";
x= ranuni(0);
output;

proc sort data=temp1;
by x;
run;

data one;
set temp1 (obs=5);
run;

data temp2;
set tempp;
if upcase(leu_type)= "ACUTE MYELOID LEUKEMIA (AML)";
x= ranuni(0);
output;

proc sort data=temp2;
by x;
run;

data two;
set temp2 (obs=10);
run;

data temp3;
set tempp;
if upcase(leu_type)= "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)";
x=ranuni(0);
output;

proc sort data=temp3;
by x;
run;

data three;
set temp3 (obs=20);
run;

data temp4;
set tempp;
if upcase(leu_type)= "CHRONIC MYELOID LEUKEMIA (CML)";
x=ranuni(0);
output;

proc sort data=temp4;
by x;
run;

data four;
set temp4 (obs=5);
run;

data new1;
set one two three four;
run;

data tempa;
set proj3;
if upcase(inst_num)= "CANCER INSTITUE 2";

data tempb;
set tempa;

if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";
x= ranuni(0);
output;

proc sort data=tempb;
by x;
run;

data one;
set tempb (obs=5);
run;

data tempc;
set tempa;
if upcase(leu_type)= "ACUTE MYELOID LEUKEMIA (AML)";
x= ranuni(0);
output;

proc sort data=tempc;
by x;
run;

data two;
set tempc (obs=10);
run;

data tempd;
set tempa;
if upcase(leu_type)= "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)";
x=ranuni(0);
output;

proc sort data=tempd;
by x;
run;

data three;
set tempd (obs=20);
run;

data tempe;
set tempa;
if upcase(leu_type)= "CHRONIC MYELOID LEUKEMIA (CML)";
x=ranuni(0);
output;

proc sort data=tempe;
by x;
run;

data four;
set tempe (obs=5);
run;

data new2;
set one two three four;
run;

************************************************
********************************
*********************;

data tempp;
set proj3;
if upcase(inst_num)= "CANCER INSTITUE 3";

data temp1;
set tempp;

if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";
x= ranuni(0);
output;

proc sort data=temp1;
by x;
run;

data one;
set temp1 (obs=5);
run;

data temp2;
set tempp;
if upcase(leu_type)= "ACUTE MYELOID LEUKEMIA (AML)";
x= ranuni(0);
output;

proc sort data=temp2;
by x;
run;

data two;
set temp2 (obs=10);
run;

data temp3;
set tempp;
if upcase(leu_type)= "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)";
x=ranuni(0);
output;

proc sort data=temp3;
by x;
run;

data three;
set temp3 (obs=20);
run;

data temp4;
set tempp;
if upcase(leu_type)= "CHRONIC MYELOID LEUKEMIA (CML)";
x=ranuni(0);
output;

proc sort data=temp4;
by x;
run;

data four;
set temp4 (obs=5);
run;

data new3;
set one two three four;
run;


*****************;

data tempp;
set proj3;
if upcase(inst_num)= "CANCER INSTITUE 3";

data temp1;
set tempp;

if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";
x= ranuni(0);
output;

proc sort data=temp1;
by x;
run;

data one;
set temp1 (obs=5);
run;

data temp2;
set tempp;
if upcase(leu_type)= "ACUTE MYELOID LEUKEMIA (AML)";
x= ranuni(0);
output;

proc sort data=temp2;
by x;
run;

data two;
set temp2 (obs=10);
run;

data temp3;
set tempp;
if upcase(leu_type)= "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)";
x=ranuni(0);
output;

proc sort data=temp3;
by x;
run;

data three;
set temp3 (obs=20);
run;

data temp4;
set tempp;
if upcase(leu_type)= "CHRONIC MYELOID LEUKEMIA (CML)";
x=ranuni(0);
output;

proc sort data=temp4;
by x;
run;

data four;
set temp4 (obs=5);
run;

data new4;
set one two three four;
run;

*********************************;

data tempp;
set proj3;
if upcase(inst_num)= "CANCER INSTITUE 5";

data temp1;
set tempp;

if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";
x= ranuni(0);
output;

proc sort data=temp1;
by x;
run;

data one;
set temp1 (obs=5);
run;

data temp2;
set tempp;
if upcase(leu_type)= "ACUTE MYELOID LEUKEMIA (AML)";
x= ranuni(0);
output;

proc sort data=temp2;
by x;
run;

data two;
set temp2 (obs=10);
run;

data temp3;
set tempp;
if upcase(leu_type)= "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)";
x=ranuni(0);
output;

proc sort data=temp3;
by x;
run;

data three;
set temp3 (obs=20);
run;

data temp4;
set tempp;
if upcase(leu_type)= "CHRONIC MYELOID LEUKEMIA (CML)";
x=ranuni(0);
output;

proc sort data=temp4;
by x;
run;

data four;
set temp4 (obs=5);
run;

data new5;
set one two three four;
run;

data new;
set new1 new2 new3 new4 new5;
run;

proc print data=new;
title 'Tabulation of selected records';
run;

Respected Advisor
Posts: 3,247

Re: sas coding problems

Thank you Reeza, that was very helpful. I am having one other issue. When I try to print my data, it only prints the data from Cancer institute 1 and leaves out cancer institute 2-5. Do you see anything else that I am doing wrong?


Are there errors in the SASLOG? Are there data sets created with zero observations?

--
Paige Miller
Occasional Contributor
Posts: 13

Re: sas coding problems

Posted in reply to PaigeMiller

Hey paigemiller,

 

I do not get any errors in the SASLOG, however, starting with cancer institute 2 and on, everything created has 0 observations, I've looked through and can't figure it out.

Respected Advisor
Posts: 3,247

Re: sas coding problems

Show us the relevant parts of the SAS Log where the 0 observation data sets are being created.
--
Paige Miller
Occasional Contributor
Posts: 13

Re: sas coding problems

Posted in reply to PaigeMiller

This is the first part of the log where zero observations begin to occur:

 

77
78 data tempa;
79 set proj3;
80 if upcase(inst_num)= "CANCER INSTITUE 2";
81

NOTE: There were 4000 observations read from the data set WORK.PROJ3.
NOTE: The data set WORK.TEMPA has 0 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds


82 data tempb;
83 set tempa;
84
85 if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";
86 x= ranuni(0);
87 output;
88

NOTE: There were 0 observations read from the data set WORK.TEMPA.
NOTE: The data set WORK.TEMPB has 0 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds


89 proc sort data=tempb;
90 by x;
91 run;

NOTE: Input data set is empty.
NOTE: The data set WORK.TEMPB has 0 observations and 4 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds


92
93 data one;
94 set tempb (obs=5);
95 run;

NOTE: There were 0 observations read from the data set WORK.TEMPB.
NOTE: The data set WORK.ONE has 0 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds

Respected Advisor
Posts: 3,247

Re: sas coding problems

[ Edited ]

@BWDAWG wrote:

This is the first part of the log where zero observations begin to occur:

 

77
78 data tempa;
79 set proj3;
80 if upcase(inst_num)= "CANCER INSTITUE 2";
81

NOTE: There were 4000 observations read from the data set WORK.PROJ3.
NOTE: The data set WORK.TEMPA has 0 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds

 

This is a mistake a lot of people make. It appears that you have NOT actually looked at your data to see if there are any observations where inst_num is equal to "CANCER INSTITUE 2" (without the quotes).

 

There are none. That's why you are getting this.

 

What is the solution? Look at your data! Let me repeat that. LOOK AT YOUR DATA! What are the values that inst_num actually has??? Could it be that you have misspelled one of the values (looks like it to me) or you have spaces in the code where there are none in the data? All of these problems you are having are easily fixed by looking at the data, and looking at your code, and making sure they match.

--
Paige Miller
Occasional Contributor
Posts: 13

Re: sas coding problems

Posted in reply to PaigeMiller

Can't believe I missed that. I couldn't figure it out. Thanks for taking the time to look through it and help me out.

Super User
Posts: 23,928

Re: sas coding problems

I think we've mentioned this before, the methodology implemented here will work but is inefficient. You could use PROC SURVEYSELECT or do all in a single step with a BY group. Is there a particular reason you're staying with this approach? 

 

 

 

 

Occasional Contributor
Posts: 13

Re: sas coding problems

[ Edited ]

Well I was originally trying to do this using macros but I just couldn't figure it out at all. I am just a beginner so this was just the easiest way I knew to do it. 

Super User
Posts: 23,928

Re: sas coding problems

I couldn't get this working and don't have the time to keep working on it today but maybe someone else can help. It looks like you're creating a random selection from the data. If that's correct this is pretty close to what you actually need:

 

*Import raw data;
proc import out=RawData datafile='/folders/myfolders/Cancerinstdata.xlsx' 
dbms=xlsx replace; 
run;

*filter out records of interest - whole file so not really necessary unless you uploaded only a sample;
data temp;
    set rawData;

    if upcase(inst_num) in ("CANCER INSTITUTE 1", "CANCER INSTITUTE 2", "CANCER INSTITUTE 3", "CANCER INSTITUTE 4", "CANCER INSTITUTE 5") ;
    if upcase(leu_type) in ("ACUTE LYMPHOCYTIC LEUKEMIA (ALL)", "ACUTE MYELOID LEUKEMIA (AML)",
                            "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)", "CHRONIC MYELOID LEUKEMIA (CML)");
  
  inst_num = upcase(inst_num);
  leu_type=upcase(leu_type);
run;

proc sort data=temp;
by inst_num leu_type;
run;



*create sample sizes data;
data sampleSizes;
infile cards dlm=',' dsd;
input _Nsize_ leu_type $50.;
format leu_type $50.;
cards;
5, ACUTE LYMPHOCYTIC LEUKEMIA (ALL)
10,   ACUTE MYELOID LEUKEMIA (AML)
20, CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)
5, CHRONIC MYELOID LEUKEMIA (CML)
;;;;
run;

*Add in institutions so you have a record for every disease/ institution;
data sampleSizes1;
    set sampleSizes;
    do i=1 to 5;
        inst_num = cat('CANCER INSTITUTE ', I);
        output;
    end;
    drop i;
run;

*sort;
proc sort data=sampleSizes1;
by inst_num leu_type;
run;

proc surveyselect data=temp /*input data*/
                    method=srs /*method of sampling*/
                    out=mySamples  /*name of output data*/
                    sampSize=sampleSizes1 /*sampSizes data set*/
                    seed=100; /*random number seed to ensure the same results each time*/
strata inst_num leu_type; /*stratification levels for sample */
run;
Super User
Posts: 23,928

Re: How to filter a dataset with an IF condition

Also, here's a link to a series of SAS videos that covers basic topics, if you prefer to learn that way:
https://video.sas.com/category/videos/sas-analytics-u
Ask a Question
Discussion stats
  • 14 replies
  • 126 views
  • 3 likes
  • 3 in conversation