Showing results for

- Home
- /
- Programming
- /
- Programming
- /
- How to filter a dataset with an IF condition

Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Posted 07-15-2018 01:00 PM
(4084 views)

Hi all,

I am attempting to randomly select 5 ALL and 10 CML cancer patients from a list from each institution from an excel file. I have attached the excel file.

This is the sas code i have so far that is not working:

data p3.new3;

set new3;

run;

data tempp;

set new3;

if inst_num= cancer institute 1;

data temp1;

set tempp;

if leu_type= Acute Lymphocytic Leukemia (ALL);

x= ranuni(0);

output;

proc sort data temp1;

by x;

run;

data one;

set temp1 (obs=2);

run;

can anyone see what I am doing wrong? Also trying to do the same thing using Macros which I really have no idea how to do.

14 REPLIES 14

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

What results are you getting? Show us. What is wrong? Explain!!

Show us the SASLOG too.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Ok sorry about that, here are the errors in the log:

16 data temp1;

17 set tempp;

18

19 if leu_type= Acute Lymphocytic Leukemia (ALL);

-----------

388

76

ERROR 388-185: Expecting an arithmetic operator.

ERROR 76-322: Syntax error, statement will be ignored.

20 x= ranuni(0);

21 output;

22

NOTE: The SAS System stopped processing this step because of errors.

WARNING: The data set WORK.TEMP1 may be incomplete. When this step was stopped there were 0

observations and 5 variables.

NOTE: DATA statement used (Total process time):

real time 0.03 seconds

cpu time 0.03 seconds

23 proc sort data temp1;

-----

73

ERROR 73-322: Expecting an =.

24 by x;

25 run;

NOTE: The SAS System stopped processing this step because of errors.

NOTE: PROCEDURE SORT used (Total process time):

real time 0.05 seconds

cpu time 0.00 seconds

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

if inst_num= cancer institute 1;

data temp1;

set tempp;

if leu_type= Acute Lymphocytic Leukemia (ALL);

These are wrong.

To test for a character comparison you need to enclose the value in quotes and it needs to be an exact match, ie cancer institute 1, Cancer Institute 1 and Cancer institute 1 are all different values. Or you upcase them all to make the comparison.

`if upcase(inst_num)= "CANCER INSTITUTE 1";`

@BWDAWG wrote:

Hi all,

I am attempting to randomly select 5 ALL and 10 CML cancer patients from a list from each institution from an excel file. I have attached the excel file.

This is the sas code i have so far that is not working:

data p3.new3;

set new3;

run;data tempp;

set new3;

if inst_num= cancer institute 1;data temp1;

set tempp;if leu_type= Acute Lymphocytic Leukemia (ALL);

x= ranuni(0);

output;proc sort data temp1;

by x;

run;data one;

set temp1 (obs=2);

run;

can anyone see what I am doing wrong? Also trying to do the same thing using Macros which I really have no idea how to do.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you Reeza, that was very helpful. I am having one other issue. When I try to print my data, it only prints the data from Cancer institute 1 and leaves out cancer institute 2-5. Do you see anything else that I am doing wrong?

data tempp;

set buy;

if upcase(inst_num)= "CANCER INSTITUTE 1" ;

data temp1;

set tempp;

if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";

x= ranuni(0);

output;

proc sort data=temp1;

by x;

run;

data one;

set temp1 (obs=5);

run;

data temp2;

set tempp;

if upcase(leu_type)= "ACUTE MYELOID LEUKEMIA (AML)";

x= ranuni(0);

output;

proc sort data=temp2;

by x;

run;

data two;

set temp2 (obs=10);

run;

data temp3;

set tempp;

if upcase(leu_type)= "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)";

x=ranuni(0);

output;

proc sort data=temp3;

by x;

run;

data three;

set temp3 (obs=20);

run;

data temp4;

set tempp;

if upcase(leu_type)= "CHRONIC MYELOID LEUKEMIA (CML)";

x=ranuni(0);

output;

proc sort data=temp4;

by x;

run;

data four;

set temp4 (obs=5);

run;

data new1;

set one two three four;

run;

data tempa;

set proj3;

if upcase(inst_num)= "CANCER INSTITUE 2";

data tempb;

set tempa;

if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";

x= ranuni(0);

output;

proc sort data=tempb;

by x;

run;

data one;

set tempb (obs=5);

run;

data tempc;

set tempa;

if upcase(leu_type)= "ACUTE MYELOID LEUKEMIA (AML)";

x= ranuni(0);

output;

proc sort data=tempc;

by x;

run;

data two;

set tempc (obs=10);

run;

data tempd;

set tempa;

if upcase(leu_type)= "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)";

x=ranuni(0);

output;

proc sort data=tempd;

by x;

run;

data three;

set tempd (obs=20);

run;

data tempe;

set tempa;

if upcase(leu_type)= "CHRONIC MYELOID LEUKEMIA (CML)";

x=ranuni(0);

output;

proc sort data=tempe;

by x;

run;

data four;

set tempe (obs=5);

run;

data new2;

set one two three four;

run;

************************************************

********************************

*********************;

data tempp;

set proj3;

if upcase(inst_num)= "CANCER INSTITUE 3";

data temp1;

set tempp;

if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";

x= ranuni(0);

output;

proc sort data=temp1;

by x;

run;

data one;

set temp1 (obs=5);

run;

data temp2;

set tempp;

if upcase(leu_type)= "ACUTE MYELOID LEUKEMIA (AML)";

x= ranuni(0);

output;

proc sort data=temp2;

by x;

run;

data two;

set temp2 (obs=10);

run;

data temp3;

set tempp;

if upcase(leu_type)= "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)";

x=ranuni(0);

output;

proc sort data=temp3;

by x;

run;

data three;

set temp3 (obs=20);

run;

data temp4;

set tempp;

if upcase(leu_type)= "CHRONIC MYELOID LEUKEMIA (CML)";

x=ranuni(0);

output;

proc sort data=temp4;

by x;

run;

data four;

set temp4 (obs=5);

run;

data new3;

set one two three four;

run;

*****************;

data tempp;

set proj3;

if upcase(inst_num)= "CANCER INSTITUE 3";

data temp1;

set tempp;

if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";

x= ranuni(0);

output;

proc sort data=temp1;

by x;

run;

data one;

set temp1 (obs=5);

run;

data temp2;

set tempp;

if upcase(leu_type)= "ACUTE MYELOID LEUKEMIA (AML)";

x= ranuni(0);

output;

proc sort data=temp2;

by x;

run;

data two;

set temp2 (obs=10);

run;

data temp3;

set tempp;

if upcase(leu_type)= "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)";

x=ranuni(0);

output;

proc sort data=temp3;

by x;

run;

data three;

set temp3 (obs=20);

run;

data temp4;

set tempp;

if upcase(leu_type)= "CHRONIC MYELOID LEUKEMIA (CML)";

x=ranuni(0);

output;

proc sort data=temp4;

by x;

run;

data four;

set temp4 (obs=5);

run;

data new4;

set one two three four;

run;

*********************************;

data tempp;

set proj3;

if upcase(inst_num)= "CANCER INSTITUE 5";

data temp1;

set tempp;

if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";

x= ranuni(0);

output;

proc sort data=temp1;

by x;

run;

data one;

set temp1 (obs=5);

run;

data temp2;

set tempp;

if upcase(leu_type)= "ACUTE MYELOID LEUKEMIA (AML)";

x= ranuni(0);

output;

proc sort data=temp2;

by x;

run;

data two;

set temp2 (obs=10);

run;

set tempp;

if upcase(leu_type)= "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)";

x=ranuni(0);

output;

proc sort data=temp3;

by x;

run;

data three;

set temp3 (obs=20);

run;

data temp4;

set tempp;

if upcase(leu_type)= "CHRONIC MYELOID LEUKEMIA (CML)";

x=ranuni(0);

output;

proc sort data=temp4;

by x;

run;

data four;

set temp4 (obs=5);

run;

data new5;

set one two three four;

run;

data new;

set new1 new2 new3 new4 new5;

run;

proc print data=new;

title 'Tabulation of selected records';

run;

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Thank you Reeza, that was very helpful. I am having one other issue. When I try to print my data, it only prints the data from Cancer institute 1 and leaves out cancer institute 2-5. Do you see anything else that I am doing wrong?

Are there errors in the SASLOG? Are there data sets created with zero observations?

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Hey paigemiller,

I do not get any errors in the SASLOG, however, starting with cancer institute 2 and on, everything created has 0 observations, I've looked through and can't figure it out.

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Show us the relevant parts of the SAS Log where the 0 observation data sets are being created.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

This is the first part of the log where zero observations begin to occur:

77

78 data tempa;

79 set proj3;

80 if upcase(inst_num)= "CANCER INSTITUE 2";

81

NOTE: There were 4000 observations read from the data set WORK.PROJ3.

NOTE: The data set WORK.TEMPA has 0 observations and 3 variables.

NOTE: DATA statement used (Total process time):

real time 0.02 seconds

cpu time 0.03 seconds

82 data tempb;

83 set tempa;

84

85 if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";

86 x= ranuni(0);

87 output;

88

NOTE: There were 0 observations read from the data set WORK.TEMPA.

NOTE: The data set WORK.TEMPB has 0 observations and 4 variables.

NOTE: DATA statement used (Total process time):

real time 0.03 seconds

cpu time 0.03 seconds

89 proc sort data=tempb;

90 by x;

91 run;

NOTE: Input data set is empty.

NOTE: The data set WORK.TEMPB has 0 observations and 4 variables.

NOTE: PROCEDURE SORT used (Total process time):

real time 0.01 seconds

cpu time 0.01 seconds

92

93 data one;

94 set tempb (obs=5);

95 run;

NOTE: There were 0 observations read from the data set WORK.TEMPB.

NOTE: The data set WORK.ONE has 0 observations and 4 variables.

NOTE: DATA statement used (Total process time):

real time 0.02 seconds

cpu time 0.00 seconds

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

@BWDAWG wrote:

This is the first part of the log where zero observations begin to occur:

77

78 data tempa;

79 set proj3;

80 if upcase(inst_num)= "CANCER INSTITUE 2";

81NOTE: There were 4000 observations read from the data set WORK.PROJ3.

NOTE: The data set WORK.TEMPA has 0 observations and 3 variables.

NOTE: DATA statement used (Total process time):

real time 0.02 seconds

cpu time 0.03 seconds

This is a mistake a lot of people make. It appears that you have NOT actually looked at your data to see if there are any observations where inst_num is equal to "CANCER INSTITUE 2" (without the quotes).

There are none. That's why you are getting this.

What is the solution? Look at your data! Let me repeat that. LOOK AT YOUR DATA! What are the values that inst_num actually has??? Could it be that you have misspelled one of the values (looks like it to me) or you have spaces in the code where there are none in the data? All of these problems you are having are easily fixed by looking at the data, and looking at your code, and making sure they match.

--

Paige Miller

Paige Miller

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

I couldn't get this working and don't have the time to keep working on it today but maybe someone else can help. It looks like you're creating a random selection from the data. If that's correct this is pretty close to what you actually need:

```
*Import raw data;
proc import out=RawData datafile='/folders/myfolders/Cancerinstdata.xlsx'
dbms=xlsx replace;
run;
*filter out records of interest - whole file so not really necessary unless you uploaded only a sample;
data temp;
set rawData;
if upcase(inst_num) in ("CANCER INSTITUTE 1", "CANCER INSTITUTE 2", "CANCER INSTITUTE 3", "CANCER INSTITUTE 4", "CANCER INSTITUTE 5") ;
if upcase(leu_type) in ("ACUTE LYMPHOCYTIC LEUKEMIA (ALL)", "ACUTE MYELOID LEUKEMIA (AML)",
"CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)", "CHRONIC MYELOID LEUKEMIA (CML)");
inst_num = upcase(inst_num);
leu_type=upcase(leu_type);
run;
proc sort data=temp;
by inst_num leu_type;
run;
*create sample sizes data;
data sampleSizes;
infile cards dlm=',' dsd;
input _Nsize_ leu_type $50.;
format leu_type $50.;
cards;
5, ACUTE LYMPHOCYTIC LEUKEMIA (ALL)
10, ACUTE MYELOID LEUKEMIA (AML)
20, CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)
5, CHRONIC MYELOID LEUKEMIA (CML)
;;;;
run;
*Add in institutions so you have a record for every disease/ institution;
data sampleSizes1;
set sampleSizes;
do i=1 to 5;
inst_num = cat('CANCER INSTITUTE ', I);
output;
end;
drop i;
run;
*sort;
proc sort data=sampleSizes1;
by inst_num leu_type;
run;
proc surveyselect data=temp /*input data*/
method=srs /*method of sampling*/
out=mySamples /*name of output data*/
sampSize=sampleSizes1 /*sampSizes data set*/
seed=100; /*random number seed to ensure the same results each time*/
strata inst_num leu_type; /*stratification levels for sample */
run;
```

- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content

Also, here's a link to a series of SAS videos that covers basic topics, if you prefer to learn that way:

https://video.sas.com/category/videos/sas-analytics-u

https://video.sas.com/category/videos/sas-analytics-u

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. **Registration is now open through August 30th**. Visit the SAS Hackathon homepage.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.