BookmarkSubscribeRSS Feed
BWDAWG
Obsidian | Level 7

Hi all,

 

I am attempting to randomly select 5 ALL and 10 CML cancer patients from a list from each institution from an excel file. I have attached the excel file. 


This is the sas code i have so far that is not working:

 

data p3.new3;
set new3;
run;

data tempp;
set new3;
if inst_num= cancer institute 1;

data temp1;
set tempp;

if leu_type= Acute Lymphocytic Leukemia (ALL);
x= ranuni(0);
output;

proc sort data temp1;
by x;
run;

data one;
set temp1 (obs=2);
run;

 

can anyone see what I am doing wrong? Also trying to do the same thing using Macros which I really have no idea how to do.

14 REPLIES 14
PaigeMiller
Diamond | Level 26

What results are you getting? Show us. What is wrong? Explain!!

 

Show us the SASLOG too.

--
Paige Miller
BWDAWG
Obsidian | Level 7

Ok sorry about that, here are the errors in the log:

 

16 data temp1;
17 set tempp;
18
19 if leu_type= Acute Lymphocytic Leukemia (ALL);
-----------
388
76
ERROR 388-185: Expecting an arithmetic operator.

ERROR 76-322: Syntax error, statement will be ignored.

20 x= ranuni(0);
21 output;
22

NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.TEMP1 may be incomplete. When this step was stopped there were 0
observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds

 

23 proc sort data temp1;
-----
73
ERROR 73-322: Expecting an =.
24 by x;
25 run;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.05 seconds
cpu time 0.00 seconds

 

Reeza
Super User

if inst_num= cancer institute 1;

data temp1;
set tempp;

if leu_type= Acute Lymphocytic Leukemia (ALL);

 

These are wrong. 

 

To test for a character comparison you need to enclose the value in quotes and it needs to be an exact match, ie cancer institute 1, Cancer Institute 1 and Cancer institute 1 are all different values. Or you upcase them all to make the comparison. 

 

if upcase(inst_num)= "CANCER INSTITUTE 1";

@BWDAWG wrote:

Hi all,

 

I am attempting to randomly select 5 ALL and 10 CML cancer patients from a list from each institution from an excel file. I have attached the excel file. 


This is the sas code i have so far that is not working:

 

data p3.new3;
set new3;
run;

data tempp;
set new3;
if inst_num= cancer institute 1;

data temp1;
set tempp;

if leu_type= Acute Lymphocytic Leukemia (ALL);
x= ranuni(0);
output;

proc sort data temp1;
by x;
run;

data one;
set temp1 (obs=2);
run;

 

can anyone see what I am doing wrong? Also trying to do the same thing using Macros which I really have no idea how to do.


 

BWDAWG
Obsidian | Level 7

Thank you Reeza, that was very helpful. I am having one other issue. When I try to print my data, it only prints the data from Cancer institute 1 and leaves out cancer institute 2-5. Do you see anything else that I am doing wrong?

 

 

 

data tempp;
set buy;
if upcase(inst_num)= "CANCER INSTITUTE 1" ;

data temp1;
set tempp;

if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";
x= ranuni(0);
output;

proc sort data=temp1;
by x;
run;

data one;
set temp1 (obs=5);
run;

data temp2;
set tempp;
if upcase(leu_type)= "ACUTE MYELOID LEUKEMIA (AML)";
x= ranuni(0);
output;

proc sort data=temp2;
by x;
run;

data two;
set temp2 (obs=10);
run;

data temp3;
set tempp;
if upcase(leu_type)= "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)";
x=ranuni(0);
output;

proc sort data=temp3;
by x;
run;

data three;
set temp3 (obs=20);
run;

data temp4;
set tempp;
if upcase(leu_type)= "CHRONIC MYELOID LEUKEMIA (CML)";
x=ranuni(0);
output;

proc sort data=temp4;
by x;
run;

data four;
set temp4 (obs=5);
run;

data new1;
set one two three four;
run;

data tempa;
set proj3;
if upcase(inst_num)= "CANCER INSTITUE 2";

data tempb;
set tempa;

if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";
x= ranuni(0);
output;

proc sort data=tempb;
by x;
run;

data one;
set tempb (obs=5);
run;

data tempc;
set tempa;
if upcase(leu_type)= "ACUTE MYELOID LEUKEMIA (AML)";
x= ranuni(0);
output;

proc sort data=tempc;
by x;
run;

data two;
set tempc (obs=10);
run;

data tempd;
set tempa;
if upcase(leu_type)= "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)";
x=ranuni(0);
output;

proc sort data=tempd;
by x;
run;

data three;
set tempd (obs=20);
run;

data tempe;
set tempa;
if upcase(leu_type)= "CHRONIC MYELOID LEUKEMIA (CML)";
x=ranuni(0);
output;

proc sort data=tempe;
by x;
run;

data four;
set tempe (obs=5);
run;

data new2;
set one two three four;
run;

************************************************
********************************
*********************;

data tempp;
set proj3;
if upcase(inst_num)= "CANCER INSTITUE 3";

data temp1;
set tempp;

if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";
x= ranuni(0);
output;

proc sort data=temp1;
by x;
run;

data one;
set temp1 (obs=5);
run;

data temp2;
set tempp;
if upcase(leu_type)= "ACUTE MYELOID LEUKEMIA (AML)";
x= ranuni(0);
output;

proc sort data=temp2;
by x;
run;

data two;
set temp2 (obs=10);
run;

data temp3;
set tempp;
if upcase(leu_type)= "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)";
x=ranuni(0);
output;

proc sort data=temp3;
by x;
run;

data three;
set temp3 (obs=20);
run;

data temp4;
set tempp;
if upcase(leu_type)= "CHRONIC MYELOID LEUKEMIA (CML)";
x=ranuni(0);
output;

proc sort data=temp4;
by x;
run;

data four;
set temp4 (obs=5);
run;

data new3;
set one two three four;
run;


*****************;

data tempp;
set proj3;
if upcase(inst_num)= "CANCER INSTITUE 3";

data temp1;
set tempp;

if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";
x= ranuni(0);
output;

proc sort data=temp1;
by x;
run;

data one;
set temp1 (obs=5);
run;

data temp2;
set tempp;
if upcase(leu_type)= "ACUTE MYELOID LEUKEMIA (AML)";
x= ranuni(0);
output;

proc sort data=temp2;
by x;
run;

data two;
set temp2 (obs=10);
run;

data temp3;
set tempp;
if upcase(leu_type)= "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)";
x=ranuni(0);
output;

proc sort data=temp3;
by x;
run;

data three;
set temp3 (obs=20);
run;

data temp4;
set tempp;
if upcase(leu_type)= "CHRONIC MYELOID LEUKEMIA (CML)";
x=ranuni(0);
output;

proc sort data=temp4;
by x;
run;

data four;
set temp4 (obs=5);
run;

data new4;
set one two three four;
run;

*********************************;

data tempp;
set proj3;
if upcase(inst_num)= "CANCER INSTITUE 5";

data temp1;
set tempp;

if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";
x= ranuni(0);
output;

proc sort data=temp1;
by x;
run;

data one;
set temp1 (obs=5);
run;

data temp2;
set tempp;
if upcase(leu_type)= "ACUTE MYELOID LEUKEMIA (AML)";
x= ranuni(0);
output;

proc sort data=temp2;
by x;
run;

data two;
set temp2 (obs=10);
run;

data temp3;
set tempp;
if upcase(leu_type)= "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)";
x=ranuni(0);
output;

proc sort data=temp3;
by x;
run;

data three;
set temp3 (obs=20);
run;

data temp4;
set tempp;
if upcase(leu_type)= "CHRONIC MYELOID LEUKEMIA (CML)";
x=ranuni(0);
output;

proc sort data=temp4;
by x;
run;

data four;
set temp4 (obs=5);
run;

data new5;
set one two three four;
run;

data new;
set new1 new2 new3 new4 new5;
run;

proc print data=new;
title 'Tabulation of selected records';
run;

PaigeMiller
Diamond | Level 26

Thank you Reeza, that was very helpful. I am having one other issue. When I try to print my data, it only prints the data from Cancer institute 1 and leaves out cancer institute 2-5. Do you see anything else that I am doing wrong?


Are there errors in the SASLOG? Are there data sets created with zero observations?

--
Paige Miller
BWDAWG
Obsidian | Level 7

Hey paigemiller,

 

I do not get any errors in the SASLOG, however, starting with cancer institute 2 and on, everything created has 0 observations, I've looked through and can't figure it out.

PaigeMiller
Diamond | Level 26
Show us the relevant parts of the SAS Log where the 0 observation data sets are being created.
--
Paige Miller
BWDAWG
Obsidian | Level 7

This is the first part of the log where zero observations begin to occur:

 

77
78 data tempa;
79 set proj3;
80 if upcase(inst_num)= "CANCER INSTITUE 2";
81

NOTE: There were 4000 observations read from the data set WORK.PROJ3.
NOTE: The data set WORK.TEMPA has 0 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds


82 data tempb;
83 set tempa;
84
85 if upcase(leu_type)= "ACUTE LYMPHOCYTIC LEUKEMIA (ALL)";
86 x= ranuni(0);
87 output;
88

NOTE: There were 0 observations read from the data set WORK.TEMPA.
NOTE: The data set WORK.TEMPB has 0 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.03 seconds
cpu time 0.03 seconds


89 proc sort data=tempb;
90 by x;
91 run;

NOTE: Input data set is empty.
NOTE: The data set WORK.TEMPB has 0 observations and 4 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds


92
93 data one;
94 set tempb (obs=5);
95 run;

NOTE: There were 0 observations read from the data set WORK.TEMPB.
NOTE: The data set WORK.ONE has 0 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.00 seconds

PaigeMiller
Diamond | Level 26

@BWDAWG wrote:

This is the first part of the log where zero observations begin to occur:

 

77
78 data tempa;
79 set proj3;
80 if upcase(inst_num)= "CANCER INSTITUE 2";
81

NOTE: There were 4000 observations read from the data set WORK.PROJ3.
NOTE: The data set WORK.TEMPA has 0 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds

 

This is a mistake a lot of people make. It appears that you have NOT actually looked at your data to see if there are any observations where inst_num is equal to "CANCER INSTITUE 2" (without the quotes).

 

There are none. That's why you are getting this.

 

What is the solution? Look at your data! Let me repeat that. LOOK AT YOUR DATA! What are the values that inst_num actually has??? Could it be that you have misspelled one of the values (looks like it to me) or you have spaces in the code where there are none in the data? All of these problems you are having are easily fixed by looking at the data, and looking at your code, and making sure they match.

--
Paige Miller
BWDAWG
Obsidian | Level 7

Can't believe I missed that. I couldn't figure it out. Thanks for taking the time to look through it and help me out.

Reeza
Super User

I think we've mentioned this before, the methodology implemented here will work but is inefficient. You could use PROC SURVEYSELECT or do all in a single step with a BY group. Is there a particular reason you're staying with this approach? 

 

 

 

 

BWDAWG
Obsidian | Level 7

Well I was originally trying to do this using macros but I just couldn't figure it out at all. I am just a beginner so this was just the easiest way I knew to do it. 

Reeza
Super User

I couldn't get this working and don't have the time to keep working on it today but maybe someone else can help. It looks like you're creating a random selection from the data. If that's correct this is pretty close to what you actually need:

 

*Import raw data;
proc import out=RawData datafile='/folders/myfolders/Cancerinstdata.xlsx' 
dbms=xlsx replace; 
run;

*filter out records of interest - whole file so not really necessary unless you uploaded only a sample;
data temp;
    set rawData;

    if upcase(inst_num) in ("CANCER INSTITUTE 1", "CANCER INSTITUTE 2", "CANCER INSTITUTE 3", "CANCER INSTITUTE 4", "CANCER INSTITUTE 5") ;
    if upcase(leu_type) in ("ACUTE LYMPHOCYTIC LEUKEMIA (ALL)", "ACUTE MYELOID LEUKEMIA (AML)",
                            "CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)", "CHRONIC MYELOID LEUKEMIA (CML)");
  
  inst_num = upcase(inst_num);
  leu_type=upcase(leu_type);
run;

proc sort data=temp;
by inst_num leu_type;
run;



*create sample sizes data;
data sampleSizes;
infile cards dlm=',' dsd;
input _Nsize_ leu_type $50.;
format leu_type $50.;
cards;
5, ACUTE LYMPHOCYTIC LEUKEMIA (ALL)
10,   ACUTE MYELOID LEUKEMIA (AML)
20, CHRONIC LYMPHOCUTIC LEUKEMIA (CLL)
5, CHRONIC MYELOID LEUKEMIA (CML)
;;;;
run;

*Add in institutions so you have a record for every disease/ institution;
data sampleSizes1;
    set sampleSizes;
    do i=1 to 5;
        inst_num = cat('CANCER INSTITUTE ', I);
        output;
    end;
    drop i;
run;

*sort;
proc sort data=sampleSizes1;
by inst_num leu_type;
run;

proc surveyselect data=temp /*input data*/
                    method=srs /*method of sampling*/
                    out=mySamples  /*name of output data*/
                    sampSize=sampleSizes1 /*sampSizes data set*/
                    seed=100; /*random number seed to ensure the same results each time*/
strata inst_num leu_type; /*stratification levels for sample */
run;
Reeza
Super User
Also, here's a link to a series of SAS videos that covers basic topics, if you prefer to learn that way:
https://video.sas.com/category/videos/sas-analytics-u

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 3445 views
  • 4 likes
  • 3 in conversation