BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lady8506
Quartz | Level 8

Hello all,

 

I am about intermediate level with SAS programming and beginner level with SQL. I am having difficulty subsetting my dataset by certain criteria. The dataset contains patients who have had cancer either once or multiple times. However, if the patients FIRST cancer was not pancreatic, then they need to be deleted. This part I have done. The second step is to get rid of the remaining patients who are in the dataset with any type of cancer, but never had a first cancer listed as pancreatic cancer. One goal of this project is to take a look at patients who had pancreatic cancer as their first cancer, and for those who have had multiple cancers, find out what other types of cancers they got after their pancreatic cancer diagnosis.

 

One way I was trying to do this was by Patient_ID, but I am not aware of any DISTINCT equivalency statements in SAS and I don't know SQL that well. 


Below is the code I used to get rid of those whose first cancer was not pancreatic. It is also subset by histology type as I do not want patients with certain histological types of pancreatic cancer:

 

DATA pancreas_temp1; SET pancreas;
	IF Histologic_Type_ICD_O_3 not in (8150, 8151, 8152, 8153, 8155, 8240, 8241, 8246)
	AND
	Sequence_number IN ("One primary only", "1st of 2 or more primaries") 
	then DELETE;

	IF Primary_Site___labeled not =: 'C25' 
	AND 
	Sequence_number IN ("One primary only", "1st of 2 or more primaries") 
	then DELETE;
RUN;

Here is a sample of my dataset. I have sorted it by Patient_ID which is what revealed the problem - patients with no "Only one primary" or "1st of 2 or more primaries" are in this dataset:

Patient_ID

Sequence_number

Primary_site__labeled

Histologic_Type_ICD_O_3

16

2nd of 2 or more primaries

C20.9 - Rectum, NOS

8140

18

2nd of 2 or more primaries

C38.1-Anterior mediastinum

8801

22

2nd of 2 or more primaries

C42.1-Bone marrow

9732

22

3rd of 3 or more primaries

C50.9-Breast, NOS

8500

25

One primary only

C25.2-Tail of pancreas

8246

31

4th of 4 or more primaries

C50.4-Upper-outer quadrant of breast

8500

31

5th of 5 or more primaries

C34.1-Upper lobe, lung

8070

32

1st of 2 or more primaries

C25.0-Head of pancreas

8155

 

 

Here is what I want:

Patient_ID

Sequence_number

Primary_site__labeled

Histologic_Type_ICD_O_3

25

One primary only

C25.2-Tail of pancreas

8246

32

1st of 2 or more primaries

C25.0-Head of pancreas

8155

32

2nd of 2 or more primaries

C64.9-Kidney, NOS

8312

 

Patients 16, 18, 22, and 31 need to go because it is apparent that their first cancer is not listed so we can't tell if it was pancreatic or not. 

 

Any help is appreciated. Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

You can do it all in one go:

data HAVE;
  input ID $ SEQ &: $50. SITE  &: $50. TYPE;
 cards;
16  1st of 2 or more primaries  C20.9 - Rectum, NOS  8140
16  2nd of 2 or more primaries  C20.9 - Rectum, NOS  8140
18  1st of 2 or more primaries  C20.9 - Rectum, NOS  8140
18  2nd of 2 or more primaries  C38.1-Anterior mediastinum  8801
22  1st of 2 or more primaries  C20.9 - Rectum, NOS  8140
22  2nd of 2 or more primaries  C42.1-Bone marrow  9732
22  3rd of 3 or more primaries  C50.9-Breast, NOS  8500
25  One primary only  C25.2-Tail of pancreas  8246
31  4th of 4 or more primaries  C50.4-Upper-outer quadrant of breast  8500
31  5th of 5 or more primaries  C34.1-Upper lobe, lung  8070
32  1st of 2 or more primaries  C25.0-Head of pancreas  8155
32  2nd of 2 or more primaries  C64.9-Kidney, NOS  8312
run;

data WANT;
  retain KEEP;
  set HAVE;
  by ID;
  if first.ID then KEEP=( SEQ in:('One', '1st') and index(SITE ,'ancreas'));
  if KEEP;
run;
ID SEQ SITE
25 One primary only C25.2-Tail of pancreas
32 1st of 2 or more primaries C25.0-Head of pancreas
32 2nd of 2 or more primaries C64.9-Kidney, NOS

 

 

View solution in original post

4 REPLIES 4
Patrick
Opal | Level 21

@lady8506 

You write you're already able to select the patient_id's you want. Below sample code shows you two ways how you then can select all rows in a table matching a list of such patient_id's.

data have;
  do pat_id=1 to 100;
    do row=1 to 5;
      if ranuni(1)>0.2 then diag='Other';
      else diag='Panc';
      output;
    end;
  end;
run;
  
/**
  option 1: Using SAS Data Step 
**/
data pat_want(keep=pat_id);
  set have;
  by pat_id;
  /* selection criteria */
  if row=1 and diag='Panc';
run;

data want1;
  if _n_=1 then 
    do;
      /* lookup table with list of unique pat_id's matching selection criteria */
      dcl hash h1(dataset:'pat_want');
      h1.defineKey('pat_id');
      h1.defineDone();
    end;
  set have;
  /* select all pat_id's from have with match to lookup table */
  if h1.check()=0 then output;
run;

/*
  Option 2: SQL
*/
proc sql;
  create table want2 as
    select h.*
    from 
      have h
      inner join
      (
        /* list of unique pat_id's matching selection criteria */
        select distinct pat_id
        from have 
        where row=1 and diag='Panc'
      ) s
      on h.pat_id=s.pat_id
    ;
quit;
ChrisNZ
Tourmaline | Level 20

You can do it all in one go:

data HAVE;
  input ID $ SEQ &: $50. SITE  &: $50. TYPE;
 cards;
16  1st of 2 or more primaries  C20.9 - Rectum, NOS  8140
16  2nd of 2 or more primaries  C20.9 - Rectum, NOS  8140
18  1st of 2 or more primaries  C20.9 - Rectum, NOS  8140
18  2nd of 2 or more primaries  C38.1-Anterior mediastinum  8801
22  1st of 2 or more primaries  C20.9 - Rectum, NOS  8140
22  2nd of 2 or more primaries  C42.1-Bone marrow  9732
22  3rd of 3 or more primaries  C50.9-Breast, NOS  8500
25  One primary only  C25.2-Tail of pancreas  8246
31  4th of 4 or more primaries  C50.4-Upper-outer quadrant of breast  8500
31  5th of 5 or more primaries  C34.1-Upper lobe, lung  8070
32  1st of 2 or more primaries  C25.0-Head of pancreas  8155
32  2nd of 2 or more primaries  C64.9-Kidney, NOS  8312
run;

data WANT;
  retain KEEP;
  set HAVE;
  by ID;
  if first.ID then KEEP=( SEQ in:('One', '1st') and index(SITE ,'ancreas'));
  if KEEP;
run;
ID SEQ SITE
25 One primary only C25.2-Tail of pancreas
32 1st of 2 or more primaries C25.0-Head of pancreas
32 2nd of 2 or more primaries C64.9-Kidney, NOS

 

 

lady8506
Quartz | Level 8

This is fantastic! I am unfamiliar with some of the syntax you used, but I will figure it out. For now, it looks like it worked. My original dataset is over 1 million observations long and now, after implementing your code, it's only 6500+ long. That seems about right as another study was done drawing data from SEER (same place I got my data) and they ended up with only 3300+ pancreatic cancer patients, but they also weren't including subsequent cancers of other non-pancreatic types like I did. I just need to check it over and make sure before I accept as final solution.

ChrisNZ
Tourmaline | Level 20
Ask if you get stuck on some of the syntax. Note that the expression giving its value to KEEP is just a test. So it yields a Boolean result.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1266 views
  • 1 like
  • 3 in conversation