BookmarkSubscribeRSS Feed
jennxxness
Calcite | Level 5

I am trying to figure out the best way to program this.

I have 1000 cases that are identified by a ID number.

I have 4000 controls (matched case:control by 1:4) that have their own unique ID number. Each control has another variable called case_ID and this is where the ID number of the respective case is included. For each case, the case_ID is denoted as "." (missing)

 

I am trying to perform a subgroup analysis where I only look at a group of cases (those defined as CAA="No Involvement") and their respective controls. I have removed the irrelevant cases from my dataset with the following:

keepit = .;
if case=0 then keepit=1;
if case=1 and CAA="No Involvement" then keepit=1;
where keepit=1;

Is there a way to program such that I only keep the controls where their respective case_ID is a case that exists in the dataset?

 

Thanks!

3 REPLIES 3
PaigeMiller
Diamond | Level 26

I don't understand your problem enough to make a substantive comment about solving the problem.

 

I do understand your SAS code, and the WHERE statement does not work there. WHERE can only work on variables that are present when the data step begins, which is usually the variables in the data set(s) named in a SET or MERGE statement.

 

In place of the WHERE statement, you can use a subsetting IF statement

 

if keepit=1;

 

--
Paige Miller
Astounding
PROC Star

While posting a sample of the data might help clarify the question, it sounds like you need just one statement:

if (case=0) or (case=1 and CAA="No Involvement";

At least that would be equivalent to the four statements you are using now.  If there is a need to find matches vs. nonmatches, there may be more to it. but your post doesn't show enough of the situation to comment further.

Quentin
Super User

From your description, it sounds like you might have 1,000 cases and 4,000 controls as rows in one dataset that has 5,000 rows.  Is that right?

 

Could you show a sample of of your input data (just ~10 records, with ID, Case, Case_ID, CAA) and the desired output from that sample?

 

If I'm understanding what you want, I would do it in three steps:

 

1. Select the cases of interest (those with CAA="No Involvement") and output them to a dataset.

2. Select the controls for those cases (you could do this by joining/merging the cases of interest to the original data, matching the ID for the cases to the Case_ID of the controls.

3. Stack together the selected cases and controls.

 

It could probably be done with one step, but for such small data, I'd probably do it in multiple steps.  So if there are 111 cases with CAA="No Involvement" the first step would output 111 records.  The second step would output 444 records. And stacking them together would give 555 records.

 

 

 

 

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1013 views
  • 0 likes
  • 4 in conversation