BookmarkSubscribeRSS Feed
Mdormond
Calcite | Level 5

I have a data set that contains cases and matched controls (most cases are matched 1:4 with controls). Each case-control group is identified by having the same ID number following with "X" for case and "A" "B" "C" "D" for controls. Example (GI001X GI001A GI001B GI001C). I would like to make a new dataset that only contains giardia cases and the matching controls. Is there a way to subset my data based on ID number? Basically I want to keep observations where "Giardia"=1 and "Case"=1 (this combination denotes giardia cases) as well as the matching controls in which "Case"=0. The tricky part is, not all the ID numbers are the same length- they range from 2-4 numbers followed the X or A,B,C,D.

13 REPLIES 13
LinusH
Tourmaline | Level 20

Seems you just have to sub-string out the id part, and then inner join on that.

Data never sleeps
Mdormond
Calcite | Level 5

How would I go about doing that?

Reeza
Super User

Using substring and length functions.

I'm assuming Giardia and Case are different variables?

If you provide better sample data you get better answers, but not guarantees  

Mdormond
Calcite | Level 5

Thanks for your response, yes "Giardia" and "Case" are different variables. There are four main exposures of interest in my dataset (Giardia, Amebias, Cryptosporidium, and Cyclospora) in addition to the matched (1:4) controls for each. I would like to write a code to match all the case variables to corresponding controls and then make a dataset that only contains the Giardia cases and the matching controls. Is this more specific?

LinusH
Tourmaline | Level 20

I think I got it from the start. But the syntax for this is straightforward, and you should be able to find a solution by search the on-line documentation, papers, other forum threads, you name it...

Data never sleeps
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post some test data and required output, maybe something like:

data have;
  group="GI001X"; output;
  group="GI001A"; output;
  group="GI001B"; output;
  group="AI001X"; output;
  group="AI001A"; output;
run;

data giarida amebias;
  set have;
  select(substr(group,1,3));
    when ("GI0") output giarida;
    when ("AI0") output amebias;
    otherwise;
  end;
run;

Mdormond
Calcite | Level 5

Great, I will do this tonight. Will the substring command correctly select even though some ID's are 3 characters long and some are 4?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

You need to think of a string as an array of characters, e.g:

{1} {2} {3} {4} {5} {6}

G   I     0   0   1   X

G   I     0   0   1   A

So if we use substring(array 1 to array 3), both of these strings will be GI0 as 4-6 is dropped, hence they will match.

LinusH
Tourmaline | Level 20

You can substring from the "end", see on-line doc for syntax.

Data never sleeps
Mdormond
Calcite | Level 5

I've used the substring option as follows:

match_id= substr (ID, 3,5)

which gives me just the numeric portion of a variable as follows:

Match_ID:

100

100

100

101

101

101

102

102

102

Now I need to run a descriptive analysis to get the characteristics of the cases and the matched controls. How do I run an analysis on controls matched to Giardia cases (the Giardia value for these observations will be 0). Can I make a new data set selecting only Giardia cases and the matched controls using the newly generated Match_ID variable? Something along the lines of:

Data New;

Set Old;

Where Giardia=1 and Match_ID=_____ ;(somehow denoting matched cases)

run;

Mdormond
Calcite | Level 5

II think I actually want to reword the data:

1. Creat a variable "match id" which is the identification number without the trailing letter. I can't use the substring option with this because all the identification numbers are different lengths. I basically want to "drop" the last character in each ID.

2. create a dataset with only giardia cases abd the matched controls.

Reeza
Super User

1. Substr(variable, 1, length(variable)-1);

Mdormond
Calcite | Level 5

Great! How would I go about selecting cases and matched controls based on the two variables?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1470 views
  • 0 likes
  • 4 in conversation