Help using Base SAS procedures

Matching observations based on ID number

Reply
Contributor
Posts: 25

Matching observations based on ID number

I have a data set that contains cases and matched controls (most cases are matched 1:4 with controls). Each case-control group is identified by having the same ID number following with "X" for case and "A" "B" "C" "D" for controls. Example (GI001X GI001A GI001B GI001C). I would like to make a new dataset that only contains giardia cases and the matching controls. Is there a way to subset my data based on ID number? Basically I want to keep observations where "Giardia"=1 and "Case"=1 (this combination denotes giardia cases) as well as the matching controls in which "Case"=0. The tricky part is, not all the ID numbers are the same length- they range from 2-4 numbers followed the X or A,B,C,D.

Super User
Posts: 5,256

Re: Matching observations based on ID number

Seems you just have to sub-string out the id part, and then inner join on that.

Data never sleeps
Contributor
Posts: 25

Re: Matching observations based on ID number

How would I go about doing that?

Super User
Posts: 17,829

Re: Matching observations based on ID number

Using substring and length functions.

I'm assuming Giardia and Case are different variables?

If you provide better sample data you get better answers, but not guarantees  

Contributor
Posts: 25

Re: Matching observations based on ID number

Thanks for your response, yes "Giardia" and "Case" are different variables. There are four main exposures of interest in my dataset (Giardia, Amebias, Cryptosporidium, and Cyclospora) in addition to the matched (1:4) controls for each. I would like to write a code to match all the case variables to corresponding controls and then make a dataset that only contains the Giardia cases and the matching controls. Is this more specific?

Super User
Posts: 5,256

Re: Matching observations based on ID number

I think I got it from the start. But the syntax for this is straightforward, and you should be able to find a solution by search the on-line documentation, papers, other forum threads, you name it...

Data never sleeps
Super User
Super User
Posts: 7,401

Re: Matching observations based on ID number

Post some test data and required output, maybe something like:

data have;
  group="GI001X"; output;
  group="GI001A"; output;
  group="GI001B"; output;
  group="AI001X"; output;
  group="AI001A"; output;
run;

data giarida amebias;
  set have;
  select(substr(group,1,3));
    when ("GI0") output giarida;
    when ("AI0") output amebias;
    otherwise;
  end;
run;

Contributor
Posts: 25

Re: Matching observations based on ID number

Great, I will do this tonight. Will the substring command correctly select even though some ID's are 3 characters long and some are 4?

Super User
Super User
Posts: 7,401

Re: Matching observations based on ID number

You need to think of a string as an array of characters, e.g:

{1} {2} {3} {4} {5} {6}

G   I     0   0   1   X

G   I     0   0   1   A

So if we use substring(array 1 to array 3), both of these strings will be GI0 as 4-6 is dropped, hence they will match.

Super User
Posts: 5,256

Re: Matching observations based on ID number

You can substring from the "end", see on-line doc for syntax.

Data never sleeps
Contributor
Posts: 25

Re: Matching observations based on ID number

I've used the substring option as follows:

match_id= substr (ID, 3,5)

which gives me just the numeric portion of a variable as follows:

Match_ID:

100

100

100

101

101

101

102

102

102

Now I need to run a descriptive analysis to get the characteristics of the cases and the matched controls. How do I run an analysis on controls matched to Giardia cases (the Giardia value for these observations will be 0). Can I make a new data set selecting only Giardia cases and the matched controls using the newly generated Match_ID variable? Something along the lines of:

Data New;

Set Old;

Where Giardia=1 and Match_ID=_____ ;(somehow denoting matched cases)

run;

Contributor
Posts: 25

Re: Matching observations based on ID number

II think I actually want to reword the data:

1. Creat a variable "match id" which is the identification number without the trailing letter. I can't use the substring option with this because all the identification numbers are different lengths. I basically want to "drop" the last character in each ID.

2. create a dataset with only giardia cases abd the matched controls.

Super User
Posts: 17,829

Re: Matching observations based on ID number

1. Substr(variable, 1, length(variable)-1);

Contributor
Posts: 25

Re: Matching observations based on ID number

Great! How would I go about selecting cases and matched controls based on the two variables?

Ask a Question
Discussion stats
  • 13 replies
  • 382 views
  • 0 likes
  • 4 in conversation