BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
fones
Calcite | Level 5

I'm bootstrapping samples (with replacement) for a matched case control data using proc survey select. The sampling unit is the id for each matched pair and am outputting all the replicates using the outhits option. So the output data set is the same size as the input data set.

 

My goal is to assign different id values for each duplicated matched pair to prevent such case-control pairs from being dropped off during analysis. Here is an sample input and output data set, and the modification am trying to achieve. 

 

I'm thinking by group processing but when the number of hits is more than 2, then I can't seem to wrap my mind around it.

 

Any ideas are appreciated. 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @fones and welcome to the SAS Support Communities!

 

Sorry to see that you haven't received a reply yet. Maybe it's because many users here are hesitant or not allowed to open MS office attachments (security concerns). I don't even have Word or Excel installed on my SAS workstation, but I managed to reconstruct your data from the preview of the .docx file:

data have;
input Replicate id Case NumberHits;
cards;
1 10 0 1 
1 10 1 1 
1 11 0 3 
1 11 1 3 
1 11 0 3
1 11 1 3 
1 11 0 3 
1 11 1 3 
1 14 0 2 
1 14 1 2 
1 14 0 2 
1 14 1 2 
2  8 0 1 
2  8 1 1 
2 10 0 2 
2 10 1 2 
2 10 0 2 
2 10 1 2
;

(A DATA step as above is the preferred way to provide test data. Instructions on how to create such a DATA step from an existing SAS dataset can be found here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...)

 

Yes, BY-group processing can be used to create the new IDs:

data want(drop=seqno /* id rename=(newid=id) */);
set have;
by replicate id;
if first.id then seqno=0;
else seqno+1;
newid=ifn(seqno>1, id*1e4+int(seqno/2), id);
run;

Result:

                           Number
Replicate    id    Case     Hits      newid

    1        10      0        1          10
    1        10      1        1          10
    1        11      0        3          11
    1        11      1        3          11
    1        11      0        3      110001
    1        11      1        3      110001
    1        11      0        3      110002
    1        11      1        3      110002
    1        14      0        2          14
    1        14      1        2          14
    1        14      0        2      140001
    1        14      1        2      140001
    2         8      0        1           8
    2         8      1        1           8
    2        10      0        2          10
    2        10      1        2          10
    2        10      0        2      100001
    2        10      1        2      100001

Depending on your max(NumberHits) and max(id) you may want to adapt the multiplier (1e4=10000) so that it is large enough to avoid overlaps between the new IDs of different IDs and small enough to avoid numeric precision issues (if NEWID gets larger than about 9.0e15). Of course, you can simplify the definition of NEWID if those IDs which are currently copied to NEWID may be multiplied, too:

newid=id*1e4+int(seqno/2);

If you really want to replace the original ID values, just activate the code that I've commented out.

 

 

 

View solution in original post

3 REPLIES 3
FreelanceReinh
Jade | Level 19

Hello @fones and welcome to the SAS Support Communities!

 

Sorry to see that you haven't received a reply yet. Maybe it's because many users here are hesitant or not allowed to open MS office attachments (security concerns). I don't even have Word or Excel installed on my SAS workstation, but I managed to reconstruct your data from the preview of the .docx file:

data have;
input Replicate id Case NumberHits;
cards;
1 10 0 1 
1 10 1 1 
1 11 0 3 
1 11 1 3 
1 11 0 3
1 11 1 3 
1 11 0 3 
1 11 1 3 
1 14 0 2 
1 14 1 2 
1 14 0 2 
1 14 1 2 
2  8 0 1 
2  8 1 1 
2 10 0 2 
2 10 1 2 
2 10 0 2 
2 10 1 2
;

(A DATA step as above is the preferred way to provide test data. Instructions on how to create such a DATA step from an existing SAS dataset can be found here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...)

 

Yes, BY-group processing can be used to create the new IDs:

data want(drop=seqno /* id rename=(newid=id) */);
set have;
by replicate id;
if first.id then seqno=0;
else seqno+1;
newid=ifn(seqno>1, id*1e4+int(seqno/2), id);
run;

Result:

                           Number
Replicate    id    Case     Hits      newid

    1        10      0        1          10
    1        10      1        1          10
    1        11      0        3          11
    1        11      1        3          11
    1        11      0        3      110001
    1        11      1        3      110001
    1        11      0        3      110002
    1        11      1        3      110002
    1        14      0        2          14
    1        14      1        2          14
    1        14      0        2      140001
    1        14      1        2      140001
    2         8      0        1           8
    2         8      1        1           8
    2        10      0        2          10
    2        10      1        2          10
    2        10      0        2      100001
    2        10      1        2      100001

Depending on your max(NumberHits) and max(id) you may want to adapt the multiplier (1e4=10000) so that it is large enough to avoid overlaps between the new IDs of different IDs and small enough to avoid numeric precision issues (if NEWID gets larger than about 9.0e15). Of course, you can simplify the definition of NEWID if those IDs which are currently copied to NEWID may be multiplied, too:

newid=id*1e4+int(seqno/2);

If you really want to replace the original ID values, just activate the code that I've commented out.

 

 

 

fones
Calcite | Level 5

Just what I wanted to avoid overlaps! Thanks a lot!

 

PS: Will use a SAS data step next time I have a data example!

fones
Calcite | Level 5

And just to add on for other users out there: The data have to be sorted by the BY variables before the BY- group processing!

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 1047 views
  • 1 like
  • 2 in conversation