BookmarkSubscribeRSS Feed
Greek
Obsidian | Level 7

Good morning, 

 

I am trying to create a matched case/control data set from a case (n=1,646,667) and Control (n=313,552). I have gone through the awesome article called "Simplified Matched Case-Control sampling using proc surveyselect" 

http://www2.sas.com/proceedings/sugi29/209-29.pdf

 

But I have been unable to make it work. 

 

I thought the sample size could be an issue, but the Case part (larger data set) runs fine only with a warning that my log is exceeds the size and needs to close, while the control part (smaller data set) crashes completely. The problems appear during the proc surveyselect part

 

I have listed my code below which is 99% similar to the one of the article.  

 

Does anybody have a suspicion of what is happening?  


data control;
length index $25;
set HDP_SCRH.c88090_mcc_control ;
Index=compress(cust_gendr_cd || age_band_cd || zip3 || rx_mms || bio2016 || tmc_band || risk_band || ccm_eng_2016 || ccm_id_2016
|| er_band || inp_band || fip_inn_Band || fip_oon_band || fop_inn_Band || fop_oon_band || prof_inn_Band || prof_oon_band);

length_=length(index);
run;

 

PROC FREQ DATA= Control NOPRINT;
TABLES INDEX/LIST MISSING OUT=CTRLCNT (/*KEEP=INDEX COUNT*/
RENAME=(COUNT=CTRLCNT));

 


data Case;
length index $25;
set HDP_SCRH.c88090_mcc_CASE;
Index=compress(cust_gendr_cd || age_band_cd || zip3 || rx_mms || bio2016 || tmc_band || risk_band || ccm_eng_2016 || ccm_id_2016
|| er_band || inp_band || fip_inn_Band || fip_oon_band || fop_inn_Band || fop_oon_band || prof_inn_Band || prof_oon_band);

length_=length(index);
run;

 

PROC FREQ DATA= case NOPRINT;
TABLES INDEX/LIST MISSING OUT=CASECNT (/*KEEP=INDEX COUNT*/
RENAME=(COUNT=CASECNT));


DATA ALLCOUNT;
MERGE CASECNT (IN=A) CTRLCNT (IN=B);
BY INDEX;
IF CASECNT > 0;
IF A AND NOT B THEN CTRLCNT = 0;
_NSIZE_ = MIN(CASECNT,CTRLCNT);
IF _NSIZE_ GT 0;
run;


PROC SQL;
CREATE TABLE ELIGIBLE_CONTROLS AS
SELECT *
FROM CASE
WHERE INDEX IN (SELECT INDEX FROM ALLCOUNT)
ORDER BY INDEX;
QUIT;


PROC SQL;
CREATE TABLE ELIGIBLE_CASES AS
SELECT *
FROM CONTROL
WHERE INDEX IN (SELECT INDEX FROM ALLCOUNT)
ORDER BY INDEX;
QUIT;


PROC SURVEYSELECT DATA =ELIGIBLE_CASES
SAMPSIZE = ALLCOUNT
METHOD = SRS
SEED=890
OUT=SELECTED_CASES;
STRATA INDEX;
RUN;


PROC SURVEYSELECT DATA =ELIGIBLE_CONTROLS
SAMPSIZE = ALLCOUNT
METHOD = SRS
SEED=134
OUT=SELECTED_CONTROLS;
STRATA INDEX;
RUN;

3 REPLIES 3
ballardw
Super User

@Greek wrote:

Good morning, 

 

I am trying to create a matched case/control data set from a case (n=1,646,667) and Control (n=313,552). I have gone through the awesome article called "Simplified Matched Case-Control sampling using proc surveyselect" 

http://www2.sas.com/proceedings/sugi29/209-29.pdf

 

But I have been unable to make it work. 

 

I thought the sample size could be an issue, but the Case part (larger data set) runs fine only with a warning that my log is exceeds the size and needs to close, while the control part (smaller data set) crashes completely. The problems appear during the proc surveyselect part

 

 


Please describe "crashes completely". Does SAS crash/quit working,is the output unexpected or does the code generate errors?

If there are errors then best would be to show the code and error messages from the LOG, otherwise we are reduced to guessing.

Best is to paste the code and errors from the log into a code box opened on the forum using the {I} menu icon in the message box header to preserve formatting of error diagnostic messages that the main message window would remove.

 

At this point I am going to guess that since you use the same SAMPSIZE data set for cases and controls that you may have requested more records from the control set for one or more strata.

 

I will say that the highlight data set names below seem a bit confusing to me as well:

PROC SQL;
CREATE TABLE ELIGIBLE_CONTROLS AS
SELECT *
FROM CASE
WHERE INDEX IN (SELECT INDEX FROM ALLCOUNT)
ORDER BY INDEX;
QUIT;


PROC SQL;
CREATE TABLE ELIGIBLE_CASES AS
SELECT *
FROM CONTROL
WHERE INDEX IN (SELECT INDEX FROM ALLCOUNT)
ORDER BY INDEX;
QUIT;

Note that in the article you site code is

 

PROC SQL;
CREATE TABLE WORK.ELIGIBLE_CONTROLS AS
SELECT *
FROM CTRL
WHERE INDEX IN (SELECT INDEX FROM ALLCOUNT);
Greek
Obsidian | Level 7

Thank you very much for your quick response! 

 

Many apologies, I tried posting a picture of the error message, but my company has a very conservative firewall that blocked it. There was no output, SAS EG just disconnected.

 

I got the following error:

 

"An error occurred executing the workspace job Matching. The server is disconnected.

 

I was able to find a solution to my problem here. The proc now works without issues

http://support.sas.com/kb/42/679.html

 

 

ballardw
Super User

@Greek wrote:

Thank you very much for your quick response! 

 

Many apologies, I tried posting a picture of the error message, but my company has a very conservative firewall that blocked it. There was no output, SAS EG just disconnected.

 

 


Logs are text, so copy text and paste text.

 

If the error is from the OS and not the log, which seems possible in this case, it may be that you are exceeding the output limits of your job. Having such large log entries may be an indication of other issues.

 

I suggest also using a subset of the SAMPSIZE data set so that you are only selecting for a couple of strata and examine the log to see why it is generating so much output.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1950 views
  • 1 like
  • 2 in conversation