BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ak2011
Fluorite | Level 6

Hi,

Please, could someone help me with the SAS code to verify the count(frequency) of ca case ca cont pop cont for each unique/distinct id and idchem combination. I have  29 obs, 3 pollutants idchem 990005=cla_exp, 990021 = bio_exp and 210701 = amo_exp. I need to count the number of the lung variables associated with each unique id_idchem combination. For obs 2 (Osaa13-idchem 210701) has 1 lung ca case. Same with obs 4 so I expect SAS to count obs 2 & 4 as frequency of 1 since they have the same id_idchem combination. 

Thus, I would like the overall freq of unique id_idchem by lung for 210701 to be 2/8 for ca case (for example).

 

My SAS output Tables 1 & 2 are attached. The SAS code and log are found below.

 

1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 data mg1b; set mgp(obs=29);
 
NOTE: There were 29 observations read from the data set WORK.MGP.
NOTE: The data set WORK.MG1B has 29 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
 
 
74 proc print data=mg1b;
75 Title " Table 1: Exposure to 3 pollutants";
76
 
NOTE: There were 29 observations read from the data set WORK.MG1B.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.27 seconds
cpu time 0.27 seconds
 
 
77 proc freq data=mg1b;
78 tables idchem*lung;
79 Title" Table 2: Frequency: id_idchem vs case/control for 3 pollutants "; run;
NOTE: There were 29 observations read from the data set WORK.MG1B.
NOTE: PROCEDURE FREQ used (Total process time):
real time 0.21 seconds
cpu time 0.20 seconds
 
 
80
81 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
93
 
Thanks very much.
ak.

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

proc sort with NODUPKEY is useful, but depending on the size of the data set, can be expensive.   In this case, I'd suggest two proc freqs.  The first produces a data set FREQS of frequencies of IDCHEM*LUNG  (i.e. one obs for each combination with 2 additional variables COUNT and PERCENT).  Then submit the new dataset to a proc freq with the same tables specification.  This will yield counts of non-empty combinations:

 

proc freq data =mgp noprint;
  tables idchem * lung /out=freqs;
run;
proc freq data=freqs;
  tables idchem * lung / norow nocol;
run;

 

 

The advantage here is that proc freq doesn't need a sorted data set.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

3 REPLIES 3
ed_sas_member
Meteorite | Level 14

Hi @ak2011 

 

Happy new year!

 

I suggest that you run a PROC SORT with the NODUPKEY option before running the PROC FREQ:

 

proc sort data = env out = env_distinct nodupkey;
	by lung id_idchem;
run;
proc freq data = env_distinct;
	table lung * idchem / nocol norow nocol nopercent;
run;

Does that give you the expected results?

mkeintz
PROC Star

proc sort with NODUPKEY is useful, but depending on the size of the data set, can be expensive.   In this case, I'd suggest two proc freqs.  The first produces a data set FREQS of frequencies of IDCHEM*LUNG  (i.e. one obs for each combination with 2 additional variables COUNT and PERCENT).  Then submit the new dataset to a proc freq with the same tables specification.  This will yield counts of non-empty combinations:

 

proc freq data =mgp noprint;
  tables idchem * lung /out=freqs;
run;
proc freq data=freqs;
  tables idchem * lung / norow nocol;
run;

 

 

The advantage here is that proc freq doesn't need a sorted data set.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ak2011
Fluorite | Level 6
Thanks very much. It works!
ak.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 518 views
  • 0 likes
  • 3 in conversation