Hi,
Could someone please help me with SAS code to verify the count (frequency) of the lung categories (ca case, ca cont, pop cont) for each unique/distinct id and idchem combination? I have 29 obs and 3 pollutants: idchem 990005 = cla_exp, 990021 = bio_exp and 210701 = amo_exp. I need to count the number of lung values associated with each unique id_idchem combination. For example, obs 2 (Osaa13, idchem 210701) has 1 lung ca case, and so does obs 4, so I expect SAS to count obs 2 & 4 together as a frequency of 1, since they have the same id_idchem combination.
Thus, I would like the overall freq of unique id_idchem by lung for 210701 to be, for example, 2/8 for ca case.
My SAS output Tables 1 & 2 are attached. The SAS code and log are found below.
PROC SORT with NODUPKEY is useful but, depending on the size of the data set, can be expensive. In this case, I'd suggest two PROC FREQ steps. The first produces a data set FREQS holding the frequencies of IDCHEM*LUNG (i.e. one obs for each combination, with 2 additional variables, COUNT and PERCENT). Then submit that new data set to a second PROC FREQ with the same TABLES specification. This will yield counts of non-empty combinations:
proc freq data=mgp noprint;
tables idchem * lung / out=freqs;
run;
proc freq data=freqs;
tables idchem * lung / norow nocol;
run;
The advantage here is that proc freq doesn't need a sorted data set.
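As a minimal sketch of the two-step idea on toy data, the example below adds the id variable to the first TABLES request so that each id is counted at most once per IDCHEM*LUNG cell. This is a variation, not the code posted above: the data set name TOY, the variable name ID, and all values are hypothetical, and it assumes the goal is to count distinct ids per cell.

```sas
/* Hypothetical toy data: names and values are illustrative only. */
data toy;
  input id $ idchem lung :$10.;
  datalines;
A01 210701 ca_case
A01 210701 ca_case
A02 210701 ca_case
A03 210701 pop_cont
A01 990005 ca_case
A03 990021 pop_cont
A03 990021 pop_cont
;
run;

/* Step 1: one output obs per non-empty ID*IDCHEM*LUNG combination. */
/* No sort is needed; duplicates collapse into the COUNT variable.  */
proc freq data=toy noprint;
  tables id * idchem * lung / out=freqs;
run;

/* Step 2: each obs in FREQS is one distinct id within a cell, so   */
/* the cell frequency is the number of distinct ids. No WEIGHT      */
/* statement is used, so COUNT from step 1 is ignored here.         */
proc freq data=freqs;
  tables idchem * lung / norow nocol nopercent;
run;
```

With this toy input, the 210701 by ca_case cell should show a frequency of 2, because the two identical A01 rows collapse into one obs in FREQS.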
Hi @ak2011
Happy new year!
I suggest that you run a PROC SORT with the NODUPKEY option before running the PROC FREQ:
proc sort data = env out = env_distinct nodupkey;
by lung id_idchem;
run;
proc freq data = env_distinct;
table lung * idchem / norow nocol nopercent;
run;
Does that give you the expected results?