Dear all!
I have the following problem:
I want to do a proc freq with two variables (var1*var2) but then I would like to divide the frequency counts by a third variable (var3).
Something like:
proc freq data = sas;
tables (var1*var2)/var3;
run;
Var1 and var3 are numerical/continuous; var2 is categorical.
The example above is not valid SAS syntax, and I wonder if anyone has any suggestions on how to solve this?
Thank you in advance!
Best regards,
Karin Gunnarsson
Strange question, but you could probably make creative use of the WEIGHT statement, no?
proc freq data = sashelp.class;
tables age*sex / missing list;
run;
proc freq data = sashelp.class;
tables age*sex / missing list;
weight height;
run;
Koen
Thanks for your quick reply!
I couldn't really make it work with your solution, though.
The reason I want to divide is to get a proportion: the individuals in my dataset have different follow-up times, and I want to calculate proportions relative to the number of individuals still at risk.
Var1 is a time category variable, and var2 indicates having the disease or not. Var3 is the number still at risk in that time interval. The goal is to make a side-by-side histogram presenting the proportions in the two groups.
Hello,
Can you provide us with some data (a data step with datalines)?
A HAVE dataset and possibly a WANT dataset as well?
Here's some code for your comparative histograms :
Comparative histograms: Panel and overlay histograms in SAS
By Rick Wicklin on The DO Loop March 9, 2016
https://blogs.sas.com/content/iml/2016/03/09/comparative-panel-overlay-histograms-sas.html
Koen
Hello again,
Here is an example of datalines:
data test_data;
input runnumber timediff ssc :$10. sar; /* :$10. keeps "comparator" from being truncated to 8 characters */
datalines;
1 3 patient 9
2 4 comparator 8
3 2 comparator 6
4 4 patient 5
5 5 patient 5
6 7 comparator 4
7 8 patient 3
8 2 comparator 3
9 4 comparator 3
;
run;
Here is how I have coded the timediff variable in my program; it is used to count how many patients got cancer in each time interval from an index date (called date_2).
data exit4;
set exit3;
if cancerdat ne "." then do timedif =((cancerdat-date_2)/365.25);/*defining the new varible timedif in years*/
end;
run;
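For comparison, here is a minimal sketch of how this step might look if cancerdat and date_2 are numeric SAS date values. That is an assumption; in that case a missing date is the numeric missing value, so the test is written with MISSING() rather than a comparison with the string ".". The small exit3 dataset below is made up purely for illustration.

```sas
/* Hypothetical stand-in for exit3: one index date and one cancer date per person */
data exit3;
  input id date_2 :date9. cancerdat :date9.;
  format date_2 cancerdat date9.;
  datalines;
1 01JAN2010 01JAN2012
2 15MAR2011 .
3 01JUL2012 01JUL2011
;
run;

data exit4;
  set exit3;
  /* Time from index date to cancer date in years; left missing when there is no cancer date */
  if not missing(cancerdat) then timedif = (cancerdat - date_2) / 365.25;
run;

proc print data=exit4;
run;
```

In this sketch timedif stays numeric, so it can be grouped later without creating a second character variable.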
And then I have calculated time categories for the timedif variable (now called timediff):
data exit5;
set exit4;
if timedif = 0 then timediff = "index";
if timedif GT 0 and timedif LE 2 then timediff = "0-2";
if timedif GT 2 and timedif LE 4 then timediff = "2-4";
if timedif GT 4 and timedif LE 6 then timediff = "4-6";
if timedif GT 6 and timedif LE 8 then timediff = "6-8";
if timedif GT 8 and timedif LE 10 then timediff = "8-10";
if timedif GT 10 then timediff = ">10";
if timedif LT 0 and timedif GE -2 then timediff = "-2-0";
if timedif LT -2 and timedif GE -4 then timediff = "-2-4";
if timedif LT -4 and timedif GE -6 then timediff = "-4-6";
if timedif LT -6 and timedif GE -8 then timediff = "-6-8";
if timedif LT -8 and timedif GE -10 then timediff = "-8-10";
if timedif LT -10 and timedif GE -12 then timediff = "-10-12";
if timedif LT -12 and timedif GE -14 then timediff = "-12-14";
if timedif LT -14 and timedif GE -20 then timediff = "-14-20";
if timedif LT -20 then timediff = ">-20";
if cancerdat NE "." then output;
run;
@KarinGun wrote:
And then I have calculated time categories for the timedif variable (now called timediff):
This will never work. You should be getting errors in the log.
Instead of IF-THEN to create categories, use a custom format.
proc format;
value timef
0='index'
0<-2='0-2'
2<-4='2-4'
/* I'm lazy; you type the rest */
; /* this semicolon ends the VALUE statement */
run;
proc freq data=have;
format timedif timef.;
/* The rest of your PROC FREQ goes here */
run;
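As a sketch of what the completed format could look like, the block below covers the intervals listed in the IF-THEN code earlier in the thread. The labels for the negative ranges and the toy dataset HAVE are assumptions chosen for illustration, not taken from the original program. In a range, `<-` excludes the lower endpoint and `-<` excludes the upper endpoint.

```sas
proc format;
  value timef
    low -< -20 = 'before -20'
    -20 -< -14 = '-20 to -14'
    -14 -< -12 = '-14 to -12'
    -12 -< -10 = '-12 to -10'
    -10 -<  -8 = '-10 to -8'
     -8 -<  -6 = '-8 to -6'
     -6 -<  -4 = '-6 to -4'
     -4 -<  -2 = '-4 to -2'
     -2 -<   0 = '-2 to 0'
      0        = 'index'
      0 <-   2 = '0-2'
      2 <-   4 = '2-4'
      4 <-   6 = '4-6'
      6 <-   8 = '6-8'
      8 <-  10 = '8-10'
     10 <- high = '>10'
  ;
run;

/* Toy data: numeric time differences in years (made-up values) */
data have;
  input timedif @@;
  datalines;
-21 -3 -0.5 0 1.5 2 2.1 11
;
run;

/* PROC FREQ groups the numeric values by their formatted labels */
proc freq data=have;
  tables timedif;
  format timedif timef.;
run;
```

Because timedif stays numeric, no new character variable is needed, and the categories should appear in time order rather than alphabetical order, since PROC FREQ orders by the unformatted values by default.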
In the suggestion above by @sbxkoenk, if you give var3a the value
var3a=1/var3;
then the WEIGHT statement in PROC FREQ ought to give you the division by var3.
proc freq data = yourdatasetname;
tables var1*var2/ missing list;
weight var3a;
run;
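Applied to the example data posted earlier, this might look like the sketch below. It assumes that each row is one event and that sar is the number still at risk for that row's time interval and group, so that summing 1/sar within a cell gives events divided by number at risk. The names w and weighted_counts are hypothetical, and the PROC SGPLOT step is just one possible way to get side-by-side bars.

```sas
data test_data;
  input runnumber timediff ssc :$10. sar;
  w = 1 / sar;   /* weight: reciprocal of the number still at risk */
  datalines;
1 3 patient 9
2 4 comparator 8
3 2 comparator 6
4 4 patient 5
5 5 patient 5
6 7 comparator 4
7 8 patient 3
8 2 comparator 3
9 4 comparator 3
;
run;

/* With WEIGHT, each cell holds the sum of w instead of the row count */
proc freq data=test_data;
  tables timediff*ssc / list out=weighted_counts;
  weight w;
run;

/* Clustered bars: one bar per group within each time value */
proc sgplot data=weighted_counts;
  vbar timediff / response=count group=ssc groupdisplay=cluster;
  yaxis label='Sum of 1/sar';
run;
```

Note that PROC FREQ skips observations with a zero or negative weight by default, so rows where var3 is missing or zero would need to be handled before computing 1/var3.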
This seemed to work! Thanks!