Dear all!
I have the following problem:
I want to do a proc freq with two variables (var1*var2) but then I would like to divide the frequency counts by a third variable (var3).
Something like:
proc freq data = sas;
tables (var1*var2)/var3;
run;
Var1 and var3 are numeric (continuous); var2 is categorical.
The example above is not valid SAS syntax, and I wonder if anyone has any suggestions on how to solve this?
Thank you in advance!
Best regards,
Karin Gunnarsson
Strange question, but you could probably make creative use of the WEIGHT statement, no?
proc freq data = sashelp.class;
tables age*sex / missing list;
run;
proc freq data = sashelp.class;
tables age*sex / missing list;
weight height;
run;
Koen
Thanks for your quick reply!
I couldn't really get your solution to work, though.
The reason I want to divide is to get a frequency proportion: the individuals in my dataset have different follow-up times, and I want to calculate proportions relative to the number of individuals still at risk.
Var1 is a time category variable, and var2 indicates whether the individual has the disease or not. Var3 is the number still at risk in that time interval. The goal is to make a side-by-side histogram presenting the proportions in two groups.
Hello,
Could you provide us with some data (a data step with datalines)?
A HAVE dataset and possibly a WANT dataset as well?
Here's some code for your comparative histograms:
Comparative histograms: Panel and overlay histograms in SAS
By Rick Wicklin on The DO Loop March 9, 2016
https://blogs.sas.com/content/iml/2016/03/09/comparative-panel-overlay-histograms-sas.html
Koen
Hello again,
Here is an example of datalines:
data test_data;
input runnumber timediff ssc $ sar;
datalines;
1 3 patient 9
2 4 comparator 8
3 2 comparator 6
4 4 patient 5
5 5 patient 5
6 7 comparator 4
7 8 patient 3
8 2 comparator 3
9 4 comparator 3
;
run;
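With data in this shape, the side-by-side chart could be sketched roughly as below. This is only an illustration under assumptions: it treats sar as the number still at risk, uses a hypothetical helper variable w = 1/sar, and sums w within each timediff*ssc cell. That sum equals count divided by sar only when sar is constant within a cell.

```sas
/* Assumes test_data from the data step above already exists. */
data test_plot;
  set test_data;
  /* hypothetical helper: each row counts as 1 / (number still at risk) */
  if sar > 0 then w = 1 / sar;
run;

proc sgplot data=test_plot;
  /* one cluster of bars per time value, one bar per group (ssc) */
  vbar timediff / response=w stat=sum group=ssc groupdisplay=cluster;
  yaxis label="Sum of 1/sar";
run;
```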
Here is how I have coded the timediff variable in my program; it is used to count how many patients got cancer in each time interval from an index date (called date_2).
data exit4;
set exit3;
if cancerdat ne "." then do timedif =((cancerdat-date_2)/365.25); /* defining the new variable timedif in years */
end;
run;
And then I have calculated time categories for the timedif variable (now called timediff):
data exit5;
set exit4;
if timedif = 0 then timediff = "index";
if timedif GT 0 and timedif LE 2 then timediff = "0-2";
if timedif GT 2 and timedif LE 4 then timediff = "2-4";
if timedif GT 4 and timedif LE 6 then timediff = "4-6";
if timedif GT 6 and timedif LE 8 then timediff = "6-8";
if timedif GT 8 and timedif LE 10 then timediff = "8-10";
if timedif GT 10 then timediff = ">10";
if timedif LT 0 and timedif GE -2 then timediff = "-2-0";
if timedif LT -2 and timedif GE -4 then timediff = "-2-4";
if timedif LT -4 and timedif GE -6 then timediff = "-4-6";
if timedif LT -6 and timedif GE -8 then timediff = "-6-8";
if timedif LT -8 and timedif GE -10 then timediff = "-8-10";
if timedif LT -10 and timedif GE -12 then timediff = "-10-12";
if timedif LT -12 and timedif GE -14 then timediff = "-12-14";
if timedif LT -14 and timedif GE -20 then timediff = "-14-20";
if timedif LT -20 then timediff = ">-20";
if cancerdat NE "." then output;
run;
@KarinGun wrote:
And then I have calculated time categories for the timedif variable (now called timediff):
This will never work. You should be getting errors in the log.
Instead of IF-THEN to create categories, use a custom format.
proc format;
value timef
0='index'
0<-2='0-2'
2<-4='2-4'
/* I'm lazy, you type the rest */
; /* this semicolon ends the VALUE statement */
run;
proc freq data=have;
format timedif timef.;
/* The rest of your PROC FREQ goes here */
run;
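For reference, a filled-in version of the format might look like the sketch below. The cut points are taken from the IF-THEN code posted earlier in the thread; the label texts for the negative intervals are my own assumption (written as "lower to upper"), so adjust them as needed. The DATA _NULL_ step is only a quick check that a few hypothetical values land in the intended category.

```sas
proc format;
  value timef
    low  -< -20 = 'before -20'   /* timedif < -20        */
    -20  -< -14 = '-20 to -14'   /* -20 <= timedif < -14 */
    -14  -< -12 = '-14 to -12'
    -12  -< -10 = '-12 to -10'
    -10  -<  -8 = '-10 to -8'
     -8  -<  -6 = '-8 to -6'
     -6  -<  -4 = '-6 to -4'
     -4  -<  -2 = '-4 to -2'
     -2  -<   0 = '-2 to 0'
      0         = 'index'
      0 <-    2 = '0-2'          /* 0 < timedif <= 2     */
      2 <-    4 = '2-4'
      4 <-    6 = '4-6'
      6 <-    8 = '6-8'
      8 <-   10 = '8-10'
     10 <- high = '>10'
  ;
run;

/* Quick check on a few made-up values */
data _null_;
  do t = -25, -3, 0, 1.5, 12;
    put t= +1 t timef.;
  end;
run;
```

Because the format is applied to the numeric variable, there is no character variable whose length is fixed by the first assignment, so longer labels are not truncated.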
In the suggestion above by @sbxkoenk, if you create var3a with the value
var3a=1/var3;
then the WEIGHT statement in PROC FREQ ought to give you the division by var3.
proc freq data = yourdatasetname;
tables var1*var2/ missing list;
weight var3a;
run;
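As a minimal sketch of this idea on the example data posted earlier (recreated here as demo so the steps run on their own), with sar assumed to play the role of var3 and w as a hypothetical name for 1/var3. Note that WEIGHT makes each row contribute w to its cell, so the weighted frequency is the sum of 1/sar over the rows in that cell. It equals count/sar only if sar is the same for every row in the cell. The PROC SQL step is just a cross-check of that sum.

```sas
/* Copy of the example rows from earlier in the thread */
data demo;
  input runnumber timediff ssc :$10. sar;
  if sar > 0 then w = 1 / sar;   /* hypothetical weight: 1 / number at risk */
  datalines;
1 3 patient 9
2 4 comparator 8
3 2 comparator 6
4 4 patient 5
5 5 patient 5
6 7 comparator 4
7 8 patient 3
8 2 comparator 3
9 4 comparator 3
;
run;

/* Weighted frequencies: each row adds w (not 1) to its cell */
proc freq data=demo;
  tables timediff*ssc / missing list;
  weight w;
run;

/* Cross-check: raw count and sum of 1/sar per cell */
proc sql;
  select timediff, ssc,
         count(*)     as n_events,
         sum(1 / sar) as weighted_freq
  from demo
  group by timediff, ssc;
quit;
```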
This seemed to work! Thanks!