Hi again!
I can't seem to wrap my head around creating a proportion in a long format of my data. I will provide a portion of my data below:
input STUDYNO:32. PID:BEST. Day_1826:DATE9. Day_InitialIW:DATE9. TRICHOT_1YR:BEST12. Personal_consumption_expenditure:BEST12. Month:32. Interval:$5. TimeFrame:$20. ExpenseType:$20. Value:32.;
format PID BEST. Day_1826 DATE9. Day_InitialIW DATE9. TRICHOT_1YR BEST12. Personal_consumption_expenditure BEST12.;
label STUDYNO="Sample size: 55,144,41,185,191, Total:616" PID="SUBJID#" Day_1826="Day1826" Day_InitialIW="IWInitial" TRICHOT_1YR="RA 273, UR 140, RNA 80, Missing 123";
datalines;
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr House 3164.5029543
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Consume 820.0784735
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Durable .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Health 332.71026714
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Insure .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Loan 0
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Tax .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Educa .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Gift 16262.542769
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Enter 209.70802449
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Recre 0
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr FinPlan .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Legala 0
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr IncomeTL 61368.08592
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Trans 3334.881462
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr AlcSpend 72.87460203
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr House 3075.6491125
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Consume 808.7741655
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Durable .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Health 316.74626874
What I want to do is create a proportion of the PostResYr Value for each ExpenseType over the PreResYr Value for each expense type. I feel that I am on the right track but cant think of a way to complete this. Ultimately, i want to do a stacked bar chart with the x axis being the expense type, the bars being the sum of the proportions and the different segments being the Trichot_1YR.
The code that I was trying to do was the following:
data Proportion3;
set Proportion2;
if Value = 0 then Value = 1;
run;
%macro Val(Exp);
data Proportion3;
set Proportion3;
If TimeFrame = 'PreResYr' then PreVal = Value;
If TimeFrame = 'PostResYr' then PostVal = Value;
if ExpenseType = '&Exp' then Proportion = PostVal/PreVal ;
run;
%mend;
%Val(House);
%Val(Consume);
%Val(Durable);
%Val(Trans);
%Val(Health);
%Val(Insure);
%Val(Loan);
%Val(Tax);
%Val(Educa);
%Val(Gift);
%Val(Enter);
%Val(Recre);
%Val(FinPlan);
%Val(Legala);
%Val(AlcSpend);
%Val(IncomeTL);
However, I cannot populate "Proportion" because the PreVal and PostVal are not on the same row.
I have managed to create it the long way by creating a new variable for each ExpenseType such that PostResYr_House/PreResYr_House = PropHouse
However, I cannot put each of the variables in the X-axis of my stacked bar chart.
Any help, resources, or direction would be appreciated!
My thoughts as of now are to make two different datasets for Pre and Post and then merge them for the proportion maybe using Proc sql as it seems easier to join?
Let me know if you need clarification or any additional information.
However, I cannot populate "Proportion" because the PreVal and PostVal are not on the same row.
Use PROC TRANSPOSE to put them on the same row. Then this becomes a trivial task.
1. Sort by variables that uniquely identify each group/row
2. Transpose to a Wide format so that Pre/Post are on the same line
3. Calculate Ratio
proc sort data=have;
by studyNo PID TRICHOT_1YR ExpenseTYpe timeframe;
run;
proc transpose data=have out=wide;
id timeframe;
by studyNo PID TRICHOT_1YR ExpenseTYpe;
var value;
run;
data want;
set wide;
Ratio = postResYr /PreResYr;
run;
For your code, macro variables do not resolve in single quotes, you need double quotes. You'd also likely need a RETAIN somewhere to keep values across rows.
@joebacon wrote:
Hi again!
I can't seem to wrap my head around creating a proportion in a long format of my data. I will provide a portion of my data below:
input STUDYNO:32. PID:BEST. Day_1826:DATE9. Day_InitialIW:DATE9. TRICHOT_1YR:BEST12. Personal_consumption_expenditure:BEST12. Month:32. Interval:$5. TimeFrame:$20. ExpenseType:$20. Value:32.; format PID BEST. Day_1826 DATE9. Day_InitialIW DATE9. TRICHOT_1YR BEST12. Personal_consumption_expenditure BEST12.; label STUDYNO="Sample size: 55,144,41,185,191, Total:616" PID="SUBJID#" Day_1826="Day1826" Day_InitialIW="IWInitial" TRICHOT_1YR="RA 273, UR 140, RNA 80, Missing 123"; datalines; 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr House 3164.5029543 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Consume 820.0784735 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Durable . 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Health 332.71026714 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Insure . 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Loan 0 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Tax . 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Educa . 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Gift 16262.542769 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Enter 209.70802449 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Recre 0 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr FinPlan . 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Legala 0 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr IncomeTL 61368.08592 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Trans 3334.881462 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr AlcSpend 72.87460203 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr House 3075.6491125 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Consume 808.7741655 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Durable . 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Health 316.74626874
What I want to do is create a proportion of the PostResYr Value for each ExpenseType over the PreResYr Value for each expense type. I feel that I am on the right track but cant think of a way to complete this. Ultimately, i want to do a stacked bar chart with the x axis being the expense type, the bars being the sum of the proportions and the different segments being the Trichot_1YR.
The code that I was trying to do was the following:
data Proportion3; set Proportion2; if Value = 0 then Value = 1; run; %macro Val(Exp); data Proportion3; set Proportion3; If TimeFrame = 'PreResYr' then PreVal = Value; If TimeFrame = 'PostResYr' then PostVal = Value; if ExpenseType = '&Exp' then Proportion = PostVal/PreVal ; run; %mend; %Val(House); %Val(Consume); %Val(Durable); %Val(Trans); %Val(Health); %Val(Insure); %Val(Loan); %Val(Tax); %Val(Educa); %Val(Gift); %Val(Enter); %Val(Recre); %Val(FinPlan); %Val(Legala); %Val(AlcSpend); %Val(IncomeTL);
However, I cannot populate "Proportion" because the PreVal and PostVal are not on the same row.
I have managed to create it the long way by creating a new variable for each ExpenseType such that PostResYr_House/PreResYr_House = PropHouse
However, I cannot put each of the variables in the X-axis of my stacked bar chart.
Any help, resources, or direction would be appreciated!
My thoughts as of now are to make two different datasets for Pre and Post and then merge them for the proportion maybe using Proc sql as it seems easier to join?
Let me know if you need clarification or any additional information.
What you want is not clear to me, and given that there are a dozen people who usually respond 10x faster than I can, I'm guessing it's not clear to others. You really should post an example of what the desired end state is. That said, here's my poke at what you want using two approaches that hopefully gets you on the right track. But if I'm way off, my apologies.
data have;
input STUDYNO:32. PID:BEST. Day_1826:DATE9. Day_InitialIW:DATE9. TRICHOT_1YR:BEST12. Personal_consumption_expenditure:BEST12. Month:32. Interval:$5. TimeFrame:$20. ExpenseType:$20. Value:32.;
format PID BEST. Day_1826 DATE9. Day_InitialIW DATE9. TRICHOT_1YR BEST12. Personal_consumption_expenditure BEST12.;
label STUDYNO="Sample size: 55,144,41,185,191, Total:616" PID="SUBJID#" Day_1826="Day1826" Day_InitialIW="IWInitial" TRICHOT_1YR="RA 273, UR 140, RNA 80, Missing 123";
datalines;
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr House 3164.5029543
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Consume 820.0784735
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Durable .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Health 332.71026714
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr House 3075.6491125
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Consume 808.7741655
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Durable .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Health 316.74626874
;
/* Method 1 */
proc sql;
CREATE TABLE want1 AS
SELECT *,
max((TimeFrame='PostResYr')*value)/max((TimeFrame='PreResYr')*value) AS PostOverPre format percent8.2
FROM have
GROUP BY studyno, pid, expensetype
ORDER BY studyno, pid, expensetype, TimeFrame='PostResYr';
quit;
/* Method 2 */
proc sort data = have;
by studyno pid expensetype descending TimeFrame;
run;
data want2;
retain pre_value;
set have;
by studyno pid expensetype descending TimeFrame;
if first.expensetype then do;
pre_value = value;
PostOverPre = .;
end;
else do;
PostOverPre = value / pre_value;
end;
if last.expensetype then output;
run;
However, I cannot populate "Proportion" because the PreVal and PostVal are not on the same row.
Use PROC TRANSPOSE to put them on the same row. Then this becomes a trivial task.
1. Sort by variables that uniquely identify each group/row
2. Transpose to a Wide format so that Pre/Post are on the same line
3. Calculate Ratio
proc sort data=have;
by studyNo PID TRICHOT_1YR ExpenseTYpe timeframe;
run;
proc transpose data=have out=wide;
id timeframe;
by studyNo PID TRICHOT_1YR ExpenseTYpe;
var value;
run;
data want;
set wide;
Ratio = postResYr /PreResYr;
run;
For your code, macro variables do not resolve in single quotes, you need double quotes. You'd also likely need a RETAIN somewhere to keep values across rows.
@joebacon wrote:
Hi again!
I can't seem to wrap my head around creating a proportion in a long format of my data. I will provide a portion of my data below:
input STUDYNO:32. PID:BEST. Day_1826:DATE9. Day_InitialIW:DATE9. TRICHOT_1YR:BEST12. Personal_consumption_expenditure:BEST12. Month:32. Interval:$5. TimeFrame:$20. ExpenseType:$20. Value:32.; format PID BEST. Day_1826 DATE9. Day_InitialIW DATE9. TRICHOT_1YR BEST12. Personal_consumption_expenditure BEST12.; label STUDYNO="Sample size: 55,144,41,185,191, Total:616" PID="SUBJID#" Day_1826="Day1826" Day_InitialIW="IWInitial" TRICHOT_1YR="RA 273, UR 140, RNA 80, Missing 123"; datalines; 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr House 3164.5029543 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Consume 820.0784735 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Durable . 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Health 332.71026714 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Insure . 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Loan 0 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Tax . 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Educa . 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Gift 16262.542769 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Enter 209.70802449 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Recre 0 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr FinPlan . 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Legala 0 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr IncomeTL 61368.08592 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Trans 3334.881462 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr AlcSpend 72.87460203 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr House 3075.6491125 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Consume 808.7741655 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Durable . 2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Health 316.74626874
What I want to do is create a proportion of the PostResYr Value for each ExpenseType over the PreResYr Value for each expense type. I feel that I am on the right track but cant think of a way to complete this. Ultimately, i want to do a stacked bar chart with the x axis being the expense type, the bars being the sum of the proportions and the different segments being the Trichot_1YR.
The code that I was trying to do was the following:
data Proportion3; set Proportion2; if Value = 0 then Value = 1; run; %macro Val(Exp); data Proportion3; set Proportion3; If TimeFrame = 'PreResYr' then PreVal = Value; If TimeFrame = 'PostResYr' then PostVal = Value; if ExpenseType = '&Exp' then Proportion = PostVal/PreVal ; run; %mend; %Val(House); %Val(Consume); %Val(Durable); %Val(Trans); %Val(Health); %Val(Insure); %Val(Loan); %Val(Tax); %Val(Educa); %Val(Gift); %Val(Enter); %Val(Recre); %Val(FinPlan); %Val(Legala); %Val(AlcSpend); %Val(IncomeTL);
However, I cannot populate "Proportion" because the PreVal and PostVal are not on the same row.
I have managed to create it the long way by creating a new variable for each ExpenseType such that PostResYr_House/PreResYr_House = PropHouse
However, I cannot put each of the variables in the X-axis of my stacked bar chart.
Any help, resources, or direction would be appreciated!
My thoughts as of now are to make two different datasets for Pre and Post and then merge them for the proportion maybe using Proc sql as it seems easier to join?
Let me know if you need clarification or any additional information.
As always, big thank you Reeza!
I didn't know Proc Transpose could function in this fashion. That's a super simple fix to my problem!
Will fix my macro and add a Retain Statement!
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.