BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
joebacon
Pyrite | Level 9

Hi again!

 

I can't seem to wrap my head around creating a proportion in a long format of my data. I will provide a portion of my data below:

 

input STUDYNO:32. PID:BEST. Day_1826:DATE9. Day_InitialIW:DATE9. TRICHOT_1YR:BEST12. Personal_consumption_expenditure:BEST12. Month:32. Interval:$5. TimeFrame:$20. ExpenseType:$20. Value:32.;
  format PID BEST. Day_1826 DATE9. Day_InitialIW DATE9. TRICHOT_1YR BEST12. Personal_consumption_expenditure BEST12.;
  label STUDYNO="Sample size: 55,144,41,185,191, Total:616" PID="SUBJID#" Day_1826="Day1826" Day_InitialIW="IWInitial" TRICHOT_1YR="RA 273, UR 140, RNA 80, Missing 123";
datalines;
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr House 3164.5029543
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Consume 820.0784735
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Durable .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Health 332.71026714
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Insure .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Loan 0
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Tax .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Educa .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Gift 16262.542769
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Enter 209.70802449
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Recre 0
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr FinPlan .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Legala 0
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr IncomeTL 61368.08592
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Trans 3334.881462
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr AlcSpend 72.87460203
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr House 3075.6491125
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Consume 808.7741655
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Durable .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Health 316.74626874

What I want to do is create a proportion of the PostResYr Value for each ExpenseType over the PreResYr Value for each expense type. I feel that I am on the right track but cant think of a way to complete this. Ultimately, i want to do a stacked bar chart with the x axis being the expense type, the bars being the sum of the proportions and the different segments being the Trichot_1YR. 

 

The code that I was trying to do was the following:

 

data Proportion3;
set Proportion2;
if Value = 0 then Value = 1;
run;
%macro Val(Exp);
data Proportion3;
set Proportion3;
If TimeFrame = 'PreResYr' then PreVal = Value;
If TimeFrame = 'PostResYr' then PostVal = Value;
if ExpenseType = '&Exp' then Proportion = PostVal/PreVal ;
run;
%mend;
%Val(House);
%Val(Consume);
%Val(Durable);
%Val(Trans);
%Val(Health);
%Val(Insure);
%Val(Loan);
%Val(Tax);
%Val(Educa);
%Val(Gift);
%Val(Enter);
%Val(Recre);
%Val(FinPlan);
%Val(Legala);
%Val(AlcSpend);
%Val(IncomeTL);

However, I cannot populate "Proportion" because the PreVal and PostVal are not on the same row.

 

I have managed to create it the long way by creating a new variable for each ExpenseType such that PostResYr_House/PreResYr_House = PropHouse

 

However, I cannot put each of the variables in the X-axis of my stacked bar chart.

 

Any help, resources, or direction would be appreciated!

 

My thoughts as of now are to make two different datasets for Pre and Post and then merge them for the proportion maybe using Proc sql as it seems easier to join?

 

Let me know if you need clarification or any additional information.

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

However, I cannot populate "Proportion" because the PreVal and PostVal are not on the same row.

 

 

Use PROC TRANSPOSE to put them on the same row. Then this becomes a trivial task.

 

1. Sort by variables that uniquely identify each group/row

2. Transpose to a Wide format so that Pre/Post are on the same line

3. Calculate Ratio

 


proc sort data=have;
by studyNo PID TRICHOT_1YR ExpenseTYpe timeframe;
run;

proc transpose data=have out=wide;
id timeframe;
by studyNo PID TRICHOT_1YR ExpenseTYpe;
var value;
run;

data want;
set wide;
Ratio = postResYr /PreResYr;
run;

For your code, macro variables do not resolve in single quotes, you need double quotes. You'd also likely need a RETAIN somewhere to keep values across rows. 

 


@joebacon wrote:

Hi again!

 

I can't seem to wrap my head around creating a proportion in a long format of my data. I will provide a portion of my data below:

 

input STUDYNO:32. PID:BEST. Day_1826:DATE9. Day_InitialIW:DATE9. TRICHOT_1YR:BEST12. Personal_consumption_expenditure:BEST12. Month:32. Interval:$5. TimeFrame:$20. ExpenseType:$20. Value:32.;
  format PID BEST. Day_1826 DATE9. Day_InitialIW DATE9. TRICHOT_1YR BEST12. Personal_consumption_expenditure BEST12.;
  label STUDYNO="Sample size: 55,144,41,185,191, Total:616" PID="SUBJID#" Day_1826="Day1826" Day_InitialIW="IWInitial" TRICHOT_1YR="RA 273, UR 140, RNA 80, Missing 123";
datalines;
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr House 3164.5029543
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Consume 820.0784735
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Durable .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Health 332.71026714
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Insure .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Loan 0
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Tax .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Educa .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Gift 16262.542769
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Enter 209.70802449
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Recre 0
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr FinPlan .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Legala 0
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr IncomeTL 61368.08592
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Trans 3334.881462
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr AlcSpend 72.87460203
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr House 3075.6491125
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Consume 808.7741655
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Durable .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Health 316.74626874

What I want to do is create a proportion of the PostResYr Value for each ExpenseType over the PreResYr Value for each expense type. I feel that I am on the right track but cant think of a way to complete this. Ultimately, i want to do a stacked bar chart with the x axis being the expense type, the bars being the sum of the proportions and the different segments being the Trichot_1YR. 

 

The code that I was trying to do was the following:

 

data Proportion3;
set Proportion2;
if Value = 0 then Value = 1;
run;
%macro Val(Exp);
data Proportion3;
set Proportion3;
If TimeFrame = 'PreResYr' then PreVal = Value;
If TimeFrame = 'PostResYr' then PostVal = Value;
if ExpenseType = '&Exp' then Proportion = PostVal/PreVal ;
run;
%mend;
%Val(House);
%Val(Consume);
%Val(Durable);
%Val(Trans);
%Val(Health);
%Val(Insure);
%Val(Loan);
%Val(Tax);
%Val(Educa);
%Val(Gift);
%Val(Enter);
%Val(Recre);
%Val(FinPlan);
%Val(Legala);
%Val(AlcSpend);
%Val(IncomeTL);

However, I cannot populate "Proportion" because the PreVal and PostVal are not on the same row.

 

I have managed to create it the long way by creating a new variable for each ExpenseType such that PostResYr_House/PreResYr_House = PropHouse

 

However, I cannot put each of the variables in the X-axis of my stacked bar chart.

 

Any help, resources, or direction would be appreciated!

 

My thoughts as of now are to make two different datasets for Pre and Post and then merge them for the proportion maybe using Proc sql as it seems easier to join?

 

Let me know if you need clarification or any additional information.


 

View solution in original post

4 REPLIES 4
collinelliot
Barite | Level 11

What you want is not clear to me, and given that there are a dozen people who usually respond 10x faster than I can, I'm guessing it's not clear to others. You really should post an example of what the desired end state is. That said, here's my poke at what you want using two approaches that hopefully gets you on the right track. But if I'm way off, my apologies.

 

 

data have;
  input STUDYNO:32. PID:BEST. Day_1826:DATE9. Day_InitialIW:DATE9. TRICHOT_1YR:BEST12. Personal_consumption_expenditure:BEST12. Month:32. Interval:$5. TimeFrame:$20. ExpenseType:$20. Value:32.;
  format PID BEST. Day_1826 DATE9. Day_InitialIW DATE9. TRICHOT_1YR BEST12. Personal_consumption_expenditure BEST12.;
  label STUDYNO="Sample size: 55,144,41,185,191, Total:616" PID="SUBJID#" Day_1826="Day1826" Day_InitialIW="IWInitial" TRICHOT_1YR="RA 273, UR 140, RNA 80, Missing 123";
datalines;
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr House 3164.5029543
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Consume 820.0784735
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Durable .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Health 332.71026714
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr House 3075.6491125
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Consume 808.7741655
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Durable .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Health 316.74626874
;

/* Method 1 */
proc sql;
    CREATE TABLE want1 AS
    SELECT *,
           max((TimeFrame='PostResYr')*value)/max((TimeFrame='PreResYr')*value) AS PostOverPre format percent8.2
    FROM have
    GROUP BY studyno, pid, expensetype
    ORDER BY studyno, pid, expensetype, TimeFrame='PostResYr';
quit;

/* Method 2 */
proc sort data = have;
    by studyno pid expensetype descending TimeFrame;
run;

data want2;
    retain pre_value;
    set have;
    by studyno pid expensetype descending TimeFrame;
    if first.expensetype then do;
        pre_value = value;
        PostOverPre = .;
    end;
    else do;
        PostOverPre = value / pre_value;
    end;
    if last.expensetype then output;
run;
joebacon
Pyrite | Level 9
This looks close to what I want!

Basically, what I want is a proportion of the PostVal/PreVal for each case.

However, my data was created in a way that it has an "TimeFrame" variable that is either Pre or Post. Then, it has the value for each Pre and Post for each of the 15 different "ExpenseType".

I want the proportion of the Post/Pre for each ExpenseType for each PID (if that makes sense).
Reeza
Super User

However, I cannot populate "Proportion" because the PreVal and PostVal are not on the same row.

 

 

Use PROC TRANSPOSE to put them on the same row. Then this becomes a trivial task.

 

1. Sort by variables that uniquely identify each group/row

2. Transpose to a Wide format so that Pre/Post are on the same line

3. Calculate Ratio

 


proc sort data=have;
by studyNo PID TRICHOT_1YR ExpenseTYpe timeframe;
run;

proc transpose data=have out=wide;
id timeframe;
by studyNo PID TRICHOT_1YR ExpenseTYpe;
var value;
run;

data want;
set wide;
Ratio = postResYr /PreResYr;
run;

For your code, macro variables do not resolve in single quotes, you need double quotes. You'd also likely need a RETAIN somewhere to keep values across rows. 

 


@joebacon wrote:

Hi again!

 

I can't seem to wrap my head around creating a proportion in a long format of my data. I will provide a portion of my data below:

 

input STUDYNO:32. PID:BEST. Day_1826:DATE9. Day_InitialIW:DATE9. TRICHOT_1YR:BEST12. Personal_consumption_expenditure:BEST12. Month:32. Interval:$5. TimeFrame:$20. ExpenseType:$20. Value:32.;
  format PID BEST. Day_1826 DATE9. Day_InitialIW DATE9. TRICHOT_1YR BEST12. Personal_consumption_expenditure BEST12.;
  label STUDYNO="Sample size: 55,144,41,185,191, Total:616" PID="SUBJID#" Day_1826="Day1826" Day_InitialIW="IWInitial" TRICHOT_1YR="RA 273, UR 140, RNA 80, Missing 123";
datalines;
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr House 3164.5029543
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Consume 820.0784735
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Durable .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Health 332.71026714
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Insure .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Loan 0
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Tax .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Educa .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Gift 16262.542769
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Enter 209.70802449
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Recre 0
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr FinPlan .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Legala 0
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr IncomeTL 61368.08592
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr Trans 3334.881462
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PreResYr AlcSpend 72.87460203
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr House 3075.6491125
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Consume 808.7741655
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Durable .
2 486 30DEC1992 05OCT1993 1 68.892 1 Month PostResYr Health 316.74626874

What I want to do is create a proportion of the PostResYr Value for each ExpenseType over the PreResYr Value for each expense type. I feel that I am on the right track but cant think of a way to complete this. Ultimately, i want to do a stacked bar chart with the x axis being the expense type, the bars being the sum of the proportions and the different segments being the Trichot_1YR. 

 

The code that I was trying to do was the following:

 

data Proportion3;
set Proportion2;
if Value = 0 then Value = 1;
run;
%macro Val(Exp);
data Proportion3;
set Proportion3;
If TimeFrame = 'PreResYr' then PreVal = Value;
If TimeFrame = 'PostResYr' then PostVal = Value;
if ExpenseType = '&Exp' then Proportion = PostVal/PreVal ;
run;
%mend;
%Val(House);
%Val(Consume);
%Val(Durable);
%Val(Trans);
%Val(Health);
%Val(Insure);
%Val(Loan);
%Val(Tax);
%Val(Educa);
%Val(Gift);
%Val(Enter);
%Val(Recre);
%Val(FinPlan);
%Val(Legala);
%Val(AlcSpend);
%Val(IncomeTL);

However, I cannot populate "Proportion" because the PreVal and PostVal are not on the same row.

 

I have managed to create it the long way by creating a new variable for each ExpenseType such that PostResYr_House/PreResYr_House = PropHouse

 

However, I cannot put each of the variables in the X-axis of my stacked bar chart.

 

Any help, resources, or direction would be appreciated!

 

My thoughts as of now are to make two different datasets for Pre and Post and then merge them for the proportion maybe using Proc sql as it seems easier to join?

 

Let me know if you need clarification or any additional information.


 

joebacon
Pyrite | Level 9

As always, big thank you Reeza!

 

I didn't know Proc Transpose could function in this fashion. That's a super simple fix to my problem!

 

Will fix my macro and add a Retain Statement!

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1786 views
  • 3 likes
  • 3 in conversation