Hi all,
I would like to recreate a graph that I saw in a journal. The best I could do was get a picture of it. I have data with about 20 different expenditure categories, and for each one I have computed the ratio PostResolution/PreResolution.
I am looking for guidance on how to create a graph similar to the one below, with the categories listed down the side and horizontal box plots arranged around a center line at 1. A ratio above 1 means an increase in spending and a ratio below 1 means a decrease. There are some pretty severe outliers, and I would need to suppress those as well.
I would like to also do this with means.
Here is a sample of the data I have been working with:
data WORK.NEWPROPYES;
/* The values below are separated by spaces, so DSD (which implies comma delimiters) is dropped */
infile datalines truncover;
/* DSFMT. looks like a user-defined format that was not posted, so TRICHOT_1YR is read as text here */
input PID:BEST. STUDYNO:32. TRICHOT_1YR:$3. ExpenseType:$20. Full_ALC_Exp:BEST12. PostResYr:32. PreResYr:32. Ratio:32.;
format PID BEST. Full_ALC_Exp BEST12.;
label PID="SUBJID#" STUDYNO="Sample size: 55,144,41,185,191, Total:616" TRICHOT_1YR="RA 273, UR 140, RNA 80, Missing 123" Full_ALC_Exp="1 year complete TLFB alcohol and spending data as well: 1=Yes, 0=No (number of yes is 412)";
datalines;
486 2 RA AlcSpend 1 1 5109.8931528 0.0001956988
495 2 RNA AlcSpend 1 1 992.86168868 0.0010071896
496 2 RNA AlcSpend 1 5.49076816 136.7768901 0.0401439758
533 2 RA AlcSpend 1 1 2093.6517299 0.0004776344
557 2 RNA AlcSpend 1 253.605709 12281.521094 0.0206493729
688 2 RNA AlcSpend 1 16.762896344 108.83831275 0.1540165032
724 2 UR AlcSpend 1 287.32751341 708.61033897 0.4054802726
831 2 RA AlcSpend 1 1 1787.5024202 0.0005594398
907 2 UR AlcSpend 1 125.43157058 4161.9212386 0.0301379011
936 2 RNA AlcSpend 1 28.566739653 1236.7097054 0.0230989856
992 2 RA AlcSpend 1 1 9974.247622 0.0001002582
1034 2 RA AlcSpend 1 82.908408353 2348.3245172 0.0353053455
1035 2 RA AlcSpend 1 30.32631878 1000.1959156 0.0303203785
1051 2 RA AlcSpend 1 1 375.23068 0.0026650273
1065 2 RA AlcSpend 1 22.994739085 6093.8047934 0.0037734617
1074 2 UR AlcSpend 1 1188.8332159 4810.0139303 0.2471579569
1078 2 RA AlcSpend 1 1 7103.8983064 0.0001407678
1112 2 RNA AlcSpend 1 14.930001421 315.3537712 0.047343659
1119 2 UR AlcSpend 1 470.22110048 3039.7531216 0.154690556
1210 2 UR AlcSpend 1 71.823059854 1019.655911 0.0704385264
;;;;
This is very similar to a forest plot; perhaps this will get you started:
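As one possible starting point (a minimal sketch, not necessarily the approach Reeza had in mind or the method behind the journal figure), PROC SGPLOT can draw horizontal box plots by category with a reference line at 1, and a summary data set from PROC MEANS can feed a forest-style plot of means. The sketch assumes the WORK.NEWPROPYES data set with the Ratio and ExpenseType variables from the question. The names work.ratio_means, mean_ratio, lower, and upper are hypothetical, and the log-scaled axis is an assumption that only suits strictly positive ratios.

```sas
/* Box plots of Ratio for each expense category, with a reference line at 1 */
proc sgplot data=work.newpropyes;
    /* NOOUTLIERS hides the outlier markers */
    hbox Ratio / category=ExpenseType nooutliers;
    refline 1 / axis=x lineattrs=(pattern=dash);
    /* Assumption: a log axis keeps ratios such as 0.5 and 2 the same distance from 1 */
    xaxis type=log logbase=10 label="Post/Pre spending ratio";
    yaxis label="Expenditure category";
run;

/* Means version: summarize first, then plot means with 95% confidence limits */
proc means data=work.newpropyes noprint nway;
    class ExpenseType;
    var Ratio;
    /* Hypothetical output names: mean_ratio, lower, upper */
    output out=work.ratio_means mean=mean_ratio lclm=lower uclm=upper;
run;

proc sgplot data=work.ratio_means;
    scatter x=mean_ratio y=ExpenseType /
            xerrorlower=lower xerrorupper=upper
            markerattrs=(symbol=squarefilled);
    refline 1 / axis=x lineattrs=(pattern=dash);
    xaxis label="Mean Post/Pre spending ratio (95% CI)";
    yaxis label="Expenditure category" discreteorder=data;
run;
```

If the box plot axis still looks stretched after the outliers are hidden, the MIN= and MAX= options on the XAXIS statement may help narrow the displayed range.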
You don't know if you can do something until you try 🙂
@joebacon wrote:
Thank you! This is definitely where I need to go. I am in way over my head, but we're going to figure it out. Just one piece of code at a time!
Appreciate you, Reeza.