Hi all,
I would really like to recreate a graph I saw in a journal; the best I could do was take a picture of it. My data cover about 20 expenditure categories, and for each one I have computed the ratio PostResolution/PreResolution.
I am looking for guidance on creating a graph similar to that one: the categories listed down the side, with a box plot for each positioned around a center line at 1, where values above 1 mean spending increased and values below 1 mean spending decreased. There are some severe outliers, so I would also need to suppress those.
I would also like to make the same graph using means.
Here is a sample of the data I have been working with:
data WORK.NEWPROPYES;
/* Sample values are space-delimited, so DSD (comma-delimited input) is dropped */
infile datalines truncover;
/* TRICHOT_1YR is read as character (RA, RNA, UR) because the user-defined DSFMT. format is not included; Ratio = PostResYr / PreResYr */
input PID:32. STUDYNO:32. TRICHOT_1YR:$3. ExpenseType:$20. Full_ALC_Exp:32. PostResYr:32. PreResYr:32. Ratio:32.;
format PID best. Full_ALC_Exp best12.;
label PID="SUBJID#" STUDYNO="Sample size: 55,144,41,185,191, Total:616" TRICHOT_1YR="RA 273, UR 140, RNA 80, Missing 123" Full_ALC_Exp="1 year complete TLFB alcohol and spending data as well: 1=Yes, 0=No (number of yes is 412)";
datalines;
486 2 RA AlcSpend 1 1 5109.8931528 0.0001956988
495 2 RNA AlcSpend 1 1 992.86168868 0.0010071896
496 2 RNA AlcSpend 1 5.49076816 136.7768901 0.0401439758
533 2 RA AlcSpend 1 1 2093.6517299 0.0004776344
557 2 RNA AlcSpend 1 253.605709 12281.521094 0.0206493729
688 2 RNA AlcSpend 1 16.762896344 108.83831275 0.1540165032
724 2 UR AlcSpend 1 287.32751341 708.61033897 0.4054802726
831 2 RA AlcSpend 1 1 1787.5024202 0.0005594398
907 2 UR AlcSpend 1 125.43157058 4161.9212386 0.0301379011
936 2 RNA AlcSpend 1 28.566739653 1236.7097054 0.0230989856
992 2 RA AlcSpend 1 1 9974.247622 0.0001002582
1034 2 RA AlcSpend 1 82.908408353 2348.3245172 0.0353053455
1035 2 RA AlcSpend 1 30.32631878 1000.1959156 0.0303203785
1051 2 RA AlcSpend 1 1 375.23068 0.0026650273
1065 2 RA AlcSpend 1 22.994739085 6093.8047934 0.0037734617
1074 2 UR AlcSpend 1 1188.8332159 4810.0139303 0.2471579569
1078 2 RA AlcSpend 1 1 7103.8983064 0.0001407678
1112 2 RNA AlcSpend 1 14.930001421 315.3537712 0.047343659
1119 2 UR AlcSpend 1 470.22110048 3039.7531216 0.154690556
1210 2 UR AlcSpend 1 71.823059854 1019.655911 0.0704385264
;;;;
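A minimal sketch of the box-plot version, assuming SAS 9.4 or later (for the NOOUTLIERS option of the HBOX statement) and the WORK.NEWPROPYES data set above, with all expense categories stacked in ExpenseType. Note that NOOUTLIERS only hides the outlier markers; the observations are still used to compute the box statistics.

```sas
/* Horizontal box plot of Ratio for each expense category,   */
/* drawn around a reference line at 1 (no change in spending) */
proc sgplot data=work.newpropyes;
  /* One box per ExpenseType; outlier markers are not displayed */
  hbox Ratio / category=ExpenseType nooutliers;
  /* Values > 1: spending increased; values < 1: spending decreased */
  refline 1 / axis=x lineattrs=(pattern=dash);
  xaxis label="PostResYr / PreResYr ratio";
  yaxis display=(nolabel);
run;
```

Because every row in the sample is AlcSpend, this draws a single box; with all 20 categories in ExpenseType it gives one row per category. Adding GROUP=TRICHOT_1YR to the HBOX statement is one way to split each category by outcome group.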
This is very similar to a forest plot; perhaps this will get you started:
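For the means version, a forest-plot-style sketch could compute a mean and 95% confidence limits per category and draw them as points with error bars around the reference line at 1. Because means are pulled by extreme ratios, this sketch first drops values outside Tukey's 1.5×IQR fences within each category; that trimming rule is an assumption, not something taken from the journal figure, so substitute whatever rule your analysis calls for. The data set names WORK.RATIO_Q, WORK.RATIO_SORTED, WORK.RATIO_TRIMMED and WORK.RATIO_MEANS are hypothetical.

```sas
/* Assumes WORK.NEWPROPYES from the question; output data set names are hypothetical */

/* 1. First and third quartiles of Ratio within each expense category */
proc means data=work.newpropyes noprint nway;
  class ExpenseType;
  var Ratio;
  output out=work.ratio_q (drop=_type_ _freq_) q1=Q1 q3=Q3;
run;

/* 2. Sort the detail rows so they can be merged with the quartiles */
proc sort data=work.newpropyes out=work.ratio_sorted;
  by ExpenseType;
run;

/* 3. Keep only ratios inside the 1.5 x IQR fences (assumed trimming rule) */
data work.ratio_trimmed;
  merge work.ratio_sorted work.ratio_q;
  by ExpenseType;
  IQR = Q3 - Q1;
  if Q1 - 1.5*IQR <= Ratio <= Q3 + 1.5*IQR;
run;

/* 4. Mean and 95% confidence limits of the trimmed ratios per category */
proc means data=work.ratio_trimmed noprint nway;
  class ExpenseType;
  var Ratio;
  output out=work.ratio_means (drop=_type_ _freq_)
         n=N mean=MeanRatio lclm=LowerCL uclm=UpperCL;
run;

/* 5. Forest-plot-style display: point = mean, bar = 95% CL, line at 1 = no change */
proc sgplot data=work.ratio_means noautolegend;
  scatter y=ExpenseType x=MeanRatio / xerrorlower=LowerCL xerrorupper=UpperCL
          markerattrs=(symbol=squarefilled);
  refline 1 / axis=x lineattrs=(pattern=dash);
  xaxis label="Mean PostResYr / PreResYr ratio (95% CL)";
  yaxis display=(nolabel);
run;
```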
You don't know if you can do something until you try 🙂
@joebacon wrote:
Thank you! This is definitely where I need to go. I am in way over my head, but we're going to figure it out. Just one piece of code at a time!
Appreciate you, Reeza.