My client has asked me for a specific ordering of their output, but I am having a hard time figuring out how to do this programatically.
I have a Medical History (MH) dataset, and I am supposed to count the number of subjects in each SOC/PT MH term grouped by an assigned diagnosis. Then I keep any terms that have 10% or more in either diagnosis. This part I have been able to accomplish.
Next, they want me to sort by the HKD diagnosis percent (HKD_PCT) and keep the first SOC, but then any subsequest terms that have the same SOC should be moved up in order to be together.
Here is what my summary dataset currently looks like when sorted by HKD_PCT:
| MHSOC | MHPT | HKD_CNT | CKD_CNT | HKD_PCT | CKD_PCT | HKD | CKD |
| Renal and urinary disorders | Hypertensive nephropathy | 35 | . | 0.94595 | . | 35 (95%) | 0 (0%) |
| Vascular disorders | Hypertension | 30 | 71 | 0.81081 | 0.76344 | 30 (81%) | 71 (76%) |
| Renal and urinary disorders | Chronic kidney disease | 14 | 90 | 0.37838 | 0.96774 | 14 (38%) | 90 (97%) |
| Metabolism and nutrition disorders | Vitamin D deficiency | 12 | 22 | 0.32432 | 0.23656 | 12 (32%) | 22 (24%) |
| Metabolism and nutrition disorders | Hyperlipidaemia | 11 | 30 | 0.2973 | 0.32258 | 11 (30%) | 30 (32%) |
| Metabolism and nutrition disorders | Obesity | 9 | 14 | 0.24324 | 0.15054 | 9 (24%) | 14 (15%) |
| Metabolism and nutrition disorders | Dyslipidaemia | 8 | 9 | 0.21622 | 0.09677 | 8 (22%) | 9 (10%) |
| Renal and urinary disorders | Acute kidney injury | 5 | 3 | 0.13514 | 0.03226 | 5 (14%) | 3 (3%) |
| Metabolism and nutrition disorders | Gout | 3 | 22 | 0.08108 | 0.23656 | 3 (8%) | 22 (24%) |
| Vascular disorders | Essential hypertension | 1 | 11 | 0.02703 | 0.11828 | 1 (3%) | 11 (12%) |
SAS Code to create the above table:
proc sql;
create table MH_SOC_SUMMARY (MHBODSYS char(100), MHDECOD char(100), HKD_CNT num, CKD_CNT num, HKD_PCT num, CKD_PCT num, HKD char(25), CKD char(25));
insert into mh_soc_summary(mhbodsys, mhdecod, hkd_cnt, ckd_cnt, hkd_pct, ckd_pct)
values('Renal and urinary disorders','Hypertensive nephropathy',35,.,0.94595,.)
values('Vascular disorders','Hypertension',30,71,0.81081,0.76344)
values('Renal and urinary disorders','Chronic kidney disease',14,90,0.37838,0.96774)
values('Metabolism and nutrition disorders','Vitamin D deficiency',12,22,0.32432,0.23656)
values('Metabolism and nutrition disorders','Hyperlipidaemia',11,30,0.2973,0.32258)
values('Metabolism and nutrition disorders','Obesity',9,14,0.24324,0.15054)
values('Metabolism and nutrition disorders','Dyslipidaemia',8,9,0.21622,0.09677)
values('Renal and urinary disorders','Acute kidney injury',5,3,0.13514,0.03226)
values('Metabolism and nutrition disorders','Gout',3,22,0.08108,0.23656)
values('Vascular disorders','Essential hypertension',1,11,0.02703,0.11828);
quit;
data mh_soc_summary;
set mh_soc_summary;
if hkd_cnt^=. then HKD=compress(put(hkd_cnt,9.0)) || ' (' || compress(put(hkd_pct,percent9.0)) || ')';
else HKD='0 (0%)';
if ckd_cnt^=. then CKD=compress(put(ckd_cnt,9.0)) || ' (' || compress(put(ckd_pct,percent9.0)) || ')';
else CKD='0 (0%)';
run;
This is the order that the table should show (where I manually added SRT1 and SRT2 variable values):
| MHSOC | MHPT | HKD_CNT | CKD_CNT | HKD_PCT | CKD_PCT | HKD | CKD | SRT1 | SRT2 |
| Renal and urinary disorders | Hypertensive nephropathy | 35 | . | 0.94595 | . | 35 (95%) | 0 (0%) | 1 | 1 |
| Renal and urinary disorders | Chronic kidney disease | 14 | 90 | 0.37838 | 0.96774 | 14 (38%) | 90 (97%) | 1 | 2 |
| Renal and urinary disorders | Acute kidney injury | 5 | 3 | 0.13514 | 0.03226 | 5 (14%) | 3 (3%) | 1 | 3 |
| Vascular disorders | Hypertension | 30 | 71 | 0.81081 | 0.76344 | 30 (81%) | 71 (76%) | 2 | 1 |
| Vascular disorders | Essential hypertension | 1 | 11 | 0.02703 | 0.11828 | 1 (3%) | 11 (12%) | 2 | 2 |
| Metabolism and nutrition disorders | Vitamin D deficiency | 12 | 22 | 0.32432 | 0.23656 | 12 (32%) | 22 (24%) | 3 | 1 |
| Metabolism and nutrition disorders | Hyperlipidaemia | 11 | 30 | 0.2973 | 0.32258 | 11 (30%) | 30 (32%) | 3 | 2 |
| Metabolism and nutrition disorders | Obesity | 9 | 14 | 0.24324 | 0.15054 | 9 (24%) | 14 (15%) | 3 | 3 |
| Metabolism and nutrition disorders | Dyslipidaemia | 8 | 9 | 0.21622 | 0.09677 | 8 (22%) | 9 (10%) | 3 | 4 |
| Metabolism and nutrition disorders | Gout | 3 | 22 | 0.08108 | 0.23656 | 3 (8%) | 22 (24%) | 3 | 5 |
Since SOC: "Renal and urinary disorders" and PT: "Hypertensive nephropathy" has the highest HKD percent, it should be first. Then the other 2 subsequent "Renal and urinary disorders" terms should be right after it.
SOC: "Vascular disorders" and PT: "Hypertension" is the next highest HKD percent, it will be the next SOC with all other "Vascular disorders" terms right after it.
I can't just hardcode this because the report gets updated weekly, and these terms could change with each update. I can't think of how to utilize first.MHSOC or any other method that I would typically use. Any suggestions on how to achieve the SRT1 and SRT2 variables I manually added above?
It is much easier to share your data as a simple data step. If you already have an example dataset you created you can use this %DSPOST() macro to help generate such a step.
data MH_SOC_SUMMARY ;
infile datalines dsd dlm='|' truncover;
input MHSOC :$100. MHPT :$100. HKD_CNT CKD_CNT HKD_PCT CKD_PCT ;
length HKD CKD $21 ;
if missing(hkd_cnt) then HKD='0 (0%)';
else HKD=catx(' ',put(hkd_cnt,percent9.),cats('(',put(hkd_pct,percent9.),')'));
if missing(ckd_cnt) then CKD='0 (0%)';
else CKD=catx(' ',put(ckd_cnt,percent9.),cats('(',put(ckd_pct,percent9.),')'));
datalines4;
Renal and urinary disorders|Hypertensive nephropathy|35||0.94595|
Vascular disorders|Hypertension|30|71|0.81081|0.76344
Renal and urinary disorders|Chronic kidney disease|14|90|0.37838|0.96774
Metabolism and nutrition disorders|Vitamin D deficiency|12|22|0.32432|0.23656
Metabolism and nutrition disorders|Hyperlipidaemia|11|30|0.2973|0.32258
Metabolism and nutrition disorders|Obesity|9|14|0.24324|0.15054
Metabolism and nutrition disorders|Dyslipidaemia|8|9|0.21622|0.09677
Renal and urinary disorders|Acute kidney injury|5|3|0.13514|0.03226
Metabolism and nutrition disorders|Gout|3|22|0.08108|0.23656
Vascular disorders|Essential hypertension|1|11|0.02703|0.11828
;;;;
Your description is missing a step, which probably explains why your are having a problem coding it. To sort the way you want to all of the values of MHSOC should have same value of HDK_PCT. But your data does not.
So perhaps you mean you want to sort by the MAXimum value of HKD_PCT per MHSOC? Or perhaps the SUM of the HKD_CNT variable?
If so then first add that variable to your data and then you can use it when ordering the data.
proc sql;
create table for_report as
select a.*, max(hkd_pct) as max_hkd_pct
from mh_soc_summary a
group by mhsoc
order by max_hkd_pct desc, mhsoc, hkd_pct desc
;
quit;
Please provide data as working SAS data step code (examples and instructions).
I'm sorry about that. I didn't verify my code ran clean before I posted it. I have updated the SQL code in the original post, but here is the same data in a DATA step for you:
data mh_soc_summary;
length MHSOC $100 MHPT $100 HKD_CNT 8 CKD_CNT 8 HKD_PCT 8 CKD_PCT 8;
mhsoc='Renal and urinary disorders';
mhpt='Hypertensive nephropathy';
hkd_cnt=35;
ckd_cnt=.;
hkd_pct=0.94595;
ckd_pct=.;
output;
mhsoc='Vascular disorders';
mhpt='Hypertension';
hkd_cnt=30;
ckd_cnt=71;
hkd_pct=0.81081;
ckd_pct=0.76344;
output;
mhsoc='Renal and urinary disorders';
mhpt='Chronic kidney disease';
hkd_cnt=14;
ckd_cnt=90;
hkd_pct=0.37838;
ckd_pct=0.96774;
output;
mhsoc='Metabolism and nutrition disorders';
mhpt='Vitamin D deficiency';
hkd_cnt=12;
ckd_cnt=22;
hkd_pct=0.32432;
ckd_pct=0.23656;
output;
mhsoc='Metabolism and nutrition disorders';
mhpt='Hyperlipidaemia';
hkd_cnt=11;
ckd_cnt=30;
hkd_pct=0.2973;
ckd_pct=0.32258;
output;
mhsoc='Metabolism and nutrition disorders';
mhpt='Obesity';
hkd_cnt=9;
ckd_cnt=14;
hkd_pct=0.24324;
ckd_pct=0.15054;
output;
mhsoc='Metabolism and nutrition disorders';
mhpt='Dyslipidaemia';
hkd_cnt=8;
ckd_cnt=9;
hkd_pct=0.21622;
ckd_pct=0.09677;
output;
mhsoc='Renal and urinary disorders';
mhpt='Acute kidney injury';
hkd_cnt=5;
ckd_cnt=3;
hkd_pct=0.13514;
ckd_pct=0.03226;
output;
mhsoc='Metabolism and nutrition disorders';
mhpt='Gout';
hkd_cnt=3;
ckd_cnt=22;
hkd_pct=0.08108;
ckd_pct=0.23656;
output;
mhsoc='Vascular disorders';
mhpt='Essential hypertension';
hkd_cnt=1;
ckd_cnt=11;
hkd_pct=0.02703;
ckd_pct=0.11828;
output;
run;
data mh_soc_summary;
set mh_soc_summary;
if hkd_cnt^=. then HKD=compress(put(hkd_cnt,9.0)) || ' (' || compress(put(hkd_pct,percent9.0)) || ')';
else HKD='0 (0%)';
if ckd_cnt^=. then CKD=compress(put(ckd_cnt,9.0)) || ' (' || compress(put(ckd_pct,percent9.0)) || ')';
else CKD='0 (0%)';
run;
There might be a simpler way, but the code below that I tried uses a PROC FREQ step to get the unique values of Mhsoc in data order. I store that in an output data set and then build a format adding the _n_ value to the Label variable. I then assign that format to a new variable that I can use in a PROC SORT.
data mh_soc_summary;
set mh_soc_summary;
if hkd_cnt^=. then HKD=compress(put(hkd_cnt,9.0)) || ' (' || compress(put(hkd_pct,percent9.0)) || ')';
else HKD='0 (0%)';
if ckd_cnt^=. then CKD=compress(put(ckd_cnt,9.0)) || ' (' || compress(put(ckd_pct,percent9.0)) || ')';
else CKD='0 (0%)';
run;
proc freq order=data noprint;
tables mhsoc / out=out(keep=mhsoc);
run;
data out;
set out;
label=_n_;
start=mhsoc;
fmtname='$soc';
run;
proc format cntlin=out fmtlib;
select $soc;
run;
data want;
set mh_soc_summary;
flag=put(mhsoc,$soc.);
run;
proc sort data=want out=final;
by flag;
run;
proc print data=final;
run;
It is much easier to share your data as a simple data step. If you already have an example dataset you created you can use this %DSPOST() macro to help generate such a step.
data MH_SOC_SUMMARY ;
infile datalines dsd dlm='|' truncover;
input MHSOC :$100. MHPT :$100. HKD_CNT CKD_CNT HKD_PCT CKD_PCT ;
length HKD CKD $21 ;
if missing(hkd_cnt) then HKD='0 (0%)';
else HKD=catx(' ',put(hkd_cnt,percent9.),cats('(',put(hkd_pct,percent9.),')'));
if missing(ckd_cnt) then CKD='0 (0%)';
else CKD=catx(' ',put(ckd_cnt,percent9.),cats('(',put(ckd_pct,percent9.),')'));
datalines4;
Renal and urinary disorders|Hypertensive nephropathy|35||0.94595|
Vascular disorders|Hypertension|30|71|0.81081|0.76344
Renal and urinary disorders|Chronic kidney disease|14|90|0.37838|0.96774
Metabolism and nutrition disorders|Vitamin D deficiency|12|22|0.32432|0.23656
Metabolism and nutrition disorders|Hyperlipidaemia|11|30|0.2973|0.32258
Metabolism and nutrition disorders|Obesity|9|14|0.24324|0.15054
Metabolism and nutrition disorders|Dyslipidaemia|8|9|0.21622|0.09677
Renal and urinary disorders|Acute kidney injury|5|3|0.13514|0.03226
Metabolism and nutrition disorders|Gout|3|22|0.08108|0.23656
Vascular disorders|Essential hypertension|1|11|0.02703|0.11828
;;;;
Your description is missing a step, which probably explains why your are having a problem coding it. To sort the way you want to all of the values of MHSOC should have same value of HDK_PCT. But your data does not.
So perhaps you mean you want to sort by the MAXimum value of HKD_PCT per MHSOC? Or perhaps the SUM of the HKD_CNT variable?
If so then first add that variable to your data and then you can use it when ordering the data.
proc sql;
create table for_report as
select a.*, max(hkd_pct) as max_hkd_pct
from mh_soc_summary a
group by mhsoc
order by max_hkd_pct desc, mhsoc, hkd_pct desc
;
quit;
Dive into keynotes, announcements and breakthroughs on demand.
Explore Now →Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.