Dear all,
anyone can help with this, the idea is create a variable label, with the SOC variable nested to the PTERM variable, and sorting outSOC and PTERM by descending frequency:, the idea is this:
The idea is a variable LABEL taking account:
SOC (Family) (High frequency) GASTRO 41
pterm (subfamily) (High frequency) constipation 16
nausea 14
abdominal disconfort 12
METABOLISM
.
.
.pterm(low frequency)
SOC (Low Frequency)
.
.
.pterm(low frequency)
data have1;
length soc pterm $20. npterm nsoc 8.;
input soc $ pterm $ npterm nsoc;
cards;
cardiacdisorder acute 6 21
cardiacdisorder angina 7 21
cardiacdisorder coronary 8 21
gastro constipation 16 16
investigations headache 3 9
investigations backpain 4 9
investigations leftpain 2 9
eye visionblurred 1 1
;
run;
proc sort data=have1;by descending nsoc descending npterm;run;
data want1;
set have1;
by descending nsoc descending npterm;
if first.nsoc then do;
label=upcase(soc);
output;
end;
if first.npterm then do;
label=pterm;
output;
end;
run;
data have2;
length soc pterm $20. npterm2 nsoc2 8.;
input soc $ pterm $ npterm2 nsoc2;
cards;
gastro abdominalpain 15 41
gastro abdominaldisconfort 12 41
gastro nausea 14 41
eye eyepain 5 9
eye ocularpathy 4 9
Metabolism dehydratation 20 39
metabolism decreasedapetite 19 39
cardiacdisorder miocardio 6 8
investigations neckpain 1 1
run;
proc sort data=have2;by descending nsoc2 descending npterm2;run;
data want2;
set have2;
by descending nsoc2 descending npterm2;
if first.nsoc2 then do;
label=upcase(soc);
output;
end;
if first.npterm2 then do;
label=pterm;
output;
end;
run;
proc sql noprint;
create table wanted as
select a.*,b.nsoc2,b.npterm2
from want1 as a full join want2 as b
on a.label=b.label
order by nsoc;
quit;
I think you might still have issues with combining this data. What do you want to do with preferred terms that are in both source files?
Let's assume that you never have that case, or if you do that you want to keep both records.
So combine the files FIRST, then find the maximum NSOC so that you can re-order it and apply your rules for generating the "label" variable.
Something like this:
proc sql noprint ;
create table report1 as
select
max(nsoc) as nsoc
,npterm
,soc
,pterm
from (select * from have1 union select * from have2)
group by soc
order by 1 desc, 2 desc
;
quit;
data want;
set report1 ;
by descending nsoc ;
if first.nsoc then do;
label=soc; count=nsoc; output;
end;
label=pterm; count=npterm; output;
run;
proc print;
var label count;
run;
Not sure I understand the question. But why are you processing by the COUNTS?
data want1;
set have1;
by descending nsoc descending npterm;
if first.nsoc then do;
label=upcase(soc);
output;
end;
if first.npterm then do;
label=pterm;
output;
end;
run;
Do you really want to combine terms just because they had the same number of occurrences?
Hi Tom, because I need to, Alphabetically is the easy one, but I need to do it in the descending frequency in SOC and PTERM...
I need the sorting out in descending frequency for SOC and PTERM, to finally after merging the two datasets (have1, and have2), to get something like that:
GASTRO 41
constipation 16
abdominalpain 15 *from dataset have2
nausea 14 *from dataset have2
abdominaldisconfort 12 *from dataset have2
METABOLISM 39 *from dataset have 2
dehydratation 20 *from dataset have2
decreasedapetite 19 *from dataset have2
CARDIACDISORDER 21
coronary 8
angina 7
acute 6
miocardio 6 * from dataset have2
Cheers, and thank you
I think you might still have issues with combining this data. What do you want to do with preferred terms that are in both source files?
Let's assume that you never have that case, or if you do that you want to keep both records.
So combine the files FIRST, then find the maximum NSOC so that you can re-order it and apply your rules for generating the "label" variable.
Something like this:
proc sql noprint ;
create table report1 as
select
max(nsoc) as nsoc
,npterm
,soc
,pterm
from (select * from have1 union select * from have2)
group by soc
order by 1 desc, 2 desc
;
quit;
data want;
set report1 ;
by descending nsoc ;
if first.nsoc then do;
label=soc; count=nsoc; output;
end;
label=pterm; count=npterm; output;
run;
proc print;
var label count;
run;
Thank you for this Tom, but...like you say in the email, What do you want to do with preferred terms that are in both source files?
This is the problem I really I am interested:
If we twist a bit have1 (soc,pterm,npterm,nsoc), and have2 (soc,pterm,npterm2,nsoc2), where we have GASTRO(constipation) and INVESTIGATIONS(lefpain) in the two datasets, i am interesting to a final dataset with (label,nsoc,npterm,nsoc2,npterm2).
Culd you help me with this?
data have1;
length soc pterm $20. npterm nsoc 8.;
input soc $ pterm $ npterm nsoc;
cards;
cardiacdisorder acute 6 21
cardiacdisorder angina 7 21
cardiacdisorder coronary 8 21
gastro constipation 16 16
investigations headache 3 9
investigations backpain 4 9
investigations leftpain 2 9
eye visionblurred 1 6
eye nausea 5 6
;
run;
data have2;
length soc pterm $20. npterm2 nsoc2 8.;
input soc $ pterm $ npterm2 nsoc2;
cards;
gastro abdominalpain 15 41
gastro abdominaldisconfort 12 41
gastro nausea 14 41
gastro constipation 18 18
eye eyepain 5 9
eye ocularpathy 4 9
metabolism dehydratation 20 39
metabolism decreasedapetite 19 39
cardiacdisorder miocardio 6 6
investigations neckpain 1 4
investigations leftpain 3 4
;
run;
Finally, this is the kind of final dataset I want to get:
I want to keep label, np1 and np2:
label np1 np2
GASTRO 16 59
constipations 1 6 18
abdominalpain 15
nausea 14
abdominaldisconfort 12
INVESTIGATIONS 9 4
backpain 4
headache 3
leftpain 2 3
neckpain 1
Thank you.
sorry this is wrong: I am interesting to a final dataset with (label,nsoc,npterm,nsoc2,npterm2)...I am interesting in the above dataset, label, np1 and np2
sorry Tom...it was an easy twist...Thank you for all your help.
proc sql noprint ;
create table report1 as
select
max(nsoc) as nsoc
,max(nsoc2) as nsoc2
,npterm
,npterm2
,soc
,pterm
from temp
group by soc,pterm
order by 1 desc, 2 desc
;
quit;
data want (keep=label count1 count2);
set report1 ;
by descending nsoc ;
if first.nsoc then do;
label=soc; count1=nsoc;count2=nsoc2; output;
end;
label=pterm; count1=npterm;count2=npterm2; output;
run;
proc print data=want;
var label count1 count2;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.