BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
michtka
Fluorite | Level 6

Dear all,

anyone can help with this, the idea is create a variable label, with the SOC variable nested to the PTERM variable, and sorting outSOC and PTERM  by descending frequency:, the idea is this:

The idea is a variable LABEL taking account:

SOC (Family) (High frequency)  GASTRO 41

pterm (subfamily) (High frequency) constipation 16

                                                    nausea 14

                                                    abdominal disconfort 12

                                                    METABOLISM

.                                                  

.

.pterm(low frequency)

SOC (Low Frequency)

pterm (subfamily) (High frequency)
I think need to be something like merge many many, i try with join FULL but i ma missing something here.
Thanks in advance.

.

.

.pterm(low frequency)

data have1;

length soc pterm $20. npterm nsoc 8.;

input soc $ pterm $ npterm nsoc;

cards;

cardiacdisorder  acute      6     21

cardiacdisorder  angina     7     21

cardiacdisorder  coronary   8     21

gastro           constipation 16  16

investigations   headache    3     9

investigations   backpain    4     9

investigations   leftpain    2     9

eye              visionblurred 1   1

;

run;

proc sort data=have1;by descending nsoc descending npterm;run;

data want1;

set have1;

by descending nsoc descending npterm;

if first.nsoc then do;

label=upcase(soc);

output;

end;

if first.npterm then do;

label=pterm;

output;

end;

run;

data have2;

length soc pterm $20. npterm2 nsoc2 8.;

input soc $ pterm $ npterm2 nsoc2;

cards;

gastro           abdominalpain        15     41    

gastro           abdominaldisconfort  12     41

gastro           nausea               14     41

eye              eyepain              5       9

eye              ocularpathy          4       9

Metabolism       dehydratation        20     39

metabolism       decreasedapetite     19     39

cardiacdisorder  miocardio            6       8

investigations   neckpain             1       1

run;

proc sort data=have2;by descending nsoc2 descending npterm2;run;

data want2;

set have2;

by descending nsoc2 descending npterm2;

if first.nsoc2 then do;

label=upcase(soc);

output;

end;

if first.npterm2 then do;

label=pterm;

output;

end;

run;

proc sql noprint;

create table wanted as

select a.*,b.nsoc2,b.npterm2

from want1 as a full join want2 as b

on a.label=b.label

order by nsoc;

quit;

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

I think you might still have issues with combining this data. What do you want to do with preferred terms that are in both source files?

Let's assume that you never have that case, or if you do that you want to keep both records.

So combine the files FIRST, then find the maximum NSOC so that you can re-order it and apply your rules for generating the "label" variable.

Something like this:

proc sql noprint ;

  create table report1 as

    select

      max(nsoc) as nsoc

     ,npterm

     ,soc

     ,pterm

    from (select * from have1 union select * from have2)

    group by soc

    order by 1 desc, 2 desc

  ;

quit;

data want;

  set report1 ;

  by descending nsoc ;

  if first.nsoc then do;

    label=soc; count=nsoc; output;

  end;

  label=pterm; count=npterm; output;

run;

proc print;

  var label count;

run;

View solution in original post

6 REPLIES 6
Tom
Super User Tom
Super User

Not sure I understand the question.  But why are you processing by the COUNTS? 

data want1;

set have1;

by descending nsoc descending npterm;

if first.nsoc then do;

label=upcase(soc);

output;

end;

if first.npterm then do;

label=pterm;

output;

end;

run;

Do you really want to combine terms just because they had the same number of occurrences?

michtka
Fluorite | Level 6

Hi Tom, because I need to, Alphabetically is the easy one, but I need to do it in the descending frequency in SOC and PTERM...

I need the sorting out in descending frequency for SOC and PTERM, to finally after merging the two datasets (have1, and have2), to get something like that:

GASTRO                 41

constipation             16 

abdominalpain          15     *from dataset have2 

nausea                    14     *from dataset have2

abdominaldisconfort  12     *from dataset have2

METABOLISM         39      *from dataset have 2

dehydratation           20      *from dataset have2

decreasedapetite      19     *from dataset have2

CARDIACDISORDER 21

coronary                    8    

angina                        7    

acute                         6    

miocardio                   6      * from dataset have2



Cheers, and thank you

Tom
Super User Tom
Super User

I think you might still have issues with combining this data. What do you want to do with preferred terms that are in both source files?

Let's assume that you never have that case, or if you do that you want to keep both records.

So combine the files FIRST, then find the maximum NSOC so that you can re-order it and apply your rules for generating the "label" variable.

Something like this:

proc sql noprint ;

  create table report1 as

    select

      max(nsoc) as nsoc

     ,npterm

     ,soc

     ,pterm

    from (select * from have1 union select * from have2)

    group by soc

    order by 1 desc, 2 desc

  ;

quit;

data want;

  set report1 ;

  by descending nsoc ;

  if first.nsoc then do;

    label=soc; count=nsoc; output;

  end;

  label=pterm; count=npterm; output;

run;

proc print;

  var label count;

run;

michtka
Fluorite | Level 6

Thank you for this Tom, but...like you say in the email, What do you want to do with preferred terms that are in both source files?

This is the problem I really I am interested:

If we twist a bit have1 (soc,pterm,npterm,nsoc), and have2 (soc,pterm,npterm2,nsoc2), where we have GASTRO(constipation) and INVESTIGATIONS(lefpain) in the two datasets, i am interesting to a final dataset with (label,nsoc,npterm,nsoc2,npterm2).

Culd you help me with this?

data have1;

length soc pterm $20. npterm nsoc 8.;

input soc $ pterm $ npterm nsoc;

cards;

cardiacdisorder  acute      6     21

cardiacdisorder  angina     7     21

cardiacdisorder  coronary   8     21

gastro           constipation 16  16

investigations   headache    3     9

investigations   backpain    4     9

investigations   leftpain    2     9

eye              visionblurred 1   6

eye              nausea        5   6

;

run;

data have2;

length soc pterm $20. npterm2 nsoc2 8.;

input soc $ pterm $ npterm2 nsoc2;

cards;

gastro           abdominalpain        15     41    

gastro           abdominaldisconfort  12     41

gastro           nausea               14     41

gastro           constipation         18     18  

eye              eyepain              5       9

eye              ocularpathy          4       9

metabolism       dehydratation        20     39

metabolism       decreasedapetite     19     39

cardiacdisorder  miocardio            6       6

investigations   neckpain             1       4

investigations   leftpain             3       4

;

run;

Finally, this is the kind of final dataset I want to get: 

I want to keep label, np1 and np2:

label                           np1                 np2

GASTRO                     16                  59

constipations               1 6                18

abdominalpain                                  15

nausea                                             14 

abdominaldisconfort                          12

INVESTIGATIONS     9                       4

backpain                   4    

headache                  3                                             

leftpain                      2                      3    

neckpain                                           1  

Thank you.

michtka
Fluorite | Level 6

sorry this is wrong:  I am interesting to a final dataset with (label,nsoc,npterm,nsoc2,npterm2)...I am interesting in the above dataset, label, np1 and np2

michtka
Fluorite | Level 6

sorry Tom...it was an easy twist...Thank you for all your help.

proc sql noprint ;

  create table report1 as

    select

      max(nsoc) as nsoc

     ,max(nsoc2) as nsoc2

     ,npterm

  ,npterm2

     ,soc

     ,pterm

    from temp

    group by soc,pterm

    order by 1 desc, 2 desc

  ;

quit;

data want (keep=label count1 count2);

  set report1 ;

  by descending nsoc ;

  if first.nsoc then do;

    label=soc; count1=nsoc;count2=nsoc2; output;

  end;

  label=pterm; count1=npterm;count2=npterm2; output;

run;

proc print data=want;

 

  var label count1 count2;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1035 views
  • 3 likes
  • 2 in conversation