Help using Base SAS procedures

Merge of two datasets many many records

Accepted Solution Solved
Reply
Super Contributor
Posts: 301
Accepted Solution

Merge of two datasets many many records

Dear all,

anyone can help with this, the idea is create a variable label, with the SOC variable nested to the PTERM variable, and sorting outSOC and PTERM  by descending frequency:, the idea is this:

The idea is a variable LABEL taking account:

SOC (Family) (High frequency)  GASTRO 41

pterm (subfamily) (High frequency) constipation 16

                                                    nausea 14

                                                    abdominal disconfort 12

                                                    METABOLISM

.                                                  

.

.pterm(low frequency)

SOC (Low Frequency)

pterm (subfamily) (High frequency)
I think need to be something like merge many many, i try with join FULL but i ma missing something here.
Thanks in advance.

.

.

.pterm(low frequency)

data have1;

length soc pterm $20. npterm nsoc 8.;

input soc $ pterm $ npterm nsoc;

cards;

cardiacdisorder  acute      6     21

cardiacdisorder  angina     7     21

cardiacdisorder  coronary   8     21

gastro           constipation 16  16

investigations   headache    3     9

investigations   backpain    4     9

investigations   leftpain    2     9

eye              visionblurred 1   1

;

run;

proc sort data=have1;by descending nsoc descending npterm;run;

data want1;

set have1;

by descending nsoc descending npterm;

if first.nsoc then do;

label=upcase(soc);

output;

end;

if first.npterm then do;

label=pterm;

output;

end;

run;

data have2;

length soc pterm $20. npterm2 nsoc2 8.;

input soc $ pterm $ npterm2 nsoc2;

cards;

gastro           abdominalpain        15     41    

gastro           abdominaldisconfort  12     41

gastro           nausea               14     41

eye              eyepain              5       9

eye              ocularpathy          4       9

Metabolism       dehydratation        20     39

metabolism       decreasedapetite     19     39

cardiacdisorder  miocardio            6       8

investigations   neckpain             1       1

run;

proc sort data=have2;by descending nsoc2 descending npterm2;run;

data want2;

set have2;

by descending nsoc2 descending npterm2;

if first.nsoc2 then do;

label=upcase(soc);

output;

end;

if first.npterm2 then do;

label=pterm;

output;

end;

run;

proc sql noprint;

create table wanted as

select a.*,b.nsoc2,b.npterm2

from want1 as a full join want2 as b

on a.label=b.label

order by nsoc;

quit;


Accepted Solutions
Solution
‎05-12-2013 09:16 AM
Super User
Super User
Posts: 7,042

Re: Merge of two datasets many many records

I think you might still have issues with combining this data. What do you want to do with preferred terms that are in both source files?

Let's assume that you never have that case, or if you do that you want to keep both records.

So combine the files FIRST, then find the maximum NSOC so that you can re-order it and apply your rules for generating the "label" variable.

Something like this:

proc sql noprint ;

  create table report1 as

    select

      max(nsoc) as nsoc

     ,npterm

     ,soc

     ,pterm

    from (select * from have1 union select * from have2)

    group by soc

    order by 1 desc, 2 desc

  ;

quit;

data want;

  set report1 ;

  by descending nsoc ;

  if first.nsoc then do;

    label=soc; count=nsoc; output;

  end;

  label=pterm; count=npterm; output;

run;

proc print;

  var label count;

run;

View solution in original post


All Replies
Super User
Super User
Posts: 7,042

Re: Merge of two datasets many many records

Not sure I understand the question.  But why are you processing by the COUNTS? 

data want1;

set have1;

by descending nsoc descending npterm;

if first.nsoc then do;

label=upcase(soc);

output;

end;

if first.npterm then do;

label=pterm;

output;

end;

run;

Do you really want to combine terms just because they had the same number of occurrences?

Super Contributor
Posts: 301

Re: Merge of two datasets many many records

Hi Tom, because I need to, Alphabetically is the easy one, but I need to do it in the descending frequency in SOC and PTERM...

I need the sorting out in descending frequency for SOC and PTERM, to finally after merging the two datasets (have1, and have2), to get something like that:

GASTRO                 41

constipation             16 

abdominalpain          15     *from dataset have2 

nausea                    14     *from dataset have2

abdominaldisconfort  12     *from dataset have2

METABOLISM         39      *from dataset have 2

dehydratation           20      *from dataset have2

decreasedapetite      19     *from dataset have2

CARDIACDISORDER 21

coronary                    8    

angina                        7    

acute                         6    

miocardio                   6      * from dataset have2



Cheers, and thank you

Solution
‎05-12-2013 09:16 AM
Super User
Super User
Posts: 7,042

Re: Merge of two datasets many many records

I think you might still have issues with combining this data. What do you want to do with preferred terms that are in both source files?

Let's assume that you never have that case, or if you do that you want to keep both records.

So combine the files FIRST, then find the maximum NSOC so that you can re-order it and apply your rules for generating the "label" variable.

Something like this:

proc sql noprint ;

  create table report1 as

    select

      max(nsoc) as nsoc

     ,npterm

     ,soc

     ,pterm

    from (select * from have1 union select * from have2)

    group by soc

    order by 1 desc, 2 desc

  ;

quit;

data want;

  set report1 ;

  by descending nsoc ;

  if first.nsoc then do;

    label=soc; count=nsoc; output;

  end;

  label=pterm; count=npterm; output;

run;

proc print;

  var label count;

run;

Super Contributor
Posts: 301

Re: Merge of two datasets many many records

Thank you for this Tom, but...like you say in the email, What do you want to do with preferred terms that are in both source files?

This is the problem I really I am interested:

If we twist a bit have1 (soc,pterm,npterm,nsoc), and have2 (soc,pterm,npterm2,nsoc2), where we have GASTRO(constipation) and INVESTIGATIONS(lefpain) in the two datasets, i am interesting to a final dataset with (label,nsoc,npterm,nsoc2,npterm2).

Culd you help me with this?

data have1;

length soc pterm $20. npterm nsoc 8.;

input soc $ pterm $ npterm nsoc;

cards;

cardiacdisorder  acute      6     21

cardiacdisorder  angina     7     21

cardiacdisorder  coronary   8     21

gastro           constipation 16  16

investigations   headache    3     9

investigations   backpain    4     9

investigations   leftpain    2     9

eye              visionblurred 1   6

eye              nausea        5   6

;

run;

data have2;

length soc pterm $20. npterm2 nsoc2 8.;

input soc $ pterm $ npterm2 nsoc2;

cards;

gastro           abdominalpain        15     41    

gastro           abdominaldisconfort  12     41

gastro           nausea               14     41

gastro           constipation         18     18  

eye              eyepain              5       9

eye              ocularpathy          4       9

metabolism       dehydratation        20     39

metabolism       decreasedapetite     19     39

cardiacdisorder  miocardio            6       6

investigations   neckpain             1       4

investigations   leftpain             3       4

;

run;

Finally, this is the kind of final dataset I want to get: 

I want to keep label, np1 and np2:

label                           np1                 np2

GASTRO                     16                  59

constipations               1 6                18

abdominalpain                                  15

nausea                                             14 

abdominaldisconfort                          12

INVESTIGATIONS     9                       4

backpain                   4    

headache                  3                                             

leftpain                      2                      3    

neckpain                                           1  

Thank you.

Super Contributor
Posts: 301

Re: Merge of two datasets many many records

sorry this is wrong:  I am interesting to a final dataset with (label,nsoc,npterm,nsoc2,npterm2)...I am interesting in the above dataset, label, np1 and np2

Super Contributor
Posts: 301

Re: Merge of two datasets many many records

sorry Tom...it was an easy twist...Thank you for all your help.

proc sql noprint ;

  create table report1 as

    select

      max(nsoc) as nsoc

     ,max(nsoc2) as nsoc2

     ,npterm

  ,npterm2

     ,soc

     ,pterm

    from temp

    group by soc,pterm

    order by 1 desc, 2 desc

  ;

quit;

data want (keep=label count1 count2);

  set report1 ;

  by descending nsoc ;

  if first.nsoc then do;

    label=soc; count1=nsoc;count2=nsoc2; output;

  end;

  label=pterm; count1=npterm;count2=npterm2; output;

run;

proc print data=want;

 

  var label count1 count2;

run;

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 317 views
  • 3 likes
  • 2 in conversation