BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
michtka
Fluorite | Level 6

Hi everyone, I've got the next datset( have) with 4 variables :

                    aesoc                                           pref_term             aeterm                            count

       Blood and lymphatic system disorders    Anaemia                ANEMIA                                 3

       Blood and lymphatic system disorders    Anaemia                ANEMY                                  1

       Blood and lymphatic system                  Febrile neutropenia    FEBRILE NEUTOPENIA        1

       Blood and lymphatic system                  Febrile neutropenia    FEBRILE NEUTROPENIA       1

       Blood and lymphatic                              neutropenia            NEUTROPENIA FEBRILE       1

       Blood and lymphatic                              neutropenia            NEUTROPENIA                        6

       Blood                                                   Thrombocythaemia       PIASTRINOSIS                   1

       Blood                                                    Thrombocythaenia       THROMBOCYTOPENIA        1

I would like to add thes 3 new variables grp, sgrp and ssgrp and obtain this new dataset (final) with grp sgrp ssgrp col0 and col1, something like:

GRP SGRP SSGRP                     col 0    col1

     1       0      0                 Blood and lymphatic system disorders

      1       1     0                      anemia

       1       1     1                           ANEMIA      3                                       

       1       1      2                               ANEMY       1

        2      0       0                              Blood and lymphatic system

        2      1       0                          Febrile neutropenia

        2      1       1                         FEBRILE NEUTOPENIA 1

        2       1      2                              FEBRILE NEUTRPENIA 1

        3       0      0                                    Blood and lymphatic

        3       1      0                                   Neutropenia

        3       1      1                                        NEUTROPENIA FEBRILE 1

        3       1      2                                            NEUTROPENIA 6

        4       0      0                                                  blood

        4      1       0                                                  Thombocythaemia

        4      1       1                                                        PIASTRINOSIS 1

        4      1       2                                                          THROMBOCUTOPENIA 1

The dataset (want) with col0 and col1 has been generated (Astounding's code) by:

data want (keep=col0 col1);

   set have;

   by assoc pref_term aeterm count;

   if first.assoc then do;

      col0=assoc;

      output;

   end;

   if first.pref_term then do;

      col0 = pref_term;

      output;

   end;

   col0 = aeterm;

  col1 = count;

   output;

run;

Could you please add the three variables grp, sgrp and ssgrp to the above code to obtain teh right enumeration (grp sgrp and ssgrp) for the dataset (final) with these 5 variables: grp sgrp ssgrp col0 col1?

Thanks in advance.

J V.

1 ACCEPTED SOLUTION

Accepted Solutions
Haikuo
Onyx | Level 15

Minor tweak on Astounding's code will do:

data have;

input (aesoc pref_term aeterm) (:&$ 40.) count;

cards;

       Blood and lymphatic system disorders    Anaemia                ANEMIA                                 3

       Blood and lymphatic system disorders    Anaemia                ANEMY                                  1

       Blood and lymphatic system                  Febrile neutropenia    FEBRILE NEUTOPENIA        1

       Blood and lymphatic system                  Febrile neutropenia    FEBRILE NEUTROPENIA       1

       Blood and lymphatic                              neutropenia            NEUTROPENIA FEBRILE       1

       Blood and lymphatic                              neutropenia            NEUTROPENIA                        6

       Blood                                                   Thrombocythaemia       PIASTRINOSIS                   1

       Blood                                                    Thrombocythaenia       THROMBOCYTOPENIA        1

       ;

proc sort data=have;

by  desending aesoc pref_term aeterm count;;

run;

data want (keep=col1 col0 grp sgrp ssgrp) ;

   retain  grp sgrp ssgrp 0;

   set have;

   length col0 col1 $40;

   by  descending aesoc pref_term aeterm count;

   if first.aesoc then do;

      sgrp=0;

      ssgrp=0;

      col0=aesoc;

      grp+1;

      output;

   end;

   if first.pref_term then do;

     ssgrp=0;

      col0 = pref_term;

      sgrp+1;

      output;

   end;

   if first.aeterm then do;

   ssgrp+1;

   col0 = aeterm;

  col1 = count;

   output;

   end;

run;

proc print;run;

Regards,

Haikuo

View solution in original post

6 REPLIES 6
Haikuo
Onyx | Level 15

Minor tweak on Astounding's code will do:

data have;

input (aesoc pref_term aeterm) (:&$ 40.) count;

cards;

       Blood and lymphatic system disorders    Anaemia                ANEMIA                                 3

       Blood and lymphatic system disorders    Anaemia                ANEMY                                  1

       Blood and lymphatic system                  Febrile neutropenia    FEBRILE NEUTOPENIA        1

       Blood and lymphatic system                  Febrile neutropenia    FEBRILE NEUTROPENIA       1

       Blood and lymphatic                              neutropenia            NEUTROPENIA FEBRILE       1

       Blood and lymphatic                              neutropenia            NEUTROPENIA                        6

       Blood                                                   Thrombocythaemia       PIASTRINOSIS                   1

       Blood                                                    Thrombocythaenia       THROMBOCYTOPENIA        1

       ;

proc sort data=have;

by  desending aesoc pref_term aeterm count;;

run;

data want (keep=col1 col0 grp sgrp ssgrp) ;

   retain  grp sgrp ssgrp 0;

   set have;

   length col0 col1 $40;

   by  descending aesoc pref_term aeterm count;

   if first.aesoc then do;

      sgrp=0;

      ssgrp=0;

      col0=aesoc;

      grp+1;

      output;

   end;

   if first.pref_term then do;

     ssgrp=0;

      col0 = pref_term;

      sgrp+1;

      output;

   end;

   if first.aeterm then do;

   ssgrp+1;

   col0 = aeterm;

  col1 = count;

   output;

   end;

run;

proc print;run;

Regards,

Haikuo

michtka
Fluorite | Level 6

Hi Haikuo, as you can see is not right, Febrile neutropenia (record 5) has to be 2 1 0 and so on, still of 1 2 0...could you ammend it , please?

Thnaks.

             1     1       0       0      Blood and lymphatic system disorders

              2     1       1       0      Anaemia

              3     1       1       1      ANEMIA                                   3

              4     1       1       2      ANEMY                                    1

              5     1       2       0      Febrile neutropenia

              6     1       2       1      FEBRILE NEUTOPENIA                       1

              7     1       2       2      FEBRILE NEUTROPENIA                      1

              8     1       2       3      NEUTROPENIA FEBRILE                      1

              9     1       3       0      Neutropenia

             10     1       3       1      NEUTROPENIA                              6

             11     1       4       0      Thrombocythaemia

             12     1       4       1      PIASTRINOSIS                             1

             13     1       5       0      Thrombocytopenia

             14     1       5       1      THROMBOCYTOPENIA                         1

Haikuo
Onyx | Level 15

OK, first let 's sort this out: what version of data you are using? Apparantly the data you are showing right now can not come from the raw data of your first post.

Second, you need to let me know what is the underline meaning of these 3 new variables. I thought that they are corresponding to the group level.

michtka
Fluorite | Level 6

data have(rename=(f1=aesoc f2=pref_term f3=aeterm f4=count)keep=f1-f4);

length f1-f4 $40;

infile datalines truncover;

input (f1-f4) (&);

datalines;

Blood and lymphatic system disorders    Anaemia                ANEMIA                                 3

       Blood and lymphatic system disorders    Anaemia                ANEMY                                  1

       Blood and lymphatic system disorders    Febrile neutropenia    FEBRILE NEUTOPENIA        1

       Blood and lymphatic system disorders    Febrile neutropenia    FEBRILE NEUTROPENIA       1

       Blood and lymphatic system disorders    Febrile neutropenia    NEUTROPENIA FEBRILE       1

       Blood and lymphatic system disorders    Neutropenia            NEUTROPENIA                        6

       Blood and lymphatic system disorders    Thrombocythaemia       PIASTRINOSIS                   1

       Blood and lymphatic system disorders    Thrombocytopenia       THROMBOCYTOPENIA        1

;

run;

Please, se above...this is the raw of data, and after aplying your code:

I obtained the above sequence.

Could you amend this?

michtka
Fluorite | Level 6

YES! YOU RIGHT.

Sorry, I used a different raw dataset.

My apologyze.

Thnaks very much for your help.

J V

michtka
Fluorite | Level 6

Thnak you again Haikuo, It works perfectly, thanks.

             1     1       0       0      Blood and lymphatic system disorders

              2     1       1       0      Anaemia

              3     1       1       1      ANEMIA                                   3

              4     1       1       2      ANEMY                                    1

              5     2       0       0      Blood and lymphatic system

              6     2       1       0      Febrile neutropenia

              7     2       1       1      FEBRILE NEUTOPENIA                       1

              8     2       1       2      FEBRILE NEUTROPENIA                      1

              9     3       0       0      Blood and lymphatic

             10     3       1       0      Febrile neutropenia

             11     3       1       1      NEUTROPENIA FEBRILE                      1

             12     3       2       0      Neutropenia

             13     3       2       1      NEUTROPENIA                              6

             14     4       0       0      Blood

             15     4       1       0      Thrombocythaemia

             16     4       1       1      PIASTRINOSIS                             1

             17     4       2       0      Thrombocytopenia

             18     4       2       1      THROMBOCYTOPENIA                         1

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1729 views
  • 0 likes
  • 2 in conversation