Help using Base SAS procedures

group subgroup variables

Accepted Solution Solved
Reply
Super Contributor
Posts: 301
Accepted Solution

group subgroup variables

Hi everyone, I've got the next datset with 4 variables:

                    aesoc                                           pref_term             aeterm                            count

       Blood and lymphatic system disorders    Anaemia                ANEMIA                                 3

       Blood and lymphatic system disorders    Anaemia                ANEMY                                  1

       Blood and lymphatic system disorders    Febrile neutropenia    FEBRILE NEUTOPENIA        1

       Blood and lymphatic system disorders    Febrile neutropenia    FEBRILE NEUTROPENIA       1

       Blood and lymphatic system disorders    Febrile neutropenia    NEUTROPENIA FEBRILE       1

       Blood and lymphatic system disorders    Neutropenia            NEUTROPENIA                        6

       Blood and lymphatic system disorders    Thrombocythaemia       PIASTRINOSIS                   1

       Blood and lymphatic system disorders    Thrombocytopenia       THROMBOCYTOPENIA        1

         I would like to transform this dataset to another dataset with two columns :

                                           col0                                     col1     

          BLOOD AND LYMPHATIC SYSTEM DISORDERS

          Anaemia

          ANEMIA                                                               3                     

          ANEMY                                                                1

          Febrile neutropenia

          FEBRILE NEUTOPENIA                                         1

          FEBRILE NEUTROPENIA                                       1

          NEUTROPENIA FEBRILE                                       1

          Neutropenia

          NEUTROPENIA                                                     6   

          Thrombocythaemia

          PIASTRINOSIS                                                     1 

          THROMBOCYTOPENIA                                         1

I assume that I wil ned to create two new variables: group for aesoc, and subgroup for aept, to make this clasiffication in col0, but I would like you to help me

with this.

Thanks in advance,

J V.


Accepted Solutions
Solution
‎03-18-2012 05:36 PM
Super User
Posts: 5,498

group subgroup variables

Here's a simple version to try.

data want (keep=col0 col1);

   set have;

   by assoc pref_term aeterm count;

   if first.assoc then do;

      col0=assoc;

      output;

   end;

   if first.pref_term then do;

      col0 = pref_term;

      output;

   end;

   col0 = aeterm;

  col1 = count;

   output;

run;

I assume that's a typo in the last line, for pref_term.  Good luck.

View solution in original post


All Replies
Respected Advisor
Posts: 4,920

group subgroup variables

try this:

%let nPos = 4; /* order of last column */

data have(keep=fSmiley Happy;
array f{&nPos.} $40;
infile datalines truncover;
input (f(*)) (&);
datalines;
Blood and lymphatic system disorders    Anaemia                ANEMIA                                 3
       Blood and lymphatic system disorders    Anaemia                ANEMY                                  1
       Blood and lymphatic system disorders    Febrile neutropenia    FEBRILE NEUTOPENIA        1
       Blood and lymphatic system disorders    Febrile neutropenia    FEBRILE NEUTROPENIA       1
       Blood and lymphatic system disorders    Febrile neutropenia    NEUTROPENIA FEBRILE       1
       Blood and lymphatic system disorders    Neutropenia            NEUTROPENIA                        6
       Blood and lymphatic system disorders    Thrombocythaemia       PIASTRINOSIS                   1
       Blood and lymphatic system disorders    Thrombocytopenia       THROMBOCYTOPENIA        1
;
run;

data want(keep=text number);
set have;
array f{*} f:;
length text number $ 40;
number = ""; text = "";
do place = 1 to &nPos.-2;
if lag(f(place)) ne f(place) or not missing(text) then do;
  text = f(place);
  output;
end;
end;
text = f(&nPos.-1);
number = f(&nPos);
output;
run;

proc sql;
select text , number from want;
quit;

PG

PG
Super Contributor
Posts: 301

group subgroup variables

I would like to use a code withouth macrovariables and arrrays Smiley Sad....still looking for help,

but anyway...you did your best..thanks.

Respected Advisor
Posts: 4,920

group subgroup variables

Here is my second best... (inferior solution, without arrays or macro variables)

data have(keep=f1-f4);
length f1-f4 $40;
infile datalines truncover;
input (f1-f4) (&);
datalines;
Blood and lymphatic system disorders    Anaemia                ANEMIA                                 3
       Blood and lymphatic system disorders    Anaemia                ANEMY                                  1
       Blood and lymphatic system disorders    Febrile neutropenia    FEBRILE NEUTOPENIA        1
       Blood and lymphatic system disorders    Febrile neutropenia    FEBRILE NEUTROPENIA       1
       Blood and lymphatic system disorders    Febrile neutropenia    NEUTROPENIA FEBRILE       1
       Blood and lymphatic system disorders    Neutropenia            NEUTROPENIA                        6
       Blood and lymphatic system disorders    Thrombocythaemia       PIASTRINOSIS                   1
       Blood and lymphatic system disorders    Thrombocytopenia       THROMBOCYTOPENIA        1
;
run;

data want(keep=text number);
set have;
length text number $ 40;
number = ""; text = "";
if lag(f1) ne f1 or not missing(text) then do;
text = f1;
output;
end;
if lag(f2) ne f2 or not missing(text) then do;
text = f2;
output;
end;
text = f3;
number = f4;
output;
run;

proc sql;
select text , number from want;
quit;

PG

PG
Solution
‎03-18-2012 05:36 PM
Super User
Posts: 5,498

group subgroup variables

Here's a simple version to try.

data want (keep=col0 col1);

   set have;

   by assoc pref_term aeterm count;

   if first.assoc then do;

      col0=assoc;

      output;

   end;

   if first.pref_term then do;

      col0 = pref_term;

      output;

   end;

   col0 = aeterm;

  col1 = count;

   output;

run;

I assume that's a typo in the last line, for pref_term.  Good luck.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 732 views
  • 3 likes
  • 3 in conversation