DATA Step, Macro, Functions and more

Create variables as per separators in string

Accepted Solution Solved
Reply
Contributor
Posts: 31
Accepted Solution

Create variables as per separators in string

Hi All,

 

I have data as below.

 

Subject           Disease

X1                  Fever^Acidity^Vomiting

X2                  Chills^Nausea

 

I need output as below.

 

Subject    Disease1     Disease2      Disease3

X1            Fever           Acidity          Vomiting

X2            Chills            Nausea

 

We may have more data in Disease column sepearated by ^.I am looking for macro which can count number of ^ in string and create new variables in dataset for each of disease .

 

To start with, i have used countc function for occurence of ^ but after that i am not getting any way to move ahead.

 

Can anyone please suggest way forward.

 

Regards,

Rajesh


Accepted Solutions
Solution
‎08-31-2017 10:18 AM
Super User
Super User
Posts: 7,970

Re: Create variables as per separators in string

Well, two ways - simplest I find:

data inter;
  set have;
  do i=1 to countw(disease,"^");
    d=scan(disease,i,"^");
    output;
  end;
run;
proc transpose data=inter out=want prefix=disease;
  by subject;
  var d;
run;

You can also find max(count of ^) then use arrays, but thats more code.

View solution in original post


All Replies
Solution
‎08-31-2017 10:18 AM
Super User
Super User
Posts: 7,970

Re: Create variables as per separators in string

Well, two ways - simplest I find:

data inter;
  set have;
  do i=1 to countw(disease,"^");
    d=scan(disease,i,"^");
    output;
  end;
run;
proc transpose data=inter out=want prefix=disease;
  by subject;
  var d;
run;

You can also find max(count of ^) then use arrays, but thats more code.

Super User
Posts: 11,343

Re: Create variables as per separators in string

The "easy" approach might be to just set a large number of variables assuming you'll never exceed that number.

 

data have;
   infile datalines missover;
   informat subject $5. disease $100.;
   input Subject           Disease ;
datalines;
X1 Fever^Acidity^Vomiting
X2 Chills^Nausea
X3 Fever^Acidity^SomethingMuchlonger
;
run;

data want;
   set have;
   array d $30 disease1-disease25;
   do i=1 to (countw(disease,'^'));
      d[i] = scan(disease,i,'^');
   end;
   drop i;
run;

A possibly more critical issue is actually how long to make your individul "disease" variables (which look more like symptoms). The array I used set them to 30 characters.

 

Execise for the interested reader is to capture the maximum number actually used and then remove unused (hint: retain max i, call symputx and a drop statement in another datastep.

Valued Guide
Posts: 765

Re: Create variables as per separators in string

Hi, here's another idea. I made the maximum number of diseses five (D1-D5) and the maimum lenght of any disease twenty. You could make those values anythimg you like.

 

data x;
infile datalines dsd dlm='^' missover;
input @;
substr(_infile_,find(_infile_,' '),1) = '^';
input id :$5. (d1-d5) (:$20.);
count = countc(_infile_,'^');
datalines;
X1 Fever^Acidity^Vomiting
X2 Chills^Nausea
X3 Bad Vibes^Bummed^Just Plain Tired
X1234 Upset Stomach
;

 

DATA SET: x

id       d1                 d2       d3                  d4    d5    count

X1       Fever            Acidity    Vomiting                          3
X2       Chills           Nausea                                       2
X3       Bad Vibes        Bummed     Just Plain Tired                  3
X1234    Upset Stomach                                                 1
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 81 views
  • 1 like
  • 4 in conversation