BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Dhana18
Obsidian | Level 7

Hi I have a data like this;

data symptoms;

input id symptoms $ ;

cards;

1 discharge

2 pain

3 sore

4 odor warts

;

run;

 

I transposed the data set and it looks this now

 

id  discharge  pain  sore  odor  

1  discharge

2                    pain              odor

3                            sore

4                                        odor

 

instead of this i would like to have like THIS (below)

id  discharge  pain  sore  odor  

1       1             0        0       0

2        0            1        0       1

3        0            0        1       0

4        0            0        0      1

I used if then statement like this;

if discharge ="discharge" then discharge="1";

 

 but i have hundreds of symptoms to write if then statement.  is there a shorter way macro or something to do it efficiently?

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

if you are OK with auto conversion log note:

 


data have;
input (id  discharge  pain  sore  odor  ) (:$10.);
cards;
1  discharge . . . 
2         .           pain      .        odor
3            . .                sore .
4                  . . .                      odor
;

data want;
set have;
array t(*) discharge--odor;
do _n_=1 to dim(t);
t(_n_)= vname(t(_n_))=t(_n_);
end;
run;

View solution in original post

10 REPLIES 10
novinosrin
Tourmaline | Level 20

if you are OK with auto conversion log note:

 


data have;
input (id  discharge  pain  sore  odor  ) (:$10.);
cards;
1  discharge . . . 
2         .           pain      .        odor
3            . .                sore .
4                  . . .                      odor
;

data want;
set have;
array t(*) discharge--odor;
do _n_=1 to dim(t);
t(_n_)= vname(t(_n_))=t(_n_);
end;
run;
Dhana18
Obsidian | Level 7
It worked!
It worked for the symptoms variable without dash in between but did not work for the variable like this "nickel_dime_les" , "Vaginal_lesion" do I need to add or delete something on the code you sent to me earlier?
novinosrin
Tourmaline | Level 20

Can you post a better and a true representative sample of your real  plz? 

Dhana18
Obsidian | Level 7
Hi I have a data like this;
data symptoms;
input id symptoms $ ;
cards;
1 discharge
2 pain
3 sore
4 odor warts
;
run;

I transposed the data set and it looks this now

id vaginal_discharge pain_on_sex peniel_sore
1 peniel_sore
2 vaginal_discharge
3 pain_on_sex
4 pain_on_sex peniel_sore

instead of this i would like to have like THIS (below)
id vaginal_discharge pain_on_sex peniel_sore
1 0 0 1
2 1 0 0
3 0 1 0
4 0 1 1

The underscore between word of the variable is not working.

I used if then statement like this;
if discharge ="discharge" then discharge="1";

but i have hundreds of symptoms to write if then statement. is there a shorter way macro or something to do it efficiently?

Astounding
PROC Star

Regardless of the route you choose, macro language will not be involved.

 

One possibility:  add a dummy variable with a value of 1:

 

data have2;

set have;

dummy=1;
run;

 

Then you can use that in your transposing:

 

proc transpose data=have2 out=want;

var dummy;

id symptoms;

by id;

run;

 

Issues you will need to address beyond that:

 

  • This fills in the "1" values only, not the "0" values.  You will still need to go back and change missing values to 0.  That's a straightforward application for arrays.
  • If the variable SYMPTOMS really can contain a list rather than a single item, you will need to split it up.  For example, for ID=4, you will need to change it to two observations instead of 1.  One observation will have symtpoms="warts" and the other will have symptoms="odor".
  • It is entirely likely that the transposed data will be clumsy and difficult to work with.  You might be better served by never doing this at all.  But that really depends on where the analysis is headed down the road.
SuryaKiran
Meteorite | Level 14

Set a dummy variable and then you can use that as var in transpose.

data have;
set symptoms;
dummy=1;
run;
proc sort data=have;
by id;
proc transpose data=have out=want(drop= _Name_) ;
by id;
id symptoms;
var dummy;
run;

data want;
set want;
array num _numeric_;
do over num;
if missing(num) then num=0;
end;
run;

 

Thanks,
Suryakiran
SuryaKiran
Meteorite | Level 14

If you have spaces between the names that needs to be converted to variable names then use options validvarname=v7; before the transpose, this will convert the names to valid variable names (Replaces with "_" for all invalid characters for variable names.

Thanks,
Suryakiran
Reeza
Super User
You may want to consider adding IDLABEL so the labels are the full name. You can also look at GLMOD to automatically create the dummy variables but it’s probably more work.
Tom
Super User Tom
Super User

Fastest way might be use some code generation to generate an SQL statement to summarize by ID.

data symptoms;
  input id symptom $32. ;
cards;
1 discharge
2 pain
3 sore
4 odor
4 warts
;

proc sort data=symptoms(keep=symptom) nodupkey out=symptomlist ;
  by symptom;
run;

filename code temp;
data _null_;
  set symptomlist end=eof;
  file code ;
  if _n_=1 then put
 'proc sql noprint;'
/'create table want as select'
/' id'
  ;
  put ',max(symptom=' symptom :$quote. ') as ' symptom  ;
  if eof then put
 'from symptoms'
/'group by id'
/';'
/'quit;'
  ;
run;

%include code / source2;
proc print;
run;
Obs    id    discharge    odor    pain    sore    warts

 1      1        1          0       0       0       0
 2      2        0          0       1       0       0
 3      3        0          0       0       1       0
 4      4        0          1       0       0       1
137 +proc sql noprint;
138 +create table want as select
139 + id
140 +,max(symptom="discharge" ) as discharge
141 +,max(symptom="odor" ) as odor
142 +,max(symptom="pain" ) as pain
143 +,max(symptom="sore" ) as sore
144 +,max(symptom="warts" ) as warts
145 +from symptoms
146 +group by id
147 +;
NOTE: SAS threaded sort was used.
NOTE: Table WORK.WANT created, with 4 rows and 6 columns.

If your source data is NOT one observation per ID*SYMPTOM then you would need to change the logic of the test for whether a particular observation has an indication of that symptom. 

Also if your symptom values are not valid SAS variable names then you will have trouble.  You might need to generate unique names for the flag variables.

 

Ksharp
Super User

It is called design matrix.

 

data symptoms;
input id symptoms $ ;
dummy=1;
cards;
1 discharge
2 pain
3 sore
4 odor warts
;
run;

proc logistic data=symptoms outdesign=want(drop=dummy) outdesignonly;
class symptoms/param=glm;
model dummy=id symptoms/noint;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 2589 views
  • 1 like
  • 7 in conversation