- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi I have a data like this;
data symptoms;
input id symptoms $ ;
cards;
1 discharge
2 pain
3 sore
4 odor warts
;
run;
I transposed the data set and it looks this now
id discharge pain sore odor
1 discharge
2 pain odor
3 sore
4 odor
instead of this i would like to have like THIS (below)
id discharge pain sore odor
1 1 0 0 0
2 0 1 0 1
3 0 0 1 0
4 0 0 0 1
I used if then statement like this;
if discharge ="discharge" then discharge="1";
but i have hundreds of symptoms to write if then statement. is there a shorter way macro or something to do it efficiently?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
if you are OK with auto conversion log note:
data have;
input (id discharge pain sore odor ) (:$10.);
cards;
1 discharge . . .
2 . pain . odor
3 . . sore .
4 . . . odor
;
data want;
set have;
array t(*) discharge--odor;
do _n_=1 to dim(t);
t(_n_)= vname(t(_n_))=t(_n_);
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
if you are OK with auto conversion log note:
data have;
input (id discharge pain sore odor ) (:$10.);
cards;
1 discharge . . .
2 . pain . odor
3 . . sore .
4 . . . odor
;
data want;
set have;
array t(*) discharge--odor;
do _n_=1 to dim(t);
t(_n_)= vname(t(_n_))=t(_n_);
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It worked for the symptoms variable without dash in between but did not work for the variable like this "nickel_dime_les" , "Vaginal_lesion" do I need to add or delete something on the code you sent to me earlier?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can you post a better and a true representative sample of your real plz?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data symptoms;
input id symptoms $ ;
cards;
1 discharge
2 pain
3 sore
4 odor warts
;
run;
I transposed the data set and it looks this now
id vaginal_discharge pain_on_sex peniel_sore
1 peniel_sore
2 vaginal_discharge
3 pain_on_sex
4 pain_on_sex peniel_sore
instead of this i would like to have like THIS (below)
id vaginal_discharge pain_on_sex peniel_sore
1 0 0 1
2 1 0 0
3 0 1 0
4 0 1 1
The underscore between word of the variable is not working.
I used if then statement like this;
if discharge ="discharge" then discharge="1";
but i have hundreds of symptoms to write if then statement. is there a shorter way macro or something to do it efficiently?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Regardless of the route you choose, macro language will not be involved.
One possibility: add a dummy variable with a value of 1:
data have2;
set have;
dummy=1;
run;
Then you can use that in your transposing:
proc transpose data=have2 out=want;
var dummy;
id symptoms;
by id;
run;
Issues you will need to address beyond that:
- This fills in the "1" values only, not the "0" values. You will still need to go back and change missing values to 0. That's a straightforward application for arrays.
- If the variable SYMPTOMS really can contain a list rather than a single item, you will need to split it up. For example, for ID=4, you will need to change it to two observations instead of 1. One observation will have symtpoms="warts" and the other will have symptoms="odor".
- It is entirely likely that the transposed data will be clumsy and difficult to work with. You might be better served by never doing this at all. But that really depends on where the analysis is headed down the road.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Set a dummy variable and then you can use that as var in transpose.
data have;
set symptoms;
dummy=1;
run;
proc sort data=have;
by id;
proc transpose data=have out=want(drop= _Name_) ;
by id;
id symptoms;
var dummy;
run;
data want;
set want;
array num _numeric_;
do over num;
if missing(num) then num=0;
end;
run;
Suryakiran
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you have spaces between the names that needs to be converted to variable names then use options validvarname=v7; before the transpose, this will convert the names to valid variable names (Replaces with "_" for all invalid characters for variable names.
Suryakiran
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Fastest way might be use some code generation to generate an SQL statement to summarize by ID.
data symptoms;
input id symptom $32. ;
cards;
1 discharge
2 pain
3 sore
4 odor
4 warts
;
proc sort data=symptoms(keep=symptom) nodupkey out=symptomlist ;
by symptom;
run;
filename code temp;
data _null_;
set symptomlist end=eof;
file code ;
if _n_=1 then put
'proc sql noprint;'
/'create table want as select'
/' id'
;
put ',max(symptom=' symptom :$quote. ') as ' symptom ;
if eof then put
'from symptoms'
/'group by id'
/';'
/'quit;'
;
run;
%include code / source2;
proc print;
run;
Obs id discharge odor pain sore warts 1 1 1 0 0 0 0 2 2 0 0 1 0 0 3 3 0 0 0 1 0 4 4 0 1 0 0 1
137 +proc sql noprint; 138 +create table want as select 139 + id 140 +,max(symptom="discharge" ) as discharge 141 +,max(symptom="odor" ) as odor 142 +,max(symptom="pain" ) as pain 143 +,max(symptom="sore" ) as sore 144 +,max(symptom="warts" ) as warts 145 +from symptoms 146 +group by id 147 +; NOTE: SAS threaded sort was used. NOTE: Table WORK.WANT created, with 4 rows and 6 columns.
If your source data is NOT one observation per ID*SYMPTOM then you would need to change the logic of the test for whether a particular observation has an indication of that symptom.
Also if your symptom values are not valid SAS variable names then you will have trouble. You might need to generate unique names for the flag variables.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It is called design matrix.
data symptoms;
input id symptoms $ ;
dummy=1;
cards;
1 discharge
2 pain
3 sore
4 odor warts
;
run;
proc logistic data=symptoms outdesign=want(drop=dummy) outdesignonly;
class symptoms/param=glm;
model dummy=id symptoms/noint;
run;