DATA Step, Macro, Functions and more

Creating "dummy variable" from four different variables

Reply
N/A
Posts: 0

Creating "dummy variable" from four different variables

I have a single dataset with merged data from four different datasets. Problem is, some of the merged data is similar and I need to figure out who all has data from which of the original four datasets. There are four identifying variables DataA, DataB, DataC and DataD. If an individual has data from DataA present in the merged file then DataA = 1 else DataA = 0 and so on for the remaining three variables. What I am trying to do is create a single variable with different values depending on the datasource i.e. if an individual has data from DataA and DataB then datasource = AB, if an individual has data from DataA, DataC and DataD then datasource = ACD. Basically, there are several possibly combinations that can be present, only one datasource, two, three or four - all of which combinations cannot be repeating. I was trying to write code manually but it is taking a long time and can have errors.

I tried to do a proc freq table to compare the four datasets, but I cannot seem to develop a table to see exactly what I would like to. Any help or advice is much appreciated! Thanks!
Frequent Contributor
Posts: 81

Re: Creating "dummy variable" from four different variables

data _1 _2 _3 _4;
id = 10;
if id=10 then output _1 _3 _4;
id = 20;
if id=20 then output _1 _2;
id = 30;
if id=30 then output _2 _3 _4;
run;

data all;
length datasource $4;
merge _1(in=a) _2(in=b) _3(in=c) _4(in=d);
by id;
if a then dataA='A';
if b then dataB='B';
if c then dataC='C';
if d then dataD='D';
datasource=cats(dataA,dataB,dataC,dataD);
run;
Valued Guide
Posts: 2,174

Re: Creating "dummy variable" from four different variables

merge a(in=ina) b(in=inb) ...... ;

fromD = put( ina*1000 + inb*100 +inc*10 +ind, z4. ) ;
well, that is the underlying info
You can write more processing to turn 0110 into BC, or just format it like :
proc format ; value fromD
1111 = ABCD
1110 = ABC
1101 = ABD
1100 = AB
1011 = ACD
1010 = AC
1001 = AD
1000 = A
0111 = BCD
0110 = BC
0101 = BD
0100 = B
0011 = CD
0010 = C
0001 = D
; if letters really help to make it clearer, then only at the surface is it different, so format the surface:
Message was edited by: Peter.C
Valued Guide
Posts: 765

Re: Creating "dummy variable" from four different variables

hi ... using this ...

data _1 _2 _3 _4;
id = 10; output _1 _3 _4;
id = 20; output _1 _2;
id = 30; output _2 _3 _4;
run;

try this ...

data all;
length datasource $4;
merge _1(in=a) _2(in=b) _3(in=c) _4(in=d);
by id;
datasource = catt(ifc(a,'A',''),ifc(b,'B',''),ifc(c,'C',''),ifc(d,'D',''));
run;

and get ...

Obs datasource id
1 ACD 10
2 AB 20
3 BCD 30

Message was edited by: MikeZdeb Message was edited by: MikeZdeb
Ask a Question
Discussion stats
  • 3 replies
  • 149 views
  • 0 likes
  • 4 in conversation