BookmarkSubscribeRSS Feed
deleted_user
Not applicable
I have a single dataset with merged data from four different datasets. Problem is, some of the merged data is similar and I need to figure out who all has data from which of the original four datasets. There are four identifying variables DataA, DataB, DataC and DataD. If an individual has data from DataA present in the merged file then DataA = 1 else DataA = 0 and so on for the remaining three variables. What I am trying to do is create a single variable with different values depending on the datasource i.e. if an individual has data from DataA and DataB then datasource = AB, if an individual has data from DataA, DataC and DataD then datasource = ACD. Basically, there are several possibly combinations that can be present, only one datasource, two, three or four - all of which combinations cannot be repeating. I was trying to write code manually but it is taking a long time and can have errors.

I tried to do a proc freq table to compare the four datasets, but I cannot seem to develop a table to see exactly what I would like to. Any help or advice is much appreciated! Thanks!
3 REPLIES 3
NickR
Quartz | Level 8
data _1 _2 _3 _4;
id = 10;
if id=10 then output _1 _3 _4;
id = 20;
if id=20 then output _1 _2;
id = 30;
if id=30 then output _2 _3 _4;
run;

data all;
length datasource $4;
merge _1(in=a) _2(in=b) _3(in=c) _4(in=d);
by id;
if a then dataA='A';
if b then dataB='B';
if c then dataC='C';
if d then dataD='D';
datasource=cats(dataA,dataB,dataC,dataD);
run;
Peter_C
Rhodochrosite | Level 12
merge a(in=ina) b(in=inb) ...... ;

fromD = put( ina*1000 + inb*100 +inc*10 +ind, z4. ) ;
well, that is the underlying info
You can write more processing to turn 0110 into BC, or just format it like :
proc format ; value fromD
1111 = ABCD
1110 = ABC
1101 = ABD
1100 = AB
1011 = ACD
1010 = AC
1001 = AD
1000 = A
0111 = BCD
0110 = BC
0101 = BD
0100 = B
0011 = CD
0010 = C
0001 = D
; if letters really help to make it clearer, then only at the surface is it different, so format the surface:
Message was edited by: Peter.C
MikeZdeb
Rhodochrosite | Level 12
hi ... using this ...

data _1 _2 _3 _4;
id = 10; output _1 _3 _4;
id = 20; output _1 _2;
id = 30; output _2 _3 _4;
run;

try this ...

data all;
length datasource $4;
merge _1(in=a) _2(in=b) _3(in=c) _4(in=d);
by id;
datasource = catt(ifc(a,'A',''),ifc(b,'B',''),ifc(c,'C',''),ifc(d,'D',''));
run;

and get ...

Obs datasource id
1 ACD 10
2 AB 20
3 BCD 30

Message was edited by: MikeZdeb Message was edited by: MikeZdeb

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1162 views
  • 0 likes
  • 4 in conversation