Data Merge/Matching: Multiple matched obs into single destination ob

Reply
Occasional Contributor
Posts: 13

Data Merge/Matching: Multiple matched obs into single destination ob

#BeginnerQuestion

Hi everyone!

I have a question which is beyond my simple beginner SAS skills right now about match-merging two datasets into a new dataset, where multple matches are resolved down into a single ob in the new dataset.  In its most simple form, let's say I have myAlphaData which is basically the letters a to h:

Data myAlphaData;

input letter$;

datalines;

a

b

c

d

e

f

g

h

;

and I also have myAlphaNums:

Data myAlphaNums;

input letter$ number;

datalines;

a 100

b 20

c 33

d 444

h 88

a 1

a 101

a 1111

b 222

d 40

;

Ulimately I'm trying to come up with a dataset that looks like this (these rich text tables aren't working in my browser so I'll just have to text it out):

obs     letter      number1     number2     number3

1     a      100     1     101     1111

2     b     20     222

3     c     33

4     d     444     40

5     e

6     f

7     g

8     h     88

It appears I'll have to be dynamically creating variables, which is currently beyond my grasp right now and I'd love to learn how!

Joe _

Respected Advisor
Posts: 3,799

Re: Data Merge/Matching: Multiple matched obs into single destination ob

To dynamically create the enumerated variables number1-numberN PROC TRANSPOSE is a good choice.  It is about the only way to do that in a truly dynamic program.  Everything else requires some bit of figuring how many first.

Data alpha;
   input letter$ @@;
  
datalines;
a b c d e f g h
;;;;
   run;

Data Nums;
   input letter$ number @@;
   datalines;
a 100 b 20  c 33 d 444 h 88
a 1   a 101 a 1111 b 222 d 40
;;;;
   run;
proc sort;
  
by letter;
   run;
data a;
   merge alpha nums;
   by letter;
   run;
proc transpose out=b(drop=_name_) prefix=number;
   by letter;
   var number;
   run;
proc print;
  
run;
Occasional Contributor
Posts: 13

Re: Data Merge/Matching: Multiple matched obs into single destination ob

Thank you!  This is brilliant... so simple, beautiful and elegant!

I have an annexe of a question to this.  What do I do when I wish to 'compare' the dataset alpha with an addition dataset?  That is, let's say I've also got:

Data edible;

     input letter$ food$;

     datalines;

b banana     c chicken     f fries     g gateau     c carrot     c coleslaw     g grapes

;;;;

And, again I need to make my data look like this:

obs     letter     number1     number2     number3     food1     food2     food3

1     a     100     1     101     1111

2     b     20     222     .     banana

3     c     33     .     .     chicken     carrot     coleslaw

4     d     444     40

5     e    

6     f     .     .     .     fries

7     g     .     .     .     gateau     grapes

8     h     88

I'm inclined to just run through the steps you outlined again for a second pass, but with the new data, ie something like:

data c;

  merge b edible;

  by letter;

  run;

proc transpose out=d (drop=_name_) prefix=food;

  by letter;

  var food;

  run;

But I suspect there's something deeply flawed in my approach, and that you're going to tell me there's a way of doing it all together rather than having to go via the intermediate a and b datasets?

Thanks again for your help!

Joe

Occasional Contributor
Posts: 13

Re: Data Merge/Matching: Multiple matched obs into single destination ob

OK I've just tested my theory and I'm losing all the number values, not to mention only 'a' is getting anything out of the edibles dataset....

*still playing with it*

Trusted Advisor
Posts: 1,137

Re: Data Merge/Matching: Multiple matched obs into single destination ob

Hi,

i used arrays to get the output. Thought it will help you. please check and let me know if you have any questions

proc sort data=myAlphaNums;

  by letter;

run;

proc sort data=myAlphaData;

  by letter;

run;

/*array to transpose the data*/

data array;

  set myAlphaNums;

  by letter;

  retain num1-num4 count;

  if first.letter then count=1;

  else count+1;

  array mis(*) num1-num4;

  if first.letter then do i = 1 to dim(mis);

  mis(i)=.;

  end;

  mis(count)=number;

  if last.letter;

  drop count number i;

run;

/*merge to get the missing records*/

data final;

  merge array myAlphaData;

  by letter;

run;

Thanks,

Jagadish

Thanks,
Jag
Occasional Contributor
Posts: 13

Re: Data Merge/Matching: Multiple matched obs into single destination ob

Posted in reply to Jagadishkatam

Oh this is great as well!  Thanks Jagadish.

However, I've made one minor adjustment.  Given the array works with a guestimate that it needs 4 elements, before the array is instantiated/declared, I've wrapped the entire block within:

if count le 4 then do;

     array mis(*) num1-num4;

     if first.letter then do i = 1 to dim(mis);

     mis(i)=.;

     end;

     mis(count)=number;

end;

I know I could've just put it on the mis(count)=number statement at the end, but I thought this way may have been more efficient.  What do you think?

Joe

Super User
Super User
Posts: 7,039

Re: Data Merge/Matching: Multiple matched obs into single destination ob

Use CALL MISSING and eliminate the code for looping.

if first.letter then call missing(of num(*));

Trusted Advisor
Posts: 1,137

Re: Data Merge/Matching: Multiple matched obs into single destination ob

Hi Joe,

I checked your code and it worked, however if you see the output i got there are repetitive values for the same letter. for example check letter "a" and you have 4 records, but you need only the final record. so just another line of code, before the end statement as "if last.letter;" and it will give you the right output.

                   Obs    letter    number    num1    num2    num3    num4    count

                     1      a         100      100       .       .       .      1

                     2      a           1      100       1       .       .      2

                     3      a         101      100       1     101       .      3

                     4      a        1111      100       1     101    1111      4

                     5      b          20       20       .       .       .      1

                     6      b         222       20     222       .       .      2

                     7      c          33       33       .       .       .      1

                     8      d         444      444       .       .       .      1

                     9      d          40      444      40       .       .      2

                    10      e           .        .       .       .       .      .

                    11      f           .        .       .       .       .      .

                    12      g           .        .       .       .       .      .

                    13      h          88       88       .       .       .      1

Thanks,

Jagadish

Thanks,
Jag
Occasional Contributor
Posts: 13

Re: Data Merge/Matching: Multiple matched obs into single destination ob

Posted in reply to Jagadishkatam

Hi Jagadish,

Are you talking about the final bit at the end that you mentioned previously?

if last.letter;

  drop count number i;

??

Yes I've included that, however I was only 'paraphrasing' this particular snippet of the data step so I didn't have anything before or afterwards (including the drop statement) -- but thank you.

Also, I've incorporated Tom's suggestion of using CALL MISSING to 'blank out' the variables instead of looping through them.

Thanks again guys!

Joe

Respected Advisor
Posts: 3,799

Re: Data Merge/Matching: Multiple matched obs into single destination ob

In this data step... you need double at;

Data edible;

     input letter$ food$ @@;

     datalines;

b banana     c chicken     f fries     g gateau     c carrot     c coleslaw     g grapes

;;;;

Ask a Question
Discussion stats
  • 9 replies
  • 446 views
  • 8 likes
  • 4 in conversation