Solved: Re: How to - Merge Desired Rows

turcay · Posted 11-29-2015 03:59 PM

Hello everyone,

I have a dataset similar to below. I need to take first three values all of the “Column” variables. I added the picture which I have and I also added the desired output. I tried to use Proc Sort then I use last.ID option but It didn't work. Does anybody have any idea ?

data have;
Length ID $ 10 Column1 8 Column2 8 Column3 8;
Infile datalines missover dlm=",";
Input ID Column1 Column2 Column3 ;
datalines;
1,5,.,.,
2,10,.,.,
3,15,.,.,
1,15,10,.,
2,15,20,.,
3,15,30,.,
1,15,30,15,
2,15,30,30,
3,15,30,45,
;
run;
/*It didn't work*/
Proc Sort data=have;
by Id;
run;
/*It didn't work*/
data want;
set have;
BY Id;
If last.Id THEN OUTPUT want;
run;

Have

Want

Thank you.

PGStats · Posted 11-29-2015 05:43 PM

data want;
set have(keep=id column1 firstobs=1);
set have(keep=id column2 firstobs=4);
set have(keep=id column3 firstobs=7);
run;

PG

View solution in original post

PGStats · Posted 11-29-2015 05:43 PM

data want;
set have(keep=id column1 firstobs=1);
set have(keep=id column2 firstobs=4);
set have(keep=id column3 firstobs=7);
run;

PG

Tom · Posted 11-29-2015 05:47 PM

To get what you want you need another variable. You pictures show that for COLUMN1 you want the values from the first group of rows, for column2 the sets of records from the second group of rows, etc. But there is no group number variable in your dataset. If you really don't have such a variable but your data is sorted such that each group always starts with ID=1 then you might create one by using a data step like this.

data grouped;
   set have;
   group+(id=1);
run;

Then you want to merge the values by the groups.

data want ;
   merge grouped(where=(group=1) drop=column2 column3)
          grouped(where=(group=2) drop=column1 column3)
          grouped(where=(group=3) drop=column1 column2)
  ;
  by id;
run;

Now if you just want to take the first three non-missing values for each column you could do something like this:

data want ;
  do i=1 to 3 ;
     if not eof1 then set have (keep=column1 where=(column1 ne .)) end=eof1;
     if not eof2 then set  have (keep=column2 where=(column2 ne .)) end=eof2;
     if not eof3 then set  have (keep=column3 where=(column3 ne .)) end=eof3;
    output;
    call missing(of column1-column3);
  end;
  stop;
run;

But that would really ignore the ID column, but it is not clear what the ID column means in this case anyway.

turcay · Posted 11-29-2015 06:02 PM

Hello Tom,

Thank you for the detailed information. After I did some calculation on my data the table was created like that(my sample(have)) dataset.Then I needed to merge every columns for the first three rows. ID column is indispensable to me. Actually, on my real dataset there are 40 columns.Your first method(groups) seems to me more dynamic but I don't know how can I do this method for 40 columns. I also have to say that there are many columns on my dataset independently of Column variables and ID variable.

By the way, Thank you @PGStats.

turcay · Posted 11-30-2015 09:20 AM

Hello again @Tom and @PGStats,

I would like to ask you If I have another variables except from "Colum" and "ID" how can I get the other variables on desired dataset. For example, I addded a sample dataset, I put two variables more on the dataset. I need to add all other the variables on my desired dataset Which you helped me to create previous post. I added my last sample dataset below and also last desired output. The significant point is it may be many "Diff" variables on my dataset and the other variables may not start with the same name.

data have;
Length ID $ 10 Column1 8 Column2 8 Column3 8 Diff1 $ 10 Diff2 8;
Infile datalines missover dlm=",";
Input ID Column1 Column2 Column3 Diff1 Diff2;
datalines;
2,5,.,.,C,100
1,10,.,.,A,50
3,15,.,.,Y,150
2,15,10,.,C,100
1,15,20,.,A,50
3,15,30,.,Y,150
2,15,30,15,C,100
1,15,30,30,A,50
3,15,30,45,Y,150
;
run;

turcay · Posted 12-01-2015 05:10 AM

Hello again,

I would like to learn how can I get other variables different from "ID" and "Column" variables. If I use @PGStats method;

set have(keep=id column1 firstobs=1);
set have(keep=id column2 firstobs=4);
set have(keep=id column3 firstobs=7);

I can get the "ID" and "Column" variables but If I want to get the other variables may not start with the same name on dataset how can I exceute it. Here is my last sample dataset.

data have;
Length ID $ 10 Column1 8 Column2 8 Column3 8 Diff1 $ 10 Diff2 8 Alternate 8 OtherWise 8;
Infile datalines missover dlm=",";
Input ID Column1 Column2 Column3 Diff1 Diff2 Alternate OtherWise;
datalines;
2,5,.,.,C,100,11,13
1,10,.,.,A,50,12,14
3,15,.,.,Y,150,13,15
2,15,10,.,C,100,11,15
1,15,20,.,A,50,12,14
3,15,30,.,Y,150,13,15
2,15,30,15,C,100,11,13
1,15,30,30,A,50,12,14
3,15,30,45,Y,150,13,15
;
run;

Thank you.

How to - Merge Desired Rows

Re: How to - Merge Desired Rows

Re: How to - Merge Desired Rows

Re: How to - Merge Desired Rows

Re: How to - Merge Desired Rows

Re: How to - Merge Desired Rows

Re: How to - Merge Desired Rows

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away