BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
turcay
Lapis Lazuli | Level 10

Hello everyone,

I have a dataset similar to below. I need to take first three values all of the “Column” variables. I added the picture which I have and I also added the desired output. I tried to use Proc Sort then I use last.ID option but It didn't work. Does anybody have any idea ?

 

data have;
Length ID $ 10 Column1 8 Column2 8 Column3 8;
Infile datalines missover dlm=",";
Input ID Column1 Column2 Column3 ;
datalines;
1,5,.,.,
2,10,.,.,
3,15,.,.,
1,15,10,.,
2,15,20,.,
3,15,30,.,
1,15,30,15,
2,15,30,30,
3,15,30,45,
;
run;
/*It didn't work*/
Proc Sort data=have;
by Id;
run;
/*It didn't work*/
data want;
set have;
BY Id;
If last.Id THEN OUTPUT want;
run;

Have

Have.png

 

 

 

Want

 

Want.png

 

 

Thank  you.

 

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

 

data want;
set have(keep=id column1 firstobs=1);
set have(keep=id column2 firstobs=4);
set have(keep=id column3 firstobs=7);
run;
PG

View solution in original post

5 REPLIES 5
PGStats
Opal | Level 21

 

data want;
set have(keep=id column1 firstobs=1);
set have(keep=id column2 firstobs=4);
set have(keep=id column3 firstobs=7);
run;
PG
Tom
Super User Tom
Super User

To get what you want you need another variable. You pictures show that for COLUMN1 you want the values from the first group of rows, for column2 the sets of records from the second group of rows, etc. But there is no group number variable in your dataset. If you really don't have such a variable but your data is sorted such that each group always starts with ID=1 then you might create one by using a data step like this.

data grouped;
   set have;
   group+(id=1);
run;

Then you want to merge the values by the groups.

data want ;
   merge grouped(where=(group=1) drop=column2 column3)
          grouped(where=(group=2) drop=column1 column3)
          grouped(where=(group=3) drop=column1 column2)
  ;
  by id;
run;

 

Now if you just want to take the first three non-missing values for each column you could do something like this:

 

data want ;
  do i=1 to 3 ;
     if not eof1 then set have (keep=column1 where=(column1 ne .)) end=eof1;
     if not eof2 then set  have (keep=column2 where=(column2 ne .)) end=eof2;
     if not eof3 then set  have (keep=column3 where=(column3 ne .)) end=eof3;
    output;
    call missing(of column1-column3);
  end;
  stop;
run;

But that would really ignore the ID column, but it is not clear what the ID column means in this case anyway.

turcay
Lapis Lazuli | Level 10

Hello Tom,

 

Thank you for the detailed information. After I did some calculation on my data the table was created like that(my sample(have)) dataset.Then I needed to merge every columns for the first three rows. ID column is indispensable to me. Actually, on my real dataset there are 40 columns.Your first method(groups) seems to me more dynamic but I don't know how can I do this method for 40 columns. I also have to say that there are many columns on my dataset independently of Column variables and ID variable.

 

By the way, Thank you @PGStats.

turcay
Lapis Lazuli | Level 10

Hello again @Tom and @PGStats,

 

I would like to ask you If I have another variables except from "Colum" and "ID" how can I get the other variables on desired dataset. For example, I addded a sample dataset, I put two variables more on the dataset. I need to add all other the variables on my desired dataset Which you helped me to create previous post. I added my last sample dataset below and also last desired output. The significant point is it may be many "Diff" variables on my dataset and the other variables may not start with the same name.

 

data have;
Length ID $ 10 Column1 8 Column2 8 Column3 8 Diff1 $ 10 Diff2 8;
Infile datalines missover dlm=",";
Input ID Column1 Column2 Column3 Diff1 Diff2;
datalines;
2,5,.,.,C,100
1,10,.,.,A,50
3,15,.,.,Y,150
2,15,10,.,C,100
1,15,20,.,A,50
3,15,30,.,Y,150
2,15,30,15,C,100
1,15,30,30,A,50
3,15,30,45,Y,150
;
run;

Desired.png

turcay
Lapis Lazuli | Level 10

Hello again,

 

I would like to learn how can I get other variables different from "ID" and "Column" variables. If I use @PGStats method;

 

set have(keep=id column1 firstobs=1);
set have(keep=id column2 firstobs=4);
set have(keep=id column3 firstobs=7);

 

I can get the "ID" and "Column" variables but If I want to get the other variables may not start with the same name on dataset how can I exceute it. Here is my last sample dataset.

 

data have;
Length ID $ 10 Column1 8 Column2 8 Column3 8 Diff1 $ 10 Diff2 8 Alternate 8 OtherWise 8;
Infile datalines missover dlm=",";
Input ID Column1 Column2 Column3 Diff1 Diff2 Alternate OtherWise;
datalines;
2,5,.,.,C,100,11,13
1,10,.,.,A,50,12,14
3,15,.,.,Y,150,13,15
2,15,10,.,C,100,11,15
1,15,20,.,A,50,12,14
3,15,30,.,Y,150,13,15
2,15,30,15,C,100,11,13
1,15,30,30,A,50,12,14
3,15,30,45,Y,150,13,15
;
run;

Want.png

 

 

Thank you.

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1174 views
  • 2 likes
  • 3 in conversation