BookmarkSubscribeRSS Feed
dustychair
Pyrite | Level 9

Hi all,

I have a question: I have a data set as seen below. I used retain statement to create a data set as below in the second data set. However, it did not work for my case. I will appreciate for any suggestion.

 

Thank you,

 

  have        
number item1 item2 item3 item4 id
1 1       xx1
2   0      
3     1    
4       1  
5 0       xx2
6   0      
7     1    
8       0  
9          

 

  want        
number item1 item2 item3 item4 id
1 1 0 1 1 xx1
2 0 0 1 0 xx2
8 REPLIES 8
FreelanceReinh
Jade | Level 19

Hi @dustychair,


@dustychair wrote:

I used retain statement to create a data set as below in the second data set. However, it did not work for my case.


The issue with the RETAIN statement is that you would need to apply it to new variables, not to those read by the SET statement. Variable _id in the code below is an example.

data have;
input item1-item4 id $;
cards;
1 . . . xx1
. 0 . . .
. . 1 . .
. . . 1 .
0 . . . xx2
. 0 . . .
. . 1 . .
. . . 0 .
. . . . .
;

data _tmp(drop=id rename=(_id=id)) / view=_tmp;
set have;
if id ne ' ' then _id=id;
retain _id;
run;

data want;
update _tmp(obs=0) _tmp;
by id;
run;
dustychair
Pyrite | Level 9
Thank you for your answer. It says "by variables are not properly sorted on data set WORK._TMP"

Thanks
FreelanceReinh
Jade | Level 19

Then apply the technique shown for variable ID to the other variables as well:

data have;
input item1-item4 id $;
cards;
1 . . . xx2
. 0 . . .
. . 1 . .
. . . 1 .
0 . . . xx1
. 0 . . .
. . 1 . .
. . . 0 .
. . . . .
;

data _tmp(drop=_:) / view=_tmp;
set have(rename=(item1-item4=_i1-_i4 id=_id));
array item[4];
array _i[4];
if _id ne ' ' then id=_id;
do _j=1 to dim(_i);
  if _i[_j] ne . then item[_j]=_i[_j];
end;
retain item id;
run;

data want;
set _tmp;
by id notsorted;
if last.id;
run;

Depending on the structure of your real data, you can create dataset WANT in a single data step, but I don't want to make assumptions about your real data.

Tom
Super User Tom
Super User

Then make a variable for grouping that is sorted.

data have ;
  infile cards truncover ;
  input item1-item4 id $;
cards;
1 . . . xx1
. 0 . .
. . 1 .
. . . 1
0 . . . xx2
. 0 . .
. . 1 .
. . . 0
. . . .
;

data group ;
  set have;
  group+ not missing(id);
run;

data want;
  update group(obs=0) group;
  by group;
run;

proc print;
run;

Results

Obs    item1    item2    item3    item4    id     group

 1       1        0        1        1      xx1      1
 2       0        0        1        0      xx2      2
FreelanceReinh
Jade | Level 19

I didn't suggest a similar solution because of the implied assumptions about the real data: It would fail if some of the IDs were repeated rather than missing (but could be modified to avoid this).

mkeintz
PROC Star

For each missing ID, you want to carry forward the previous non-missing id.  And you want to do the same for the ITEM variables.  I.e. you want the last non-missing value for each of these variables within an ID group.

 

But for the NUMBER variable, you want the first non-missing value for each ID group.

 

Editted note:  Did I misread the OP posting?  I thought the second row of the desired output had NUMBER=5,  (not NUMBER=2 as I see now).  If so, then no special treatment is needed for NUMBER, like the SET ... POINT= statement below.

 

If the ID column in dataset HAVE had no holes, and if HAVE were sorted by ID, then you could use a the UPDATE statement, with a SET .... POINT= statement:

 

data want (drop=_:);
  update have(obs=0) have (rename=(number=_number));
  by id;
  if first.id then number=_number;
run;

You don't have that situation, but you could create a dataset NEED with all the missing ID values populated from the most recent non-missing ID value.  Then you could apply the UPDATE technique to NEED:

 

data vneed (drop=_:) / view=vneed;
  set have (drop=number rename=(id=_test_id));
  if not missing(_test_id) then set have (keep=id number) point=_n_;
run;

data want;
  update vneed (obs=0) vneed;
  by id;
run;

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
dustychair
Pyrite | Level 9
Thank you for your answer. I am getting the same error. it says "by variables are not properly sorted on data set "

Thanks
mkeintz
PROC Star

@dustychair wrote:
Thank you for your answer. I am getting the same error. it says "by variables are not properly sorted on data set "

Thanks

Then sort the dataset NEED:

data vneed (drop=_:) / view=vneed;
  set have (rename=(id=_test_id));
  if not missing(_test_id) then set have (keep=id) point=_n_;
run;
proc sort data=vneed out=need_sorted;
  by id;
run;
data want;
  update need_sorted (obs=0) need_sorted;
  by id;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 1110 views
  • 1 like
  • 4 in conversation