BookmarkSubscribeRSS Feed
nans_p
Calcite | Level 5

Hi again,

I am trying to fill in the missing records to Y  if any record in the series is Y by subjid and count where coll is 1. In my dataset I have collapsed records, a new record generated is ex = 1.

My interest is if any record in variable X =  Y by count (Which is series) then the missing value should be Y. similarly for variable Y and Z.

I have highlighted the output I want in bold.

Data have subjid X Y Z Coll count ex  
  101 Y N Y 1 1    
  101       1 1 1  
  101 N N Y 1 1    
  101 Y Y N 1 2    
  101       1 2    
  101 Y N N 1 2    
  102 Y N N        
  102 N N N 1 1    
  102       1 1 1  
  102 Y N N 1 1    
  102 N N N 1 1    
                 
Data want subjid X Y Z Coll count ex  
  101 Y N Y 1 1    
  101 Y Y Y 1 1 1  
  101 N N Y 1 1    
  101 Y Y N 1 2    
  101 Y Y N 1 2 1  
  101 Y N N 1 2    
  102 Y N N        
  102 N N N 1 1    
  102 Y N N 1 1 1  
  102 Y N N 1 1    
  102 N N N 1 1    
                 
 
 
 

Thanks in advance.

8 REPLIES 8
Oligolas
Barite | Level 11

Hi,

I do not see why in obs=2 of your dataset want you get Y='Y' since there are only N in your group

The solution could be something like this:

(untested)

PROC SORT data=have; by subjid count descending X descending Y descending Z;RUN;
DATA want;
LENGTH x1 y1 z1 $1;
SET have;
BY subjid count;
RETAIN x1 y1 z1 '';
ARRAY val (3) x y z;
ARRAY lag (3) x1 y1 z1;
DO i=1 TO dim(val);
IF first.count then lag(i)='N';
IF val(i) eq 'Y' then lag(i)='Y';
IF coll eq 1 AND missing(val(i)) then val(i)=lag(i);
END;
RUN;
________________________

- Cheers -

sbxkoenk
SAS Super FREQ

Hello @nans_p ,

 

It took me a long time to figure out what you want exactly. Your explanation is not super-clear (probably due to having a variable Y and a value 'Y'). You are lucky it's a weekend day, otherwise I wouldn't have done the effort. 😉

 

Just like @Oligolas, one of the records in the dataset WANT is not as you specified.

 

I deliberately pulled it all apart in many different steps so that you can easily assess if this is what you want. If you confirm this is what you want then we can come up with more concise code (like the code provided by @Oligolas ). More concise code could come in handy (for performance reasons) if you have tens or hundreds of thousands of observations to deal with.

 

Here's my code:

/* one series is one combination of subjid and count? */
data have;
 input subjid X $ Y $ Z $ Coll count ex;
 informat X Y Z $1.;
 cards;
101	Y	N	Y	1	1	.
101	.   .   . 	1	1	1
101	N	N	Y	1	1	.
101	Y	Y	N	1	2	.
101	.   .   . 	1	2	.
101	Y	N	N	1	2	.
102	Y	N	N   .   .	.
102	N	N	N	1	1	.
102	.   .   . 	1	1	1
102	Y	N	N	1	1	.
102	N	N	N	1	1	.
;
run;

%MACRO LOOP_XYZ;
%DO i=1 %TO 3;
%IF       &i.=1 %THEN %LET LETTER=x;
%ELSE %IF &i.=2 %THEN %LET LETTER=y;
%ELSE %IF &i.=3 %THEN %LET LETTER=z;
%ELSE;
data missing_&LETTER._shouldbe_Y(keep=subjid count);
 set have;
 by subjid count;
 where &LETTER.='Y';
 where also Coll=1;
run;
PROC SORT data=missing_&LETTER._shouldbe_Y NODUPKEY;
 by subjid count;
run;
%END;
%MEND  LOOP_XYZ;
options mprint nosymbolgen;
%LOOP_XYZ

data want;
 merge have 
       missing_x_shouldbe_Y(in=in2)
	   missing_y_shouldbe_Y(in=in3)
	   missing_z_shouldbe_Y(in=in4);
 by subjid count;
 if X='' and Y='' and Z='' then ex=1;
 if in2 and x='' then x='Y'; else x='N';
 if in3 and y='' then y='Y'; else y='N';
 if in4 and z='' then z='Y'; else z='N';
run;
/* end of program */

Cheers,

Koen

sbxkoenk
SAS Super FREQ

The difference between my code and the code of @Oligolas (I just looked at it) is that @Oligolascode only looks at the past records while I look at all records of a series (subjid + count) but maybe that is not necessary at all?

Please clarify!

Koen

nans_p
Calcite | Level 5

Hi Koen,

Thanks for looking at my query. I am sorry I couldn't be clear to explain.

What I am looking for is if any of the values in the variable X , Y , Z  is 'Y'  ( 'Y' basically stands for Yes & N stands for 'No' ), by subjid and count (count is what I call series) then the missing value should be set to 'Y' (Yes).

Thanks.

Tom
Super User Tom
Super User

It sounds like what you are trying to say is you want to replace all of the missing values with the same value for the BY group independent of which variable has the missing value?

 

This would be much easier if you coded the variables as 0/1 instead of 'N'/'Y'.  Then you caould use the two different MAX() function (SQL aggregate and SAS maximum functions) to find the maximum value over all three variable.

select ...,coalesce(X,max(max(x,y,z))) as X....
Oligolas
Barite | Level 11

Hi,

since I'm resorting the records by placing the filled ones ahead, I do consider all records in a group. The assumption I take is take x,y and z will be missing simultaneously.

________________________

- Cheers -

Tom
Super User Tom
Super User

So you want to replace missing values of X with the MAX() value of X for the group? Simple enough with PROC SQL.

First we need to clean up your example results so they match the rules.  Let's add a ROW variable so we can recreate the order.

data have ;
  infile cards dsd dlm='|' truncover;
  row+1;
  input subjid $ (X Y Z) (:$1.) Coll count ex ;
cards;
101|Y|N|Y|1|1| 
101| | | |1|1|1
101|N|N|Y|1|1| 
101|Y|Y|N|1|2| 
101| | | |1|2| 
101|Y|N|N|1|2| 
102|Y|N|N| | | 
102|N|N|N|1|1| 
102| | | |1|1|1
102|Y|N|N|1|1| 
102|N|N|N|1|1| 
;

data expect ;
  infile cards dsd dlm='|' truncover;
  row+1;
  input subjid $ (X Y Z) (:$1.) Coll count ex ;
cards;
101|Y|N|Y|1|1|  
101|Y|N|Y|1|1|1 | Change Y to 'N' 
101|N|N|Y|1|1|  
101|Y|Y|N|1|2|  
101|Y|Y|N|1|2|  |  Remvoed EX=1 
101|Y|N|N|1|2|  
102|Y|N|N| | |  
102|N|N|N|1|1|  
102|Y|N|N|1|1|1 
102|Y|N|N|1|1|  
102|N|N|N|1|1|  
;

Now use PROC SQL and then use PROC COMPARE to test the results

proc sql ;
 create table want as 
 select row
      , subjid 
      , count
      , Coll
      , ex
      , coalesce(X,max(X)) as X
      , coalesce(Y,max(Y)) as Y
      , coalesce(Z,max(Z)) as Z
 from have
 group by subjid,count
 order by row
 ;
quit;

proc print data=want;
run;

proc compare data=want compare=expect;
  id subjid count row;
run;
Obs    row    subjid    count    Coll    ex    X    Y    Z

  1      1     101        1        1      .    Y    N    Y
  2      2     101        1        1      1    Y    N    Y
  3      3     101        1        1      .    N    N    Y
  4      4     101        2        1      .    Y    Y    N
  5      5     101        2        1      .    Y    Y    N
  6      6     101        2        1      .    Y    N    N
  7      7     102        .        .      .    Y    N    N
  8      8     102        1        1      .    N    N    N
  9      9     102        1        1      1    Y    N    N
 10     10     102        1        1      .    Y    N    N
 11     11     102        1        1      .    N    N    N

SAS 9.4 on WINDOWS                                                                                  10:28 Wednesday, April 7, 2021 113

The COMPARE Procedure
Comparison of WORK.WANT with WORK.EXPECT
(Method=EXACT)

Data Set Summary

Dataset               Created          Modified  NVar    NObs

WORK.WANT    10APR21:14:52:58  10APR21:14:52:58     8      11
WORK.EXPECT  10APR21:14:52:58  10APR21:14:52:58     8      11


Variables Summary

Number of Variables in Common: 8.
Number of ID Variables: 3.


Observation Summary

Observation      Base  Compare  ID

First Obs           1        1  subjid=101 count=1 row=1
Last  Obs          11       11  subjid=102 count=1 row=11

Number of Observations in Common: 11.
Total Number of Observations Read from WORK.WANT: 11.
Total Number of Observations Read from WORK.EXPECT: 11.

Number of Observations with Some Compared Variables Unequal: 0.
Number of Observations with All Compared Variables Equal: 11.

NOTE: No unequal values were found. All values compared are exactly equal.

Patrick
Opal | Level 21

Here a data step approach using a hash lookup table.

data want;
  if _n_=1 then
    do;
      dcl hash h1(dataset:'have(where=(whichc("Y",x,y,z)>0))');
      h1.defineKey('subjid','count');
      h1.defineData('subjid','count');
      h1.defineDone();
    end;
  set have;
  if h1.check()=0 then 
    do;
      if missing(x) then x='Y';
      if missing(y) then y='Y';
      if missing(z) then z='Y';
    end;
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 986 views
  • 2 likes
  • 5 in conversation