BookmarkSubscribeRSS Feed
robertrao
Quartz | Level 8

Please help me build the code with the following instructions for the code below:

Variable X can take values of A B or Missing.

When A and B are present for the same patient I want to keep both the observations(pt no 1 below)

when A and missing are present for the same patient I want to consider only the observation containing A .(pt no 2 below)

when missing and missing are present for the same patient I want to consider only one of those missing observations .(pt no3 below)

when missing  and A are present for the same patient I want to consider only the observation containing A .(pt no 4 below)

when both have a value of A then I want to consider only one observation  from those two(pt no 5 below)

In the last case I want to consider only one observation containing A

variable X(charecter)        patient No

A                                           1

B                                           1

A                                           2

Missing                                  2

Missing                                  3

Missing                                3

                                         

Missing                                4

A                                           4

A                                             5

A                                             5

A                                              6

A                                              6

Missing                                    6

Thanks

17 REPLIES 17
Ksharp
Super User

OK. How about:

data have;
input x $ no;
cards;
A                                           1
B                                           1
A                                           2
.                                  2
.                                  3
.                                3 
.                                4
A                                           4
A                                             5
A                                             5
A                                              6
A                                              6
.                                    6
;
run;
proc sql;
create table want as
select distinct * from have where x is not missing 
union 
select distinct * from have group by no having sum(missing(x)) eq count(no)
 order by no;
quit;

Ksharp

Haikuo
Onyx | Level 15

How about join:

proc sql noprint;

  create table want as

  select * from

    (select distinct pno from have) a

      left join

      (select distinct * from have where x is not missing)b

      on

      a.pno=b.pno;

quit;

Haikuo

art297
Opal | Level 21

One way to do it using data step code might be:

proc sort data=have;

  by no descending x;

run;

data want;

  set have;

  by no descending x;

  if first.no then do;

    counter=0;

    mcounter=0;

  end;

  counter+1;

  if missing(x) then do;

    mcounter+1;

    if last.no and counter eq mcounter then output;

  end;

  else if last.x then output;

run;

robertrao
Quartz | Level 8


The code works perfectly fine.

Thanks for the effort.

I have a couple of questions though.

when I ran the code till mcounter+1 it is as shown below.

Please correct me if my interpretation is wrong.

Also explain what is the meaning of mcounter

X    no      counter      mcounter

B     1           1                0

A      1           2                0

A       2          1                0

          2          2                1------------------missing X so mcounter is 1

          3          1                1------------------missing X so mcounter shd have been 1(current value from above) +1=2

          3           2                2 -----------------missing X so mcounter shd have been 2(current value from above)+1=3

A        4           1                0      

art297
Opal | Level 21

Not sure exactly what you ran.  The code sets the mcounter values as you described, not what you mentioned as current values.  Since x was sorted in descending order, missing values will come last.  The purpose of having both counter and mcounter was so that the code could identify where an id (no) had only missing values.  In such a case, it could both identify and output the record.

mcounter, for me, just represented missing_counter, i.e., the number of missing values.

Haikuo
Onyx | Level 15

Here is another data step using array(). It is not as robust as Art's approach, as you need to preset the max quantity of members one id can contain.

data have;

input (X        pNo) (:$);

cards;

A                                           1

B                                           1

A                                           2

.                                  2

.                                  3

.                              3

.                                4

A                                           4

A                                             5

A                                             5

A                                              6

A                                              6

.                                    6

;

data want;  array t(1:10) $1. _temporary_;

_c=0;

do _n_=1 by 1 until (last.pno);

  set have;

   by pno notsorted;

    if x not in t then do;

        output;

        t(_n_)=x;

        _c+1;

      end;

    if last.pno and _c=0 then output;

end;

call missing (of t(*));

drop _c;

run;

Haikuo

Linlin
Lapis Lazuli | Level 10

another way:

data have;

input x $ no;

cards;

A                                           1

B                                           1

A                                           2

.                                  2

.                                  3

.                                3

.                                4

A                                           4

A                                             5

A                                             5

A                                              6

A                                              6

.                                    6

;

proc sort data=have(where=(^missing(x))) out=have1 nodupkey;

by no x;

proc sort data=have(where=(missing(x))) nodupkey dupout=have2 ;

by no ;

data want;

  set have1  have2;

  by no;

run;

proc print;run;

robertrao
Quartz | Level 8

Linlin thnaks for your efforts too...

What is special about dupout????

i understood till:

all missings are considered and duplicates of missing are removed by no. I should have got have 2 like shown below:

.     2

.      3

.      4

.      6

With your dupout code I am only getting

.     3

Thanks

art297
Opal | Level 21

Dupout can be useful, but not (I don't think) for what you are trying to do.  Dupout captures ALL of the duplicates EXCEPT for the initial one.  In the case of your current problem, it doesn't do what you want.  E.g., the following would capture all kinds of records that you don't want to capture:

data have;

input x $ no;

cards;

A                                           1

B                                           1

A                                           2

.                                  2

.                                  2

.                                  3

.                                  3

.                                3

.                                4

A                                           4

A                                             5

A                                             5

A                                              6

A                                              6

.                                    6

;

proc sort data=have(where=(^missing(x))) out=have1 nodupkey;

by no x;

proc sort data=have(where=(missing(x))) nodupkey dupout=have2 ;

by no ;

data want;

  set have1  have2;

  by no;

run;

Linlin
Lapis Lazuli | Level 10

Art,

My code works for the sample data OP provided.:smileysilly:

robertrao
Quartz | Level 8

Hi Arthur,

Linlins code is also giving the right answer

She is creating a dupout dataset with no duplicates(have2) and setting it finally to have1;

to remind you again when we have x doubly missing we shd take it once.sheould not eliminate completely

Would you agree to that

art297
Opal | Level 21

Yes, it gives the right answer for the sample you provided but, if one id has one record with A, and two records with missing, I understood that you would only want the one record with A.

The proc sort approach would give you one record with A and one record with missing.

Thus the answer depends upon what you really want.

Haikuo
Onyx | Level 15

Inspired by Linlin's proc sort and Using the strategy I suggested in SQL approach, we can also use the following:

proc sort data=have(keep=no) out=have1 nodupkey;

by no;

proc sort data=have(where=(^missing(x))) nodupkey out=have2 ;

by no x;

data want;

  merge have1  have2;

  by no;

run;

Haikuo

robertrao
Quartz | Level 8

Hi Arthur,(listed is the partial output of your code when run till mcounter+1)

X        no    counter      mcounter

            3         1                1

             3        2                2

A         7        1                 0

           7         2                1

           7          3                2

if last.no and counter eq mcounter then output;

end;

else if last.x then output;

for pt no7 : for last.no (7    3    2 )   counter is ne mcounter so that condition is false.

now it has to output last.x

In the output for your code I get ( 7    1     0)

Should it not have to be         (    7    3       2)

Could you justify please

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 17 replies
  • 1870 views
  • 1 like
  • 5 in conversation