BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
robertrao
Quartz | Level 8

hi,

in1 out1 is a pair likewise is in2 out2

when both the values of the pair are missing then i want to delete those records....

when a  single value is absent of the pair i want to keep those records intact

Thanks

HAVE
ID       cat      value

101     in1         30

101     out1        60

101      in2          .

101      out2       .

102      in1        20

102      out1     30

102     in2        69

102    out2     .

103    in1        20

103    out1     .

104   in1       10

104   out1    .

104   in2      .

104   out2   .

WANT

ID       cat      value

101     in1         30

101     out1        60

102      in1        20

102      out1     30

102     in2        69

102    out2     .

103    in1        20

103    out1    .

104   in1       10

104   out1    .

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

This version should work:

data want;

  n_missing=0;

  do _n_=1 to 2;

      set have;

      if value=. then n_missing + 1;

  end;

  do _n_=1 to 2;

       set have;

       if n_missing < 2 then output;

  end;

  drop n_missing;

run;

One of the keys here is that each SET statement operates independently of the other.  Each starts with the first observation, and reads the next observation whenever it executes.

The program does rely on the structure to the data being accurate.  It doesn't even look at CAT.  It just assumes that each pair of observations are a matching group that should either both remain or both be deleted.

Good luck.

View solution in original post

19 REPLIES 19
caveman529
Calcite | Level 5

Maybe you can do a lag of value based on id and then delete that observation if both are missing on a row?

Anotherdream
Quartz | Level 8

data answer1;

set yourdata;

order=input(substr(reverse(strip(cat)),1,1),7.);

run;

proc sql;

create table answer2 as

select sum(case when value is null then 1 else 0 end) as missing

,id

,order

from answer1

group by id

,order;

quit;

run;

proc sql;

create table FinalAnswer as

select a1.id

,a1.cat

,a1.value

from answer1 a1

left join answer12 a12 on a1.id=a12.id and a1.order=a12.order

where a12.missing~=2;

quit;

run;

Notice this only works if you variables comes in groups of 2 and have numerics at the end of their variables (aka your example). However this type of code can be made far more generic pretty easily.

Astounding
PROC Star

This version should work:

data want;

  n_missing=0;

  do _n_=1 to 2;

      set have;

      if value=. then n_missing + 1;

  end;

  do _n_=1 to 2;

       set have;

       if n_missing < 2 then output;

  end;

  drop n_missing;

run;

One of the keys here is that each SET statement operates independently of the other.  Each starts with the first observation, and reads the next observation whenever it executes.

The program does rely on the structure to the data being accurate.  It doesn't even look at CAT.  It just assumes that each pair of observations are a matching group that should either both remain or both be deleted.

Good luck.

robertrao
Quartz | Level 8

Hi,

Thanks for the reply.

I tried the datastep since i am a beginner and it seems to be like how I intend.

I dint understand your code fully..

Could you please explain to me...?????

I dont underastand why we are getting a n_missing of 2 when the value is missing??? etc

Thanks

Astounding
PROC Star

The top DO loop reads two observations (notice the loop has _N_=1 TO 2). 

Before that loop starts, N_MISSING=0.  Inside the loop, N_MISSING gets increased each time a missing value is found.  So when the top loop ends, N_MISSING=2 whenever both observations had a missing value.

At that point, the heavy lifting is over.  The bottom DO loop reads the same observations, and outputs them when fewer than 2 missing values were found.

robertrao
Quartz | Level 8

Hi,

How is that if i run the Upper DO  loop alone it outputs only the second record of the pair????

Thanks

Astounding
PROC Star

That result seems perfectly normal.  To execute a DATA step, SAS performs all the statements in the DATA step and then outputs whatever the results are.  So the top DO loop reads two observations, and then when the DATA statements are complete, the results contain the contents of the second observation.  That gets output.  Then the process repeats.  Execute the top DO loop again, and get the next two observations.  When that is complete, the results contain the contents of the fourth observation, so output that.

Good to see that you are trying test programs to get a sense of how SAS will handle different situations.

robertrao
Quartz | Level 8

That was quick!!!!!

So after the first loop we have only the last record of the pair and the corresponding n_missing value.A total of 7 records for our example

The seond Do loop takes the 14 records again fresh(ignoring what the first loop has ). and how does it get to know the n_missing value for the forst record of the pair????.....

Thanks a ton

Astounding
PROC Star

I think you have the order a little mixed up here.  When the first loop is over, you do not have 7 observations.  Eventually, by the end of the DATA step, you will have run through the first loop 7 times.  But the first loop executes once, reading the first two observations.  Then the second loop executes, reading the same two observations.  So the value of N_MISSING (based on the first two observations) just sits there and is available to the second loop.  After the second loop finishes, the DATA step continues.

Once again, the first loop reads two observations (observations 3 and 4).  Then the second loop reads the same observations.

It may help to examine a simpler DATA step:

data new;

set old;

name='Fred';

x=5;

run;

The SET statement executes many times here.  The first time, it reads the first observation from OLD.  For that first observation, NAME= executes, then X= executes.  Whatever is left gets output and becomes the first observation in NEW.  Then the DATA step continues.  The SET statement executes again, reading the second observation from OLD.  Then NAME= executes, X= executes, and the result gets output.  So the SET statement reads a single observation at a time, starting at the beginning of the incoming data set, and executes as many times as needed (yes, there are more details to that part).  If there are multiple SET statements (as in the original problem), each one operates independently of the other (each starts by reading the first observation, and continues wherever it left off).

Hope this helps.

robertrao
Quartz | Level 8

That was very detailed. ...

I understood the concept well after that explanation......One last Question.

IST DO LOOP

n_missing=1 ID=101 cat=in2 value=. _ERROR_=0 _N_=1

n_missing=2 ID=101 cat=out2 value=. _ERROR_=0 _N_=2  <<<-----how does SAS know that it has to put n_missing of this record (from top loop) to "both" the records of the bottom loop from the second SET stmnt??????

IIND DO LOOP

n_missing=2 ID=101 cat=in2 value=. _ERROR_=0 _N_=1

n_missing=2 ID=101 cat=out2 value=. _ERROR_=0 _N_=2

Thnks

Astounding
PROC Star

SAS has no choice.  It doesn't have access to any previous values of N_MISSING.  It only has access to the current value.  (OK, technically we could use a LAG function if needed, but that's not in play here.)  When the top DO loop finishes, there is only one final value for N_MISSING and that is the value that the second DO loop can use.

robertrao
Quartz | Level 8

Great. .Thanks for your patience....This is a great place to learn and share the knowledge ....Also when the second loop brings in the same 2 records.......which statement in the code specifically says to put the value of the N_MISSING from prior loop to both the new  records.......Automatically retained?????

Thanks

Astounding
PROC Star

Automatically retained?  Sort of, and OK to think of it that way.  Again, consider a simpler DATA step:

data new;

set old;

name='Fred';

if name='Fred' then x=5;

run;

How does SAS know whether NAME is "Fred" when it gets to the IF/THEN statement?  Whatever value was assigned by earlier statements justs sits there, and is automatically available.

Technically, RETAIN in SAS has slightly different meaning.  It refers to the process that occurs when one observation is over and the next one begins.

data new;

set old;

put 'X is now ' x;

name='Fred';

if name='Fred' then x=5;

* RETAIN X;

run;

Try this program, both with and without the RETAIN statement, and it will illustrate the difference.

robertrao
Quartz | Level 8

Thanks so much.I appreciate your time

Regards

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 19 replies
  • 1724 views
  • 11 likes
  • 8 in conversation