hi,
in1 out1 is a pair likewise is in2 out2
when both the values of the pair are missing then i want to delete those records....
when a single value is absent of the pair i want to keep those records intact
Thanks
HAVE
ID cat value
101 in1 30
101 out1 60
101 in2 .
101 out2 .
102 in1 20
102 out1 30
102 in2 69
102 out2 .
103 in1 20
103 out1 .
104 in1 10
104 out1 .
104 in2 .
104 out2 .
WANT
ID cat value
101 in1 30
101 out1 60
102 in1 20
102 out1 30
102 in2 69
102 out2 .
103 in1 20
103 out1 .
104 in1 10
104 out1 .
This version should work:
data want;
n_missing=0;
do _n_=1 to 2;
set have;
if value=. then n_missing + 1;
end;
do _n_=1 to 2;
set have;
if n_missing < 2 then output;
end;
drop n_missing;
run;
One of the keys here is that each SET statement operates independently of the other. Each starts with the first observation, and reads the next observation whenever it executes.
The program does rely on the structure to the data being accurate. It doesn't even look at CAT. It just assumes that each pair of observations are a matching group that should either both remain or both be deleted.
Good luck.
Maybe you can do a lag of value based on id and then delete that observation if both are missing on a row?
data answer1;
set yourdata;
order=input(substr(reverse(strip(cat)),1,1),7.);
run;
proc sql;
create table answer2 as
select sum(case when value is null then 1 else 0 end) as missing
,id
,order
from answer1
group by id
,order;
quit;
run;
proc sql;
create table FinalAnswer as
select a1.id
,a1.cat
,a1.value
from answer1 a1
left join answer12 a12 on a1.id=a12.id and a1.order=a12.order
where a12.missing~=2;
quit;
run;
Notice this only works if you variables comes in groups of 2 and have numerics at the end of their variables (aka your example). However this type of code can be made far more generic pretty easily.
This version should work:
data want;
n_missing=0;
do _n_=1 to 2;
set have;
if value=. then n_missing + 1;
end;
do _n_=1 to 2;
set have;
if n_missing < 2 then output;
end;
drop n_missing;
run;
One of the keys here is that each SET statement operates independently of the other. Each starts with the first observation, and reads the next observation whenever it executes.
The program does rely on the structure to the data being accurate. It doesn't even look at CAT. It just assumes that each pair of observations are a matching group that should either both remain or both be deleted.
Good luck.
Hi,
Thanks for the reply.
I tried the datastep since i am a beginner and it seems to be like how I intend.
I dint understand your code fully..
Could you please explain to me...?????
I dont underastand why we are getting a n_missing of 2 when the value is missing??? etc
Thanks
The top DO loop reads two observations (notice the loop has _N_=1 TO 2).
Before that loop starts, N_MISSING=0. Inside the loop, N_MISSING gets increased each time a missing value is found. So when the top loop ends, N_MISSING=2 whenever both observations had a missing value.
At that point, the heavy lifting is over. The bottom DO loop reads the same observations, and outputs them when fewer than 2 missing values were found.
Hi,
How is that if i run the Upper DO loop alone it outputs only the second record of the pair????
Thanks
That result seems perfectly normal. To execute a DATA step, SAS performs all the statements in the DATA step and then outputs whatever the results are. So the top DO loop reads two observations, and then when the DATA statements are complete, the results contain the contents of the second observation. That gets output. Then the process repeats. Execute the top DO loop again, and get the next two observations. When that is complete, the results contain the contents of the fourth observation, so output that.
Good to see that you are trying test programs to get a sense of how SAS will handle different situations.
That was quick!!!!!
So after the first loop we have only the last record of the pair and the corresponding n_missing value.A total of 7 records for our example
The seond Do loop takes the 14 records again fresh(ignoring what the first loop has ). and how does it get to know the n_missing value for the forst record of the pair????.....
Thanks a ton
I think you have the order a little mixed up here. When the first loop is over, you do not have 7 observations. Eventually, by the end of the DATA step, you will have run through the first loop 7 times. But the first loop executes once, reading the first two observations. Then the second loop executes, reading the same two observations. So the value of N_MISSING (based on the first two observations) just sits there and is available to the second loop. After the second loop finishes, the DATA step continues.
Once again, the first loop reads two observations (observations 3 and 4). Then the second loop reads the same observations.
It may help to examine a simpler DATA step:
data new;
set old;
name='Fred';
x=5;
run;
The SET statement executes many times here. The first time, it reads the first observation from OLD. For that first observation, NAME= executes, then X= executes. Whatever is left gets output and becomes the first observation in NEW. Then the DATA step continues. The SET statement executes again, reading the second observation from OLD. Then NAME= executes, X= executes, and the result gets output. So the SET statement reads a single observation at a time, starting at the beginning of the incoming data set, and executes as many times as needed (yes, there are more details to that part). If there are multiple SET statements (as in the original problem), each one operates independently of the other (each starts by reading the first observation, and continues wherever it left off).
Hope this helps.
That was very detailed. ...
I understood the concept well after that explanation......One last Question.
IST DO LOOP
n_missing=1 ID=101 cat=in2 value=. _ERROR_=0 _N_=1
n_missing=2 ID=101 cat=out2 value=. _ERROR_=0 _N_=2 <<<-----how does SAS know that it has to put n_missing of this record (from top loop) to "both" the records of the bottom loop from the second SET stmnt??????
IIND DO LOOP
n_missing=2 ID=101 cat=in2 value=. _ERROR_=0 _N_=1
n_missing=2 ID=101 cat=out2 value=. _ERROR_=0 _N_=2
Thnks
SAS has no choice. It doesn't have access to any previous values of N_MISSING. It only has access to the current value. (OK, technically we could use a LAG function if needed, but that's not in play here.) When the top DO loop finishes, there is only one final value for N_MISSING and that is the value that the second DO loop can use.
Great. .Thanks for your patience....This is a great place to learn and share the knowledge ....Also when the second loop brings in the same 2 records.......which statement in the code specifically says to put the value of the N_MISSING from prior loop to both the new records.......Automatically retained?????
Thanks
Automatically retained? Sort of, and OK to think of it that way. Again, consider a simpler DATA step:
data new;
set old;
name='Fred';
if name='Fred' then x=5;
run;
How does SAS know whether NAME is "Fred" when it gets to the IF/THEN statement? Whatever value was assigned by earlier statements justs sits there, and is automatically available.
Technically, RETAIN in SAS has slightly different meaning. It refers to the process that occurs when one observation is over and the next one begins.
data new;
set old;
put 'X is now ' x;
name='Fred';
if name='Fred' then x=5;
* RETAIN X;
run;
Try this program, both with and without the RETAIN statement, and it will illustrate the difference.
Thanks so much.I appreciate your time
Regards
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.