Hi guys,
suppose to have the following:
data DB;
input ID :$20. Var1: $20. Value1;
cards;
0001 Mydate1 2
0001 Mydate2 .
0002 Mydate1 .
0002 Mydate2 4
0003 Mydate1 2
0003 Mydate2 .
0003 Mydate3 6
0003 Mydate4 .
0003 Mydate5 0
run;
Is there a way to add a number = 5 every time, for each ID Mydate1 is not missing as Mydate3 while Mydate 2 is missing?
Desired output:
data DB1;
input ID :$20. Var1: $20. Value1;
cards;
0001 Mydate1 2
0001 Mydate2 .
0002 Mydate1 .
0002 Mydate2 4
0003 Mydate1 2
0003 Mydate2 5
0003 Mydate3 6
0003 Mydate4 .
0003 Mydate5 0
run;
Rule: for each ID if not missing (Mydate1) and not missing(Mydate3) and missing(Mydate2) then Mydate2 = 5.
Thank you in advance
Your data steps won't work. let's remove the extra variable from the input statement.
data DB;
input ID :$20. Var :$32. Value ;
cards;
0001 Mydate1 2
0001 Mydate2 .
0002 Mydate1 .
0002 Mydate2 4
0003 Mydate1 2
0003 Mydate2 .
0003 Mydate3 6
0003 Mydate4 .
0003 Mydate5 0
;
This type of across variable checking is easier with a WIDE dataset instead of the TALL structure you have. So if the values of the second column are valid SAS names just use PROC TRANSPOSE first.
proc transpose data=db out=wide(drop=_name_);
by id;
id var;
var value;
run;
To get:
Now you can implement your comparison logic. Which I think is something like this:
data want;
set wide ;
if not missing(mydate3) then mydate2=coalesce(mydate2,5);
run;
To get:
You could then re-transpose back to TALL layout if you want.
To do something similar directly with the TALL dataset just merge it back with the subset of observations that have non-missing values of MYDATE3. Note you need to be careful about case here of values of VAR here.
data want;
merge db
db(keep=id var value rename=(var=_var value=_value)
where=(_var='Mydate3' and not missing(_value)))
;
by id;
if var='Mydate2' and not missing(_value) then
value=coalesce(value,5)
;
drop _var _value;
run;
Which produces:
But this will not work if the var='Mydate2' does not even appear in the dataset for some values of ID.
Note that the TRANSPOSE method will work fine in those cases. If no ID has Mydate2 then it will be added by the data step and initialized to missing for each ID.
It would be very helpful if you could give us the RULES that the program logic should follow, rather than having us deduce the rules from one brief example.
Your data steps won't work. let's remove the extra variable from the input statement.
data DB;
input ID :$20. Var :$32. Value ;
cards;
0001 Mydate1 2
0001 Mydate2 .
0002 Mydate1 .
0002 Mydate2 4
0003 Mydate1 2
0003 Mydate2 .
0003 Mydate3 6
0003 Mydate4 .
0003 Mydate5 0
;
This type of across variable checking is easier with a WIDE dataset instead of the TALL structure you have. So if the values of the second column are valid SAS names just use PROC TRANSPOSE first.
proc transpose data=db out=wide(drop=_name_);
by id;
id var;
var value;
run;
To get:
Now you can implement your comparison logic. Which I think is something like this:
data want;
set wide ;
if not missing(mydate3) then mydate2=coalesce(mydate2,5);
run;
To get:
You could then re-transpose back to TALL layout if you want.
To do something similar directly with the TALL dataset just merge it back with the subset of observations that have non-missing values of MYDATE3. Note you need to be careful about case here of values of VAR here.
data want;
merge db
db(keep=id var value rename=(var=_var value=_value)
where=(_var='Mydate3' and not missing(_value)))
;
by id;
if var='Mydate2' and not missing(_value) then
value=coalesce(value,5)
;
drop _var _value;
run;
Which produces:
But this will not work if the var='Mydate2' does not even appear in the dataset for some values of ID.
Note that the TRANSPOSE method will work fine in those cases. If no ID has Mydate2 then it will be added by the data step and initialized to missing for each ID.
@NewUsrStat wrote:
Thank you very much Tom for your help as usual. But it does not work correctly because it is mandatory that mydate1 also should not be missed. In other words mydate2 is missing between TWO not missing values.
So change the logic to also test mydate1. Should not be hard.
First, make sure that your example data set creates what you want.
This is reading the 0001 on the second line as the numeric value for Value2 and only has one output ID of 0001. The whole data set only has 4 output observations. Since you aren't providing values for Value2 either as missing or actual values SAS is reading the following line ID values. So fix that.
data DB; input ID :$20. Var1: $20. Value1 Value2; cards; 0001 Mydate1 2 0001 Mydate2 . 0002 Mydate1 . 0002 Mydate2 4 0003 Mydate1 2 0003 Mydate2 . 0003 Mydate3 6 0003 Mydate4 . 0003 Mydate5 0 run;
Some missing details:
Are the values of the Mydate variables within an ID sorted? If so, how (may not impact the 1, 2, 3 versions but definitely can be a problem with 10, 11, 12 etc suffixes (default sort for those would mean an order of Mydate1, Mydate10, Mydate11, Mydate2, (etc).
What is to be done if there are 2 or more Mydate2?
What if there are multiple Mydate3 and some have missing Value and some don't?
Your rule, I think, needs to be more correctly stated as:
Rule: for each ID if not missing VALUE1 where VAR1=Mydate1 and not missing VALUE1 where VAR1=Mydate3
and missing VALUE1 where VAR1=Mydate2 then where VAR1=Mydate2 then VALUE1 = 5.
Your rule as currently stated references variable names that do not exist :Mydate1, Mydate2 and Mydate3. Those are values of VAR1.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.