BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
NewUsrStat
Lapis Lazuli | Level 10

Hi guys, 

suppose to have the following: 

 

data DB;
  input ID :$20. Var1: $20. Value1;
cards;
0001 Mydate1  2
0001 Mydate2  . 
0002 Mydate1  .
0002 Mydate2  4
0003 Mydate1  2
0003 Mydate2  .
0003 Mydate3  6
0003 Mydate4  .
0003 Mydate5  0
run;

Is there a way to add a number = 5 every time, for each ID Mydate1 is not missing as Mydate3 while Mydate 2 is missing?

 

Desired output:

 

data DB1;
  input ID :$20. Var1: $20. Value1;
cards;
0001 Mydate1  2
0001 Mydate2  . 
0002 Mydate1  .
0002 Mydate2  4
0003 Mydate1  2
0003 Mydate2  5
0003 Mydate3  6
0003 Mydate4  .
0003 Mydate5  0
run;

 

Rule: for each ID if not missing (Mydate1) and not missing(Mydate3) and missing(Mydate2) then Mydate2 = 5.

Thank you in advance

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Your data steps won't work.  let's remove the extra variable from the input statement.

data DB;
  input ID :$20. Var :$32. Value ;
cards;
0001 Mydate1  2
0001 Mydate2  . 
0002 Mydate1  .
0002 Mydate2  4
0003 Mydate1  2
0003 Mydate2  .
0003 Mydate3  6
0003 Mydate4  .
0003 Mydate5  0
;

This type of across variable checking is easier with a WIDE dataset instead of the TALL structure you have.  So if the values of the second column are valid SAS names just use PROC TRANSPOSE first.

proc transpose data=db out=wide(drop=_name_);
  by id;
  id var;
  var value;
run;

To get:

Tom_0-1738859787803.png

 

Now you can implement your comparison logic.  Which I think is something like this:

data want;
  set wide ;
  if not missing(mydate3) then mydate2=coalesce(mydate2,5);
run;

To get:

Tom_1-1738859812768.png

You could then re-transpose back to TALL layout if you want.

 

To do something similar directly with the TALL dataset just merge it back with the subset of observations that have non-missing values of MYDATE3.  Note you need to be careful about case here of values of VAR here.

data want;
  merge db 
        db(keep=id var value rename=(var=_var value=_value)
           where=(_var='Mydate3' and not missing(_value)))
  ;
  by id;
  if var='Mydate2' and not missing(_value) then
    value=coalesce(value,5)
  ;
  drop _var _value;
run;

Which produces:

Tom_2-1738860080394.png

 

But this will not work if the var='Mydate2' does not even appear in the dataset for some values of ID.   

Note that the TRANSPOSE method will work fine in those cases.  If no ID has Mydate2 then it will be added by the data step and initialized to missing for each ID.

View solution in original post

6 REPLIES 6
PaigeMiller
Diamond | Level 26

It would be very helpful if you could give us the RULES that the program logic should follow, rather than having us deduce the rules from one brief example.

--
Paige Miller
NewUsrStat
Lapis Lazuli | Level 10
Just edited!
Tom
Super User Tom
Super User

Your data steps won't work.  let's remove the extra variable from the input statement.

data DB;
  input ID :$20. Var :$32. Value ;
cards;
0001 Mydate1  2
0001 Mydate2  . 
0002 Mydate1  .
0002 Mydate2  4
0003 Mydate1  2
0003 Mydate2  .
0003 Mydate3  6
0003 Mydate4  .
0003 Mydate5  0
;

This type of across variable checking is easier with a WIDE dataset instead of the TALL structure you have.  So if the values of the second column are valid SAS names just use PROC TRANSPOSE first.

proc transpose data=db out=wide(drop=_name_);
  by id;
  id var;
  var value;
run;

To get:

Tom_0-1738859787803.png

 

Now you can implement your comparison logic.  Which I think is something like this:

data want;
  set wide ;
  if not missing(mydate3) then mydate2=coalesce(mydate2,5);
run;

To get:

Tom_1-1738859812768.png

You could then re-transpose back to TALL layout if you want.

 

To do something similar directly with the TALL dataset just merge it back with the subset of observations that have non-missing values of MYDATE3.  Note you need to be careful about case here of values of VAR here.

data want;
  merge db 
        db(keep=id var value rename=(var=_var value=_value)
           where=(_var='Mydate3' and not missing(_value)))
  ;
  by id;
  if var='Mydate2' and not missing(_value) then
    value=coalesce(value,5)
  ;
  drop _var _value;
run;

Which produces:

Tom_2-1738860080394.png

 

But this will not work if the var='Mydate2' does not even appear in the dataset for some values of ID.   

Note that the TRANSPOSE method will work fine in those cases.  If no ID has Mydate2 then it will be added by the data step and initialized to missing for each ID.

NewUsrStat
Lapis Lazuli | Level 10
Thank you very much Tom for your help as usual. But it does not work correctly because it is mandatory that mydate1 also should not be missed. In other words mydate2 is missing between TWO not missing values.
Tom
Super User Tom
Super User

@NewUsrStat wrote:
Thank you very much Tom for your help as usual. But it does not work correctly because it is mandatory that mydate1 also should not be missed. In other words mydate2 is missing between TWO not missing values.

So change the logic to also test mydate1.  Should not be hard.

ballardw
Super User

First, make sure that your example data set creates what you want.

This is reading the 0001 on the second line as the numeric value for Value2 and only has one output ID of 0001. The whole data set only has 4 output observations. Since you aren't providing values for Value2 either as missing or actual values SAS is reading the following line ID values. So fix that.

data DB;
  input ID :$20. Var1: $20. Value1 Value2;
cards;
0001 Mydate1  2
0001 Mydate2  . 
0002 Mydate1  .
0002 Mydate2  4
0003 Mydate1  2
0003 Mydate2  .
0003 Mydate3  6
0003 Mydate4  .
0003 Mydate5  0
run;

Some missing details:

Are the values of the Mydate variables within an ID sorted? If so, how (may not impact the 1, 2, 3 versions but definitely can be a problem with 10, 11, 12 etc suffixes (default sort for those would mean an order of Mydate1, Mydate10, Mydate11, Mydate2, (etc).

What is to be done if there are 2 or more Mydate2?

What if there are multiple Mydate3 and some have missing Value and some don't?

 

Your rule, I think, needs to be more correctly stated as:

Rule: for each ID if not missing VALUE1 where  VAR1=Mydate1 and not missing VALUE1 where  VAR1=Mydate3 
and missing VALUE1 where VAR1=Mydate2 then where VAR1=Mydate2 then VALUE1 = 5.

Your rule as currently stated references variable names that do not exist :Mydate1, Mydate2 and Mydate3. Those are values of VAR1.

sas-innovate-white.png

Special offer for SAS Communities members

Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.

 

View the full agenda.

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1228 views
  • 1 like
  • 4 in conversation