Solved: Add a value between two values of a variable

NewUsrStat · Posted 02-06-2025 10:48 AM

Hi guys,

suppose to have the following:

data DB;
  input ID :$20. Var1: $20. Value1;
cards;
0001 Mydate1  2
0001 Mydate2  . 
0002 Mydate1  .
0002 Mydate2  4
0003 Mydate1  2
0003 Mydate2  .
0003 Mydate3  6
0003 Mydate4  .
0003 Mydate5  0
run;

Is there a way to add a number = 5 every time, for each ID Mydate1 is not missing as Mydate3 while Mydate 2 is missing?

Desired output:

data DB1;
  input ID :$20. Var1: $20. Value1;
cards;
0001 Mydate1  2
0001 Mydate2  . 
0002 Mydate1  .
0002 Mydate2  4
0003 Mydate1  2
0003 Mydate2  5
0003 Mydate3  6
0003 Mydate4  .
0003 Mydate5  0
run;

Rule: for each ID if not missing (Mydate1) and not missing(Mydate3) and missing(Mydate2) then Mydate2 = 5.

Thank you in advance

Tom · Posted 02-06-2025 11:41 AM

Your data steps won't work. let's remove the extra variable from the input statement.

data DB;
  input ID :$20. Var :$32. Value ;
cards;
0001 Mydate1  2
0001 Mydate2  . 
0002 Mydate1  .
0002 Mydate2  4
0003 Mydate1  2
0003 Mydate2  .
0003 Mydate3  6
0003 Mydate4  .
0003 Mydate5  0
;

This type of across variable checking is easier with a WIDE dataset instead of the TALL structure you have. So if the values of the second column are valid SAS names just use PROC TRANSPOSE first.

proc transpose data=db out=wide(drop=_name_);
  by id;
  id var;
  var value;
run;

To get:

Now you can implement your comparison logic. Which I think is something like this:

data want;
  set wide ;
  if not missing(mydate3) then mydate2=coalesce(mydate2,5);
run;

To get:

You could then re-transpose back to TALL layout if you want.

To do something similar directly with the TALL dataset just merge it back with the subset of observations that have non-missing values of MYDATE3. Note you need to be careful about case here of values of VAR here.

data want;
  merge db 
        db(keep=id var value rename=(var=_var value=_value)
           where=(_var='Mydate3' and not missing(_value)))
  ;
  by id;
  if var='Mydate2' and not missing(_value) then
    value=coalesce(value,5)
  ;
  drop _var _value;
run;

Which produces:

But this will not work if the var='Mydate2' does not even appear in the dataset for some values of ID.

Note that the TRANSPOSE method will work fine in those cases. If no ID has Mydate2 then it will be added by the data step and initialized to missing for each ID.

View solution in original post

PaigeMiller · Posted 02-06-2025 10:54 AM

It would be very helpful if you could give us the RULES that the program logic should follow, rather than having us deduce the rules from one brief example.

--
Paige Miller

NewUsrStat · Posted 02-06-2025 11:01 AM

Just edited!

Tom · Posted 02-06-2025 11:41 AM

Your data steps won't work. let's remove the extra variable from the input statement.

data DB;
  input ID :$20. Var :$32. Value ;
cards;
0001 Mydate1  2
0001 Mydate2  . 
0002 Mydate1  .
0002 Mydate2  4
0003 Mydate1  2
0003 Mydate2  .
0003 Mydate3  6
0003 Mydate4  .
0003 Mydate5  0
;

This type of across variable checking is easier with a WIDE dataset instead of the TALL structure you have. So if the values of the second column are valid SAS names just use PROC TRANSPOSE first.

proc transpose data=db out=wide(drop=_name_);
  by id;
  id var;
  var value;
run;

To get:

Now you can implement your comparison logic. Which I think is something like this:

data want;
  set wide ;
  if not missing(mydate3) then mydate2=coalesce(mydate2,5);
run;

To get:

You could then re-transpose back to TALL layout if you want.

To do something similar directly with the TALL dataset just merge it back with the subset of observations that have non-missing values of MYDATE3. Note you need to be careful about case here of values of VAR here.

data want;
  merge db 
        db(keep=id var value rename=(var=_var value=_value)
           where=(_var='Mydate3' and not missing(_value)))
  ;
  by id;
  if var='Mydate2' and not missing(_value) then
    value=coalesce(value,5)
  ;
  drop _var _value;
run;

Which produces:

But this will not work if the var='Mydate2' does not even appear in the dataset for some values of ID.

Note that the TRANSPOSE method will work fine in those cases. If no ID has Mydate2 then it will be added by the data step and initialized to missing for each ID.

NewUsrStat · Posted 02-06-2025 12:28 PM

Thank you very much Tom for your help as usual. But it does not work correctly because it is mandatory that mydate1 also should not be missed. In other words mydate2 is missing between TWO not missing values.

Tom · Posted 02-06-2025 12:38 PM

@NewUsrStat wrote:
Thank you very much Tom for your help as usual. But it does not work correctly because it is mandatory that mydate1 also should not be missed. In other words mydate2 is missing between TWO not missing values.

So change the logic to also test mydate1. Should not be hard.

ballardw · Posted 02-06-2025 11:45 AM

First, make sure that your example data set creates what you want.

This is reading the 0001 on the second line as the numeric value for Value2 and only has one output ID of 0001. The whole data set only has 4 output observations. Since you aren't providing values for Value2 either as missing or actual values SAS is reading the following line ID values. So fix that.

data DB;
  input ID :$20. Var1: $20. Value1 Value2;
cards;
0001 Mydate1  2
0001 Mydate2  . 
0002 Mydate1  .
0002 Mydate2  4
0003 Mydate1  2
0003 Mydate2  .
0003 Mydate3  6
0003 Mydate4  .
0003 Mydate5  0
run;

Some missing details:

Are the values of the Mydate variables within an ID sorted? If so, how (may not impact the 1, 2, 3 versions but definitely can be a problem with 10, 11, 12 etc suffixes (default sort for those would mean an order of Mydate1, Mydate10, Mydate11, Mydate2, (etc).

What is to be done if there are 2 or more Mydate2?

What if there are multiple Mydate3 and some have missing Value and some don't?

Your rule, I think, needs to be more correctly stated as:

Rule: for each ID if not missing VALUE1 where  VAR1=Mydate1 and not missing VALUE1 where  VAR1=Mydate3 
and missing VALUE1 where  VAR1=Mydate2 then where VAR1=Mydate2 then VALUE1 = 5.

Your rule as currently stated references variable names that do not exist :Mydate1, Mydate2 and Mydate3. Those are values of VAR1.

Add a value between two values of a variable

Re: Add a value between two values of a variable

Re: Add a value between two values of a variable

Re: Add a value between two values of a variable

Re: Add a value between two values of a variable

Re: Add a value between two values of a variable

Re: Add a value between two values of a variable

Re: Add a value between two values of a variable

Registration is open