BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
SAS_nova
Fluorite | Level 6

Dataset looks like the following. There are 1000's of IDs and 100,000's of observations. 

'Wanted Value' is the desired output using the variable "Value". The number of observations for ID is not constant, normally between 5 and 30. There could be intermittent missing values in between which need to be repopulated plus or minus the date difference, depending upon the moving forward vs backward. Missing values are labelled as ".". Thanks for your help!!

 

 

IDDateValueDate DiffWanted Value
11/1/2000.-600-92-91-91
14/1/2000.91600-92-91-91
17/1/2000.91600-92-91
110/1/2000.92600-92
11/1/200160092600
14/1/200160090600
17/1/2001.91600+91
110/1/200173092730
21/1/2002.92550-92
24/1/200255090550
27/1/2002.91550+91
210/1/200260092600
21/1/2003.92600+92
24/1/2003.90600+92+90
27/1/2003.91600+92+90+91
210/1/2003.92600+92+90+91+92
21/1/2004.92600+92+90+91+92+92
24/1/2004.91600+92+90+91+92+92+91
1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
infile cards expandtabs truncover;
input ID Date :$20. Value DateDiff;
cards; 
1 1/1/2000 . . 
1 4/1/2000 . 91 
1 7/1/2000 . 91 
1 10/1/2000 . 92 
1 1/1/2001 600 92 
1 4/1/2001 600 90 
1 7/1/2001 . 91 
1 10/1/2001 730 92 
2 1/1/2002 . 92 
2 4/1/2002 550 90 
2 7/1/2002 . 91 
2 10/1/2002 600 92 
2 1/1/2003 . 92 
2 4/1/2003 . 90 
2 7/1/2003 . 91 
2 10/1/2003 . 92 
2 1/1/2004 . 92 
2 4/1/2004 . 91
;

data temp;
 set have;
 n+1;
 if missing(DateDiff) then DateDiff=0;
run;
data temp1;
do until(last.id);
 set temp;
 by id notsorted;
 if not missing(value) then want=value;
  else want=want+DateDiff;
  output;
end;
run;
proc sort data=temp1 out=temp2;
by descending n;
run;
data temp3;
do until(last.id);
 set temp2;
 by id notsorted;
 if not missing(want) then want2=want;
  else want2=want2-DateDiff;
  output;
end;
run;
proc sort data=temp3 out=want(drop=want n);
by n;
run;

View solution in original post

15 REPLIES 15
SAS_nova
Fluorite | Level 6

So the final number should be:

 

IDDateValueDate DiffWanted Value
11/1/2000.-326

and so on.

Tom
Super User Tom
Super User

Is that second column supposed to be a DATE value?  Which dates are those?  Is the second value supposed to mean April fools day ? or the fourth day of January?

 

Why are some of the cells using subtraction and some are using addition?

 

What is the meaning of the column header 'date diff'?  Is that a difference in dates?  So 92 means about 3 months?  Or is a difference in the VALUES.  The later makes more sense in terms of the equations you have the last column.

 

Why does one cell have just a hyphen in it? Is that supposed by a missing value?  Or some type of special missing that is different than the other missing values?

SAS_nova
Fluorite | Level 6

Is that second column supposed to be a DATE value?  Which dates are those?  Is the second value supposed to mean April fools day ? or the fourth day of January?

-- 1 date to 2 date is always about 3 months (April, 1st - Jan 1st and so on)

 

Why are some of the cells using subtraction and some are using addition?

-- if 'wanted value' back in time then subtraction is needed. Opposite for forward in time i.e. addition  

 

What is the meaning of the column header 'date diff'?  Is that a difference in dates? 

-- Yes: Date2 - Date1

So 92 means about 3 months? 

- Yes

Or is a difference in the VALUES.  The later makes more sense in terms of the equations you have the last column.

 

Why does one cell have just a hyphen in it? Is that supposed by a missing value?  Or some type of special missing that is different than the other missing values?

- that should be 0 or no values. Because of Date1 - (no previous date). 

 

 

PaigeMiller
Diamond | Level 26

Explain this part:

 

Why are some of the cells using subtraction and some are using addition?

-- if 'wanted value' back in time then subtraction is needed. Opposite for forward in time i.e. addition

 

How would we know if your wanted value is back in time or forward in time? Back in time or forward in time from what point in time?

 

--
Paige Miller
Quentin
Super User

@PaigeMiller wrote:

Explain this part:

 

Why are some of the cells using subtraction and some are using addition?

-- if 'wanted value' back in time then subtraction is needed. Opposite for forward in time i.e. addition

 

How would we know if your wanted value is back in time or forward in time? Back in time or forward in time from what point in time?

 


I think the reference point intended is, for each ID, the first record where VALUE is not null.

The Boston Area SAS Users Group is hosting free webinars!
Next up: Lisa Mendez & Richann Watson present Get Tipsy with Debugging Tips for SAS® Code: The After Party on Wednesday Jul 16.
Register now at https://www.basug.org/events.
Quentin
Super User

@SAS_nova wrote:

 

Why does one cell have just a hyphen in it? Is that supposed by a missing value?  Or some type of special missing that is different than the other missing values?

- that should be 0 or no values. Because of Date1 - (no previous date). 


Since you are calculating stuff BY ID, shouldn't DateDIff be 0 (or null) for the first record of each ID? Meaning ID=2 Date=1/1/2002 should have DateDiff=0 not 92? Or alternatively, maybe DateDiff should have a value for the first record, because you know how many days were in that quarter, even if you don't have a date value in your data for the prior quarter.

 

But again, more big picture words about the purpose of this code would be helpful.  Also curious that the meaning of VALUE is.  I assume it's unit is days, since you are subtracting from and adding days to it.

The Boston Area SAS Users Group is hosting free webinars!
Next up: Lisa Mendez & Richann Watson present Get Tipsy with Debugging Tips for SAS® Code: The After Party on Wednesday Jul 16.
Register now at https://www.basug.org/events.
PaigeMiller
Diamond | Level 26

You have to explain how "Wanted Value" is computed from the rest of the data.

 

We can't possibly tell you if a macro may be appropriate until we understand this (although it sure doesn't look like a macro is needed right now).

--
Paige Miller
ballardw
Super User

@SAS_nova wrote:

 

'Wanted Value' is the desired output using the variable "Value". You need to provide RULES for how Value is calculated. Examples I can program to exactly match your shown "need" but without a set of rules on how things are calculated the code is very unlikely to work for any other data.

 

"There could be intermittent missing values in between which need to be repopulated plus or minus the date difference, depending upon the moving forward vs backward." What is "repopulated"? Into what variable? Using what rules? What does "forward" mean? What does "backward" mean? How do we know when to apply forward and when backward?

 

It appears that "date diff" is the number of days from the previous date value, is that interpretation correct?

If so I am wondering what the "value" is if you are adding/subtracting days with that value.

Quentin
Super User

I've stared at this for a while, and I think I'm starting to see the pattern.

 

I think the data you HAVE is the table with first four columns.

 

The data you WANT is the same table with the 5th column, WantedValue, added.  Is that right?  And the values of WantedValues is the result of evaluating the expression you entered (e.g. 600-92-91-91=326, so you want the value 326 for the first row).

 

It looks to me like your WantedValue is basically an accumulator.  Where if VALUE is nonmissing, then WantedValue=VALUE.  Else if Value is missing then WantedValue=WantedValue (retained)+Date Diff.  Is that right?


The only tricky part is that you don't have an initial value for Wanted Value. You need to compute the initial WantedValue for each ID as the first non-missing VALUE for that ID minus the sum of all the DateDIff values before that record.  I think. : )

That said, can you describe in words what you are doing?  The dates are clearly the start of each quarter.  DateDiff is number of days in a quarter.  What is the meaning of VALUE?  If this is all just date arithmetic, there is probably a better way to get what you want.

Also, could you please post your HAVE data as code (i.e. a DATA step with cards statement)?

 

But to answer your question, no, it doesn't look like a macro problem.  There is likely a DATA step approach to solving this.

The Boston Area SAS Users Group is hosting free webinars!
Next up: Lisa Mendez & Richann Watson present Get Tipsy with Debugging Tips for SAS® Code: The After Party on Wednesday Jul 16.
Register now at https://www.basug.org/events.
HB
Barite | Level 11 HB
Barite | Level 11
I'm wondering why if datediff is in fact a calculated date difference why one would store that instead of calculating it when it is needed.
ballardw
Super User

@HB wrote:
I'm wondering why if datediff is in fact a calculated date difference why one would store that instead of calculating it when it is needed.

Yep, part of the next round of clarifications.

SAS_nova
Fluorite | Level 6
Yes, that can done using the calculated date difference
Kurt_Bremser
Super User

The proper way to present example data is a DATA step with DATALINES, see here:

data have;
input id $ date :yymmdd10. value date_diff;
format date yymmdd10.;
datalines;
1 2000-01-01 . .
1 2000-04-01 . 91
;

Variable types, attributes and raw contents are clear.

Dates in YMD order are unambiguous.

 

Please expand on this example. Make changes to the code where necessary (where my assumptions do not reflect your real situation).

Ksharp
Super User
data have;
infile cards expandtabs truncover;
input ID Date :$20. Value DateDiff;
cards; 
1 1/1/2000 . . 
1 4/1/2000 . 91 
1 7/1/2000 . 91 
1 10/1/2000 . 92 
1 1/1/2001 600 92 
1 4/1/2001 600 90 
1 7/1/2001 . 91 
1 10/1/2001 730 92 
2 1/1/2002 . 92 
2 4/1/2002 550 90 
2 7/1/2002 . 91 
2 10/1/2002 600 92 
2 1/1/2003 . 92 
2 4/1/2003 . 90 
2 7/1/2003 . 91 
2 10/1/2003 . 92 
2 1/1/2004 . 92 
2 4/1/2004 . 91
;

data temp;
 set have;
 n+1;
 if missing(DateDiff) then DateDiff=0;
run;
data temp1;
do until(last.id);
 set temp;
 by id notsorted;
 if not missing(value) then want=value;
  else want=want+DateDiff;
  output;
end;
run;
proc sort data=temp1 out=temp2;
by descending n;
run;
data temp3;
do until(last.id);
 set temp2;
 by id notsorted;
 if not missing(want) then want2=want;
  else want2=want2-DateDiff;
  output;
end;
run;
proc sort data=temp3 out=want(drop=want n);
by n;
run;

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 3045 views
  • 5 likes
  • 8 in conversation