- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
I've got a sample of data which has separate records for each day of a month for the same ID, containing a daily value. I'm trying to identify the number of days since the value was below a certain value as at the latest date.
data have;
infile datalines dsd truncover;
input ID $8. value 8. date $10.;
datalines;
1 1200 20211001
1 500 20211002
1 50 20211003
1 25 20211004
1 10 20211005
2 200 20211001
2 10 20211002
2 5 20211003
2 25 20211004
2 100 20211005
;;;
e.g. calculate the number of days where the value is less than 75 as at 20211005 (latest date), per ID. Calculating this as follows against the above code:
if value < 75 then OD = 1; else OD = 0;
the results would show as below. What I'm trying to achieve is count the number of days since the value was last below 75 at the latest date per ID, so the count for ID 1 would equal 3, and ID 2 would equal 0. Any assistance would be greatly appreciated!
*edit* - the count would reset to 0 if the value is 75 or greater.
1 1200 20211001 0
1 500 20211002 0
1 50 20211003 1
1 25 20211004 1
1 10 20211005 1
2 200 20211001 0
2 100 20211002 0
2 5 20211003 1
2 25 20211004 1
2 100 20211005 0
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
infile datalines delimiter = " " dsd truncover;
input ID :$1. value :4. date :yymmdd10.;
format date yymmdd10.;
datalines;
1 1200 20211001
1 500 20211002
1 50 20211003
1 25 20211004
1 10 20211005
1 20 20211006
1 500 20211007
1 88 20211008
1 25 20211009
1 36 20211010
1 42 20211011
2 200 20211001
2 10 20211002
2 5 20211003
2 25 20211004
2 100 20211005
;
data want;
set have;
by id;
if first.id or value ge 75 then count = 0;
if value lt 75 then count + 1;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I modified your DATALINES a bit since it wasn't formatted correctly.
data have;
infile datalines delimiter = " " dsd truncover;
input ID :$1. value :4. date :$10.;
datalines;
1 1200 20211001
1 500 20211002
1 50 20211003
1 25 20211004
1 10 20211005
2 200 20211001
2 10 20211002
2 5 20211003
2 25 20211004
2 100 20211005
;
data want;
set have;
by id;
retain od;
if value > 75 then od = 0;
if value < 75 then od + 1;
run;
ID value date od 1 1200 20211001 0 1 500 20211002 0 1 50 20211003 1 1 25 20211004 2 1 10 20211005 3 2 200 20211001 0 2 10 20211002 1 2 5 20211003 2 2 25 20211004 3 2 100 20211005 0
You can then use `drop` or `keep` and `if last.id then output` to just get those values and variables.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@ciaranhillery wrote:
Hi,
I've got a sample of data which has separate records for each day of a month for the same ID, containing a daily value. I'm trying to identify the number of days since the value was below a certain value as at the latest date.
data have; infile datalines dsd truncover; input ID $8. value 8. date $10.; datalines; 1 1200 20211001 1 500 20211002 1 50 20211003 1 25 20211004 1 10 20211005 2 200 20211001 2 10 20211002 2 5 20211003 2 25 20211004 2 100 20211005 ;;;
e.g. calculate the number of days where the value is less than 75 as at 20211005 (latest date), per ID. Calculating this as follows against the above code:
if value < 75 then OD = 1; else OD = 0;
the results would show as below. What I'm trying to achieve is count the number of days since the value was last below 75 at the latest date per ID, so the count for ID 1 would equal 3, and ID 2 would equal 0. Any assistance would be greatly appreciated!
*edit* - the count would reset to 0 if the value is 75 or greater.
1 1200 20211001 0
1 500 20211002 0
1 50 20211003 1
1 25 20211004 1
1 10 20211005 1
2 200 20211001 0
2 100 20211002 0
2 5 20211003 1
2 25 20211004 1
2 100 20211005 0
I'm not sure what role OD plays but is this the sort of count you are looking for?
data want; set have; by id; retain lowcount 0; if first.id then lowcount=0; if value < 75 then do; OD = 1; lowcount+1; end; else do; OD = 0; lowcount=0; end; run;
If you data is not sorted but only grouped by your ID you would need to use BY NOTSORTED ID;
The BY group lets you test if a record is the first or last of a group of values with the FIRST. or LAST. (note the dot there) automatic variables. These are 1/0 (true/false to SAS) values and can be used with IF as shown to reset or set things at the beginning/end of each group.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you both for your responses. Apologies, but the sample has been expanded to include up to 12 months worth of daily values per ID, where some IDs will have 12 months worth, and others won't (only 3 months worth for e.g.).
I'm looking to calculate the counts by month i.e. the count will reset after each month ends. I've calculated the month number using the intck function, so have values from 0 - 12. I need the count to reset for the first monthly value, which can be 0 or 1 depending on the value - so I'll now need it to be based on both the ID, and the month.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for posting data, please add the expected result for that dataset. This helps us to better understand what you actually need.
What do you expect with data like
datalines;
1 1200 20211001
1 500 20211002
1 50 20211003
1 25 20211004
1 10 20211005
1 20 20211006
1 500 20211007
1 88 20211008
1 25 20211009
1 36 20211010
1 42 20211011
;
as result?
The function month() can extract the month from a sas-date, right now you have only char-dates. If you are not 105% sure that there will be data for one year only, you should use substr(date, 1, 6) to get both month and year.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi, I've added a few records at the beginning to show values from a previous month to hopefully show what I'm after - while the count need to be reset at the beginning of each month, if the value of the first. record for the month is less than 75, the count will start as 1. If the value is 75 or greater, then the count will start as 0.
*edited with month end and beginning lines in red
datalines; 1 1200 20210928 0 1 500 20210929 0 1 50 20210930 1 1 60 20211001 1 1 500 20211002 0 1 50 20211003 1 1 25 20211004 2 1 10 20211005 3 1 20 20211006 4 1 500 20211007 0 1 88 20211008 0 1 25 20211009 1 1 36 20211010 2 1 42 20211011 3 1 200 20211030 0 1 250 20211101 0 1 50 20211102 1 ;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
data have;
infile datalines delimiter = " " dsd truncover;
input ID :$1. value :4. date :yymmdd10.;
format date yymmdd10.;
datalines;
1 1200 20211001
1 500 20211002
1 50 20211003
1 25 20211004
1 10 20211005
1 20 20211006
1 500 20211007
1 88 20211008
1 25 20211009
1 36 20211010
1 42 20211011
2 200 20211001
2 10 20211002
2 5 20211003
2 25 20211004
2 100 20211005
;
data want;
set have;
by id;
if first.id or value ge 75 then count = 0;
if value lt 75 then count + 1;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Kurt, this worked in appending what I need.
Thank you all for your responses on this, greatly appreciated!