I have id, day, and record variables. I would like to change record's values as below rules using ID and DAY:
1. If record is 10 or higher then do the following:
if the same ID and DAY, if record value with 10 or more as first then use the second where record <10, see ID=4 example
if the same ID and DAY and the as first and last (see ID=1 and DAY=4 and ID=2 and DAY=2) then just remove that row
if last then use one preceding with less than 10, see example ID=4 and DAY=2
if in the middle, then use an average of one good preceding and following, see example ID=1 and DAY=6 and ID=4 and DAY=4.
Thank you.
If you want to use tabs as delimiters, you need to tell the data step so:
data have;
infile datalines dlm="09"x dsd truncover;
input id day record;
cards;
1 1 1
1 1 10
1 1 10
1 2 3
1 3 3
1 3 4
1 4 15
1 5 10
1 5 3
1 6 2
1 6 57
1 6 3
2 2 12
3 1 2
3 1 4
3 1 6
4 1 12
4 1 12
4 1 14
4 1 1
4 2 3
4 2 4
4 2 67
4 4 4
4 4 45
4 4 53
4 4 2
;
Based on this, this code creates your want dataset:
proc sql;
create table avg as
select
id,
day,
avg(record) as ___record
from have
where record lt 10
group by id, day
;
quit;
data want;
merge
have
have (
rename=(record=__record)
where=(__record lt 10)
)
avg
;
by id day;
retain _record;
if record ge 10
then do;
if first.day
then do;
if __record ne .
then record = __record;
else delete;
end;
else if not last.day
then record = ___record;
else record = _record;
end;
_record = record;
drop _:;
run;
Note that I simplified your rule by calculating the average of a day instead of the exact average of the last preceding and first succeeding "ok" value, as this would call for more complicated code and might not be necessary. See if this fits you.
For the exact calculation, you will need to set a sequence number within a day so you can determine which "lt 10" observations to select as preceding and succeeding, and you may have to use advanced techniques like a quite complicated SQL or a hash iterator object in a data step. This is necessary because a simple one-obs look-ahead won't work in the case of several "ge 10" observations in series.
Please post usable data (data step with datalines). And the expected result from this example.
Imagine what happens if I downloaded an Excel file for each question I tried to answer on here.
There's actually a really good reason why we don't recommend this - there's no guarantee we're all using the same version of SAS. If we each use PROC IMPORT on an Excel file, which may or may not be clean enough to read easily, then we can end up with entirely different files and there's no guarantee that the answer will work for you. This leads to extra work.
You're also asking someone who's trying to help you to both download your data, import the data, ensure that's done correctly and then answer your question when there are easy ways provide that would allow us to skip the first three steps entirely. It does make a few extra steps for you, but makes it infinitely easier for us to help you.
Plus you'll get faster answers.
Instructions on how to provide sample data is here:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
EDIT: and just checked, your file isn't formatted for an easy PROC IMPORT without specifying the range or starting area.
Excel files cannot represent SAS datasets. They have no column attributes like types or formats. So please give us a SAS dataset. Post a data step so we can recreate your dataset with a simple copy/paste and submit. Help us to help you.
data have;
input id day record;
cards;
1 1 1
1 1 10
1 1 10
1 2 3
1 3 3
1 3 4
1 4 15
1 5 10
1 5 3
1 6 2
1 6 57
1 6 3
2 2 12
3 1 2
3 1 4
3 1 6
4 1 12
4 1 12
4 1 14
4 1 1
4 2 3
4 2 4
4 2 67
4 4 4
4 4 45
4 4 53
4 4 2
;
data wanted;
input id day record;
cards;
1 1 1
1 1 1
1 1 1
1 2 3
1 3 3
1 3 4
1 5 3
1 5 3
1 6 2
1 6 2.5
1 6 3
3 1 2
3 1 4
3 1 6
4 1 1
4 1 1
4 1 1
4 1 1
4 2 3
4 2 4
4 2 4
4 4 4
4 4 3
4 4 3
4 4 2
;
If you want to use tabs as delimiters, you need to tell the data step so:
data have;
infile datalines dlm="09"x dsd truncover;
input id day record;
cards;
1 1 1
1 1 10
1 1 10
1 2 3
1 3 3
1 3 4
1 4 15
1 5 10
1 5 3
1 6 2
1 6 57
1 6 3
2 2 12
3 1 2
3 1 4
3 1 6
4 1 12
4 1 12
4 1 14
4 1 1
4 2 3
4 2 4
4 2 67
4 4 4
4 4 45
4 4 53
4 4 2
;
Based on this, this code creates your want dataset:
proc sql;
create table avg as
select
id,
day,
avg(record) as ___record
from have
where record lt 10
group by id, day
;
quit;
data want;
merge
have
have (
rename=(record=__record)
where=(__record lt 10)
)
avg
;
by id day;
retain _record;
if record ge 10
then do;
if first.day
then do;
if __record ne .
then record = __record;
else delete;
end;
else if not last.day
then record = ___record;
else record = _record;
end;
_record = record;
drop _:;
run;
Note that I simplified your rule by calculating the average of a day instead of the exact average of the last preceding and first succeeding "ok" value, as this would call for more complicated code and might not be necessary. See if this fits you.
For the exact calculation, you will need to set a sequence number within a day so you can determine which "lt 10" observations to select as preceding and succeeding, and you may have to use advanced techniques like a quite complicated SQL or a hash iterator object in a data step. This is necessary because a simple one-obs look-ahead won't work in the case of several "ge 10" observations in series.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.