BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Emma2021
Quartz | Level 8

I have id, day, and record variables. I would like to change record's values as below rules using ID and DAY:

1. If record is 10 or higher then do the following:

      if the same ID and DAY, if record value with 10 or more as first then use the second where record <10, see ID=4 example

      if the same ID and DAY and the as first and last (see ID=1 and DAY=4 and ID=2 and DAY=2) then just remove that row

     if last then use one preceding with less than 10, see example ID=4 and DAY=2

    if in the middle, then use an average of one good preceding and following, see example ID=1 and DAY=6 and ID=4 and DAY=4.

 

Thank you.

 

 

 

    

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

If you want to use tabs as delimiters, you need to tell the data step so:

data have;
infile datalines dlm="09"x dsd truncover;
  input id day record;
  cards;
1	1	1
1	1	10
1	1	10
1	2	3
1	3	3
1	3	4
1	4	15
1	5	10
1	5	3
1	6	2
1	6	57
1	6	3
2	2	12
3	1	2
3	1	4
3	1	6
4	1	12
4	1	12
4	1	14
4	1	1
4	2	3
4	2	4
4	2	67
4	4	4
4	4	45
4	4	53
4	4	2
;

Based on this, this code creates your want dataset:

proc sql;
create table avg as
  select
    id,
    day,
    avg(record) as ___record
  from have
  where record lt 10
  group by id, day
;
quit;

data want;
merge
  have
  have (
    rename=(record=__record)
    where=(__record lt 10)
  )
  avg
;
by id day;
retain _record;
if record ge 10
then do;
  if first.day
  then do;
    if __record ne .
    then record = __record;
    else delete;
  end;
  else if not last.day
  then record = ___record;
  else record = _record;
end;
_record = record;
drop _:;
run;

Note that I simplified your rule by calculating the average of a day instead of the exact average of the last preceding and first succeeding "ok" value, as this would call for more complicated code and might not be necessary. See if this fits you.

For the exact calculation, you will need to set a sequence number within a day so you can determine which "lt 10" observations to select as preceding and succeeding, and you may have to use advanced techniques like a quite complicated SQL or a hash iterator object in a data step. This is necessary because a simple one-obs look-ahead won't work in the case of several "ge 10" observations in series.

View solution in original post

6 REPLIES 6
Emma2021
Quartz | Level 8
Please use the excel file
Reeza
Super User

Imagine what happens if I downloaded an Excel file for each question I tried to answer on here.

There's actually a really good reason why we don't recommend this - there's no guarantee we're all using the same version of SAS. If we each use PROC IMPORT on an Excel file, which may or may not be clean enough to read easily, then we can end up with entirely different files and there's no guarantee that the answer will work for you. This leads to extra work.

You're also asking someone who's trying to help you to both download your data, import the data, ensure that's done correctly and then answer your question when there are easy ways provide that would allow us to skip the first three steps entirely. It does make a few extra steps for you, but makes it infinitely easier for us to help you.

Plus you'll get faster answers.

Instructions on how to provide sample data is here:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...



 

EDIT: and just checked, your file isn't formatted for an easy PROC IMPORT without specifying the range or starting area. 

Kurt_Bremser
Super User

Excel files cannot represent SAS datasets. They have no column attributes like types or formats. So please give us a SAS dataset. Post a data step so we can recreate your dataset with a simple copy/paste and submit. Help us to help you.

Emma2021
Quartz | Level 8
data have;
  input id day record;
  cards;
1	1	1
1	1	10
1	1	10
1	2	3
1	3	3
1	3	4
1	4	15
1	5	10
1	5	3
1	6	2
1	6	57
1	6	3
2	2	12
3	1	2
3	1	4
3	1	6
4	1	12
4	1	12
4	1	14
4	1	1
4	2	3
4	2	4
4	2	67
4	4	4
4	4	45
4	4	53
4	4	2
;


data wanted;
  input id day record;
  cards;
1	1	1
1	1	1
1	1	1
1	2	3
1	3	3
1	3	4
1	5	3
1	5	3
1	6	2
1	6	2.5
1	6	3
3	1	2
3	1	4
3	1	6
4	1	1
4	1	1
4	1	1
4	1	1
4	2	3
4	2	4
4	2	4
4	4	4
4	4	3
4	4	3
4	4	2
;
Kurt_Bremser
Super User

If you want to use tabs as delimiters, you need to tell the data step so:

data have;
infile datalines dlm="09"x dsd truncover;
  input id day record;
  cards;
1	1	1
1	1	10
1	1	10
1	2	3
1	3	3
1	3	4
1	4	15
1	5	10
1	5	3
1	6	2
1	6	57
1	6	3
2	2	12
3	1	2
3	1	4
3	1	6
4	1	12
4	1	12
4	1	14
4	1	1
4	2	3
4	2	4
4	2	67
4	4	4
4	4	45
4	4	53
4	4	2
;

Based on this, this code creates your want dataset:

proc sql;
create table avg as
  select
    id,
    day,
    avg(record) as ___record
  from have
  where record lt 10
  group by id, day
;
quit;

data want;
merge
  have
  have (
    rename=(record=__record)
    where=(__record lt 10)
  )
  avg
;
by id day;
retain _record;
if record ge 10
then do;
  if first.day
  then do;
    if __record ne .
    then record = __record;
    else delete;
  end;
  else if not last.day
  then record = ___record;
  else record = _record;
end;
_record = record;
drop _:;
run;

Note that I simplified your rule by calculating the average of a day instead of the exact average of the last preceding and first succeeding "ok" value, as this would call for more complicated code and might not be necessary. See if this fits you.

For the exact calculation, you will need to set a sequence number within a day so you can determine which "lt 10" observations to select as preceding and succeeding, and you may have to use advanced techniques like a quite complicated SQL or a hash iterator object in a data step. This is necessary because a simple one-obs look-ahead won't work in the case of several "ge 10" observations in series.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1558 views
  • 4 likes
  • 3 in conversation