Solved: Re: Replace values

Emma2021 · Posted 01-15-2022 02:05 PM

I have id, day, and record variables. I would like to change record's values as below rules using ID and DAY:

1. If record is 10 or higher then do the following:

if the same ID and DAY, if record value with 10 or more as first then use the second where record <10, see ID=4 example

if the same ID and DAY and the as first and last (see ID=1 and DAY=4 and ID=2 and DAY=2) then just remove that row

if last then use one preceding with less than 10, see example ID=4 and DAY=2

if in the middle, then use an average of one good preceding and following, see example ID=1 and DAY=6 and ID=4 and DAY=4.

Thank you.

Kurt_Bremser · Posted 01-16-2022 06:30 AM

If you want to use tabs as delimiters, you need to tell the data step so:

data have;
infile datalines dlm="09"x dsd truncover;
  input id day record;
  cards;
1	1	1
1	1	10
1	1	10
1	2	3
1	3	3
1	3	4
1	4	15
1	5	10
1	5	3
1	6	2
1	6	57
1	6	3
2	2	12
3	1	2
3	1	4
3	1	6
4	1	12
4	1	12
4	1	14
4	1	1
4	2	3
4	2	4
4	2	67
4	4	4
4	4	45
4	4	53
4	4	2
;

Based on this, this code creates your want dataset:

proc sql;
create table avg as
  select
    id,
    day,
    avg(record) as ___record
  from have
  where record lt 10
  group by id, day
;
quit;

data want;
merge
  have
  have (
    rename=(record=__record)
    where=(__record lt 10)
  )
  avg
;
by id day;
retain _record;
if record ge 10
then do;
  if first.day
  then do;
    if __record ne .
    then record = __record;
    else delete;
  end;
  else if not last.day
  then record = ___record;
  else record = _record;
end;
_record = record;
drop _:;
run;

Note that I simplified your rule by calculating the average of a day instead of the exact average of the last preceding and first succeeding "ok" value, as this would call for more complicated code and might not be necessary. See if this fits you.

For the exact calculation, you will need to set a sequence number within a day so you can determine which "lt 10" observations to select as preceding and succeeding, and you may have to use advanced techniques like a quite complicated SQL or a hash iterator object in a data step. This is necessary because a simple one-obs look-ahead won't work in the case of several "ge 10" observations in series.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 01-15-2022 02:27 PM

Please post usable data (data step with datalines). And the expected result from this example.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Emma2021 · Posted 01-15-2022 02:53 PM

Please use the excel file

Reeza · Posted 01-15-2022 04:15 PM

Imagine what happens if I downloaded an Excel file for each question I tried to answer on here.

There's actually a really good reason why we don't recommend this - there's no guarantee we're all using the same version of SAS. If we each use PROC IMPORT on an Excel file, which may or may not be clean enough to read easily, then we can end up with entirely different files and there's no guarantee that the answer will work for you. This leads to extra work.

You're also asking someone who's trying to help you to both download your data, import the data, ensure that's done correctly and then answer your question when there are easy ways provide that would allow us to skip the first three steps entirely. It does make a few extra steps for you, but makes it infinitely easier for us to help you.

Plus you'll get faster answers.

Instructions on how to provide sample data is here:
https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

EDIT: and just checked, your file isn't formatted for an easy PROC IMPORT without specifying the range or starting area.

Kurt_Bremser · Posted 01-15-2022 05:27 PM

Excel files cannot represent SAS datasets. They have no column attributes like types or formats. So please give us a SAS dataset. Post a data step so we can recreate your dataset with a simple copy/paste and submit. Help us to help you.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Emma2021 · Posted 01-15-2022 09:53 PM

data have;
  input id day record;
  cards;
1	1	1
1	1	10
1	1	10
1	2	3
1	3	3
1	3	4
1	4	15
1	5	10
1	5	3
1	6	2
1	6	57
1	6	3
2	2	12
3	1	2
3	1	4
3	1	6
4	1	12
4	1	12
4	1	14
4	1	1
4	2	3
4	2	4
4	2	67
4	4	4
4	4	45
4	4	53
4	4	2
;


data wanted;
  input id day record;
  cards;
1	1	1
1	1	1
1	1	1
1	2	3
1	3	3
1	3	4
1	5	3
1	5	3
1	6	2
1	6	2.5
1	6	3
3	1	2
3	1	4
3	1	6
4	1	1
4	1	1
4	1	1
4	1	1
4	2	3
4	2	4
4	2	4
4	4	4
4	4	3
4	4	3
4	4	2
;

Kurt_Bremser · Posted 01-16-2022 06:30 AM

If you want to use tabs as delimiters, you need to tell the data step so:

data have;
infile datalines dlm="09"x dsd truncover;
  input id day record;
  cards;
1	1	1
1	1	10
1	1	10
1	2	3
1	3	3
1	3	4
1	4	15
1	5	10
1	5	3
1	6	2
1	6	57
1	6	3
2	2	12
3	1	2
3	1	4
3	1	6
4	1	12
4	1	12
4	1	14
4	1	1
4	2	3
4	2	4
4	2	67
4	4	4
4	4	45
4	4	53
4	4	2
;

Based on this, this code creates your want dataset:

proc sql;
create table avg as
  select
    id,
    day,
    avg(record) as ___record
  from have
  where record lt 10
  group by id, day
;
quit;

data want;
merge
  have
  have (
    rename=(record=__record)
    where=(__record lt 10)
  )
  avg
;
by id day;
retain _record;
if record ge 10
then do;
  if first.day
  then do;
    if __record ne .
    then record = __record;
    else delete;
  end;
  else if not last.day
  then record = ___record;
  else record = _record;
end;
_record = record;
drop _:;
run;

Note that I simplified your rule by calculating the average of a day instead of the exact average of the last preceding and first succeeding "ok" value, as this would call for more complicated code and might not be necessary. See if this fits you.

For the exact calculation, you will need to set a sequence number within a day so you can determine which "lt 10" observations to select as preceding and succeeding, and you may have to use advanced techniques like a quite complicated SQL or a hash iterator object in a data step. This is necessary because a simple one-obs look-ahead won't work in the case of several "ge 10" observations in series.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Registration is open

SAS Training: Just a Click Away