BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
tan-wongv
Obsidian | Level 7

tanwongv_0-1709273067135.png

I have a dataset with multiple record dates per ID (DMRN). As you can see that the sample with DMRN = 16 has Null for tobacco_user at the first visit. I would like to impute the NULL with the value from other visits (In this case it will be Never). Can anyone help me with the SAS code? I am very new to SAS program and still have so many things to learn. Thank you very much.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Welcome to the SAS Communities. 

Your screenshot doesn't show such a case but "in theory" there could also be "NULL" after a date where you had another value. I assume in such a case you would just want to repeat the last earlier value that wasn't "NULL" ...which then gets us into a situation where we need logic for both a look ahead (for the first row per dmrn) and a look behind once there was a value other than "NULL".

The SAS data step just iterates over the rows from a source table which makes a look behind easy because that's a row you've already read in an earlier iteration. You can use the lag() function or the retain statement for this.

A look-ahead is a bit harder to code for because you need information from a source row that hasn't been read yet. In below sample code I'm first creating a table "inter" that contains the first row from "have" per dmnr where the value is not "NULL". I then use the MERGE statement for this "look ahead". ...you will need to read-up a bit how the SAS data step MERGE statement works when there are same named variables in more than one of the source tables (=the value from which table wins for same named variables).

data have;
  input dmrn dx_date:date9. tobacco_user $;
  have_tobacco_user=tobacco_user;
  format dx_date date9.;
  datalines;
16 01jan2024 NULL
16 02jan2024 Never
16 03jan2024 NULL
16 04jan2024 NULL
16 05jan2024 Quit
16 07jan2024 NULL
;

data inter(keep=dmrn tobacco_user);
  set have;
  by dmrn dx_date;
  retain out_flg;
  if first.dmrn then out_flg=0;
  if out_flg=1 then delete;
  if tobacco_user ne 'NULL' then
    do;
      output;
      out_flg=1;
    end;
run;

data want;
  merge have inter;
  by dmrn;
  retain r_tobacco_user;
  if tobacco_user ne 'NULL'   then r_tobacco_user=tobacco_user;
  else if tobacco_user='NULL' then tobacco_user=r_tobacco_user;
run;

proc print data=want;
run;

Going forward and to help us help you: Please post sample data not as screenshot but a data step code that creates such data (similar to the "data have" code above).

View solution in original post

5 REPLIES 5
ballardw
Super User

What is your rule for replacing the "null" with another value when there are two or more non-null values?

tan-wongv
Obsidian | Level 7

My aim is to analyze the value at patient's first visit, and NULL in this survey is counted as a missing value at a specific visit.

ballardw
Super User

@tan-wongv wrote:

My aim is to analyze the value at patient's first visit, and NULL in this survey is counted as a missing value at a specific visit.


Does not answer the question about what to do with two, or more, different non-null responses.

Patrick
Opal | Level 21

Welcome to the SAS Communities. 

Your screenshot doesn't show such a case but "in theory" there could also be "NULL" after a date where you had another value. I assume in such a case you would just want to repeat the last earlier value that wasn't "NULL" ...which then gets us into a situation where we need logic for both a look ahead (for the first row per dmrn) and a look behind once there was a value other than "NULL".

The SAS data step just iterates over the rows from a source table which makes a look behind easy because that's a row you've already read in an earlier iteration. You can use the lag() function or the retain statement for this.

A look-ahead is a bit harder to code for because you need information from a source row that hasn't been read yet. In below sample code I'm first creating a table "inter" that contains the first row from "have" per dmnr where the value is not "NULL". I then use the MERGE statement for this "look ahead". ...you will need to read-up a bit how the SAS data step MERGE statement works when there are same named variables in more than one of the source tables (=the value from which table wins for same named variables).

data have;
  input dmrn dx_date:date9. tobacco_user $;
  have_tobacco_user=tobacco_user;
  format dx_date date9.;
  datalines;
16 01jan2024 NULL
16 02jan2024 Never
16 03jan2024 NULL
16 04jan2024 NULL
16 05jan2024 Quit
16 07jan2024 NULL
;

data inter(keep=dmrn tobacco_user);
  set have;
  by dmrn dx_date;
  retain out_flg;
  if first.dmrn then out_flg=0;
  if out_flg=1 then delete;
  if tobacco_user ne 'NULL' then
    do;
      output;
      out_flg=1;
    end;
run;

data want;
  merge have inter;
  by dmrn;
  retain r_tobacco_user;
  if tobacco_user ne 'NULL'   then r_tobacco_user=tobacco_user;
  else if tobacco_user='NULL' then tobacco_user=r_tobacco_user;
run;

proc print data=want;
run;

Going forward and to help us help you: Please post sample data not as screenshot but a data step code that creates such data (similar to the "data have" code above).

tan-wongv
Obsidian | Level 7
Thank you very much for your helpful tips and solution! 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 742 views
  • 2 likes
  • 3 in conversation