DATA Step, Macro, Functions and more

Replace missing value with previous value

Reply
Occasional Contributor
Posts: 19

Replace missing value with previous value

[ Edited ]

There are probably many ways to go about fixing my issue, and I have tried all that I know and still no luck. I have many variables and observations, but let's just focus on four of them. Subject, Projectedid, Actual, Actualid. I have a single observation for each subject and I need each variable to be populated, but some subjects are missing actual and actualin. If a subject is missing actual and actual in then I need it to be populated with what was projected. This would be an example of my dataset (obviously this is not the actual dataset).

 

Subject Projectedid Actual     Actualid

001       1                summer   sum

002       2                winter       win

003       1                winter       win

004       1                    .             .

 

I need my final dataset to look like;

 

Subject Projectedid Actual       Actualid

001       1                 summer   sum

002       2                 winter       win

003       1                 winter       win

004       1                 summer   sum

 

Note: If the variabels are not missing, then sometimes the projected and actual do not match and that is how it is suppose to be.

 

So, I thought I could use a data step and populate the missing variables by using an if then do;

 

 

data favorite_seasons;

        set seasons;

 

        if actual='.' then do;

        if projectid='1' then actual='summer';

        else if projectid='2' then actual='winter'

        end;

 

         if actualid='.' then do;

         if projectedid='1' then actualid='sum';

         if projectedid='2' then actualid='win';

         end;

run;

 

 

I thought this would work, but it is not populating correctly. Also, I need to kept my variables names that same so I am not sure if I need to do that in proc sql or not. Any help?

 

 

 

 

Regular Contributor
Posts: 233

Re: Replace missing value with previous value

Hi,

 

Some points are not clear in your question. I think you made some typos in the variables names (Actualcd, Actualin, Actualid, ...).

Can you edit your post in order to harmonize the column names or, if they refer to different variables, explain the use of the different variables.

Also, in your example there are two observations with Projectedcd equal to 1. How do we determine which should be used to populate the missing observation.

Are Actual and Actualcd only alternative reprersentations of the variable Projectedcd ?

If so, you can use formats rather rthan creating new columns in your dataset.

Regular Contributor
Posts: 233

Re: Replace missing value with previous value

Maybe your problem is with the tests :

 

if XXX='.' then do;

 If the column contains blanks, the test will fail.

 

Try

if strip(XXX)='.' then do;

 

to remove leading and trailing blanks.

Occasional Contributor
Posts: 19

Re: Replace missing value with previous value

I am not understanding why I would need to use strip in this situation?
Regular Contributor
Posts: 233

Re: Replace missing value with previous value

You are comparating strings.

'    .     ' is not equal to '.'

Depending on how the Actual column was generated, you may have some blanks before or after

the dot for the missing observations.

Also, SAS convention for an empty string is a single space " ", which is tested true against "" and " ".

Occasional Contributor
Posts: 19

Re: Replace missing value with previous value

Projectid (1) = summer

               (2) = winter

 

let's say that projectid means that the subjects are projected to like (1) summer or (2) winter, and actualid and actual are what they 'actually' like, and if actual and actualid are missing then I need it to be populated with what they are projected to like. 

Super User
Super User
Posts: 7,955

Re: Replace missing value with previous value

Hi,

 

Sorry, your post is a bit mixed up there with the various differences.  When posting a question it is very important to post test data in the form of a datastep - this boths avoids us having to type things in, but also shows the structure of the data - i.e. is projectid character or num, from your logic it appears to be character.  Anyways I have typed all that in and show working code - assuming your data matches that structure:

data have;
  input Subject $ Projectid Actual $ Actualid $;
  if strip(actual)="" then actual=".";
  if strip(actualid)="" then actualid=".";
datalines;
001       1                summer   sum
002       2                winter       win
003       1                winter       win
004       1                    .             .
;
run;

data favorite_seasons;
  set have;
  if actual='.' then actual=ifc(projectid=1,'summer','winter');
  if actualid='.' then actualid=substr(actual,1,3);
run;
Occasional Contributor
Posts: 19

Re: Replace missing value with previous value

data favorite_seasons;
        set seasons;

if ACTUAL="" then ACTUAL=ifc(PROJECTID='1','Summer', 'Winter');
else if ACTUAL="" then ACTUAL=ifc(PROJECTID='2','Winter','Summer');

if ACTUALID="" then ACTUALID=ifc(PROJECTID='1', 'Sum', 'Win');
else if ACTUALID="" then ACTUALID=ifc(PROJECTID='2', 'Win', 'Sum');


run;

 

 

This is the code that I ended up using, I really liked the 'IFC' procedure, but I did not understand what the substr procedure was doing.

Regular Contributor
Posts: 233

Re: Replace missing value with previous value

The substr procedure was used to extract the first three letters of the season name.

Actualid seems indeed to be redundant with the Actual variable. If your dataset is large,

this can lead to store much unnecessary information.

In order to gain space, it may be better to work with your season codification and use formats

to get names whenever needed. Here is an example using character codes '1', '2' for Summer and Winter.

 

proc format;
   invalue $season 'Summer'='1'
                  'Winter'='2'
				  other=.;
   value $seas_name '1'="Summer"
                   '2'="Winter";
   value $seas_short '1'="Sum"
                    '2'="Win";
run;

data have;
  infile datalines dlm=',';

  format Subject $3. Projectid Actual $1.;
  informat Actual $season.;

  input Subject Projectid Actual;
/* Summer and Winter will be replaced by their codes in the resulting dataset */
datalines;
001,1,Summer
002,2,Winter
003,1,Winter
004,1,.
;
run;


data want;
    set have;
    if Actual=. then Actual=Projectid;
run;

proc print data=want;
format Projectid Actual $seas_name.;
run;
Super User
Posts: 10,028

Re: Replace missing value with previous value

Why it is 
004       1                 summer   sum

not
004       1                 winter   win

Ask a Question
Discussion stats
  • 9 replies
  • 325 views
  • 0 likes
  • 4 in conversation