
Why are these COALESCE or IF/MISSING statements "retaining" data?

Frequent Contributor
Posts: 84


Some test code:

```sas
data one;
  input cat1 $ cat2 $;
  datalines;
mortgage fixed
mortgage arm
;
run;

data two;
  input group1 $ group2 $;
  datalines;
card standard
card reward
;
run;

data three;
  set one two;
  if missing(cat2) then cat2 = group2;
  x = coalesceC(cat2, group2);
run;
```

For the card records, why do CAT2 and X both end up as "standard", even though GROUP2 is "reward" on the second card record?

```
cat1      cat2      group1  group2    x
mortgage  fixed                       fixed
mortgage  arm                         arm
          standard  card    standard  standard
          standard  card    reward    standard
```
Super User
Posts: 6,502

Re: Why are these COALESCE or IF/MISSING statements "retaining" data?

CAT2 is RETAINed because it comes from one of the data sets on the SET statement. All such variables are retained. It only appears not to be retained because the SET statement fills it with a new value each time it reads an observation (or with a missing value once it has read past the end of the data set that contains it).

You do not normally notice this in this type of program because you normally do not assign a value to one of the variables from your input data set after you have read past the end of that data set.
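A minimal sketch of one way to avoid the problem, assuming the goal is just a combined category value: leave CAT2 untouched and write the result to a new variable (NEWCAT is a hypothetical name). Because nothing ever assigns to CAT2, it stays missing for every record from TWO once SET resets it, so COALESCEC falls through to GROUP2 on both card records. The test data from the question is recreated so the sketch runs on its own.

```sas
/* Recreate the test data from the question */
data one;
  input cat1 $ cat2 $;
  datalines;
mortgage fixed
mortgage arm
;
data two;
  input group1 $ group2 $;
  datalines;
card standard
card reward
;

data three;
  set one two;
  /* CAT2 is never overwritten here, so it remains missing for all
     records from TWO after SET resets it at the first TWO record.
     NEWCAT (hypothetical name) is assigned fresh on every iteration. */
  length newcat $8;
  newcat = coalescec(cat2, group2);
run;
```

With this version NEWCAT should come out as fixed, arm, standard, reward.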

Add a couple of PUT statements and you can see what is happening.
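The program behind the log was not posted. This is a minimal sketch of one way to get a log like it: a PUT _ALL_ before and after the SET. Using INDSNAME= to fill DSNAME is an assumption; PUT _ALL_ also lists automatic variables such as _ERROR_ and _N_. The test data from the question is recreated so the sketch runs on its own.

```sas
/* Recreate the test data from the question */
data one;
  input cat1 $ cat2 $;
  datalines;
mortgage fixed
mortgage arm
;
data two;
  input group1 $ group2 $;
  datalines;
card standard
card reward
;

data three;
  /* PDV contents at the top of the iteration, before SET runs */
  put 'BEFORE: ' _all_;
  /* INDSNAME= (assumed here) holds the source data set name, e.g. WORK.ONE */
  set one two indsname=dsname;
  if missing(cat2) then cat2 = group2;
  x = coalesceC(cat2, group2);
  /* PDV contents after the assignments */
  put 'AFTER: ' _all_;
run;
```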

```
BEFORE: dsname=  cat1=  cat2=  group1=  group2=  x=  _ERROR_=0 _N_=1
AFTER: dsname=WORK.ONE cat1=mortgage cat2=fixed group1=  group2=  x=fixed _ERROR_=0 _N_=1
BEFORE: dsname=WORK.ONE cat1=mortgage cat2=fixed group1=  group2=  x=  _ERROR_=0 _N_=2
AFTER: dsname=WORK.ONE cat1=mortgage cat2=arm group1=  group2=  x=arm _ERROR_=0 _N_=2
BEFORE: dsname=WORK.ONE cat1=mortgage cat2=arm group1=  group2=  x=  _ERROR_=0 _N_=3
AFTER: dsname=WORK.TWO cat1=  cat2=standard group1=card group2=standard x=standard _ERROR_=0 _N_=3
BEFORE: dsname=WORK.TWO cat1=  cat2=standard group1=card group2=standard x=  _ERROR_=0 _N_=4
AFTER: dsname=WORK.TWO cat1=  cat2=standard group1=card group2=reward x=standard _ERROR_=0 _N_=4
BEFORE: dsname=WORK.TWO cat1=  cat2=standard group1=card group2=reward x=  _ERROR_=0 _N_=5
```

Super User
Posts: 5,096

Re: Why are these COALESCE or IF/MISSING statements "retaining" data?

Besides the retaining of SET statement variables, there is one more complicating factor, which PUT statements cannot fully capture because it happens inside the execution of the SET statement itself.

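When, if ever, should SAS reinitialize CAT2 to missing? The answer is: when it reads the first observation from data set TWO. When the SET statement reads the first observation from a data set, any SET statement variables that do not appear in that data set are reset to missing. That is why CAT2 is missing for the first card record (so the IF statement copies "standard" into it), but still holds "standard" when the second card record is read, so neither the IF statement nor COALESCEC ever picks up "reward".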
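The reset can be observed by testing CAT2 immediately after the SET statement, before anything assigns to it. A minimal sketch (CHECK and CAT2_WAS_MISSING are hypothetical names): only the first observation from TWO (_N_=3) should be flagged, because at _N_=4 CAT2 still holds the "standard" assigned on the previous iteration. The test data from the question is recreated so the sketch runs on its own.

```sas
/* Recreate the test data from the question */
data one;
  input cat1 $ cat2 $;
  datalines;
mortgage fixed
mortgage arm
;
data two;
  input group1 $ group2 $;
  datalines;
card standard
card reward
;

data check;
  set one two;
  /* 1 if SET left CAT2 missing, 0 otherwise; checked before the IF
     statement can assign anything to CAT2 */
  cat2_was_missing = missing(cat2);
  put _n_= cat2= cat2_was_missing=;
  if missing(cat2) then cat2 = group2;
run;
```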
