Solved: Re: Generating ID but don't want to have an id for blank value rows

jerry898969 · Posted 09-19-2011 10:59 AM

I have a data table that I have sorted by a specific field. I want to create an id for this field so I did the following:

data tbl ;

length id 8 ;

set tbl_ ;

by num ;

retain id 0 ;

if first.num then id = id + 1 ;

run ;

The first few rows of num are blank and I want them to have a . not 0 as the id.

I'm going to use this to create an look up table so by assigning the ids here I can split up the data sets after. I just don't want id 1 = to a blank value.

Thank you

art297 · Posted 09-19-2011 03:05 PM

I typed too quickly, twice. It should have read:

data tbl (drop=counter);

length id 8 ;

set tbl_ ;

retain id . ;

if not missing(num) then do;

counter + 1;

id=counter;

end;

run;

Plus I have a question about num and why you originally used a by statement and looked for first.num. Is there anything special you want done for different nums?

First.Num is only a boolean that will be equal to 1 for the first instance of the particular value in Num and zero for all other instances of that particular value.

View solution in original post

Hobbes · Posted 09-19-2011 01:39 PM

Is this what you're looking for:

data tbl ;

length id 8 ;

set tbl_ ;

by num ;

retain id 0 ;

if not missing(num) then do;

if first.num then id = id + 1;

end;

run;

/hobbes

art297 · Posted 09-19-2011 01:52 PM

I think you are trying to accomplish something like:

data tbl (drop=counter);

length id 8 ;

set tbl_ ;

retain id . ;

if not missing(num) then do;

counter + 1;

num=counter;

end;

run;

jerry898969 · Posted 09-19-2011 02:52 PM

Thanks guys but neither worked.

Art,

shouldn't the last line of code set id=counter not num?

This is how I have it working

data tbl (drop=counter);

length id 8 ;

set tbl_ ;

by num ;

retain id . ;

if not missing(num) then do;

if first.num then do ;

counter + 1;

id = counter;

end ;

end;

run;

Thank you

art297 · Posted 09-19-2011 03:05 PM

I typed too quickly, twice. It should have read:

data tbl (drop=counter);

length id 8 ;

set tbl_ ;

retain id . ;

if not missing(num) then do;

counter + 1;

id=counter;

end;

run;

Plus I have a question about num and why you originally used a by statement and looked for first.num. Is there anything special you want done for different nums?

First.Num is only a boolean that will be equal to 1 for the first instance of the particular value in Num and zero for all other instances of that particular value.

NickR · Posted 09-19-2011 03:06 PM

Assuming num is a numeric variable. Try this...

data tbl ;

length id 8 ;

set tbl_ ;

by num ;

if first.num then do;

if nmiss(num) then id = .;

if n(num) then id+1;

end;

run;

DLing · Posted 09-19-2011 03:02 PM

data tbl ;
    length id 8 ;
    set tbl_ ;
    by num ;
    retain id 0 ;

if num = . then id = .;
else if first.num then id + 1 ;
run ;

Output on first column as input:

. .

10 1

20 2

30 3

But is there any difference between using NUM or ID as ID field? The only difference is that ID is sequential, but if you only need them as keys, then there's really no difference.

jerry898969 · Posted 11-29-2011 02:48 PM

Thank you everyone for your help.

Arts last post helped a lot.

MikeZdeb · Posted 11-29-2011 09:49 PM

hi ... you already got an answer, but here's another suggestion ...

data y;

set x;

by num;

if ^missing(num) then id+first.num;

else id = .;

run;

Peter_C · Posted 12-01-2011 12:24 PM

great code - so simple! :smileyshocked:

(even without the "else ... " )

MikeZdeb · Posted 12-03-2011 09:52 AM

hi ... nicer without the "else..." but unfortunately that generates a id=0 (rather than missing) for the missing num rows given how the sum statement works (with an initial value of zero)

without the "else" ... but retain is also an extra statement

data y;

retain id .;

set x;

by num;

if ^missing(num) then id+first.num;

run;

the use of +first.<var> for a group counter comes from posts I saw ... and remembered, that's the hard part ... by Howard Schreier a long time ago

Howard is the elegant code master

Peter_C · Posted 12-03-2011 12:28 PM

no q about Howard's status!

As to the ELSE - my mistake!

I was surprised to see the default of an accumulator is zero rather than missing!

Cannot believe the ELSE statement would ever use measureably more cpu than the RETAIN id .; !

Linlin · Posted 12-03-2011 06:02 PM

Besides the variable positions in the output dataset, what is the difference put "retain id ." before and after "set x"? Thank you!

data y;

retain id .;

set x;

by num;

if ^missing(num) then id+first.num;

run;

/**********************/

data y;

set x;

retain id .;

by num;

if ^missing(num) then id+first.num;

run;

Peter_C · Posted 12-04-2011 10:12 AM

not only position, but also data type (and length for character variables) will be defined when the retain statement comes first.

When it comes after the SET, these would be inherited from the first ID variable in any of the data sets on the SET statement.

When ID is not in any of the data sets on the SET statement, the only difference is position.

Linlin · Posted 12-05-2011 10:04 AM

Thank you for your detailed explanation Peter!

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away