I have a data table that I have sorted by a specific field. I want to create an id for this field so I did the following:
data tbl ;
length id 8 ;
set tbl_ ;
by num ;
retain id 0 ;
if first.num then id = id + 1 ;
run ;
The first few rows of num are blank and I want them to have a . not 0 as the id.
I'm going to use this to create an look up table so by assigning the ids here I can split up the data sets after. I just don't want id 1 = to a blank value.
Thank you
I typed too quickly, twice. It should have read:
data tbl (drop=counter);
length id 8 ;
set tbl_ ;
retain id . ;
if not missing(num) then do;
counter + 1;
id=counter;
end;
run;
Plus I have a question about num and why you originally used a by statement and looked for first.num. Is there anything special you want done for different nums?
First.Num is only a boolean that will be equal to 1 for the first instance of the particular value in Num and zero for all other instances of that particular value.
Is this what you're looking for:
data tbl ;
length id 8 ;
set tbl_ ;
by num ;
retain id 0 ;
if not missing(num) then do;
if first.num then id = id + 1;
end;
run;
/hobbes
I think you are trying to accomplish something like:
data tbl (drop=counter);
length id 8 ;
set tbl_ ;
retain id . ;
if not missing(num) then do;
counter + 1;
num=counter;
end;
run;
Thanks guys but neither worked.
Art,
shouldn't the last line of code set id=counter not num?
This is how I have it working
data tbl (drop=counter);
length id 8 ;
set tbl_ ;
by num ;
retain id . ;
if not missing(num) then do;
if first.num then do ;
counter + 1;
id = counter;
end ;
end;
run;
Thank you
I typed too quickly, twice. It should have read:
data tbl (drop=counter);
length id 8 ;
set tbl_ ;
retain id . ;
if not missing(num) then do;
counter + 1;
id=counter;
end;
run;
Plus I have a question about num and why you originally used a by statement and looked for first.num. Is there anything special you want done for different nums?
First.Num is only a boolean that will be equal to 1 for the first instance of the particular value in Num and zero for all other instances of that particular value.
Assuming num is a numeric variable. Try this...
data tbl ;
length id 8 ;
set tbl_ ;
by num ;
if first.num then do;
if nmiss(num) then id = .;
if n(num) then id+1;
end;
run;
data tbl ;
length id 8 ;
set tbl_ ;
by num ;
retain id 0 ;
if num = . then id = .;
else if first.num then id + 1 ;
run ;
Output on first column as input:
. .
. .
10 1
10 1
20 2
20 2
20 2
30 3
30 3
30 3
30 3
But is there any difference between using NUM or ID as ID field? The only difference is that ID is sequential, but if you only need them as keys, then there's really no difference.
Thank you everyone for your help.
Arts last post helped a lot.
hi ... you already got an answer, but here's another suggestion ...
data y;
set x;
by num;
if ^missing(num) then id+first.num;
else id = .;
run;
great code - so simple! :smileyshocked:
(even without the "else ... " )
hi ... nicer without the "else..." but unfortunately that generates a id=0 (rather than missing) for the missing num rows given how the sum statement works (with an initial value of zero)
without the "else" ... but retain is also an extra statement
data y;
retain id .;
set x;
by num;
if ^missing(num) then id+first.num;
run;
the use of +first.<var> for a group counter comes from posts I saw ... and remembered, that's the hard part ... by Howard Schreier a long time ago
Howard is the elegant code master
no q about Howard's status!
As to the ELSE - my mistake!
I was surprised to see the default of an accumulator is zero rather than missing!
Cannot believe the ELSE statement would ever use measureably more cpu than the RETAIN id .; !
Besides the variable positions in the output dataset, what is the difference put "retain id ." before and after "set x"? Thank you!
data y;
retain id .;
set x;
by num;
if ^missing(num) then id+first.num;
run;
/**********************/
data y;
set x;
retain id .;
by num;
if ^missing(num) then id+first.num;
run;
not only position, but also data type (and length for character variables) will be defined when the retain statement comes first.
When it comes after the SET, these would be inherited from the first ID variable in any of the data sets on the SET statement.
When ID is not in any of the data sets on the SET statement, the only difference is position.
Thank you for your detailed explanation Peter!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.