Hi, I have a very simple issue to do but I can't work it out. I have a subject ID column and I want to add a variable (counter) telling me the number of times the ID is repeated or not.
I tried to do a loop like this:
do i= 1 to 28872;
if ID(i) = ID(i-1) then counter=1;
but this not work, I have a message telling me " unknow ID function"
In fact there is only one column (one variable) I think i's just a vector. I only want to go through it, line by line and keeping the last value in memory, so I can compare the actual lines with the last and write a 1 if it's equal. The only thing I need is asociate the values with an index (first to last) and so I can give the instruction to compare the i value with the i-1 value.
With an array/do loop you're looping for EACH iteration of the datastep (=for each "line of data", observation). That's something you might want to do if you have a lot of columns.
What you describe is getting information out of passing through the data, line by line. That's something SAS is very good at.
Find below one possibility to get what you want. I used Proc SQL - but it could also be done with Proc Sort and datasteps.
/* create some test data with some duplicate ID 's */
do i=1 to 1000;
/* count per id and write result to variable IDcount */
/* create table want as*/
select l.id, r.IDcount
from have l left join
(select id, count(id) as IDcount from have group by id) r
order by id;
> Hi, I have a very simple issue to do but I can't work
> it out. I have a subject ID column and I want to add
> a variable (counter) telling me the number of times
> the ID is repeated or not.
> I tried to do a loop like this:
> do i= 1 to 28872;
> if ID(i) = ID(i-1) then counter=1;
> else counter=0;
Ah, I've made this mistake before. You can't think of SAS datasets as being like arrays -- you can't access each record (row) in a SAS dataset by its observation number. You're thinking like a "normal" programmer rather than like a SAS programmer.
This does what I think you're looking for:
checkrepeat=(ID=priorid); * 1 if ID is same as prior value, 0 if not;
PriorID=ID; * creates a new variable to hold the value of ID;
retain PriorID; * keeps PriorID at its _old_ value so you can check it against
the new value of ID;
* if checkrepeat; * optionally restricts the new dataset to repeated values of ID;
The keyword "retain" can be a bit tricky to get used to, but as you can see it does the job,
Oh, there's a simpler way using the lag() function:
checkrepeat=(id=lag(id)); * 1 if id is same as prior value, 0 if not;
* if checkrepeat; * optionally restricts the new dataset to repeated values of id;
(I learned to use retain before I found lag(), so it comes more naturally to me. )