Repeating a Count

Reply
Occasional Contributor
Posts: 16

Repeating a Count

Good afternoon,

 

I have a large data set where I need to fulfill two conditions. Each time there is a new bout, I need to increase the count by 1. Each time there is a new ID # (thus a new bout as well), I need to restart the count. I have the date and time for each bout and ID number. The bouts and ID numbers are already in order. Any ideas? Below is an example of what I need to do.

 

bout          ID

1              X

1              X

1              X

2              X

2              X

2              X

1              Y

1              Y

1              Y

2              Y

2              Y

2              Y

Respected Advisor
Posts: 4,173

Re: Repeating a Count

[ Edited ]

Something like below should do the job.

data have;
  input (bout ID) ($);
  datalines;
1 X
1 X
1 X
2 X
2 X
2 X
1 Y
1 Y
1 Y
2 Y
2 Y
2 Y
;
run;

data want;
  set have;
  by id bout;
  if first.id then count=1;
  else if first.bout then count+1;
run;
Occasional Contributor
Posts: 16

Re: Repeating a Count

Thanks for your help! I am not sure if it is not working because the idea has an alpha prefix, but the count remains as "1" throughout the datasheet. Do I need to code it differently with an alpha prefix, i.e. ID X01?

Regular Contributor
Posts: 161

Re: Repeating a Count

Did you try running the code from @Patrick ? Can you do a proc contents of your input dataset and your output dataset and post it  ? Also, if you could provide your log, it may help.

Kannan Deivasigamani
Occasional Contributor
Posts: 16

Re: Repeating a Count

I was typing out a long response and all of the sudden it started working! Sort of! It counted properly but I did receive the error: BY variables are not properly sorted on data set WORK.NEWTESTDATA. By row "bout" does come before my row ID. Would that cause an issue? Below is my original post that I was typing out:

 

-----

 

Good morning,

 

Yes, I tried both codes. The code provided by kannand adds 1 with each new bout; however, it does not restart the count when a new ID appears.

 

I tried getting rid of the alpha prefix and trying Patrick's code again. I have received the error: "BY variables are not properly sorted on data set WORK.TESTDATA. Otherwise, it makes the entire column "1". If I sort the by statement "id bout" then I receive the error, if I set the by 

 

The first lines of my code are:

proc import datafile="C:\file.txt" out=TESTDATA dbms=dlm; getnames=yes;

delimiter=","

guessingrows=500000;

run;

 

I then edit the rows by deleting unnecessary text, and creating new rows with dates and times. The variable ID is in a $13. character format; however, I did try making it a BEST12. numeric format.

 

So I have:

 

bout                   ID                   datetime (datetime20.)

1                        X01                 25JUL2012 18:02:00

1                        X01                 25JUL2012 18:02:00

(continued)

2                        X01                 25JUL2012  21:00:00

2                        X01                 25JUL2012  21:00:00

(continued)

3                       X12                 28JUL2012 17:02:00

3                       X12                 28JUL2012  17:02:00

(continued)

4                      X12                 28JUL2012 22:00:00

(et cetera)

 

So instead of the bout number continuing (1, 2, 3, 4), I need it to reset so it would be (1, 2, 1, 2). Each ID does not necessarily have the same number of bouts or the same datetime. The data is inputted from a text file but formatted.

 

Occasional Contributor
Posts: 16

Re: Repeating a Count

One more question! Thank you guys for your help! I have been working on this dataset far too long.

 

Similar to my other problem:

 

 

bout                   ID                   datetime (datetime20.)

1                        X01                 25JUL2012 18:02:00

1                        X01                 25JUL2012 18:02:00

(continued)

2                        X01                 25JUL2012  21:00:00

2                        X01                 25JUL2012  21:00:00

(continued)

3                       X12                 28JUL2012 17:02:00

3                       X12                 28JUL2012  17:02:00

(continued)

4                      X12                 28JUL2012 22:00:00

(et cetera)

 

If it is the first datetime for that participant then I want to keep that datetime in a new column until a new ID appears, then that datetime will be used until a third ID appears.

 

bout                   ID                   datetime (datetime20.)                     count           keepdatetime

1                        X01                 25JUL2012 18:02:00                        1                 25JUL2012 18:02:00

1                        X01                 25JUL2012 18:02:00                        1                 25JUL2012 18:02:00

(continued)

2                        X01                 25JUL2012  21:00:00                       2                 25JUL2012 18:02:00

2                        X01                 25JUL2012  21:00:00                       2                 25JUL2012 18:02:00

(continued)

3                       X12                 28JUL2012 17:02:00                         1                 28JUL2012 17:02:00

3                       X12                 28JUL2012  17:02:00                        1                 28JUL2012 17:02:00

(continued)

4                      X12                 28JUL2012 22:00:00                          2                 28JUL2012  17:02:00

(et cetera)

Regular Contributor
Posts: 161

Re: Repeating a Count

[ Edited ]

The error appears because the data is not sorted. Add a sort step before setting the dataset using the "BY" variables. The error should go away.

 

Good Luck...!!!

Kannan Deivasigamani
Occasional Contributor
Posts: 16

Re: Repeating a Count

You were right! It needed that statement.

Regular Contributor
Posts: 161

Re: Repeating a Count

Create the new variable using the retain statement (which will retain the value for successive rows until a new value is assigned, ...as shown below...  

 

Retain new_datetime;

 

Assign the value to the var for every new occurrence of the ID as shown below:

 

 

IF FIRST.ID THEN NEW_DATETIME= DATETIME;

 

Hope this helps...!!! Good Luck...!!!

Kannan Deivasigamani
Regular Contributor
Posts: 161

Re: Repeating a Count

If you simply assign the datetime value without a Retain statement, you'll have missing if there are repeated IDs in your result. 

Kannan Deivasigamani
Occasional Contributor
Posts: 16

Re: Repeating a Count

It worked! Thank you! I was forgetting the retain statement so I had missing observations except where in specific cells. Marvelous Smiley Happy.
Ask a Question
Discussion stats
  • 10 replies
  • 397 views
  • 1 like
  • 3 in conversation