Each subject in my dataset potentially has repeat measurements (where Period>1) and I am trying to figure out who has repeat measurements and how many measurements total each subject has. Some sample code is below (the dates are not reading in properly for some reason but I think the code still works for the purposes of this question):
data temporary;
INPUT
studyid$ Period StartDate MMDDYY10. GI_C ;
format Startdate yymmdd10.;
datalines;
097531 1 02/01/2007 0
222482 1 06/02/2012 0
092203 1 05/07/2010 1
090624 1 06/16/2007 0
118646 1 04/28/2007 0
093507 1 03/16/2007 0
080279 1 04/18/2007 0
1045110 1 05/11/2007 0
1633011 1 08/01/2007 0
1633011 2 02/21/2008 0
0607212 1 02/06/2008 1
1932914 1 01/31/2007 0
1932914 2 02/11/2009 0
1932914 3 05/19/2009 0
1932914 4 09/22/2009 0
1932914 5 01/09/2010 0
1932914 6 02/23/2010 0
1932914 7 12/01/2010 0
1932914 8 05/10/2011 0
1932914 9 08/03/2013 0
;
run;
proc print data=temporary; run;
Subjects with only one measurement have only one row with Period=1. In the sample code above, while many people just have one period (Period=1) some have multiple periods such as Study ID 1633011 who has 2 periods and studyID 1932914 who has 9 periods. Is there a way to create a code that will identify and apply the highest value available in a column (the highest value of a variable) for a particular subject? I want to count how many periods are available for each person. I would like to create a new variable (Periods_no) that tells you how many total periods that subject has overall, as follows:
studyid Period StartDate GI_C Periods_no
097531 1 02/01/2007 0 1
222482 1 06/02/2012 0 1
092203 1 05/07/2010 1 1
090624 1 06/16/2007 0 1
118646 1 04/28/2007 0 1
093507 1 03/16/2007 0 1
080279 1 04/18/2007 0 1
1045110 1 05/11/2007 0 1
1633011 1 08/01/2007 0 2
1633011 2 02/21/2008 0 2
0607212 1 02/06/2008 1 1
1932914 1 01/31/2007 0 9
1932914 2 02/11/2009 0 9
1932914 3 05/19/2009 0 9
1932914 4 09/22/2009 0 9
1932914 5 01/09/2010 0 9
1932914 6 02/23/2010 0 9
1932914 7 12/01/2010 0 9
1932914 8 05/10/2011 0 9
1932914 9 08/03/2013 0 9
Thank you very much in advance.
How about this:
data temporary;
	INPUT studyid$ Period StartDate : MMDDYY10. GI_C ;
	format Startdate yymmdd10.;
datalines;
097531  1         02/01/2007        0
222482  1         06/02/2012        0
092203  1         05/07/2010        1
090624  1         06/16/2007        0
118646  1         04/28/2007        0
093507  1         03/16/2007        0
080279  1         04/18/2007        0
1045110  1        05/11/2007        0
1633011  1        08/01/2007        0
1633011  2        02/21/2008        0
0607212  1        02/06/2008        1
1932914  1        01/31/2007        0
1932914  2        02/11/2009        0
1932914  3        05/19/2009        0
1932914  4        09/22/2009        0
1932914  5        01/09/2010        0
1932914  6        02/23/2010        0
1932914  7        12/01/2010        0
1932914  8        05/10/2011        0
1932914  9        08/03/2013        0
;
run;
proc sql;
  create table temporary2 as
	  select a.*,
		    count(*) as Periods_no,
				max(StartDate) as MaxStartDate format=yymmdd10.
		from temporary a
		group by studyid 
    order by studyid, StartDate;
quit;
proc print data=temporary2; 
run;
I have added the highest StartDate within each StudyID, anticipating your next question 
Why is subject number given column name studyid? Do you need the count in the data you have presented, or just as a summary? If its just a sequential count, then maybe:
proc sql;
  create table WANT as
  select A.*,
         B.COUNT
  from   HAVE A
  left join (select count(*) as COUNT from HAVE group by STUDYID) B
  on     A.STUDYID=B.STUDYID;
quit;
					
				
			
			
				
			
			
			
			
			
			
			
		How about this:
data temporary;
	INPUT studyid$ Period StartDate : MMDDYY10. GI_C ;
	format Startdate yymmdd10.;
datalines;
097531  1         02/01/2007        0
222482  1         06/02/2012        0
092203  1         05/07/2010        1
090624  1         06/16/2007        0
118646  1         04/28/2007        0
093507  1         03/16/2007        0
080279  1         04/18/2007        0
1045110  1        05/11/2007        0
1633011  1        08/01/2007        0
1633011  2        02/21/2008        0
0607212  1        02/06/2008        1
1932914  1        01/31/2007        0
1932914  2        02/11/2009        0
1932914  3        05/19/2009        0
1932914  4        09/22/2009        0
1932914  5        01/09/2010        0
1932914  6        02/23/2010        0
1932914  7        12/01/2010        0
1932914  8        05/10/2011        0
1932914  9        08/03/2013        0
;
run;
proc sql;
  create table temporary2 as
	  select a.*,
		    count(*) as Periods_no,
				max(StartDate) as MaxStartDate format=yymmdd10.
		from temporary a
		group by studyid 
    order by studyid, StartDate;
quit;
proc print data=temporary2; 
run;
I have added the highest StartDate within each StudyID, anticipating your next question 
Adding the max date and count of dates to each observation for groups of studyids
see
https://goo.gl/010WH7
https://communities.sas.com/t5/Base-SAS-Programming/How-to-create-a-new-variable-based-on-the-max-value-of-a/m-p/330453
Assumes that Studyids are grouped.
This also maintains the order if that is important.
Should be the fastest because of 'locality of reference' and
cacheing.
HAVE
====
Obs    STUDYID    PERIOD     STARTDATE    GI_C
  1    097531        1      2007-02-01      0
  2    222482        1      2012-06-02      0
  3    092203        1      2010-05-07      1
  4    090624        1      2007-06-16      0
  5    118646        1      2007-04-28      0
  6    093507        1      2007-03-16      0
  7    080279        1      2007-04-18      0
  8    1045110       1      2007-05-11      0
  9    1633011       1      2007-08-01      0
 10    1633011       2      2008-02-21      0
 11    0607212       1      2008-02-06      1
 12    1932914       1      2007-01-31      0
 13    1932914       2      2009-02-11      0
 14    1932914       3      2009-05-19      0
 15    1932914       4      2009-09-22      0
 16    1932914       5      2010-01-09      0
 17    1932914       6      2010-02-23      0
 18    1932914       7      2010-12-01      0
 19    1932914       8      2011-05-10      0
 20    1932914       9      2013-08-03      0
WANT
Obs    STUDYID    PERIOD     STARTDATE    GI_C    COUNT
  1    0607212       1      2008-02-06      1       1
  2    080279        1      2007-04-18      0       1
  3    090624        1      2007-06-16      0       1
  4    092203        1      2010-05-07      1       1
  5    093507        1      2007-03-16      0       1
  6    097531        1      2007-02-01      0       1
  7    1045110       1      2007-05-11      0       1
  8    118646        1      2007-04-28      0       1
  9    1633011       2      2008-02-21      0       2 2 of these
 10    1633011       1      2007-08-01      0       2
 11    1932914       3      2009-05-19      0       9  9 of these
 12    1932914       4      2009-09-22      0       9
 13    1932914       2      2009-02-11      0       9
 14    1932914       1      2007-01-31      0       9
 15    1932914       9      2013-08-03      0       9
 16    1932914       8      2011-05-10      0       9
 17    1932914       7      2010-12-01      0       9
 18    1932914       6      2010-02-23      0       9
 19    1932914       5      2010-01-09      0       9
 20    222482        1      2012-06-02      0       1
* create some data;
data have;
INPUT
studyid $ Period StartDate MMDDYY10. GI_C ;
format Startdate yymmdd10.;
cards4;
097531 1 02/01/2007 0
222482 1 06/02/2012 0
092203 1 05/07/2010 1
090624 1 06/16/2007 0
118646 1 04/28/2007 0
093507 1 03/16/2007 0
080279 1 04/18/2007 0
1045110 1 05/11/2007 0
1633011 1 08/01/2007 0
1633011 2 02/21/2008 0
0607212 1 02/06/2008 1
1932914 1 01/31/2007 0
1932914 2 02/11/2009 0
1932914 3 05/19/2009 0
1932914 4 09/22/2009 0
1932914 5 01/09/2010 0
1932914 6 02/23/2010 0
1932914 7 12/01/2010 0
1932914 8 05/10/2011 0
1932914 9 08/03/2013 0
;;;;
run;quit;
data want;
   retain maxdte cnt .;
   do until (last.studyid);
     set have;
     by studyid notsorted;
     cnt=sum(cnt,1);
     if startdate > maxdte then maxdte=startdate;
   end;
   do until (last.studyid);
     set have;
     by studyid notsorted;
     output;
   end;
   call missing(maxdte,cnt);
run;quit;
proc print data=want;
format startdate maxdte yymmdd10.;
run;quit;
Obs        MAXDTE    CNT    STUDYID    PERIOD     STARTDATE    GI_C
  1    2007-02-01     1     097531        1      2007-02-01      0
  2    2012-06-02     1     222482        1      2012-06-02      0
  3    2010-05-07     1     092203        1      2010-05-07      1
  4    2007-06-16     1     090624        1      2007-06-16      0
  5    2007-04-28     1     118646        1      2007-04-28      0
  6    2007-03-16     1     093507        1      2007-03-16      0
  7    2007-04-18     1     080279        1      2007-04-18      0
  8    2007-05-11     1     1045110       1      2007-05-11      0
  9    2008-02-21     2     1633011       1      2007-08-01      0
 10    2008-02-21     2     1633011       2      2008-02-21      0
 11    2008-02-06     1     0607212       1      2008-02-06      1
 12    2013-08-03     9     1932914       1      2007-01-31      0
 13    2013-08-03     9     1932914       2      2009-02-11      0
 14    2013-08-03     9     1932914       3      2009-05-19      0
 15    2013-08-03     9     1932914       4      2009-09-22      0
 16    2013-08-03     9     1932914       5      2010-01-09      0
 17    2013-08-03     9     1932914       6      2010-02-23      0
 18    2013-08-03     9     1932914       7      2010-12-01      0
 19    2013-08-03     9     1932914       8      2011-05-10      0
 20    2013-08-03     9     1932914       9      2013-08-03      0
Your data appear to be grouped (but not sorted) by studyid. And within studyid, it is sorted by period. You can take advantage of that attribute with a "do until " loop with "set" and "by" statements inside the loop:
data want;
  do until (last.studyid);
    set temporary (rename=(period=periods_no));
    by studyid notsorted;
  end;
  do until (last.studyid);
    set temporary;
    by studyid notsorted;
    output;
  end;
run;
The first do group advances through a studyid to the final period. With the PERIODS_NO in hand, the second do group re-reads the studyid and outputs each record.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
