Generating id

BayzidurRahman · Posted 08-28-2023 08:57 AM

I have two variables id and dif. I need to create sum of variable dif by id until the value of dif is 1. If dif>1 then counting will restart from there and continue until dif>1 or id is different.

id	dif	Sum
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	2	6
1	1	6
1	1	6
1	1	6
1	1	6
1	1	6
2	1	3
2	1	3
2	1	3
2	3	3
2	1	3
2	1	3
2	2	2
2	1	2

PaigeMiller · Posted 08-28-2023 09:01 AM

I need to create sum of variable dif by id until the value of dif is 1

Should that really say "... until the value of dif is not equal to 1" ??

--
Paige Miller

BayzidurRahman · Posted 08-28-2023 09:04 AM

No, it should say dif>1 because in some situations the dif can be 0.
Secondly, the sum variable should take to sum of rows.

PaigeMiller · Posted 08-28-2023 09:12 AM

Why is the value of variable SUM of the second group equal to 6? It seems to me that the sum of variable DIF in the second group is 7.

2 + 1 + 1 + 1+ 1 + 1 = 7

Similar question applies to later groups as well

--
Paige Miller

BayzidurRahman · Posted 08-28-2023 09:17 AM

Sorry- I clarified it in my previous reply. The sum should be total number of rows.

PaigeMiller · Posted 08-28-2023 09:25 AM

This code is untested. If you want tested code, please provide the data as working SAS data step code (examples and instructions).

/* UNTESTED CODE */
data intermediate;
    set have;
    by id;
    if first.id or dif>1 then group+1;
run;
proc summary nway data=intermediate;
    class group;
    var dif;
    output out=counts n=sum;
run;
data want;
    merge intermediate counts(keep=group sum);
    by id;
run;

This assumes the original data set is sorted by ID.

Suggestion: if the value you want is a count and not a sum, do not name the variable SUM

--
Paige Miller

Tom · Posted 08-28-2023 09:47 AM

You seem to be making GROUPS, but you don't actually create a variable to indicate the group number.

data want;
  set have;
  by id;
  if first.id then group=1;
  else if dif > 1 then group+1;
run;

Once you have the group then counting how may observations are in each group is easy.

proc sql;
create table want2 as 
  select *, count(*) as count
  from want
  group by id,group
;
quit;

Result

Obs    id    dif    Sum    group    count

  1     1     1      7       1        7
  2     1     1      7       1        7
  3     1     1      7       1        7
  4     1     1      7       1        7
  5     1     1      7       1        7
  6     1     1      7       1        7
  7     1     1      7       1        7
  8     1     1      6       2        6
  9     1     1      6       2        6
 10     1     1      6       2        6
 11     1     1      6       2        6
 12     1     1      6       2        6
 13     1     2      6       2        6
 14     2     1      3       1        3
 15     2     1      3       1        3
 16     2     1      3       1        3
 17     2     1      3       2        3
 18     2     1      3       2        3
 19     2     3      3       2        3
 20     2     1      2       3        2
 21     2     2      2       3        2

BayzidurRahman · Posted 08-28-2023 09:55 AM

Thanks Tom.

I have another condition- if dif =0 or missing then that row should not be counted in the sum.

id	dif	Sum
1	1	6
1	1	6
1	0	6
1	1	6
1	1	6
1	1	6
1	1	6
1	2	6
1	1	6
1	1	6
1	1	6
1	1	6
1	1	6
2	1	2
2	.	2
2	1	2
2	3	3
2	1	3
2	1	3
2	2	2
2	1	2

Tom · Posted 08-28-2023 11:00 AM

Just change the counting step. Instead of COUNT(*) you could use SUM( dif not in (0 .) )

BayzidurRahman · Posted 08-29-2023 04:24 AM

Thanks Tom. That gave me required clue. I modified the code to ignore 0 and missing in the dif variable.

data want;
*retain id group dif count;
  set have (drop= sum);
  by id;
  if first.id then gr=1;
  else if (dif > 2 & dif ne .) then gr+1;
 count=1;
   if dif = . or dif = 0 then count=.;
run;

proc sql;
create table want2 as 
  select *, SUM( count ) as dur
  from want
  group by id,gr
;
quit;

But one thing I noticed that in the want2 dataset it is changing the order of observations in the dif variable. Number 3 is appearing on the last row in dif. Is there any way to maintain the order as in the want dataset?

mkeintz · Posted 08-28-2023 09:59 AM

Pass through each GROUP twice, once to get the group size, and once to write output with the new COUNT variable. GROUP is defined as a sequence of obs ending with either the last obs for a given ID, or when the next observation has DIF>1:

data have;
  input id dif sum;
datalines;
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	2	6
1	1	6
1	1	6
1	1	6
1	1	6
1	1	6
2	1	3
2	1	3
2	1	3
2	3	3
2	1	3
2	1	3
2	2	2
2	1	2
run;
data want (drop=_:); 
  do count=1 by 1 until (last.id or _next_dif>1);
    set have;
    by id;
    merge have have (firstobs=2 keep=dif rename=(dif=_next_dif));
  end;
  do _n_=1 to count;
    set have;
    output;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

BayzidurRahman · Posted 08-29-2023 04:16 AM

Thanks.

How can I exclude counting any rows where the value of dif is either 0 or missing?

Generating id

Re: Generating id

Re: Generating id

Re: Generating id

Re: Generating id

Re: Generating id

Re: Generating id

Re: Generating id

Re: Generating id

Re: Generating id

Re: Generating id

Re: Generating id

id	dif	Sum
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	2	6
1	1	6
1	1	6
1	1	6
1	1	6
1	1	6
2	1	3
2	1	3
2	1	3
2	3	3
2	1	3
2	1	3
2	2	2
2	1	2

id	dif	Sum
1	1	6
1	1	6
1	0	6
1	1	6
1	1	6
1	1	6
1	1	6
1	2	6
1	1	6
1	1	6
1	1	6
1	1	6
1	1	6
2	1	2
2	.	2
2	1	2
2	3	3
2	1	3
2	1	3
2	2	2
2	1	2

id	dif	Sum
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	2	6
1	1	6
1	1	6
1	1	6
1	1	6
1	1	6
2	1	3
2	1	3
2	1	3
2	3	3
2	1	3
2	1	3
2	2	2
2	1	2

id	dif	Sum
1	1	6
1	1	6
1	0	6
1	1	6
1	1	6
1	1	6
1	1	6
1	2	6
1	1	6
1	1	6
1	1	6
1	1	6
1	1	6
2	1	2
2	.	2
2	1	2
2	3	3
2	1	3
2	1	3
2	2	2
2	1	2

Registration is open

SAS Training: Just a Click Away

id	dif	Sum
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	1	7
1	2	6
1	1	6
1	1	6
1	1	6
1	1	6
1	1	6
2	1	3
2	1	3
2	1	3
2	3	3
2	1	3
2	1	3
2	2	2
2	1	2

id	dif	Sum
1	1	6
1	1	6
1	0	6
1	1	6
1	1	6
1	1	6
1	1	6
1	2	6
1	1	6
1	1	6
1	1	6
1	1	6
1	1	6
2	1	2
2	.	2
2	1	2
2	3	3
2	1	3
2	1	3
2	2	2
2	1	2