DATA Step, Macro, Functions and more

Identify groups using a data step

Reply
Super Contributor
Posts: 272

Identify groups using a data step

[ Edited ]

Hi I need help in my code.

 

I have a data with 4 variables;

 

Subjid          date                test       period 

1             14OCT2015        Diab    Unscheduled

1             14OCT2015       Sybp    Unscheduled

1             14OCT2015       Diab     Unscheduled

1             14OCT2015        Sybp    Unscheduled

1              21OCT2015        Diab    Unscheduled

1              21OCT2015        Sybp    Unscheduled

1               21OCT2015        diab    Unscheduled

1               21OCT2015        Sybp    Unscheduled

 

 

I need create  a new variable with value=1 for all observation with same date and next value=2 for all observation with same date.

 

I need output like this

 

Subjid          date                test       period                   value

1             14OCT2015        Diab    Unscheduled             1

1             14OCT2015       Sybp    Unscheduled              1

1             14OCT2015       Diab     Unscheduled              1

1             14OCT2015        Sybp    Unscheduled              1

1              21OCT2015        Diab    Unscheduled              2

1              21OCT2015        Sybp    Unscheduled             2

1               21OCT2015        diab    Unscheduled              2

1               21OCT2015        Sybp    Unscheduled             2

 

I am using this code:

 

proc sort data=one out=two
by subjid period date test;
run;


data three;;
set two;
by usubjid vs_dat;
retain value;
if first.vs_dat=1 then value=1;
else value+1;

run;

 

The output:

Subjid          date                test       period                   value

1             14OCT2015        Diab    Unscheduled             1

1             14OCT2015       Sybp    Unscheduled              2

1             14OCT2015       Diab     Unscheduled              3

1             14OCT2015        Sybp    Unscheduled              4

1              21OCT2015        Diab    Unscheduled              1

1              21OCT2015        Sybp    Unscheduled             2

1               21OCT2015        diab    Unscheduled              3

1               21OCT2015        Sybp    Unscheduled             4

 

Please help my code. Thanks

 

 

 

Super User
Posts: 17,885

Re: Data step

Why don't your BY values in your data step match your proc sort?

Super User
Posts: 17,885

Re: Data step

And why have the variable names changed between the two datasets? Date to vs_date? Subjid to usubjid?

 

Your approach is correct, but you need to reference variables appropriately. 

Trusted Advisor
Posts: 1,115

Re: Identify groups using a data step

[ Edited ]

Also, your initial description suggests that DATE is the primary grouping variable (within SUBJID). But your PROC SORT step indicates that PERIOD should have higher priority. Hence, in general, observations with the same DATE might be assigned different group numbers in variable VALUE.

 

If the groups are simply defined by SUBJID and DATE, you could use the following code:

proc sort data=one out=two;
by subjid date;
run;

data three;
set two;
by subjid date;
value+first.date; /* note the implicit RETAIN due to the sum statement */
if first.subjid then value=1; /* EDIT: This resets the value for the first obs. of each SUBJID. */
run;

 

Super Contributor
Posts: 272

Re: Identify groups using a data step

proc sort data=one out=two;
by subjid date;
run;

data three;
set two;
by subjid date;
value+first.date; /* note the implicit RETAIN due to the sum statement */
run;

Hi this code working but still did not produce expected results. It results in

 

Subjid          date                test       period                   value

1             14OCT2015        Diab    Unscheduled             1

1             14OCT2015       Sybp    Unscheduled              1

1             14OCT2015       Diab     Unscheduled              1

1             14OCT2015        Sybp    Unscheduled              1

1              21OCT2015        Diab    Unscheduled              2

1              21OCT2015        Sybp    Unscheduled             2

1               21OCT2015        diab    Unscheduled              2

1               21OCT2015        Sybp    Unscheduled             2

2             14OCT2015        Diab    Unscheduled               3

2             14OCT2015       Sybp    Unscheduled              3

2             14OCT2015       Diab     Unscheduled              3

2             14OCT2015        Sybp    Unscheduled              3

2              21OCT2015        Diab    Unscheduled              4

2              21OCT2015        Sybp    Unscheduled             4

2               21OCT2015        diab    Unscheduled              4

2               21OCT2015        Sybp    Unscheduled             4

 

The output I needis:

 

1             14OCT2015        Diab    Unscheduled             1

1             14OCT2015       Sybp    Unscheduled              1

1             14OCT2015       Diab     Unscheduled              1

1             14OCT2015        Sybp    Unscheduled              1

1              21OCT2015        Diab    Unscheduled              2

1              21OCT2015        Sybp    Unscheduled             2

1               21OCT2015        diab    Unscheduled              2

1               21OCT2015        Sybp    Unscheduled             2

2             14OCT2015        Diab    Unscheduled             1

2             14OCT2015       Sybp    Unscheduled              1

2             14OCT2015       Diab     Unscheduled              1

2             14OCT2015        Sybp    Unscheduled              1

2              21OCT2015        Diab    Unscheduled              2

2              21OCT2015        Sybp    Unscheduled             2

2               21OCT2015        diab    Unscheduled              2

2               21OCT2015        Sybp    Unscheduled             2

 

Please help; Thanks

Super User
Posts: 17,885

Re: Identify groups using a data step

Your initial code was correct - only change the BY values.

 

if first.date=1 then value=1;
else value+1;
Super Contributor
Posts: 272

Re: Identify groups using a data step

Hi I didnot get expected output. Please help. Thanks

Trusted Advisor
Posts: 1,115

Re: Identify groups using a data step

@knveraraju91: Sure, you're right! How embarassing! Please accept my apologies and see the correction in my edited post.

Super Contributor
Posts: 272

Re: Identify groups using a data step

Where is your edited post with answer

##- Please type your reply above this line. Simple formatting, no
attachments. -##
Trusted Advisor
Posts: 1,115

Re: Identify groups using a data step

It's in my original post. I've inserted the following line:

if first.subjid then value=1; /* EDIT: This resets the value for the first obs. of each SUBJID. */
Super Contributor
Posts: 272

Re: Identify groups using a data step

Hi Thanks for your support. But I didnot get my correct answer. I hope you got my question correctly. With following code i am getting the following output. Please look at the value variable values in both, the output i am getting and the output I need. Please help.

 

proc sort data=one out=two;
by subjid date;
run;

data three;
set two;
by subjid date;
value+first.date; /* note the implicit RETAIN due to the sum statement */
run;

Output with the above code:

 

Subjid          date                test       period                   value

1             14OCT2015        Diab    Unscheduled             1

1             14OCT2015       Sybp    Unscheduled              1

1             14OCT2015       Diab     Unscheduled              1

1             14OCT2015        Sybp    Unscheduled              1

1              21OCT2015        Diab    Unscheduled              2

1              21OCT2015        Sybp    Unscheduled             2

1               21OCT2015        diab    Unscheduled              2

1               21OCT2015        Sybp    Unscheduled             2

2             14OCT2015        Diab    Unscheduled               3

2             14OCT2015       Sybp    Unscheduled              3

2             14OCT2015       Diab     Unscheduled              3

2             14OCT2015        Sybp    Unscheduled              3

2              21OCT2015        Diab    Unscheduled              4

2              21OCT2015        Sybp    Unscheduled             4

2               21OCT2015        diab    Unscheduled              4

2               21OCT2015        Sybp    Unscheduled             4

 

The output I need is:

Subjid          date                test       period                   value

 

1             14OCT2015        Diab    Unscheduled             1

1             14OCT2015       Sybp    Unscheduled              1

1             14OCT2015       Diab     Unscheduled              1

1             14OCT2015        Sybp    Unscheduled              1

1              21OCT2015        Diab    Unscheduled              2

1              21OCT2015        Sybp    Unscheduled             2

1               21OCT2015        diab    Unscheduled              2

1               21OCT2015        Sybp    Unscheduled             2

2             14OCT2015        Diab    Unscheduled             1

2             14OCT2015       Sybp    Unscheduled              1

2             14OCT2015       Diab     Unscheduled              1

2             14OCT2015        Sybp    Unscheduled              1

2              21OCT2015        Diab    Unscheduled              2

2              21OCT2015        Sybp    Unscheduled             2

2               21OCT2015        diab    Unscheduled              2

2               21OCT2015        Sybp    Unscheduled             2

Trusted Advisor
Posts: 1,115

Re: Identify groups using a data step

You're using the same code as before, not the corrected version.

Super Contributor
Posts: 272

Re: Identify groups using a data step

Do you mean the following code I need to use;
proc sort data=one out=two;
by subjid date;
run;

data three;
set two;
by subjid date;
if first.date=1 then value=1;
else value+1;
run;




The output I am getting with that code is:

 

Subjid          date                test       period                   value

1             14OCT2015        Diab    Unscheduled             1

1             14OCT2015       Sybp    Unscheduled              2

1             14OCT2015       Diab     Unscheduled              3

1             14OCT2015        Sybp    Unscheduled              4

1              21OCT2015        Diab    Unscheduled              1

1              21OCT2015        Sybp    Unscheduled             2]

1               21OCT2015        diab    Unscheduled              3

1               21OCT2015        Sybp    Unscheduled             4

2             14OCT2015        Diab    Unscheduled               1

2             14OCT2015       Sybp    Unscheduled              2

2             14OCT2015       Diab     Unscheduled              3

2             14OCT2015        Sybp    Unscheduled              4

2              21OCT2015        Diab    Unscheduled              1

2              21OCT2015        Sybp    Unscheduled             2

2               21OCT2015        diab    Unscheduled              3

2               21OCT2015        Sybp    Unscheduled             4

 

But the output I needis:

 

1             14OCT2015        Diab    Unscheduled             1

1             14OCT2015       Sybp    Unscheduled              1

1             14OCT2015       Diab     Unscheduled              1

1             14OCT2015        Sybp    Unscheduled              1

1              21OCT2015        Diab    Unscheduled              2

1              21OCT2015        Sybp    Unscheduled             2

1               21OCT2015        diab    Unscheduled              2

1               21OCT2015        Sybp    Unscheduled             2

2             14OCT2015        Diab    Unscheduled             1

2             14OCT2015       Sybp    Unscheduled              1

2             14OCT2015       Diab     Unscheduled              1

2             14OCT2015        Sybp    Unscheduled              1

2              21OCT2015        Diab    Unscheduled              2

2              21OCT2015        Sybp    Unscheduled             2

2               21OCT2015        diab    Unscheduled              2

2               21OCT2015        Sybp    Unscheduled             2

Trusted Advisor
Posts: 1,115

Re: Identify groups using a data step

No, I referred to my own correction, which I copy below for your convenience.

proc sort data=one out=two;
by subjid date;
run;

data three;
set two;
by subjid date;
value+first.date; /* note the implicit RETAIN due to the sum statement */
if first.subjid then value=1; /* EDIT: This resets the value for the first obs. of each SUBJID. */
run;
Super Contributor
Posts: 272

Re: Identify groups using a data step

Thank you so much that works.

Ask a Question
Discussion stats
  • 14 replies
  • 313 views
  • 0 likes
  • 3 in conversation