BookmarkSubscribeRSS Feed
Abishekaa
Obsidian | Level 7

This is a dataset I have:

 

PtID var1  var2

1        5       5

1        6       6

2        6       6

2       7        2

 

I would like to create a count  variable (by PtID) for var 1 & 2, which calculates how many '6' are present in these variables for each PtID

Here is the expected dataset:

 

PtID var1  var2  count_var1 count_var2

1        5       5          1                  1

1        6       6          1                  1

2        6       6          1                  2       

2       7        6          1                  2

 

Could anyone suggest a data step using an array for this process?

 

 

6 REPLIES 6
Kurt_Bremser
Super User

Why do you want count_var2 = 2 for ptid 2 when there's only one 6 in that group for var2?


@Abishekaa wrote:

This is a dataset I have:

 

PtID var1  var2

1        5       5

1        6       6

2        6       6

2       7        2

 

I would like to create a count  variable (by PtID) for var 1 & 2, which calculates how many '6' are present in these variables for each PtID

Here is the expected dataset:

 

PtID var1  var2  count_var1 count_var2

1        5       5          1                  1

1        6       6          1                  1

2        6       6          1                  2       

2       7        6          1                  2

 

Could anyone suggest a data step using an array for this process?

 

 


 

Kurt_Bremser
Super User

I see, that's a typo in your source data.

Two ways, SQL and data step:

data have;
input PtID $ var1  var2;
datalines;
1        5       5
1        6       6
2        6       6
2       7        6
;

proc sql;
create table want1 as
  select
    *,
    sum(var1 = 6) as count_var1,
    sum(var2 = 6) as count_var2
  from have
  group by ptid
;
quit;

data want2;
if 0 then set have;
count_var1 = 0;
count_var2 = 0;
do until (last.ptid);
  set have;
  by ptid;
  count_var1 + (var1 = 6);
  count_var2 + (var2 = 6);
end;
do until (last.ptid);
  set have;
  by ptid;
  output;
end;
run;
japelin
Rhodochrosite | Level 12

I think this is difficult to achieve with data step because it is not possible to check the value of observations that have not been read.

If you use proc sql, it is relatively simple.

 

data have;
input PtID var1 var2;
datalines;
1 5 5
1 6 6
2 6 6
2 7 6
;
run;

proc sql;
  create table want as
    select A.*
          ,(select count(var1) 
            from have as B 
            where A.PtID=B.PtID and var1=6) as count_var1
          ,(select count(var1) 
            from have as B 
            where A.PtID=B.PtID and var2=6) as count_var2 
    from have as A
  ;
quit;
Kurt_Bremser
Super User

@japelin wrote:

I think this is difficult to achieve with data step because it is not possible to check the value of observations that have not been read.

That's where you use a Double DO (or DO Whitlock) loop. It mimics the SQL "remerge".

japelin
Rhodochrosite | Level 12

@Kurt_Bremser wrote:

That's where you use a Double DO (or DO Whitlock) loop. It mimics the SQL "remerge".


Oh! Thank you for letting me know!

I will try it.

Kurt_Bremser
Super User

The DATA step with a DOW loop may need extra sorting, SQL does that on the fly. Which method you use depends on the sort state of the dataset, and its size (data steps usually outperform SQL with large data when sorting/joining is involved). As long as the SQL finishes in seconds, there's no need for the more complicated DATA step code.

 

Addendum:

there are some things that can't be done in SQL or are very hard to do. Mainly this involves evaluating certain sequences of data, as the data step does this on its own, while you need to force SQL to recognize a certain sequence within a certain order of data. So if you need to compare observations according to the physical sequence in which they are stored in the dataset (e.g. with LAG and/or RETAIN), the DATA step always wins.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1732 views
  • 3 likes
  • 3 in conversation