BookmarkSubscribeRSS Feed
Abishekaa
Obsidian | Level 7

This is a dataset I have:

 

PtID var1  var2

1        5       5

1        6       6

2        6       6

2       7        2

 

I would like to create a count  variable (by PtID) for var 1 & 2, which calculates how many '6' are present in these variables for each PtID

Here is the expected dataset:

 

PtID var1  var2  count_var1 count_var2

1        5       5          1                  1

1        6       6          1                  1

2        6       6          1                  2       

2       7        6          1                  2

 

Could anyone suggest a data step using an array for this process?

 

 

6 REPLIES 6
Kurt_Bremser
Super User

Why do you want count_var2 = 2 for ptid 2 when there's only one 6 in that group for var2?


@Abishekaa wrote:

This is a dataset I have:

 

PtID var1  var2

1        5       5

1        6       6

2        6       6

2       7        2

 

I would like to create a count  variable (by PtID) for var 1 & 2, which calculates how many '6' are present in these variables for each PtID

Here is the expected dataset:

 

PtID var1  var2  count_var1 count_var2

1        5       5          1                  1

1        6       6          1                  1

2        6       6          1                  2       

2       7        6          1                  2

 

Could anyone suggest a data step using an array for this process?

 

 


 

Kurt_Bremser
Super User

I see, that's a typo in your source data.

Two ways, SQL and data step:

data have;
input PtID $ var1  var2;
datalines;
1        5       5
1        6       6
2        6       6
2       7        6
;

proc sql;
create table want1 as
  select
    *,
    sum(var1 = 6) as count_var1,
    sum(var2 = 6) as count_var2
  from have
  group by ptid
;
quit;

data want2;
if 0 then set have;
count_var1 = 0;
count_var2 = 0;
do until (last.ptid);
  set have;
  by ptid;
  count_var1 + (var1 = 6);
  count_var2 + (var2 = 6);
end;
do until (last.ptid);
  set have;
  by ptid;
  output;
end;
run;
japelin
Rhodochrosite | Level 12

I think this is difficult to achieve with data step because it is not possible to check the value of observations that have not been read.

If you use proc sql, it is relatively simple.

 

data have;
input PtID var1 var2;
datalines;
1 5 5
1 6 6
2 6 6
2 7 6
;
run;

proc sql;
  create table want as
    select A.*
          ,(select count(var1) 
            from have as B 
            where A.PtID=B.PtID and var1=6) as count_var1
          ,(select count(var1) 
            from have as B 
            where A.PtID=B.PtID and var2=6) as count_var2 
    from have as A
  ;
quit;
Kurt_Bremser
Super User

@japelin wrote:

I think this is difficult to achieve with data step because it is not possible to check the value of observations that have not been read.

That's where you use a Double DO (or DO Whitlock) loop. It mimics the SQL "remerge".

japelin
Rhodochrosite | Level 12

@Kurt_Bremser wrote:

That's where you use a Double DO (or DO Whitlock) loop. It mimics the SQL "remerge".


Oh! Thank you for letting me know!

I will try it.

Kurt_Bremser
Super User

The DATA step with a DOW loop may need extra sorting, SQL does that on the fly. Which method you use depends on the sort state of the dataset, and its size (data steps usually outperform SQL with large data when sorting/joining is involved). As long as the SQL finishes in seconds, there's no need for the more complicated DATA step code.

 

Addendum:

there are some things that can't be done in SQL or are very hard to do. Mainly this involves evaluating certain sequences of data, as the data step does this on its own, while you need to force SQL to recognize a certain sequence within a certain order of data. So if you need to compare observations according to the physical sequence in which they are stored in the dataset (e.g. with LAG and/or RETAIN), the DATA step always wins.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1101 views
  • 3 likes
  • 3 in conversation