BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
kirkends
Fluorite | Level 6

I am new to SAS so forgive me if this is basic. I have a data set with 150,000+ rows, each with a unique ID, and 250 columns, all columns must be retained. However for this specific inquiry, I am only interested in 13 columns; each column is a variable with a valueof  "0" "1" "7" or "9".  I would like to make a new column that counts the number of times a "1" populates a column cell across the row/observation. 

 

For example:

 

                                                                                                                     count of "1"

ID         Col1    Col2     Col3    Col4    Col5     Col6      Col7    Col8................WANT 

152         1           0          1          0         1           9          1          7                        4

153         0           0          7          7         1           9          0          9                        1

154         1           9          1          9         9           0          0          9                        2

155         7           1          9          1         9           1          1          9                        4

156         9           0          0          9         0           0          0          0                        0

157         9           7          9          0         7           1          0          0                        1

 

 

 

I have tried solutions from other forums with similar problems haven't found anything that has worked for me. I appreciate the help!! 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

A "variable" only occurs once per observation. So no need to count.

 

A VALUE on the other hand is a different matter.

 

Are your variables actually numeric or character? This would work for either though if the variables ar

data want;
   set have;
   countval = countc(cats(of col1-col13),'1');
run;

If your actual variables are not consecutively numbered then list them instead of the "of col1 - col13" I used above: countc (cats(col1,col4,col11,<finish the list>),'1');

The countc function counts characters, so would not work if you have values like 11 (which would be two 1's), the Cats function concatenates all of the variables into a single string that countc wants as the "search" string. The second parameter for the countc function is the specific characters to search for. So if you place '79' in the second position you would get count of 7 and 9.

 

 

Would I be correct guessing that you have survey data with question answers of Yes (1) , No(0), Don't know(7) and Refused (9) ?

 

View solution in original post

9 REPLIES 9
mkeintz
PROC Star

This looks like a classic application of the array statement, followed by a loop over that array, as in:

 

data want;
  set have;
  array testvals  col1 col2 ... col8 .. ;
  do over testvals;
    want=sum(want,testvals in (0,1,7,9));
  end;
run;

The ARRAY statement names an array whose elements are col1 col2 ... col8 ....

 

the DO OVER arrayname loops over those elements.  The "testvals in (0,1,7,9) returns a 1 or 0 depending no whether the IN condition is satisfied.

 

The more modern use of array in sas wold use array names with subscripts:

 

data want;
  set have;
  array testvals {*} col1 col2 ... col8 .. ;
  do i=1 to dim(testvals);
    want=sum(want,testvals{i} in (0,1,7,9));
  end;
  drop i;
run;

But for this task, I like the old way - no need to create, and then drop the i variable. 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

BTW, you are counting the occurrence of a value across columns, not occurrence of a variable

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
novinosrin
Tourmaline | Level 20
data have;
input ID         Col1    Col2     Col3    Col4    Col5     Col6      Col7    Col8;*................WANT ;
cards;
152         1           0          1          0         1           9          1          7                        4

153         0           0          7          7         1           9          0          9                        1

154         1           9          1          9         9           0          0          9                        2

155         7           1          9          1         9           1          1          9                        4

156         9           0          0          9         0           0          0          0                        0

157         9           7          9          0         7           1          0          0                        1
;

data want;
 set have;
 want=countc(cats(of col1-col8),'1');
run;
SAS_SB
Obsidian | Level 7

Hello,

I am having a similar situation where I need to count another value as well. In the example you have, I would like to count either 1 or 9, which would be 5 instead of 4 in the variable "want".

 

I would appreciate it if you could help me on this. 

 

Thank you!

Tom
Super User Tom
Super User

@SAS_SB wrote:

Hello,

I am having a similar situation where I need to count another value as well. In the example you have, I would like to count either 1 or 9, which would be 5 instead of 4 in the variable "want".

 

I would appreciate it if you could help me on this. 

 

Thank you!


Re-read the marked answer as that case was already covered.  Instead of passing the one character string '1' to the COUNTC() function you want to pass the two character string '19'.

For the array based solutions instead of test if the value is equal to 1 you can test if the value is in the list 1 9.

SAS_SB
Obsidian | Level 7

Got it, thanks @Tom ! 

ballardw
Super User

A "variable" only occurs once per observation. So no need to count.

 

A VALUE on the other hand is a different matter.

 

Are your variables actually numeric or character? This would work for either though if the variables ar

data want;
   set have;
   countval = countc(cats(of col1-col13),'1');
run;

If your actual variables are not consecutively numbered then list them instead of the "of col1 - col13" I used above: countc (cats(col1,col4,col11,<finish the list>),'1');

The countc function counts characters, so would not work if you have values like 11 (which would be two 1's), the Cats function concatenates all of the variables into a single string that countc wants as the "search" string. The second parameter for the countc function is the specific characters to search for. So if you place '79' in the second position you would get count of 7 and 9.

 

 

Would I be correct guessing that you have survey data with question answers of Yes (1) , No(0), Don't know(7) and Refused (9) ?

 

kirkends
Fluorite | Level 6

This is exactly what I was looking for. Sorry about the term mix up I am still learning! This does explains why me searches were fruitless.... I am working on Survey Data and 1=yes, 0=No, 7=unknown, 9=missing, I just converted characters to numerics for ease of uses in other inquiries.

Thank you
ballardw
Super User

@kirkends wrote:

This is exactly what I was looking for. Sorry about the term mix up I am still learning! This does explains why me searches were fruitless.... I am working on Survey Data and 1=yes, 0=No, 7=unknown, 9=missing, I just converted characters to numerics for ease of uses in other inquiries.

Thank you

 

There is absolutely no reason to apologize for converting the codes. I would myself. And could very well create re-coded variables setting the 7 and 9 to missing to report on prevalence among respondents that answered the question yes or no.

Arrays are a perfect approach to do that.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 8272 views
  • 2 likes
  • 6 in conversation