Howdy Y'all,
I would like to add a new column to a dataset but I am not sure how to do so. My dataset has a variable called KEYVAR (character variable) with three different values. A participant can appear multiple times in my dataset, with each row containing a similar or different value for KEYVAR. What I want to do is create a new variable call NEWVAR that counts how many times a participant has a specific value for KEYVAR; when a participant does not have an observation for that specific value, I want NEWVAR to have a result of zero.
The first code I used was:
PROC SQL;
CREATE TABLE WANT AS
SELECT*, COUNT (PARTICIPANT) AS NEWVAR
FROM HAVE
WHERE KEYVAR = 'Y';
QUIT;
While the code does create a new column, it has a major problem. It creates a new dataset but with only observations containing the specific value I want; however, I need a dataset that retains all my observations. Secondly, the WHERE statement also restricts the count to the specific value, so any participant lacking the 'Y' in KEYVAR is ignored completely.
Here's an example of the dataset I would like (in this example, I want to count every instance of "Y"):
PARTICIPANT | KEYVAR | NEWVAR |
A | Y | 1 |
A | N | 1 |
B | Y | 3 |
B | Y | 3 |
B | Y | 3 |
C | W | 0 |
C | N | 0 |
C | W | 0 |
D | Y | 2 |
D | N | 2 |
D | N | 2 |
D | Y | 2 |
D | W | 2 |
I am using SAS 9.4
Thanks for the help y'all!
@TXSASneophyte wrote:
Howdy Y'all,
I would like to add a new column to a dataset but I am not sure how to do so. My dataset has a variable called KEYVAR (character variable) with three different values. A participant can appear multiple times in my dataset, with each row containing a similar or different value for KEYVAR. What I want to do is create a new variable call NEWVAR that counts how many times a participant has a specific value for KEYVAR; when a participant does not have an observation for that specific value, I want NEWVAR to have a result of zero.
The first code I used was:
PROC SQL;
CREATE TABLE WANT AS
SELECT*, COUNT (PARTICIPANT) AS NEWVAR
FROM HAVE
WHERE KEYVAR = 'Y';
QUIT;
While the code does create a new column, it has a major problem. It creates a new dataset but with only observations containing the specific value I want; however, I need a dataset that retains all my observations. Secondly, the WHERE statement also restricts the count to the specific value, so any participant lacking the 'Y' in KEYVAR is ignored completely.
Here's an example of the dataset I would like (in this example, I want to count every instance of "Y"):
PARTICIPANT KEYVAR NEWVAR A Y 1 A N 1 B Y 3 B Y 3 B Y 3 C W 0 C N 0 C W 0 D Y 2 D N 2 D N 2 D Y 2 D W 2
I am using SAS 9.4
Thanks for the help y'all!
Use SQL SUM() instead.
PROC SQL;
CREATE TABLE WANT AS
SELECT
*,
sum(KEYVAR='Y') AS COUNT_VAR
FROM HAVE
GROUP BY PARTICIPANT;
QUIT;
PROC SQL;
CREATE TABLE WANT AS
SELECT
*,
COUNT (*) AS NEWVAR
FROM HAVE
GROUP BY PARTICIPANT, KEYVAR;
QUIT;
Thank you for the code. Unfortunately it does not do what I need. Here's a table of what your code did, comparing the column I want to create in blue and the column your code created in green:
PARTICIPANT | KEYVAR | NEWVAR | PGStats_CODE |
A | Y | 1 | 1 |
A | N | 1 | 1 |
B | Y | 3 | 3 |
B | Y | 3 | 3 |
B | Y | 3 | 3 |
C | W | 0 | 2 |
C | N | 0 | 1 |
C | W | 0 | 2 |
D | Y | 2 | 2 |
D | N | 2 | 2 |
D | N | 2 | 2 |
D | Y | 2 | 2 |
D | W | 2 | 1 |
Your code gives me a count for every value in KEYVAR but I am only interested in the count for the "Y" value in KEYVAR, and if a certain participant has no "Y" at all in any of their observations, then I want a value of zero for them in NEWVAR (look at participant C).
Not sure I understand fully the issue
Assigning the value of zero for groups missing a cetain category
data have;
input
PARTICIPANT $ KEYVAR $ NEWVAR;
cards4;
A Y 1
A N 1
B Y 3
B Y 3
B Y 3
C W 0
C N 0
C W 0
D Y 2
D N 2
D N 2
D Y 2
D W 2
;;;;
run;quit;
HAVE
====
Up to 40 obs WORK.HAVE total obs=13
Obs PARTICIPANT KEYVAR NEWVAR
1 A Y 1
2 A N 1
3 B Y 3
4 B Y 3
5 B Y 3
6 C W 0
7 C N 0
8 C W 0
9 D Y 2
10 D N 2
11 D N 2
12 D Y 2
13 D W 2
WANT
====
Obs KEYVAR N W Y
1 A 1 0 1
2 B 0 0 3
3 C 1 2 0 * C has no Y;
4 D 2 1 2
FULL SOLUTION
=============
Ods Exclude All;
Ods Output Observed=want(Rename=Label=Keyvar);
Proc Corresp Data=have Observed dim=1;
Table PARTICIPANT, KEYVAR;
*weight newvar; * turn this on if newvar is a count;
run;quit;
Ods Select All;
proc print data=want;
run;quit;
My computer is taking forever to run this code; I'll run again tonight to see what it does. But thanks for the suggestion!
In your original example, on the second line, NEWVAR=1 with KEYVAR="N" ? How so?
The second line has NEWVAR = 1 with KEYVAR = 'N' because I set up the dataset assuming a group by statement was possible. The second line doesn't mean that I wanted 1 for 'N' but that participant "A" had at least one 'Y'.
@TXSASneophyte wrote:
Howdy Y'all,
I would like to add a new column to a dataset but I am not sure how to do so. My dataset has a variable called KEYVAR (character variable) with three different values. A participant can appear multiple times in my dataset, with each row containing a similar or different value for KEYVAR. What I want to do is create a new variable call NEWVAR that counts how many times a participant has a specific value for KEYVAR; when a participant does not have an observation for that specific value, I want NEWVAR to have a result of zero.
The first code I used was:
PROC SQL;
CREATE TABLE WANT AS
SELECT*, COUNT (PARTICIPANT) AS NEWVAR
FROM HAVE
WHERE KEYVAR = 'Y';
QUIT;
While the code does create a new column, it has a major problem. It creates a new dataset but with only observations containing the specific value I want; however, I need a dataset that retains all my observations. Secondly, the WHERE statement also restricts the count to the specific value, so any participant lacking the 'Y' in KEYVAR is ignored completely.
Here's an example of the dataset I would like (in this example, I want to count every instance of "Y"):
PARTICIPANT KEYVAR NEWVAR A Y 1 A N 1 B Y 3 B Y 3 B Y 3 C W 0 C N 0 C W 0 D Y 2 D N 2 D N 2 D Y 2 D W 2
I am using SAS 9.4
Thanks for the help y'all!
Use SQL SUM() instead.
PROC SQL;
CREATE TABLE WANT AS
SELECT
*,
sum(KEYVAR='Y') AS COUNT_VAR
FROM HAVE
GROUP BY PARTICIPANT;
QUIT;
This does exactly what I need, thanks for the help 😄
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.