Hi,
I have a dataset to which i need to get the count details of a variable without losing the number of rows. here the eg of the table i have and the expected result table i need. I am trying get 3 additional columns
1. total_makers -> which gives a total makers count of ('TDM', 'BDM' ) grouped by CID and country
2. eng_p_makers -> based on the P-score1=100 and flag=engaged and makers = TDM, BDM (only)
3. eng_e_makers -> based on the e-score2 = 100 and flag=engaged and makers = TDM, BDM (only)
Greatly appreciate your time and assistance in advance!
table 1 | ||||||
CID | GID | country | Makers | flag | P-score1 | E-score2 |
. | . | |||||
. | . | |||||
. | . | |||||
1234 | 567 | UK | TDM | engaged | 100 | 100 |
1234 | 567 | UK | BDM | engaged | 0 | 100 |
1234 | 567 | UK | Business | unengaged | 0 | 100 |
62438 | 99790 | NZ | BDM | engaged | 100 | 0 |
77777 | 88888 | US | technical | engaged | 0 | 100 |
77777 | 88888 | US | TDM | engaged | 100 | 0 |
77777 | 88888 | US | Business | engaged | 100 | 0 |
77777 | 88888 | US | BDM | engaged | 0 | 100 |
77777 | 88888 | US | BDM | engaged | 100 | 100 |
77777 | 88888 | US | BDM | unengaged | 100 | 0 |
77777 | 88888 | US | Business | unengaged | 0 | 0 |
77777 | 88888 | US | unengaged | 0 | 0 | |
77777 | 88888 | US | unengaged | 0 | 0 | |
64238 | 4724 | JP | BDM | engaged | 100 | 0 |
Expected_Result | |||||||||
CID | GID | country | Makers | flag | P-score1 | E-score2 | total-makers | eng_p_makers | eng_e_makers |
. | . | 0 | 0 | ||||||
. | . | 0 | 0 | ||||||
. | . | 0 | 0 | ||||||
1234 | 567 | UK | TDM | engaged | 100 | 100 | 2 | 1 | 2 |
1234 | 567 | UK | BDM | engaged | 0 | 100 | 2 | 1 | 2 |
1234 | 567 | UK | Business | unengaged | 0 | 100 | 2 | 1 | 2 |
62438 | 99790 | NZ | BDM | engaged | 100 | 0 | 1 | 1 | 0 |
77777 | 88888 | US | technical | engaged | 0 | 100 | 3 | 2 | 2 |
77777 | 88888 | US | TDM | engaged | 100 | 0 | 3 | 2 | 2 |
77777 | 88888 | US | Business | engaged | 100 | 0 | 3 | 2 | 2 |
77777 | 88888 | US | BDM | engaged | 0 | 100 | 3 | 2 | 2 |
77777 | 88888 | US | BDM | engaged | 100 | 100 | 3 | 2 | 2 |
77777 | 88888 | US | BDM | unengaged | 100 | 0 | 3 | 2 | 2 |
77777 | 88888 | US | Business | unengaged | 0 | 0 | 3 | 2 | 2 |
77777 | 88888 | US | unengaged | 0 | 0 | 3 | 2 | 2 | |
77777 | 88888 | US | unengaged | 0 | 0 | 3 | 2 | 2 | |
64238 | 4724 | JP | BDM | engaged | 100 | 0 | 1 | 1 | 0 |
I did avoid proc sql, since where statement filters off the table rows by the condition.
PROC SQL;
CREATE TABLE want AS
SELECT *, COUNT (makers) AS makers_1
FROM have where makers in ('BDM','TDM') and p-score1 = 100 and flag = 'engaged'
group by cid, country;
QUIT;
the data step i am using is not giving me the desired results-
data want;
set have;
by cid country;
if p-score1 = 100 and flag = 'engaged' and makers in ('TDM','BDM') then
eng_p_makers = count(makers);
else eng_p_makers = 0;
run;
Just to be clear, do you prefer an SQL or a data step solution to do this?
Ok. Why is total-makers=3 for CID=77777, Country=US? There are 4 rows that satisfy the criteria?
I think this is what you want
data table1;
input CID GID country $ Makers :$20. flag :$20. Pscore1 Escore2;
infile datalines dlm=',' dsd;
datalines;
1234,567,UK,TDM,engaged,100,100
1234,567,UK,BDM,engaged,0,100
1234,567,UK,Business,unengaged,0,100
62438,99790,NZ,BDM,engaged,100,0
77777,88888,US,technical,engaged,0,100
77777,88888,US,TDM,engaged,100,0
77777,88888,US,Business,engaged,100,0
77777,88888,US,BDM,engaged,0,100
77777,88888,US,BDM,engaged,100,100
77777,88888,US,BDM,unengaged,100,0
77777,88888,US,Business,unengaged,0,0
77777,88888,US,,unengaged,0,0
77777,88888,US,,unengaged,0,0
64238,4724,JP,BDM,engaged,100,0
;
proc sql;
create table want as
select *,
sum(Makers in ('TDM', 'BDM')) as total_makers,
sum(Pscore1=100 & Makers in ('TDM', 'BDM')) as eng_p_makers,
sum(Escore2=100 & Makers in ('TDM', 'BDM')) as eng_e_makers
from table1
group by CID, country;
quit;
Can you please show what an output like that might look like?
I personally do not prefer SQL for all missing vars and yet wanted in output
data have;
if _n_=1 then do _n_=1 to 3;
output;
end;
input (CID GID country Makers flag)(:$10.) Pscore1 Escore2;
output;
cards;
1234 567 UK TDM engaged 100 100
1234 567 UK BDM engaged 0 100
1234 567 UK Business unengaged 0 100
62438 99790 NZ BDM engaged 100 0
77777 88888 US technical engaged 0 100
77777 88888 US TDM engaged 100 0
77777 88888 US Business engaged 100 0
77777 88888 US BDM engaged 0 100
77777 88888 US BDM engaged 100 100
77777 88888 US BDM unengaged 100 0
77777 88888 US Business unengaged 0 0
77777 88888 US . unengaged 0 0
77777 88888 US . unengaged 0 0
64238 4724 JP BDM engaged 100 0
;
data want;
if 0 then set have;
Total_makers=.;
eng_p_makers=0;
eng_e_makers=0;
do _n_=1 by 1 until(last.country);
set have;
by cid country notsorted;
Total_makers=sum(Total_makers,Makers in ('BDM','TDM'));
eng_p_makers=sum(eng_p_makers,Pscore1=100 and flag='engaged' and Makers in ('BDM','TDM'));
eng_e_makers=sum(eng_e_makers,escore2 = 100 and flag='engaged' and Makers in ('BDM','TDM'));
end;
do _n_=1 to _n_;
set have;
output;
end;
run;
"where in i the total sum is not getting update to all other rows of the respective CID by country." -- Can you please elaborate this like which one you are referring to?
Coz all I did was copy past your condition that you wrote at the top and sum(sum, boolean(1,0)). So did i miss anything ?
This is the result i receive now...
CID | GID | country | Makers | flag | P-score1 | E-score2 | total-makers | eng_p_makers | eng_e_makers |
. | . | 0 | 0 | ||||||
. | . | 0 | 0 | ||||||
. | . | 0 | 0 | ||||||
1234 | 567 | UK | TDM | engaged | 100 | 100 | 2 | 1 | 2 |
1234 | 567 | UK | BDM | engaged | 0 | 100 | 2 | 1 | 2 |
1234 | 567 | UK | Business | unengaged | 0 | 100 | 0 | 0 | 0 |
62438 | 99790 | NZ | BDM | engaged | 100 | 0 | 1 | 1 | 0 |
77777 | 88888 | US | technical | engaged | 0 | 100 | 0 | 0 | 0 |
77777 | 88888 | US | TDM | engaged | 100 | 0 | 4 | 2 | 2 |
77777 | 88888 | US | Business | engaged | 100 | 0 | 0 | 0 | 0 |
77777 | 88888 | US | BDM | engaged | 0 | 100 | 4 | 2 | 2 |
77777 | 88888 | US | BDM | engaged | 100 | 100 | 4 | 2 | 2 |
77777 | 88888 | US | BDM | unengaged | 100 | 0 | 0 | 0 | 0 |
77777 | 88888 | US | Business | unengaged | 0 | 0 | 0 | 0 | 0 |
77777 | 88888 | US | unengaged | 0 | 0 | 0 | 0 | 0 | |
77777 | 88888 | US | unengaged | 0 | 0 | 0 | 0 | 0 | |
64238 | 4724 | JP | BDM | engaged | 100 | 0 | 1 | 1 | 0 |
You see i have underlined the 0's received, instead can have the total sum number being updated based on that CID.
Hi @codyV So you mean this condition I think
if eng_p_makers=0 then eng_p_makers=Total_makers;
if eng_e_makers=0 then eng_e_makers=Total_makers;
Updated full one:
data have;
if _n_=1 then do _n_=1 to 3;
output;
end;
input (CID GID country Makers flag)(:$10.) Pscore1 Escore2;
output;
cards;
1234 567 UK TDM engaged 100 100
1234 567 UK BDM engaged 0 100
1234 567 UK Business unengaged 0 100
62438 99790 NZ BDM engaged 100 0
77777 88888 US technical engaged 0 100
77777 88888 US TDM engaged 100 0
77777 88888 US Business engaged 100 0
77777 88888 US BDM engaged 0 100
77777 88888 US BDM engaged 100 100
77777 88888 US BDM unengaged 100 0
77777 88888 US Business unengaged 0 0
77777 88888 US . unengaged 0 0
77777 88888 US . unengaged 0 0
64238 4724 JP BDM engaged 100 0
;
data want;
if 0 then set have;
Total_makers=.;
eng_p_makers=0;
eng_e_makers=0;
do _n_=1 by 1 until(last.country);
set have;
by cid country notsorted;
Total_makers=sum(Total_makers,Makers in ('BDM','TDM'));
eng_p_makers=sum(eng_p_makers,Pscore1=100 and flag='engaged' and Makers in ('BDM','TDM'));
eng_e_makers=sum(eng_e_makers,escore2 = 100 and flag='engaged' and Makers in ('BDM','TDM'));
end;
do _n_=1 to _n_;
set have;
if eng_p_makers=0 then eng_p_makers=Total_makers;
if eng_e_makers=0 then eng_e_makers=Total_makers;
output;
end;
run;
I ran a test with the sample you gave, results here
eng_p_makers=0 eng_e_makers=0
eng_p_makers=0 eng_e_makers=0
eng_p_makers=0 eng_e_makers=0
eng_p_makers=1 eng_e_makers=2
eng_p_makers=1 eng_e_makers=2
eng_p_makers=1 eng_e_makers=2
eng_p_makers=1 eng_e_makers=1
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=1 eng_e_makers=1
NOTE: There were 17 observations read from the data set WORK.HAVE.
NOTE: There were 17 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 17 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.