Hi,
I have a dataset to which i need to get the count details of a variable without losing the number of rows. here the eg of the table i have and the expected result table i need. I am trying get 3 additional columns
1. total_makers -> which gives a total makers count of ('TDM', 'BDM' ) grouped by CID and country
2. eng_p_makers -> based on the P-score1=100 and flag=engaged and makers = TDM, BDM (only)
3. eng_e_makers -> based on the e-score2 = 100 and flag=engaged and makers = TDM, BDM (only)
Greatly appreciate your time and assistance in advance!
table 1 | ||||||
CID | GID | country | Makers | flag | P-score1 | E-score2 |
. | . | |||||
. | . | |||||
. | . | |||||
1234 | 567 | UK | TDM | engaged | 100 | 100 |
1234 | 567 | UK | BDM | engaged | 0 | 100 |
1234 | 567 | UK | Business | unengaged | 0 | 100 |
62438 | 99790 | NZ | BDM | engaged | 100 | 0 |
77777 | 88888 | US | technical | engaged | 0 | 100 |
77777 | 88888 | US | TDM | engaged | 100 | 0 |
77777 | 88888 | US | Business | engaged | 100 | 0 |
77777 | 88888 | US | BDM | engaged | 0 | 100 |
77777 | 88888 | US | BDM | engaged | 100 | 100 |
77777 | 88888 | US | BDM | unengaged | 100 | 0 |
77777 | 88888 | US | Business | unengaged | 0 | 0 |
77777 | 88888 | US | unengaged | 0 | 0 | |
77777 | 88888 | US | unengaged | 0 | 0 | |
64238 | 4724 | JP | BDM | engaged | 100 | 0 |
Expected_Result | |||||||||
CID | GID | country | Makers | flag | P-score1 | E-score2 | total-makers | eng_p_makers | eng_e_makers |
. | . | 0 | 0 | ||||||
. | . | 0 | 0 | ||||||
. | . | 0 | 0 | ||||||
1234 | 567 | UK | TDM | engaged | 100 | 100 | 2 | 1 | 2 |
1234 | 567 | UK | BDM | engaged | 0 | 100 | 2 | 1 | 2 |
1234 | 567 | UK | Business | unengaged | 0 | 100 | 2 | 1 | 2 |
62438 | 99790 | NZ | BDM | engaged | 100 | 0 | 1 | 1 | 0 |
77777 | 88888 | US | technical | engaged | 0 | 100 | 3 | 2 | 2 |
77777 | 88888 | US | TDM | engaged | 100 | 0 | 3 | 2 | 2 |
77777 | 88888 | US | Business | engaged | 100 | 0 | 3 | 2 | 2 |
77777 | 88888 | US | BDM | engaged | 0 | 100 | 3 | 2 | 2 |
77777 | 88888 | US | BDM | engaged | 100 | 100 | 3 | 2 | 2 |
77777 | 88888 | US | BDM | unengaged | 100 | 0 | 3 | 2 | 2 |
77777 | 88888 | US | Business | unengaged | 0 | 0 | 3 | 2 | 2 |
77777 | 88888 | US | unengaged | 0 | 0 | 3 | 2 | 2 | |
77777 | 88888 | US | unengaged | 0 | 0 | 3 | 2 | 2 | |
64238 | 4724 | JP | BDM | engaged | 100 | 0 | 1 | 1 | 0 |
I did avoid proc sql, since where statement filters off the table rows by the condition.
PROC SQL;
CREATE TABLE want AS
SELECT *, COUNT (makers) AS makers_1
FROM have where makers in ('BDM','TDM') and p-score1 = 100 and flag = 'engaged'
group by cid, country;
QUIT;
the data step i am using is not giving me the desired results-
data want;
set have;
by cid country;
if p-score1 = 100 and flag = 'engaged' and makers in ('TDM','BDM') then
eng_p_makers = count(makers);
else eng_p_makers = 0;
run;
Just to be clear, do you prefer an SQL or a data step solution to do this?
Ok. Why is total-makers=3 for CID=77777, Country=US? There are 4 rows that satisfy the criteria?
I think this is what you want
data table1;
input CID GID country $ Makers :$20. flag :$20. Pscore1 Escore2;
infile datalines dlm=',' dsd;
datalines;
1234,567,UK,TDM,engaged,100,100
1234,567,UK,BDM,engaged,0,100
1234,567,UK,Business,unengaged,0,100
62438,99790,NZ,BDM,engaged,100,0
77777,88888,US,technical,engaged,0,100
77777,88888,US,TDM,engaged,100,0
77777,88888,US,Business,engaged,100,0
77777,88888,US,BDM,engaged,0,100
77777,88888,US,BDM,engaged,100,100
77777,88888,US,BDM,unengaged,100,0
77777,88888,US,Business,unengaged,0,0
77777,88888,US,,unengaged,0,0
77777,88888,US,,unengaged,0,0
64238,4724,JP,BDM,engaged,100,0
;
proc sql;
create table want as
select *,
sum(Makers in ('TDM', 'BDM')) as total_makers,
sum(Pscore1=100 & Makers in ('TDM', 'BDM')) as eng_p_makers,
sum(Escore2=100 & Makers in ('TDM', 'BDM')) as eng_e_makers
from table1
group by CID, country;
quit;
Can you please show what an output like that might look like?
I personally do not prefer SQL for all missing vars and yet wanted in output
data have;
if _n_=1 then do _n_=1 to 3;
output;
end;
input (CID GID country Makers flag)(:$10.) Pscore1 Escore2;
output;
cards;
1234 567 UK TDM engaged 100 100
1234 567 UK BDM engaged 0 100
1234 567 UK Business unengaged 0 100
62438 99790 NZ BDM engaged 100 0
77777 88888 US technical engaged 0 100
77777 88888 US TDM engaged 100 0
77777 88888 US Business engaged 100 0
77777 88888 US BDM engaged 0 100
77777 88888 US BDM engaged 100 100
77777 88888 US BDM unengaged 100 0
77777 88888 US Business unengaged 0 0
77777 88888 US . unengaged 0 0
77777 88888 US . unengaged 0 0
64238 4724 JP BDM engaged 100 0
;
data want;
if 0 then set have;
Total_makers=.;
eng_p_makers=0;
eng_e_makers=0;
do _n_=1 by 1 until(last.country);
set have;
by cid country notsorted;
Total_makers=sum(Total_makers,Makers in ('BDM','TDM'));
eng_p_makers=sum(eng_p_makers,Pscore1=100 and flag='engaged' and Makers in ('BDM','TDM'));
eng_e_makers=sum(eng_e_makers,escore2 = 100 and flag='engaged' and Makers in ('BDM','TDM'));
end;
do _n_=1 to _n_;
set have;
output;
end;
run;
"where in i the total sum is not getting update to all other rows of the respective CID by country." -- Can you please elaborate this like which one you are referring to?
Coz all I did was copy past your condition that you wrote at the top and sum(sum, boolean(1,0)). So did i miss anything ?
This is the result i receive now...
CID | GID | country | Makers | flag | P-score1 | E-score2 | total-makers | eng_p_makers | eng_e_makers |
. | . | 0 | 0 | ||||||
. | . | 0 | 0 | ||||||
. | . | 0 | 0 | ||||||
1234 | 567 | UK | TDM | engaged | 100 | 100 | 2 | 1 | 2 |
1234 | 567 | UK | BDM | engaged | 0 | 100 | 2 | 1 | 2 |
1234 | 567 | UK | Business | unengaged | 0 | 100 | 0 | 0 | 0 |
62438 | 99790 | NZ | BDM | engaged | 100 | 0 | 1 | 1 | 0 |
77777 | 88888 | US | technical | engaged | 0 | 100 | 0 | 0 | 0 |
77777 | 88888 | US | TDM | engaged | 100 | 0 | 4 | 2 | 2 |
77777 | 88888 | US | Business | engaged | 100 | 0 | 0 | 0 | 0 |
77777 | 88888 | US | BDM | engaged | 0 | 100 | 4 | 2 | 2 |
77777 | 88888 | US | BDM | engaged | 100 | 100 | 4 | 2 | 2 |
77777 | 88888 | US | BDM | unengaged | 100 | 0 | 0 | 0 | 0 |
77777 | 88888 | US | Business | unengaged | 0 | 0 | 0 | 0 | 0 |
77777 | 88888 | US | unengaged | 0 | 0 | 0 | 0 | 0 | |
77777 | 88888 | US | unengaged | 0 | 0 | 0 | 0 | 0 | |
64238 | 4724 | JP | BDM | engaged | 100 | 0 | 1 | 1 | 0 |
You see i have underlined the 0's received, instead can have the total sum number being updated based on that CID.
Hi @codyV So you mean this condition I think
if eng_p_makers=0 then eng_p_makers=Total_makers;
if eng_e_makers=0 then eng_e_makers=Total_makers;
Updated full one:
data have;
if _n_=1 then do _n_=1 to 3;
output;
end;
input (CID GID country Makers flag)(:$10.) Pscore1 Escore2;
output;
cards;
1234 567 UK TDM engaged 100 100
1234 567 UK BDM engaged 0 100
1234 567 UK Business unengaged 0 100
62438 99790 NZ BDM engaged 100 0
77777 88888 US technical engaged 0 100
77777 88888 US TDM engaged 100 0
77777 88888 US Business engaged 100 0
77777 88888 US BDM engaged 0 100
77777 88888 US BDM engaged 100 100
77777 88888 US BDM unengaged 100 0
77777 88888 US Business unengaged 0 0
77777 88888 US . unengaged 0 0
77777 88888 US . unengaged 0 0
64238 4724 JP BDM engaged 100 0
;
data want;
if 0 then set have;
Total_makers=.;
eng_p_makers=0;
eng_e_makers=0;
do _n_=1 by 1 until(last.country);
set have;
by cid country notsorted;
Total_makers=sum(Total_makers,Makers in ('BDM','TDM'));
eng_p_makers=sum(eng_p_makers,Pscore1=100 and flag='engaged' and Makers in ('BDM','TDM'));
eng_e_makers=sum(eng_e_makers,escore2 = 100 and flag='engaged' and Makers in ('BDM','TDM'));
end;
do _n_=1 to _n_;
set have;
if eng_p_makers=0 then eng_p_makers=Total_makers;
if eng_e_makers=0 then eng_e_makers=Total_makers;
output;
end;
run;
I ran a test with the sample you gave, results here
eng_p_makers=0 eng_e_makers=0
eng_p_makers=0 eng_e_makers=0
eng_p_makers=0 eng_e_makers=0
eng_p_makers=1 eng_e_makers=2
eng_p_makers=1 eng_e_makers=2
eng_p_makers=1 eng_e_makers=2
eng_p_makers=1 eng_e_makers=1
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=1 eng_e_makers=1
NOTE: There were 17 observations read from the data set WORK.HAVE.
NOTE: There were 17 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 17 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.