BookmarkSubscribeRSS Feed
codyV
Fluorite | Level 6

Hi,

 

I have a dataset  to which i need to get the count details of a variable without losing the number of rows. here the eg of the table i have and the expected result table i need. I am trying get 3 additional columns

1. total_makers -> which gives a total makers count of ('TDM', 'BDM' ) grouped by CID and country

2. eng_p_makers -> based on the P-score1=100 and flag=engaged and makers = TDM, BDM (only)

3. eng_e_makers -> based on the e-score2 = 100 and flag=engaged and makers = TDM, BDM (only)

 

Greatly appreciate your time and assistance in advance!

 

table 1      
CIDGIDcountryMakersflagP-score1E-score2
..     
..     
..     
1234567UKTDMengaged100100
1234567UKBDMengaged0100
1234567UKBusinessunengaged0100
6243899790NZBDMengaged1000
7777788888UStechnicalengaged0100
7777788888USTDMengaged1000
7777788888USBusinessengaged1000
7777788888USBDMengaged0100
7777788888USBDMengaged100100
7777788888USBDMunengaged1000
7777788888USBusinessunengaged00
7777788888US unengaged00
7777788888US unengaged00
642384724JPBDMengaged1000

 

Expected_Result         
CIDGIDcountryMakersflagP-score1E-score2total-makerseng_p_makerseng_e_makers
..      00
..      00
..      00
1234567UKTDMengaged100100212
1234567UKBDMengaged0100212
1234567UKBusinessunengaged0100212
6243899790NZBDMengaged1000110
7777788888UStechnicalengaged0100322
7777788888USTDMengaged1000322
7777788888USBusinessengaged1000322
7777788888USBDMengaged0100322
7777788888USBDMengaged100100322
7777788888USBDMunengaged1000322
7777788888USBusinessunengaged00322
7777788888US unengaged00322
7777788888US unengaged00322
642384724JPBDMengaged1000110

 

I did avoid proc sql, since where statement filters off the table rows by the condition.

PROC SQL;

         CREATE TABLE want AS

         SELECT *, COUNT (makers) AS makers_1

         FROM have where makers in ('BDM','TDM') and p-score1 = 100 and flag = 'engaged'

         group by cid, country;

QUIT;

 

the data step i am using is not giving me the desired results-

 

data want;

set have;

by cid country;

if p-score1 = 100 and flag = 'engaged' and makers in ('TDM','BDM'then

eng_p_makers = count(makers);

else eng_p_makers = 0;

run;

13 REPLIES 13
PeterClemmensen
Tourmaline | Level 20

Just to be clear, do you prefer an SQL or a data step solution to do this?

codyV
Fluorite | Level 6
Any of them should do good, unless the number of rows ain't reduced.
PeterClemmensen
Tourmaline | Level 20

Ok. Why is total-makers=3 for CID=77777, Country=US? There are 4 rows that satisfy the criteria?

PeterClemmensen
Tourmaline | Level 20

I think this is what you want

 

data table1;
input CID GID country $ Makers :$20. flag :$20. Pscore1 Escore2;
infile datalines dlm=',' dsd;
datalines;
1234,567,UK,TDM,engaged,100,100
1234,567,UK,BDM,engaged,0,100
1234,567,UK,Business,unengaged,0,100
62438,99790,NZ,BDM,engaged,100,0
77777,88888,US,technical,engaged,0,100
77777,88888,US,TDM,engaged,100,0
77777,88888,US,Business,engaged,100,0
77777,88888,US,BDM,engaged,0,100
77777,88888,US,BDM,engaged,100,100
77777,88888,US,BDM,unengaged,100,0
77777,88888,US,Business,unengaged,0,0
77777,88888,US,,unengaged,0,0
77777,88888,US,,unengaged,0,0
64238,4724,JP,BDM,engaged,100,0
;

proc sql;
    create table want as
    select *,
           sum(Makers in ('TDM', 'BDM')) as total_makers,
           sum(Pscore1=100 & Makers in ('TDM', 'BDM')) as eng_p_makers,
           sum(Escore2=100 & Makers in ('TDM', 'BDM')) as eng_e_makers
    from table1
    group by CID, country;
quit;
codyV
Fluorite | Level 6
This provides me with the results for total makers, eng_p_makers, eng_e_makers only where the makers = TDM or BDM.

Is it possible that we can update the total count/sum of total makers, eng_p_makers, eng_e_makers to all other rows of a particular CID by country instead of having 0.
PeterClemmensen
Tourmaline | Level 20

Can you please show what an output like that might look like?

novinosrin
Tourmaline | Level 20

I personally do not prefer SQL for all missing vars and yet wanted in output

 


data have;
if _n_=1 then do _n_=1 to 3;
output;
end;
input (CID	GID	country	Makers	flag)(:$10.)	Pscore1	Escore2;
output;
cards;	
1234	567	UK	TDM	engaged	100	100
1234	567	UK	BDM	engaged	0	100
1234	567	UK	Business	unengaged	0	100
62438	99790	NZ	BDM	engaged	100	0
77777	88888	US	technical	engaged	0	100
77777	88888	US	TDM	engaged	100	0
77777	88888	US	Business	engaged	100	0
77777	88888	US	BDM	engaged	0	100
77777	88888	US	BDM	engaged	100	100
77777	88888	US	BDM	unengaged	100	0
77777	88888	US	Business	unengaged	0	0
77777	88888	US	 .	unengaged	0	0
77777	88888	US	 .	unengaged	0	0
64238	4724	JP	BDM	engaged	100	0
;


data want;
if 0 then set have;
Total_makers=.;
eng_p_makers=0;
eng_e_makers=0;
 do _n_=1 by 1 until(last.country);
  set have;
  by cid country notsorted;
  Total_makers=sum(Total_makers,Makers in ('BDM','TDM'));
  eng_p_makers=sum(eng_p_makers,Pscore1=100 and flag='engaged' and Makers in ('BDM','TDM'));
  eng_e_makers=sum(eng_e_makers,escore2 = 100 and flag='engaged' and Makers in ('BDM','TDM'));
 end;
 do _n_=1 to _n_;
  set have;
   output;
 end;
run;
codyV
Fluorite | Level 6
i tried this data step, where in i the total sum is not getting update to all other rows of the respective CID by country.

novinosrin
Tourmaline | Level 20

"where in i the total sum is not getting update to all other rows of the respective CID by country." -- Can you please elaborate this like which one you are referring to?

 

Coz all I did was copy past your condition that you wrote at the top and sum(sum, boolean(1,0)). So did i miss anything ?

codyV
Fluorite | Level 6

This is the result i receive now...

 

CIDGIDcountryMakersflagP-score1E-score2total-makerseng_p_makerseng_e_makers
..      00
..      00
..      00
1234567UKTDMengaged100100212
1234567UKBDMengaged0100212
1234567UKBusinessunengaged0100000
6243899790NZBDMengaged1000110
7777788888UStechnicalengaged0100000
7777788888USTDMengaged1000422
7777788888USBusinessengaged1000000
7777788888USBDMengaged0100422
7777788888USBDMengaged100100422
7777788888USBDMunengaged1000000
7777788888USBusinessunengaged00000
7777788888US unengaged00000
7777788888US unengaged00000
642384724JPBDMengaged1000110

 

You see i have underlined the 0's received, instead can have the total sum number being updated based on that CID.

novinosrin
Tourmaline | Level 20

Hi @codyV  So you mean this condition I think

 

 if eng_p_makers=0 then eng_p_makers=Total_makers;
  if eng_e_makers=0 then eng_e_makers=Total_makers;

Updated full one:


data have;
if _n_=1 then do _n_=1 to 3;
output;
end;
input (CID	GID	country	Makers	flag)(:$10.)	Pscore1	Escore2;
output;
cards;	
1234	567	UK	TDM	engaged	100	100
1234	567	UK	BDM	engaged	0	100
1234	567	UK	Business	unengaged	0	100
62438	99790	NZ	BDM	engaged	100	0
77777	88888	US	technical	engaged	0	100
77777	88888	US	TDM	engaged	100	0
77777	88888	US	Business	engaged	100	0
77777	88888	US	BDM	engaged	0	100
77777	88888	US	BDM	engaged	100	100
77777	88888	US	BDM	unengaged	100	0
77777	88888	US	Business	unengaged	0	0
77777	88888	US	 .	unengaged	0	0
77777	88888	US	 .	unengaged	0	0
64238	4724	JP	BDM	engaged	100	0
;


data want;
if 0 then set have;
Total_makers=.;
eng_p_makers=0;
eng_e_makers=0;
 do _n_=1 by 1 until(last.country);
  set have;
  by cid country notsorted;
  Total_makers=sum(Total_makers,Makers in ('BDM','TDM'));
  eng_p_makers=sum(eng_p_makers,Pscore1=100 and flag='engaged' and Makers in ('BDM','TDM'));
  eng_e_makers=sum(eng_e_makers,escore2 = 100 and flag='engaged' and Makers in ('BDM','TDM'));
 end;
 do _n_=1 to _n_;
  set have;
  if eng_p_makers=0 then eng_p_makers=Total_makers;
  if eng_e_makers=0 then eng_e_makers=Total_makers;
  output;
 end;
run;
codyV
Fluorite | Level 6
Sadly, i receive the same results!
novinosrin
Tourmaline | Level 20

I ran a test with the sample you gave, results here


eng_p_makers=0 eng_e_makers=0
eng_p_makers=0 eng_e_makers=0
eng_p_makers=0 eng_e_makers=0
eng_p_makers=1 eng_e_makers=2
eng_p_makers=1 eng_e_makers=2
eng_p_makers=1 eng_e_makers=2
eng_p_makers=1 eng_e_makers=1
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=1 eng_e_makers=1
NOTE: There were 17 observations read from the data set WORK.HAVE.
NOTE: There were 17 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 17 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 13 replies
  • 1063 views
  • 2 likes
  • 3 in conversation