BookmarkSubscribeRSS Feed
codyV
Fluorite | Level 6

Hi,

 

I have a dataset  to which i need to get the count details of a variable without losing the number of rows. here the eg of the table i have and the expected result table i need. I am trying get 3 additional columns

1. total_makers -> which gives a total makers count of ('TDM', 'BDM' ) grouped by CID and country

2. eng_p_makers -> based on the P-score1=100 and flag=engaged and makers = TDM, BDM (only)

3. eng_e_makers -> based on the e-score2 = 100 and flag=engaged and makers = TDM, BDM (only)

 

Greatly appreciate your time and assistance in advance!

 

table 1      
CIDGIDcountryMakersflagP-score1E-score2
..     
..     
..     
1234567UKTDMengaged100100
1234567UKBDMengaged0100
1234567UKBusinessunengaged0100
6243899790NZBDMengaged1000
7777788888UStechnicalengaged0100
7777788888USTDMengaged1000
7777788888USBusinessengaged1000
7777788888USBDMengaged0100
7777788888USBDMengaged100100
7777788888USBDMunengaged1000
7777788888USBusinessunengaged00
7777788888US unengaged00
7777788888US unengaged00
642384724JPBDMengaged1000

 

Expected_Result         
CIDGIDcountryMakersflagP-score1E-score2total-makerseng_p_makerseng_e_makers
..      00
..      00
..      00
1234567UKTDMengaged100100212
1234567UKBDMengaged0100212
1234567UKBusinessunengaged0100212
6243899790NZBDMengaged1000110
7777788888UStechnicalengaged0100322
7777788888USTDMengaged1000322
7777788888USBusinessengaged1000322
7777788888USBDMengaged0100322
7777788888USBDMengaged100100322
7777788888USBDMunengaged1000322
7777788888USBusinessunengaged00322
7777788888US unengaged00322
7777788888US unengaged00322
642384724JPBDMengaged1000110

 

I did avoid proc sql, since where statement filters off the table rows by the condition.

PROC SQL;

         CREATE TABLE want AS

         SELECT *, COUNT (makers) AS makers_1

         FROM have where makers in ('BDM','TDM') and p-score1 = 100 and flag = 'engaged'

         group by cid, country;

QUIT;

 

the data step i am using is not giving me the desired results-

 

data want;

set have;

by cid country;

if p-score1 = 100 and flag = 'engaged' and makers in ('TDM','BDM'then

eng_p_makers = count(makers);

else eng_p_makers = 0;

run;

13 REPLIES 13
PeterClemmensen
Tourmaline | Level 20

Just to be clear, do you prefer an SQL or a data step solution to do this?

codyV
Fluorite | Level 6
Any of them should do good, unless the number of rows ain't reduced.
PeterClemmensen
Tourmaline | Level 20

Ok. Why is total-makers=3 for CID=77777, Country=US? There are 4 rows that satisfy the criteria?

PeterClemmensen
Tourmaline | Level 20

I think this is what you want

 

data table1;
input CID GID country $ Makers :$20. flag :$20. Pscore1 Escore2;
infile datalines dlm=',' dsd;
datalines;
1234,567,UK,TDM,engaged,100,100
1234,567,UK,BDM,engaged,0,100
1234,567,UK,Business,unengaged,0,100
62438,99790,NZ,BDM,engaged,100,0
77777,88888,US,technical,engaged,0,100
77777,88888,US,TDM,engaged,100,0
77777,88888,US,Business,engaged,100,0
77777,88888,US,BDM,engaged,0,100
77777,88888,US,BDM,engaged,100,100
77777,88888,US,BDM,unengaged,100,0
77777,88888,US,Business,unengaged,0,0
77777,88888,US,,unengaged,0,0
77777,88888,US,,unengaged,0,0
64238,4724,JP,BDM,engaged,100,0
;

proc sql;
    create table want as
    select *,
           sum(Makers in ('TDM', 'BDM')) as total_makers,
           sum(Pscore1=100 & Makers in ('TDM', 'BDM')) as eng_p_makers,
           sum(Escore2=100 & Makers in ('TDM', 'BDM')) as eng_e_makers
    from table1
    group by CID, country;
quit;
codyV
Fluorite | Level 6
This provides me with the results for total makers, eng_p_makers, eng_e_makers only where the makers = TDM or BDM.

Is it possible that we can update the total count/sum of total makers, eng_p_makers, eng_e_makers to all other rows of a particular CID by country instead of having 0.
PeterClemmensen
Tourmaline | Level 20

Can you please show what an output like that might look like?

novinosrin
Tourmaline | Level 20

I personally do not prefer SQL for all missing vars and yet wanted in output

 


data have;
if _n_=1 then do _n_=1 to 3;
output;
end;
input (CID	GID	country	Makers	flag)(:$10.)	Pscore1	Escore2;
output;
cards;	
1234	567	UK	TDM	engaged	100	100
1234	567	UK	BDM	engaged	0	100
1234	567	UK	Business	unengaged	0	100
62438	99790	NZ	BDM	engaged	100	0
77777	88888	US	technical	engaged	0	100
77777	88888	US	TDM	engaged	100	0
77777	88888	US	Business	engaged	100	0
77777	88888	US	BDM	engaged	0	100
77777	88888	US	BDM	engaged	100	100
77777	88888	US	BDM	unengaged	100	0
77777	88888	US	Business	unengaged	0	0
77777	88888	US	 .	unengaged	0	0
77777	88888	US	 .	unengaged	0	0
64238	4724	JP	BDM	engaged	100	0
;


data want;
if 0 then set have;
Total_makers=.;
eng_p_makers=0;
eng_e_makers=0;
 do _n_=1 by 1 until(last.country);
  set have;
  by cid country notsorted;
  Total_makers=sum(Total_makers,Makers in ('BDM','TDM'));
  eng_p_makers=sum(eng_p_makers,Pscore1=100 and flag='engaged' and Makers in ('BDM','TDM'));
  eng_e_makers=sum(eng_e_makers,escore2 = 100 and flag='engaged' and Makers in ('BDM','TDM'));
 end;
 do _n_=1 to _n_;
  set have;
   output;
 end;
run;
codyV
Fluorite | Level 6
i tried this data step, where in i the total sum is not getting update to all other rows of the respective CID by country.

novinosrin
Tourmaline | Level 20

"where in i the total sum is not getting update to all other rows of the respective CID by country." -- Can you please elaborate this like which one you are referring to?

 

Coz all I did was copy past your condition that you wrote at the top and sum(sum, boolean(1,0)). So did i miss anything ?

codyV
Fluorite | Level 6

This is the result i receive now...

 

CIDGIDcountryMakersflagP-score1E-score2total-makerseng_p_makerseng_e_makers
..      00
..      00
..      00
1234567UKTDMengaged100100212
1234567UKBDMengaged0100212
1234567UKBusinessunengaged0100000
6243899790NZBDMengaged1000110
7777788888UStechnicalengaged0100000
7777788888USTDMengaged1000422
7777788888USBusinessengaged1000000
7777788888USBDMengaged0100422
7777788888USBDMengaged100100422
7777788888USBDMunengaged1000000
7777788888USBusinessunengaged00000
7777788888US unengaged00000
7777788888US unengaged00000
642384724JPBDMengaged1000110

 

You see i have underlined the 0's received, instead can have the total sum number being updated based on that CID.

novinosrin
Tourmaline | Level 20

Hi @codyV  So you mean this condition I think

 

 if eng_p_makers=0 then eng_p_makers=Total_makers;
  if eng_e_makers=0 then eng_e_makers=Total_makers;

Updated full one:


data have;
if _n_=1 then do _n_=1 to 3;
output;
end;
input (CID	GID	country	Makers	flag)(:$10.)	Pscore1	Escore2;
output;
cards;	
1234	567	UK	TDM	engaged	100	100
1234	567	UK	BDM	engaged	0	100
1234	567	UK	Business	unengaged	0	100
62438	99790	NZ	BDM	engaged	100	0
77777	88888	US	technical	engaged	0	100
77777	88888	US	TDM	engaged	100	0
77777	88888	US	Business	engaged	100	0
77777	88888	US	BDM	engaged	0	100
77777	88888	US	BDM	engaged	100	100
77777	88888	US	BDM	unengaged	100	0
77777	88888	US	Business	unengaged	0	0
77777	88888	US	 .	unengaged	0	0
77777	88888	US	 .	unengaged	0	0
64238	4724	JP	BDM	engaged	100	0
;


data want;
if 0 then set have;
Total_makers=.;
eng_p_makers=0;
eng_e_makers=0;
 do _n_=1 by 1 until(last.country);
  set have;
  by cid country notsorted;
  Total_makers=sum(Total_makers,Makers in ('BDM','TDM'));
  eng_p_makers=sum(eng_p_makers,Pscore1=100 and flag='engaged' and Makers in ('BDM','TDM'));
  eng_e_makers=sum(eng_e_makers,escore2 = 100 and flag='engaged' and Makers in ('BDM','TDM'));
 end;
 do _n_=1 to _n_;
  set have;
  if eng_p_makers=0 then eng_p_makers=Total_makers;
  if eng_e_makers=0 then eng_e_makers=Total_makers;
  output;
 end;
run;
codyV
Fluorite | Level 6
Sadly, i receive the same results!
novinosrin
Tourmaline | Level 20

I ran a test with the sample you gave, results here


eng_p_makers=0 eng_e_makers=0
eng_p_makers=0 eng_e_makers=0
eng_p_makers=0 eng_e_makers=0
eng_p_makers=1 eng_e_makers=2
eng_p_makers=1 eng_e_makers=2
eng_p_makers=1 eng_e_makers=2
eng_p_makers=1 eng_e_makers=1
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=2 eng_e_makers=2
eng_p_makers=1 eng_e_makers=1
NOTE: There were 17 observations read from the data set WORK.HAVE.
NOTE: There were 17 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 17 observations and 10 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds