BookmarkSubscribeRSS Feed
Midi
Obsidian | Level 7

Hello Everyone , so , I have This Data Set :

 

UserOrdersFam
2563154266
2563225166
2563154466
2563222228
2563233528

 

And What I want to get is The Distinct Number of Fam Per User :

So This My Code :

 

proc sql;
    create Table Max_Fam as
    select A*. count(distinct(Fam)) as Max_N_Fam
    from Base_For_Counts
    group by User 
	;
quit;

And This is What I get :

 

UserOrdersFamNum_Fam
25631542661
25632251661
25631544661
25632222281
25632335281

 

 

And What I want Is This Output :

 

UserOrdersFamNum_Fam
25631542661
2563225166
2563154466
25632222281
2563233528

 

Any Idea to get there ?

 

Any Help would be much appreciated , thank you

8 REPLIES 8
novinosrin
Tourmaline | Level 20

Hi @Midi  wouldn't the count(distinct Fam) for user=2563 be 2?

Midi
Obsidian | Level 7

Exactly , In this Case Yes.

novinosrin
Tourmaline | Level 20

So would have been nicer to post a correct sample to help folks not get confused

 


data have;
input User	Orders	Fam;
cards;
2563	1542	66
2563	2251	66
2563	1544	66
2563	2222	28
2563	2335	28
;

proc sql;
    create Table Max_Fam(drop=_user) as
    select *,user as _user, count(distinct(Fam)) as Max_N_Fam
    from have
    group by _User
	;
quit;
Midi
Obsidian | Level 7

Thank You , but i already have this output , what i want is for the cells in common to be joined and having the max that "covers" the three of them , basically not have the max for each cell , i don't know if that was clear , thank's again

novinosrin
Tourmaline | Level 20

data have;
input id	Orders	Fam;
cards;
2563	1542	66
2563	2251	66
2563	1544	66
2563	2222	28
2563	2335	28
;

data temp;
set have;
rownum+1;
run;
proc sql;
create table want(drop=c rownum) as
select *,ifn(min(rownum)=rownum,c,.) as Max_N_Fam
from
(select *,count(distinct fam) as c 
from temp
group by id)
group by id,fam
order by id,rownum;
quit;
Reeza
Super User
Why are you doing this? If it's for display purposes, using PROC REPORT is likely a better option that creating a dataset.
Kurt_Bremser
Super User

Use a data step:

data have;
input User :$4. Orders Fam;
datalines;
2563 1542 66
2563 2251 66
2563 1544 66
2563 2222 28
2563 2335 28
;

data want;
count = 0;
do until (last.user);
  set have;
  by user fam notsorted;
  if first.fam then count + 1;
end;
do until (last.user);
  set have;
  by user fam notsorted;
  if first.fam
  then num_fam = count;
  else num_fam = .;
  output;
end;
drop count;
run;

proc print data=want noobs;
run;

Result:

User	Orders	Fam	num_fam
2563	1542	66	2
2563	2251	66	.
2563	1544	66	.
2563	2222	28	2
2563	2335	28	.
hashman
Ammonite | Level 13

@Midi:

SAS is good enough to get what you want in a single step, even if the input data are completely unsorted. Note that below, a few records have been added to the sample input to make it more representative with respect to more than one user and different summary counts per user.

data have ;                                        
  input User Orders Fam ;                          
  cards ;                                          
2500  1000  55                                     
2500  2000  55                                     
2500  3000  55                                     
2563  1542  66                                     
2563  2251  66                                     
2563  1544  66                                     
2563  2222  28                                     
2563  2335  28                                     
;                                                  
run ;                                              
                                                   
data want ;                                        
  if _n_ = 1 then do ;                             
    dcl hash h () ;                                
    h.definekey ("user") ;                         
    h.definedata ("num_fam", "_n_") ;              
    h.definedone () ;                              
    dcl hash u () ;                                
    u.definekey ("user", "fam") ;                  
    u.definedone () ;                              
    do until (z) ;                                 
      set have end = z ;                           
      if h.find() ne 0 then num_fam = 0 ;          
      if u.check() ne 0 then do ;                  
        num_fam + 1 ;                              
        u.add() ;                                  
      end ;                                        
      h.replace() ;                                
    end ;                                          
  end ;                                            
  set have ;                                       
  h.find() ;                                       
  if _n_ = 0 then call missing (num_fam) ;         
  else h.replace (key:user, data:num_fam, data:0) ;
run ;                                              

At first glance, it may look kind of complex but in actuality it's pretty simple:

  1. The distinct counts of FAM are accumulated per user in hash table H using table U to see if a given [USER,FAM] combo has already been seen.
  2. These counts are matched by USER on the second pass through HAVE. If a given USER key-value hasn't already been encountered, the current record is output with non-missing NUM_FAM coming from table H, and _N_ in the corresponding item of table H is changed from the initial value of 1 to 0.
  3. Thus, on all subsequent encounters with the same input key-value of USER, _N_ coming from H is 0, which signals to set NUM_FAM to missing.    

Of course, if your input data are already sorted by [User,Fam] (or just grouped, as in your sample), it's still much simpler, as @Kurt_Bremser has already indicated (below, his double DoW-loop program is merely presented in a slightly different guise):

data want ;                                 
  do _n_ = 1 by 1 until (last.user) ;       
    set have ;                              
    by user fam notsorted ;                 
    Num_Fam = sum (num_fam, first.fam) ;    
  end ;                                     
  do _n_ = 1 to _n_ ;                       
    set have ;                              
    if _n_ > 1 then call missing (num_fam) ;
    output ;                                
  end ;                                     
run ;                                       

Kind regards

Paul D.

 

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2328 views
  • 1 like
  • 5 in conversation