BookmarkSubscribeRSS Feed
Midi
Obsidian | Level 7

Hello Everyone , so , I have This Data Set :

 

UserOrdersFam
2563154266
2563225166
2563154466
2563222228
2563233528

 

And What I want to get is The Distinct Number of Fam Per User :

So This My Code :

 

proc sql;
    create Table Max_Fam as
    select A*. count(distinct(Fam)) as Max_N_Fam
    from Base_For_Counts
    group by User 
	;
quit;

And This is What I get :

 

UserOrdersFamNum_Fam
25631542661
25632251661
25631544661
25632222281
25632335281

 

 

And What I want Is This Output :

 

UserOrdersFamNum_Fam
25631542661
2563225166
2563154466
25632222281
2563233528

 

Any Idea to get there ?

 

Any Help would be much appreciated , thank you

8 REPLIES 8
novinosrin
Tourmaline | Level 20

Hi @Midi  wouldn't the count(distinct Fam) for user=2563 be 2?

Midi
Obsidian | Level 7

Exactly , In this Case Yes.

novinosrin
Tourmaline | Level 20

So would have been nicer to post a correct sample to help folks not get confused

 


data have;
input User	Orders	Fam;
cards;
2563	1542	66
2563	2251	66
2563	1544	66
2563	2222	28
2563	2335	28
;

proc sql;
    create Table Max_Fam(drop=_user) as
    select *,user as _user, count(distinct(Fam)) as Max_N_Fam
    from have
    group by _User
	;
quit;
Midi
Obsidian | Level 7

Thank You , but i already have this output , what i want is for the cells in common to be joined and having the max that "covers" the three of them , basically not have the max for each cell , i don't know if that was clear , thank's again

novinosrin
Tourmaline | Level 20

data have;
input id	Orders	Fam;
cards;
2563	1542	66
2563	2251	66
2563	1544	66
2563	2222	28
2563	2335	28
;

data temp;
set have;
rownum+1;
run;
proc sql;
create table want(drop=c rownum) as
select *,ifn(min(rownum)=rownum,c,.) as Max_N_Fam
from
(select *,count(distinct fam) as c 
from temp
group by id)
group by id,fam
order by id,rownum;
quit;
Reeza
Super User
Why are you doing this? If it's for display purposes, using PROC REPORT is likely a better option that creating a dataset.
Kurt_Bremser
Super User

Use a data step:

data have;
input User :$4. Orders Fam;
datalines;
2563 1542 66
2563 2251 66
2563 1544 66
2563 2222 28
2563 2335 28
;

data want;
count = 0;
do until (last.user);
  set have;
  by user fam notsorted;
  if first.fam then count + 1;
end;
do until (last.user);
  set have;
  by user fam notsorted;
  if first.fam
  then num_fam = count;
  else num_fam = .;
  output;
end;
drop count;
run;

proc print data=want noobs;
run;

Result:

User	Orders	Fam	num_fam
2563	1542	66	2
2563	2251	66	.
2563	1544	66	.
2563	2222	28	2
2563	2335	28	.
hashman
Ammonite | Level 13

@Midi:

SAS is good enough to get what you want in a single step, even if the input data are completely unsorted. Note that below, a few records have been added to the sample input to make it more representative with respect to more than one user and different summary counts per user.

data have ;                                        
  input User Orders Fam ;                          
  cards ;                                          
2500  1000  55                                     
2500  2000  55                                     
2500  3000  55                                     
2563  1542  66                                     
2563  2251  66                                     
2563  1544  66                                     
2563  2222  28                                     
2563  2335  28                                     
;                                                  
run ;                                              
                                                   
data want ;                                        
  if _n_ = 1 then do ;                             
    dcl hash h () ;                                
    h.definekey ("user") ;                         
    h.definedata ("num_fam", "_n_") ;              
    h.definedone () ;                              
    dcl hash u () ;                                
    u.definekey ("user", "fam") ;                  
    u.definedone () ;                              
    do until (z) ;                                 
      set have end = z ;                           
      if h.find() ne 0 then num_fam = 0 ;          
      if u.check() ne 0 then do ;                  
        num_fam + 1 ;                              
        u.add() ;                                  
      end ;                                        
      h.replace() ;                                
    end ;                                          
  end ;                                            
  set have ;                                       
  h.find() ;                                       
  if _n_ = 0 then call missing (num_fam) ;         
  else h.replace (key:user, data:num_fam, data:0) ;
run ;                                              

At first glance, it may look kind of complex but in actuality it's pretty simple:

  1. The distinct counts of FAM are accumulated per user in hash table H using table U to see if a given [USER,FAM] combo has already been seen.
  2. These counts are matched by USER on the second pass through HAVE. If a given USER key-value hasn't already been encountered, the current record is output with non-missing NUM_FAM coming from table H, and _N_ in the corresponding item of table H is changed from the initial value of 1 to 0.
  3. Thus, on all subsequent encounters with the same input key-value of USER, _N_ coming from H is 0, which signals to set NUM_FAM to missing.    

Of course, if your input data are already sorted by [User,Fam] (or just grouped, as in your sample), it's still much simpler, as @Kurt_Bremser has already indicated (below, his double DoW-loop program is merely presented in a slightly different guise):

data want ;                                 
  do _n_ = 1 by 1 until (last.user) ;       
    set have ;                              
    by user fam notsorted ;                 
    Num_Fam = sum (num_fam, first.fam) ;    
  end ;                                     
  do _n_ = 1 to _n_ ;                       
    set have ;                              
    if _n_ > 1 then call missing (num_fam) ;
    output ;                                
  end ;                                     
run ;                                       

Kind regards

Paul D.

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 947 views
  • 1 like
  • 5 in conversation