BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
NTR
Fluorite | Level 6 NTR
Fluorite | Level 6

hi all; may you please assist. I have a dataset where one account can be in multiple industries. How do I calculate the number of industries in a data step.

 

Below is the example of dataset:

 

Account number     Industry

148060                   IN
148060                   CL
148060                   CC
148060                   HW
148060                   FC

92865                     PL

150021                   PL

150021                   MB

150021                   NL

 

 

the expected outcome is:

 

Account number         number of industries

148060                        5 

92865                          1

150021                        3

 

 

thanks ahead of time for your assistance.

1 ACCEPTED SOLUTION

Accepted Solutions
andreas_lds
Jade | Level 19

@PeterClemmensen Thanks for providing test-data in an usable form 😉

 

 

Here is  data-step, assuming that your source dataset is not sorted, but grouped by AccountNumber.

 

data want;
   set have(keep=AccountNumber);
   by AccountNumber notsorted;  

   attrib noi length= 8 label= 'number of industries';
   retain noi;

   if first.AccountNumber then do;
      noi = 0;
   end;

   noi = noi + 1;

   if last.AccountNumber then do;
      output;
   end;
run;

 

 

View solution in original post

7 REPLIES 7
PeterClemmensen
Tourmaline | Level 20

Why limit yourself to a data step?

 

data have;
input Accountnumber $ Industry $;
datalines;
148060 IN
148060 CL
148060 CC
148060 HW
148060 FC
92865 PL
150021 PL
150021 MB
150021 NL
;

proc sql;
   create table want as
   select Accountnumber
         ,count(distinct Industry) as number_of_industries
   from have
   group by Accountnumber;
quit;
andreas_lds
Jade | Level 19

@PeterClemmensen Thanks for providing test-data in an usable form 😉

 

 

Here is  data-step, assuming that your source dataset is not sorted, but grouped by AccountNumber.

 

data want;
   set have(keep=AccountNumber);
   by AccountNumber notsorted;  

   attrib noi length= 8 label= 'number of industries';
   retain noi;

   if first.AccountNumber then do;
      noi = 0;
   end;

   noi = noi + 1;

   if last.AccountNumber then do;
      output;
   end;
run;

 

 

novinosrin
Tourmaline | Level 20

Hi @andreas_lds  I'm afraid should the requirement be count of "distinct industries" for each account number, your code would yield incorrect results, like for the revised "have" 

 

data have;
input Accountnumber $ Industry $;
datalines;
148060 IN
148060 CL
148060 CL
148060 CC
148060 HW
148060 FC
92865 PL
92865 PL
150021 PL
150021 MB
150021 NL
;

 

Of course a simple proc sort and nodupkey will offset the problem:

 

proc sort data=have out=_have nodupkey;
by Accountnumber industry;
run;

 

data want;
set _have(keep=AccountNumber);
by AccountNumber notsorted;

attrib noi length= 8 label= 'number of industries';
retain noi;

if first.AccountNumber then do;
noi = 0;
end;

noi = noi + 1;

if last.AccountNumber then do;
output;
end;
run;

 

/*Or a DOW fun*/

 


proc sort data=have out=_have nodupkey;
by Accountnumber industry;
run;


data want;
do number_of_industries= 1 by 1 until(last.Accountnumber);
set _have;
by Accountnumber ;
end;
drop industry;
run;

 

 

 

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Post test data in the form of a datastep.

data have;
input Account Industry $;
datalines;
148060 IN
148060 CL
148060 CC
148060 CL
148060 HW
148060 FC
;
run;
proc freq data=have;
tables account*industry / out=want;
run;
PaigeMiller
Diamond | Level 26

How do I calculate the number of industries in a data step?

 

I just don't understand why people limit themselves to doing this in a data step, thus writing their own code; when SAS has created a procedure to do this, and added in lots amount of error checking and verification of results. See @RW9's answer.

--
Paige Miller
novinosrin
Tourmaline | Level 20

@PaigeMiller Good morning, I am afraid RW's proc freq will not produce the results that OP wants for the reason Proc freq counts the number of occurrences of values however OP wants the total number of distinct occurrences within a by group. 

PaigeMiller
Diamond | Level 26

This is easily fixed by doing two consecutive PROC FREQs

--
Paige Miller

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 956 views
  • 5 likes
  • 6 in conversation