BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
mcalde33
Obsidian | Level 7

I need to get my data to turn out like table B (below) , my code so far used is this:

 

OPTION PS= 58 LS= 72 NODATE NONUMBER;

DATA Events;
INPUT EmpID 1-4 @6 EV1 $ @10 EV2 $ @14 EV3 $ EV4 $ 18-20 EV5 $22-24 EV6 $;

DATALINES;
1001 AB AT BTR S XYZ AT
1001 AT MY LOV . LOV LOV
1001 . MY . S MY S
1001 AB BTR LOV BTR AT AT
1001 AT LOV KUL MY . S
1002 MY LOV . AT AT BTR
1002 MY AT BTR . AT MY
1003 BTR MY . XYZ MY MY
1003 . MY MY MY MY BTR
1003 BTR AB MY . BTR .
1004 S . MY MY . XYZ
1004 XYZ AT KUL AB KUL S
1004 . . BTR LOV MY MY
1004 MY KUL AT KUL XYZ AB
;
RUN;




PROC PRINT data=events NOOBS;
Title 'TableB';
RUN;

 

 

 

Table A

 

Emp Id

Ev1

Ev2

Ev3

Ev4

Ev5

Ev6

 1001

AB

AT

BTR

S

XYZ

AT

1001

AT

MY

LOV

.

LOV

LOV

1001

.

MY

.

S

MY

S

1001

AB

BTR

LOV

BTR

AT

AT

1001

AT

LOV

KUL

MY

.

S

1002

MY

LOV

.

AT

AT

BTR

1002

MY

AT

BTR

.

AT

MY

1003

BTR

MY

.

XYZ

MY

MY

1003

.

MY

MY

MY

MY

BTR

1003

BTR

AB

MY

.

BTR

.

1004

S

.

MY

MY

.

XYZ

1004

XYZ

AT

KUL

AB

KUL

S

1004

.

.

BTR

LOV

MY

MY

1004

MY

KUL

AT

KUL

XYZ

AB

 

Table B

 

Ev

N

AB

3

AT

3

BTR

4

S

2

XYZ

3

MY

4

LOV

3

KUL

2

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

The code you posted to re-create your dataset probably will not work.  That is because it seems to have tab characters randomly inserted into it. (Yet another reason why you should not be putting tab characters into program files.).

 

Also in your dataset there are multiple observations for the same EMPID without any variable to distinguish them.  

That will make it more difficult to work with the file (and makes one wonder what the multiple observations are supposed to represent).

 

That is easy enough to fix by adding a counter.  Either one overall counter or on that resets to one when you start a new EMPID.

DATA Events;
  length EmpID 8 EV1-EV6 $3 ;
  input EmpID EV1-EV6 ;
  row+1;
DATALINES;
1001 AB  AT  BTR S   XYZ AT
1001 AT  MY  LOV .   LOV LOV
1001 .   MY  .   S   MY  S
1001 AB  BTR LOV BTR AT  AT
1001 AT  LOV KUL MY  .   S
1002 MY  LOV .   AT  AT  BTR
1002 MY  AT  BTR .   AT  MY
1003 BTR MY  .   XYZ MY  MY
1003 .   MY  MY  MY  MY  BTR
1003 BTR AB  MY  .   BTR .
1004 S   .   MY  MY  .   XYZ
1004 XYZ AT  KUL AB  KUL S
1004 .   .   BTR LOV MY  MY
1004 MY  KUL AT  KUL XYZ AB
;

Now you can use PROC TRANSPOSE to convert to a more useful structure.

proc transpose data=events out=tall(rename=(col1=event));
  by row empid;
  var ev1-ev6 ;
run;

Where it is easy enough to count how many times a particular event occurs for each EMPID.

proc freq data=tall ;
  tables event*empid / noprint out=counts ;
run;

Which can then be used to get a count of how many EMPID values there are per event.

proc freq data=counts ;
  tables event ;
run;

Result

Tom_0-1666470315067.png

 

View solution in original post

14 REPLIES 14
PeterClemmensen
Tourmaline | Level 20

I think this is what you want to do

 

data temp(drop = EV1 - EV6);
   set Events;
   array e EV:;
   do over e;
      ev = e;
      ouptut;
   end;
run;

proc sql;
   create table want as
   select EV
        , count(distinct EmpID) as N
   from temp
   where ev ne ''
   group by EV
   ;
quit;
PeterClemmensen
Tourmaline | Level 20

Also, you can do this in a single pass with the hash object.

 

data _null_;
   dcl hash h (ordered: "A");
   h.definekey ("ev");
   h.definedata ("ev", "N");
   h.definedone ();
 
   dcl hash u ();
   u.definekey ("EmpID", "ev");
   u.definedone ();
 
   do until (z);
      set Events end = z;
      array e EV:;

      do over e;
         ev = e;

         if h.find() ne 0 then call missing (N);
         if u.check() ne 0 then do;
            N = sum (N, 1);
            u.add();
         end;
         h.replace();
      end;
   end;
 
   h.output (dataset: "count_distinct");
   stop;
run;
Reeza
Super User
OPTION PS= 58 LS= 72;

Those options are for the listing destination. If you're using SAS EG, Viya, Studio or Foundation with HTML they're likely not doing what you expect. 


What are you expecting from those options?

mcalde33
Obsidian | Level 7
Not sure, I guess it was a habit when I was using it before. I thought it was standard, If theres no use for it in SAS studio then i'll remove it.
Tom
Super User Tom
Super User

It will still impact how the SAS log is formatted.

Tom
Super User Tom
Super User

The code you posted to re-create your dataset probably will not work.  That is because it seems to have tab characters randomly inserted into it. (Yet another reason why you should not be putting tab characters into program files.).

 

Also in your dataset there are multiple observations for the same EMPID without any variable to distinguish them.  

That will make it more difficult to work with the file (and makes one wonder what the multiple observations are supposed to represent).

 

That is easy enough to fix by adding a counter.  Either one overall counter or on that resets to one when you start a new EMPID.

DATA Events;
  length EmpID 8 EV1-EV6 $3 ;
  input EmpID EV1-EV6 ;
  row+1;
DATALINES;
1001 AB  AT  BTR S   XYZ AT
1001 AT  MY  LOV .   LOV LOV
1001 .   MY  .   S   MY  S
1001 AB  BTR LOV BTR AT  AT
1001 AT  LOV KUL MY  .   S
1002 MY  LOV .   AT  AT  BTR
1002 MY  AT  BTR .   AT  MY
1003 BTR MY  .   XYZ MY  MY
1003 .   MY  MY  MY  MY  BTR
1003 BTR AB  MY  .   BTR .
1004 S   .   MY  MY  .   XYZ
1004 XYZ AT  KUL AB  KUL S
1004 .   .   BTR LOV MY  MY
1004 MY  KUL AT  KUL XYZ AB
;

Now you can use PROC TRANSPOSE to convert to a more useful structure.

proc transpose data=events out=tall(rename=(col1=event));
  by row empid;
  var ev1-ev6 ;
run;

Where it is easy enough to count how many times a particular event occurs for each EMPID.

proc freq data=tall ;
  tables event*empid / noprint out=counts ;
run;

Which can then be used to get a count of how many EMPID values there are per event.

proc freq data=counts ;
  tables event ;
run;

Result

Tom_0-1666470315067.png

 

mcalde33
Obsidian | Level 7

I adjusted the code to prevent the Cumulative column and the percentage column from printing but somehow the rows AB, AT, MY get repeated?

I also want to prevent the row at the bottom from being printed "Frequency Missing=4" as well as the "The Freq procedure" title from being printed.

 see the code used and see the output I am getting:

 

 

 



DATA Events;
length EmpID 8 EV1-EV6 $3 ;
INPUT EmpID EV1-EV6 ;
row+1;


DATALINES;
1001 AB AT BTR S XYZ AT
1001 AT MY LOV . LOV LOV
1001 . MY . S MY S
1001 AB BTR LOV BTR AT AT
1001 AT LOV KUL MY . S
1002 MY LOV . AT AT BTR
1002 MY AT BTR . AT MY
1003 BTR MY . XYZ MY MY
1003 . MY MY MY MY BTR
1003 BTR AB MY . BTR .
1004 S . MY MY . XYZ
1004 XYZ AT KUL AB KUL S
1004 . . BTR LOV MY MY
1004 MY KUL AT KUL XYZ AB
;
RUN;

/*Using PROC TRANSPOSE to convert to a more useful structure*/
proc transpose data=events out=tall(rename=(col1=Event));
by row empid;
var ev1-ev6 ;
run;

/*Formating to count how many times a particular event occurs for each EMPID*/
proc freq data=tall;
tables Event*empid / noprint out=counts;
run;

/*Formating to get a count of how many EMPID values there are per event*/
proc freq data=counts ;
tables event /NOCUM NOPERCENT;
Title 'Table B';
run;



 

 

Table B

The FREQ Procedure

 

Event

Frequency

 

AB

3

AB

1

AT

3

AT

1

BTR

4

KUL

2

LOV

3

MY

4

MY

3

S

2

XYZ

3

Frequency Missing = 4

Tom
Super User Tom
Super User

You could eliminate the missing values at many places in that code.

 

For example by adding a WHERE statement in the last step:

proc freq data=counts ;
  where not missing(event);
  tables event /NOCUM NOPERCENT;
  Title 'Table B';
run;
mcalde33
Obsidian | Level 7
how do I rename the "frequency" column to "N". I have tried the rename option but it gave me an error.
Tom
Super User Tom
Super User

@mcalde33 wrote:
how do I rename the "frequency" column to "N". I have tried the rename option but it gave me an error.

Do you mean the COUNT variable?

tables event / out=event_count(rename=(count=N));

 

Or do you want to just make your own print out, in which case you can use any column header you want.

proc freq data=counts;
  tables event / noprint out=event_count;
run;

proc print data=event_count label ;
  var event count;
  label count='N' ;
run;

 

mcalde33
Obsidian | Level 7

I got it to work thanks!
Now I noticed the observations dont match how the desired outcome should be:

what im getting:
Table B

Ev N
AB 3
AB 1
AT 3
AT 1
BTR 4
KUL 2
LOV 3
MY 4
MY 3
S 2
XYZ 3


What i should be getting:
Table B



Ev N
AB 3
AT 3
BTR 4
S 2
XYZ 3
MY 4
LOV 3
KUL 2

 

Here is the code I am using:

 

 

DATA Events;
length EmpID 8 EV1-EV6 $3 ;
INPUT EmpID EV1-EV6 ;
row+1;


DATALINES;
1001 AB AT BTR S XYZ AT
1001 AT MY LOV . LOV LOV
1001 . MY . S MY S
1001 AB BTR LOV BTR AT AT
1001 AT LOV KUL MY . S
1002 MY LOV . AT AT BTR
1002 MY AT BTR . AT MY
1003 BTR MY . XYZ MY MY
1003 . MY MY MY MY BTR
1003 BTR AB MY . BTR .
1004 S . MY MY . XYZ
1004 XYZ AT KUL AB KUL S
1004 . . BTR LOV MY MY
1004 MY KUL AT KUL XYZ AB
;
RUN;

/*Using PROC TRANSPOSE to convert to a more useful structure*/
proc transpose data=events out=tall(rename=(col1=Ev));
by row empid;
var ev1-ev6 ;
run;

/*Formating to count how many times a particular event occurs for each EMPID*/
proc freq data=tall;
tables Ev*empid / noprint out=counts;
RUN;

/*Formating to get a count of how many EMPID values there are per event*/
ods noproctitle;
proc freq data=counts;
where not missing(ev);
tables ev /noprint out=event_count NOCUM NOPERCENT;
Title 'Table B';
run;

proc print data=event_count label NOOBS ;
var ev count;
label count='N' ;
run;

Reeza
Super User
DATA Events;
length EmpID 8 EV1-EV6 $3 ;
INPUT EmpID EV1-EV6 ;
row+1;
DATALINES;
1001 AB AT BTR S XYZ AT
1001 AT MY LOV . LOV LOV
1001 . MY . S MY S
1001 AB BTR LOV BTR AT AT
1001 AT LOV KUL MY . S
1002 MY LOV . AT AT BTR
1002 MY AT BTR . AT MY
1003 BTR MY . XYZ MY MY
1003 . MY MY MY MY BTR
1003 BTR AB MY . BTR .
1004 S . MY MY . XYZ
1004 XYZ AT KUL AB KUL S
1004 . . BTR LOV MY MY
1004 MY KUL AT KUL XYZ AB
;
RUN;



proc transpose data=events out=tall(rename=(col1=Ev));
by row empid;
var ev1-ev6 ;
run;

data tall;
set tall;
ev=trim(compress(ev, , 's')); *remove any extra spaces;
run;

/*Formating to count how many times a particular event occurs for each EMPID*/
proc freq data=tall;
tables Ev/  out=counts;
RUN;
Tom
Super User Tom
Super User

You have two different "AB" values.

Probably related the TAB characters that messed up your original posting.

If you are using SAS to edit your program there should be options in the editor to replace tabs with spaces. I really recommend turning that option on so that it is much harder to accidentally get tab characters inserted into your file.  For example in the SAS/Studio version available in SAS ODA here is what the preferences setting looks like:

Tom_0-1666715135059.png

But it might be other invisible characters.

Could just be leading spaces.

But could be other things like non-breaking spaces 'A0'x or Linefeed '0A'x or CarriageReturn '0D'x or really anthing that is not a normal letter or digit or punctuation symbol.

 

NOTE:  The ODS output system HIDES leading spaces.  You can make them more visible by using the $QUOTE. format with the variable.  But even then it is hard to see spaces when looking at text displayed with a proportional font.  If you can produce plain old text output (ODS LISTING is new terminology) then the leading spaces are much easier to see.

 

ballardw
Super User

@mcalde33 wrote:

I adjusted the code to prevent the Cumulative column and the percentage column from printing but somehow the rows AB, AT, MY get repeated?

I also want to prevent the row at the bottom from being printed "Frequency Missing=4" as well as the "The Freq procedure" title from being printed.

 see the code used and see the output I am getting:

 


DATA Events;
length EmpID 8 EV1-EV6 $3 ;
INPUT EmpID EV1-EV6 ;
row+1;


DATALINES;
1001 AB AT BTR S XYZ AT
1001 AT MY LOV . LOV LOV
1001 . MY . S MY S
1001 AB BTR LOV BTR AT AT
1001 AT LOV KUL MY . S
1002 MY LOV . AT AT BTR
1002 MY AT BTR . AT MY
1003 BTR MY . XYZ MY MY
1003 . MY MY MY MY BTR
1003 BTR AB MY . BTR .
1004 S . MY MY . XYZ
1004 XYZ AT KUL AB KUL S
1004 . . BTR LOV MY MY
1004 MY KUL AT KUL XYZ AB
;
RUN;

/*Using PROC TRANSPOSE to convert to a more useful structure*/
proc transpose data=events out=tall(rename=(col1=Event));
by row empid;
var ev1-ev6 ;
run;

/*Formating to count how many times a particular event occurs for each EMPID*/
proc freq data=tall;
tables Event*empid / noprint out=counts;
run;

/*Formating to get a count of how many EMPID values there are per event*/
proc freq data=counts ;
tables event /NOCUM NOPERCENT;
Title 'Table B';
run;



 

 

Table B

The FREQ Procedure

 

Event

Frequency

 

AB

3

AB

1

AT

3

AT

1

BTR

4

KUL

2

LOV

3

MY

4

MY

3

S

2

XYZ

3

Frequency Missing = 4


This is a symptom that you have either a leading space or more than one with some of the values. Proc Freq and most of the table generated output will left justify and drop the leading spaces.

 

See if this code shows the leading space(s);

proc tabulate data=counts ;
  class event ;
  classlev event / style=[asis=on];
  Title 'Table B';
  table event, n;
run;

The CLASSLEV statement provides options on how row and column header variable values display. The ASIS=on preserves such things as leading spaces when present.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 1899 views
  • 5 likes
  • 5 in conversation