I need to get my data to turn out like table B (below) , my code so far used is this:
OPTION PS= 58 LS= 72 NODATE NONUMBER;
DATA Events;
INPUT EmpID 1-4 @6 EV1 $ @10 EV2 $ @14 EV3 $ EV4 $ 18-20 EV5 $22-24 EV6 $;
DATALINES;
1001 AB AT BTR S XYZ AT
1001 AT MY LOV . LOV LOV
1001 . MY . S MY S
1001 AB BTR LOV BTR AT AT
1001 AT LOV KUL MY . S
1002 MY LOV . AT AT BTR
1002 MY AT BTR . AT MY
1003 BTR MY . XYZ MY MY
1003 . MY MY MY MY BTR
1003 BTR AB MY . BTR .
1004 S . MY MY . XYZ
1004 XYZ AT KUL AB KUL S
1004 . . BTR LOV MY MY
1004 MY KUL AT KUL XYZ AB
;
RUN;
PROC PRINT data=events NOOBS;
Title 'TableB';
RUN;
Table A
|
Table B
|
The code you posted to re-create your dataset probably will not work. That is because it seems to have tab characters randomly inserted into it. (Yet another reason why you should not be putting tab characters into program files.).
Also in your dataset there are multiple observations for the same EMPID without any variable to distinguish them.
That will make it more difficult to work with the file (and makes one wonder what the multiple observations are supposed to represent).
That is easy enough to fix by adding a counter. Either one overall counter or on that resets to one when you start a new EMPID.
DATA Events;
length EmpID 8 EV1-EV6 $3 ;
input EmpID EV1-EV6 ;
row+1;
DATALINES;
1001 AB AT BTR S XYZ AT
1001 AT MY LOV . LOV LOV
1001 . MY . S MY S
1001 AB BTR LOV BTR AT AT
1001 AT LOV KUL MY . S
1002 MY LOV . AT AT BTR
1002 MY AT BTR . AT MY
1003 BTR MY . XYZ MY MY
1003 . MY MY MY MY BTR
1003 BTR AB MY . BTR .
1004 S . MY MY . XYZ
1004 XYZ AT KUL AB KUL S
1004 . . BTR LOV MY MY
1004 MY KUL AT KUL XYZ AB
;
Now you can use PROC TRANSPOSE to convert to a more useful structure.
proc transpose data=events out=tall(rename=(col1=event));
by row empid;
var ev1-ev6 ;
run;
Where it is easy enough to count how many times a particular event occurs for each EMPID.
proc freq data=tall ;
tables event*empid / noprint out=counts ;
run;
Which can then be used to get a count of how many EMPID values there are per event.
proc freq data=counts ;
tables event ;
run;
Result
I think this is what you want to do
data temp(drop = EV1 - EV6);
set Events;
array e EV:;
do over e;
ev = e;
ouptut;
end;
run;
proc sql;
create table want as
select EV
, count(distinct EmpID) as N
from temp
where ev ne ''
group by EV
;
quit;
Also, you can do this in a single pass with the hash object.
data _null_;
dcl hash h (ordered: "A");
h.definekey ("ev");
h.definedata ("ev", "N");
h.definedone ();
dcl hash u ();
u.definekey ("EmpID", "ev");
u.definedone ();
do until (z);
set Events end = z;
array e EV:;
do over e;
ev = e;
if h.find() ne 0 then call missing (N);
if u.check() ne 0 then do;
N = sum (N, 1);
u.add();
end;
h.replace();
end;
end;
h.output (dataset: "count_distinct");
stop;
run;
OPTION PS= 58 LS= 72;
Those options are for the listing destination. If you're using SAS EG, Viya, Studio or Foundation with HTML they're likely not doing what you expect.
What are you expecting from those options?
It will still impact how the SAS log is formatted.
The code you posted to re-create your dataset probably will not work. That is because it seems to have tab characters randomly inserted into it. (Yet another reason why you should not be putting tab characters into program files.).
Also in your dataset there are multiple observations for the same EMPID without any variable to distinguish them.
That will make it more difficult to work with the file (and makes one wonder what the multiple observations are supposed to represent).
That is easy enough to fix by adding a counter. Either one overall counter or on that resets to one when you start a new EMPID.
DATA Events;
length EmpID 8 EV1-EV6 $3 ;
input EmpID EV1-EV6 ;
row+1;
DATALINES;
1001 AB AT BTR S XYZ AT
1001 AT MY LOV . LOV LOV
1001 . MY . S MY S
1001 AB BTR LOV BTR AT AT
1001 AT LOV KUL MY . S
1002 MY LOV . AT AT BTR
1002 MY AT BTR . AT MY
1003 BTR MY . XYZ MY MY
1003 . MY MY MY MY BTR
1003 BTR AB MY . BTR .
1004 S . MY MY . XYZ
1004 XYZ AT KUL AB KUL S
1004 . . BTR LOV MY MY
1004 MY KUL AT KUL XYZ AB
;
Now you can use PROC TRANSPOSE to convert to a more useful structure.
proc transpose data=events out=tall(rename=(col1=event));
by row empid;
var ev1-ev6 ;
run;
Where it is easy enough to count how many times a particular event occurs for each EMPID.
proc freq data=tall ;
tables event*empid / noprint out=counts ;
run;
Which can then be used to get a count of how many EMPID values there are per event.
proc freq data=counts ;
tables event ;
run;
Result
I adjusted the code to prevent the Cumulative column and the percentage column from printing but somehow the rows AB, AT, MY get repeated?
I also want to prevent the row at the bottom from being printed "Frequency Missing=4" as well as the "The Freq procedure" title from being printed.
see the code used and see the output I am getting:
DATA Events;
length EmpID 8 EV1-EV6 $3 ;
INPUT EmpID EV1-EV6 ;
row+1;
DATALINES;
1001 AB AT BTR S XYZ AT
1001 AT MY LOV . LOV LOV
1001 . MY . S MY S
1001 AB BTR LOV BTR AT AT
1001 AT LOV KUL MY . S
1002 MY LOV . AT AT BTR
1002 MY AT BTR . AT MY
1003 BTR MY . XYZ MY MY
1003 . MY MY MY MY BTR
1003 BTR AB MY . BTR .
1004 S . MY MY . XYZ
1004 XYZ AT KUL AB KUL S
1004 . . BTR LOV MY MY
1004 MY KUL AT KUL XYZ AB
;
RUN;
/*Using PROC TRANSPOSE to convert to a more useful structure*/
proc transpose data=events out=tall(rename=(col1=Event));
by row empid;
var ev1-ev6 ;
run;
/*Formating to count how many times a particular event occurs for each EMPID*/
proc freq data=tall;
tables Event*empid / noprint out=counts;
run;
/*Formating to get a count of how many EMPID values there are per event*/
proc freq data=counts ;
tables event /NOCUM NOPERCENT;
Title 'Table B';
run;
Table B
The FREQ Procedure
Event | Frequency |
| |
AB | 3 |
AB | 1 |
AT | 3 |
AT | 1 |
BTR | 4 |
KUL | 2 |
LOV | 3 |
MY | 4 |
MY | 3 |
S | 2 |
XYZ | 3 |
Frequency Missing = 4
You could eliminate the missing values at many places in that code.
For example by adding a WHERE statement in the last step:
proc freq data=counts ;
where not missing(event);
tables event /NOCUM NOPERCENT;
Title 'Table B';
run;
@mcalde33 wrote:
how do I rename the "frequency" column to "N". I have tried the rename option but it gave me an error.
Do you mean the COUNT variable?
tables event / out=event_count(rename=(count=N));
Or do you want to just make your own print out, in which case you can use any column header you want.
proc freq data=counts;
tables event / noprint out=event_count;
run;
proc print data=event_count label ;
var event count;
label count='N' ;
run;
I got it to work thanks!
Now I noticed the observations dont match how the desired outcome should be:
what im getting:
Table B
Ev N
AB 3
AB 1
AT 3
AT 1
BTR 4
KUL 2
LOV 3
MY 4
MY 3
S 2
XYZ 3
What i should be getting:
Table B
Ev N
AB 3
AT 3
BTR 4
S 2
XYZ 3
MY 4
LOV 3
KUL 2
Here is the code I am using:
DATA Events;
length EmpID 8 EV1-EV6 $3 ;
INPUT EmpID EV1-EV6 ;
row+1;
DATALINES;
1001 AB AT BTR S XYZ AT
1001 AT MY LOV . LOV LOV
1001 . MY . S MY S
1001 AB BTR LOV BTR AT AT
1001 AT LOV KUL MY . S
1002 MY LOV . AT AT BTR
1002 MY AT BTR . AT MY
1003 BTR MY . XYZ MY MY
1003 . MY MY MY MY BTR
1003 BTR AB MY . BTR .
1004 S . MY MY . XYZ
1004 XYZ AT KUL AB KUL S
1004 . . BTR LOV MY MY
1004 MY KUL AT KUL XYZ AB
;
RUN;
/*Using PROC TRANSPOSE to convert to a more useful structure*/
proc transpose data=events out=tall(rename=(col1=Ev));
by row empid;
var ev1-ev6 ;
run;
/*Formating to count how many times a particular event occurs for each EMPID*/
proc freq data=tall;
tables Ev*empid / noprint out=counts;
RUN;
/*Formating to get a count of how many EMPID values there are per event*/
ods noproctitle;
proc freq data=counts;
where not missing(ev);
tables ev /noprint out=event_count NOCUM NOPERCENT;
Title 'Table B';
run;
proc print data=event_count label NOOBS ;
var ev count;
label count='N' ;
run;
DATA Events;
length EmpID 8 EV1-EV6 $3 ;
INPUT EmpID EV1-EV6 ;
row+1;
DATALINES;
1001 AB AT BTR S XYZ AT
1001 AT MY LOV . LOV LOV
1001 . MY . S MY S
1001 AB BTR LOV BTR AT AT
1001 AT LOV KUL MY . S
1002 MY LOV . AT AT BTR
1002 MY AT BTR . AT MY
1003 BTR MY . XYZ MY MY
1003 . MY MY MY MY BTR
1003 BTR AB MY . BTR .
1004 S . MY MY . XYZ
1004 XYZ AT KUL AB KUL S
1004 . . BTR LOV MY MY
1004 MY KUL AT KUL XYZ AB
;
RUN;
proc transpose data=events out=tall(rename=(col1=Ev));
by row empid;
var ev1-ev6 ;
run;
data tall;
set tall;
ev=trim(compress(ev, , 's')); *remove any extra spaces;
run;
/*Formating to count how many times a particular event occurs for each EMPID*/
proc freq data=tall;
tables Ev/ out=counts;
RUN;
You have two different "AB" values.
Probably related the TAB characters that messed up your original posting.
If you are using SAS to edit your program there should be options in the editor to replace tabs with spaces. I really recommend turning that option on so that it is much harder to accidentally get tab characters inserted into your file. For example in the SAS/Studio version available in SAS ODA here is what the preferences setting looks like:
But it might be other invisible characters.
Could just be leading spaces.
But could be other things like non-breaking spaces 'A0'x or Linefeed '0A'x or CarriageReturn '0D'x or really anthing that is not a normal letter or digit or punctuation symbol.
NOTE: The ODS output system HIDES leading spaces. You can make them more visible by using the $QUOTE. format with the variable. But even then it is hard to see spaces when looking at text displayed with a proportional font. If you can produce plain old text output (ODS LISTING is new terminology) then the leading spaces are much easier to see.
@mcalde33 wrote:
I adjusted the code to prevent the Cumulative column and the percentage column from printing but somehow the rows AB, AT, MY get repeated?
I also want to prevent the row at the bottom from being printed "Frequency Missing=4" as well as the "The Freq procedure" title from being printed.
see the code used and see the output I am getting:
DATA Events;
length EmpID 8 EV1-EV6 $3 ;
INPUT EmpID EV1-EV6 ;
row+1;
DATALINES;
1001 AB AT BTR S XYZ AT
1001 AT MY LOV . LOV LOV
1001 . MY . S MY S
1001 AB BTR LOV BTR AT AT
1001 AT LOV KUL MY . S
1002 MY LOV . AT AT BTR
1002 MY AT BTR . AT MY
1003 BTR MY . XYZ MY MY
1003 . MY MY MY MY BTR
1003 BTR AB MY . BTR .
1004 S . MY MY . XYZ
1004 XYZ AT KUL AB KUL S
1004 . . BTR LOV MY MY
1004 MY KUL AT KUL XYZ AB
;
RUN;
/*Using PROC TRANSPOSE to convert to a more useful structure*/
proc transpose data=events out=tall(rename=(col1=Event));
by row empid;
var ev1-ev6 ;
run;
/*Formating to count how many times a particular event occurs for each EMPID*/
proc freq data=tall;
tables Event*empid / noprint out=counts;
run;
/*Formating to get a count of how many EMPID values there are per event*/
proc freq data=counts ;
tables event /NOCUM NOPERCENT;
Title 'Table B';
run;
Table B
The FREQ Procedure
Event
Frequency
AB
3
AB
1
AT
3
AT
1
BTR
4
KUL
2
LOV
3
MY
4
MY
3
S
2
XYZ
3
Frequency Missing = 4
This is a symptom that you have either a leading space or more than one with some of the values. Proc Freq and most of the table generated output will left justify and drop the leading spaces.
See if this code shows the leading space(s);
proc tabulate data=counts ; class event ; classlev event / style=[asis=on]; Title 'Table B'; table event, n; run;
The CLASSLEV statement provides options on how row and column header variable values display. The ASIS=on preserves such things as leading spaces when present.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.