BookmarkSubscribeRSS Feed
Uknown_user
Quartz | Level 8

It looks like it will be the most efficient way how to run this report. Thanks for your suggestion.

Rick_SAS
SAS Super FREQ

> Thanks for your contribution to this post. Well yes, the default behaviour is to print a html file, however, if this exceeds limit, it simply pups up a window (when the script finishes execution) that asks if I wish to display the results or not, by clicking no, it displays nothing. Unfortunatelly, the html issue is not my case.

 

 

I won't debate with you. However, you might ask yourself "How does SAS know that the HTML file exceeds the limit?"  Maybe the HTML file is written to disk.....

 

Good luck!

Uknown_user
Quartz | Level 8

Well, that is true. I tried to set noprint opt but afterwards found out that in that case I am unable to reproduce the output which is obtained with ods table onewayfreqs=...

Patrick
Opal | Level 21

@Uknown_user

I believe what really would help to take this lengthy discussion forward and come up with a solution for you:

1. Create a representative sample data set (with just 2-3 variables) provided as a tested and working SAS data step creating this sample data.

2. Explain us the desired result based on this sample data set (show us what you want to get)

3. Let us know the actual volumes of your data source (you've done that already) and the source system (I believe it's SAS tables).

 

Try to spend sufficient time to come up with a representative data sample which contains ideally all the challenges of your real data (like ties).

Tom
Super User Tom
Super User

Sounds like you need a method to first weed out the non-categorical variables.

You could follow these steps.

1) Find out how many observations are in the dataset. For a SAS datsaet that is available in the metadata so no need to actually read the whole dataset.

2) If that is too large then you will probably want to take a random sample of the records in the original file and analyze that instead of the whole file.

3) You could use the NLEVELS option on PROC FREQ or COUNT(distinct XXX) in PROC SQL to find the number of distinct values for each variable.

4) Eliminate from your hunt for the TOP 5 any variable that has too many distinct values relative to the number of observations.

Uknown_user
Quartz | Level 8

I slightly changed build-in macro so now the code is capable of returning output even for biggest markets. The problem with insufficient memory was solved. As one of the contributors suggested, the problem was likely caused by writing huge html files into memory. Here is the amended code:

 

data loop;
do i = 1 to 1000;
output;
end;
run;

data demonstrate;
set loop;
_a = ' ';
_aa = .;
if mod(i,2) = 0 then _b = 'value1';
else _b = 'value2';
_x = floorz(abs(rand("NORMAL")*2))+1;
if _x > length(_b) then _x = length(_b) - 1;
if _x = 0 then _x = 1;
_y = floorz(abs(rand("NORMAL")*2))+1;
if _y + _x > length(_b) then _y = 1;
_c = substr(_b,_y,_x);
d = cats(of _:);
run;

%let file = demonstrate;

%_eg_conditional_dropds(WORK.FREQ&file,
                        WORK.TTAFTempTableAccumFreq,
                        WORK.TCONTempTableContents,);

DATA WORK.FREQ&file(LABEL="Frequency Counts for WORK.&file");  
    LENGTH DataSet $ 41 Variable $32 Label $ 256 Format $ 31 Value $ 32 Count Percent 8;  
    LABEL Count='Frequency Count' Percent='Percent of Total Frequency';  
    RETAIN DataSet Variable Label Format Value ' ' Count Percent 0;  
    STOP;  
RUN;

%global dataset obs;

%MACRO _EG_CHARACT(data, lib, dsn, catobs);  

DATA WORK.TTAFTempTableAccumFreq;  
    LENGTH DataSet $ 41 Variable $32 Label $ 256 Format $ 31 Value $ 32 Count Percent 8;  
    LABEL Count='Frequency Count' Percent='Percent of Total Frequency';  
    RETAIN DataSet Variable Label Format Value ' ' Count Percent 0;  
    STOP;  
RUN;

PROC CONTENTS
    DATA=&data.
    OUT=WORK.TCONTempTableContents  
    NOPRINT;
RUN;  

DATA _NULL_;
    CALL SYMPUT('numobs',PUT(numobs, 12.));
    STOP;
    SET WORK.TCONTempTableContents NOBS=numobs;
RUN;

%DO i=1 %to &numobs.;

DATA _NULL_;
POINTER=&i.;
SET WORK.TCONTempTableContents point=pointer;
CALL SYMPUT('var', QUOTE(name));
CALL SYMPUT('var_n', QUOTE(name) || "n");
CALL SYMPUT('type', PUT(type, 1.));
CALL SYMPUT('label', label);
CALL SYMPUT('format', format);
STOP;
RUN;

%LET charVarsFlag = 1;

PROC FREQ DATA=&data. NOPRINT;
    TABLES &var_n./MISSING OUT=WORK.TPFRTempTableFrequencies2;
RUN;

DATA WORK.TTAFTempTableAccumFreq;
DROP InVar;
LENGTH Value $ 32;
SET WORK.TTAFTempTableAccumFreq
    WORK.TPFRTempTableFrequencies2(IN=intemp RENAME=(&var_n.=InVar));
IF intemp = 1 THEN DO;
    Value=InVar;
    Variable=&var.;
    Label="%nrbquote(&label.)";
    DataSet="&lib..&dsn.";
    Format="&FORMAT.";
END;
RUN;

%END;

%IF &charVarsFlag = 1 %THEN
  %DO;

PROC SORT DATA=WORK.TTAFTempTableAccumFreq;
    WHERE dataset NE ' ';
    BY variable label descending count;
RUN;

DATA WORK.TTAFTempTableAccumFreq;
DROP i newcount newperc;
RETAIN i newcount newperc 0;
SET WORK.TTAFTempTableAccumFreq;
BY variable;
IF value=' ' THEN
    value='***Missing***';
%IF %EVAL(&catobs.) NE -1 %THEN
    %DO;
    IF FIRST.variable = 1 THEN
        i=1;
    ELSE
        i=i+1;
    IF i > %EVAL(&catobs.) THEN DO;
        newcount=newcount+count;
        newperc=newperc+percent;
    END;
    IF i > %EVAL(&catobs.) AND LAST.variable = 0 THEN
        DELETE;
    IF LAST.variable & i > %EVAL(&catobs.) THEN DO;
        value='***All other values***';
        count=newcount;
        percent=newperc;
        newcount=0;
        newperc=0;
    END;
    %END;
RUN;
%END;

%IF &charVarsFlag = 1 %THEN
%DO;
PROC APPEND BASE=WORK.FREQ&file DATA=WORK.TTAFTempTableAccumFreq FORCE;
RUN;
%END;

%MEND _EG_CHARACT;

%_EG_CHARACT(WORK.&file, WORK, &file, 5);

RUN;
QUIT;

%_eg_conditional_dropds(WORK.TTAFTempTableAccumFreq,
                        WORK.TCONTempTableContents,
                        WORK.TPFRTempTableFrequencies2);
TITLE; FOOTNOTE;

proc sort data=freq&file;
by variable;
run;

data variables_pf;
set
freq&file (keep=Variable);
run;

data variables;
set variables_pf;
by variable;
if first.variable then output;
run;

proc transpose data=freq&file(keep=variable value count) out=nums prefix=obs;
by variable;
run;

proc transpose data=freq&file (keep=variable value count) out=chars prefix=var;
by variable;
var value;
run;

data merged_trans (drop= _name_ _label_);
merge
variables (in=a)
nums (in=b)
chars (in=c);
by variable;
if a;
run;

 

At the moment, there still miss total number of complete observations and total number of missing observations. Is there any way how to add these easily (I found this post: http://support.sas.com/kb/44/124.html but that is another macro that also goes proc freq one by one so perhaps I could just add something in the current charact macro)? Thanks!

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 35 replies
  • 2093 views
  • 10 likes
  • 7 in conversation