BookmarkSubscribeRSS Feed
Reeza
Super User

Can you verify that the above variables are numeric not character. Proc Means will just ignore nulls/missing.

andy_wk
Calcite | Level 5

Reeza,

Some of the characters are alpha numeric. That is the reason.

Reeza
Super User

So your actual criteria is sum of numeric variables is 0 and if character variable the number of missing? You can't add character values...

andy_wk
Calcite | Level 5

I'm realizing that , Reeza.

I've tried to do a dos2unix on my csv files to see if it can be fix.

Tom
Super User Tom
Super User

First, if you have 6000 NUMERIC variables and you want to find the SUM of each over all observations you should use PROC SUMMARY to find the sums and then query those results to get the list of variables that are all zero or any other statistic that you want PROC SUMMARY to calculate.  Search this site for examples of code drop variables that are all missing for example.


Second , if you have the data in a CSV file then do NOT use PROC IMPORT to read it because it will try to guess whether the variables are numeric or character. It will frequently make some variable with missing values character instead of numeric.  If you know the structure then just read the file using a data step so that you have control over the variable types assigned.  So assuming that the first column is a date you could read the file with a data step like this.

data mydata ;

   infile 'mydata.csv' dsd firstobs=2 lrecl=30000 truncover termstr=CRLF ;

   informat date date9. ;

   input date col1-col6000 ;

run;


Since you are going to drop the empty ones anyway there is not much harm if you are unsure if the file has 5000 or 6000 columns, just read enough and then drop the empty ones. Or read one line and count how many variables there are.


andy_wk
Calcite | Level 5

Hi Tom,

I'm using a proc import. Is there an option to deal with empty columns like with infile?

Thanks.

jwillis
Quartz | Level 8

Andy,

The MONOTONIC() function is not supported in PROC SQL.  Using the

MONOTONIC() function in PROC SQL can cause missing or non-sequential

values to be returned.

http://support.sas.com/kb/15/138.html

andy_wk
Calcite | Level 5

Hi JWillis,

Fair enough. But in my case , it is working quite well as i'm not asking the function to do amazing things ,

just list the number of the observations, which will be like a rowid.

smiller933
Calcite | Level 5


andy_wk,

Here is a solution to your originally proposed problem. I came up with proc transpose, which was proposed earlier by LinusH. You mentioned that your original data has around 6000 values, but when transposing only the summarized data, the proc transpose operations take a trivial amount of time. Whatever you do to summarize the data will require way more resources and execution time.

It sounds like you want to do something else with the character values that are missing. If you could state your full requirements on that, maybe someone could come up with a solution for that as well.


Scott

data test_data;

   length num1 num2 num3 8 char1 char2 char3 $1;

   input num1 num2 num3 char1 char2 char3 @@;

   *  Just to prove that the transpose will perform just fine even when you have 6000 vars  *;

   retain num4-num6000 0;

datalines;

0 0 0 a b c 1 0 2 x y z 1 0 3 m m m

run;

proc summary data=test_data;

   var _numeric_;

   output out=temp1 (drop=_TYPE_ _FREQ_) sum=;

run;

proc transpose data=temp1 out=temp2 (where=(col1=0));

run;

proc transpose data=temp2 out=final (drop=_NAME_);

run;

stat_sas
Ammonite | Level 13

data have;
input col1 col2 col3;
datalines;
0   0   0
1   0   2
1   0   3
;
proc contents data=have out=new(where=(type=1));
run;

proc sql;
select name into :var_list separated by ' ' from new;
quit;

ods output summary=summary;
proc means data=have stackods sum;
var &var_list;
run;

proc sql;
select variable into :required separated by ' ' from summary
where sum=0;
quit;

%put &required;

marty_ca
Fluorite | Level 6

Hi, Here is a SAS macros solution that does not need a proc transpose.

*** Find all the column variables where all records are zero;

data data_file;

     input col1 col2 col3;

     cards;

     0 0 0

     1 0 2

     1 0 3

     ;

%macro find_zeros(input_data=);

*** Find all the column variables in the input dataset;

proc contents data = &input_data;

     noprint out = _varlst_ (keep = name);

*** Convert all your column variables to macro variables;

proc sql noprint;

     select count (*)

     into :incrn

     from _varlst_;

     select name

     into :varn1-:varn%left(&incrn)

     from _varlst_;

     quit;

*** Create a cumulative sum for each column variable.  The last

     observation in this dataset will contain the total sum.  A zero will

     indicate that all records for that column variable are zero.;


data cumulative_sum;

     input &input_data;

    

     %do k=1 %to &incrn;


          %let check_zero_&&varn&k=0;


          check_zero_&&varn&k + &&varn&k;


          retain check_zero_&&varn&k;


     %end;


*** Now list all your variables and those with a zero entry are the ones you want;


data zero_variables_present(drop=%do y=1 %to &incrn;


                                                                 &&varn&y


                                                            %end;

                                                  );

     set cumulative_sum end=last;


     %do m=1 %to &incrn;


          if last and check_zero_&&varn&m = 0 then output;


     %end;


*** Reduce the list to only those variables that are zero;


data keep_only_zero_variables(%do k=1 %to &incrn;


                                                            keep=


                                                                 %if check_zero_&&varn&k = 0 %then %do;


                                                                      check_zero_&&varn&k


                                                                 %end;


                                                           %end;

                                                            );

     set zero_variables_present;


      run;quit;


%mend;


%find_zeros(input_data=data_file);


The dataset,  "zero_variables_present" should give the result


Obs     check_zero_col1     check_zero_col2     check_zero_col3

1                                     2                                 0                                5


With the dataset, "keep_only_zero_variables" giving


Obs     check_zero_col2    

1                                        0


Which is the desired result.


sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 25 replies
  • 4763 views
  • 3 likes
  • 12 in conversation