BookmarkSubscribeRSS Feed
robwork
Calcite | Level 5

I am trying to dynamically build an output file depending on if data is present in the input file.

For example I have the following input file:

     Name     DOB               Add1          Add2         Addr3

1     John     10/10/80          data           data

2     Alan      11/11/81          data

3     Paul      10/10/79          data

In the ouput file I would not want Addr3 to be included as there is no data, obviously this is easy to see in a small file but I have a file with 1000's of records how can I write a progam to check each varaiable and only include variables that have data against at least 1 observation in the new output file.


Thanks

14 REPLIES 14
Linlin
Lapis Lazuli | Level 10

you  can run the code below to drop the variables with all missing values:

data one;

infile datalines missover;

input name $ name1 name2 name3 name4 $;

datalines;

A . . . .

B . . . .

C 1 . . .

D 2 . . .

;

options missing= ' ';

proc transpose data=one out=two;

var _all_;

run;

options missing='.';

data _null_;

length drop $500;

if 0 then set one nobs=nvar;

do until (done);

  set two end=done;

  if cmiss(of col:) eq nvar then drop=catx(' ',drop,_name_);

end;

call symputx('drop',drop);

run;

data want;

set one (drop=&drop);

run;

proc print data=want;run;

Linlin

robwork
Calcite | Level 5

sorry little confused, do you have to transpose the data? an ideal ouput would be

Name     Name 1

A          .

B          .

C          1

D          2

Linlin
Lapis Lazuli | Level 10

Yes. the output from my code is

                          Obs    name    name1

                                1      A        .

                                2      B        .

                                3      C        1

                                4      D        2

The purpose of the code is to drop the variables that have no value. what SAS version do you use? I think 'cmiss' is only available for sas 9.2  and 9.3.

robwork
Calcite | Level 5

I am using WPS Workbench at the moment and unfortunately it does not seem to support all the latest functions

Haikuo
Onyx | Level 15

There must be better approaches, but it is already 10pm and I could barely open my eye lips. So here is my 'brute force' solution:

data have;

infile cards missover;

input   Name  $   DOB  :$10.    (Add1  Add2 Addr3) (:$);

cards;

     John     10/10/80          data           data

     Alan      11/11/81          data

     Paul      10/10/79          data

;

data _null_;

length vdrop $300.;

    set have nobs=nobs end=done;

    array  v (5) $ name--addr3;

    array m (5) m1-m5;

    do i=1 to 5;

      if missing(v(i)) then m(i)+1;

    end;

     if (done) then do i=1 to 5;

         if m(i)=nobs then do;

            vdrop=catx(' ',vdrop,vname(v(i)));

          end;

          end;

          call symput ('vdrop', vdrop);

      call execute ('data want; set have (drop=&vdrop);run;');

      run;

Regards,

Haikuo

Edit: You should go for Linlin's solution, since you only have 1k rows, unless you have more than 32,767 rows.

robwork
Calcite | Level 5

Hi Haikuo

Will try your solution now, having just checked my files I found some of them have over 200000 rows so your solution might be best.

Thanks, Robbie

robwork
Calcite | Level 5

Unfortunately and I can only assume that this is because I am using WPS workbench that output still has blank observations in.

                          Obs    Name      DOB       Add1    Add2

                             1                                    
                             2     John    10/10/80    data    data
                             3                                    
                             4     Alan    11/11/81    data       
                             5                                    
                             6     Paul    10/10/79    data       
                             7                                    

Ksharp
Super User

I have coded it before .

data have;
infile cards missover;
input   Name  $   DOB  :$10.    (Add1  Add2 Addr3) (:$);
cards;
     John     10/10/80          data           data
     Alan      11/11/81          data
     Paul      10/10/79          data
;
run;
proc sql noprint;
 select catx(' ','nmiss(',name,') as',name) into : list separated by ','
  from dictionary.columns 
   where libname='WORK' and memname='HAVE';
 select count(*) into : nobs from have;
 create table temp as
  select &list from have;
quit;
data _null_;
 set temp;
 length _list $ 4000;
 array _x{*} _numeric_;
 do i=1 to dim(_x);
  if _x{i} eq &nobs then _list=catx(' ',_list,vname(_x{i}));
 end;
 call symputx('drop',_list);
run;
data want;
 set have(drop=&drop);
run;


Ksharp

Haikuo
Onyx | Level 15

Hooray! I knew that we deserve to see something of this quality.

Two things learned:

1. nmiss() can be applied on character variables.

2. nmiss() can be used as 'summary functions' in SQL.

One question: Please explain the construct of nmiss(',name,'). I know this is the only way that works, but how and why?

Thank you, Ksharp!

Haikuo

Ksharp
Super User

HaiKuo,

I just make a series of string like:

nmiss(sex) as sex ,nmiss(name) as name,nmiss(weight) as weight  ..........

and feed them to SELECT statement in SQL to get the number of missing value for all the variables.

Ksharp

Haikuo
Onyx | Level 15

Ksharp,

Thanks for the answer. I was just wondering why it has to be like:  nmiss(',name,') instead of nmiss(name) or nmiss(,name,) or nmiss(,'name',) etc. I know this is the only way that works, but I can't figure it out how this construct be translated to nmiss(sex) as sex ,nmiss(name) as name,nmiss(weight) as weight ....

Any reference link will also be appreciated.

Haikuo

Linlin
Lapis Lazuli | Level 10

Is this helpful?

data have;

infile cards missover;

input   Name  $   DOB  :$10.    (Add1  Add2 Addr3) (:$);

cards;

     John     10/10/80          data           data

     Alan      11/11/81          data

     Paul      10/10/79          data

;

run;

proc sql noprint;

select catx(' ','nmiss(',name,') as',cat('n_missing_',name)) into : list separated by ','

  from dictionary.columns

   where libname='WORK' and memname='HAVE';

quit;

%put &list;

105  %put &list;

nmiss( Name ) as n_missing_Name,nmiss( DOB ) as n_missing_DOB,nmiss( Add1 ) as

n_missing_Add1,nmiss( Add2 ) as n_missing_Add2,nmiss( Addr3 ) as n_missing_Addr3

Haikuo
Onyx | Level 15

Thanks, LinLin. I know my brain stops working on Friday , so bear with me:

I am aware of the final outcome of &list, I am just could not figure out why nmiss(name) not working, while you have to use nmiss(',name,')?

Thanks,

Haikuo

Linlin
Lapis Lazuli | Level 10

Hi Haikuo,

catx(' ','nmiss(',name,')as ' ) will create a string "nmiss(variable-name)".

nmiss(variable-name) will create a number.

maybe you treat "'nmiss(',name,') as' "

"'nmiss(',name,') as'"

It should be

"'nmiss(',name,') as'"

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 2307 views
  • 0 likes
  • 4 in conversation