BookmarkSubscribeRSS Feed
robwork
Calcite | Level 5

I am trying to dynamically build an output file depending on if data is present in the input file.

For example I have the following input file:

     Name     DOB               Add1          Add2         Addr3

1     John     10/10/80          data           data

2     Alan      11/11/81          data

3     Paul      10/10/79          data

In the ouput file I would not want Addr3 to be included as there is no data, obviously this is easy to see in a small file but I have a file with 1000's of records how can I write a progam to check each varaiable and only include variables that have data against at least 1 observation in the new output file.


Thanks

14 REPLIES 14
Linlin
Lapis Lazuli | Level 10

you  can run the code below to drop the variables with all missing values:

data one;

infile datalines missover;

input name $ name1 name2 name3 name4 $;

datalines;

A . . . .

B . . . .

C 1 . . .

D 2 . . .

;

options missing= ' ';

proc transpose data=one out=two;

var _all_;

run;

options missing='.';

data _null_;

length drop $500;

if 0 then set one nobs=nvar;

do until (done);

  set two end=done;

  if cmiss(of col:) eq nvar then drop=catx(' ',drop,_name_);

end;

call symputx('drop',drop);

run;

data want;

set one (drop=&drop);

run;

proc print data=want;run;

Linlin

robwork
Calcite | Level 5

sorry little confused, do you have to transpose the data? an ideal ouput would be

Name     Name 1

A          .

B          .

C          1

D          2

Linlin
Lapis Lazuli | Level 10

Yes. the output from my code is

                          Obs    name    name1

                                1      A        .

                                2      B        .

                                3      C        1

                                4      D        2

The purpose of the code is to drop the variables that have no value. what SAS version do you use? I think 'cmiss' is only available for sas 9.2  and 9.3.

robwork
Calcite | Level 5

I am using WPS Workbench at the moment and unfortunately it does not seem to support all the latest functions

Haikuo
Onyx | Level 15

There must be better approaches, but it is already 10pm and I could barely open my eye lips. So here is my 'brute force' solution:

data have;

infile cards missover;

input   Name  $   DOB  :$10.    (Add1  Add2 Addr3) (:$);

cards;

     John     10/10/80          data           data

     Alan      11/11/81          data

     Paul      10/10/79          data

;

data _null_;

length vdrop $300.;

    set have nobs=nobs end=done;

    array  v (5) $ name--addr3;

    array m (5) m1-m5;

    do i=1 to 5;

      if missing(v(i)) then m(i)+1;

    end;

     if (done) then do i=1 to 5;

         if m(i)=nobs then do;

            vdrop=catx(' ',vdrop,vname(v(i)));

          end;

          end;

          call symput ('vdrop', vdrop);

      call execute ('data want; set have (drop=&vdrop);run;');

      run;

Regards,

Haikuo

Edit: You should go for Linlin's solution, since you only have 1k rows, unless you have more than 32,767 rows.

robwork
Calcite | Level 5

Hi Haikuo

Will try your solution now, having just checked my files I found some of them have over 200000 rows so your solution might be best.

Thanks, Robbie

robwork
Calcite | Level 5

Unfortunately and I can only assume that this is because I am using WPS workbench that output still has blank observations in.

                          Obs    Name      DOB       Add1    Add2

                             1                                    
                             2     John    10/10/80    data    data
                             3                                    
                             4     Alan    11/11/81    data       
                             5                                    
                             6     Paul    10/10/79    data       
                             7                                    

Ksharp
Super User

I have coded it before .

data have;
infile cards missover;
input   Name  $   DOB  :$10.    (Add1  Add2 Addr3) (:$);
cards;
     John     10/10/80          data           data
     Alan      11/11/81          data
     Paul      10/10/79          data
;
run;
proc sql noprint;
 select catx(' ','nmiss(',name,') as',name) into : list separated by ','
  from dictionary.columns 
   where libname='WORK' and memname='HAVE';
 select count(*) into : nobs from have;
 create table temp as
  select &list from have;
quit;
data _null_;
 set temp;
 length _list $ 4000;
 array _x{*} _numeric_;
 do i=1 to dim(_x);
  if _x{i} eq &nobs then _list=catx(' ',_list,vname(_x{i}));
 end;
 call symputx('drop',_list);
run;
data want;
 set have(drop=&drop);
run;


Ksharp

Haikuo
Onyx | Level 15

Hooray! I knew that we deserve to see something of this quality.

Two things learned:

1. nmiss() can be applied on character variables.

2. nmiss() can be used as 'summary functions' in SQL.

One question: Please explain the construct of nmiss(',name,'). I know this is the only way that works, but how and why?

Thank you, Ksharp!

Haikuo

Ksharp
Super User

HaiKuo,

I just make a series of string like:

nmiss(sex) as sex ,nmiss(name) as name,nmiss(weight) as weight  ..........

and feed them to SELECT statement in SQL to get the number of missing value for all the variables.

Ksharp

Haikuo
Onyx | Level 15

Ksharp,

Thanks for the answer. I was just wondering why it has to be like:  nmiss(',name,') instead of nmiss(name) or nmiss(,name,) or nmiss(,'name',) etc. I know this is the only way that works, but I can't figure it out how this construct be translated to nmiss(sex) as sex ,nmiss(name) as name,nmiss(weight) as weight ....

Any reference link will also be appreciated.

Haikuo

Linlin
Lapis Lazuli | Level 10

Is this helpful?

data have;

infile cards missover;

input   Name  $   DOB  :$10.    (Add1  Add2 Addr3) (:$);

cards;

     John     10/10/80          data           data

     Alan      11/11/81          data

     Paul      10/10/79          data

;

run;

proc sql noprint;

select catx(' ','nmiss(',name,') as',cat('n_missing_',name)) into : list separated by ','

  from dictionary.columns

   where libname='WORK' and memname='HAVE';

quit;

%put &list;

105  %put &list;

nmiss( Name ) as n_missing_Name,nmiss( DOB ) as n_missing_DOB,nmiss( Add1 ) as

n_missing_Add1,nmiss( Add2 ) as n_missing_Add2,nmiss( Addr3 ) as n_missing_Addr3

Haikuo
Onyx | Level 15

Thanks, LinLin. I know my brain stops working on Friday , so bear with me:

I am aware of the final outcome of &list, I am just could not figure out why nmiss(name) not working, while you have to use nmiss(',name,')?

Thanks,

Haikuo

Linlin
Lapis Lazuli | Level 10

Hi Haikuo,

catx(' ','nmiss(',name,')as ' ) will create a string "nmiss(variable-name)".

nmiss(variable-name) will create a number.

maybe you treat "'nmiss(',name,') as' "

"'nmiss(',name,') as'"

It should be

"'nmiss(',name,') as'"

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1534 views
  • 0 likes
  • 4 in conversation