BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
wlierman
Lapis Lazuli | Level 10

I am continuing to get a method to count missing and non-missing obs for columns. I am using some of the code shared earlier which still has a couple issues.

 

In the data set:

Here is what the columns look like in the actual data set  (these are or would be numeric values -- age) DEYEage     DCOMage  DREMage  DEARage  DDDRSage  DOUTage  DLEAage  DLIMage    DMHDage  DPHYage  AgAcq1st

         .                .                    .              .                  .                    .                 .                 .              .                   .               .

         .                .                    .              .                   .                   .                 .                  .             .                     .

 

There are over 181K obs and age is probably missing for 99% of the obs (I'm not sure since that is what I want to determine)

After going this far in the code given earlier

Data OPERA.OPERA_ALL_2;
        call streaminit(45);
		length obs 8.;
		array vars(*) $3. DEYEage DCOMage DREMage DEARage DDRSage DOUTage DLEAage DLIMage DMHDage DPHYage AgAcq1st;
		                 /* DLEAdi DMHDdi DCOMdi DEARdi DEYEdi DDRSdi DOUTdi DREMdi DPHYdi */

		do obs= 1 to 181308;

		         do i=1 to dim(vars);
				         vars(i)= put(rand('table', 0.4, 0.4, 0.2), Z3.);
						 if vars(i) = '003' then call missing(vars(i));
						 
				 end;
				 output;
		end;
		drop i;
run;                    

*set input data set  name;
%let INPUT_DSN = OPERA.OPERA_DIS_AGE;
%let OUTPUT_DSN = OPERA.OPERA_All_5;
* create format for missing;

proc format;
  value $ missfmt ' ' = "Missing"  other = "Not_Missing_char";
  value nmissfmt . ="Missing" other="Not_Missing_num";
run;

* proc freq to count missing / non-missing;
ods select all;
* turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
    table _all_ / missing;
	format _numeric_ nmissfmt. _character_ $missfmt.;
run;

ods select all;

  the results of the proc freq (showing just one of the vars to reduce clutter)

 

DREMage                                         Cumulative               Cumulative
                                      Freq      Percent Freq              Percent 
                      Missing        36332      20.04         36332       20.04 
                  Not_Missing       144976      79.96        181308      100.00 

The problem is there are not 144976 non-missing values.  There are hardly any age responses coded from the survey.

 

These are also numeric values (responses are age).

 

I ran the same code given above just before I ran for the age. The only difference was the vars listed were

the character names for thew disabilities.

 

        /* DLEAdi DMHDdi DCOMdi DEARdi DEYEdi DDRSdi DOUTdi DREMdi DPHYdi */

 

The responses were character values either No  or Yes.   The table shown above is actually giving the percentages for 

the character value No and Yes counts not the numeric value for actual ages where there are ages given

 

The only code that varies is the vars listing.  Why doesn't the proc freq give the correct corresponding counts between the character (disability names) No/Yes counts and the much more sparse numeric (age) counts for the data set.

 

Thanks.

 

wlierman

    

 

            

1 ACCEPTED SOLUTION

Accepted Solutions
Sajid01
Meteorite | Level 14

Hello
Considering the variable as missing and non missing, this is also an approach.

/* Creating a format */
proc format;
  value $ missfmt ' ' = "Missing"  other = "Not_Missing_char";
  value nmissfmt . ="Missing" other="Not_Missing_num";
run;
/* Creating data */
data class;
Format name sex $missfmt.;
Format Age Height Weight nmissfmt.;
set sashelp.class;
if mod(_n_,3)=0 then name=" ";
if mod(_n_,4)=0 then Age=.;
if mod(_n_,5)=0 then Height=.;
if mod(_n_,5)=0 then Weight=.;
if mod(_n_,6)=0 then Sex=" ";
run;
/* counting missing and non missing values*/
proc freq data=class;
Tables _ALL_/ list nopercent nocum missing;
run;

The output will be like this

Screenshot 2021-05-11 7.39.59 PM.png Screenshot 2021-05-11 7.40.53 PM.png

 

View solution in original post

24 REPLIES 24
Reeza
Super User
Can you some data that reflects this situation as your example data here does not illustrate the issue.
ballardw
Super User

@wlierman wrote:

I am continuing to get a method to count missing and non-missing obs for columns. I am using some of the code shared earlier which still has a couple issues.

 

In the data set:

Here is what the columns look like in the actual data set  (these are or would be numeric values -- age) DEYEage     DCOMage  DREMage  DEARage  DDDRSage  DOUTage  DLEAage  DLIMage    DMHDage  DPHYage  AgAcq1st

         .                .                    .              .                  .                    .                 .                 .              .                   .               .

         .                .                    .              .                   .                   .                 .                  .             .                     .

 

There are over 181K obs and age is probably missing for 99% of the obs (I'm not sure since that is what I want to determine)

 

 

These are also numeric values (responses are age).

 

I ran the same code given above just before I ran for the age. The only difference was the vars listed were

the character names for thew disabilities.

 

        /* DLEAdi DMHDdi DCOMdi DEARdi DEYEdi DDRSdi DOUTdi DREMdi DPHYdi */

 

The responses were character values either No  or Yes.   The table shown above is actually giving the percentages for 

the character value No and Yes counts not the numeric value for actual ages where there are ages given

 

The only code that varies is the vars listing.  Why doesn't the proc freq give the correct corresponding counts between the character (disability names) No/Yes counts and the much more sparse numeric (age) counts for the data set.

 

Thanks.

 

wlierman

             


We would need some actual data to tell why a specific code does not show a specific variable count or not.

 

What do you get from proc freq for the variable DREMage without any formats involved? Yes it may be long but perhaps you should LOOK at it.

 

From what I can tell you are creating a data set OPERA.OPERA_ALL_2 and then running proc freq with data=OPERA.OPERA_DIS_AGE. So we have no idea what might be in Opera_dis_age

wlierman
Lapis Lazuli | Level 10

Hello,

 

I have attached a small excel file of the ranges of vars that I am seeking counts for.  The first section are the var names and the No/Yes responses.. I need counts of # of responses and then the # of blank obs. (It would be nice to have # of No, # of Yes , and # of blanks or missing.)  But even # of non-missing and # of missing would be very good.

 

The last section of vars are the var ages.  These are the numeric values.  As I mentioned in the notes these are numeric but they are very very sparse as you can see from these few obs which I sent.  So I need the # of non-missing and # of missing for these.

 

You've supplied great information. I likely lost much in translation.  I appreciate your assistance.

 

Thanks.

 

wlierman

Reeza
Super User
Remove the FORMAT statement from PROC FREQ and see if that gives you what you want.

format _numeric_ nmissfmt. _character_ $missfmt.;
wlierman
Lapis Lazuli | Level 10

Commenting out the Format line did not provide the solution.

 

I will follow the steps to paste in the code and all for the problem.

 

Thank you.

 

wlierman

ballardw
Super User

Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.

 

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

wlierman
Lapis Lazuli | Level 10

I will go through these steps.

Thank you.

wlierman

wlierman
Lapis Lazuli | Level 10

I used the code included in your message on how to generate sample data for the Ask the Expert forum and receiuved the following log:

 %let datasetname = OPERA.AKA_data_forum;
43   * number of obs you want to keep *;
44   %let obsKeep = 100;
45   %let source_path =
45 ! https://gist.githubusercontent.com/statgeek/bcc55940dd825a13b9c8ca40a904cba9/
46   raw/865d2cf18f5150b8e887218dde0fc3951d0ff15b/data2datastep.sas;
47
48
49   filename reprex url "&source_path";
50   %include reprex;
ERROR: The connection has timed out..
ERROR: Cannot open %INCLUDE file REPREX.
51   filename reprex;
NOTE: Fileref REPREX has been deassigned.
52
53   option linesize=max;
54   %data2datastep(dsn=&dataSetName, obs=&obsKeep);
     -
     180
WARNING: Apparent invocation of macro DATA2DATASTEP not resolved.

ERROR 180-322: Statement is not valid or it is used out of proper order.

What is the next step?

 

Thanks.

 

wlierman

Reeza
Super User
Go to the URL, download the code and run it instead of using the automatic download. It seems like your company blocks external access to Github or files.
wlierman
Lapis Lazuli | Level 10

I downloaded the code (very long), set up the little data set of 100 obs to send over and nothing in the log.

 

Like the macro didn't instantiate.  This is the code that I ran and the log is down below

Proc Sql noprint;
CREATE Table OPERA.AKA_data_forum AS
  SELECT
        DLEAdi, DMHDdi, DCOMdi, DEARdi, DEYEdi, DDRSdi, DOUTdi, DREMdi, DPHYdi, DLIMdi, DISdi,
        DEYEage, DCOMage, DREMage, DEARage, DDRSage, DOUTage, DLEAage, DLIMage, DMHDage, DPHYage, AgAcq1st
  FROM OPERA.OPERA_all_2;
quit;

***** This is the section from SAS Experts___Reeza  ***;

* Data set you want to create demo data for *;
%let datasetname = OPERA.AKA_data_forum;
* number of obs you want to keep *;
%let obsKeep = 50;

*** have to download the code from the url to run ***;
*  %let source_path = https://gist.githubusercontent.com/statgeek/bcc55940dd825a13b9c8ca40a904cba9/
raw/865d2cf18f5150b8e887218dde0fc3951d0ff15b/data2datastep.sas;

%macro data2datastep(dsn=,lib=,outlib=,file=,obs=,fmt=,lbl=);
%local varlist fmtlist inputlist msgtype ;

%if %superq(obs)= %then %let obs=MAX;

/*added by FKhurshed to set linesize to max. 
This allows the macro to output the datalines correctly to the 
log and not run over lines. Current value is stored here to reset
at the end*/
%let LS= %sysfunc(getoption(LS));
option ls=max;

%let msgtype=NOTE;
%if %superq(dsn)= %then %do;
   %let msgtype=ERROR;
   %put &msgtype: You must specify a data set name;
   %put;
   %goto syntax;
%end;
%let dsn=%qupcase(%superq(dsn));
%if %superq(dsn)=!HELP %then %do;
%syntax:
   data _null_;
      call symput ('LS',getoption('LS','startupvalue'));
   run;
   options ls=100;
   %put &msgtype: &SYSMACRONAME macro help document:;
   %put &msgtype- Purpose: Converts a data set to a SAS DATA step.;
   %put &msgtype- Syntax: %nrstr(%%)&SYSMACRONAME(dsn<,lib,outlib,file,obs,fmt,lbl>);
   %put &msgtype- dsn:    Name of the dataset to be converted. Required.;
   %put &msgtype- lib:    LIBREF of the original dataset. (Optional - if DSN is not fully qualified);
   %put &msgtype- outlib: LIBREF for the output dataset. (Optional - default is WORK);
   %put &msgtype- file:   Fully qualified filename for the DATA step code produced. (Optional);
   %put &msgtype-         Default is %nrstr(create_&outlib._&dsn._data.sas) in the SAS default directory.;
   %put &msgtype- obs:    Max observations to include the created dataset.;
   %put &msgtype-         (Optional) Default is MAX (all observations);
   %put &msgtype- fmt:    Format the numeric variables in the output dataset like the original data set? ;
   %put &msgtype-         (YES|NO - Optional) Default is YES;
   %put &msgtype- lbl:    Reproduce column labels in the output dataset? ;
   %put &msgtype-         (YES|NO - Optional) Default is YES;
   %put;
   %put NOTE:   &SYSMACRONAME cannot be used in-line - it generates code.;
   %put NOTE-   Every FORMAT in the original data must have a corresponding INFORMAT of the same name.;
   %put NOTE-   Data set label is automatically re-created.;
   %put NOTE-   Only numeric column formats can be re-created, character column formats are ingnored.;
   %put NOTE-   Use !HELP to print these notes.;
   options ls=&ls;
   %return;
%end; 
%if %superq(fmt)= %then %let fmt=YES;
%let fmt=%qupcase(&fmt);
%if %superq(lbl)= %then %let lbl=YES;
%let lbl=%qupcase(&lbl);

%if %superq(lib)= %then %do;
    %let lib=%qscan(%superq(dsn),1,.);
    %if %superq(lib) = %superq(dsn) %then %let lib=WORK;
    %else %let dsn=%qscan(&dsn,2,.);
%end;
%if %superq(outlib)= %then %let outlib=WORK;
%let lib=%qupcase(%superq(lib));
%let dsn=%qupcase(%superq(dsn));

%if %sysfunc(exist(&lib..&dsn)) ne 1 %then %do;
   %put ERROR: (&SYSMACRONAME) - Dataset &lib..&dsn does not exist.;
   %let msgtype=NOTE;
   %GoTo syntax;
%end;

%if %superq(file)= %then %do;
   %let file=create_&outlib._&dsn._data.sas;
   %if %symexist(USERDIR) %then %let file=&userdir/&file;
%end;

%if %symexist(USERDIR) %then %do;
   %if %qscan(%superq(file),-1,/\)=%superq(file) %then
      %let file=&userdir/&file;
%end;

proc sql noprint;
select Name
      into :varlist separated by ' '
   from dictionary.columns
   where libname="&lib"
     and memname="&dsn"
;
select case type
          when 'num' then 
             case 
                when missing(format) then cats(Name,':32.')
                else cats(Name,':',format)
             end 
          else cats(Name,':$',length,'.')
       end
      into :inputlist separated by ' '
   from dictionary.columns
   where libname="&lib"
     and memname="&dsn"
;
%if %qsubstr(%superq(lbl),1,1)=Y %then %do;
select strip(catx('=',Name,put(label,$quote.)))
   into : lbllist separated by ' '
   from dictionary.columns 
   where libname="&lib"
     and memname="&dsn"
     and label is not null 
;
%end;
%else %let lbllist=;
select memlabel 
   into :memlabel trimmed
   from dictionary.tables
   where libname="&lib"
     and memname="&dsn"
;
%if %qsubstr(%superq(fmt),1,1)=Y %then %do;
select strip(catx(' ',Name,format))
      into :fmtlist separated by ' '
   from dictionary.columns
   where libname="&lib"
     and memname="&dsn"
     and format is not null 
     and format not like '$%'
;
%end;
%else %let fmtlist=;
quit;

%put _local_;

data _null_;
   *file "&file" dsd;
   if _n_ =1 then do;
   %if %superq(memlabel)= %then %do;
      put "data &outlib..&dsn;";
   %end;
   %else %do;
      put "data &outlib..&dsn(label=%tslit(%superq(memlabel)));";
   %end;
      put @3 "infile datalines dsd truncover;";
      put @3 "input %superq(inputlist);";
   %if not (%superq(fmtlist)=) %then %do;
      put @3 "format %superq(fmtlist);";
   %end;
   %if not (%superq(lbllist)=) %then %do;
      put @3 "label %superq(lbllist);";
   %end;
      put "datalines;";
   end;
   set &lib..&dsn(obs=&obs) end=last; 
   put &varlist @;
   if last then do;
      put;
      put ';;;;';
run;





filename reprex url "&source_path";
%include reprex;
filename reprex;

option linesize=max;
%data2datastep(dsn=&dataSetName, obs=&obsKeep);

Log follows:

 

Proc Sql noprint;
78   CREATE Table OPERA.AKA_data_forum AS
79     SELECT
80           DLEAdi, DMHDdi, DCOMdi, DEARdi, DEYEdi, DDRSdi, DOUTdi, DREMdi, DPHYdi, DLIMdi, DISdi,
81           DEYEage, DCOMage, DREMage, DEARage, DDRSage, DOUTage, DLEAage, DLIMage, DMHDage, DPHYage, AgAcq1st
82     FROM OPERA.OPERA_all_2;
NOTE: Compressing data set OPERA.AKA_DATA_FORUM decreased size by 38.87 percent.
      Compressed is 217 pages; un-compressed would require 355 pages.
NOTE: Table OPERA.AKA_DATA_FORUM created, with 181308 rows and 22 columns.

83   quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.79 seconds
      cpu time            0.79 seconds


84   %let datasetname = OPERA.AKA_data_forum;
85   * number of obs you want to keep *;
86   %let obsKeep = 50;
87   %macro data2datastep(dsn=,lib=,outlib=,file=,obs=,fmt=,lbl=);
88   %local varlist fmtlist inputlist msgtype ;
89
90   %if %superq(obs)= %then %let obs=MAX;
91
92   /*added by FKhurshed to set linesize to max.
93   This allows the macro to output the datalines correctly to the
94   log and not run over lines. Current value is stored here to reset
95   at the end*/
96   %let LS= %sysfunc(getoption(LS));
97   option ls=max;
98
99   %let msgtype=NOTE;
100  %if %superq(dsn)= %then %do;
101     %let msgtype=ERROR;
102     %put &msgtype: You must specify a data set name;
103     %put;
104     %goto syntax;
105  %end;
106  %let dsn=%qupcase(%superq(dsn));
107  %if %superq(dsn)=!HELP %then %do;
108  %syntax:
109     data _null_;
110        call symput ('LS',getoption('LS','startupvalue'));
111     run;
112     options ls=100;
113     %put &msgtype: &SYSMACRONAME macro help document:;
114     %put &msgtype- Purpose: Converts a data set to a SAS DATA step.;
115     %put &msgtype- Syntax: %nrstr(%%)&SYSMACRONAME(dsn<,lib,outlib,file,obs,fmt,lbl>);
116     %put &msgtype- dsn:    Name of the dataset to be converted. Required.;
117     %put &msgtype- lib:    LIBREF of the original dataset. (Optional - if DSN is not fully qualified);
118     %put &msgtype- outlib: LIBREF for the output dataset. (Optional - default is WORK);
119     %put &msgtype- file:   Fully qualified filename for the DATA step code produced. (Optional);
120     %put &msgtype-         Default is %nrstr(create_&outlib._&dsn._data.sas) in the SAS default directory.;
121     %put &msgtype- obs:    Max observations to include the created dataset.;
122     %put &msgtype-         (Optional) Default is MAX (all observations);
123     %put &msgtype- fmt:    Format the numeric variables in the output dataset like the original data set? ;
124     %put &msgtype-         (YES|NO - Optional) Default is YES;
125     %put &msgtype- lbl:    Reproduce column labels in the output dataset? ;
126     %put &msgtype-         (YES|NO - Optional) Default is YES;
127     %put;
128     %put NOTE:   &SYSMACRONAME cannot be used in-line - it generates code.;
129     %put NOTE-   Every FORMAT in the original data must have a corresponding INFORMAT of the same name.;
130     %put NOTE-   Data set label is automatically re-created.;
131     %put NOTE-   Only numeric column formats can be re-created, character column formats are ingnored.;
132     %put NOTE-   Use !HELP to print these notes.;
133     options ls=&ls;
134     %return;
135  %end;
136  %if %superq(fmt)= %then %let fmt=YES;
137  %let fmt=%qupcase(&fmt);
138  %if %superq(lbl)= %then %let lbl=YES;
139  %let lbl=%qupcase(&lbl);
140
141  %if %superq(lib)= %then %do;
142      %let lib=%qscan(%superq(dsn),1,.);
143      %if %superq(lib) = %superq(dsn) %then %let lib=WORK;
144      %else %let dsn=%qscan(&dsn,2,.);
145  %end;
146  %if %superq(outlib)= %then %let outlib=WORK;
147  %let lib=%qupcase(%superq(lib));
148  %let dsn=%qupcase(%superq(dsn));
149
150  %if %sysfunc(exist(&lib..&dsn)) ne 1 %then %do;
151     %put ERROR: (&SYSMACRONAME) - Dataset &lib..&dsn does not exist.;
152     %let msgtype=NOTE;
153     %GoTo syntax;
154  %end;
155
156  %if %superq(file)= %then %do;
157     %let file=create_&outlib._&dsn._data.sas;
158     %if %symexist(USERDIR) %then %let file=&userdir/&file;
159  %end;
160
161  %if %symexist(USERDIR) %then %do;
162     %if %qscan(%superq(file),-1,/\)=%superq(file) %then
163        %let file=&userdir/&file;
164  %end;
165
166  proc sql noprint;
167  select Name
168        into :varlist separated by ' '
169     from dictionary.columns
170     where libname="&lib"
171       and memname="&dsn"
172  ;
173  select case type
174            when 'num' then
175               case
176                  when missing(format) then cats(Name,':32.')
177                  else cats(Name,':',format)
178               end
179            else cats(Name,':$',length,'.')
180         end
181        into :inputlist separated by ' '
182     from dictionary.columns
183     where libname="&lib"
184       and memname="&dsn"
185  ;
186  %if %qsubstr(%superq(lbl),1,1)=Y %then %do;
187  select strip(catx('=',Name,put(label,$quote.)))
188     into : lbllist separated by ' '
189     from dictionary.columns
190     where libname="&lib"
191       and memname="&dsn"
192       and label is not null
193  ;
194  %end;
195  %else %let lbllist=;
196  select memlabel
197     into :memlabel trimmed
198     from dictionary.tables
199     where libname="&lib"
200       and memname="&dsn"
201  ;
202  %if %qsubstr(%superq(fmt),1,1)=Y %then %do;
203  select strip(catx(' ',Name,format))
204        into :fmtlist separated by ' '
205     from dictionary.columns
206     where libname="&lib"
207       and memname="&dsn"
208       and format is not null
209       and format not like '$%'
210  ;
211  %end;
212  %else %let fmtlist=;
213  quit;
214
215  %put _local_;
216
217  data _null_;
218     *file "&file" dsd;
219     if _n_ =1 then do;
220     %if %superq(memlabel)= %then %do;
221        put "data &outlib..&dsn;";
222     %end;
223     %else %do;
224        put "data &outlib..&dsn(label=%tslit(%superq(memlabel)));";
225     %end;
226        put @3 "infile datalines dsd truncover;";
227        put @3 "input %superq(inputlist);";
228     %if not (%superq(fmtlist)=) %then %do;
229        put @3 "format %superq(fmtlist);";
230     %end;
231     %if not (%superq(lbllist)=) %then %do;
232        put @3 "label %superq(lbllist);";
233     %end;
234        put "datalines;";
235     end;
236     set &lib..&dsn(obs=&obs) end=last;
237     put &varlist @;
238     if last then do;
239        put;
240        put ';;;;';
241  %macro data2datastep(dsn=,lib=,outlib=,file=,obs=,fmt=,lbl=);
242  %local varlist fmtlist inputlist msgtype ;
243
244  %if %superq(obs)= %then %let obs=MAX;
245
246  /*added by FKhurshed to set linesize to max.
247  This allows the macro to output the datalines correctly to the
248  log and not run over lines. Current value is stored here to reset
249  at the end*/
250  %let LS= %sysfunc(getoption(LS));
251  option ls=max;
252
253  %let msgtype=NOTE;
254  %if %superq(dsn)= %then %do;
255     %let msgtype=ERROR;
256     %put &msgtype: You must specify a data set name;
257     %put;
258     %goto syntax;
259  %end;
260  %let dsn=%qupcase(%superq(dsn));
261  %if %superq(dsn)=!HELP %then %do;
262  %syntax:
263     data _null_;
264        call symput ('LS',getoption('LS','startupvalue'));
265     run;
266     options ls=100;
267     %put &msgtype: &SYSMACRONAME macro help document:;
268     %put &msgtype- Purpose: Converts a data set to a SAS DATA step.;
269     %put &msgtype- Syntax: %nrstr(%%)&SYSMACRONAME(dsn<,lib,outlib,file,obs,fmt,lbl>);
270     %put &msgtype- dsn:    Name of the dataset to be converted. Required.;
271     %put &msgtype- lib:    LIBREF of the original dataset. (Optional - if DSN is not fully qualified);
272     %put &msgtype- outlib: LIBREF for the output dataset. (Optional - default is WORK);
273     %put &msgtype- file:   Fully qualified filename for the DATA step code produced. (Optional);
274     %put &msgtype-         Default is %nrstr(create_&outlib._&dsn._data.sas) in the SAS default directory.;
275     %put &msgtype- obs:    Max observations to include the created dataset.;
276     %put &msgtype-         (Optional) Default is MAX (all observations);
277     %put &msgtype- fmt:    Format the numeric variables in the output dataset like the original data set? ;
278     %put &msgtype-         (YES|NO - Optional) Default is YES;
279     %put &msgtype- lbl:    Reproduce column labels in the output dataset? ;
280     %put &msgtype-         (YES|NO - Optional) Default is YES;
281     %put;
282     %put NOTE:   &SYSMACRONAME cannot be used in-line - it generates code.;
283     %put NOTE-   Every FORMAT in the original data must have a corresponding INFORMAT of the same name.;
284     %put NOTE-   Data set label is automatically re-created.;
285     %put NOTE-   Only numeric column formats can be re-created, character column formats are ingnored.;
286     %put NOTE-   Use !HELP to print these notes.;
287     options ls=&ls;
288     %return;
289  %end;
290  %if %superq(fmt)= %then %let fmt=YES;
291  %let fmt=%qupcase(&fmt);
292  %if %superq(lbl)= %then %let lbl=YES;
293  %let lbl=%qupcase(&lbl);
294
295  %if %superq(lib)= %then %do;
296      %let lib=%qscan(%superq(dsn),1,.);
297      %if %superq(lib) = %superq(dsn) %then %let lib=WORK;
298      %else %let dsn=%qscan(&dsn,2,.);
299  %end;
300  %if %superq(outlib)= %then %let outlib=WORK;
301  %let lib=%qupcase(%superq(lib));
302  %let dsn=%qupcase(%superq(dsn));
303
304  %if %sysfunc(exist(&lib..&dsn)) ne 1 %then %do;
305     %put ERROR: (&SYSMACRONAME) - Dataset &lib..&dsn does not exist.;
306     %let msgtype=NOTE;
307     %GoTo syntax;
308  %end;
309
310  %if %superq(file)= %then %do;
311     %let file=create_&outlib._&dsn._data.sas;
312     %if %symexist(USERDIR) %then %let file=&userdir/&file;
313  %end;
314
315  %if %symexist(USERDIR) %then %do;
316     %if %qscan(%superq(file),-1,/\)=%superq(file) %then
317        %let file=&userdir/&file;
318  %end;
319
320  proc sql noprint;
321  select Name
322        into :varlist separated by ' '
323     from dictionary.columns
324     where libname="&lib"
325       and memname="&dsn"
326  ;
327  select case type
328            when 'num' then
329               case
330                  when missing(format) then cats(Name,':32.')
331                  else cats(Name,':',format)
332               end
333            else cats(Name,':$',length,'.')
334         end
335        into :inputlist separated by ' '
336     from dictionary.columns
337     where libname="&lib"
338       and memname="&dsn"
339  ;
340  %if %qsubstr(%superq(lbl),1,1)=Y %then %do;
341  select strip(catx('=',Name,put(label,$quote.)))
342     into : lbllist separated by ' '
343     from dictionary.columns
344     where libname="&lib"
345       and memname="&dsn"
346       and label is not null
347  ;
348  %end;
349  %else %let lbllist=;
350  select memlabel
351     into :memlabel trimmed
352     from dictionary.tables
353     where libname="&lib"
354       and memname="&dsn"
355  ;
356  %if %qsubstr(%superq(fmt),1,1)=Y %then %do;
357  select strip(catx(' ',Name,format))
358        into :fmtlist separated by ' '
359     from dictionary.columns
360     where libname="&lib"
361       and memname="&dsn"
362       and format is not null
363       and format not like '$%'
364  ;
365  %end;
366  %else %let fmtlist=;
367  quit;
368
369  %put _local_;
370
371  data _null_;
372     *file "&file" dsd;
373     if _n_ =1 then do;
374     %if %superq(memlabel)= %then %do;
375        put "data &outlib..&dsn;";
376     %end;
377     %else %do;
378        put "data &outlib..&dsn(label=%tslit(%superq(memlabel)));";
379     %end;
380        put @3 "infile datalines dsd truncover;";
381        put @3 "input %superq(inputlist);";
382     %if not (%superq(fmtlist)=) %then %do;
383        put @3 "format %superq(fmtlist);";
384     %end;
385     %if not (%superq(lbllist)=) %then %do;
386        put @3 "label %superq(lbllist);";
387     %end;
388        put "datalines;";
389     end;
390     set &lib..&dsn(obs=&obs) end=last;
391     put &varlist @;
392     if last then do;
393        put;
394        put ';;;;';
395  run;

This is just to send 50 obs.

 

Thanks.

 

wlierman

Reeza
Super User

And apparently you have no %MEND to end the macro? Maybe left out of the copy/paste step. You also need to delete some of the code, which was used only to get the macro code in the first place.

 

Proc Sql noprint;
CREATE Table OPERA.AKA_data_forum AS
  SELECT
        DLEAdi, DMHDdi, DCOMdi, DEARdi, DEYEdi, DDRSdi, DOUTdi, DREMdi, DPHYdi, DLIMdi, DISdi,
        DEYEage, DCOMage, DREMage, DEARage, DDRSage, DOUTage, DLEAage, DLIMage, DMHDage, DPHYage, AgAcq1st
  FROM OPERA.OPERA_all_2;
quit;

***** This is the section from SAS Experts___Reeza  ***;

* Data set you want to create demo data for *;
%let datasetname = OPERA.AKA_data_forum;
* number of obs you want to keep *;
%let obsKeep = 50;

*** have to download the code from the url to run ***;
*  %let source_path = https://gist.githubusercontent.com/statgeek/bcc55940dd825a13b9c8ca40a904cba9/
raw/865d2cf18f5150b8e887218dde0fc3951d0ff15b/data2datastep.sas;

%macro data2datastep(dsn=,lib=,outlib=,file=,obs=,fmt=,lbl=);
%local varlist fmtlist inputlist msgtype ;

%if %superq(obs)= %then %let obs=MAX;

/*added by FKhurshed to set linesize to max. 
This allows the macro to output the datalines correctly to the 
log and not run over lines. Current value is stored here to reset
at the end*/
%let LS= %sysfunc(getoption(LS));
option ls=max;

%let msgtype=NOTE;
%if %superq(dsn)= %then %do;
   %let msgtype=ERROR;
   %put &msgtype: You must specify a data set name;
   %put;
   %goto syntax;
%end;
%let dsn=%qupcase(%superq(dsn));
%if %superq(dsn)=!HELP %then %do;
%syntax:
   data _null_;
      call symput ('LS',getoption('LS','startupvalue'));
   run;
   options ls=100;
   %put &msgtype: &SYSMACRONAME macro help document:;
   %put &msgtype- Purpose: Converts a data set to a SAS DATA step.;
   %put &msgtype- Syntax: %nrstr(%%)&SYSMACRONAME(dsn<,lib,outlib,file,obs,fmt,lbl>);
   %put &msgtype- dsn:    Name of the dataset to be converted. Required.;
   %put &msgtype- lib:    LIBREF of the original dataset. (Optional - if DSN is not fully qualified);
   %put &msgtype- outlib: LIBREF for the output dataset. (Optional - default is WORK);
   %put &msgtype- file:   Fully qualified filename for the DATA step code produced. (Optional);
   %put &msgtype-         Default is %nrstr(create_&outlib._&dsn._data.sas) in the SAS default directory.;
   %put &msgtype- obs:    Max observations to include the created dataset.;
   %put &msgtype-         (Optional) Default is MAX (all observations);
   %put &msgtype- fmt:    Format the numeric variables in the output dataset like the original data set? ;
   %put &msgtype-         (YES|NO - Optional) Default is YES;
   %put &msgtype- lbl:    Reproduce column labels in the output dataset? ;
   %put &msgtype-         (YES|NO - Optional) Default is YES;
   %put;
   %put NOTE:   &SYSMACRONAME cannot be used in-line - it generates code.;
   %put NOTE-   Every FORMAT in the original data must have a corresponding INFORMAT of the same name.;
   %put NOTE-   Data set label is automatically re-created.;
   %put NOTE-   Only numeric column formats can be re-created, character column formats are ingnored.;
   %put NOTE-   Use !HELP to print these notes.;
   options ls=&ls;
   %return;
%end; 
%if %superq(fmt)= %then %let fmt=YES;
%let fmt=%qupcase(&fmt);
%if %superq(lbl)= %then %let lbl=YES;
%let lbl=%qupcase(&lbl);

%if %superq(lib)= %then %do;
    %let lib=%qscan(%superq(dsn),1,.);
    %if %superq(lib) = %superq(dsn) %then %let lib=WORK;
    %else %let dsn=%qscan(&dsn,2,.);
%end;
%if %superq(outlib)= %then %let outlib=WORK;
%let lib=%qupcase(%superq(lib));
%let dsn=%qupcase(%superq(dsn));

%if %sysfunc(exist(&lib..&dsn)) ne 1 %then %do;
   %put ERROR: (&SYSMACRONAME) - Dataset &lib..&dsn does not exist.;
   %let msgtype=NOTE;
   %GoTo syntax;
%end;

%if %superq(file)= %then %do;
   %let file=create_&outlib._&dsn._data.sas;
   %if %symexist(USERDIR) %then %let file=&userdir/&file;
%end;

%if %symexist(USERDIR) %then %do;
   %if %qscan(%superq(file),-1,/\)=%superq(file) %then
      %let file=&userdir/&file;
%end;

proc sql noprint;
select Name
      into :varlist separated by ' '
   from dictionary.columns
   where libname="&lib"
     and memname="&dsn"
;
select case type
          when 'num' then 
             case 
                when missing(format) then cats(Name,':32.')
                else cats(Name,':',format)
             end 
          else cats(Name,':$',length,'.')
       end
      into :inputlist separated by ' '
   from dictionary.columns
   where libname="&lib"
     and memname="&dsn"
;
%if %qsubstr(%superq(lbl),1,1)=Y %then %do;
select strip(catx('=',Name,put(label,$quote.)))
   into : lbllist separated by ' '
   from dictionary.columns 
   where libname="&lib"
     and memname="&dsn"
     and label is not null 
;
%end;
%else %let lbllist=;
select memlabel 
   into :memlabel trimmed
   from dictionary.tables
   where libname="&lib"
     and memname="&dsn"
;
%if %qsubstr(%superq(fmt),1,1)=Y %then %do;
select strip(catx(' ',Name,format))
      into :fmtlist separated by ' '
   from dictionary.columns
   where libname="&lib"
     and memname="&dsn"
     and format is not null 
     and format not like '$%'
;
%end;
%else %let fmtlist=;
quit;

%put _local_;

data _null_;
   *file "&file" dsd;
   if _n_ =1 then do;
   %if %superq(memlabel)= %then %do;
      put "data &outlib..&dsn;";
   %end;
   %else %do;
      put "data &outlib..&dsn(label=%tslit(%superq(memlabel)));";
   %end;
      put @3 "infile datalines dsd truncover;";
      put @3 "input %superq(inputlist);";
   %if not (%superq(fmtlist)=) %then %do;
      put @3 "format %superq(fmtlist);";
   %end;
   %if not (%superq(lbllist)=) %then %do;
      put @3 "label %superq(lbllist);";
   %end;
      put "datalines;";
   end;
   set &lib..&dsn(obs=&obs) end=last; 
   put &varlist @;
   if last then do;
      put;
      put ';;;;';
run;

%mend;

option linesize=max;
%data2datastep(dsn=&dataSetName, obs=&obsKeep);
Reeza
Super User
Yeah, looks like the last two lines in the macro code were missed in the copy/paste step. Which is part of why I tried to set it up so that users didn't need to do that step....
Tom
Super User Tom
Super User
  • You are missing the last step of that code to use the output of PROC FREQ to count missing/non-missing values.
  • You also have defined the numeric missing format incorrectly so that special missing values like .A and .Z are treated as not missing.

 

Try this. First make some sample data:

data have;
  call streaminit(45);
  length obs 8.;
  array vars(*) $3. DEYEage DCOMage DREMage DEARage DDRSage DOUTage DLEAage DLIMage DMHDage DPHYage AgAcq1st;
  do obs= 1 to 181308;
    do i=1 to dim(vars);
      vars(i)= put(rand('table', 0.4, 0.4, 0.2), Z3.);
      if vars(i) = '003' then call missing(vars(i));
    end;
    output;
  end;
  drop i;
run;                    

Now run it trough the summary code.

*set input/output dataset names;
%let INPUT_DSN = have ;
%let OUTPUT_DSN = want;

* create formats for missing;
proc format;
  value $missfmt ' ' = "Missing"  other = "Not_Missing";
  value nmissfmt low-high ="Not_Missing" other="Missing";
run;

* turn off output and capture the one way freq table TEMP dataset ; 
ods select none;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN.;
  table _all_ / missing;
  format _numeric_ nmissfmt. _character_ $missfmt.;
run;

* turn outputs back on ;
ods select all;

* Collapse to one observation per variable ;
data &OUTPUT_DSN ;
  length name $32 missing not_missing total 8 ;
  set temp;
  by table notsorted ;
  if first.table then call missing(of missing not_missing);
  name = substr(table,7);
  if vvaluex(name)='Missing' then missing=frequency;
  else not_missing=frequency;
  retain missing not_missing;
  if last.table then do;
     missing=sum(0,missing);
     not_missing=sum(0,not_missing);
     total=sum(missing,not_missing);
     percent = divide(missing,total);
     output;
  end;
  keep name missing not_missing total percent;
run;

Results:

                                not_
Obs    name        missing    missing     total    Percent

  1    obs              0      181308    181308      0.00
  2    DEYEage      36233      145075    181308      0.20
  3    DCOMage      36086      145222    181308      0.20
  4    DREMage      36332      144976    181308      0.20
  5    DEARage      36397      144911    181308      0.20
  6    DDRSage      36183      145125    181308      0.20
  7    DOUTage      36225      145083    181308      0.20
  8    DLEAage      36358      144950    181308      0.20
  9    DLIMage      36521      144787    181308      0.20
 10    DMHDage      36281      145027    181308      0.20
 11    DPHYage      36448      144860    181308      0.20
 12    AgAcq1st     36425      144883    181308      0.20

Or try it on SASHELP.HEART.

                                      not_
Obs    name              missing    missing    total    Percent

  1    Status                 0       5209      5209      0.00
  2    DeathCause          3218       1991      5209      0.62
  3    AgeCHDdiag          3760       1449      5209      0.72
  4    Sex                    0       5209      5209      0.00
  5    AgeAtStart             0       5209      5209      0.00
  6    Height                 6       5203      5209      0.00
  7    Weight                 6       5203      5209      0.00
  8    Diastolic              0       5209      5209      0.00
  9    Systolic               0       5209      5209      0.00
 10    MRW                    6       5203      5209      0.00
 11    Smoking               36       5173      5209      0.01
 12    AgeAtDeath          3218       1991      5209      0.62
 13    Cholesterol          152       5057      5209      0.03
 14    Chol_Status          152       5057      5209      0.03
 15    BP_Status              0       5209      5209      0.00
 16    Weight_Status          6       5203      5209      0.00
 17    Smoking_Status        36       5173      5209      0.01
wlierman
Lapis Lazuli | Level 10

Okay I will try that.  The included tables are exactly what I am trying to get at.

 

Thanks.

 

wlierman

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 24 replies
  • 1559 views
  • 4 likes
  • 5 in conversation