I am continuing to get a method to count missing and non-missing obs for columns. I am using some of the code shared earlier which still has a couple issues.
In the data set:
Here is what the columns look like in the actual data set (these are or would be numeric values -- age) DEYEage DCOMage DREMage DEARage DDDRSage DOUTage DLEAage DLIMage DMHDage DPHYage AgAcq1st
. . . . . . . . . . .
. . . . . . . . . .
There are over 181K obs and age is probably missing for 99% of the obs (I'm not sure since that is what I want to determine)
After going this far in the code given earlier
Data OPERA.OPERA_ALL_2;
call streaminit(45);
length obs 8.;
array vars(*) $3. DEYEage DCOMage DREMage DEARage DDRSage DOUTage DLEAage DLIMage DMHDage DPHYage AgAcq1st;
/* DLEAdi DMHDdi DCOMdi DEARdi DEYEdi DDRSdi DOUTdi DREMdi DPHYdi */
do obs= 1 to 181308;
do i=1 to dim(vars);
vars(i)= put(rand('table', 0.4, 0.4, 0.2), Z3.);
if vars(i) = '003' then call missing(vars(i));
end;
output;
end;
drop i;
run;
*set input data set name;
%let INPUT_DSN = OPERA.OPERA_DIS_AGE;
%let OUTPUT_DSN = OPERA.OPERA_All_5;
* create format for missing;
proc format;
value $ missfmt ' ' = "Missing" other = "Not_Missing_char";
value nmissfmt . ="Missing" other="Not_Missing_num";
run;
* proc freq to count missing / non-missing;
ods select all;
* turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN.;
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
the results of the proc freq (showing just one of the vars to reduce clutter)
DREMage Cumulative Cumulative Freq Percent Freq Percent Missing 36332 20.04 36332 20.04 Not_Missing 144976 79.96 181308 100.00
The problem is there are not 144976 non-missing values. There are hardly any age responses coded from the survey.
These are also numeric values (responses are age).
I ran the same code given above just before I ran for the age. The only difference was the vars listed were
the character names for thew disabilities.
/* DLEAdi DMHDdi DCOMdi DEARdi DEYEdi DDRSdi DOUTdi DREMdi DPHYdi */
The responses were character values either No or Yes. The table shown above is actually giving the percentages for
the character value No and Yes counts not the numeric value for actual ages where there are ages given
The only code that varies is the vars listing. Why doesn't the proc freq give the correct corresponding counts between the character (disability names) No/Yes counts and the much more sparse numeric (age) counts for the data set.
Thanks.
wlierman
Hello
Considering the variable as missing and non missing, this is also an approach.
/* Creating a format */
proc format;
value $ missfmt ' ' = "Missing" other = "Not_Missing_char";
value nmissfmt . ="Missing" other="Not_Missing_num";
run;
/* Creating data */
data class;
Format name sex $missfmt.;
Format Age Height Weight nmissfmt.;
set sashelp.class;
if mod(_n_,3)=0 then name=" ";
if mod(_n_,4)=0 then Age=.;
if mod(_n_,5)=0 then Height=.;
if mod(_n_,5)=0 then Weight=.;
if mod(_n_,6)=0 then Sex=" ";
run;
/* counting missing and non missing values*/
proc freq data=class;
Tables _ALL_/ list nopercent nocum missing;
run;
The output will be like this
@wlierman wrote:
I am continuing to get a method to count missing and non-missing obs for columns. I am using some of the code shared earlier which still has a couple issues.
In the data set:
Here is what the columns look like in the actual data set (these are or would be numeric values -- age) DEYEage DCOMage DREMage DEARage DDDRSage DOUTage DLEAage DLIMage DMHDage DPHYage AgAcq1st
. . . . . . . . . . .
. . . . . . . . . .
There are over 181K obs and age is probably missing for 99% of the obs (I'm not sure since that is what I want to determine)
These are also numeric values (responses are age).
I ran the same code given above just before I ran for the age. The only difference was the vars listed were
the character names for thew disabilities.
/* DLEAdi DMHDdi DCOMdi DEARdi DEYEdi DDRSdi DOUTdi DREMdi DPHYdi */
The responses were character values either No or Yes. The table shown above is actually giving the percentages for
the character value No and Yes counts not the numeric value for actual ages where there are ages given
The only code that varies is the vars listing. Why doesn't the proc freq give the correct corresponding counts between the character (disability names) No/Yes counts and the much more sparse numeric (age) counts for the data set.
Thanks.
wlierman
We would need some actual data to tell why a specific code does not show a specific variable count or not.
What do you get from proc freq for the variable DREMage without any formats involved? Yes it may be long but perhaps you should LOOK at it.
From what I can tell you are creating a data set OPERA.OPERA_ALL_2 and then running proc freq with data=OPERA.OPERA_DIS_AGE. So we have no idea what might be in Opera_dis_age
Hello,
I have attached a small excel file of the ranges of vars that I am seeking counts for. The first section are the var names and the No/Yes responses.. I need counts of # of responses and then the # of blank obs. (It would be nice to have # of No, # of Yes , and # of blanks or missing.) But even # of non-missing and # of missing would be very good.
The last section of vars are the var ages. These are the numeric values. As I mentioned in the notes these are numeric but they are very very sparse as you can see from these few obs which I sent. So I need the # of non-missing and # of missing for these.
You've supplied great information. I likely lost much in translation. I appreciate your assistance.
Thanks.
wlierman
Commenting out the Format line did not provide the solution.
I will follow the steps to paste in the code and all for the problem.
Thank you.
wlierman
Many users here don't want to download Excel files because of virus potential, others have such things blocked by security software. Also if you give us Excel we have to create a SAS data set and due to the non-existent constraints on Excel data cells the result we end up with may not have variables of the same type (numeric or character) and even values.
Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.
I will go through these steps.
Thank you.
wlierman
I used the code included in your message on how to generate sample data for the Ask the Expert forum and receiuved the following log:
%let datasetname = OPERA.AKA_data_forum;
43 * number of obs you want to keep *;
44 %let obsKeep = 100;
45 %let source_path =
45 ! https://gist.githubusercontent.com/statgeek/bcc55940dd825a13b9c8ca40a904cba9/
46 raw/865d2cf18f5150b8e887218dde0fc3951d0ff15b/data2datastep.sas;
47
48
49 filename reprex url "&source_path";
50 %include reprex;
ERROR: The connection has timed out..
ERROR: Cannot open %INCLUDE file REPREX.
51 filename reprex;
NOTE: Fileref REPREX has been deassigned.
52
53 option linesize=max;
54 %data2datastep(dsn=&dataSetName, obs=&obsKeep);
-
180
WARNING: Apparent invocation of macro DATA2DATASTEP not resolved.
ERROR 180-322: Statement is not valid or it is used out of proper order.
What is the next step?
Thanks.
wlierman
I downloaded the code (very long), set up the little data set of 100 obs to send over and nothing in the log.
Like the macro didn't instantiate. This is the code that I ran and the log is down below
Proc Sql noprint;
CREATE Table OPERA.AKA_data_forum AS
SELECT
DLEAdi, DMHDdi, DCOMdi, DEARdi, DEYEdi, DDRSdi, DOUTdi, DREMdi, DPHYdi, DLIMdi, DISdi,
DEYEage, DCOMage, DREMage, DEARage, DDRSage, DOUTage, DLEAage, DLIMage, DMHDage, DPHYage, AgAcq1st
FROM OPERA.OPERA_all_2;
quit;
***** This is the section from SAS Experts___Reeza ***;
* Data set you want to create demo data for *;
%let datasetname = OPERA.AKA_data_forum;
* number of obs you want to keep *;
%let obsKeep = 50;
*** have to download the code from the url to run ***;
* %let source_path = https://gist.githubusercontent.com/statgeek/bcc55940dd825a13b9c8ca40a904cba9/
raw/865d2cf18f5150b8e887218dde0fc3951d0ff15b/data2datastep.sas;
%macro data2datastep(dsn=,lib=,outlib=,file=,obs=,fmt=,lbl=);
%local varlist fmtlist inputlist msgtype ;
%if %superq(obs)= %then %let obs=MAX;
/*added by FKhurshed to set linesize to max.
This allows the macro to output the datalines correctly to the
log and not run over lines. Current value is stored here to reset
at the end*/
%let LS= %sysfunc(getoption(LS));
option ls=max;
%let msgtype=NOTE;
%if %superq(dsn)= %then %do;
%let msgtype=ERROR;
%put &msgtype: You must specify a data set name;
%put;
%goto syntax;
%end;
%let dsn=%qupcase(%superq(dsn));
%if %superq(dsn)=!HELP %then %do;
%syntax:
data _null_;
call symput ('LS',getoption('LS','startupvalue'));
run;
options ls=100;
%put &msgtype: &SYSMACRONAME macro help document:;
%put &msgtype- Purpose: Converts a data set to a SAS DATA step.;
%put &msgtype- Syntax: %nrstr(%%)&SYSMACRONAME(dsn<,lib,outlib,file,obs,fmt,lbl>);
%put &msgtype- dsn: Name of the dataset to be converted. Required.;
%put &msgtype- lib: LIBREF of the original dataset. (Optional - if DSN is not fully qualified);
%put &msgtype- outlib: LIBREF for the output dataset. (Optional - default is WORK);
%put &msgtype- file: Fully qualified filename for the DATA step code produced. (Optional);
%put &msgtype- Default is %nrstr(create_&outlib._&dsn._data.sas) in the SAS default directory.;
%put &msgtype- obs: Max observations to include the created dataset.;
%put &msgtype- (Optional) Default is MAX (all observations);
%put &msgtype- fmt: Format the numeric variables in the output dataset like the original data set? ;
%put &msgtype- (YES|NO - Optional) Default is YES;
%put &msgtype- lbl: Reproduce column labels in the output dataset? ;
%put &msgtype- (YES|NO - Optional) Default is YES;
%put;
%put NOTE: &SYSMACRONAME cannot be used in-line - it generates code.;
%put NOTE- Every FORMAT in the original data must have a corresponding INFORMAT of the same name.;
%put NOTE- Data set label is automatically re-created.;
%put NOTE- Only numeric column formats can be re-created, character column formats are ingnored.;
%put NOTE- Use !HELP to print these notes.;
options ls=&ls;
%return;
%end;
%if %superq(fmt)= %then %let fmt=YES;
%let fmt=%qupcase(&fmt);
%if %superq(lbl)= %then %let lbl=YES;
%let lbl=%qupcase(&lbl);
%if %superq(lib)= %then %do;
%let lib=%qscan(%superq(dsn),1,.);
%if %superq(lib) = %superq(dsn) %then %let lib=WORK;
%else %let dsn=%qscan(&dsn,2,.);
%end;
%if %superq(outlib)= %then %let outlib=WORK;
%let lib=%qupcase(%superq(lib));
%let dsn=%qupcase(%superq(dsn));
%if %sysfunc(exist(&lib..&dsn)) ne 1 %then %do;
%put ERROR: (&SYSMACRONAME) - Dataset &lib..&dsn does not exist.;
%let msgtype=NOTE;
%GoTo syntax;
%end;
%if %superq(file)= %then %do;
%let file=create_&outlib._&dsn._data.sas;
%if %symexist(USERDIR) %then %let file=&userdir/&file;
%end;
%if %symexist(USERDIR) %then %do;
%if %qscan(%superq(file),-1,/\)=%superq(file) %then
%let file=&userdir/&file;
%end;
proc sql noprint;
select Name
into :varlist separated by ' '
from dictionary.columns
where libname="&lib"
and memname="&dsn"
;
select case type
when 'num' then
case
when missing(format) then cats(Name,':32.')
else cats(Name,':',format)
end
else cats(Name,':$',length,'.')
end
into :inputlist separated by ' '
from dictionary.columns
where libname="&lib"
and memname="&dsn"
;
%if %qsubstr(%superq(lbl),1,1)=Y %then %do;
select strip(catx('=',Name,put(label,$quote.)))
into : lbllist separated by ' '
from dictionary.columns
where libname="&lib"
and memname="&dsn"
and label is not null
;
%end;
%else %let lbllist=;
select memlabel
into :memlabel trimmed
from dictionary.tables
where libname="&lib"
and memname="&dsn"
;
%if %qsubstr(%superq(fmt),1,1)=Y %then %do;
select strip(catx(' ',Name,format))
into :fmtlist separated by ' '
from dictionary.columns
where libname="&lib"
and memname="&dsn"
and format is not null
and format not like '$%'
;
%end;
%else %let fmtlist=;
quit;
%put _local_;
data _null_;
*file "&file" dsd;
if _n_ =1 then do;
%if %superq(memlabel)= %then %do;
put "data &outlib..&dsn;";
%end;
%else %do;
put "data &outlib..&dsn(label=%tslit(%superq(memlabel)));";
%end;
put @3 "infile datalines dsd truncover;";
put @3 "input %superq(inputlist);";
%if not (%superq(fmtlist)=) %then %do;
put @3 "format %superq(fmtlist);";
%end;
%if not (%superq(lbllist)=) %then %do;
put @3 "label %superq(lbllist);";
%end;
put "datalines;";
end;
set &lib..&dsn(obs=&obs) end=last;
put &varlist @;
if last then do;
put;
put ';;;;';
run;
filename reprex url "&source_path";
%include reprex;
filename reprex;
option linesize=max;
%data2datastep(dsn=&dataSetName, obs=&obsKeep);
Log follows:
Proc Sql noprint;
78 CREATE Table OPERA.AKA_data_forum AS
79 SELECT
80 DLEAdi, DMHDdi, DCOMdi, DEARdi, DEYEdi, DDRSdi, DOUTdi, DREMdi, DPHYdi, DLIMdi, DISdi,
81 DEYEage, DCOMage, DREMage, DEARage, DDRSage, DOUTage, DLEAage, DLIMage, DMHDage, DPHYage, AgAcq1st
82 FROM OPERA.OPERA_all_2;
NOTE: Compressing data set OPERA.AKA_DATA_FORUM decreased size by 38.87 percent.
Compressed is 217 pages; un-compressed would require 355 pages.
NOTE: Table OPERA.AKA_DATA_FORUM created, with 181308 rows and 22 columns.
83 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.79 seconds
cpu time 0.79 seconds
84 %let datasetname = OPERA.AKA_data_forum;
85 * number of obs you want to keep *;
86 %let obsKeep = 50;
87 %macro data2datastep(dsn=,lib=,outlib=,file=,obs=,fmt=,lbl=);
88 %local varlist fmtlist inputlist msgtype ;
89
90 %if %superq(obs)= %then %let obs=MAX;
91
92 /*added by FKhurshed to set linesize to max.
93 This allows the macro to output the datalines correctly to the
94 log and not run over lines. Current value is stored here to reset
95 at the end*/
96 %let LS= %sysfunc(getoption(LS));
97 option ls=max;
98
99 %let msgtype=NOTE;
100 %if %superq(dsn)= %then %do;
101 %let msgtype=ERROR;
102 %put &msgtype: You must specify a data set name;
103 %put;
104 %goto syntax;
105 %end;
106 %let dsn=%qupcase(%superq(dsn));
107 %if %superq(dsn)=!HELP %then %do;
108 %syntax:
109 data _null_;
110 call symput ('LS',getoption('LS','startupvalue'));
111 run;
112 options ls=100;
113 %put &msgtype: &SYSMACRONAME macro help document:;
114 %put &msgtype- Purpose: Converts a data set to a SAS DATA step.;
115 %put &msgtype- Syntax: %nrstr(%%)&SYSMACRONAME(dsn<,lib,outlib,file,obs,fmt,lbl>);
116 %put &msgtype- dsn: Name of the dataset to be converted. Required.;
117 %put &msgtype- lib: LIBREF of the original dataset. (Optional - if DSN is not fully qualified);
118 %put &msgtype- outlib: LIBREF for the output dataset. (Optional - default is WORK);
119 %put &msgtype- file: Fully qualified filename for the DATA step code produced. (Optional);
120 %put &msgtype- Default is %nrstr(create_&outlib._&dsn._data.sas) in the SAS default directory.;
121 %put &msgtype- obs: Max observations to include the created dataset.;
122 %put &msgtype- (Optional) Default is MAX (all observations);
123 %put &msgtype- fmt: Format the numeric variables in the output dataset like the original data set? ;
124 %put &msgtype- (YES|NO - Optional) Default is YES;
125 %put &msgtype- lbl: Reproduce column labels in the output dataset? ;
126 %put &msgtype- (YES|NO - Optional) Default is YES;
127 %put;
128 %put NOTE: &SYSMACRONAME cannot be used in-line - it generates code.;
129 %put NOTE- Every FORMAT in the original data must have a corresponding INFORMAT of the same name.;
130 %put NOTE- Data set label is automatically re-created.;
131 %put NOTE- Only numeric column formats can be re-created, character column formats are ingnored.;
132 %put NOTE- Use !HELP to print these notes.;
133 options ls=&ls;
134 %return;
135 %end;
136 %if %superq(fmt)= %then %let fmt=YES;
137 %let fmt=%qupcase(&fmt);
138 %if %superq(lbl)= %then %let lbl=YES;
139 %let lbl=%qupcase(&lbl);
140
141 %if %superq(lib)= %then %do;
142 %let lib=%qscan(%superq(dsn),1,.);
143 %if %superq(lib) = %superq(dsn) %then %let lib=WORK;
144 %else %let dsn=%qscan(&dsn,2,.);
145 %end;
146 %if %superq(outlib)= %then %let outlib=WORK;
147 %let lib=%qupcase(%superq(lib));
148 %let dsn=%qupcase(%superq(dsn));
149
150 %if %sysfunc(exist(&lib..&dsn)) ne 1 %then %do;
151 %put ERROR: (&SYSMACRONAME) - Dataset &lib..&dsn does not exist.;
152 %let msgtype=NOTE;
153 %GoTo syntax;
154 %end;
155
156 %if %superq(file)= %then %do;
157 %let file=create_&outlib._&dsn._data.sas;
158 %if %symexist(USERDIR) %then %let file=&userdir/&file;
159 %end;
160
161 %if %symexist(USERDIR) %then %do;
162 %if %qscan(%superq(file),-1,/\)=%superq(file) %then
163 %let file=&userdir/&file;
164 %end;
165
166 proc sql noprint;
167 select Name
168 into :varlist separated by ' '
169 from dictionary.columns
170 where libname="&lib"
171 and memname="&dsn"
172 ;
173 select case type
174 when 'num' then
175 case
176 when missing(format) then cats(Name,':32.')
177 else cats(Name,':',format)
178 end
179 else cats(Name,':$',length,'.')
180 end
181 into :inputlist separated by ' '
182 from dictionary.columns
183 where libname="&lib"
184 and memname="&dsn"
185 ;
186 %if %qsubstr(%superq(lbl),1,1)=Y %then %do;
187 select strip(catx('=',Name,put(label,$quote.)))
188 into : lbllist separated by ' '
189 from dictionary.columns
190 where libname="&lib"
191 and memname="&dsn"
192 and label is not null
193 ;
194 %end;
195 %else %let lbllist=;
196 select memlabel
197 into :memlabel trimmed
198 from dictionary.tables
199 where libname="&lib"
200 and memname="&dsn"
201 ;
202 %if %qsubstr(%superq(fmt),1,1)=Y %then %do;
203 select strip(catx(' ',Name,format))
204 into :fmtlist separated by ' '
205 from dictionary.columns
206 where libname="&lib"
207 and memname="&dsn"
208 and format is not null
209 and format not like '$%'
210 ;
211 %end;
212 %else %let fmtlist=;
213 quit;
214
215 %put _local_;
216
217 data _null_;
218 *file "&file" dsd;
219 if _n_ =1 then do;
220 %if %superq(memlabel)= %then %do;
221 put "data &outlib..&dsn;";
222 %end;
223 %else %do;
224 put "data &outlib..&dsn(label=%tslit(%superq(memlabel)));";
225 %end;
226 put @3 "infile datalines dsd truncover;";
227 put @3 "input %superq(inputlist);";
228 %if not (%superq(fmtlist)=) %then %do;
229 put @3 "format %superq(fmtlist);";
230 %end;
231 %if not (%superq(lbllist)=) %then %do;
232 put @3 "label %superq(lbllist);";
233 %end;
234 put "datalines;";
235 end;
236 set &lib..&dsn(obs=&obs) end=last;
237 put &varlist @;
238 if last then do;
239 put;
240 put ';;;;';
241 %macro data2datastep(dsn=,lib=,outlib=,file=,obs=,fmt=,lbl=);
242 %local varlist fmtlist inputlist msgtype ;
243
244 %if %superq(obs)= %then %let obs=MAX;
245
246 /*added by FKhurshed to set linesize to max.
247 This allows the macro to output the datalines correctly to the
248 log and not run over lines. Current value is stored here to reset
249 at the end*/
250 %let LS= %sysfunc(getoption(LS));
251 option ls=max;
252
253 %let msgtype=NOTE;
254 %if %superq(dsn)= %then %do;
255 %let msgtype=ERROR;
256 %put &msgtype: You must specify a data set name;
257 %put;
258 %goto syntax;
259 %end;
260 %let dsn=%qupcase(%superq(dsn));
261 %if %superq(dsn)=!HELP %then %do;
262 %syntax:
263 data _null_;
264 call symput ('LS',getoption('LS','startupvalue'));
265 run;
266 options ls=100;
267 %put &msgtype: &SYSMACRONAME macro help document:;
268 %put &msgtype- Purpose: Converts a data set to a SAS DATA step.;
269 %put &msgtype- Syntax: %nrstr(%%)&SYSMACRONAME(dsn<,lib,outlib,file,obs,fmt,lbl>);
270 %put &msgtype- dsn: Name of the dataset to be converted. Required.;
271 %put &msgtype- lib: LIBREF of the original dataset. (Optional - if DSN is not fully qualified);
272 %put &msgtype- outlib: LIBREF for the output dataset. (Optional - default is WORK);
273 %put &msgtype- file: Fully qualified filename for the DATA step code produced. (Optional);
274 %put &msgtype- Default is %nrstr(create_&outlib._&dsn._data.sas) in the SAS default directory.;
275 %put &msgtype- obs: Max observations to include the created dataset.;
276 %put &msgtype- (Optional) Default is MAX (all observations);
277 %put &msgtype- fmt: Format the numeric variables in the output dataset like the original data set? ;
278 %put &msgtype- (YES|NO - Optional) Default is YES;
279 %put &msgtype- lbl: Reproduce column labels in the output dataset? ;
280 %put &msgtype- (YES|NO - Optional) Default is YES;
281 %put;
282 %put NOTE: &SYSMACRONAME cannot be used in-line - it generates code.;
283 %put NOTE- Every FORMAT in the original data must have a corresponding INFORMAT of the same name.;
284 %put NOTE- Data set label is automatically re-created.;
285 %put NOTE- Only numeric column formats can be re-created, character column formats are ingnored.;
286 %put NOTE- Use !HELP to print these notes.;
287 options ls=&ls;
288 %return;
289 %end;
290 %if %superq(fmt)= %then %let fmt=YES;
291 %let fmt=%qupcase(&fmt);
292 %if %superq(lbl)= %then %let lbl=YES;
293 %let lbl=%qupcase(&lbl);
294
295 %if %superq(lib)= %then %do;
296 %let lib=%qscan(%superq(dsn),1,.);
297 %if %superq(lib) = %superq(dsn) %then %let lib=WORK;
298 %else %let dsn=%qscan(&dsn,2,.);
299 %end;
300 %if %superq(outlib)= %then %let outlib=WORK;
301 %let lib=%qupcase(%superq(lib));
302 %let dsn=%qupcase(%superq(dsn));
303
304 %if %sysfunc(exist(&lib..&dsn)) ne 1 %then %do;
305 %put ERROR: (&SYSMACRONAME) - Dataset &lib..&dsn does not exist.;
306 %let msgtype=NOTE;
307 %GoTo syntax;
308 %end;
309
310 %if %superq(file)= %then %do;
311 %let file=create_&outlib._&dsn._data.sas;
312 %if %symexist(USERDIR) %then %let file=&userdir/&file;
313 %end;
314
315 %if %symexist(USERDIR) %then %do;
316 %if %qscan(%superq(file),-1,/\)=%superq(file) %then
317 %let file=&userdir/&file;
318 %end;
319
320 proc sql noprint;
321 select Name
322 into :varlist separated by ' '
323 from dictionary.columns
324 where libname="&lib"
325 and memname="&dsn"
326 ;
327 select case type
328 when 'num' then
329 case
330 when missing(format) then cats(Name,':32.')
331 else cats(Name,':',format)
332 end
333 else cats(Name,':$',length,'.')
334 end
335 into :inputlist separated by ' '
336 from dictionary.columns
337 where libname="&lib"
338 and memname="&dsn"
339 ;
340 %if %qsubstr(%superq(lbl),1,1)=Y %then %do;
341 select strip(catx('=',Name,put(label,$quote.)))
342 into : lbllist separated by ' '
343 from dictionary.columns
344 where libname="&lib"
345 and memname="&dsn"
346 and label is not null
347 ;
348 %end;
349 %else %let lbllist=;
350 select memlabel
351 into :memlabel trimmed
352 from dictionary.tables
353 where libname="&lib"
354 and memname="&dsn"
355 ;
356 %if %qsubstr(%superq(fmt),1,1)=Y %then %do;
357 select strip(catx(' ',Name,format))
358 into :fmtlist separated by ' '
359 from dictionary.columns
360 where libname="&lib"
361 and memname="&dsn"
362 and format is not null
363 and format not like '$%'
364 ;
365 %end;
366 %else %let fmtlist=;
367 quit;
368
369 %put _local_;
370
371 data _null_;
372 *file "&file" dsd;
373 if _n_ =1 then do;
374 %if %superq(memlabel)= %then %do;
375 put "data &outlib..&dsn;";
376 %end;
377 %else %do;
378 put "data &outlib..&dsn(label=%tslit(%superq(memlabel)));";
379 %end;
380 put @3 "infile datalines dsd truncover;";
381 put @3 "input %superq(inputlist);";
382 %if not (%superq(fmtlist)=) %then %do;
383 put @3 "format %superq(fmtlist);";
384 %end;
385 %if not (%superq(lbllist)=) %then %do;
386 put @3 "label %superq(lbllist);";
387 %end;
388 put "datalines;";
389 end;
390 set &lib..&dsn(obs=&obs) end=last;
391 put &varlist @;
392 if last then do;
393 put;
394 put ';;;;';
395 run;
This is just to send 50 obs.
Thanks.
wlierman
And apparently you have no %MEND to end the macro? Maybe left out of the copy/paste step. You also need to delete some of the code, which was used only to get the macro code in the first place.
Proc Sql noprint;
CREATE Table OPERA.AKA_data_forum AS
SELECT
DLEAdi, DMHDdi, DCOMdi, DEARdi, DEYEdi, DDRSdi, DOUTdi, DREMdi, DPHYdi, DLIMdi, DISdi,
DEYEage, DCOMage, DREMage, DEARage, DDRSage, DOUTage, DLEAage, DLIMage, DMHDage, DPHYage, AgAcq1st
FROM OPERA.OPERA_all_2;
quit;
***** This is the section from SAS Experts___Reeza ***;
* Data set you want to create demo data for *;
%let datasetname = OPERA.AKA_data_forum;
* number of obs you want to keep *;
%let obsKeep = 50;
*** have to download the code from the url to run ***;
* %let source_path = https://gist.githubusercontent.com/statgeek/bcc55940dd825a13b9c8ca40a904cba9/
raw/865d2cf18f5150b8e887218dde0fc3951d0ff15b/data2datastep.sas;
%macro data2datastep(dsn=,lib=,outlib=,file=,obs=,fmt=,lbl=);
%local varlist fmtlist inputlist msgtype ;
%if %superq(obs)= %then %let obs=MAX;
/*added by FKhurshed to set linesize to max.
This allows the macro to output the datalines correctly to the
log and not run over lines. Current value is stored here to reset
at the end*/
%let LS= %sysfunc(getoption(LS));
option ls=max;
%let msgtype=NOTE;
%if %superq(dsn)= %then %do;
%let msgtype=ERROR;
%put &msgtype: You must specify a data set name;
%put;
%goto syntax;
%end;
%let dsn=%qupcase(%superq(dsn));
%if %superq(dsn)=!HELP %then %do;
%syntax:
data _null_;
call symput ('LS',getoption('LS','startupvalue'));
run;
options ls=100;
%put &msgtype: &SYSMACRONAME macro help document:;
%put &msgtype- Purpose: Converts a data set to a SAS DATA step.;
%put &msgtype- Syntax: %nrstr(%%)&SYSMACRONAME(dsn<,lib,outlib,file,obs,fmt,lbl>);
%put &msgtype- dsn: Name of the dataset to be converted. Required.;
%put &msgtype- lib: LIBREF of the original dataset. (Optional - if DSN is not fully qualified);
%put &msgtype- outlib: LIBREF for the output dataset. (Optional - default is WORK);
%put &msgtype- file: Fully qualified filename for the DATA step code produced. (Optional);
%put &msgtype- Default is %nrstr(create_&outlib._&dsn._data.sas) in the SAS default directory.;
%put &msgtype- obs: Max observations to include the created dataset.;
%put &msgtype- (Optional) Default is MAX (all observations);
%put &msgtype- fmt: Format the numeric variables in the output dataset like the original data set? ;
%put &msgtype- (YES|NO - Optional) Default is YES;
%put &msgtype- lbl: Reproduce column labels in the output dataset? ;
%put &msgtype- (YES|NO - Optional) Default is YES;
%put;
%put NOTE: &SYSMACRONAME cannot be used in-line - it generates code.;
%put NOTE- Every FORMAT in the original data must have a corresponding INFORMAT of the same name.;
%put NOTE- Data set label is automatically re-created.;
%put NOTE- Only numeric column formats can be re-created, character column formats are ingnored.;
%put NOTE- Use !HELP to print these notes.;
options ls=&ls;
%return;
%end;
%if %superq(fmt)= %then %let fmt=YES;
%let fmt=%qupcase(&fmt);
%if %superq(lbl)= %then %let lbl=YES;
%let lbl=%qupcase(&lbl);
%if %superq(lib)= %then %do;
%let lib=%qscan(%superq(dsn),1,.);
%if %superq(lib) = %superq(dsn) %then %let lib=WORK;
%else %let dsn=%qscan(&dsn,2,.);
%end;
%if %superq(outlib)= %then %let outlib=WORK;
%let lib=%qupcase(%superq(lib));
%let dsn=%qupcase(%superq(dsn));
%if %sysfunc(exist(&lib..&dsn)) ne 1 %then %do;
%put ERROR: (&SYSMACRONAME) - Dataset &lib..&dsn does not exist.;
%let msgtype=NOTE;
%GoTo syntax;
%end;
%if %superq(file)= %then %do;
%let file=create_&outlib._&dsn._data.sas;
%if %symexist(USERDIR) %then %let file=&userdir/&file;
%end;
%if %symexist(USERDIR) %then %do;
%if %qscan(%superq(file),-1,/\)=%superq(file) %then
%let file=&userdir/&file;
%end;
proc sql noprint;
select Name
into :varlist separated by ' '
from dictionary.columns
where libname="&lib"
and memname="&dsn"
;
select case type
when 'num' then
case
when missing(format) then cats(Name,':32.')
else cats(Name,':',format)
end
else cats(Name,':$',length,'.')
end
into :inputlist separated by ' '
from dictionary.columns
where libname="&lib"
and memname="&dsn"
;
%if %qsubstr(%superq(lbl),1,1)=Y %then %do;
select strip(catx('=',Name,put(label,$quote.)))
into : lbllist separated by ' '
from dictionary.columns
where libname="&lib"
and memname="&dsn"
and label is not null
;
%end;
%else %let lbllist=;
select memlabel
into :memlabel trimmed
from dictionary.tables
where libname="&lib"
and memname="&dsn"
;
%if %qsubstr(%superq(fmt),1,1)=Y %then %do;
select strip(catx(' ',Name,format))
into :fmtlist separated by ' '
from dictionary.columns
where libname="&lib"
and memname="&dsn"
and format is not null
and format not like '$%'
;
%end;
%else %let fmtlist=;
quit;
%put _local_;
data _null_;
*file "&file" dsd;
if _n_ =1 then do;
%if %superq(memlabel)= %then %do;
put "data &outlib..&dsn;";
%end;
%else %do;
put "data &outlib..&dsn(label=%tslit(%superq(memlabel)));";
%end;
put @3 "infile datalines dsd truncover;";
put @3 "input %superq(inputlist);";
%if not (%superq(fmtlist)=) %then %do;
put @3 "format %superq(fmtlist);";
%end;
%if not (%superq(lbllist)=) %then %do;
put @3 "label %superq(lbllist);";
%end;
put "datalines;";
end;
set &lib..&dsn(obs=&obs) end=last;
put &varlist @;
if last then do;
put;
put ';;;;';
run;
%mend;
option linesize=max;
%data2datastep(dsn=&dataSetName, obs=&obsKeep);
Try this. First make some sample data:
data have;
call streaminit(45);
length obs 8.;
array vars(*) $3. DEYEage DCOMage DREMage DEARage DDRSage DOUTage DLEAage DLIMage DMHDage DPHYage AgAcq1st;
do obs= 1 to 181308;
do i=1 to dim(vars);
vars(i)= put(rand('table', 0.4, 0.4, 0.2), Z3.);
if vars(i) = '003' then call missing(vars(i));
end;
output;
end;
drop i;
run;
Now run it trough the summary code.
*set input/output dataset names;
%let INPUT_DSN = have ;
%let OUTPUT_DSN = want;
* create formats for missing;
proc format;
value $missfmt ' ' = "Missing" other = "Not_Missing";
value nmissfmt low-high ="Not_Missing" other="Missing";
run;
* turn off output and capture the one way freq table TEMP dataset ;
ods select none;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN.;
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
* turn outputs back on ;
ods select all;
* Collapse to one observation per variable ;
data &OUTPUT_DSN ;
length name $32 missing not_missing total 8 ;
set temp;
by table notsorted ;
if first.table then call missing(of missing not_missing);
name = substr(table,7);
if vvaluex(name)='Missing' then missing=frequency;
else not_missing=frequency;
retain missing not_missing;
if last.table then do;
missing=sum(0,missing);
not_missing=sum(0,not_missing);
total=sum(missing,not_missing);
percent = divide(missing,total);
output;
end;
keep name missing not_missing total percent;
run;
Results:
not_ Obs name missing missing total Percent 1 obs 0 181308 181308 0.00 2 DEYEage 36233 145075 181308 0.20 3 DCOMage 36086 145222 181308 0.20 4 DREMage 36332 144976 181308 0.20 5 DEARage 36397 144911 181308 0.20 6 DDRSage 36183 145125 181308 0.20 7 DOUTage 36225 145083 181308 0.20 8 DLEAage 36358 144950 181308 0.20 9 DLIMage 36521 144787 181308 0.20 10 DMHDage 36281 145027 181308 0.20 11 DPHYage 36448 144860 181308 0.20 12 AgAcq1st 36425 144883 181308 0.20
Or try it on SASHELP.HEART.
not_ Obs name missing missing total Percent 1 Status 0 5209 5209 0.00 2 DeathCause 3218 1991 5209 0.62 3 AgeCHDdiag 3760 1449 5209 0.72 4 Sex 0 5209 5209 0.00 5 AgeAtStart 0 5209 5209 0.00 6 Height 6 5203 5209 0.00 7 Weight 6 5203 5209 0.00 8 Diastolic 0 5209 5209 0.00 9 Systolic 0 5209 5209 0.00 10 MRW 6 5203 5209 0.00 11 Smoking 36 5173 5209 0.01 12 AgeAtDeath 3218 1991 5209 0.62 13 Cholesterol 152 5057 5209 0.03 14 Chol_Status 152 5057 5209 0.03 15 BP_Status 0 5209 5209 0.00 16 Weight_Status 6 5203 5209 0.00 17 Smoking_Status 36 5173 5209 0.01
Okay I will try that. The included tables are exactly what I am trying to get at.
Thanks.
wlierman
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.