all i did was filling: "%macro drop_missing_pct(input_dsn= have, output_dsn=want, pct = 0,4 , id_vars=);"
my code is:
data surowe_train;
set 'C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_train';
run;
data surowe_valid;
set 'C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_valid';
run;
proc sql noprint;
create table sklejone as
select a.*, b.* from surowe_train as a left join surowe_valid as b
On a.cid=b.cid;
quit;
log from it is:
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
27
28 data surowe_train;
29 set 'C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_train';
30 run;
NOTE: There were 52841 observations read from the data set
C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_train.
NOTE: The data set WORK.SUROWE_TRAIN has 52841 observations and 2310 variables.
NOTE: Instrukcja DATA zajęła (całkowity czas przetwarzania):
real time 9.87 seconds
cpu time 3.18 seconds
31
32 data surowe_valid;
33 set 'C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_valid';
34 run;
NOTE: There were 53070 observations read from the data set
C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_valid.
NOTE: The data set WORK.SUROWE_VALID has 53070 observations and 2310 variables.
NOTE: Instrukcja DATA zajęła (całkowity czas przetwarzania):
real time 9.49 seconds
cpu time 3.32 seconds
35
36 proc sql noprint;
37 create table sklejone as
38 select a.*, b.* from surowe_train as a left join surowe_valid as b
39 On a.cid=b.cid;
then there is a few thousand warnings about data already existing in WORK.SKLEJONE>.(it has to do with my attempts to properly merge two files, i propaby created new library).
theres no log from your code because it had error(i suppose that is the reason, because i can olnly see error beside whole code in the log)
Take this code and post it in a brand new, clean SAS session.
Run it, and post the log from the output. Do not change ANYTHING.
%macro drop_missing_pct(input_dsn = , output_dsn=, pct = , id_vars=);
*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN. (drop = &ID_Vars);
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
*not required for display purposes;
proc sort data=long;
by variable;
run;
*select variables more than x% missing;
proc sql noprint;
select variable into :drop_var_list separated by " "
from long where variable_value = 'Missing' and percent > &pct;
quit;
*Drop variables;
data &output_dsn;
set &input_dsn;
drop &drop_var_list;
run;
*clean up;
*uncomment after testing;
/* proc sql; */
/* drop table long; */
/* drop table temp; */
/* quit; */
%mend;
data class;
set sashelp.class;
if age=14 then
call missing(height, weight, sex);
if name='Alfred' then
call missing(sex, age, height);
label age="Fancy Age Label";
run;
%drop_missing_pct(input_dsn = class, output_dsn = want, pct = 20, id_vars = Name);
*check output;
proc contents data=want;
run;
title 'Missing Report for data set';
proc print data=long;
run;
title 'Data set with columns missing more than 20% dropped';
proc print data=want (obs=5);
run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 19 observations and 5 variables.
NOTE: Instrukcja DATA zajęła (całkowity czas przetwarzania):
real time 0.07 seconds
cpu time 0.00 seconds
97
98
99 %drop_missing_pct(input_dsn = class, output_dsn = want, pct = 20, id_vars = Name);
NOTE: Format $MISSFMT has been output.
NOTE: Format NMISSFMT has been output.
NOTE: PROCEDURE FORMAT zajęła (całkowity czas przetwarzania):
real time 0.03 seconds
cpu time 0.00 seconds
3 System SAS 20:04 Tuesday, January 19, 2021
NOTE: The data set WORK.TEMP has 8 observations and 13 variables.
NOTE: There were 19 observations read from the data set WORK.CLASS.
NOTE: PROCEDURE FREQ zajęła (całkowity czas przetwarzania):
real time 0.28 seconds
cpu time 0.04 seconds
NOTE: There were 8 observations read from the data set WORK.TEMP.
NOTE: The data set WORK.LONG has 8 observations and 7 variables.
NOTE: Instrukcja DATA zajęła (całkowity czas przetwarzania):
real time 0.10 seconds
cpu time 0.01 seconds
NOTE: There were 8 observations read from the data set WORK.LONG.
NOTE: The data set WORK.LONG has 8 observations and 7 variables.
NOTE: PROCEDURE SORT zajęła (całkowity czas przetwarzania):
real time 0.03 seconds
cpu time 0.01 seconds
NOTE: PROCEDURE SQL zajęła (całkowity czas przetwarzania):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: There were 19 observations read from the data set WORK.CLASS.
NOTE: The data set WORK.WANT has 19 observations and 2 variables.
NOTE: Instrukcja DATA zajęła (całkowity czas przetwarzania):
real time 0.01 seconds
cpu time 0.00 seconds
100
101
102
103 *check output;
104 proc contents data=want;
105 run;
NOTE: PROCEDURE CONTENTS zajęła (całkowity czas przetwarzania):
real time 0.19 seconds
cpu time 0.06 seconds
106
107 title 'Missing Report for data set';
108 proc print data=long;
109 run;
NOTE: There were 8 observations read from the data set WORK.LONG.
NOTE: PROCEDURE PRINT zajęła (całkowity czas przetwarzania):
real time 0.08 seconds
cpu time 0.03 seconds
4 System SAS 20:04 Tuesday, January 19, 2021
110
111 title 'Data set with columns missing more than 20% dropped';
112 proc print data=want (obs=5);
113 run;
NOTE: There were 5 observations read from the data set WORK.WANT.
NOTE: PROCEDURE PRINT zajęła (całkowity czas przetwarzania):
real time 0.01 seconds
cpu time 0.03 seconds
so how should i apply this data to match my code and get some results instead of errors?
well i run it, changed it to match my code and it did exactly nothing, thers no changes in data output and theres no results, nothing has happened, heres what i wrote:
%macro drop_missing_pct(input_dsn =, output_dsn=, pct = , id_vars=);
*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=sklejone;
proc freq data=&INPUT_DSN. (drop = &ID_Vars);
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set sklejone;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
*not required for display purposes;
proc sort data=long;
by variable;
run;
*select variables more than x% missing;
proc sql noprint;
select variable into :drop_var_list separated by " "
from long where variable_value = 'Missing' and percent > &pct;
quit;
*Drop variables;
data &output_dsn;
set &input_dsn;
drop &drop_var_list;
run;
clean up;
proc sql;
drop table long;
drop table sklejone;
quit;
%mend;
it didn't drop out missing data in my "sklejone" data set, it stayed completelly unchanged, same variables and values number
you stated that id_vars= is for variables i do not want included, well i dont have any that match that requirement since there's 2310 of them and when i left it witout stating any name it came out as multiple errors,
ERROR: The variable Name in the DROP, KEEP, or RENAME list has never been referenced.
ERROR: No data set open to look up variables.
ERROR: File WORK.TEMP.DATA does not exist
ERROR 22-322: Syntax error, expecting one of the following: name, ;, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_.
ERROR: File WORK.TEMP.DATA does not exist
all of tehm refere to line:
%drop_missing_pct(input_dsn = sklejone, output_dsn = bez_brk, pct = 40, id_vars = Name);
so what should i do if i dont want to exclude any variables, or i should know what i should do here, but didn't get it because of my stupidity?
%drop_missing_pct(input_dsn = sklejone, output_dsn = bez_brk, pct = 40, id_vars = );
Leave it blank then. Passing in something that doesn't exist obviously won't work. Worse case, add a dummy variable but either way, this works for me. If you're getting errors with this, you need to post your full log.
ok i putted name= period there and it went through BUT the M values are still there, it didnt recognize them as "missing"
They are all numeric somehow
i tried this
value $ missfmt 'M'="Missing" other="Not Missing";
but that doesn't do anything, so that puts us almost entirely back to figuring out how to change this outcome
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.