all i did was filling: "%macro drop_missing_pct(input_dsn= have, output_dsn=want, pct = 0,4 , id_vars=);"
my code is:
data surowe_train;
set 'C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_train';
run;
data surowe_valid;
set 'C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_valid';
run;
proc sql noprint;
create table sklejone as
select a.*, b.* from surowe_train as a left join surowe_valid as b
On a.cid=b.cid;
quit;
log from it is:
NOTE: Writing HTML5(EGHTML) Body file: EGHTML
27
28 data surowe_train;
29 set 'C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_train';
30 run;
NOTE: There were 52841 observations read from the data set
C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_train.
NOTE: The data set WORK.SUROWE_TRAIN has 52841 observations and 2310 variables.
NOTE: Instrukcja DATA zajęła (całkowity czas przetwarzania):
real time 9.87 seconds
cpu time 3.18 seconds
31
32 data surowe_valid;
33 set 'C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_valid';
34 run;
NOTE: There were 53070 observations read from the data set
C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_valid.
NOTE: The data set WORK.SUROWE_VALID has 53070 observations and 2310 variables.
NOTE: Instrukcja DATA zajęła (całkowity czas przetwarzania):
real time 9.49 seconds
cpu time 3.32 seconds
35
36 proc sql noprint;
37 create table sklejone as
38 select a.*, b.* from surowe_train as a left join surowe_valid as b
39 On a.cid=b.cid;
then there is a few thousand warnings about data already existing in WORK.SKLEJONE>.(it has to do with my attempts to properly merge two files, i propaby created new library).
theres no log from your code because it had error(i suppose that is the reason, because i can olnly see error beside whole code in the log)
Take this code and post it in a brand new, clean SAS session.
Run it, and post the log from the output. Do not change ANYTHING.
%macro drop_missing_pct(input_dsn = , output_dsn=, pct = , id_vars=);
*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;
proc freq data=&INPUT_DSN. (drop = &ID_Vars);
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set temp;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
*not required for display purposes;
proc sort data=long;
by variable;
run;
*select variables more than x% missing;
proc sql noprint;
select variable into :drop_var_list separated by " "
from long where variable_value = 'Missing' and percent > &pct;
quit;
*Drop variables;
data &output_dsn;
set &input_dsn;
drop &drop_var_list;
run;
*clean up;
*uncomment after testing;
/* proc sql; */
/* drop table long; */
/* drop table temp; */
/* quit; */
%mend;
data class;
set sashelp.class;
if age=14 then
call missing(height, weight, sex);
if name='Alfred' then
call missing(sex, age, height);
label age="Fancy Age Label";
run;
%drop_missing_pct(input_dsn = class, output_dsn = want, pct = 20, id_vars = Name);
*check output;
proc contents data=want;
run;
title 'Missing Report for data set';
proc print data=long;
run;
title 'Data set with columns missing more than 20% dropped';
proc print data=want (obs=5);
run;
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 19 observations and 5 variables.
NOTE: Instrukcja DATA zajęła (całkowity czas przetwarzania):
real time 0.07 seconds
cpu time 0.00 seconds
97
98
99 %drop_missing_pct(input_dsn = class, output_dsn = want, pct = 20, id_vars = Name);
NOTE: Format $MISSFMT has been output.
NOTE: Format NMISSFMT has been output.
NOTE: PROCEDURE FORMAT zajęła (całkowity czas przetwarzania):
real time 0.03 seconds
cpu time 0.00 seconds
3 System SAS 20:04 Tuesday, January 19, 2021
NOTE: The data set WORK.TEMP has 8 observations and 13 variables.
NOTE: There were 19 observations read from the data set WORK.CLASS.
NOTE: PROCEDURE FREQ zajęła (całkowity czas przetwarzania):
real time 0.28 seconds
cpu time 0.04 seconds
NOTE: There were 8 observations read from the data set WORK.TEMP.
NOTE: The data set WORK.LONG has 8 observations and 7 variables.
NOTE: Instrukcja DATA zajęła (całkowity czas przetwarzania):
real time 0.10 seconds
cpu time 0.01 seconds
NOTE: There were 8 observations read from the data set WORK.LONG.
NOTE: The data set WORK.LONG has 8 observations and 7 variables.
NOTE: PROCEDURE SORT zajęła (całkowity czas przetwarzania):
real time 0.03 seconds
cpu time 0.01 seconds
NOTE: PROCEDURE SQL zajęła (całkowity czas przetwarzania):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: There were 19 observations read from the data set WORK.CLASS.
NOTE: The data set WORK.WANT has 19 observations and 2 variables.
NOTE: Instrukcja DATA zajęła (całkowity czas przetwarzania):
real time 0.01 seconds
cpu time 0.00 seconds
100
101
102
103 *check output;
104 proc contents data=want;
105 run;
NOTE: PROCEDURE CONTENTS zajęła (całkowity czas przetwarzania):
real time 0.19 seconds
cpu time 0.06 seconds
106
107 title 'Missing Report for data set';
108 proc print data=long;
109 run;
NOTE: There were 8 observations read from the data set WORK.LONG.
NOTE: PROCEDURE PRINT zajęła (całkowity czas przetwarzania):
real time 0.08 seconds
cpu time 0.03 seconds
4 System SAS 20:04 Tuesday, January 19, 2021
110
111 title 'Data set with columns missing more than 20% dropped';
112 proc print data=want (obs=5);
113 run;
NOTE: There were 5 observations read from the data set WORK.WANT.
NOTE: PROCEDURE PRINT zajęła (całkowity czas przetwarzania):
real time 0.01 seconds
cpu time 0.03 seconds
so how should i apply this data to match my code and get some results instead of errors?
well i run it, changed it to match my code and it did exactly nothing, thers no changes in data output and theres no results, nothing has happened, heres what i wrote:
%macro drop_missing_pct(input_dsn =, output_dsn=, pct = , id_vars=);
*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;
*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=sklejone;
proc freq data=&INPUT_DSN. (drop = &ID_Vars);
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;
ods select all;
*Format output;
data long;
length variable $32. variable_value $50.;
set sklejone;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;
*not required for display purposes;
proc sort data=long;
by variable;
run;
*select variables more than x% missing;
proc sql noprint;
select variable into :drop_var_list separated by " "
from long where variable_value = 'Missing' and percent > &pct;
quit;
*Drop variables;
data &output_dsn;
set &input_dsn;
drop &drop_var_list;
run;
clean up;
proc sql;
drop table long;
drop table sklejone;
quit;
%mend;
it didn't drop out missing data in my "sklejone" data set, it stayed completelly unchanged, same variables and values number
you stated that id_vars= is for variables i do not want included, well i dont have any that match that requirement since there's 2310 of them and when i left it witout stating any name it came out as multiple errors,
ERROR: The variable Name in the DROP, KEEP, or RENAME list has never been referenced.
ERROR: No data set open to look up variables.
ERROR: File WORK.TEMP.DATA does not exist
ERROR 22-322: Syntax error, expecting one of the following: name, ;, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_.
ERROR: File WORK.TEMP.DATA does not exist
all of tehm refere to line:
%drop_missing_pct(input_dsn = sklejone, output_dsn = bez_brk, pct = 40, id_vars = Name);
so what should i do if i dont want to exclude any variables, or i should know what i should do here, but didn't get it because of my stupidity?
%drop_missing_pct(input_dsn = sklejone, output_dsn = bez_brk, pct = 40, id_vars = );
Leave it blank then. Passing in something that doesn't exist obviously won't work. Worse case, add a dummy variable but either way, this works for me. If you're getting errors with this, you need to post your full log.
ok i putted name= period there and it went through BUT the M values are still there, it didnt recognize them as "missing"
They are all numeric somehow
i tried this
value $ missfmt 'M'="Missing" other="Not Missing";
but that doesn't do anything, so that puts us almost entirely back to figuring out how to change this outcome
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.