BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Antoni
Fluorite | Level 6

all i did was filling: "%macro drop_missing_pct(input_dsn= have, output_dsn=want, pct = 0,4 , id_vars=);"

my code is:

data surowe_train;
set 'C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_train';
run;

data surowe_valid;
set 'C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_valid';
run;

proc sql noprint;
create table sklejone as
select a.*, b.* from surowe_train as a left join surowe_valid as b
On a.cid=b.cid;
quit;

 

log from it is:

 

NOTE: Writing HTML5(EGHTML) Body file: EGHTML
27
28 data surowe_train;
29 set 'C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_train';
30 run;

NOTE: There were 52841 observations read from the data set
C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_train.
NOTE: The data set WORK.SUROWE_TRAIN has 52841 observations and 2310 variables.
NOTE: Instrukcja DATA zajęła (całkowity czas przetwarzania):
real time 9.87 seconds
cpu time 3.18 seconds

31
32 data surowe_valid;
33 set 'C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_valid';
34 run;

NOTE: There were 53070 observations read from the data set
C:\Users\Mateusz\Documents\Biblioteki\Raport\ABT_beh_sas\abt_sam_beh_valid.
NOTE: The data set WORK.SUROWE_VALID has 53070 observations and 2310 variables.
NOTE: Instrukcja DATA zajęła (całkowity czas przetwarzania):
real time 9.49 seconds
cpu time 3.32 seconds

35
36 proc sql noprint;
37 create table sklejone as
38 select a.*, b.* from surowe_train as a left join surowe_valid as b
39 On a.cid=b.cid;

then there is a few thousand warnings about data already existing in WORK.SKLEJONE>.(it has to do with my attempts to properly merge two files, i propaby created new library).

 

theres no log from your code because it had error(i suppose that is the reason, because i can olnly see error beside whole code in the log)

 

 

Reeza
Super User

Take this code and post it in a brand new, clean SAS session. 

Run it, and post the log from the output. Do not change ANYTHING. 

 



%macro drop_missing_pct(input_dsn = , output_dsn=, pct = , id_vars=);

*create format for missing;
proc format;
    value $ missfmt ' '="Missing" other="Not Missing";
    value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=temp;

proc freq data=&INPUT_DSN. (drop = &ID_Vars);
    table _all_ / missing;
    format _numeric_ nmissfmt. _character_ $missfmt.;
run;


ods select all;
*Format output;

data long;
    length variable $32. variable_value $50.;
    set temp;
    Variable=scan(table, 2);
    Variable_Value=strip(trim(vvaluex(variable)));
    presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
    keep variable variable_value frequency percent cum: presentation;
    label variable='Variable' variable_value='Variable Value';
run;

*not required for display purposes;
proc sort data=long;
    by variable;
run;

*select variables more than x% missing;
proc sql noprint;
select variable into :drop_var_list separated by " "
from long where variable_value = 'Missing' and percent > &pct;
quit;

*Drop variables;
data &output_dsn;
set &input_dsn;
drop &drop_var_list;
run;

*clean up;
*uncomment after testing;
/* proc sql; */
/* drop table long; */
/* drop table temp; */
/* quit; */

%mend;


data class;
    set sashelp.class;

    if age=14 then
        call missing(height, weight, sex);

    if name='Alfred' then
        call missing(sex, age, height);
    label age="Fancy Age Label";
run;


%drop_missing_pct(input_dsn = class, output_dsn = want, pct = 20, id_vars = Name);



*check output;
proc contents data=want;
run;

title 'Missing Report for data set';
proc print data=long;
run;

title 'Data set with columns missing more than 20% dropped';
proc print data=want (obs=5);
run;
Antoni
Fluorite | Level 6

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.CLASS has 19 observations and 5 variables.
NOTE: Instrukcja DATA zajęła (całkowity czas przetwarzania):
real time 0.07 seconds
cpu time 0.00 seconds

97
98
99 %drop_missing_pct(input_dsn = class, output_dsn = want, pct = 20, id_vars = Name);
NOTE: Format $MISSFMT has been output.
NOTE: Format NMISSFMT has been output.

NOTE: PROCEDURE FORMAT zajęła (całkowity czas przetwarzania):
real time 0.03 seconds
cpu time 0.00 seconds

3 System SAS 20:04 Tuesday, January 19, 2021


NOTE: The data set WORK.TEMP has 8 observations and 13 variables.
NOTE: There were 19 observations read from the data set WORK.CLASS.
NOTE: PROCEDURE FREQ zajęła (całkowity czas przetwarzania):
real time 0.28 seconds
cpu time 0.04 seconds


NOTE: There were 8 observations read from the data set WORK.TEMP.
NOTE: The data set WORK.LONG has 8 observations and 7 variables.
NOTE: Instrukcja DATA zajęła (całkowity czas przetwarzania):
real time 0.10 seconds
cpu time 0.01 seconds


NOTE: There were 8 observations read from the data set WORK.LONG.
NOTE: The data set WORK.LONG has 8 observations and 7 variables.
NOTE: PROCEDURE SORT zajęła (całkowity czas przetwarzania):
real time 0.03 seconds
cpu time 0.01 seconds

NOTE: PROCEDURE SQL zajęła (całkowity czas przetwarzania):
real time 0.00 seconds
cpu time 0.00 seconds


NOTE: There were 19 observations read from the data set WORK.CLASS.
NOTE: The data set WORK.WANT has 19 observations and 2 variables.
NOTE: Instrukcja DATA zajęła (całkowity czas przetwarzania):
real time 0.01 seconds
cpu time 0.00 seconds

100
101
102
103 *check output;
104 proc contents data=want;
105 run;

NOTE: PROCEDURE CONTENTS zajęła (całkowity czas przetwarzania):
real time 0.19 seconds
cpu time 0.06 seconds

106
107 title 'Missing Report for data set';
108 proc print data=long;
109 run;

NOTE: There were 8 observations read from the data set WORK.LONG.
NOTE: PROCEDURE PRINT zajęła (całkowity czas przetwarzania):
real time 0.08 seconds
cpu time 0.03 seconds
4 System SAS 20:04 Tuesday, January 19, 2021

 

110
111 title 'Data set with columns missing more than 20% dropped';
112 proc print data=want (obs=5);
113 run;

NOTE: There were 5 observations read from the data set WORK.WANT.
NOTE: PROCEDURE PRINT zajęła (całkowity czas przetwarzania):
real time 0.01 seconds
cpu time 0.03 seconds

Reeza
Super User
So that works fine.
Does the output match what you expect?
In this example I removed all variables with more than 20% missing and ignored the Name variable. Is that similar to your use case?
Antoni
Fluorite | Level 6

so how should i apply this data to match my code and get some results instead of errors?

Reeza
Super User
Go through the code to first understand it and then you should understand how to apply it to your data set.

If there's a portion you don't understand please ask.
Antoni
Fluorite | Level 6

well i run it, changed it to match my code and it did exactly nothing, thers no changes in data output and theres no results, nothing has happened, heres what i wrote:

%macro drop_missing_pct(input_dsn =, output_dsn=, pct = , id_vars=);

*create format for missing;
proc format;
value $ missfmt ' '="Missing" other="Not Missing";
value nmissfmt .="Missing" other="Not Missing";
run;

*Proc freq to count missing/non missing;
ods select none;
*turns off the output so the results do not get too messy;
ods table onewayfreqs=sklejone;

proc freq data=&INPUT_DSN. (drop = &ID_Vars);
table _all_ / missing;
format _numeric_ nmissfmt. _character_ $missfmt.;
run;


ods select all;
*Format output;

data long;
length variable $32. variable_value $50.;
set sklejone;
Variable=scan(table, 2);
Variable_Value=strip(trim(vvaluex(variable)));
presentation=catt(frequency, " (", trim(put(percent/100, percent7.1)), ")");
keep variable variable_value frequency percent cum: presentation;
label variable='Variable' variable_value='Variable Value';
run;

*not required for display purposes;
proc sort data=long;
by variable;
run;

*select variables more than x% missing;
proc sql noprint;
select variable into :drop_var_list separated by " "
from long where variable_value = 'Missing' and percent > &pct;
quit;

*Drop variables;
data &output_dsn;
set &input_dsn;
drop &drop_var_list;
run;

clean up;
proc sql;
drop table long;
drop table sklejone;
quit;

%mend;

 

it didn't drop out missing data in my "sklejone" data set, it stayed completelly unchanged, same variables and values number

Reeza
Super User
You never called the macro you only defined it.

Note the bottom portion of the code where I said test? That's the part that actually does the analysis and you don't have any of that. You shouldn't change anything in the macro definition code.


*macro definition - nothing happens after you run this part;
%macro demo_print(datasetName=);
proc print data=&datasetName(obs=5) label noobs;
run;
%mend;

*macro execution, actually does the work, using the parameters provided;
%demo_print(datasetName=sashelp.class);
*show an example with another one;
%demo_print(datasetName = sashelp.cars);





Antoni
Fluorite | Level 6

you stated that id_vars= is for variables i do not want included, well i dont have any that match that requirement since there's 2310 of them and when i left it witout stating any name it came out as multiple  errors, 

ERROR: The variable Name in the DROP, KEEP, or RENAME list has never been referenced.
ERROR: No data set open to look up variables.

ERROR: File WORK.TEMP.DATA does not exist
ERROR 22-322: Syntax error, expecting one of the following: name, ;, _ALL_, _CHARACTER_, _CHAR_, _NUMERIC_.

ERROR: File WORK.TEMP.DATA does not exist

all of tehm refere to line:

%drop_missing_pct(input_dsn = sklejone, output_dsn = bez_brk, pct = 40, id_vars = Name);

 

so what should i do if i dont want to exclude any variables, or i should know what i should do here, but didn't get it because of my stupidity?

Reeza
Super User
%drop_missing_pct(input_dsn = sklejone, output_dsn = bez_brk, pct = 40, id_vars = );

Leave it blank then. Passing in something that doesn't exist obviously won't work. Worse case, add a dummy variable but either way, this works for me. If you're getting errors with this, you need to post your full log.

Antoni
Fluorite | Level 6

ok i putted name= period there and it went through BUT the M values are still there, it didnt recognize them as "missing"

 

Reeza
Super User
Have you determined if it's M or the special missing value .M as others have previously mentioned?
What is the type of the variables with M? All of them.
Antoni
Fluorite | Level 6

They are all numeric somehow

Antoni_0-1611101352330.png

 



Antoni
Fluorite | Level 6

i tried this
value $ missfmt 'M'="Missing" other="Not Missing";
but that doesn't do anything, so that puts us almost entirely back to figuring out how to change this outcome

 

Reeza
Super User
Someone else posted the fix earlier, the format code needs to change to include .M not just the missing/period.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 46 replies
  • 2929 views
  • 14 likes
  • 6 in conversation