BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Fersal
Calcite | Level 5

Hi all,

I have a dataset from which I wish to drop (for subsequent analysis) those values of the variables with only one observation, that is with a Freq count of 1. In following table there is a similar example. The values in bold are those I do not want to consider in my analysis. I appreciate any suggestion. Thanks

 

Trait Var1 Var2 ...Var300
1 200/200 120/120 100/100
1 150/150 140/140 99/99
1 100/100 180/180 135/135
1 150/150 120/120 100/100
1 100/100 180/180 160/160
1 150/150 119/119 160/160
1 200/200 140/140 135/135
0 150/150 180/180
0 149/149 140/140 160/160
0 100/100 120/120 133/133
0 100/100 140/140 100/100
0 200/200 180/180 135/135
0 150/150 140/140 160/160
0 200/200 180/180 100/100
0 100/100 120/120 135/135

1 ACCEPTED SOLUTION

Accepted Solutions
Loko
Barite | Level 11

Hello,

 

It need a little bit of refinement (not creating the macro variable allvars manually), but may be a solution:

 


data have;
infile datalines missover;
input Trait $ Var1 :$7. Var2 :$7. Var300 :$7.;
datalines;
1 200/200 120/120 100/100
1 150/150 140/140 99/99
1 100/100 180/180 135/135
1 150/150 120/120 100/100
1 100/100 180/180 160/160
1 150/150 119/119 160/160
1 200/200 140/140 135/135
0 150/150 180/180
0 149/149 140/140 160/160
0 100/100 120/120 133/133
0 100/100 140/140 100/100
0 200/200 180/180 135/135
0 150/150 140/140 160/160
0 200/200 180/180 100/100
0 100/100 120/120 135/135
run;

%let allvars=Var1 Var2 Var300;

%macro h;

%let i=1;



%do %while (%scan(&allvars,&i) ne);

%let var=%scan(&allvars,&i);
proc sql;

create table int as
select &var, count(&var) as no
from have
group by &var
having no eq 1;

quit;

data have;

set have;

if _n_=1 then
 do;
     declare hash i(dataset:'int');
    i.definekey("&var");
    i.definedone();

 end;
 
rc=i.check();

if rc=0 then call missing(&var);

run;

%let i=%eval(&i+1);
%end;


%mend h;

%h

View solution in original post

7 REPLIES 7
Loko
Barite | Level 11

Hello,

 

It need a little bit of refinement (not creating the macro variable allvars manually), but may be a solution:

 


data have;
infile datalines missover;
input Trait $ Var1 :$7. Var2 :$7. Var300 :$7.;
datalines;
1 200/200 120/120 100/100
1 150/150 140/140 99/99
1 100/100 180/180 135/135
1 150/150 120/120 100/100
1 100/100 180/180 160/160
1 150/150 119/119 160/160
1 200/200 140/140 135/135
0 150/150 180/180
0 149/149 140/140 160/160
0 100/100 120/120 133/133
0 100/100 140/140 100/100
0 200/200 180/180 135/135
0 150/150 140/140 160/160
0 200/200 180/180 100/100
0 100/100 120/120 135/135
run;

%let allvars=Var1 Var2 Var300;

%macro h;

%let i=1;



%do %while (%scan(&allvars,&i) ne);

%let var=%scan(&allvars,&i);
proc sql;

create table int as
select &var, count(&var) as no
from have
group by &var
having no eq 1;

quit;

data have;

set have;

if _n_=1 then
 do;
     declare hash i(dataset:'int');
    i.definekey("&var");
    i.definedone();

 end;
 
rc=i.check();

if rc=0 then call missing(&var);

run;

%let i=%eval(&i+1);
%end;


%mend h;

%h

Fersal
Calcite | Level 5

Hi Loko, thank you for your macro, it worked well, one more thing, if I want to remove those variable with only one class, e.g. suppose a variable (look below) with only two classes, this one is going to stay with an unique class (170/170) after running the macro, how I could drop those variables from my data using the same macro?

 

Var4
170/170
170/170
170/170
178/178
170/170
170/170
170/170
170/170
170/170
170/170
170/170
170/170
170/170
170/170
170/170

Reeza
Super User

Look at proc sort with nounique option. 

You'll have to run multiple iterations for each variable so maintaining a macro loop is probably still required. 

FreelanceReinh
Jade | Level 19

Hi @Fersal,

 

First of all, you should be sure that your analysis will still be valid after excluding values just because they happen to occur only once within a trait.

 

If your dataset contains only 7-8 observations per trait (as your sample data), isn't it quite likely that some valid values (in some of the 300 variables) will occur only once? On the other hand, if your real dataset has thousands of observations, some "improbable" values might occur 2, 3 or 4 times, or only once, just by chance.

 

What I'm trying to say is that the rule "exclude values with frequency count 1" is not necessarily sensible.

 

Maybe you can specify exclusion criteria based on the values themselves and apply these by means of (possibly user-defined) functions or formats.

 

Also, it could simplify the code if you transposed the data from wide to long format (esp. if the 300 variables have identical types and lengths).

Fersal
Calcite | Level 5

 

Hi FreelanceReinhard,

Thanks for your comments, actually the original dataset is pretty big, and the number of observation is much higher than number of variables (not like the example I showed). I have had a convergence problem with this data when I try to fit them to a logit model (even after apply the Firth option), so there is not likelihood estimates for these variables, for that reason I want to know what happen if I remove those classes.

 

Regards

 

 

Reeza
Super User

I thought it was 0 cells that caused the issue, not 1's?

 

PGStats
Opal | Level 21

If you still want to do it:

 

data have;
infile datalines truncover;
input Trait (Var1 - Var3) ($);
datalines;
1 200/200 120/120 100/100
1 150/150 140/140 99/99
1 100/100 180/180 135/135
1 150/150 120/120 100/100
1 100/100 180/180 160/160
1 150/150 119/119 160/160
1 200/200 140/140 135/135
0 150/150 180/180
0 149/149 140/140 160/160
0 100/100 120/120 133/133
0 100/100 140/140 100/100
0 200/200 180/180 135/135
0 150/150 140/140 160/160
0 200/200 180/180 100/100
0 100/100 120/120 135/135
;

data list;
set have;
obs = _n_;
array a{*} var:;
do i = 1 to dim(a);
    if not missing(a{i}) then do;
        value = a{i};
        output;
        end; 
    end;
keep Trait obs value;
run;

proc sql;
create table newList as
select Trait, obs, value
from list
group by Trait, value
having count(*) > 1
order by Trait, obs, value;
quit;

proc transpose data=newList out=want(drop=_name_ obs) prefix=var;
var value;
by Trait obs;
run;
PG

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 989 views
  • 2 likes
  • 5 in conversation