
Data Manipulation

Regular Contributor
Posts: 184


I have 3 columns of data, shown in the table below, in the same format as they exist in my data set. For each cluster, I want the minimum 1-R**2 Ratio value, with its variable, listed against the cluster name.

Cluster     Variable    1-R**2 Ratio
Cluster 1   brclus2     0.4901
            MIPhone     0.005
            MIPOS       0.005
            MIPOSAmt    0.005
            MICC        0.005
            MICCBal     0.005
            MICCPurc    0.005
Cluster 2   HMVal       0.3206
            Age         0.4632
            MIIncome    0.0247
            MIHMOwn     0.0811
            MILORes     0.0247
            MIHMVal     0.0247
            MIAge       0.1094
Cluster 3   Dep         0.4235
            Checks      0.3383
            Teller      0.5586
Cluster 4   MTGBal      0.039

The desired output is shown in the table below :

Cluster     Variable    1-R**2 Ratio
Cluster 1   MIPhone     0.005
Cluster 2   MIIncome    0.0247
Cluster 3   Checks      0.3383
Cluster 4   MTGBal      0.039

Thanks in anticipation!

Super User
Posts: 5,437

Re: Data Manipulation

So your input data is like a report?

Use retain in the data step to get the cluster name/no on each line.

But how do you choose which variable's 1-R**2 Ratio value to keep when several share the same value? By original order?

If so, you could use a couple of different techniques. One is to use retain again to keep track of the minimum value, then read the output data set.

Another is to use SQL. Use monotonic() to keep track of the original sort order, and use having min(row_no) = row_no and min(1-R**2 Ratio) = 1-R**2 Ratio.
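A minimal sketch of the SQL idea, under some assumptions: the cluster name is already filled in on every row (the data set name filled and the few rows below are only for illustration), and the row number is captured with _n_ in a data step rather than with monotonic(), which is an undocumented function. Note that putting both conditions in a single HAVING clause would return nothing for a cluster whose first row is not its minimum, so the sketch applies them in two stages.

```sas
/* Assumed input: cluster name present on every row (subset of the posted data) */
data filled;
   input Cluster &$10. Variable $ Ratio;
   datalines;
Cluster 1   brclus2  0.4901
Cluster 1   MIPhone  0.005
Cluster 1   MIPOS    0.005
Cluster 3   Dep      0.4235
Cluster 3   Checks   0.3383
Cluster 3   Teller   0.5586
;
run;

/* Record the original row order; row_no is a hypothetical helper column */
data numbered;
   set filled;
   row_no = _n_;
run;

proc sql;
   create table lowest as
   select Cluster, Variable, Ratio
   from (select *                       /* stage 1: rows at the cluster minimum */
         from numbered
         group by Cluster
         having Ratio = min(Ratio))
   group by Cluster
   having row_no = min(row_no)          /* stage 2: earliest of any tied rows */
   order by Cluster;
quit;

proc print data=lowest;
run;
```

PROC SQL remerges the summary statistics with the detail rows here and says so in a log note; that is expected for this kind of query.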

Data never sleeps
Occasional Contributor
Posts: 17

Re: Data Manipulation

Hi,

How can we retain the cluster values? Can you please explain with SAS code?

Sorry, but I am new, so please don't mind.

Regards,

Yogesh

Super User
Posts: 5,437

Re: Data Manipulation

Posted in reply to shubhayog

Go to support.sas.com.

Search for the RETAIN statement. There you will find the syntax and plenty of samples.

Judging by your response, I think you need some basic SAS programming training.

Respected Advisor
Posts: 3,799

Re: Data Manipulation

Posted in reply to shubhayog

RETAIN can't magically fix your data problem as implied by @linus; you will need some kind of additional logic. I like to use UPDATE for this; it can fix a number of variables without having to do too much work.

data cluster;
   input Cluster &$10.  Variable $ Ratio;
   cards;
Cluster 1   brclus2  0.4901
.  MIPhone  0.005
.  MIPOS 0.005
.  MIPOSAmt 0.005
.  MICC  0.005
.  MICCBal  0.005
.  MICCPurc 0.005
Cluster 2   HMVal 0.3206
.  Age   0.4632
.  MIIncome 0.0247
.  MIHMOwn  0.0811
.  MILORes  0.0247
.  MIHMVal  0.0247
.  MIAge 0.1094
Cluster 3   Dep   0.4235
.  Checks   0.3383
.  Teller   0.5586
Cluster 4   MTGBal   0.039
;;;;
   run;
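proc print;
   run;

/* View holding only the cluster name plus a constant BY variable */
data cv / view=cv;
   retain dummy 1;
   set cluster;
   keep dummy cluster;
   run;

/* UPDATE ignores missing transaction values, so the last nonmissing
   cluster name carries forward; the second SET restores the other columns */
data filled;
   update cv(obs=0) cv;
   by dummy;
   set cluster(drop=cluster);
   output;
   drop dummy;
   run;

proc print;
   run;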
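Once the cluster name is on every row, the retain-based minimum tracking mentioned earlier in the thread could look roughly like the sketch below. It is only an illustration: the rows are a subset of the posted data, the names cur_var, cur_ratio and lowest are made up here, and it assumes no missing ratios and that all rows of a cluster are contiguous.

```sas
/* Assumed input: output of the fill step, cluster name on every row */
data filled;
   input Cluster &$10. Variable $ Ratio;
   datalines;
Cluster 2   HMVal     0.3206
Cluster 2   Age       0.4632
Cluster 2   MIIncome  0.0247
Cluster 2   MIHMOwn   0.0811
Cluster 2   MILORes   0.0247
Cluster 4   MTGBal    0.039
;
run;

data lowest;
   /* Read the incoming values under temporary names (hypothetical) */
   set filled(rename=(Variable=cur_var Ratio=cur_ratio));
   by Cluster notsorted;           /* groups are contiguous, not necessarily sorted */
   length Variable $8;
   retain Variable Ratio;          /* running minimum for the current cluster */
   if first.Cluster or cur_ratio < Ratio then do;
      Variable = cur_var;          /* strict < keeps the first of any tied rows */
      Ratio    = cur_ratio;
   end;
   if last.Cluster then output;    /* one row per cluster */
   drop cur_var cur_ratio;
run;

proc print data=lowest;
run;
```

Because the comparison is a strict less-than, ties resolve to the variable that appears first in the original order, which matches the desired output in the question (for example MIIncome rather than MILORes for Cluster 2).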

Occasional Contributor
Posts: 17

Re: Data Manipulation

Posted in reply to data_null__

Hi,

Thank you very much for the help.

Regards,

Yogesh

Frequent Contributor
Posts: 95

Re: Data Manipulation

Something like this. This doesn't answer multiple variables with the same low ratio. Jim

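    data;
       length cluster $10;                          **** room for 'Cluster 1';
       retain cluster;                              **** carry cluster name down;
       input @1 clu $10.  @20 variable $8.  @35 ratio;
       if upcase(clu)=:'CLUST' then cluster=clu;    **** grab cluster;
       **** data lines go after DATALINES (not included in the post);
       datalines;

    ;
    proc sort;  by cluster ratio;
    data new;  set;  by cluster;
       if first.cluster then output;                **** lowest ratio;
    proc print;  run;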
