BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
chintanpatel
Calcite | Level 5

Hello,

I have two data set. dataout_mean and dataout_meanfinal. both has only one variable father_mean.

in dataout_mean there are 100 observations for variable father_mean.

in dataout_meanfinal i have mean value of this 100 observations for variable father_mean.

 

for the mean value that i have in dataset dataout_meanfinal, i want to find closest value in dataout_mean. can anyone tell me how it is possible ?

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21
proc sql;
  create table want as
    select a._imp_,a.father_mean1
      from dataset1 a, dataset2 b
        having abs(a.father_mean1-b.father_mean1) eq min(abs(a.father_mean1-b.father_mean1))
  ;
quit;

That will add _imp_ as well,

 

Art, CEO, AnalystFinder.com

 

View solution in original post

18 REPLIES 18
Reeza
Super User

1. Merge mean in with full dataset

2. Take difference between mean and all observations. 

3. Find one with minimum difference - but look at signs, ie difference can be positive or negative so consider absolute difference.

 

 

chintanpatel
Calcite | Level 5

May I ask you for a code please ?

Reeza
Super User

Give it a shot and I'll help debug. Or wait for someone else to answer 🙂

 

Here's step 1 to get you started:

 

https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset

chintanpatel
Calcite | Level 5

Hello,

I took out the value of mean and find the closest value from the data set. so I have only one observation and 5 variables in a dataset.

now based on a variable on this dataset i want to extract observation from other dataset can anyone help me with this ?

 

dataset1

_imputation_   _freq_  father_mean  mother_mean

        45               934      64.21321      65.34324

 

dataset2

_imputation_  fahter  mother  child

        1                64        62       45

        2                67        66       34

       45               45        23       23

       45               34        23       12

and so on in dataset2 i have 90,000 observation from which 934 observation has same _imputation_. I want to have it dynamic because everytime i run my program this value 45 will change. from dataset2 based on this value(45 for this case) i want to keep observation and rest all I want to delete.

anyhelp will be appreciated.

 

Reeza
Super User

I'm not following anymore. Can you please clarify your request? 

chintanpatel
Calcite | Level 5

i have two dataset. in both dataset there is one variable common _imputation_. In dataset2 i have too many observation and i want to keep only those observation where _imputation_ value is same in both dataset. rest I want to delete.

Patrick
Opal | Level 21

@chintanpatel

It would have been an act of courtesy for you to provide us with fully working datasets creating the sample data. I've done this now for you.

 

Does below give you what you're after?

data dataset1;
  input _imputation_ _freq_ father_mean mother_mean;
datalines;
45 934 64.21321 65.34324
;
run;
 
data dataset2;
  input _imputation_ father mother child;
datalines;
1 64 62 45
2 67 66 34
45 45 23 23
45 34 23 12
;
run;


proc sql;
  create table want as
    select
      ds2.*,
      ds1.father_mean,
      ds1.mother_mean
    from 
      dataset1 as ds1 inner join dataset2 as ds2
        on ds1._imputation_=ds2._imputation_
  ;
quit;

 

chintanpatel
Calcite | Level 5
I try my best to explain. First my English is not so good. and second I am new to data analysis and sas. please forgive me.
art297
Opal | Level 21

Is the following all you are trying to do?:

 

data dataset1;
  input _imputation_   _freq_  father_mean  mother_mean;
  cards;
        45               934      64.21321      65.34324
 ;

data dataset2;
  input _imputation_  fahter  mother  child;
  cards;
        1                64        62       45
        2                67        66       34
       45               45        23       23
       45               34        23       12
;

data want;
  merge dataset1 (in=ina) dataset2 (in=inb);
  by _imputation_;
  if ina and inb;
run;

Art, CEO, AnalystFinder.com

 

chintanpatel
Calcite | Level 5

I have two data set.

 

dataset1                              dataset2

column1                               column1

0.693                                   0.604

0.654

0.567

0.754

0.845

 

i want to select closest value of dataset2 in dataset1.

 

chintanpatel
Calcite | Level 5
In dataset1 I want to select closest value to dataset2.
Patrick
Opal | Level 21

@chintanpatel

You've already accepted an answer as solution. Are you now telling us that your requirements changed and that this is no more the solution?

If so then can I suggest that you open a new thread with your new requirements (and link to this already solved thread in your question).

 

Also: Please provide the data AND especially show us how the desired output should look like (which rows with which data in it)

Reeza
Super User

@chintanpatel wrote:

I have two data set.

 

dataset1                              dataset2

column1                               column1

0.693                                   0.604

0.654

0.567

0.754

0.845

 

i want to select closest value of dataset2 in dataset1.

 


This was my initial assumption of what you were trying to do. 

But your father mother data set didn't align with this and the solution you've marked as correct doesn't do this. So I'm confused again. 

 

It has nothing to do with English, but probably more related to being a beginner at analysis. 

Some good guidelines on asking questions is here : https://stackoverflow.com/help/how-to-ask

 

 

 

chintanpatel
Calcite | Level 5

I am writing it again. I did accept it because for the second question I asked he gave me the right solution for it.

Here is my full question

I have two dataset in which i want to find closest or nearest value in dataset1 based on dataset2. My dataset have over 90,000 values. For easy to understand and answer i am just writing a part of dataset.

in dataset1 I have three variables and 5 observations.

_imp_   father_mean1

   46          69.132352143

   51          69.173300123

   37          69.180984823

   105        69.123531423

   981        69.162347234

 

 

in dataset2 I have two varibales and one observation

 _freq_    father_mean1

   9           69.174400123

 

As an output in dataset1 I want to find closest value for father_mean on the bases of father_mean in dataset2. so the output should be

_imp_       father_mean     

   51         69.173300123

 

this was the first part and for second part I got the right solution. please.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 4919 views
  • 0 likes
  • 4 in conversation