Solved: Re: Create a new variable based on value of a specific observation in ...

Report Inappropriate Content · Posted 05-05-2014 03:30 PM

This seems like easy one, but I have not found the solution yet.

I want to create a new variable (numeric) based on the value of another variable (numeric) for a particular observation (numeric).

So I want to create x2 in the following:

Group Group_Value X2

1 3 6

2 6 6

3 1 6

4 0 6

5 1 6

6 7 6

7 2 6

So the statement would says if group = 2 then x2 = the Group_value for that observation. Thus X2 = 6 for all groups now because that is the group_value for group 2 that we told it to populate for all groups.

I can describe in more detail if needed, but felt the topic was simple enough that this should cover it.

Linlin · Posted 05-05-2014 03:40 PM

data have;
input Group    : Group_Value;
cards;

1               3
2               6
3               1
4               0
5               1
6               7
7               2
;
data _null_;
set have(where=(group=2));
call symputx('group2',group_value);

data want;
set have;
x2=&group2;
proc print;run;

View solution in original post

Linlin · Posted 05-05-2014 03:40 PM

data have;
input Group    : Group_Value;
cards;

1               3
2               6
3               1
4               0
5               1
6               7
7               2
;
data _null_;
set have(where=(group=2));
call symputx('group2',group_value);

data want;
set have;
x2=&group2;
proc print;run;

Report Inappropriate Content · Posted 05-05-2014 04:01 PM

Bingo. Thank you that worked. However, I will put a twist in this. What if I wanted another variable to be created based another group=2 variable.

Also, I want eventually to created other values based on values for different groups see following:

Group x1 y1 x2 y2 x3 y3

1 3 5 6 5 7 1

2 6 5 6 5 7 1

3 1 2 6 5 7 1

4 0 4 6 5 7 1

5 1 7 6 5 7 1

6 7 1 6 5 7 1

7 2 9 6 5 7 1

So I have group, x1 and y1 and x1 is pretty much the same thing as group_value in prior example. Now I want x2 and y2 to = values of x1 and y1, respectively if group = 2 AND x3 and y3 to = values of x1 and y1, respectively for group = 6. I can add more information if needed.

PGStats · Posted 05-05-2014 04:10 PM

Hi H. What is the ultimate goal of all these rules?

PG

Report Inappropriate Content · Posted 05-05-2014 04:26 PM

Excellent question. I have a list of zip codes for centers((buildings) aka the group variable), I also have their longitudes and latitudes via sashelp.zipcode. I am then going to calculate distances between all centers ~ 100 and a select list of ~ 10 centers. Next I want to determine how far away the closes of the 10 select centers are to each of the ~ 100 centers.

So what I am doing now is creating a longitude and latitudes variable for each of the ~ 10 select centers (x1-x10 and y1-y10), which I will then use to calculate distances with then create a new variable populated by the value in miles of the closes select center to each of the ~ 100 centers

center = a builiding location.

Report Inappropriate Content · Posted 05-05-2014 04:42 PM

Thank you Linlin, after I finally inputted the correct names into the variables for my dataset, this seems to work perfectly.

Much appreciation!

PGStats · Posted 05-05-2014 05:56 PM

Your approach seems overly complicated. If you have two datasets, allCenters and selectCenters, both with variables (Name, ZIP, lat, long), you can find the nearest centers in a single step:

proc sql;

create table nearest as

select

s.name as selectName,

a.name as nearName,

geodist(s.lat, s.long, a.lat, a.long, "M") as miles

from allCenters as a cross join selectCenters as s

where s.name ne a.name

group by s.name

having calculated miles = min(calculated miles);

select * from nearest;

quit;

PG

Report Inappropriate Content · Posted 05-06-2014 10:31 AM

Thank you to everyone, I was able to get my desired variable. I ended up using Linlin's last bit of code along with: comment by Dan at the following: http://blogs.sas.com/content/iml/2012/05/21/find-the-minumum-value-in-each-row/

Yes, PGSTAT, my data management skills usually leave me over complicating tasks. As I mentioned in a previous post, most of these tasks I perform so sporadically that they are not second nature. I will probably end up sitting on this code and reusing it about once a year, so I end-up being a sub-mediocre data management person. However, if time allows - I will try out your code option as well since as of now I also incorporate the general round distance code with all of its sines and cosines.

Also, I am determining distances based on zip codes, which is less ideal than using addresses. Does anyone know of a way to use addresses, and pull their longitudes and latitudes from SAS to get more exact distances?

Report Inappropriate Content · Posted 05-06-2014 11:54 AM

PGSTAT,

Follow-up, I tried your code and it worked. I only had to switch the order of the datasets to get the nearest center to each of the observations listed in allcenters instead of selectCenters and change the data format to radians ("RM").

My follow-up question is I calculate the distances using

geodist function (you provided)

Great Circle Distance Formula (http://support.sas.com/kb/3/973.html)

DISTANCE = 4000*ARCOS(arc); arc = SIN(Y2)*SIN(Y1) + COS(Y2)*COS(Y1)*COS(X2-X1)

(http://support.sas.com/kb/3/973.html)

Do you or anyone else have any preferences for these three methods, I looked up one distance that I knew in my head to be about two miles by foot in real life (not based on zip code centers) and all three were 1.6158, 1.6078, and 162.81, miles respectively.

I think for all intents and purposes, I should be fine with any of the methods, but did not know if anyone knew if one was confirmed to be better (I am in Midwestern United States)?

PGStats · Posted 05-06-2014 01:35 PM

Well, only birds and planes travel along great circles, and even they have to worry about winds . Others will care more about getting an answer quickly. Given the reference quoted in GEODIST documentation, I would think that GEODIST is the most efficient approximation. It remains to be tested.

PG

TomKari · Posted 05-06-2014 02:00 PM

Very interesting! I didn't know about the GEODIST function, but now that I do, I LIKE it!

I gave the three a try with two places I travel between, and my results were:

GEODIST function: 1202.8013791
Great Circle Distance: 1201.1238757
4000*ARCOS(arc); arc = SIN(Y2)*SIN(Y1) + COS(Y2)*COS(Y1)*COS(X2-X1): 1216.3310547

Since Note 3973 says that the great circle formula more accurately accounts for curvature, and since it's so close to GEODIST, I'd rule out formula #3. As the other two are within .2%, I doubt there's any way to tell which is more accurate!

Personally, I'd go with the GEODIST function, because if any improvements to the methodology are found, SAS will build them into the formula, and you'll automatically derive the benefit.

Tom

P.S. Unless you work for the U.S. DOD, in which case I suspect you have access to information that is MUCH MORE ACCURATE!! 😉

Linlin · Posted 05-06-2014 04:34 PM

Hi Tom,

You are not alone, I didn’t know about the GEODIST function neither.Thanks to PG we all know now. I strongly suggest H mark PG’s answer as correct answer.

Report Inappropriate Content · Posted 05-07-2014 09:20 AM

Well prior to my last post I had marked PG's post as helpful and "Like", but did not see a "Correct Answer" option. Please let me know where that option is and I will mark it accordingly. Or is marking it correct not an option since I marked Linlin's original post as correct?

TomKari · Posted 05-07-2014 11:18 AM

Yes, you can only mark one answer as "correct". I don't know if there's a way to switch it, but to be honest I wouldn't worry about it.

Tom

AnnaBrown · Posted 05-07-2014 12:49 PM

H,

You can certainly not worry about but if you'd like to give this a shot, this may work for you to ID the right answer/s:

-Unmark the initial "correct answer" from Linlin

-Unmark the "helpful answer" from PGStats

-Mark PGStats' answer as correct

-Mark Linlin's answer as helpful if you'd like

I think that should work...

Anna

Access SAS Innovate on-demand content now!

Create a new variable based on value of a specific observation in another variable

Re: Create a new variable based on value of a specific observation in another variable

Re: Create a new variable based on value of a specific observation in another variable

Re: Create a new variable based on value of a specific observation in another variable

Re: Create a new variable based on value of a specific observation in another variable

Re: Create a new variable based on value of a specific observation in another variable

Re: Create a new variable based on value of a specific observation in another variable

Re: Create a new variable based on value of a specific observation in another variable

Re: Create a new variable based on value of a specific observation in another variable

Re: Create a new variable based on value of a specific observation in another variable

Re: Create a new variable based on value of a specific observation in another variable

Re: Create a new variable based on value of a specific observation in another variable

Re: Create a new variable based on value of a specific observation in another variable

Re: Create a new variable based on value of a specific observation in another variable

Re: Create a new variable based on value of a specific observation in another variable

Re: Create a new variable based on value of a specific observation in another variable

Registration is open