BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
H
Pyrite | Level 9 H
Pyrite | Level 9

This seems like easy one, but I have not found the solution yet.

I want to create a new variable (numeric) based on the value of another variable (numeric) for a particular observation (numeric).

So I want to create x2 in the following:

Group     Group_Value          X2

1               3                            6

2               6                            6

3               1                            6

4               0                            6

5               1                            6

6               7                            6

7               2                            6

So the statement would says if group = 2 then x2 = the Group_value for that observation.  Thus X2 = 6 for all groups now because that is the group_value for group 2 that we told it to populate for all groups.

I can describe in more detail if needed, but felt the topic was simple enough that this should cover it.

1 ACCEPTED SOLUTION

Accepted Solutions
Linlin
Lapis Lazuli | Level 10

data have;
input Group    : Group_Value;       
cards;

1               3                           
2               6                          
3               1                           
4               0                           
5               1                          
6               7                           
7               2
;
data _null_;
set have(where=(group=2));
call symputx('group2',group_value);

data want;
  set have;
  x2=&group2;
proc print;run;

View solution in original post

18 REPLIES 18
Linlin
Lapis Lazuli | Level 10

data have;
input Group    : Group_Value;       
cards;

1               3                           
2               6                          
3               1                           
4               0                           
5               1                          
6               7                           
7               2
;
data _null_;
set have(where=(group=2));
call symputx('group2',group_value);

data want;
  set have;
  x2=&group2;
proc print;run;

H
Pyrite | Level 9 H
Pyrite | Level 9

Bingo. Thank you that worked. However, I will put a twist in this.  What if I wanted another variable to be created based another group=2 variable.

Also, I want eventually to created other values based on values for different groups see following:

Group     x1          y1            x2          y2          x3         y3

1               3        5               6           5             7          1

2               6        5               6           5             7          1

3               1        2               6           5             7          1

4               0        4               6           5             7          1

5               1        7               6           5             7          1

6               7        1               6           5             7          1

7               2        9               6           5             7          1

So I have group, x1 and y1 and x1 is pretty much the same thing as group_value in prior example.  Now I want x2 and y2 to = values of x1 and y1, respectively if group = 2  AND x3 and y3 to = values of x1 and y1, respectively for group = 6. I can add more information if needed.

PGStats
Opal | Level 21

Hi H. What is the ultimate goal of all these rules?

PG
H
Pyrite | Level 9 H
Pyrite | Level 9

Excellent question.  I have a list of zip codes for centers((buildings) aka the group variable), I also have their longitudes and latitudes via sashelp.zipcode. I am then going to calculate distances between all centers ~ 100 and a select list of ~ 10 centers. Next I want to determine how far away the closes of the 10 select centers are to each of the ~ 100 centers.

So what I am doing now is creating a longitude and latitudes variable for each of the ~ 10 select centers (x1-x10 and y1-y10), which I will then use to calculate distances with then create a new variable populated by the value in miles of the closes select center to each of the ~ 100 centers

center = a builiding location.


H
Pyrite | Level 9 H
Pyrite | Level 9

Thank you Linlin, after I finally inputted the correct names into the variables for my dataset, this seems to work perfectly.

Much appreciation!

PGStats
Opal | Level 21

Your approach seems overly complicated. If you have two datasets, allCenters and selectCenters, both with variables (Name, ZIP, lat, long), you can find the nearest centers in a single step:

proc sql;

create table nearest as

select

  s.name as selectName, 

  a.name as nearName,

  geodist(s.lat, s.long, a.lat, a.long, "M") as miles

from allCenters as a cross join selectCenters as s

where s.name ne a.name

group by s.name

having calculated miles = min(calculated miles);

select * from nearest;

quit;

PG

PG
H
Pyrite | Level 9 H
Pyrite | Level 9

Thank you to everyone, I was able to get my desired variable. I ended up using Linlin's last bit of code along with: comment by Dan at the following: http://blogs.sas.com/content/iml/2012/05/21/find-the-minumum-value-in-each-row/

Yes, PGSTAT, my data management skills usually leave me over complicating tasks. As I mentioned in a previous post, most of these tasks I perform so sporadically that they are not second nature. I will probably end up sitting on this code and reusing it about once a year, so I end-up being a sub-mediocre data management person.  However, if time allows - I will try out your code option as well since as of now I also incorporate the general round distance code with all of its sines and cosines.

Also, I am determining distances based on zip codes, which is less ideal than using addresses. Does anyone know of a way to use addresses, and pull their longitudes and latitudes from SAS to get more exact distances?

H
Pyrite | Level 9 H
Pyrite | Level 9

PGSTAT,

Follow-up, I tried your code and it worked. I only had to switch the order of the datasets to get the nearest center to each of the observations listed in allcenters instead of selectCenters and change the data format to radians ("RM").

My follow-up question is I calculate the distances using

geodist function (you provided)

Great Circle Distance Formula (http://support.sas.com/kb/3/973.html)

DISTANCE = 4000*ARCOS(arc); arc = SIN(Y2)*SIN(Y1) + COS(Y2)*COS(Y1)*COS(X2-X1)

(http://support.sas.com/kb/3/973.html)

Do you or anyone else have any preferences for these three methods, I looked up one distance that I knew in my head to be about two miles by foot in real life (not based on zip code centers) and all three were 1.6158, 1.6078, and 162.81, miles respectively.

I think for all intents and purposes, I should be fine with any of the methods, but did not know if anyone knew if one was confirmed to be better (I am in Midwestern United States)?

PGStats
Opal | Level 21

Well, only birds and planes travel along great circles, and even they have to worry about winds Smiley Happy. Others will care more about getting an answer quickly. Given the reference quoted in GEODIST documentation, I would think that GEODIST is the most efficient approximation. It remains to be tested.

PG

PG
TomKari
Onyx | Level 15

Very interesting! I didn't know about the GEODIST function, but now that I do, I LIKE it!

I gave the three a try with two places I travel between, and my results were:

GEODIST function: 1202.8013791
Great Circle Distance: 1201.1238757
4000*ARCOS(arc); arc = SIN(Y2)*SIN(Y1) + COS(Y2)*COS(Y1)*COS(X2-X1): 1216.3310547

Since Note 3973 says that the great circle formula more accurately accounts for curvature, and since it's so close to GEODIST, I'd rule out formula #3. As the other two are within .2%, I doubt there's any way to tell which is more accurate!

Personally, I'd go with the GEODIST function, because if any improvements to the methodology are found, SAS will build them into the formula, and you'll automatically derive the benefit.

Tom

P.S. Unless you work for the U.S. DOD, in which case I suspect you have access to information that is MUCH MORE ACCURATE!!   😉

Linlin
Lapis Lazuli | Level 10

Hi Tom,

You are not alone, I didn’t know about the GEODIST function neither.Thanks to PG we all know nowSmiley Happy. I strongly suggest H mark PG’s answer as correct answer.

H
Pyrite | Level 9 H
Pyrite | Level 9

Well prior to my last post I had marked PG's post as helpful and "Like", but did not see a "Correct Answer" option. Please let me know where that option is and I will mark it accordingly. Or is marking it correct not an option since I marked Linlin's original post as correct?

TomKari
Onyx | Level 15

Yes, you can only mark one answer as "correct". I don't know if there's a way to switch it, but to be honest I wouldn't worry about it.

Tom

AnnaBrown
Community Manager

H,

You can certainly not worry about but if you'd like to give this a shot, this may work for you to ID the right answer/s:

-Unmark the initial "correct answer" from Linlin

-Unmark the "helpful answer" from PGStats

-Mark PGStats' answer as correct

-Mark Linlin's answer as helpful if you'd like

I think that should work...

Anna

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 18 replies
  • 5745 views
  • 12 likes
  • 6 in conversation