SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Create a new variable based on value of a specific observation in another variable

Accepted Solution Solved
Reply
Frequent Contributor
Frequent Contributor
Posts: 109
Accepted Solution

Create a new variable based on value of a specific observation in another variable

This seems like easy one, but I have not found the solution yet.

I want to create a new variable (numeric) based on the value of another variable (numeric) for a particular observation (numeric).

So I want to create x2 in the following:

Group     Group_Value          X2

1               3                            6

2               6                            6

3               1                            6

4               0                            6

5               1                            6

6               7                            6

7               2                            6

So the statement would says if group = 2 then x2 = the Group_value for that observation.  Thus X2 = 6 for all groups now because that is the group_value for group 2 that we told it to populate for all groups.

I can describe in more detail if needed, but felt the topic was simple enough that this should cover it.


Accepted Solutions
Solution
‎05-05-2014 03:40 PM
Super Contributor
Posts: 1,636

Re: Create a new variable based on value of a specific observation in another variable

data have;
input Group    : Group_Value;       
cards;

1               3                           
2               6                          
3               1                           
4               0                           
5               1                          
6               7                           
7               2
;
data _null_;
set have(where=(group=2));
call symputx('group2',group_value);

data want;
  set have;
  x2=&group2;
proc print;run;

View solution in original post


All Replies
Solution
‎05-05-2014 03:40 PM
Super Contributor
Posts: 1,636

Re: Create a new variable based on value of a specific observation in another variable

data have;
input Group    : Group_Value;       
cards;

1               3                           
2               6                          
3               1                           
4               0                           
5               1                          
6               7                           
7               2
;
data _null_;
set have(where=(group=2));
call symputx('group2',group_value);

data want;
  set have;
  x2=&group2;
proc print;run;

Frequent Contributor
Frequent Contributor
Posts: 109

Re: Create a new variable based on value of a specific observation in another variable

Bingo. Thank you that worked. However, I will put a twist in this.  What if I wanted another variable to be created based another group=2 variable.

Also, I want eventually to created other values based on values for different groups see following:

Group     x1          y1            x2          y2          x3         y3

1               3        5               6           5             7          1

2               6        5               6           5             7          1

3               1        2               6           5             7          1

4               0        4               6           5             7          1

5               1        7               6           5             7          1

6               7        1               6           5             7          1

7               2        9               6           5             7          1

So I have group, x1 and y1 and x1 is pretty much the same thing as group_value in prior example.  Now I want x2 and y2 to = values of x1 and y1, respectively if group = 2  AND x3 and y3 to = values of x1 and y1, respectively for group = 6. I can add more information if needed.

Respected Advisor
Posts: 4,919

Re: Create a new variable based on value of a specific observation in another variable

Hi H. What is the ultimate goal of all these rules?

PG
Frequent Contributor
Frequent Contributor
Posts: 109

Re: Create a new variable based on value of a specific observation in another variable

Excellent question.  I have a list of zip codes for centers((buildings) aka the group variable), I also have their longitudes and latitudes via sashelp.zipcode. I am then going to calculate distances between all centers ~ 100 and a select list of ~ 10 centers. Next I want to determine how far away the closes of the 10 select centers are to each of the ~ 100 centers.

So what I am doing now is creating a longitude and latitudes variable for each of the ~ 10 select centers (x1-x10 and y1-y10), which I will then use to calculate distances with then create a new variable populated by the value in miles of the closes select center to each of the ~ 100 centers

center = a builiding location.


Frequent Contributor
Frequent Contributor
Posts: 109

Re: Create a new variable based on value of a specific observation in another variable

Thank you Linlin, after I finally inputted the correct names into the variables for my dataset, this seems to work perfectly.

Much appreciation!

Respected Advisor
Posts: 4,919

Re: Create a new variable based on value of a specific observation in another variable

Your approach seems overly complicated. If you have two datasets, allCenters and selectCenters, both with variables (Name, ZIP, lat, long), you can find the nearest centers in a single step:

proc sql;

create table nearest as

select

  s.name as selectName, 

  a.name as nearName,

  geodist(s.lat, s.long, a.lat, a.long, "M") as miles

from allCenters as a cross join selectCenters as s

where s.name ne a.name

group by s.name

having calculated miles = min(calculated miles);

select * from nearest;

quit;

PG

PG
Frequent Contributor
Frequent Contributor
Posts: 109

Re: Create a new variable based on value of a specific observation in another variable

Thank you to everyone, I was able to get my desired variable. I ended up using Linlin's last bit of code along with: comment by Dan at the following: http://blogs.sas.com/content/iml/2012/05/21/find-the-minumum-value-in-each-row/

Yes, PGSTAT, my data management skills usually leave me over complicating tasks. As I mentioned in a previous post, most of these tasks I perform so sporadically that they are not second nature. I will probably end up sitting on this code and reusing it about once a year, so I end-up being a sub-mediocre data management person.  However, if time allows - I will try out your code option as well since as of now I also incorporate the general round distance code with all of its sines and cosines.

Also, I am determining distances based on zip codes, which is less ideal than using addresses. Does anyone know of a way to use addresses, and pull their longitudes and latitudes from SAS to get more exact distances?

Frequent Contributor
Frequent Contributor
Posts: 109

Re: Create a new variable based on value of a specific observation in another variable

PGSTAT,

Follow-up, I tried your code and it worked. I only had to switch the order of the datasets to get the nearest center to each of the observations listed in allcenters instead of selectCenters and change the data format to radians ("RM").

My follow-up question is I calculate the distances using

geodist function (you provided)

Great Circle Distance Formula (http://support.sas.com/kb/3/973.html)

DISTANCE = 4000*ARCOS(arc); arc = SIN(Y2)*SIN(Y1) + COS(Y2)*COS(Y1)*COS(X2-X1)

(http://support.sas.com/kb/3/973.html)

Do you or anyone else have any preferences for these three methods, I looked up one distance that I knew in my head to be about two miles by foot in real life (not based on zip code centers) and all three were 1.6158, 1.6078, and 162.81, miles respectively.

I think for all intents and purposes, I should be fine with any of the methods, but did not know if anyone knew if one was confirmed to be better (I am in Midwestern United States)?

Respected Advisor
Posts: 4,919

Re: Create a new variable based on value of a specific observation in another variable

Well, only birds and planes travel along great circles, and even they have to worry about winds Smiley Happy. Others will care more about getting an answer quickly. Given the reference quoted in GEODIST documentation, I would think that GEODIST is the most efficient approximation. It remains to be tested.

PG

PG
PROC Star
Posts: 1,167

Re: Create a new variable based on value of a specific observation in another variable

Very interesting! I didn't know about the GEODIST function, but now that I do, I LIKE it!

I gave the three a try with two places I travel between, and my results were:

GEODIST function: 1202.8013791
Great Circle Distance: 1201.1238757
4000*ARCOS(arc); arc = SIN(Y2)*SIN(Y1) + COS(Y2)*COS(Y1)*COS(X2-X1): 1216.3310547

Since Note 3973 says that the great circle formula more accurately accounts for curvature, and since it's so close to GEODIST, I'd rule out formula #3. As the other two are within .2%, I doubt there's any way to tell which is more accurate!

Personally, I'd go with the GEODIST function, because if any improvements to the methodology are found, SAS will build them into the formula, and you'll automatically derive the benefit.

Tom

P.S. Unless you work for the U.S. DOD, in which case I suspect you have access to information that is MUCH MORE ACCURATE!!   ;-)

Super Contributor
Posts: 1,636

Re: Create a new variable based on value of a specific observation in another variable

Hi Tom,

You are not alone, I didn’t know about the GEODIST function neither.Thanks to PG we all know nowSmiley Happy. I strongly suggest H mark PG’s answer as correct answer.

Frequent Contributor
Frequent Contributor
Posts: 109

Re: Create a new variable based on value of a specific observation in another variable

Well prior to my last post I had marked PG's post as helpful and "Like", but did not see a "Correct Answer" option. Please let me know where that option is and I will mark it accordingly. Or is marking it correct not an option since I marked Linlin's original post as correct?

PROC Star
Posts: 1,167

Re: Create a new variable based on value of a specific observation in another variable

Yes, you can only mark one answer as "correct". I don't know if there's a way to switch it, but to be honest I wouldn't worry about it.

Tom

Community Manager
Posts: 564

Re: Create a new variable based on value of a specific observation in another variable

H,

You can certainly not worry about but if you'd like to give this a shot, this may work for you to ID the right answer/s:

-Unmark the initial "correct answer" from Linlin

-Unmark the "helpful answer" from PGStats

-Mark PGStats' answer as correct

-Mark Linlin's answer as helpful if you'd like

I think that should work...

Anna

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 18 replies
  • 729 views
  • 12 likes
  • 6 in conversation