BookmarkSubscribeRSS Feed
kmardinian
Quartz | Level 8

Hi, I am working with the below dataset, and I am trying to add in the column called Location based on the Subject, visit, and area. I want the first are to listed as Location 1 no matter what it is (COL or IL). I was thinking of using Proc SQL, but I couldn't figure out how to organize it

 

Subject    Visit    Image      Area             Location

101          Visit 1   A            COL                 1

101          Visit 1   B            COL                 1

102          Visit 1   A            IL                     1

102          Visit 2   C            COL                 2

103          Visit 1    C           IL                     1

103          Visit 1    B           COL                 2

104          Visit 1    A           IL                     1

104          Visit 2    B           IL                     1

8 REPLIES 8
PaigeMiller
Diamond | Level 26

@kmardinian wrote:

Hi, I am working with the below dataset, and I am trying to add in the column called Location based on the Subject, visit, and area. I want the first are to listed as Location 1 no matter what it is (COL or IL). I was thinking of using Proc SQL, but I couldn't figure out how to organize it

 

Subject    Visit    Image      Area             Location

101          Visit 1   A            COL                 1

101          Visit 1   B            COL                 1

102          Visit 1   A            IL                     1

102          Visit 2   C            COL                 2

103          Visit 1    C           IL                     1

103          Visit 1    B           COL                 2

104          Visit 1    A           IL                     1

104          Visit 2    B           IL                     1


It's not clear to me from your description how Location is determined for all of the cases shown.

 

In any event, a DATA step ought to get the job done.

--
Paige Miller
kmardinian
Quartz | Level 8

Hi PaigeMiller,

 

My location is based on the AREA and VISIT variable for each subject. Basically, whatever the AREA is for Visit 1, that would be assigned Location 1. If for the next Visit 1 or Visit 2, the AREA variable is different, then Location would be 2.

 

If the visits have the same AREA for one subject, I would expect them to all be assigned Location 1. Does that make sense?

ballardw
Super User

Forcing order of operations in SQL is often quite a bit of tedious work.

This almost sounds like you may want to create a subset of the data and then update your existing data with that set.

 

An  example of your starting data and the actual desired output is really a very good idea. I am not quite sure exactly what you want for output.

 

something like this maybe?

data want;
   set have;
   by subject visit ;
 
   if first.visit then location=1;
   run;
kmardinian
Quartz | Level 8

Hi ballard, my starting data is the variables I showed without the location variable. I want my output to have the variable LOCATION in it. I just want to assign Location 1 for the first area that corresponds to visit ID.

 

I had tried it this way with a data step, but it only assigns one to the first visit and I am trying to assign 1,2,3 etc. depending on whether it is Visit 1, Visit 2, Visit 3, etc. But if Visit 1 has the same AREA, then I want Location to be 1 and 1

data want;
   set have;
   by subject visit ;
 
   if first.visit then location=1;
   run;

 

ballardw
Super User

@kmardinian wrote:

Hi ballard, my starting data is the variables I showed without the location variable. I want my output to have the variable LOCATION in it. I just want to assign Location 1 for the first area that corresponds to visit ID.

 

I had tried it this way with a data step, but it only assigns one to the first visit and I am trying to assign 1,2,3 etc. depending on whether it is Visit 1, Visit 2, Visit 3, etc. But if Visit 1 has the same AREA, then I want Location to be 1 and 1

data want;
   set have;
   by subject visit ;
 
   if first.visit then location=1;
   run;

 


Compare what you just said about assigning when visit changes to the first description of your problem.

Try something like:

data want;
   set have;
   by subject visit ;
   retain location;
   if first.subject then location=1;
   else if first.visit then location+1;

run;
kmardinian
Quartz | Level 8

The problem with that is if a SUBJECT has two Visit 1 records and two different AREAS, a COL and a IL, then it assigns them both Location=1. And I would like the second AREA, regardless of Visit to be assigned Location=2

ballardw
Super User

@kmardinian wrote:

The problem with that is if a SUBJECT has two Visit 1 records and two different AREAS, a COL and a IL, then it assigns them both Location=1. And I would like the second AREA, regardless of Visit to be assigned Location=2


So what is the order of importance in assigning location Visit number OR Area? You are not being very consistent in what the hierarchy of order is might be. I can't tell whether you are even considering the values or not at this stage.

Which again is different than your original description.

 

 

Please provide example data in the form of a data step so we can test code with your data. And I think you need to provide a more complex example of data as it seems like you may have more going on then your initial example shows.

 

Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the {i} icon or attached as text to show exactly what you have and that we can test code against.

kmardinian
Quartz | Level 8
data WORK.ALL;
  infile datalines dsd truncover;
  input Subject:$8. Visit:$15. Image:$8. Site:$15. Area:$7.;
datalines;
1102 Visit1 233B 14 NA
1102 Visit2 435A 14 NA
1103 Visit1 463A 14 NA
1103 Visit1 345B 14 NA
1104 Visit1 345A 14 COL
1104 Visit1 345B 14 NA
1106 Visit1 567A 14 IL
1106 Visit1 866A 14 COL
1107 Visit1 461A 14 IL
1107 Visit2 356A 14 NA
;;;;
NOTE: There were 10 observations read from the data set WORK.ALL.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
      

I think I over complicated this in everyone's mind. I only want to assign LOCATION based on the AREA variable. So for Subject 1102, the first AREA is "NA", so this would be Location=1. Then the same subject has another record where the AREA is still NA, so this would still be Location=1 (because they are the same area, therefore same location)

 

For Subject 1107, they contain two different AREAS (IL and NA), I would want the first AREA to be assigned Location=1 and the second AREA to be Location=2, because they are different.

 

I have already sorted based on SUBJECT and VISIT, so the AREA variable is already in the order that I want it, now I just need to assign Location 1 and 2, so I can differentiate the Image IDs later.

 

Does that make more sense?

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 5200 views
  • 0 likes
  • 3 in conversation