fierceanalytics
Obsidian | Level 7

This is a basic Base SAS DATA step question. The original code was complex; the example here should get the point across.

/******************************************************/
data selectx; input varname $ countx ;
datalines;
AA1 1
AA1 2
AA1 3
AA1 4
AA1 5
AA1 6
AA1 7
AA1 8
AA2 2
AA2 2
AA2 2
AA3 1
AA3 2
AA3 3
AA3 4
;
proc sort; by varnames countx; run;
data setx ; set selectx; by varnames countx; if first.varnames;
run; /********this is what we wanted. everything so far is simple */

 

/*************************Below is interesting. I was reading long code authored by a colleague****************/

/** Instead of the above. below is in place**/

....

if first.countx =1;

.......

/*****************/

 

If we run the example (you can verify with your own data), we see that "if first.countx=1" does not give the same result as "if first.varnames".

The trouble is that the SAS log does NOT flag it in any way. The step just produces a different output.

 

My questions:

1. My understanding is that when more than one variable is listed in the BY statement, only the first variable, varnames, is supported for the first. and last. indicators. Is this incorrect, or outdated, since SAS evolves like everything else? SAS does not flag 'if first.countx=1' as an error or warning. I suppose that means first.countx does exist in the program data vector for some reason? In this example 'if first.countx=1' ends up doing the same as "proc sort nodupkeys; by varnames countx;". Is that an intended result by design, or just a data-driven coincidence? If it is by design, that means first. already extends beyond the first BY variable, right? Just to do something else, which is fine.

2. If support does extend to variables beyond the first one in the BY list, where is the documentation that says and showcases this? The answer should stay within the confines of Base SAS or EG.

 

Thank you. 

Jia

 

 

 

Accepted Solutions
ballardw
Super User

@fierceanalytics wrote:

 

My questions:

1. My understanding is that when more than one variable is listed in the BY statement, only the first variable, varnames, is supported for the first. and last. indicators. Is this incorrect, or outdated, since SAS evolves like everything else? SAS does not flag 'if first.countx=1' as an error or warning. I suppose that means first.countx does exist in the program data vector for some reason? In this example 'if first.countx=1' ends up doing the same as "proc sort nodupkeys; by varnames countx;". Is that an intended result by design, or just a data-driven coincidence? If it is by design, that means first. already extends beyond the first BY variable, right? Just to do something else, which is fine.

2. If support does extend to variables beyond the first one in the BY list, where is the documentation that says and showcases this? The answer should stay within the confines of Base SAS or EG.


You might consider creating permanent variables from the first. and last. values to see how this actually works, though since your data has few repeats of countx within varname (only AA2) it isn't as helpful as it could be. The following code does not attempt to solve your logic issue. It just stores the values of the first. and last. variables so you can follow along and check whether your logic matches the values you tried to use.

data selectx; 
   input varname $ countx ;
datalines ;
AA1 1
AA1 2
AA1 3
AA1 4
AA1 5
AA1 6
AA1 7
AA1 8
AA2 2
AA2 2
AA2 2
AA3 1
AA3 2
AA3 3
AA3 4
;
proc sort data=selectx; 
   by varname countx; 
run;
data firstdemo ; 
   set selectx; 
   by varname countx; 
   firstvarname= first.varname;
   lastvarname = last.varname;
   firstcountx = first.countx;
   lastcountx  = last.countx;
run;
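
To actually look at the flags, a minimal follow-up sketch (assuming the firstdemo data set created above) is to print the stored indicators side by side:

```sas
/* Print the BY variables next to the stored first./last. indicators */
proc print data=firstdemo noobs;
   var varname countx firstvarname lastvarname firstcountx lastcountx;
run;
```

In the three AA2 rows, which all have countx=2, firstcountx is 1 only on the first row and lastcountx is 1 only on the third, which is why a subsetting "if first.countx" keeps just one of them.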

5 REPLIES
Shmuel
Garnet | Level 18

Is it a typo? When reading the datalines you name the variable varname, while in the sort and the following step you refer to varnames.

 

1) You can edit your post (open the menu behind the 3 vertical dots) and fix your code.

2) Please post the wanted results.

 

first.varname keeps the first observation of each varname (AA1, AA2, AA3, ...), while

first.countx removes duplicate observations that have the same (varname, countx) combination.
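
As a rough check of the second point, the sketch below compares the subsetting IF with PROC SORT NODUPKEY. It assumes selectx uses the variable name varname and has already been sorted by varname countx; the output names dedup_step and dedup_sort are made up for illustration.

```sas
/* Keep the first row of each varname/countx combination */
data dedup_step;
   set selectx;
   by varname countx;
   if first.countx;
run;

/* Let PROC SORT drop rows with duplicate BY values instead */
proc sort data=selectx out=dedup_sort nodupkey;
   by varname countx;
run;

/* Report any differences between the two results */
proc compare base=dedup_sort compare=dedup_step;
run;
```

With the sample data both outputs should hold 13 observations (8 for AA1, 1 for AA2, 4 for AA3). Because the data set only contains the two BY variables, the duplicate rows are identical, so it does not matter which copy is kept.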

fierceanalytics
Obsidian | Level 7
You can disregard the typo. Thanks
fierceanalytics
Obsidian | Level 7

Thanks, Kurt. So it is 'official' that first. and last. are populated for all the variables listed in BY. I have updated my knowledge accordingly. Thank you.

 

Although 'if first.countx=1' is not a valid alternative to 'if first.varnames' for my application, it has potential for other uses, and it should be supported, since first. is available for the additional variables in the BY list? I am going to test a bit more to see where 'if first.countx=1' holds up as an alternative to "proc sort nodupkeys; by varnames countx; run;". It looks cute that way, with the benefit of shrinking the data set fast, although it may end up not sorted any more.

 

I will leave this thread open until next Tuesday to see if others have more input. I will then mark a solution. I did a lot of Python and R work in past years and have now returned to SAS. SAS is refreshing today.

 

Regards

Jia

Kurt_Bremser
Super User

Every variable in the BY statement gets a first. and last. indicator variable.

The order in the BY creates a hierarchy from left to right. A first. or last. further to the left implies a first. or last. for everything to the right.
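
The hierarchy can be seen in a small sketch. The data set hier and its values are hypothetical (not from the thread) and are chosen so that countx keeps the same value across a change in varname:

```sas
/* Hypothetical data, already in varname/countx order */
data hier;
   input varname $ countx;
datalines;
A 1
A 1
B 1
B 2
;

/* Store the automatic first./last. indicators as regular variables */
data hierflags;
   set hier;
   by varname countx;
   firstvarname = first.varname;
   lastvarname  = last.varname;
   firstcountx  = first.countx;
   lastcountx   = last.countx;
run;

proc print data=hierflags noobs;
run;
```

On the third row (B 1) countx has the same value as on the row before, yet first.countx is 1 because first.varname is 1. Likewise, on the second row (A 1) last.varname is 1 and therefore last.countx is 1 as well.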

