healtheconomist
Fluorite | Level 6


Can someone please help me fully understand the SAS code below?

 

data dups ;
set Work.pbs ;
by q1_code ;
if first.q1_code + last.q1_code < 2 then output ;
proc print data=dups ;
run ;

 

It looks like a new data set "dups" is being created from the "work.pbs" data set.

But I do not understand at all what happens next (by q1_code ; if first.q1_code + last.q1_code < 2 then output ;). It also looks like there is a problem with the code, as it does not seem to work: the new data set "dups" is created, but with 0 observations.

 

Thanks very much in advance for your time and help.

PGStats
Opal | Level 21

The only case where first.q1_code + last.q1_code = 2 is when a q1_code group contains a single record. So this data step will eliminate q1_code groups with a single observation and keep the others in the new dups dataset.

PG
healtheconomist
Fluorite | Level 6

Dear PG,

 

Thanks so much for your kind and speedy reply. It is really helpful; I am now able to understand what they were trying to do here.

 

I have one more concern regarding the same code. When I run it

(data dups ; set pbs ; by q1_code ; if first.q1_code + last.q1_code < 2 then output ; proc print data=dups ; run ;), the new data set (dups) comes out with 8 variables and zero observations, which should not happen.

 

Also, the code says first.q1_code + last.q1_code < 2, not first.q1_code + last.q1_code = 2.

 

Could you kindly add any further thoughts?

 

PGStats
Opal | Level 21

first.q1_code + last.q1_code = 0 when observation is not the first or the last

first.q1_code + last.q1_code = 1 when observation is the first but not the last OR is the last but not the first

first.q1_code + last.q1_code = 2 when observation is the first and the last

 

those are the only possible values for first.q1_code + last.q1_code.

If your dataset work.pbs does contain q1_code groups with more than one observation and you still get an empty dataset, you should check the SAS log.

 

In fact, you should always check the SAS log :)

PG
Shmuel
Garnet | Level 18

Let us suppose your input contains the 3 lines:

   Q1_code

      a     - on this line: first.q1_code=1      last.q1_code=0

      a     - on this line: first.q1_code=0      last.q1_code=0

      a     - on this line: first.q1_code=0      last.q1_code=1

 

In the case of a Q1_CODE that appears on only a single line:

     a

     b       - on this line: first.q1_code=1      last.q1_code=1

     

I hope this will help you understand the case.
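
To see these flags for yourself, here is a minimal sketch. It assumes a small made-up data set (the names demo, flags, first_flag, last_flag and flag_sum are only for illustration and are not from the original program). FIRST. and LAST. are temporary variables that are not written to the output data set, so the step copies them into ordinary variables before printing.

```sas
/* Hypothetical sample data: three rows for code "a", one row for code "b". */
/* The rows are already in q1_code order, which the BY statement needs.     */
data demo;
  input q1_code $;
  datalines;
a
a
a
b
;
run;

/* Copy the temporary FIRST./LAST. flags into real variables so they show up in the output. */
data flags;
  set demo;
  by q1_code;
  first_flag = first.q1_code;            /* 1 on the first row of each q1_code group */
  last_flag  = last.q1_code;             /* 1 on the last row of each q1_code group  */
  flag_sum   = first_flag + last_flag;   /* 0, 1 or 2 */
run;

proc print data=flags;
run;
```

With this sample, flag_sum is 1, 0, 1 for the three "a" rows and 2 for the single "b" row, so a condition of flag_sum < 2 would keep the three "a" rows and drop "b".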

healtheconomist
Fluorite | Level 6

Dear Advisors,

 

I have run the same SAS code several times with slight modifications to the IF condition, and below are the SAS log statements I receive for each version. In three of the four runs I get 0 observations in the new data set "dups". The only exception is (first.q1_code + last.q1_code = 2 then output ;), where the new data set (dups) contains exactly the same number of observations as "work.pbs". But the program that was handed over to me uses (first.q1_code + last.q1_code < 2 then output ;), and the whole analysis done in the past by someone else is based on that. When I try to rerun the program file to replicate the results, a problem occurs and the dups data set comes up with 0 observations. Can someone please interpret this for me?

 

data dups ;
set work.pbs ;
by q1_code ;
if first.q1_code + last.q1_code < 2 then output ;

 

NOTE: There were 3476 observations read from the data set WORK.PBS.
NOTE: The data set WORK.DUPS has 0 observations and 8 variables.
NOTE: DATA statement used (Total process time):

 

data dups ;
set work.pbs ;
by q1_code ;
if first.q1_code + last.q1_code = 2 then output ;

NOTE: There were 3476 observations read from the data set WORK.PBS.
NOTE: The data set WORK.DUPS has 3476 observations and 8 variables.
NOTE: DATA statement used (Total process time):

 


data dups ;
set work.pbs ;
by q1_code ;
if first.q1_code + last.q1_code = 1 then output ;

NOTE: There were 3476 observations read from the data set WORK.PBS.
NOTE: The data set WORK.DUPS has 0 observations and 8 variables.
NOTE: DATA statement used (Total process time):

 

 data dups ;
set work.pbs ;
by q1_code ;
if first.q1_code + last.q1_code = 3 then output ;

 

NOTE: There were 3476 observations read from the data set WORK.PBS.
NOTE: The data set WORK.DUPS has 0 observations and 8 variables.
NOTE: DATA statement used (Total process time):

RW9
Diamond | Level 26

A better way of writing this may be:

data dups ;
  set work.pbs;
  by q1_code;
  if first.q1_code and last.q1_code then delete;
run;

This removes records where the observation is both first and last in its group, i.e. groups that contain only one observation.
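
One related point: the BY statement expects the input to be sorted (or indexed) by q1_code, otherwise the step stops with an error in the log. Below is a hedged sketch that sorts first and then expresses the same filter as a subsetting IF. It assumes work.pbs may not already be sorted; pbs_sorted is a hypothetical data set name used so that the original data set is left untouched.

```sas
/* Sort a copy of the input so that BY-group processing is valid. */
/* pbs_sorted is an illustrative name, not part of the original program. */
proc sort data=work.pbs out=work.pbs_sorted;
  by q1_code;
run;

data dups;
  set work.pbs_sorted;
  by q1_code;
  /* Subsetting IF: keep a row only if it is NOT both first and last in its group, */
  /* i.e. keep rows whose q1_code occurs more than once.                            */
  if not (first.q1_code and last.q1_code);
run;
```

This keeps the same rows as the condition first.q1_code + last.q1_code < 2.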

PGStats
Opal | Level 21

Your second step shows that all q1_code groups contain single records. They were all first and last in their group.
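
One way to double-check this directly is to count the rows per q1_code. This is only a sketch; n_obs is an illustrative column name.

```sas
/* List every q1_code that occurs more than once, with its row count. */
proc sql;
  select q1_code, count(*) as n_obs
  from work.pbs
  group by q1_code
  having count(*) > 1;
quit;
```

If the query returns no rows, no q1_code value is repeated in work.pbs, which is consistent with dups being empty.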

PG
healtheconomist
Fluorite | Level 6

Thanks so much PGStats,

 

So that simply means there are no duplicate q1_code records?
