Can someone please help me fully understand the SAS code below?
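data dups;
  set work.pbs;
  by q1_code;   /* work.pbs must be sorted by q1_code; creates first.q1_code and last.q1_code */
  if first.q1_code + last.q1_code < 2 then output;   /* write the row to DUPS when the sum is 0 or 1 */
run;
proc print data=dups;
run;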
It looks like a new data set "dups" is being created from the "work.pbs" data set, but I do not understand at all what happens next (by q1_code; if first.q1_code + last.q1_code < 2 then output;). It also looks like there is a problem with the code: it does not seem to work, and the new data set "dups" is created with 0 observations.
Thanks very much in advance for your time and help.
The only case where first.q1_code + last.q1_code = 2 is when a q1_code group contains a single record. So this data step will eliminate q1_code groups with a single observation and keep the others in the new dups dataset.
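A minimal sketch of this behaviour, using a small hypothetical dataset (the name DEMO, the variable X and all values are invented for illustration; only q1_code comes from the thread):

```sas
/* Hypothetical example data: group a has 3 rows, group b has 1 row */
data demo;
  input q1_code $ x;
  datalines;
a 1
a 2
a 3
b 4
;
run;

/* BY-group processing requires the data to be sorted by q1_code */
proc sort data=demo;
  by q1_code;
run;

data dups;
  set demo;
  by q1_code;
  /* The sum is 0, 1 or 2; only a single-row group gives 2 */
  if first.q1_code + last.q1_code < 2 then output;
run;

proc print data=dups;
run;
```

With these values, DUPS should contain the three rows for a and no row for b, because b is both the first and the last record of its group.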
Dear PG,
Thanks so much for your kind and speedy reply. It is really helpful; I now understand what they were trying to do here.
I have one more concern about the same code. When I run it
(data dups; set pbs; by q1_code; if first.q1_code + last.q1_code < 2 then output; proc print data=dups; run;), the new data set (dups) has 8 variables and zero observations, which should not happen.
Also, the code says first.q1_code + last.q1_code < 2, not first.q1_code + last.q1_code = 2.
Could you kindly add any further thoughts?
first.q1_code + last.q1_code = 0 when the observation is neither the first nor the last in its group
first.q1_code + last.q1_code = 1 when the observation is the first but not the last, OR the last but not the first
first.q1_code + last.q1_code = 2 when the observation is both the first and the last (a single-record group)
Those are the only possible values of first.q1_code + last.q1_code.
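One way to see which of these values actually occur in your data is to store the sum in a variable and tabulate it. This is a sketch; the dataset CHECK and the variable GRP_POS are hypothetical names introduced for this check:

```sas
/* Keep the sum of the BY-group flags so it can be tabulated */
data check;
  set work.pbs;
  by q1_code;
  grp_pos = first.q1_code + last.q1_code;   /* 0, 1 or 2 */
run;

/* Frequency of each value of the sum */
proc freq data=check;
  tables grp_pos;
run;
```

If every row has GRP_POS = 2, every q1_code group holds a single record, and the "< 2" condition can never be true.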
If your dataset work.pbs does contain q1_code groups with more than one observation and you still get an empty dataset, you should check the SAS log.
In fact, you should always check the SAS log.
Supply example data that illustrates your problem. Do so in a data step.
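A minimal sketch of what such a data step might look like. Only q1_code comes from the thread; the dataset name HAVE, the other variables and all values are hypothetical placeholders to be replaced with a few rows that resemble work.pbs:

```sas
/* Hypothetical example rows that mimic work.pbs.                 */
/* Replace the variables and values with ones from your own data. */
data have;
  infile datalines dsd truncover;   /* comma-separated values */
  input q1_code :$8. visit_date :date9. cost;
  format visit_date date9.;
  datalines;
A01,01JAN2020,120
A01,15FEB2020,80
B02,03MAR2020,45
;
run;
```

Posting a step like this lets others run exactly the same data you have.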
Let us suppose your input contains these 3 lines:
Q1_code
a - on this line: first.q1_code=1 last.q1_code=0
a - on this line: first.q1_code=0 last.q1_code=0
a - on this line: first.q1_code=0 last.q1_code=1
If a Q1_CODE value has only one line, for example b following the a group:
a
b - on this line: first.q1_code=1 last.q1_code=1
I hope this helps you understand the case.
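To reproduce these flag values yourself, a small sketch can write them to the log. The dataset name LINES is hypothetical; the a and b values are the ones from the example above:

```sas
/* Example from above: group a has 3 lines, group b has 1 line */
data lines;
  input q1_code $;
  datalines;
a
a
a
b
;
run;

/* BY needs SET, so read the data back and print the flags to the log */
data _null_;
  set lines;
  by q1_code;
  putlog q1_code= first.q1_code= last.q1_code=;
run;
```

The log should show 1/0, 0/0 and 0/1 for the three a lines, and 1/1 for the single b line.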
Dear Advisors,
I have run the same SAS code several times with slight modifications to the IF condition, and below are the SAS log messages I receive for each version. In three of the four runs, the new data set "dups" has 0 observations. The only exception is (if first.q1_code + last.q1_code = 2 then output;), where "dups" contains exactly the same number of observations as "work.pbs". However, the program handed over to me uses (if first.q1_code + last.q1_code < 2 then output;), and the whole analysis done in the past by someone else is based on that. When I rerun the program to replicate the results, "dups" comes up with 0 observations. Can someone please interpret this for me?
data dups ;
set work.pbs ;
by q1_code ;
if first.q1_code + last.q1_code < 2 then output ;
NOTE: There were 3476 observations read from the data set WORK.PBS.
NOTE: The data set WORK.DUPS has 0 observations and 8 variables.
NOTE: DATA statement used (Total process time):
data dups ;
set work.pbs ;
by q1_code ;
if first.q1_code + last.q1_code = 2 then output ;
NOTE: There were 3476 observations read from the data set WORK.PBS.
NOTE: The data set WORK.DUPS has 3476 observations and 8 variables.
NOTE: DATA statement used (Total process time):
data dups ;
set work.pbs ;
by q1_code ;
if first.q1_code + last.q1_code = 1 then output ;
NOTE: There were 3476 observations read from the data set WORK.PBS.
NOTE: The data set WORK.DUPS has 0 observations and 8 variables.
NOTE: DATA statement used (Total process time):
data dups ;
set work.pbs ;
by q1_code ;
if first.q1_code + last.q1_code = 3 then output ;
NOTE: There were 3476 observations read from the data set WORK.PBS.
NOTE: The data set WORK.DUPS has 0 observations and 8 variables.
NOTE: DATA statement used (Total process time):
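A better way of writing it may be:
data dups;
  set work.pbs;
  by q1_code;
  if first.q1_code and last.q1_code then delete;   /* drop rows that form a single-record group */
run;
This removes records where the observation is both the first and the last in its group, i.e. where there is only one record for that q1_code.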
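If you also want to keep the single-record groups for inspection, a variant of the same step can route each row to one of two datasets. This is a sketch; the dataset name SINGLES is hypothetical:

```sas
/* Write multi-record groups to DUPS and single-record groups to SINGLES */
data dups singles;
  set work.pbs;
  by q1_code;
  if first.q1_code and last.q1_code then output singles;
  else output dups;
run;
```

The log then reports the row count of both datasets, so an empty DUPS immediately shows up next to a full SINGLES.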
Your dataset contains only one observation per q1_code. That's it. Since first.q1_code and last.q1_code are always true (=1), their sum is always 2.
Your second step shows that all q1_code groups contain single records. They were all first and last in their group.
Thanks so much, PGStats.
So that simply means there are no duplicate q1_code records?
Aah, yes?
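To confirm this directly, a PROC SQL sketch can compare the number of rows with the number of distinct q1_code values, and list any codes that occur more than once:

```sas
proc sql;
  /* If n_rows equals n_codes, no q1_code value appears twice */
  select count(*) as n_rows,
         count(distinct q1_code) as n_codes
    from work.pbs;

  /* List q1_code values that occur more than once */
  select q1_code, count(*) as n
    from work.pbs
    group by q1_code
    having count(*) > 1;
quit;
```

Note that COUNT(DISTINCT q1_code) ignores missing values, so a missing q1_code can make the two counts differ even without duplicates; the second query is the more direct check.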