Re: first. and last.

Babloo · Posted 02-27-2015 04:51 AM

I have a raw data file with some 3K records. I need to find out customers with different names and same address.

I tried this code, but got note as follows.

data rawdata2;

set rawdata1; /*(my .csv which has name, address and zip)*/

if first.name and last.Address and last.zip_code;

run;

NOTE: Variable 'first.name'n is uninitialized.

NOTE: Variable 'last.Address'n is uninitialized.

NOTE: Variable 'last.zip_code'n is uninitialized.

can someone suggest me to accomplish this task?

RW9 · Posted 02-27-2015 05:00 AM

Your missing a:

by name address zip_code;

However, I would suggest you do:

proc sql;

create table WANT as

select distinct *

from HAVE

group by *

having count(*) gt 1;

quit;

Note, not tested the above as no data, but something along those lines, if you only want those that appear more than once. If you just want a nice list, then either distinct or proc sort with nodupkey will do the trick.

Babloo · Posted 02-27-2015 05:16 AM

I ran your code and the log as follows. Am I doing something wrong?

108 proc sql;

109 create table rawdata3 as

110 select distinct *

111 from rawdata1

112 group by Address,zip_code

113 having count(*) gt 1;

NOTE: The query requires remerging summary statistics back with the original data.

NOTE: Table WORK.RAWDATA3 created, with 0 rows and 13 columns.

114 quit;

NOTE: PROCEDURE SQL used (Total process time):

real time 0.00 seconds

cpu time 0.00 seconds

RW9 · Posted 02-27-2015 05:35 AM

Not sure about your data, the below code does work. Try moving the count() up after distinct *, and see what count its returning, maybe your group by only ever means there is one of each:

data have;

name="AA"; address="ERTY"; zip_code="TYRY"; output;

name="AA"; address="ERTY"; zip_code="DFGH"; output;

name="BB"; address="WESR"; zip_code="TLKJH"; output;

run;

proc sql;

create table WANT as

select distinct *

from HAVE

group by NAME,ADDRESS

having count(*) gt 1;

quit;

Babloo · Posted 02-27-2015 05:41 AM

Thanks again.

If I move the count(*) like below, then what aggregate function I should use in having clause.

proc sql;

create table rawdata3 as

select distinct count(*)

from rawdata1

group by Address,zip_code

having //*<insert aggregate function>*/ gt 1;

RW9 · Posted 02-27-2015 05:44 AM

Yes, I meant like this:

proc sql;

create table WANT as

select distinct *,

count(*) as CNT

from HAVE

group by NAME,ADDRESS;

quit;

From the resultant table you can see the distinct values and what they have counted to. I suspect your grouping is returning a count of 1 for each of them.

Babloo · Posted 02-27-2015 09:55 AM

My data is likethis,

name address zip

Ravi abc 123

Raj abc 123

Hari def 456

Kiran ghi 789

I need output like,

name address zip

Ravi abc 123

Raj abc 123

I need the output with same address and same zip with different names. All the variables are in character.

RW9 · Posted 02-27-2015 10:33 AM

Ok, slightly moved things around, this should work:

data have;

infile datalines;

input name $ address $ zip;

datalines;

Ravi abc 123

Raj abc 123

Hari def 456

;

run;

proc sql;

create table WANT as

select *

from (select distinct NAME,ADDRESS,ZIP from HAVE)

group by ADDRESS,ZIP

having count(NAME) > 1;

quit;

Babloo · Posted 02-28-2015 08:43 AM

Thanks a ton!

Can we do this in data step with first. and last.? Because I'm interested to get the solution via first. and last.

Kurt_Bremser · Posted 02-27-2015 05:05 AM

Read up on this:

SAS 9.2 Language Reference: Concepts, Second Edition: BY-Group Processing in the DATA Step

You are missing the basic concepts for this.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Tom · Posted 02-28-2015 11:15 AM

To use BY processing you need to:

SORT the data BY the variables of interest.

SET the data BY the variables of interest.

Then you can reference the FIRST. and LAST. flags.

Let's assume that you want records with more than one value of NAME for the same values of ADDRESS and ZIP.

data want ;

set have ;

by zip address name ;

if NOT (first.address and last.address);

run;

art297 · Posted 02-28-2015 11:45 AM

Tom's approach will almost get you what you want, but not quite. However, it will work if you use the proc sort do get rid of the exact matches. e.g.:

data have;

input name $ address $ zip;

datalines;

Ravi abc 123

Raj abc 123

Hari def 456

;

proc sort data=have out=want nodupkey;

by address zip name;

run;

data want;

set want;

by address zip;

if not (first.zip and last.zip);

run;

BharathBandi · Posted 03-04-2015 09:39 PM

proc sort data=have out=want;

by address;

run;

data want;

set want;

if first.address=last.address then delete;

run;

you can also write a proc sort with nodupkey with name in the by statement if you have same name and same address.

RyanHoward · Posted 03-06-2015 01:35 PM

Here is another variation..

proc sort data=rawdata1;

by zip_code address name;

run;

data differentNames_SameAddress;

set rawdata1; /*(my .csv which has name, address and zip)*/

by zip_code address name;

/* select records with more than one address */

if not (fist.address and and last.address) then do; ;

output differentNames_SameAddress;

end;

run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!