BookmarkSubscribeRSS Feed
Babloo
Rhodochrosite | Level 12

I have a raw data file with some 3K records. I need to find out customers with different names and same address.

I tried this code, but got note as follows.

       data rawdata2;

         set rawdata1;  /*(my .csv which has name, address and zip)*/

        if first.name and last.Address and last.zip_code;

         run;

NOTE: Variable 'first.name'n is uninitialized.

NOTE: Variable 'last.Address'n is uninitialized.

NOTE: Variable 'last.zip_code'n is uninitialized.

can someone suggest me to accomplish this task?

13 REPLIES 13
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Your missing a:

by name address zip_code;

However, I would suggest you do:

proc sql;

     create table WANT as

     select     distinct *

     from        HAVE

     group by  *

     having     count(*) gt 1;

quit;

Note, not tested the above as no data, but something along those lines, if you only want those that appear more than once.  If you just want a nice list, then either distinct or proc sort with nodupkey will do the trick.

Babloo
Rhodochrosite | Level 12

I ran your code and the log as follows. Am I doing something wrong?

108        proc sql;

109         create table rawdata3 as

110         select     distinct *

111         from        rawdata1

112         group by  Address,zip_code

113         having     count(*) gt 1;

NOTE: The query requires remerging summary statistics back with the original data.

NOTE: Table WORK.RAWDATA3 created, with 0 rows and 13 columns.

114        quit;

NOTE: PROCEDURE SQL used (Total process time):

      real time           0.00 seconds

      cpu time            0.00 seconds

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Not sure about your data, the below code does work.  Try moving the count() up after distinct *, and see what count its returning, maybe your group by only ever means there is one of each:

data have;

  name="AA"; address="ERTY"; zip_code="TYRY"; output;

  name="AA"; address="ERTY"; zip_code="DFGH"; output;

  name="BB"; address="WESR"; zip_code="TLKJH"; output;

run;

proc sql;

  create table WANT as

  select  distinct *

  from   HAVE

  group by NAME,ADDRESS

  having count(*) gt 1;

quit;

Babloo
Rhodochrosite | Level 12

Thanks again.

If I move the count(*) like below, then what aggregate function I should use in having clause.

        proc sql;

        create table rawdata3 as

         select     distinct count(*)

       from        rawdata1

         group by  Address,zip_code

         having  //*<insert aggregate function>*/   gt 1;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, I meant like this:

proc sql;

  create table WANT as

  select  distinct *,

          count(*) as CNT

  from   HAVE

  group by NAME,ADDRESS;

quit;

From the resultant table you can see the distinct values and what they have counted to.  I suspect your grouping is returning a count of 1 for each of them. 

Babloo
Rhodochrosite | Level 12

My data is likethis,

name address zip

Ravi  abc 123

Raj    abc  123

Hari   def   456

Hari   def   456

Kiran  ghi   789

I need output like,

name address zip

Ravi  abc 123

Raj    abc  123

I need the output with same address and same zip with different names. All the variables are in character.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ok, slightly moved things around, this should work:

data have;

  infile datalines;

  input name $ address $ zip;

datalines;

Ravi  abc 123

Raj    abc  123

Hari   def   456

Hari   def   456

;

run;

proc sql;

  create table WANT as

  select  *

  from   (select distinct NAME,ADDRESS,ZIP from HAVE)

  group by ADDRESS,ZIP

  having count(NAME) > 1;

quit;

Babloo
Rhodochrosite | Level 12

Thanks a ton!

Can we do this in data step with first. and last.? Because I'm interested to get the solution via first. and last.

Tom
Super User Tom
Super User

To use BY processing you need to:

SORT the data BY the variables of interest.

SET the data BY the variables of interest.

Then you can reference the FIRST. and LAST. flags.

Let's assume that you want records with more than one value of NAME for the same values of ADDRESS and ZIP.

data want ;

  set have ;

  by zip address name ;

  if NOT (first.address and last.address);

run;

art297
Opal | Level 21

Tom's approach will almost get you what you want, but not quite. However, it will work if you use the proc sort do get rid of the exact matches. e.g.:

data have;

  input name $ address $ zip;

  datalines;

Ravi  abc 123

Raj    abc  123

Hari   def   456

Hari   def   456

;

proc sort data=have out=want nodupkey;

  by address zip name;

run;

data want;

  set want;

  by address zip;

  if not (first.zip and last.zip);

run;

BharathBandi
Calcite | Level 5

proc sort data=have out=want;

     by address;

run;

data want;

     set want;

if first.address=last.address then delete;

run;

you can also write a proc sort with nodupkey with name in the by statement if you have same name and same address. 

RyanHoward
Calcite | Level 5

Here is another variation..

proc sort data=rawdata1;

    by zip_code address name;

run;

data differentNames_SameAddress;

     set rawdata1;  /*(my .csv which has name, address and zip)*/

     by zip_code address name;

     /* select records with more than one address */

     if not (fist.address and and last.address) then do; ;

         output differentNames_SameAddress;    

     end;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 2866 views
  • 7 likes
  • 7 in conversation