I have a raw data file with some 3K records. I need to find out customers with different names and same address.
I tried this code, but got note as follows.
data rawdata2;
set rawdata1; /*(my .csv which has name, address and zip)*/
if first.name and last.Address and last.zip_code;
run;
NOTE: Variable 'first.name'n is uninitialized.
NOTE: Variable 'last.Address'n is uninitialized.
NOTE: Variable 'last.zip_code'n is uninitialized.
can someone suggest me to accomplish this task?
Your missing a:
by name address zip_code;
However, I would suggest you do:
proc sql;
create table WANT as
select distinct *
from HAVE
group by *
having count(*) gt 1;
quit;
Note, not tested the above as no data, but something along those lines, if you only want those that appear more than once. If you just want a nice list, then either distinct or proc sort with nodupkey will do the trick.
I ran your code and the log as follows. Am I doing something wrong?
108 proc sql;
109 create table rawdata3 as
110 select distinct *
111 from rawdata1
112 group by Address,zip_code
113 having count(*) gt 1;
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: Table WORK.RAWDATA3 created, with 0 rows and 13 columns.
114 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
Not sure about your data, the below code does work. Try moving the count() up after distinct *, and see what count its returning, maybe your group by only ever means there is one of each:
data have;
name="AA"; address="ERTY"; zip_code="TYRY"; output;
name="AA"; address="ERTY"; zip_code="DFGH"; output;
name="BB"; address="WESR"; zip_code="TLKJH"; output;
run;
proc sql;
create table WANT as
select distinct *
from HAVE
group by NAME,ADDRESS
having count(*) gt 1;
quit;
Thanks again.
If I move the count(*) like below, then what aggregate function I should use in having clause.
proc sql;
create table rawdata3 as
select distinct count(*)
from rawdata1
group by Address,zip_code
having //*<insert aggregate function>*/ gt 1;
Yes, I meant like this:
proc sql;
create table WANT as
select distinct *,
count(*) as CNT
from HAVE
group by NAME,ADDRESS;
quit;
From the resultant table you can see the distinct values and what they have counted to. I suspect your grouping is returning a count of 1 for each of them.
My data is likethis,
name address zip
Ravi abc 123
Raj abc 123
Hari def 456
Hari def 456
Kiran ghi 789
I need output like,
name address zip
Ravi abc 123
Raj abc 123
I need the output with same address and same zip with different names. All the variables are in character.
Ok, slightly moved things around, this should work:
data have;
infile datalines;
input name $ address $ zip;
datalines;
Ravi abc 123
Raj abc 123
Hari def 456
Hari def 456
;
run;
proc sql;
create table WANT as
select *
from (select distinct NAME,ADDRESS,ZIP from HAVE)
group by ADDRESS,ZIP
having count(NAME) > 1;
quit;
Thanks a ton!
Can we do this in data step with first. and last.? Because I'm interested to get the solution via first. and last.
Read up on this:
SAS 9.2 Language Reference: Concepts, Second Edition: BY-Group Processing in the DATA Step
You are missing the basic concepts for this.
To use BY processing you need to:
SORT the data BY the variables of interest.
SET the data BY the variables of interest.
Then you can reference the FIRST. and LAST. flags.
Let's assume that you want records with more than one value of NAME for the same values of ADDRESS and ZIP.
data want ;
set have ;
by zip address name ;
if NOT (first.address and last.address);
run;
Tom's approach will almost get you what you want, but not quite. However, it will work if you use the proc sort do get rid of the exact matches. e.g.:
data have;
input name $ address $ zip;
datalines;
Ravi abc 123
Raj abc 123
Hari def 456
Hari def 456
;
proc sort data=have out=want nodupkey;
by address zip name;
run;
data want;
set want;
by address zip;
if not (first.zip and last.zip);
run;
proc sort data=have out=want;
by address;
run;
data want;
set want;
if first.address=last.address then delete;
run;
you can also write a proc sort with nodupkey with name in the by statement if you have same name and same address.
Here is another variation..
proc sort data=rawdata1;
by zip_code address name;
run;
data differentNames_SameAddress;
set rawdata1; /*(my .csv which has name, address and zip)*/
by zip_code address name;
/* select records with more than one address */
if not (fist.address and and last.address) then do; ;
output differentNames_SameAddress;
end;
run;
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.