data have;
length jobs $ 100;
input jobs $;
datalines;
11,2,11,2
11,2
2,11,2
2,11,3,11,4,11
;
run;
From the dataset given, May I know how to add a column that has this logic:
1. If a number appeared only once, it will be removed
2. If a number appeared more than once, it will be retained BUT it will only appear once
To explain further, the output of the added column should be like this:
data want;
length final $ 100;
input final $;
datalines;
11,2
.
2
11
;
run;
The expected output should be like this:
data want;
length final $ 100;
input final $;
datalines;
11,2
.
2
11
;
run;
data have;
length jobs $ 100 job $5;
input jobs $;
n = _N_;
do i = 1 to countw(jobs,',');
job = scan(jobs,i,',');
output;
end;
keep n job;
datalines;
11,2,11,2
11,2
2,11,2
2,11,3,11,4,11
;
proc sort
data=have
dupout=intermediate
nodupkey
;
by n job;
run;
proc sort
data=intermediate
nodupkey
;
by n job;
run;
data want;
set intermediate;
by n;
length jobs $100;
retain jobs;
if first.n
then jobs = job;
else jobs = catx(',',jobs,job);
if last.n;
drop job;
run;
Although I would recommend to keep the long dataset layout instead.
OK, that was the old code.
See this restructured code:
data have;
length jobs $ 100 job $5;
input jobs $;
n = _N_;
datalines;
11,2,11,2
11,2
2,11,2
2,11,3,11,4,11
;
data int1;
set have;
do i = 1 to countw(jobs,',');
job = scan(jobs,i,',');
output;
end;
keep n job;
run;
proc sort
data=int1
dupout=int2
nodupkey
;
by n job;
run;
proc sort
data=int2
nodupkey
;
by n job;
run;
data want;
merge
int2
have (keep=n)
;
by n;
length jobs $100;
retain jobs;
if first.n
then jobs = job;
else jobs = catx(',',jobs,job);
if last.n;
drop job;
run;
proc print data=want noobs;
run;
Result:
n jobs 1 11,2 2 3 2 4 11
data have;
length jobs $ 100 job $5;
input jobs $;
n = _N_;
do i = 1 to countw(jobs,',');
job = scan(jobs,i,',');
output;
end;
keep n job;
datalines;
11,2,11,2
11,2
2,11,2
2,11,3,11,4,11
;
proc sort
data=have
dupout=intermediate
nodupkey
;
by n job;
run;
proc sort
data=intermediate
nodupkey
;
by n job;
run;
data want;
set intermediate;
by n;
length jobs $100;
retain jobs;
if first.n
then jobs = job;
else jobs = catx(',',jobs,job);
if last.n;
drop job;
run;
You can join the result back to have by n.
data have; input jobs $40.; n = _N_; do i = 1 to countw(jobs,','); job = scan(jobs,i,','); output; end; keep n job; cards; 11,2,11,2 11,2 2,11,2 2,11,3,11,4,11 ; proc freq data=have noprint; table n*job/out=freq list; run; data freq; set freq; if count=1 then call missing(job); run; data want; do until(last.n); set freq; by n; length jobs $ 200; jobs=catx(',',jobs,job); end; keep n jobs; run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.