data have;
length jobs $ 100;
input jobs $;
datalines;
11,2,11,2
11,2
2,11,2
2,11,3,11,4,11
;
run;
From the dataset given, May I know how to add a column that has this logic:
1. If a number appeared only once, it will be removed
2. If a number appeared more than once, it will be retained BUT it will only appear once
To explain further, the output of the added column should be like this:
data want;
length final $ 100;
input final $;
datalines;
11,2
.
2
11
;
run;
The expected output should be like this:
data want;
length final $ 100;
input final $;
datalines;
11,2
.
2
11
;
run;
data have;
length jobs $ 100 job $5;
input jobs $;
n = _N_;
do i = 1 to countw(jobs,',');
job = scan(jobs,i,',');
output;
end;
keep n job;
datalines;
11,2,11,2
11,2
2,11,2
2,11,3,11,4,11
;
proc sort
data=have
dupout=intermediate
nodupkey
;
by n job;
run;
proc sort
data=intermediate
nodupkey
;
by n job;
run;
data want;
set intermediate;
by n;
length jobs $100;
retain jobs;
if first.n
then jobs = job;
else jobs = catx(',',jobs,job);
if last.n;
drop job;
run;
Although I would recommend to keep the long dataset layout instead.
OK, that was the old code.
See this restructured code:
data have;
length jobs $ 100 job $5;
input jobs $;
n = _N_;
datalines;
11,2,11,2
11,2
2,11,2
2,11,3,11,4,11
;
data int1;
set have;
do i = 1 to countw(jobs,',');
job = scan(jobs,i,',');
output;
end;
keep n job;
run;
proc sort
data=int1
dupout=int2
nodupkey
;
by n job;
run;
proc sort
data=int2
nodupkey
;
by n job;
run;
data want;
merge
int2
have (keep=n)
;
by n;
length jobs $100;
retain jobs;
if first.n
then jobs = job;
else jobs = catx(',',jobs,job);
if last.n;
drop job;
run;
proc print data=want noobs;
run;
Result:
n jobs 1 11,2 2 3 2 4 11
data have;
length jobs $ 100 job $5;
input jobs $;
n = _N_;
do i = 1 to countw(jobs,',');
job = scan(jobs,i,',');
output;
end;
keep n job;
datalines;
11,2,11,2
11,2
2,11,2
2,11,3,11,4,11
;
proc sort
data=have
dupout=intermediate
nodupkey
;
by n job;
run;
proc sort
data=intermediate
nodupkey
;
by n job;
run;
data want;
set intermediate;
by n;
length jobs $100;
retain jobs;
if first.n
then jobs = job;
else jobs = catx(',',jobs,job);
if last.n;
drop job;
run;
You can join the result back to have by n.
data have; input jobs $40.; n = _N_; do i = 1 to countw(jobs,','); job = scan(jobs,i,','); output; end; keep n job; cards; 11,2,11,2 11,2 2,11,2 2,11,3,11,4,11 ; proc freq data=have noprint; table n*job/out=freq list; run; data freq; set freq; if count=1 then call missing(job); run; data want; do until(last.n); set freq; by n; length jobs $ 200; jobs=catx(',',jobs,job); end; keep n jobs; run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.