Manipulating Data in Base SAS® Part 3 – Deduplicate
Recent Library Articles
Recently in the SAS Community Library: Duplicates in data can badly skew the results of an analysis. @SASJedi demonstrates data deduplication using PROC SORT with the NODUPKEY, OUT=, and DUPOUT= options, as well as PROC SQL and PROC FedSQL.
proc sql;
  create table t1 as
  select *
  from t2
  where name like 'PNB%'
    and rn in (select distinct rn from t4)
    /* datepart() returns a SAS date, so compare it with a date literal */
    and datepart(DATE) <= '07Dec2002'd;
quit;
I have a very large dataset, and this query takes forever to run. The connection often drops, and then I have to wait a long time all over again. Can you please suggest a better way to do it? Table T2 is very large: the query takes almost an hour to complete if everything goes well, and otherwise I need to restart it many times. Thank you in advance, kajal
Hi SAS Communities,
We'll soon be migrating to SAS Viya 4. Any tips regarding logging?
Where can we see the logs for the services now? Are they still in the default location, /opt/sas/viya/config/var/logs?
      |         | TEAM 1                                                             | TEAM 2
      |         | N+R                     | Resolved                | TOTAL          | N+R                     | Resolved                | TOTAL
CYCLE | PRODUCT | Count |    POS |   POS% | Count |    POS |   POS% | Count |    POS | Count |    POS |   POS% | Count |    POS |   POS% | Count |    POS
xx    | a       |     1 |   0.11 |  3.18% |     7 |   0.50 | 14.91% |    46 |   3.37 |     0 |      0 |  0.00% |     2 |   0.10 | 16.32% |    11 |   0.62
      | TOTAL   |     1 |   0.11 |  3.18% |     7 |   0.50 | 14.91% |    46 |   3.37 |     0 |      0 |  0.00% |     2 |   0.10 | 16.32% |    11 |   0.62
yy    | a       |     0 |      0 |  0.00% |     0 |      0 |  0.00% |     3 |   0.48 |     0 |      0 |  0.00% |     0 |      0 |  0.00% |     2 |   0.38
      | TOTAL   |     0 |      0 |  0.00% |     0 |      0 |  0.00% |     3 |   0.48 |     0 |      0 |  0.00% |     0 |      0 |  0.00% |     2 |   0.38
zz    | a       |    74 |   7.56 |  6.74% |   289 |  32.58 | 29.06% |  1059 | 112.10 |    21 |   2.86 |  9.21% |    62 |   9.45 | 30.44% |   244 |  31.05
      | TOTAL   |    74 |   7.56 |  6.74% |   289 |  32.58 | 29.06% |  1059 | 112.10 |    21 |   2.86 |  9.21% |    62 |   9.45 | 30.44% |   244 |  31.05
aa    | a       |     1 |   0.03 |  0.71% |     3 |   0.17 |  4.26% |    46 |   4.02 |     1 |   0.03 |  0.81% |     2 |   0.11 |  3.00% |    38 |   3.51
      | TOTAL   |     1 |   0.03 |  0.71% |     3 |   0.17 |  4.26% |    46 |   4.02 |     1 |   0.03 |  0.81% |     2 |   0.11 |  3.00% |    38 |   3.51
TOTAL | a       |    76 |   7.69 |  6.41% |   299 |  33.26 | 27.72% |  1154 | 119.97 |    22 |   2.89 |  8.13% |    66 |   9.66 | 27.17% |   295 |  35.55
      | TOTAL   |    76 |   7.69 |  6.41% |   299 |  33.26 | 27.72% |  1154 | 119.97 |    22 |   2.89 |  8.13% |    66 |   9.66 | 27.17% |   295 |  35.55
This is the output I got using PROC REPORT. I want to modify it so that the Count column is removed wherever it appears under the 'N+R' column.
Also, what is the blank space above the CYCLE and PRODUCT columns, to the left of TEAM 1? This space is generated by PROC REPORT. I want to split those two blank cells into 4, as happens below in the CYCLE and PRODUCT columns, so that I can format the report in Excel alongside other reports. I want to decrease the cell size of CYCLE, but the reduction should apply only to this report and not to the other reports printed on the same sheet. My PROC REPORT code is below.
Can I get a solution for both of my problems? Thank you!
Please let me know if anything is difficult to understand.
PROC REPORT DATA=BKT2_HL_&mnth._&nextmnth.5(WHERE=('STATUS 2'N NE 'Not Resolved')) OUT=WO_NR;
COLUMNS CYCLE1 PRODUCT_CLASS FINAL_ALLOCATION, ("status 2"n,(COUNT TOTAL_POS POS_PER));
DEFINE CYCLE1/nozero 'CYCLE' ORDER=FORMATTED STYLE={width=90PT FONTSIZE=2 BORDERWIDTH=1 BORDERCOLOR=BLACK JUST=CENTER BACKGROUND=WHITE FOREGROUND=BLACK } group STYLE(HEADER)={FONT_WEIGHT=BOLD FONTSIZE=2 BACKGROUND=#000080 FOREGROUND=WHITE};
DEFINE PRODUCT_CLASS/nozero 'PRODUCT' ORDER=FORMATTED STYLE={width=110PT FONTSIZE=2 BORDERWIDTH=1 BORDERCOLOR=BLACK JUST=CENTER BACKGROUND=WHITE FOREGROUND=BLACK } group STYLE(HEADER)={FONT_WEIGHT=BOLD FONTSIZE=2 BACKGROUND=#000080 FOREGROUND=WHITE};
DEFINE FINAL_ALLOCATION/nozero STYLE={FONTSIZE=2 BORDERWIDTH=1 BORDERCOLOR=BLACK} across '' STYLE(HEADER)={FONT_WEIGHT=BOLD FONTSIZE=2 BORDERWIDTH=1 BORDERCOLOR=BLACK BACKGROUND=#000080 FOREGROUND=WHITE};
DEFINE "status 2"n/nozero STYLE={FONTSIZE=2 BORDERWIDTH=1 BORDERCOLOR=BLACK} across '' STYLE(HEADER)={FONT_WEIGHT=BOLD FONTSIZE=2 BORDERWIDTH=1 BORDERCOLOR=BLACK BACKGROUND=#FF6600 FOREGROUND=WHITE};
DEFINE TOTAL_POS/nozero STYLE={ FONTSIZE=2 JUST=CENTER VJUST=MIDDLE BORDERWIDTH=1 BORDERCOLOR=BLACK BACKGROUND=WHITE FOREGROUND=BLACK} 'POS' ANALYSIS SUM FORMAT=6.2 STYLE(HEADER)={FONT_WEIGHT=BOLD FONTSIZE=2 BORDERWIDTH=1 BORDERCOLOR=BLACK BACKGROUND=#000080 FOREGROUND=WHITE};
DEFINE COUNT/nozero STYLE={FONTSIZE=2 JUST=CENTER VJUST=MIDDLE BORDERWIDTH=1 BORDERCOLOR=BLACK BACKGROUND=WHITE FOREGROUND=BLACK} 'Count' ANALYSIS SUM FORMAT=6. STYLE(HEADER)={FONT_WEIGHT=BOLD FONTSIZE=2 BORDERWIDTH=1 BORDERCOLOR=BLACK BACKGROUND=#000080 FOREGROUND=WHITE};
DEFINE POS_SUM/NOPRINT;
DEFINE POS_PER/nozero 'POS%' ANALYSIS SUM FORMAT=PERCENT8.2 STYLE={ FONTSIZE=2 JUST=CENTER VJUST=MIDDLE BORDERWIDTH=1 BORDERCOLOR=BLACK BACKGROUND=WHITE FOREGROUND=BLACK} STYLE(HEADER)={FONT_WEIGHT=BOLD FONTSIZE=2 BORDERWIDTH=1 BORDERCOLOR=BLACK BACKGROUND=#000080 FOREGROUND=WHITE};
compute PRODUCT_CLASS;
if PRXMATCH('%TOTAL%',PRODUCT_CLASS) then
call define (_row_,"style","style={BORDERBOTTOMWIDTH=1 FONT_WEIGHT=BOLD background=#FF6600 FOREGROUND=WHITE}");
endcomp;
run;
I would like to apply the same IF statement across different variables. I have 31 sets of these variables and would like to avoid typing out multiple IF statements. Is there a more elegant way to achieve this? Here is my code:
/* Variable list ab1-ab31, defined before the DATA step that uses it */
%let ab = ab1-ab31;
data want;
  set have;
  if a1 = '0' and b1 = '0' then ab1 = '1';
  else ab1 = '0';
  if a2 = '0' and b2 = '0' then ab2 = '1';
  else ab2 = '0';
  if a3 = '0' and b3 = '0' then ab3 = '1';
  else ab3 = '0';
  ...
  if a31 = '0' and b31 = '0' then ab31 = '1';
  else ab31 = '0';
  /* Count how many of the ab flags are '1' */
  n_ab = sum((countc(cats(of &ab), '1')));
  drop &ab;
run;
This is the code:
ods output Quartiles=median_ci(where=(percent=50));
proc lifetest data=test;
  time duration*censor(1);
  strata group;
run;
It runs, but in the quartile estimates the results show complete data (point estimate and 95% CI) only for the 25th percentile. There is no point estimate for 50 or 75; for these percentiles, only the lower confidence limit is shown. Does this have something to do with the data? Should I calculate the median and CI with another procedure? The median and CI are meant to describe survival data. Any help is appreciated.