turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Creating an alternative table from unique counts a...

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-24-2017 05:40 AM

Hello. I have a dataset that contains 5 variables. "ID" is a unique identifier of a person in my dataset. "Gender" is male (1) or female (2). AgeRange contains only 1 through 6, which indicates which age range they fall into based on a categorical scheme for my report. "Screen" is if they were ever medically screened. "Status" has 3 entries, positive, negative, or blank. It is blank when the patient has not been screened. Here is the sample dataset:

```
data JANFEB.SAMPLEUNIQUE;
infile datalines dsd truncover;
input ID:BEST. Gender:BEST. AgeRange:BEST. Screen:$3. Result:$8.;
datalines4;
1,1,1,Yes,Positive
2,2,2,No,
3,1,3,Yes,Negative
4,2,4,No,
5,1,5,Yes,Positive
6,2,6,No,
7,1,1,Yes,Negative
8,2,2,No,
9,1,3,Yes,Positive
10,2,4,No,
11,1,5,Yes,Negative
12,2,6,Yes,Positive
13,1,1,Yes,Negative
14,2,3,Yes,Positive
;;;;
```

How do I create a table from this dataset that groups screening and status results in this following way? I would like to primarily group my new data table by age range (1-6) and gender (1-2).

Age Range | Gender | Positive | Screened | All | All Screened Difference | Screened Positive Difference | Screening rate | Prevalence Rate |

1 | 1 | 10 | 2780 | 3935 | 1155 | 2770 | 71% | 0.4% |

1 | 2 | 6 | 2325 | 3302 | 977 | 2319 | 70% | 0.3% |

2 | 1 | 25 | 1172 | 1386 | 214 | 1147 | 85% | 2.1% |

2 | 2 | 22 | 3098 | 3592 | 494 | 3076 | 86% | 0.7% |

3 | 1 | 91 | 951 | 1149 | 198 | 860 | 83% | 9.6% |

3 | 2 | 39 | 1835 | 2221 | 386 | 1796 | 83% | 2.1% |

4 | 1 | 189 | 1073 | 1317 | 244 | 884 | 81% | 17.6% |

4 | 2 | 27 | 945 | 1190 | 245 | 918 | 79% | 2.9% |

5 | 1 | 198 | 1586 | 1921 | 335 | 1388 | 83% | 12.5% |

5 | 2 | 42 | 1132 | 1413 | 281 | 1090 | 80% | 3.7% |

6 | 1 | 179 | 3902 | 4793 | 891 | 3723 | 81% | 4.6% |

6 | 2 | 175 | 3932 | 4862 | 930 | 3757 | 81% | 4.5% |

Here are the explanation of the columns:

- Positive: Unique count under each age range and specific gender where the status variable had 'Positive' in it.

- Screened: Unique count under each age range and specific gender where the screened variable had 'Yes' in it.

- All: Unique count under each age range and specific gender in the dataset.

- All Screened Difference: All - Screened

- Screened Positive Difference: Screened - Positive

- Screening rate: Screened/Total

- Positive prevalence; Positive/Screen

Thanks so much for your help!

Accepted Solutions

Solution

04-25-2017
03:35 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-24-2017 11:18 AM

I did a roundabout way of getting to your solution. I separated genders into two different tables and then merged them together and proc sort them according to age range and gender. I'm not sure what exactly happened though. The output of your code and mine are off. Do you know what is wrong with the code? I added a bigger dataset so that all age ranges for all genders would have some kind of entry.

```
/*extended sample dataset of 30 entries*/
data sampleunique;
infile datalines dsd truncover;
input ID:BEST. Gender:BEST. AgeRange:BEST. Screen:$3. Result:$8.;
datalines4;
1,1,1,Yes,Positive
2,2,2,No,
3,1,3,Yes,Negative
4,2,4,No,
5,1,5,Yes,Positive
6,2,6,No,
7,1,1,Yes,Negative
8,2,2,No,
9,1,3,Yes,Positive
10,2,4,No,
11,1,5,Yes,Negative
12,2,6,Yes,Positive
13,1,1,Yes,Negative
14,2,3,Yes,Positive
15,1,1,Yes,Negative
16,1,2,Yes,Negative
17,1,3,Yes,Negative
18,1,4,No,
19,1,5,No,
20,1,6,No,
21,2,1,No,
22,2,2,No,
23,2,3,No,
24,2,4,No,
25,2,5,Yes,Positive
26,2,6,Yes,Positive
27,2,1,Yes,Positive
28,2,6,Yes,Negative
29,1,3,Yes,Negative
30,2,1,Yes,Positive
;;;;
/*this was your code*/
proc sql;
create table WANT as
select AGERANGE,
GENDER,
count(case when char(RESULT,1)="P" then 1 else 0 end) as POSITIVE,
count(case when char(SCREEN,1)="Y" then 1 else 0 end) as SCREENED,
count(*) as TOTAL,
CALCULATED TOTAL - CALCULATED SCREENED as TOTAL_SCREEN_DIFF,
CALCULATED SCREENED - CALCULATED POSITIVE as SCREEN_POS_DIFF,
(CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREEN_RATE,
(CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POS_PREV
from SAMPLEUNIQUE
group by AGERANGE,
GENDER;
quit;
/*this is mine*/
proc sql;
create table sampleMALE as
select agerange,
sum(case when char(result,1)="P" then 1 else 0 end) as POSITIVE,
sum(case when screen="Yes" then 1 else 0 end) as SCREENED,
count(*) as TOTAL,
(CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
(CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
(CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
(CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
from sampleunique
where gender=1
group by agerange;
quit;
data samplemale;
set samplemale;
gender=1;
run;
proc sql;
create table sampleFEMALE as
select agerange,
sum(case when char(result,1)="P" then 1 else 0 end) as POSITIVE,
sum(case when screen="Yes" then 1 else 0 end) as SCREENED,
count(*) as TOTAL,
(CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
(CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
(CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
(CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
from sampleunique
where gender=2
group by agerange;
quit;
data sampleFEMALE;
set sampleFEMALE;
Gender=2;
run;
data sampleMERGE;
retain agerange gender positive;
set sampleMALE sampleFEMALE;
run;
proc sort data=sampleMERGE;
by agerange gender;
run;
```

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to byeh2017

04-24-2017 05:56 AM

Exactly the same as in this post:

You just change the groups?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-24-2017 06:20 AM

I got most of the code to work, but not quite sure how to group genders in. Whenenver I select agerange and gender and group by the same two variables, the table creates 31,000+ entries.

I am using my real dataset in here, so the variables are a bit different

```
proc sql;
create table janfeb.Janhosptypescreeningresults as
select agerange,
sum(case when char(hcvresult,1)="P" then 1 else 0 end) as POSITIVE,
sum(case when hcvscreening="Yes" then 1 else 0 end) as SCREENED,
count(*) as TOTAL,
(CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
(CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
(CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
(CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
from janfeb.jfgmi
where dcdeathdate >= dhms(mdy(02,01,2017),0,0,0) and dcdeathdate <= dhms(mdy(02,28,2017),0,0,0)
group by agerange;
quit;
```

How do I include the count for agerange and gender into this new table? Thanks!

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to byeh2017

04-24-2017 06:34 AM

Mabye restart your question by asking what it is your actually trying to do. From what you have said it sounds like your creating a table based on demography data. The simplest way to do this would be to do a proc freq on your data, then post process the resulting output from that, something like:

data SAMPLEUNIQUE; infile datalines dsd truncover; input ID:BEST. Gender:BEST. AgeRange:BEST. Screen:$3. Result:$8.; datalines4; 1,1,1,Yes,Positive 2,2,2,No, 3,1,3,Yes,Negative 4,2,4,No, 5,1,5,Yes,Positive 6,2,6,No, 7,1,1,Yes,Negative 8,2,2,No, 9,1,3,Yes,Positive 10,2,4,No, 11,1,5,Yes,Negative 12,2,6,Yes,Positive 13,1,1,Yes,Negative 14,2,3,Yes,Positive ;;;; run; proc freq data=sampleunique noprint; tables gender * agerange * screen * result / out=want; run;

The want table can then be transposed up if needed, and you can add screening counts to that and then calculate the percentages.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-24-2017 09:07 AM

Thanks for your suggestions. How do I transpose it and add the difference and percentages? I proc freq that table and got this dataset:

```
data WORK.WANT;
infile datalines dsd truncover;
input agerange:32. gender:$1. HCVScreening:$9. HCVresult:$27. COUNT:32. PERCENT:32.;
datalines4;
1,1,No,,1114,
1,1,No,Negative,41,0.1619209352
1,1,Yes,Negative,2770,10.939536353
1,1,Yes,Positive,10,0.039492911
1,2,No,,931,
1,2,No,Negative,46,0.1816673907
1,2,Yes,Negative,2319,9.1584060661
1,2,Yes,Positive,6,0.0236957466
2,1,,,1,
2,1,No,,193,
2,1,No,Negative,18,0.0710872398
2,1,No,Positive,2,0.0078985822
2,1,Yes,Negative,1147,4.5298368943
2,1,Yes,Positive,25,0.0987322776
2,2,No,,389,
2,2,No,Negative,105,0.4146755657
2,2,Yes,Negative,3076,12.148019431
2,2,Yes,Positive,22,0.0868844042
3,1,,,1,
3,1,No,,173,
3,1,No,Negative,23,0.0908336954
3,1,No,Positive,1,0.0039492911
3,1,Yes,Negative,860,3.3963903479
3,1,Yes,Positive,91,0.3593854903
3,2,No,,314,
3,2,No,Negative,72,0.2843489594
3,2,Yes,Negative,1796,7.0929268196
3,2,Yes,Positive,39,0.154022353
4,1,,,4,
4,1,No,,221,
4,1,No,Negative,18,0.0710872398
4,1,No,Positive,1,0.0039492911
4,1,Yes,Negative,884,3.4911733344
4,1,Yes,Positive,189,0.7464160183
4,2,No,,195,
4,2,No,Negative,50,0.1974645551
4,2,Yes,Negative,918,3.6254492319
4,2,Yes,Positive,27,0.1066308598
5,1,,,3,
5,1,No,,305,
5,1,No,Negative,27,0.1066308598
5,1,Yes,Negative,1388,5.4816160499
5,1,Yes,Positive,198,0.7819596382
5,2,No,,241,
5,2,No,Negative,40,0.1579716441
5,2,Yes,Negative,1090,4.3047273014
5,2,Yes,Positive,42,0.1658702263
6,1,,,2,
6,1,No,,825,
6,1,No,Negative,62,0.2448560483
6,1,Yes,Negative,3725,14.711109356
6,1,Yes,Positive,179,0.7069231073
6,2,,,1,
6,2,No,,846,
6,2,No,Negative,82,0.3238418704
6,2,Yes,,1,
6,2,Yes,Negative,3757,14.837486671
6,2,Yes,Positive,175,0.6911259429
;;;;
```

Is there no way to do this on proc sql? I tried to select and group by agerange and gender, but it gave me 31,000 entries. Not quite sure what happened.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to byeh2017

04-24-2017 09:25 AM

Well, you could do it in SQL, it just becomes a bit fiddly see below. Thats why its easier to do a proc freq, then in datastep move your data up (see methods to reshape data, a ton of results on Google). Note your test data doesn't seem to have any differences in it or maybe I misunderstand the total?

data SAMPLEUNIQUE; infile datalines dsd truncover; input ID:BEST. Gender:BEST. AgeRange:BEST. Screen:$3. Result:$8.; datalines4; 1,1,1,Yes,Positive 2,2,2,No, 3,1,3,Yes,Negative 4,2,4,No, 5,1,5,Yes,Positive 6,2,6,No, 7,1,1,Yes,Negative 8,2,2,No, 9,1,3,Yes,Positive 10,2,4,No, 11,1,5,Yes,Negative 12,2,6,Yes,Positive 13,1,1,Yes,Negative 14,2,3,Yes,Positive ;;;; run; proc sql; create table WANT as select AGERANGE, GENDER, count(case when char(RESULT,1)="P" then 1 else 0 end) as POSITIVE, count(case when char(SCREEN,1)="Y" then 1 else 0 end) as SCREENED, count(*) as ALL, CALCULATED ALL - CALCULATED SCREENED as ALL_DIFF, CALCULATED SCREENED - CALCULATED POSITIVE as SCREEN_POS_DIFF, (CALCULATED SCREENED / CALCULATED ALL) * 100 as SCREEN_RATE, (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POS_PREV from SAMPLEUNIQUE group by AGERANGE, GENDER; quit;

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-24-2017 11:10 AM

There are no differences in the test data. The difference variables are basically the following:

All_Screened_Diff= Distinct count of EVERYONE in that agerange and gender minus distinct count of those SCREENED in that agerange and gender

Screened_Pos_Diff= Distinct count of SCREENED in that agerange and gender minus distinct count of those POSITIVE in result in that agerange and gender

Solution

04-25-2017
03:35 AM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-24-2017 11:18 AM

I did a roundabout way of getting to your solution. I separated genders into two different tables and then merged them together and proc sort them according to age range and gender. I'm not sure what exactly happened though. The output of your code and mine are off. Do you know what is wrong with the code? I added a bigger dataset so that all age ranges for all genders would have some kind of entry.

```
/*extended sample dataset of 30 entries*/
data sampleunique;
infile datalines dsd truncover;
input ID:BEST. Gender:BEST. AgeRange:BEST. Screen:$3. Result:$8.;
datalines4;
1,1,1,Yes,Positive
2,2,2,No,
3,1,3,Yes,Negative
4,2,4,No,
5,1,5,Yes,Positive
6,2,6,No,
7,1,1,Yes,Negative
8,2,2,No,
9,1,3,Yes,Positive
10,2,4,No,
11,1,5,Yes,Negative
12,2,6,Yes,Positive
13,1,1,Yes,Negative
14,2,3,Yes,Positive
15,1,1,Yes,Negative
16,1,2,Yes,Negative
17,1,3,Yes,Negative
18,1,4,No,
19,1,5,No,
20,1,6,No,
21,2,1,No,
22,2,2,No,
23,2,3,No,
24,2,4,No,
25,2,5,Yes,Positive
26,2,6,Yes,Positive
27,2,1,Yes,Positive
28,2,6,Yes,Negative
29,1,3,Yes,Negative
30,2,1,Yes,Positive
;;;;
/*this was your code*/
proc sql;
create table WANT as
select AGERANGE,
GENDER,
count(case when char(RESULT,1)="P" then 1 else 0 end) as POSITIVE,
count(case when char(SCREEN,1)="Y" then 1 else 0 end) as SCREENED,
count(*) as TOTAL,
CALCULATED TOTAL - CALCULATED SCREENED as TOTAL_SCREEN_DIFF,
CALCULATED SCREENED - CALCULATED POSITIVE as SCREEN_POS_DIFF,
(CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREEN_RATE,
(CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POS_PREV
from SAMPLEUNIQUE
group by AGERANGE,
GENDER;
quit;
/*this is mine*/
proc sql;
create table sampleMALE as
select agerange,
sum(case when char(result,1)="P" then 1 else 0 end) as POSITIVE,
sum(case when screen="Yes" then 1 else 0 end) as SCREENED,
count(*) as TOTAL,
(CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
(CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
(CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
(CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
from sampleunique
where gender=1
group by agerange;
quit;
data samplemale;
set samplemale;
gender=1;
run;
proc sql;
create table sampleFEMALE as
select agerange,
sum(case when char(result,1)="P" then 1 else 0 end) as POSITIVE,
sum(case when screen="Yes" then 1 else 0 end) as SCREENED,
count(*) as TOTAL,
(CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
(CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
(CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
(CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
from sampleunique
where gender=2
group by agerange;
quit;
data sampleFEMALE;
set sampleFEMALE;
Gender=2;
run;
data sampleMERGE;
retain agerange gender positive;
set sampleMALE sampleFEMALE;
run;
proc sort data=sampleMERGE;
by agerange gender;
run;
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to byeh2017

04-24-2017 12:51 PM

You should not put where clauses with group by's. If you need to restrict it then (select * from have where xyz) subquery it. As for different counts, sure your doing different grouping, you will get different output?

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

04-24-2017 02:21 PM

Thanks. How would you put the where statement here?

```
proc sql;
create table janfeb.Febscreeningresults as
select agerange, gender,
sum(case when char(hcvresult,1)="P" then 1 else 0 end) as POSITIVE,
sum(case when hcvscreening="Yes" then 1 else 0 end) as SCREENED,
count(*) as TOTAL,
(CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
(CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
(CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
(CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
from janfeb.jfgmi
where dcdeathdate >= dhms(mdy(02,01,2017),0,0,0) and dcdeathdate <= dhms(mdy(02,28,2017),0,0,0)
group by agerange, gender;
quit;
```

It seems to work fine on sas when i put the where statement below the from statement. Thanks again.