Solved: Re: Creating an alternative table from unique counts and rates from or...

byeh2017 · Posted 04-24-2017 05:40 AM

Hello. I have a dataset that contains 5 variables. "ID" is a unique identifier of a person in my dataset. "Gender" is male (1) or female (2). AgeRange contains only 1 through 6, which indicates which age range they fall into based on a categorical scheme for my report. "Screen" is if they were ever medically screened. "Status" has 3 entries, positive, negative, or blank. It is blank when the patient has not been screened. Here is the sample dataset:

data JANFEB.SAMPLEUNIQUE;
  infile datalines dsd truncover;
  input ID:BEST. Gender:BEST. AgeRange:BEST. Screen:$3. Result:$8.;
datalines4;
1,1,1,Yes,Positive
2,2,2,No,
3,1,3,Yes,Negative
4,2,4,No,
5,1,5,Yes,Positive
6,2,6,No,
7,1,1,Yes,Negative
8,2,2,No,
9,1,3,Yes,Positive
10,2,4,No,
11,1,5,Yes,Negative
12,2,6,Yes,Positive
13,1,1,Yes,Negative
14,2,3,Yes,Positive
;;;;

How do I create a table from this dataset that groups screening and status results in this following way? I would like to primarily group my new data table by age range (1-6) and gender (1-2).

Age Range	Gender	Positive	Screened	All	All Screened Difference	Screened Positive Difference	Screening rate	Prevalence Rate
1	1	10	2780	3935	1155	2770	71%	0.4%
1	2	6	2325	3302	977	2319	70%	0.3%
2	1	25	1172	1386	214	1147	85%	2.1%
2	2	22	3098	3592	494	3076	86%	0.7%
3	1	91	951	1149	198	860	83%	9.6%
3	2	39	1835	2221	386	1796	83%	2.1%
4	1	189	1073	1317	244	884	81%	17.6%
4	2	27	945	1190	245	918	79%	2.9%
5	1	198	1586	1921	335	1388	83%	12.5%
5	2	42	1132	1413	281	1090	80%	3.7%
6	1	179	3902	4793	891	3723	81%	4.6%
6	2	175	3932	4862	930	3757	81%	4.5%

Here are the explanation of the columns:

- Positive: Unique count under each age range and specific gender where the status variable had 'Positive' in it.

- Screened: Unique count under each age range and specific gender where the screened variable had 'Yes' in it.

- All: Unique count under each age range and specific gender in the dataset.

- All Screened Difference: All - Screened

- Screened Positive Difference: Screened - Positive

- Screening rate: Screened/Total

- Positive prevalence; Positive/Screen

Thanks so much for your help!

byeh2017 · Posted 04-24-2017 11:18 AM

I did a roundabout way of getting to your solution. I separated genders into two different tables and then merged them together and proc sort them according to age range and gender. I'm not sure what exactly happened though. The output of your code and mine are off. Do you know what is wrong with the code? I added a bigger dataset so that all age ranges for all genders would have some kind of entry.

/*extended sample dataset of 30 entries*/

data sampleunique;
  infile datalines dsd truncover;
  input ID:BEST. Gender:BEST. AgeRange:BEST. Screen:$3. Result:$8.;
datalines4;
1,1,1,Yes,Positive
2,2,2,No,
3,1,3,Yes,Negative
4,2,4,No,
5,1,5,Yes,Positive
6,2,6,No,
7,1,1,Yes,Negative
8,2,2,No,
9,1,3,Yes,Positive
10,2,4,No,
11,1,5,Yes,Negative
12,2,6,Yes,Positive
13,1,1,Yes,Negative
14,2,3,Yes,Positive
15,1,1,Yes,Negative
16,1,2,Yes,Negative
17,1,3,Yes,Negative
18,1,4,No,
19,1,5,No,
20,1,6,No,
21,2,1,No,
22,2,2,No,
23,2,3,No,
24,2,4,No,
25,2,5,Yes,Positive
26,2,6,Yes,Positive
27,2,1,Yes,Positive
28,2,6,Yes,Negative
29,1,3,Yes,Negative
30,2,1,Yes,Positive
;;;;

/*this was your code*/

proc sql;
  create table WANT as
  select  AGERANGE,
          GENDER,
          count(case when char(RESULT,1)="P" then 1 else 0 end) as POSITIVE,
          count(case when char(SCREEN,1)="Y" then 1 else 0 end) as SCREENED,
          count(*) as TOTAL,
          CALCULATED TOTAL - CALCULATED SCREENED as TOTAL_SCREEN_DIFF,
          CALCULATED SCREENED - CALCULATED POSITIVE as SCREEN_POS_DIFF,
          (CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREEN_RATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POS_PREV
  from    SAMPLEUNIQUE
  group by AGERANGE,
           GENDER;
quit;

/*this is mine*/

proc sql;
  create table sampleMALE as
  select  agerange,
  		  sum(case when char(result,1)="P" then 1 else 0 end) as POSITIVE,
          sum(case when screen="Yes" then 1 else 0 end) as SCREENED,
          count(*) as TOTAL,
          (CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
          (CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
          (CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
  from    sampleunique
  where gender=1
  group by agerange;
quit;

data samplemale;
set samplemale;
gender=1;
run;

proc sql;
  create table sampleFEMALE as
  select  agerange,
  		  sum(case when char(result,1)="P" then 1 else 0 end) as POSITIVE,
          sum(case when screen="Yes" then 1 else 0 end) as SCREENED,
          count(*) as TOTAL,
          (CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
          (CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
          (CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
  from    sampleunique
  where gender=2
  group by agerange;
quit;

data sampleFEMALE;
set sampleFEMALE;
Gender=2;
run;

data sampleMERGE;
retain agerange gender positive;
set sampleMALE sampleFEMALE;
run;

proc sort data=sampleMERGE;
by agerange gender;
run;

View solution in original post

RW9 · Posted 04-24-2017 05:56 AM

Exactly the same as in this post:

https://communities.sas.com/t5/Base-SAS-Programming/Creating-an-alternative-table-from-unique-counts...

You just change the groups?

byeh2017 · Posted 04-24-2017 06:20 AM

I got most of the code to work, but not quite sure how to group genders in. Whenenver I select agerange and gender and group by the same two variables, the table creates 31,000+ entries.

I am using my real dataset in here, so the variables are a bit different

proc sql;
  create table janfeb.Janhosptypescreeningresults as
  select  agerange,
  		  sum(case when char(hcvresult,1)="P" then 1 else 0 end) as POSITIVE,
          sum(case when hcvscreening="Yes" then 1 else 0 end) as SCREENED,
          count(*) as TOTAL,
          (CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
          (CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
          (CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
  from    janfeb.jfgmi
  where dcdeathdate >= dhms(mdy(02,01,2017),0,0,0) and dcdeathdate <= dhms(mdy(02,28,2017),0,0,0)
  group by agerange;
quit;

How do I include the count for agerange and gender into this new table? Thanks!

RW9 · Posted 04-24-2017 06:34 AM

Mabye restart your question by asking what it is your actually trying to do. From what you have said it sounds like your creating a table based on demography data. The simplest way to do this would be to do a proc freq on your data, then post process the resulting output from that, something like:

data SAMPLEUNIQUE;
  infile datalines dsd truncover;
  input ID:BEST. Gender:BEST. AgeRange:BEST. Screen:$3. Result:$8.;
datalines4;
1,1,1,Yes,Positive
2,2,2,No,
3,1,3,Yes,Negative
4,2,4,No,
5,1,5,Yes,Positive
6,2,6,No,
7,1,1,Yes,Negative
8,2,2,No,
9,1,3,Yes,Positive
10,2,4,No,
11,1,5,Yes,Negative
12,2,6,Yes,Positive
13,1,1,Yes,Negative
14,2,3,Yes,Positive
;;;;
run;

proc freq data=sampleunique noprint;
  tables gender * agerange * screen * result / out=want;
run;

The want table can then be transposed up if needed, and you can add screening counts to that and then calculate the percentages.

byeh2017 · Posted 04-24-2017 09:07 AM

Thanks for your suggestions. How do I transpose it and add the difference and percentages? I proc freq that table and got this dataset:

data WORK.WANT;
  infile datalines dsd truncover;
  input agerange:32. gender:$1. HCVScreening:$9. HCVresult:$27. COUNT:32. PERCENT:32.;
datalines4;
1,1,No,,1114,
1,1,No,Negative,41,0.1619209352
1,1,Yes,Negative,2770,10.939536353
1,1,Yes,Positive,10,0.039492911
1,2,No,,931,
1,2,No,Negative,46,0.1816673907
1,2,Yes,Negative,2319,9.1584060661
1,2,Yes,Positive,6,0.0236957466
2,1,,,1,
2,1,No,,193,
2,1,No,Negative,18,0.0710872398
2,1,No,Positive,2,0.0078985822
2,1,Yes,Negative,1147,4.5298368943
2,1,Yes,Positive,25,0.0987322776
2,2,No,,389,
2,2,No,Negative,105,0.4146755657
2,2,Yes,Negative,3076,12.148019431
2,2,Yes,Positive,22,0.0868844042
3,1,,,1,
3,1,No,,173,
3,1,No,Negative,23,0.0908336954
3,1,No,Positive,1,0.0039492911
3,1,Yes,Negative,860,3.3963903479
3,1,Yes,Positive,91,0.3593854903
3,2,No,,314,
3,2,No,Negative,72,0.2843489594
3,2,Yes,Negative,1796,7.0929268196
3,2,Yes,Positive,39,0.154022353
4,1,,,4,
4,1,No,,221,
4,1,No,Negative,18,0.0710872398
4,1,No,Positive,1,0.0039492911
4,1,Yes,Negative,884,3.4911733344
4,1,Yes,Positive,189,0.7464160183
4,2,No,,195,
4,2,No,Negative,50,0.1974645551
4,2,Yes,Negative,918,3.6254492319
4,2,Yes,Positive,27,0.1066308598
5,1,,,3,
5,1,No,,305,
5,1,No,Negative,27,0.1066308598
5,1,Yes,Negative,1388,5.4816160499
5,1,Yes,Positive,198,0.7819596382
5,2,No,,241,
5,2,No,Negative,40,0.1579716441
5,2,Yes,Negative,1090,4.3047273014
5,2,Yes,Positive,42,0.1658702263
6,1,,,2,
6,1,No,,825,
6,1,No,Negative,62,0.2448560483
6,1,Yes,Negative,3725,14.711109356
6,1,Yes,Positive,179,0.7069231073
6,2,,,1,
6,2,No,,846,
6,2,No,Negative,82,0.3238418704
6,2,Yes,,1,
6,2,Yes,Negative,3757,14.837486671
6,2,Yes,Positive,175,0.6911259429
;;;;

Is there no way to do this on proc sql? I tried to select and group by agerange and gender, but it gave me 31,000 entries. Not quite sure what happened.

RW9 · Posted 04-24-2017 09:25 AM

Well, you could do it in SQL, it just becomes a bit fiddly see below. Thats why its easier to do a proc freq, then in datastep move your data up (see methods to reshape data, a ton of results on Google). Note your test data doesn't seem to have any differences in it or maybe I misunderstand the total?

data SAMPLEUNIQUE;
  infile datalines dsd truncover;
  input ID:BEST. Gender:BEST. AgeRange:BEST. Screen:$3. Result:$8.;
datalines4;
1,1,1,Yes,Positive
2,2,2,No,
3,1,3,Yes,Negative
4,2,4,No,
5,1,5,Yes,Positive
6,2,6,No,
7,1,1,Yes,Negative
8,2,2,No,
9,1,3,Yes,Positive
10,2,4,No,
11,1,5,Yes,Negative
12,2,6,Yes,Positive
13,1,1,Yes,Negative
14,2,3,Yes,Positive
;;;;
run;

proc sql;
  create table WANT as
  select  AGERANGE,
          GENDER,
          count(case when char(RESULT,1)="P" then 1 else 0 end) as POSITIVE,
          count(case when char(SCREEN,1)="Y" then 1 else 0 end) as SCREENED,
          count(*) as ALL,
          CALCULATED ALL - CALCULATED SCREENED as ALL_DIFF,
          CALCULATED SCREENED - CALCULATED POSITIVE as SCREEN_POS_DIFF,
          (CALCULATED SCREENED / CALCULATED ALL) * 100 as SCREEN_RATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POS_PREV
  from    SAMPLEUNIQUE
  group by AGERANGE,
           GENDER;
quit;

byeh2017 · Posted 04-24-2017 11:10 AM

There are no differences in the test data. The difference variables are basically the following:

All_Screened_Diff= Distinct count of EVERYONE in that agerange and gender minus distinct count of those SCREENED in that agerange and gender

Screened_Pos_Diff= Distinct count of SCREENED in that agerange and gender minus distinct count of those POSITIVE in result in that agerange and gender

byeh2017 · Posted 04-24-2017 11:18 AM

I did a roundabout way of getting to your solution. I separated genders into two different tables and then merged them together and proc sort them according to age range and gender. I'm not sure what exactly happened though. The output of your code and mine are off. Do you know what is wrong with the code? I added a bigger dataset so that all age ranges for all genders would have some kind of entry.

/*extended sample dataset of 30 entries*/

data sampleunique;
  infile datalines dsd truncover;
  input ID:BEST. Gender:BEST. AgeRange:BEST. Screen:$3. Result:$8.;
datalines4;
1,1,1,Yes,Positive
2,2,2,No,
3,1,3,Yes,Negative
4,2,4,No,
5,1,5,Yes,Positive
6,2,6,No,
7,1,1,Yes,Negative
8,2,2,No,
9,1,3,Yes,Positive
10,2,4,No,
11,1,5,Yes,Negative
12,2,6,Yes,Positive
13,1,1,Yes,Negative
14,2,3,Yes,Positive
15,1,1,Yes,Negative
16,1,2,Yes,Negative
17,1,3,Yes,Negative
18,1,4,No,
19,1,5,No,
20,1,6,No,
21,2,1,No,
22,2,2,No,
23,2,3,No,
24,2,4,No,
25,2,5,Yes,Positive
26,2,6,Yes,Positive
27,2,1,Yes,Positive
28,2,6,Yes,Negative
29,1,3,Yes,Negative
30,2,1,Yes,Positive
;;;;

/*this was your code*/

proc sql;
  create table WANT as
  select  AGERANGE,
          GENDER,
          count(case when char(RESULT,1)="P" then 1 else 0 end) as POSITIVE,
          count(case when char(SCREEN,1)="Y" then 1 else 0 end) as SCREENED,
          count(*) as TOTAL,
          CALCULATED TOTAL - CALCULATED SCREENED as TOTAL_SCREEN_DIFF,
          CALCULATED SCREENED - CALCULATED POSITIVE as SCREEN_POS_DIFF,
          (CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREEN_RATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POS_PREV
  from    SAMPLEUNIQUE
  group by AGERANGE,
           GENDER;
quit;

/*this is mine*/

proc sql;
  create table sampleMALE as
  select  agerange,
  		  sum(case when char(result,1)="P" then 1 else 0 end) as POSITIVE,
          sum(case when screen="Yes" then 1 else 0 end) as SCREENED,
          count(*) as TOTAL,
          (CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
          (CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
          (CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
  from    sampleunique
  where gender=1
  group by agerange;
quit;

data samplemale;
set samplemale;
gender=1;
run;

proc sql;
  create table sampleFEMALE as
  select  agerange,
  		  sum(case when char(result,1)="P" then 1 else 0 end) as POSITIVE,
          sum(case when screen="Yes" then 1 else 0 end) as SCREENED,
          count(*) as TOTAL,
          (CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
          (CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
          (CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
  from    sampleunique
  where gender=2
  group by agerange;
quit;

data sampleFEMALE;
set sampleFEMALE;
Gender=2;
run;

data sampleMERGE;
retain agerange gender positive;
set sampleMALE sampleFEMALE;
run;

proc sort data=sampleMERGE;
by agerange gender;
run;

RW9 · Posted 04-24-2017 12:51 PM

You should not put where clauses with group by's. If you need to restrict it then (select * from have where xyz) subquery it. As for different counts, sure your doing different grouping, you will get different output?

byeh2017 · Posted 04-24-2017 02:21 PM

Thanks. How would you put the where statement here?

proc sql;
  create table janfeb.Febscreeningresults as
  select  agerange, gender,
  		  sum(case when char(hcvresult,1)="P" then 1 else 0 end) as POSITIVE,
          sum(case when hcvscreening="Yes" then 1 else 0 end) as SCREENED,
          count(*) as TOTAL,
          (CALCULATED TOTAL - CALCULATED SCREENED) as TotalScreenedDifference,
          (CALCULATED SCREENED - CALCULATED POSITIVE) as ScreenedPositiveDifference,
          (CALCULATED SCREENED / CALCULATED TOTAL) * 100 as SCREENINGRATE,
          (CALCULATED POSITIVE / CALCULATED SCREENED) * 100 as POSITIVEPREVALENCE
  from    janfeb.jfgmi
  where dcdeathdate >= dhms(mdy(02,01,2017),0,0,0) and dcdeathdate <= dhms(mdy(02,28,2017),0,0,0)
  group by agerange, gender;
quit;

It seems to work fine on sas when i put the where statement below the from statement. Thanks again.

Creating an alternative table from unique counts and rates from original dataset

Re: Creating an alternative table from unique counts and rates from original dataset

Re: Creating an alternative table from unique counts and rates from original dataset

Re: Creating an alternative table from unique counts and rates from original dataset

Re: Creating an alternative table from unique counts and rates from original dataset

Re: Creating an alternative table from unique counts and rates from original dataset

Re: Creating an alternative table from unique counts and rates from original dataset

Re: Creating an alternative table from unique counts and rates from original dataset

Re: Creating an alternative table from unique counts and rates from original dataset

Re: Creating an alternative table from unique counts and rates from original dataset

Re: Creating an alternative table from unique counts and rates from original dataset

Classroom Training Available!