data w;
length acctno $13 deactreason $4 dealertype $2 Province $2;
infile datalines dsd delimiter = '|';
informat actdt mmddyy10. deactdt mmddyy10. sales dollar8.0;
input acctno $ actdt deactdt deactreason $ goodcredit rateplan dealertype $ AGE Province $ sales;
datalines;
1176913194483|06/20/1999|||0|1|A1|58|BC|128
1176914599423|10/04/1999|10/15/1999|NEED|1|1|A1|45|AB|72
1176951913656|07/01/2000|||0|1|A1|57|BC|593
1176954000288|05/30/2000|||1|2|A1|47|ON|83
1176969186303|12/13/2000|||1|1|C1|82|BC|
1176991056273|08/31/1999|09/18/2000|MOVE|1|1|C1|92|QC|1041
1176991866552|05/24/2000|||1|1|A1|77|ON|
1176992889500|11/28/2000|||1|1|C1|68|AB|72
1177000067271|12/23/1999|||0|1|B1|75|ON|134
1177010940613|12/09/1999|||1|2|A1|42|NS|11
1177025997013|11/09/1999|||1|1|A1|26|BC|154
1177027515760|10/19/1999|||1|1|B1|73|BC|16
1177028996676|09/21/2000|||0|1|C1||QC|179
1177038747105|03/14/2000|||0|1|C1|41|ON|705
1177045857516|06/22/2000|||1|1|A1|53|QC|83
1177057406016|09/21/2000|||0|1|C1|50|ON|529
1177066422248|04/26/1999|01/15/2001|NEED|0|1|A2|55|NS|44
;
run;
I need to find the following:
My codes don’t produced the desired results; please pinpoint where need fixing
proc sort data=w out=w2 (keep=actdt deactdt);
by acctno ;
run;
data w; set wireless (keep= actdt deactdt);
by acctno;
*** sets the first new value to null ***;
if first.actdt then count=.;
Here I’m missing the code for the deactdt count
*** starts counting by 1 by actdt ***;
count+1;
new_var=_n_;
run;
Or
data w2; set w (keep= actdt deactdt);
by acctno;
*** holds the value of count and sets the default value ***;
retain count 0;
*** adds the value of count per each new value of actdt & deactdt ***;
if first.actdt then count=count+1;
Again I’m missing the code to for the deactdt count
run;
/* create a format to group missing and nonmissing */
proc format;
value $missfmt ' '='Missing' other='Not Missing';
value missfmt . ='Missing' other='Not Missing';
run;
Or
proc freq data=wireless;
format _CHAR_ $missfmt.; /* apply format for the duration of this PROC */
tables _CHAR_ / missing missprint nocum nopercent;
format _NUMERIC_ missfmt.;
tables _NUMERIC_ / missing missprint nocum nopercent;
run;
Since I have a lot of variables, make sense to count the missing values using an ARRAY and the CMISS function, is this right? What if I want count a select variable set, such as actdt and deactdt?
data countmissing;
set w;
array vars(10) acctno $ actdt deactdt deactreason $ goodcredit rateplan dealertype $ AGE Province $ sales _NUMERIC_;
missing=0;
do i = 1 to 10;
if vars(i)=' ' then missing+1;
end;
/* Using CMISS, only one statement needed rather than 5: */
c_miss_missing = cmiss(of vars[*]);
drop i;
run;
proc print;
run;
Or
proc iml;
use Missing;
read all var _NUM_ into x;
close Missing;
rowMiss = countmiss(x, "ROW"); /* returns 0,1,2 or 3 for each row */
print rowmiss;
proc sql;
create table w2 as
select min(thedates) as min_date format=mmddyy10.,
max(thedates) as max_date format=mmddyy10.
from w;
quit;
proc print; title from dataset &syslast;run;
proc transpose data=w2 out=new(rename=(_name_=date)) ;
run;
proc print; title from dataset &syslast;run;
Why can't you just run PROC FREQ on the data set and get the number deactivated each month/year based on the date?
I see nothing in the data about activation so no idea how you'd report on that. Remember we can only see your data so have no idea how it's collected or if the logic is correct.
proc freq data=w;
table deactdt/out=want;
format deactdt monyy7.;
run;
@dulce wrote:
data w;
length acctno $13 deactreason $4 dealertype $2 Province $2;
infile datalines dsd delimiter = '|';
informat actdt mmddyy10. deactdt mmddyy10. sales dollar8.0;
input acctno $ actdt deactdt deactreason $ goodcredit rateplan dealertype $ AGE Province $ sales;datalines;
1176913194483|06/20/1999|||0|1|A1|58|BC|128
1176914599423|10/04/1999|10/15/1999|NEED|1|1|A1|45|AB|72
1176951913656|07/01/2000|||0|1|A1|57|BC|593
1176954000288|05/30/2000|||1|2|A1|47|ON|83
1176969186303|12/13/2000|||1|1|C1|82|BC|
1176991056273|08/31/1999|09/18/2000|MOVE|1|1|C1|92|QC|1041
1176991866552|05/24/2000|||1|1|A1|77|ON|
1176992889500|11/28/2000|||1|1|C1|68|AB|72
1177000067271|12/23/1999|||0|1|B1|75|ON|134
1177010940613|12/09/1999|||1|2|A1|42|NS|11
1177025997013|11/09/1999|||1|1|A1|26|BC|154
1177027515760|10/19/1999|||1|1|B1|73|BC|16
1177028996676|09/21/2000|||0|1|C1||QC|179
1177038747105|03/14/2000|||0|1|C1|41|ON|705
1177045857516|06/22/2000|||1|1|A1|53|QC|83
1177057406016|09/21/2000|||0|1|C1|50|ON|529
1177066422248|04/26/1999|01/15/2001|NEED|0|1|A2|55|NS|44
;
run;
I need to find the following:
- The number of activated and deactivated accounts?
My codes don’t produced the desired results; please pinpoint where need fixing
proc sort data=w out=w2 (keep=actdt deactdt);
by acctno ;
run;
data w; set wireless (keep= actdt deactdt);
by acctno;
*** sets the first new value to null ***;
if first.actdt then count=.;
Here I’m missing the code for the deactdt count
*** starts counting by 1 by actdt ***;
count+1;
new_var=_n_;
run;Or
data w2; set w (keep= actdt deactdt);
by acctno;
*** holds the value of count and sets the default value ***;
retain count 0;
*** adds the value of count per each new value of actdt & deactdt ***;
if first.actdt then count=count+1;Again I’m missing the code to for the deactdt count
run;
- Are there any missing activated and deactivated dates?
/* create a format to group missing and nonmissing */
proc format;
value $missfmt ' '='Missing' other='Not Missing';
value missfmt . ='Missing' other='Not Missing';
run;
Or
proc freq data=wireless;
format _CHAR_ $missfmt.; /* apply format for the duration of this PROC */
tables _CHAR_ / missing missprint nocum nopercent;
format _NUMERIC_ missfmt.;
tables _NUMERIC_ / missing missprint nocum nopercent;
run;
Since I have a lot of variables, make sense to count the missing values using an ARRAY and the CMISS function, is this right? What if I want count a select variable set, such as actdt and deactdt?
data countmissing;
set w;
array vars(10) acctno $ actdt deactdt deactreason $ goodcredit rateplan dealertype $ AGE Province $ sales _NUMERIC_;
missing=0;
do i = 1 to 10;
if vars(i)=' ' then missing+1;
end;
/* Using CMISS, only one statement needed rather than 5: */
c_miss_missing = cmiss(of vars[*]);
drop i;
run;
proc print;
run;
Or
proc iml;
use Missing;
read all var _NUM_ into x;
close Missing;
rowMiss = countmiss(x, "ROW"); /* returns 0,1,2 or 3 for each row */
print rowmiss;
- When is the earliest and latest activation and deactivation dates available?
proc sql;
create table w2 as
select min(thedates) as min_date format=mmddyy10.,
max(thedates) as max_date format=mmddyy10.
from w;
quit;
proc print; title from dataset &syslast;run;
proc transpose data=w2 out=new(rename=(_name_=date)) ;
run;
proc print; title from dataset &syslast;run;
- What is the age and province distributions of active and deactivated customers? (Use dashboards)
@ Reeza,
Thanks for responding. I appreciate your partial answer.
I'm a beginners SAS programmer; currently learning the programming language.
I gave all the specs relating to the small sample of the data set (in actuality it contains well over 1K records). You need to pick a record which contains all the variables. As noted previously, missing column variable values are denoted by delimiter = '|';
Each observation or row contains variable values: acctno $ actdt deactdt deactreason $ goodcredit rateplan dealertype $ AGE Province $ sales;
I need to find
1. the number of accounts that were activated and deactivated?
2.any missing activated and deactivated dates?
3.when the earliest and latest activation and deactivation dates available?
4. the age and province distribution of active and deactivated customers?
I'm a novice programmer and I tried my best to come up with the codes to satisfy the questions. However, the codes don't produced the desired results. That's why I'm turning to the SAS community for help.
Please help fix my codes so they answer the above questions.
With appreciation.
D
I know how to read the raw data file and I'm a SAS beginners. You, on the other hand, a "Super User" who is clueless reading the dataset. I was hoping to get some assistance on this forum; unfortunately, I didn't receive any but left with disappointment and frustration.
My advice to you stay off of SAS Community. You're clueless. You're unhelpful.
@dulce wrote:
I know how to read the raw data file and I'm a SAS beginners. You, on the other hand, a "Super User" who is clueless reading the dataset. I was hoping to get some assistance on this forum; unfortunately, I didn't receive any but left with disappointment and frustration.
My advice to you stay off of SAS Community. You're clueless. You're unhelpful.
You're not entitled to my time or energy.
The first problem in your code is that you are overwriting datasets all over the place.
You create the "w" dataset with your raw data and then sort this dataset into the "w2" dataset. Your next DATA step overwrites the raw dataset (w) with some data elements of a dataset called "wireless."
Then your next DATA step overwrites the sorted "w2" dataset with some data elements from the unsorted "w" dataset. This DATA statement contains another error. You attempt to use the FIRST function on an unsorted variable. In order to use "first.actdt" the "actdt" variable must be one of the variables on which the dataset is sorted. Otherwise FIRST. and LAST. have no meaning.
It is unclear from your message whether you are just trying to get counts of things or want to create a an analytic dataset that contain the variables necessary to answer your four questions. If you just want counts, then the judicious use of PROC FREQ will suffice (and you will only need the first dataset you created).
You can create several tables in a single execution of PROC FREQ..
PROC FREQ usually sorts the Frequency from lowest to highest, so running PROC FREQ on your activation and deactivation dates will show you in the output the lowest and highest values for each date and the number of observations where the dates are missing. This will answer Questions 2 and 3.
Since it appears that your account numbers are unique, those same PROC FREQ outputs will answer question 1. The answer to question 4 will come from PROC FREQ outputs on the variables of interest.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.