BookmarkSubscribeRSS Feed
dulce
Calcite | Level 5

data w;
       length acctno $13  deactreason $4 dealertype $2 Province $2;
       infile datalines dsd delimiter = '|';
       informat actdt mmddyy10. deactdt mmddyy10. sales dollar8.0;
       input acctno $ actdt deactdt deactreason $ goodcredit rateplan dealertype $ AGE Province $ sales;

datalines;

1176913194483|06/20/1999|||0|1|A1|58|BC|128

1176914599423|10/04/1999|10/15/1999|NEED|1|1|A1|45|AB|72

1176951913656|07/01/2000|||0|1|A1|57|BC|593

1176954000288|05/30/2000|||1|2|A1|47|ON|83

1176969186303|12/13/2000|||1|1|C1|82|BC|

1176991056273|08/31/1999|09/18/2000|MOVE|1|1|C1|92|QC|1041

1176991866552|05/24/2000|||1|1|A1|77|ON|

1176992889500|11/28/2000|||1|1|C1|68|AB|72

1177000067271|12/23/1999|||0|1|B1|75|ON|134

1177010940613|12/09/1999|||1|2|A1|42|NS|11

1177025997013|11/09/1999|||1|1|A1|26|BC|154

1177027515760|10/19/1999|||1|1|B1|73|BC|16

1177028996676|09/21/2000|||0|1|C1||QC|179

1177038747105|03/14/2000|||0|1|C1|41|ON|705

1177045857516|06/22/2000|||1|1|A1|53|QC|83

1177057406016|09/21/2000|||0|1|C1|50|ON|529

1177066422248|04/26/1999|01/15/2001|NEED|0|1|A2|55|NS|44
       ;
       run;

 

I need to find the following:

 

  1. The number of activated and deactivated accounts?

 

My codes don’t produced the desired results; please pinpoint where need fixing

 

proc sort data=w out=w2 (keep=actdt deactdt);
by acctno ;
run;

data w; set wireless (keep= actdt deactdt);
by acctno;
*** sets the first new value to null ***;
if first.actdt then count=.;

 

Here I’m missing the code for the deactdt count


*** starts counting by 1 by actdt ***;
count+1;
new_var=_n_;
run;

Or


data w2; set w (keep= actdt deactdt);
by acctno;
*** holds the value of count and sets the default value ***;
retain count 0;
*** adds the value of count per each new value of actdt & deactdt ***;
if first.actdt then count=count+1;

Again I’m missing the code to for the deactdt count
run;

 

  1. Are there any missing activated and deactivated dates?

 

/* create a format to group missing and nonmissing */
proc format;
 value $missfmt ' '='Missing' other='Not Missing';
 value  missfmt  . ='Missing' other='Not Missing';
run;

 

Or

 

proc freq data=wireless;
format _CHAR_ $missfmt.; /* apply format for the duration of this PROC */
tables _CHAR_ / missing missprint nocum nopercent;
format _NUMERIC_ missfmt.;

tables _NUMERIC_ / missing missprint nocum nopercent;
run;

Since I have a lot of variables, make sense to count the missing values using an ARRAY and the CMISS function, is this right? What if I want count a select variable set, such as actdt and deactdt?
                                                                                                                              
data countmissing;                                                                                                                     
  set w;                                                                                                                             
  array vars(10)  acctno $ actdt deactdt deactreason $ goodcredit rateplan dealertype $ AGE Province $ sales _NUMERIC_;                                                       
  missing=0;                                                                                                                           
  do i = 1 to 10;                                                                                                                        
    if vars(i)=' ' then missing+1;                                                                                                     
  end;                                                                                                                                 
  /* Using CMISS, only one statement needed rather than 5: */                                                                           
  c_miss_missing = cmiss(of vars[*]);                                                                                                  
  drop i;                                                                                                                               
run;                                                                                                                                   
proc print;                                                                                                                             
run;               

 

Or

 

proc iml;
use Missing;
read all var _NUM_ into x;
close Missing;
 
rowMiss = countmiss(x, "ROW"); /* returns 0,1,2 or 3 for each row */
print rowmiss;

 

  1. When is the earliest and latest activation and deactivation dates available?

 

proc sql;
create table w2 as
    select min(thedates) as min_date format=mmddyy10.,
                 max(thedates) as max_date format=mmddyy10.
             from w;
quit;
proc print; title from dataset &syslast;run;

proc transpose data=w2 out=new(rename=(_name_=date)) ;
run;

proc print; title from dataset &syslast;run;

 

  1. What is the age and province distributions of active and deactivated customers? (Use dashboards)
6 REPLIES 6
Reeza
Super User

Why can't you just run PROC FREQ on the data set and get the number deactivated each month/year based on the date?

I see nothing in the data about activation so no idea how you'd report on that. Remember we can only see your data so have no idea how it's collected or if the logic is correct.

proc freq data=w;
table deactdt/out=want;
format deactdt monyy7.;
run;

@dulce wrote:

data w;
       length acctno $13  deactreason $4 dealertype $2 Province $2;
       infile datalines dsd delimiter = '|';
       informat actdt mmddyy10. deactdt mmddyy10. sales dollar8.0;
       input acctno $ actdt deactdt deactreason $ goodcredit rateplan dealertype $ AGE Province $ sales;

datalines;

1176913194483|06/20/1999|||0|1|A1|58|BC|128

1176914599423|10/04/1999|10/15/1999|NEED|1|1|A1|45|AB|72

1176951913656|07/01/2000|||0|1|A1|57|BC|593

1176954000288|05/30/2000|||1|2|A1|47|ON|83

1176969186303|12/13/2000|||1|1|C1|82|BC|

1176991056273|08/31/1999|09/18/2000|MOVE|1|1|C1|92|QC|1041

1176991866552|05/24/2000|||1|1|A1|77|ON|

1176992889500|11/28/2000|||1|1|C1|68|AB|72

1177000067271|12/23/1999|||0|1|B1|75|ON|134

1177010940613|12/09/1999|||1|2|A1|42|NS|11

1177025997013|11/09/1999|||1|1|A1|26|BC|154

1177027515760|10/19/1999|||1|1|B1|73|BC|16

1177028996676|09/21/2000|||0|1|C1||QC|179

1177038747105|03/14/2000|||0|1|C1|41|ON|705

1177045857516|06/22/2000|||1|1|A1|53|QC|83

1177057406016|09/21/2000|||0|1|C1|50|ON|529

1177066422248|04/26/1999|01/15/2001|NEED|0|1|A2|55|NS|44
       ;
       run;

 

I need to find the following:

 

  1. The number of activated and deactivated accounts?

 

My codes don’t produced the desired results; please pinpoint where need fixing

 

proc sort data=w out=w2 (keep=actdt deactdt);
by acctno ;
run;

data w; set wireless (keep= actdt deactdt);
by acctno;
*** sets the first new value to null ***;
if first.actdt then count=.;

 

Here I’m missing the code for the deactdt count


*** starts counting by 1 by actdt ***;
count+1;
new_var=_n_;
run;

Or


data w2; set w (keep= actdt deactdt);
by acctno;
*** holds the value of count and sets the default value ***;
retain count 0;
*** adds the value of count per each new value of actdt & deactdt ***;
if first.actdt then count=count+1;

Again I’m missing the code to for the deactdt count
run;

 

  1. Are there any missing activated and deactivated dates?

 

/* create a format to group missing and nonmissing */
proc format;
 value $missfmt ' '='Missing' other='Not Missing';
 value  missfmt  . ='Missing' other='Not Missing';
run;

 

Or

 

proc freq data=wireless;
format _CHAR_ $missfmt.; /* apply format for the duration of this PROC */
tables _CHAR_ / missing missprint nocum nopercent;
format _NUMERIC_ missfmt.;

tables _NUMERIC_ / missing missprint nocum nopercent;
run;

Since I have a lot of variables, make sense to count the missing values using an ARRAY and the CMISS function, is this right? What if I want count a select variable set, such as actdt and deactdt?
                                                                                                                              
data countmissing;                                                                                                                     
  set w;                                                                                                                             
  array vars(10)  acctno $ actdt deactdt deactreason $ goodcredit rateplan dealertype $ AGE Province $ sales _NUMERIC_;                                                       
  missing=0;                                                                                                                           
  do i = 1 to 10;                                                                                                                        
    if vars(i)=' ' then missing+1;                                                                                                     
  end;                                                                                                                                 
  /* Using CMISS, only one statement needed rather than 5: */                                                                           
  c_miss_missing = cmiss(of vars[*]);                                                                                                  
  drop i;                                                                                                                               
run;                                                                                                                                   
proc print;                                                                                                                             
run;               

 

Or

 

proc iml;
use Missing;
read all var _NUM_ into x;
close Missing;
 
rowMiss = countmiss(x, "ROW"); /* returns 0,1,2 or 3 for each row */
print rowmiss;

 

  1. When is the earliest and latest activation and deactivation dates available?

 

proc sql;
create table w2 as
    select min(thedates) as min_date format=mmddyy10.,
                 max(thedates) as max_date format=mmddyy10.
             from w;
quit;
proc print; title from dataset &syslast;run;

proc transpose data=w2 out=new(rename=(_name_=date)) ;
run;

proc print; title from dataset &syslast;run;

 

  1. What is the age and province distributions of active and deactivated customers? (Use dashboards)



dulce
Calcite | Level 5

@ Reeza,

 

Thanks for responding. I appreciate your partial answer.

 

I'm a beginners SAS programmer; currently learning the programming language.

 

I gave all the specs relating to the small sample of the data set (in actuality it contains well over 1K records). You need to pick a record which contains all the variables. As noted previously, missing column variable values are denoted by delimiter = '|';  

 

Each observation or row contains variable values: acctno $ actdt deactdt deactreason $ goodcredit rateplan dealertype $ AGE Province $ sales;

 

I need to find

1. the number of accounts that were activated and deactivated?

2.any missing activated and deactivated dates?

3.when the earliest and latest activation and deactivation dates available?

4. the age and province distribution of active and deactivated customers?

 

I'm a novice programmer and I tried my best to come up with the codes to satisfy the questions. However, the codes don't produced the desired results. That's why I'm turning to the SAS community for help.

 

Please help fix my codes so they answer the above questions. 

 

With appreciation.

D

 

 

Reeza
Super User
You gave all the specs but I see nothing indicating which variable means what. Which variable is activation date? It may be clear to you and re-reading it this time it's likely actdt but that's a guess and I've learned well enough that guessing variable meaning based on names is not a good method.

You dumped your whole assignment here, I would recommend trying one question at a time and showing what you've tried and what you're getting versus what you need. If we look at your first question - # of accounts activated and deactivated. You're on the right approach with PROC FREQ and the missing format but it seems like you're throwing the kitchen sink at it.

You can specify the variables of interest in the TABLES statement you don't have to list them all.

_numeric_ and _character_ are wild card lists that list all numeric/character variables respectively. Make sure to apply the formats to the variables as well in the proc.

https://github.com/statgeek/SAS-Tutorials/blob/master/proc_format_example.sas

PROC MEANS tutorial
https://github.com/statgeek/SAS-Tutorials/blob/master/proc_means_basic.sas



If you're trying to learn SAS I highly recommend taking the first free SAS e-course. If you learn about PROC FREQ/MEANS/basic data step you should be able to answer all your questions.

Unfortunately that's all the time I have to spend on this tonight. Good Luck!
dulce
Calcite | Level 5

@Reeza 

 

I know how to read the raw data file and I'm a SAS beginners. You, on the other hand, a "Super User" who is clueless reading the dataset. I was hoping to get some assistance on this forum; unfortunately, I didn't receive any but left with disappointment and frustration.

 

 

My advice to you stay off of SAS Community. You're clueless. You're unhelpful. 

Reeza
Super User

@dulce wrote:

 

I know how to read the raw data file and I'm a SAS beginners. You, on the other hand, a "Super User" who is clueless reading the dataset. I was hoping to get some assistance on this forum; unfortunately, I didn't receive any but left with disappointment and frustration.

 

 

My advice to you stay off of SAS Community. You're clueless. You're unhelpful. 


You're not entitled to my time or energy. 

DetBrik
Fluorite | Level 6

The first problem in your code is that you are overwriting datasets all over the place.  

 

You create the "w" dataset with your raw data and then sort this dataset into the "w2" dataset.  Your next DATA step overwrites the raw dataset (w) with some data elements of a dataset called "wireless." 

 

Then your next DATA step overwrites the sorted "w2" dataset with some data elements from the unsorted "w" dataset.  This DATA statement contains another error.  You attempt to use the FIRST function on an unsorted variable.  In order to use "first.actdt" the "actdt" variable must be one of the variables on which the dataset is sorted.   Otherwise FIRST. and LAST. have no meaning.  

 

It is unclear from your message whether you are just trying to get counts of things or want to create a an analytic dataset that contain the variables necessary to answer your four questions.  If you just want counts, then the judicious use of PROC FREQ will suffice (and you will only need the first dataset you created). 

 

You can create several tables in a single execution of PROC FREQ..

 

PROC FREQ usually sorts the Frequency from lowest to highest, so running  PROC FREQ on your activation  and deactivation dates will show you in the output the lowest and highest values for each date and the number of observations where the dates are missing.  This will answer Questions 2 and 3.  

 

Since it appears that your account numbers are unique, those same PROC FREQ outputs will answer question 1.  The answer to question 4 will come from PROC FREQ outputs on the variables of interest.

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 1631 views
  • 0 likes
  • 3 in conversation