BookmarkSubscribeRSS Feed
dmarques1998
Obsidian | Level 7

Hi, 

 

Is it possible to count sucessive number os missing values, i.e in the 3rd line count the missing values between "E" from "x" to "E" from "x+1". Ideally, the ideia is to count every successive missing values, and if they are interupt by a name, i.e E, Ambas or D, start a new count. I have already the total of missing values as you can see in the attached photo, However I cannot find a specific code or Macro to use in that situation. 

 

14 REPLIES 14
Tom
Super User Tom
Super User

That is a beautiful picture of your baby desktop.  But can you share a description of your data and some actual data.  To share data just post a simple SAS data step that reads the data from in-line text (aka CARDS or DATALINES).  That way we can just copy and paste the text from your posting into our local SAS session and have example data to use to propose solutions.

dmarques1998
Obsidian | Level 7
Sorry, I'm new here. How can I copy the Data and put here?
Tom
Super User Tom
Super User

@dmarques1998 wrote:
Sorry, I'm new here. How can I copy the Data and put here?

There are some posts for how to do this.  I created this macro tool you could try and use.  

If you have a large dataset a simple data step can generate some text to use.  Take advantage of the OBS= dataset option to limit the number of observations.

data _null_;
  file log dsd ;
  set have (obs=10);
  put (_all_) (+0);
run;

Then just copy the text from the SAS log and you have the data for the first 10 observations in your existing HAVE dataset.

 

But you can just make up some simple data that demonstrates the issue. In that case why not jsut type it into the program editor.  

data example;
  input id var1 var2 var3;
cards;
1 100 . 200
2 . . 300
;

A real advantage doing it this way is you can create some example data that clearly show the types of issues that are making the problem hard for you to solve.  Much easier than trying to identify observations of the real data that have those situations represented.

 

Sometimes just taking the time to come up with good test data will help you clarify in your mind what the issue is and how to solve it.

dmarques1998
Obsidian | Level 7

As a hypothetical situation :

data example;
  input id var1 var2 var3 var4 var5 var6 var7 var8 var9 var10;
cards;
1 100 . 200 . . 300 . . . .
2 . . 300 . . . . 200 . . 
3 . . . . . . . . 100 . 
4 100 . . . . . . . 100 100
5 100 . . . 200 . . 200 . 
;

And, as a output I want something like that: 

var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 1stSeq LastSeq

1 100 . 200 . . 300 . . . .  0 4
2 . . 300 . . . . 200 . .    2 2
3 . . . . . . . . 100 .      8 1
4 100 . . . . . . . 100 100  0 0
5 100 . . . 200 . . 200 .    0 1

 

 

 

 

 

 

 

 

 

 

Ksharp
Super User
data example;
  input id var1 var2 var3 var4 var5 var6 var7 var8 var9 var10;
cards;
1 100 . 200 . . 300 . . . .
2 . . 300 . . . . 200 . . 
3 . . . . . . . . 100 . 
4 100 . . . . . . . 100 100
5 100 . . . 200 . . 200 . 
;

data want;
 set example;
 all=cats(of var:);
pid=prxparse('/\.+/');
s=1;e=length(all);
call prxnext(pid,s,e,all,p,l);
do i=1 by 1 while(p>0);
  if i=1 then FirstSeq=l;
  call prxnext(pid,s,e,all,p,l);
  if l ne 0 then LastSeq=l;
end;
drop all i s e p l pid;
run;
Kurt_Bremser
Super User

You have data (dates) in structure (variable names), which is always a bad idea.

Transpose your dataset to a long layout, and then you can use BY processing and a RETAINed variable:

data wide;
infile datalines dsd;
input id $ (fonte201912 fonte202001 fonte202002 fonte202003) ($);
datalines;
1,x,,x,x
2,x,x,,x
3,,,x,x
;

proc transpose data=wide out=long (rename=(col1=fonte));
by id;
var fonte:;
run;

data long2;
set long;
period = input(substr(_name_,6),yymmn6.);
format period yymmn6.;
drop _name_;
run;

data want;
set long2;
by id;
if first.id then counter = 0;
if fonte > ""
then counter = 0;
else counter + 1;
run;

Dataset LONG2 is how your datasets should be organized: long layout, date values stored as proper SAS data values with a display format to your liking.

dmarques1998
Obsidian | Level 7
The problem is that I have a long data, almost 1.3M of rows. It is more convinient to keep the data like that. Is it possible?
ballardw
Super User

Missing detail: When the missing are interrupted you now have two or more groups of missing values. So you have two or more counts of missing values possible to report. What is the actual rule for which number to keep? The largest missing sequence? The first? The last sequence? Something else?

Consider your row 10. There are three series of missing values.

 

When you need to process variables in a single observation in a specific order then an ARRAY is almost always involved in the solution.

 

Dummy code because I am not going to attempt to create data from a picture.

data want;
   set have;
   array v (*) <your list of variables to process>;
  do i= 1 to dim (v);
     if missing(v[i]) then misscount= sum(misscount,1);
     /* logic would go here about how to handle the 
        interruptions, and possibly before the IF*/
  end;
run;
   
dmarques1998
Obsidian | Level 7
For now, I want only the first and the last sequence, however to future's work I would like to understand how to do for all the possible sequences in a row.
ballardw
Super User

@dmarques1998 wrote:
For now, I want only the first and the last sequence, however to future's work I would like to understand how to do for all the possible sequences in a row.

If you have N variables to test then you have to consider up to N/2 intervals (separate variables to record the count), or if the data has an odd N number of variable N/2+1.

When you say "first" and "last" that means we need to add at least one temporary variable indicate where the current sequence is first or not.

Which may mean that @Kurt_Bremser's base approach is the most robust as you can set a "missing sequence" counter variable as well as the actual count.

There are lots of processes that transpose data to one shape for a process to be robust or easier code and then shape back into a needed format (if actually needed).

 

Best would be to manually process a smaller number of variables, 6 or 7 maybe, with different patterns of missing values, maybe 10 or 15 rows and show the expected output for those.

Hint: putting multiple values into a single variable is 1) much more work to do and 2) extremely hard to work with after it is done.

 

dmarques1998
Obsidian | Level 7

dmarques1998_1-1618572562640.jpeg

 

Hi @Kurt_Bremser @ballardw , have upload this data in excel as an example. In that specific case how can I count sucessive missing values in SAS, in the following names: Peter, Ann, Sarah and Bianca, which are the first and de last sets of missing values

 

 

Thanks for your help! 

Kurt_Bremser
Super User

Since you seem to have an issue with Excel, you should ask your questions on a forum for that.

If you have an issue with SAS data, post it as such (data step with datalines). I will positively NOT start typing values off a picture, I've got better things to do.

dmarques1998
Obsidian | Level 7
 Data example;
input Name	N01/2020	N02/2020	N03/2020	N04/2020	N05/2020	N06/2020	N07/2020	N08/2020	N09/2020	N10/2020	N11/2020	        N12/2020;
          
John	E	D	A	.	.	.	.	.	A	A	A	E
Peter	E	E	D	A	E	D	A	.	.	.	.	.
Ann		.	.	.	.	A	D	E	A	D	E	D   .
Steve	E	.	A	.	E	.	A	.	E	.	A	.
Frank	A	E	.	.	A	E	.	.	A	E	.	.
Joanne	E	D	A	.	.	.	.	.	A	A	A	E
Sarah	E	E	D	A	E	D	A	.	.	.	.	.
Bianca	.	.	.	.	.	A	D	E	A	D	E	D
Theresa	E	.	A	.	E	.	A	.	E	.	A	.
Beatric	A	E	.	.	A	E	.	.	A	E	.	.

Sorry for my mistake @Kurt_Bremser, here is the dataset. As I said, I want to count the successsive missing values, for the following names: Peter, Ann, Sarah and Bianca which are the first and de last sets of missing values. can you help me? 

 

thank you

ballardw
Super User

@dmarques1998 wrote:
 Data example;
input Name	N01/2020	N02/2020	N03/2020	N04/2020	N05/2020	N06/2020	N07/2020	N08/2020	N09/2020	N10/2020	N11/2020	        N12/2020;
          
John	E	D	A	.	.	.	.	.	A	A	A	E
Peter	E	E	D	A	E	D	A	.	.	.	.	.
Ann		.	.	.	.	A	D	E	A	D	E	D   .
Steve	E	.	A	.	E	.	A	.	E	.	A	.
Frank	A	E	.	.	A	E	.	.	A	E	.	.
Joanne	E	D	A	.	.	.	.	.	A	A	A	E
Sarah	E	E	D	A	E	D	A	.	.	.	.	.
Bianca	.	.	.	.	.	A	D	E	A	D	E	D
Theresa	E	.	A	.	E	.	A	.	E	.	A	.
Beatric	A	E	.	.	A	E	.	.	A	E	.	.

Sorry for my mistake @Kurt_Bremser, here is the dataset. As I said, I want to count the successsive missing values, for the following names: Peter, Ann, Sarah and Bianca which are the first and de last sets of missing values. can you help me? 

 

thank you


That data step doesn't run. / is not an acceptable character in variable names by default.

No Datalines; or Cards; or any of the acceptable variants to  tell SAS where the data begins.

And in general having data like dates in variable names is adding at least one additional headache to the problem.

Plus how to show the multiple interruptions is still not actually addressed. What would the output look like.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1091 views
  • 15 likes
  • 5 in conversation