BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
fieldsa83
Quartz | Level 8

But the text file is flat format so I have to specify their position; they don't exist until that's done.

 

Edit: plus I like seeing all the available variables and their input position listed, otherwise I forget about them and have to search in all the documentation to find their details. 

 

Anyway, thanks for the help. I will continue to think about these options...

 

I feel like I could do the formats by making a list of all vars, and then havng it do a sort of look up on a list of formats and then apply them if there's a match. 

 

Tom
Super User Tom
Super User

Are you saying that the format of the file is DIFFERENT every month?  That AGE appears in some location that depends on whether you are reading the data for January instead of March?

Look at the example program I posted before and pretend that the input statement defines and reads 10,000 variables instead of three. If the KEEP statement still only lists two of them then the generated DATASET will only have those two variables. Converting the text strings into variables is trivial in terms of execution time. It is the time it takes to read the text file from the disk that takes time. It doesn't save any time to read just two variables intead of a hundred.

fieldsa83
Quartz | Level 8

Oh I see, yes I was thinking reading the full text file would be long and the keep would not be as efficient as only reading in specific parts. This is not the case?

 

And so does the keep apply to the formats. Would I just list all the formats and the keep would just apply to the ones rom the same &keeplist?

Tom
Super User Tom
Super User

You should write a program that reads all of the variables and attaches the appropriate labels and formats.  Do not bother with conditionally defining or reading the variables. To read any of the variables SAS will need to read the whole text file from the disk. Instead just concentrate on makign sure that only the variables you want are written to the output dataset.  That will save time and space since the output dataset will be much smaller if it only has a few variables.

 

Making it flexible for reading mutliple input files will take a little more work. You could structure it as a macro like you posted.

But note that SAS can happily read multiple text files in one step if you want. A FILENAME statement can be defined that points to multiple files.  Or the FILE statement can list multiple members in an aggregate fileref like in the example I posted before.  Or you could even drive it from a dataset by using the FILEVAR option on the INFILE statement.

 

You could probable have it create the formats in the WORK library also if you want to work that way. Then you don't need to worry about storing the format catalog and setting the FMTSEARCH option.  As long as you don't have gigabytes of format definitions there is really not much reason to NOT define them all.

 

Set it up in a way that you do not need to EDIT the file to select the variables / data that you want. That is just asking for accidents. Either have it as a macro in a separate file (or autocall location) and call the macro with the list of variables and input files.

Or just save it as a normal program file and set the macro variables and then %INCLUDE the file.

 

 

Tom
Super User Tom
Super User
proc contents data=have noprint out=contents; run;
proc format noprint lib=mylib cntlout=formats; run;
proc sql noprint;
 create table list as
 select distinct a.name,b.fmtname
  from contents a , formats b
  where upcase(a.name) = upcase(b.fmtname)
  order by 1
;
quit;
fieldsa83
Quartz | Level 8

Yes, this looks like what I am trying to implement!

 

Would there be a final step of applying the formats to the "have" data set? 

Tom
Super User Tom
Super User

Depends on how much you want to automate. You can just use a PUT statement to write the values to the log and copy and paste them. You can try using PROC SQL to stick it into a macro variable.

 

proc sql noprint;
select catx(' ',name,strip(fmtname)||'.')
  into :fmtlist separated by ' '
  from list
;
quit;
...
format &fmtlist ;
...
fieldsa83
Quartz | Level 8

Awesome, thanks so much. I think this will do the trick!

fieldsa83
Quartz | Level 8

One more question: if the labels all have _en at the end (sex_en). How could I fit this into the proc sqls?

Tom
Super User Tom
Super User
I assume what you mean is that the format name for the format that should be attached to the variable AGE is AGE_EN instead of just AGE.
So include that in the join condition that you use to see if there exists a format for the variables in your dataset.
where upcase(name)||'_EN' = upcase(fmtname)
fieldsa83
Quartz | Level 8

Do you think there's a way to identify whether the variable is character, and if so add a $ to the beginning of the format?

Kurt_Bremser
Super User

@fieldsa83 wrote:

Yes, this looks like what I am trying to implement!

 

Would there be a final step of applying the formats to the "have" data set? 


You apply the formats once when reading the data in. After that they stay attached to the variables.

 

Once you have a well-defined import process, all other steps become easy.

Kurt_Bremser
Super User

@fieldsa83 wrote:

I have something like that already, but it's the value label formatting that is proving to be the most tedious part...

 

Edit: oh I see, you mean reference a central database. What I have now references the text files. Perhaps it would be advantageous to just have 1 single database but I fear it will be too large, and adding new data each month could add more difficulties.

 

 


Then create a dataset for each month, with the month being part of the dataset name. When doing analysis over a series of months, create a simple datastep view (with keep list) over those months and run your analysis from that. The assignment of formats and labels is done once per month when the initial import is done.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 28 replies
  • 1672 views
  • 2 likes
  • 6 in conversation