When I import a large .csv file into SAS using the following code:
data SASDATA.Applications ;
infile 'R:/Li/PATSTAT/Applications.csv' DLM = ',' DSD missover lrecl=32767 firstobs = 3 ;
input
appln_id :29.
appln_auth :$29.
appln_nr :$29.
appln_kind :$29.
appln_filing_date :YYMMDD10.
appln_filing_year
appln_nr_epodoc :$50.
appln_nr_original :$150.
ipr_type :$29.
internat_appln_id :29.
int_phase :$29.
reg_phase :$29.
nat_phase :$29.
earliest_filing_date :YYMMDD10.
earliest_filing_year
earliest_filing_id :29.
earliest_publn_date :YYMMDD10.
earliest_publn_year
earliest_pat_publn_id :29.
granted :29.
docdb_family_id :29.
inpadoc_family_id :29.
docdb_family_size :29.
nb_citing_docdb_fam :29.
nb_applicants :29.
nb_inventors :29.
;
/* No colon modifier in a FORMAT statement. The *_year columns hold plain */
/* year numbers, so a YEAR format would misread them as date values.      */
format
appln_filing_date YYMMDDd10.
earliest_filing_date YYMMDDd10.
earliest_publn_date YYMMDDd10.
;
run ;
the log shows notes like the following:
NOTE: Invalid data for internat_appln_id in line 3457514 61-62.
Doesn't PATSTAT also provide JSON files? Those may be easier to parse and wouldn't have this issue. I'm surprised the CSV does.
Can you upload some records in a CSV file that we can work with?
I think there are no JSON files. Also, could you tell me how to split a CSV file, please? The whole file is too big to upload.
If you have PowerShell (Windows includes it by default), you can use this approach:
gc -Path file_name.csv -Head N > output.txt
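If PowerShell is not an option, a similar extract can be done inside SAS. This is a minimal sketch, assuming the input path from the original post. The output file name sample.csv and the ten-line window around line 3457514 (the line flagged in the log) are illustrative choices, not anything PATSTAT or SAS prescribes:

```sas
/* Copy a small window of raw lines around the record flagged in the log. */
/* The output file name is a hypothetical example.                        */
data _null_;
  infile 'R:/Li/PATSTAT/Applications.csv' lrecl=32767
         firstobs=3457509 obs=3457519;   /* physical line numbers in the file */
  file 'R:/Li/PATSTAT/sample.csv' lrecl=32767;
  input;            /* load the line into _INFILE_ without parsing it */
  put _infile_;     /* write it back out unchanged */
run;
```

FIRSTOBS= and OBS= both refer to line numbers in the raw file, so the window can be moved or widened by changing those two values.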
1. Could you explain what this code means, please?
For example, should I use the following code if I want to extract lines 3457514 to 3780578?
gc - R:/Li/PATSTAT/Applications.csv - head 3457514 to 3780578> output.txt
2. https://data.epo.org/access-control/welcome?lg=en is the link to the PATSTAT database. Click 'PATSTAT Online' on that page to register for a free trial of PATSTAT Online. Please note that PATSTAT Online free trials are valid for one month.
But the PATSTAT manual says it can save as XML.
I think it occurs only in some records. This attribute is not created by PATSTAT but collected from different patent authorities. That might be the reason why these errors happen.
Is this one of their free files? If so, post the link then.
If you PAID for this file, then insist that they fix it. They can either use a delimiter that does not appear in any field or add quotes around values that contain the delimiter. They could even use the goofy syntax that some databases support of adding a backslash in front of delimiters that appear in the data. At minimum, they need to provide a file that can be parsed by a rule, which this file cannot.
You don't show any values with embedded commas. If there are none, you can test each line to see whether the number of commas is larger or smaller than expected. Suppose your data should have 27 variables, with exactly 26 commas separating the values and NO commas inside any of the text fields:
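To see why quoting matters, here is a small sketch with made-up data. The data set name demo and the columns id, title, and year are hypothetical. It uses the same DLM=',' DSD options as the original INFILE statement:

```sas
/* Hypothetical three-column data to show the effect of quoting. */
data demo;
  infile datalines dlm=',' dsd truncover;
  input id title :$40. year;
datalines;
1,"Method, apparatus and system",2005
2,Method, apparatus and system,2005
;
run;
```

The first row should read cleanly, because DSD treats the comma inside the quotes as data. In the second row the unquoted comma splits the title in two, so the text " apparatus and system" lands in year and should trigger the same kind of "Invalid data" note seen in the log above.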
input @;
if countc(_infile_,',') ne 26 then do;
  put "WARNING: Unexpected number of variables on line " _n_;
  input;
end;
else input
  <your variable list>
;
But this won't work if any of the text fields can contain a comma. And it really only cleans up the log: you would still have to examine the data file closely to fix the text in whatever manner is needed.
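To get a list of the suspect lines for inspection instead of only a log message, the same comma count can be written to a data set. This is a sketch under two assumptions: the file has the 26 variables listed in the INPUT statement of the original post (hence 25 commas), and no text field legitimately contains a comma, since COUNTC counts commas inside quoted values too. The data set name suspect_lines and its variable names are made up for illustration:

```sas
%let expected_commas = 25;   /* 26 variables in the original INPUT statement */

data suspect_lines (keep=line_no n_commas raw_text);
  length raw_text $2000;     /* truncated copy for eyeballing; raise if needed */
  infile 'R:/Li/PATSTAT/Applications.csv' lrecl=32767 firstobs=3;
  input;                     /* load the raw line into _INFILE_ */
  n_commas = countc(_infile_, ',');
  if n_commas ne &expected_commas then do;
    line_no  = _n_ + 2;      /* firstobs=3 skips the first two lines */
    raw_text = _infile_;
    output;                  /* keep only lines with an unexpected count */
  end;
run;
```

Sorting or printing suspect_lines by line_no then gives a short list of records to check by hand or to send back to the data provider.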
Didn't you ask this question (in different words) already?
Did you try adapting the solution from the other question to this data?
You really need to talk to the creators of the CSV file and ask them to take steps to ensure they are creating valid CSV files. That is, any value that contains the delimiter must be quoted.
Or at least use some method that creates files that CAN be interpreted.
Doesn't PATSTAT provide the SQL queries? With some small modifications those will run in SAS...
PATSTAT Online provides SQL queries, but I am using the 2016 autumn edition, while PATSTAT Online only provides the 2017 spring and 2017 autumn editions. Is that OK?