We get so many source files and some time we have different files coming of different length and we would need to do a row count every time to see whether we have got the correct number of records in the target .
To solve this issue, I have written a small program that shows the File Length and Header or First Line of the file in the operator . Why First line so that I know whether the first line is an Header row or not .
JYTHON PROCEDURE CODE
# AUTHOR – KSHITIZ DEVENDRA
# LANG – JYTHON
# DESCRIPTION
# ——————————————————————————–
# THIS PROGRAM IS USED TO GET THE NUMBER OF RECORDS IN FILE AND
# ALSO DISPLAYS THE HEADER OF THE FILE
# THE CORRECT NO OF RECORDS ARE CALCULATED BY SUBTRACTING TOTAL LENGTH – HEADER
# ——————————————————————————–
filesrc = open(‘<— SPECIFY FILE PATH AND NAME HERE —-> ‘,’r’)
first=filesrc.readline()
lines = 0
while first:
#get the no of lines in the file
lines += 1
first=filesrc.readline()
#print lines
## THE ABOVE PART OF THE PROGRAM IS TO COUNT THE NUMBER OF LINES
## AND STORE IT INTO THE VARIABLE ‘ LINES ‘
def intWithCommas(x):
if type(x) not in [type(0), type(0L)]:
raise TypeError(“Parameter must be an integer.”)
if x < 0:
return ‘-‘ + intWithCommas(-x)
result = ”
while x >= 1000:
x, r = divmod(x, 1000)
result = “,%03d%s” % (r, result)
return “%d%s” % (x, result)
## THE ABOVE PROGRAM IS TO DISPLAY THE NUMBERS
## 1234567890 AS 1,234,567,890
s1=str(intWithCommas(lines))
s2=’ nn and the First Line of the File is —>> nn ‘
filesrc.seek(0)
s3=str(filesrc.readline())
final=s1 + s2 + s3
raise ‘ nn The Number of Lines in the File are —>> ‘ , final
## SINCE IN OPERATOR I CAN THROW ONLY SINGLE RAISE COMMAND I CONCATENATED ALL
## THE REQUIRED MESSAGE AND DISPLAYED IN THE OPERATOR AS ONE RAISE COMMAND
SAMPLE OUTPUT
The total execution time for 1.4 million records was 22 so i am expecting it to be fast
Please note i have attached the codes in two formats one for Procedure and another for LKM’s as i have done small changes so that when used in LKM it read the source file automatically , and also does the correct length calculation.
JYTHON CODE FOR LKM
# AUTHOR – KSHITIZ DEVENDRA
# LANG – JYTHON
# DESCRIPTION
# ——————————————————————————–
# THIS PROGRAM IS USED TO GET THE NUMBER OF RECORDS IN FILE AND
# ALSO DISPLAYS THE HEADER OF THE FILE
# THE CORRECT NO OF RECORDS ARE CALCULATED BY SUBTRACTING TOTAL LENGTH – HEADER
# ——————————————————————————–
filesrc = open(‘<%=odiRef.getSrcTablesList(“”, “[SCHEMA]/[RES_NAME]”, “”, “”)%>’,’r’)
first=filesrc.readline()
lines = 0
while first:
#get the no of lines in the file
lines += 1
first=filesrc.readline()
#print lines
## THE ABOVE PART OF THE PROGRAM IS TO COUNT THE NUMBER OF LINES
## AND STORE IT INTO THE VARIABLE ‘ LINES ‘
def intWithCommas(x):
if type(x) not in [type(0), type(0L)]:
raise TypeError(“Parameter must be an integer.”)
if x < 0:
return ‘-‘ + intWithCommas(-x)
result = ”
while x >= 1000:
x, r = divmod(x, 1000)
result = “,%03d%s” % (r, result)
return “%d%s” % (x, result)
## THE ABOVE PROGRAM IS TO DISPLAY THE NUMBERS
## 1234567890 AS 1,234,567,890
s1=str(intWithCommas(lines-<%=odiRef.getSrcTablesList(“”, “[FILE_FIRST_ROW]”, “, “, “”)%>))
s2=’ nn and the First Line of the File is —>> nn ‘
filesrc.seek(0)
s3=str(filesrc.readline())
final=s1 + s2 + s3
raise ‘ nn The Number of Lines in the File <%=odiRef.getSrcTablesList(“”, “[SCHEMA]/[RES_NAME]”, “”, “”)%> are —>> ‘ , final
## SINCE IN OPERATOR I CAN THROW ONLY SINGLE RAISE COMMAND I CONCATENATED ALL
## THE REQUIRED MESSAGE AND DISPLAYED IN THE OPERATOR AS ONE RAISE COMMAND
Comparing the above Procedure , LKM codes have small changes to get the File and the correct Length calculated smartly.
filesrc = open(‘<%=odiRef.getSrcTablesList(“”, “[SCHEMA]/[RES_NAME]”, “”, “”)%>’,’r’)
This code will be the correct File name and would open it for reading .
s1=str(intWithCommas(lines-<%=odiRef.getSrcTablesList(“”, “[FILE_FIRST_ROW]”, “, “, “”)%>))
This code will subtract the Total lines – Heading(No of Lines) so that in operator it shows the correct value.
raise ‘ nn The Number of Lines in the File <%=odiRef.getSrcTablesList(“”, “[SCHEMA]/[RES_NAME]”, “”, “”)%> are —>> ‘ , final
This code will display the File path and Name in the operator Message box.
As you can see it displayed the File Path and Name with Total Number of Lines , Comparing to the above procedure where it displayed 1,453,389 , here smartly it read from the Data store definition and found out that the Heading (no of lines ) is one and while displaying showing the correct output.
Please find the codes below in XML format and import in INSERT_UPDATE mode and also enable ” IGNORE ERRORS ” option as Jython raise command is used to throw warning message in the operator and I have used raise to display the output required.
To use in any LKM File to KM , just copy paste the code from -PROCEDURE FILE_LENGTH_HEADER TO BE USED IN LKM’s and enable the option -IGNORE ERRORS and move it to the first order so that as soon the load starts you would get the required details in the operator.
PROCEDURE FILE_LENGTH_HEADER
PROCEDURE FILE_LENGTH_HEADER TO BE USED IN LKM’s
http://cid-d7195c8badc40d0b.skydrive.live.com/embedicon.aspx/odiexperts.com/TRT_LKM_FILE_LENGTH_HEADER.xml
June 26, 2015 at 1:56 PM
I want file header in ODI variable , is there any way..
June 26, 2015 at 2:12 PM
Hi,
use this technique but get just the first line…
http://odiexperts.com/how-to-refresh-odi-variables-from-file-part-2-getting-all-lines-once-at-time/
June 29, 2015 at 6:00 AM
Thanks its working
June 29, 2015 at 11:11 AM
Hi Cezar,
I have tried it but its not working for Variables, My requirement is to take only File Header in ODI Variable.
select C1 C1_C1
from TABLE
/*$$SNPS_START_KEYSNP$CRDWG_TABLESNP$CRTABLE_NAME=BNM_FILE_HEADERSNP$CRLOAD_FILE=#GLOBAL.GV_DIR_FULL_PATH/#GLOBAL.GV_FILE_NAME SNP$CRFILE_FORMAT=DSNP$CRFILE_SEP_FIELD=0x007eSNP$CRFILE_SEP_LINE=0x000ASNP$CRFILE_FIRST_ROW= #GLOBAL.GV_FILE_ROW_NUMBER SNP$CRFILE_ENC_FIELD=”SNP$CRFILE_DEC_SEP=.SNP$CRSNP$CRDWG_COLSNP$CRCOL_NAME=C1SNP$CRTYPE_NAME=STRINGSNP$CRORDER=1SNP$CRLENGTH=4000SNP$CRPRECISION=4000SNP$CRDEC_SEP=.SNP$CR$$SNPS_END_KEY*/
July 2, 2011 at 12:38 AM
Hi,
I want to get the row count of the file before processing. I changed the above code like below
filesrc = open(‘#ERP_FEED_FILE_TO_DSS_STAGING_TABLES.vd_mstr_filename’,’r’)
first=filesrc.readline()
lines = 0
while first:
lines+=1
first=filesrc.readline()
s1=str(lines)
s2= ‘ in the file ——‘
s3=’#ERP_FEED_FILE_TO_DSS_STAGING_TABLES.vd_mstr_filename’
final = s1 + s2 + s3
raise ‘ nn The Number of Lines in the File are —‘ , final
where I am passing the filename from a variable. I have about 13 files for which count need to be identified. For some files it is working fine & I am getting the count for some files it keeps on failing. After the row count I am passing the same variable name to my file datastore & the interface is getting executed, but I am getting the below error.
org.apache.bsf.BSFException: exception from Jython:
Traceback (innermost last):
File “”, line 1, in ?
IOError: File not found – H:Data_FeedsDaily_LoadsWuxid_mstr.Wuxi.20110325.txt (The filename, directory name, or volume label syntax is incorrect)
where the filename is changed. I don’t know how the file name is getting changed in this step & the next step is working fine. Please help & suggest.
July 2, 2011 at 7:53 AM
Check how the variable is reading the file .The issue is some where the variable where the file name is reading . I believe once you figure out that then it will work .
March 29, 2011 at 10:48 PM
Still we hav not got the approximate no. of lines in the file. We are doing testing with a sample file containing only 50 lines.
If the file size is more whether the code will have any performance issue??
March 30, 2011 at 3:45 AM
should not , thats the reason i have used while loop so it can handle millions of rows as you can see with same data i have used around 100,000 records and still it was fine .
March 28, 2011 at 10:32 PM
hi,
after getting the footer value in the variable, i want to remove it from the source file. How it can be done??
March 29, 2011 at 7:58 AM
how big is the file ?
We need to rewrite the data into the file for this. i have written this code for smaller file . please change the dynamic_struc.txt with your file name .
read_file=open('c:/Dynamic_struc.txt','r')
lst=read_file.readlines()
footer= lst[len(lst)-1]
read_file.close()
read_file=open('c:/Dynamic_struc.txt','w')
read_file.writelines(lst[0:-1])
read_file.close()
/code>
March 25, 2011 at 7:40 AM
Thanks for the update.
Actually we have a source file with header & footer. The footer contains value which need to be insert into the target file footer.
I want the source file footer value to get inserted into a table columns or to pass the footer value into a ODI variable which can be user later.Is it possible??
March 25, 2011 at 10:53 AM
get the codes from http://forums.oracle.com/forums/messageview.jspa?messageID=9469937
March 25, 2011 at 6:58 AM
thanks for the code.
i created the procedure with the code & it got executed but, i was not able to get the values.
as I m new to ODI can you tell me the step by step configuration to get the footer value.
March 25, 2011 at 7:17 AM
if you are looking for diplaying the result in operator , then you can add
raise footer
read_file.close()
and check ignore errors option in the procedure. raise will throw the result into the operator . If you want to fetch the footer for something , can you please be elaborate what you are planning to do .
March 25, 2011 at 5:18 AM
I want to get the footer value. Is there any way to get the footer value from the file. If yes can you please explain the steps how to do it.
March 25, 2011 at 6:31 AM
read_file=open('file','r')
lst=read_file.readlines()
footer=lst[len(lst)-1]
read_file.close()
December 22, 2009 at 10:02 AM
Nice Add Kshitiz.. Thanks for sharing!