ODIExperts.com

The blog for Oracle Data Integrator ( ODI )

Get File Length and Header in Operator

We get so many source files and some time we have different files coming of different length and we would need to do a row count every time to see whether we have got the correct number of records in the target .

To solve this issue, I have written a small program that shows the File Length and Header or First Line of the file in the operator . Why First line so that I know whether the first line is an Header row or not .

JYTHON PROCEDURE CODE

# AUTHOR – KSHITIZ DEVENDRA
# LANG – JYTHON
# DESCRIPTION
# ——————————————————————————–
# THIS PROGRAM IS USED TO GET THE NUMBER OF RECORDS IN FILE AND
# ALSO DISPLAYS THE HEADER OF THE FILE
# THE CORRECT NO OF RECORDS ARE CALCULATED BY SUBTRACTING TOTAL LENGTH – HEADER
# ——————————————————————————–
filesrc = open(‘<— SPECIFY FILE PATH AND NAME HERE —-> ‘,’r’)
first=filesrc.readline()
lines = 0
while first:
#get the no of lines in the file
lines += 1
first=filesrc.readline()
#print lines
## THE ABOVE PART OF THE PROGRAM IS TO COUNT THE NUMBER OF LINES
## AND STORE IT INTO THE VARIABLE ‘ LINES ‘
def intWithCommas(x):
if type(x) not in [type(0), type(0L)]:
raise TypeError(“Parameter must be an integer.”)
if x < 0:
return ‘-‘ + intWithCommas(-x)
result = ”
while x >= 1000:
x, r = divmod(x, 1000)
result = “,%03d%s” % (r, result)
return “%d%s” % (x, result)
## THE ABOVE PROGRAM IS TO DISPLAY THE NUMBERS
## 1234567890 AS 1,234,567,890
s1=str(intWithCommas(lines))
s2=’ nn and the First Line of the File is —>> nn ‘
filesrc.seek(0)
s3=str(filesrc.readline())
final=s1 + s2 + s3
raise ‘ nn The Number of Lines in the File are —>> ‘ , final
## SINCE IN OPERATOR I CAN THROW ONLY SINGLE RAISE COMMAND I CONCATENATED ALL
## THE REQUIRED MESSAGE AND DISPLAYED IN THE OPERATOR AS ONE RAISE COMMAND

SAMPLE OUTPUT

image

The total execution time for 1.4 million records was 22 so i am expecting it to be fast

image

Please note i have attached the codes in two formats one for Procedure and another for LKM’s as i have done small changes so that when used in LKM it read the source file automatically , and also does the correct length calculation.

JYTHON CODE FOR LKM

# AUTHOR – KSHITIZ DEVENDRA
# LANG – JYTHON
# DESCRIPTION
# ——————————————————————————–
# THIS PROGRAM IS USED TO GET THE NUMBER OF RECORDS IN FILE AND
# ALSO DISPLAYS THE HEADER OF THE FILE
# THE CORRECT NO OF RECORDS ARE CALCULATED BY SUBTRACTING TOTAL LENGTH – HEADER
# ——————————————————————————–
filesrc = open(‘<%=odiRef.getSrcTablesList(“”, “[SCHEMA]/[RES_NAME]”, “”, “”)%>’,’r’)
first=filesrc.readline()
lines = 0
while first:
#get the no of lines in the file
lines += 1
first=filesrc.readline()
#print lines
## THE ABOVE PART OF THE PROGRAM IS TO COUNT THE NUMBER OF LINES
## AND STORE IT INTO THE VARIABLE ‘ LINES ‘
def intWithCommas(x):
if type(x) not in [type(0), type(0L)]:
raise TypeError(“Parameter must be an integer.”)
if x < 0:
return ‘-‘ + intWithCommas(-x)
result = ”
while x >= 1000:
x, r = divmod(x, 1000)
result = “,%03d%s” % (r, result)
return “%d%s” % (x, result)
## THE ABOVE PROGRAM IS TO DISPLAY THE NUMBERS
## 1234567890 AS 1,234,567,890
s1=str(intWithCommas(lines-<%=odiRef.getSrcTablesList(“”, “[FILE_FIRST_ROW]”, “, “, “”)%>))
s2=’ nn and the First Line of the File is —>> nn ‘
filesrc.seek(0)
s3=str(filesrc.readline())
final=s1 + s2 + s3
raise ‘ nn The Number of Lines in the File <%=odiRef.getSrcTablesList(“”, “[SCHEMA]/[RES_NAME]”, “”, “”)%> are —>> ‘ , final
## SINCE IN OPERATOR I CAN THROW ONLY SINGLE RAISE COMMAND I CONCATENATED ALL
## THE REQUIRED MESSAGE AND DISPLAYED IN THE OPERATOR AS ONE RAISE COMMAND

Comparing the above Procedure , LKM codes have small changes to get the File and the correct Length calculated smartly.

filesrc = open(‘<%=odiRef.getSrcTablesList(“”, “[SCHEMA]/[RES_NAME]”, “”, “”)%>’,’r’)

This code will be the correct File name and would open it for reading .

s1=str(intWithCommas(lines-<%=odiRef.getSrcTablesList(“”, “[FILE_FIRST_ROW]”, “, “, “”)%>))

This code will subtract the Total lines – Heading(No of Lines) so that in operator it shows the correct value.

image

raise ‘ nn The Number of Lines in the File <%=odiRef.getSrcTablesList(“”, “[SCHEMA]/[RES_NAME]”, “”, “”)%> are —>> ‘ , final

This code will display the File path and Name in the operator Message box.

image

As you can see it displayed the File Path and Name with Total Number of Lines , Comparing to the above procedure where it displayed 1,453,389 , here smartly it read from the Data store definition and found out that the Heading (no of lines ) is one and while displaying showing the correct output.

Please find the codes below in XML format and import in INSERT_UPDATE mode and also enable ” IGNORE ERRORS ” option as Jython raise command is used to throw warning message in the operator and I have used raise to display the output required.

To use in any LKM File to KM , just copy paste the code from -PROCEDURE FILE_LENGTH_HEADER TO BE USED IN LKM’s and enable the option -IGNORE ERRORS and move it to the first order so that as soon the load starts you would get the required details in the operator.

PROCEDURE FILE_LENGTH_HEADER

http://cid-d7195c8badc40d0b.skydrive.live.com/embedicon.aspx/odiexperts.com/TRT_PROC_FILE_LENGTH_HEADER.xml

PROCEDURE FILE_LENGTH_HEADER TO BE USED IN LKM’s
http://cid-d7195c8badc40d0b.skydrive.live.com/embedicon.aspx/odiexperts.com/TRT_LKM_FILE_LENGTH_HEADER.xml

17 Comments

Leave a Reply

Required fields are marked *.


This site uses Akismet to reduce spam. Learn how your comment data is processed.