ODIExperts.com

The blog for Oracle Data Integrator ( ODI )

Get File Length and Header in Operator

We get so many source files and some time we have different files coming of different length and we would need to do a row count every time to see whether we have got the correct number of records in the target .

To solve this issue, I have written a small program that shows the File Length and Header or First Line of the file in the operator . Why First line so that I know whether the first line is an Header row or not .

JYTHON PROCEDURE CODE

# AUTHOR – KSHITIZ DEVENDRA
# LANG – JYTHON
# DESCRIPTION
# ——————————————————————————–
# THIS PROGRAM IS USED TO GET THE NUMBER OF RECORDS IN FILE AND
# ALSO DISPLAYS THE HEADER OF THE FILE
# THE CORRECT NO OF RECORDS ARE CALCULATED BY SUBTRACTING TOTAL LENGTH – HEADER
# ——————————————————————————–
filesrc = open(‘<— SPECIFY FILE PATH AND NAME HERE —-> ‘,’r’)
first=filesrc.readline()
lines = 0
while first:
#get the no of lines in the file
lines += 1
first=filesrc.readline()
#print lines
## THE ABOVE PART OF THE PROGRAM IS TO COUNT THE NUMBER OF LINES
## AND STORE IT INTO THE VARIABLE ‘ LINES ‘
def intWithCommas(x):
if type(x) not in [type(0), type(0L)]:
raise TypeError(“Parameter must be an integer.”)
if x < 0:
return ‘-‘ + intWithCommas(-x)
result = ”
while x >= 1000:
x, r = divmod(x, 1000)
result = “,%03d%s” % (r, result)
return “%d%s” % (x, result)
## THE ABOVE PROGRAM IS TO DISPLAY THE NUMBERS
## 1234567890 AS 1,234,567,890
s1=str(intWithCommas(lines))
s2=’ nn and the First Line of the File is —>> nn ‘
filesrc.seek(0)
s3=str(filesrc.readline())
final=s1 + s2 + s3
raise ‘ nn The Number of Lines in the File are —>> ‘ , final
## SINCE IN OPERATOR I CAN THROW ONLY SINGLE RAISE COMMAND I CONCATENATED ALL
## THE REQUIRED MESSAGE AND DISPLAYED IN THE OPERATOR AS ONE RAISE COMMAND

SAMPLE OUTPUT

image

The total execution time for 1.4 million records was 22 so i am expecting it to be fast

image

Please note i have attached the codes in two formats one for Procedure and another for LKM’s as i have done small changes so that when used in LKM it read the source file automatically , and also does the correct length calculation.

JYTHON CODE FOR LKM

# AUTHOR – KSHITIZ DEVENDRA
# LANG – JYTHON
# DESCRIPTION
# ——————————————————————————–
# THIS PROGRAM IS USED TO GET THE NUMBER OF RECORDS IN FILE AND
# ALSO DISPLAYS THE HEADER OF THE FILE
# THE CORRECT NO OF RECORDS ARE CALCULATED BY SUBTRACTING TOTAL LENGTH – HEADER
# ——————————————————————————–
filesrc = open(‘<%=odiRef.getSrcTablesList(“”, “[SCHEMA]/[RES_NAME]”, “”, “”)%>’,’r’)
first=filesrc.readline()
lines = 0
while first:
#get the no of lines in the file
lines += 1
first=filesrc.readline()
#print lines
## THE ABOVE PART OF THE PROGRAM IS TO COUNT THE NUMBER OF LINES
## AND STORE IT INTO THE VARIABLE ‘ LINES ‘
def intWithCommas(x):
if type(x) not in [type(0), type(0L)]:
raise TypeError(“Parameter must be an integer.”)
if x < 0:
return ‘-‘ + intWithCommas(-x)
result = ”
while x >= 1000:
x, r = divmod(x, 1000)
result = “,%03d%s” % (r, result)
return “%d%s” % (x, result)
## THE ABOVE PROGRAM IS TO DISPLAY THE NUMBERS
## 1234567890 AS 1,234,567,890
s1=str(intWithCommas(lines-<%=odiRef.getSrcTablesList(“”, “[FILE_FIRST_ROW]”, “, “, “”)%>))
s2=’ nn and the First Line of the File is —>> nn ‘
filesrc.seek(0)
s3=str(filesrc.readline())
final=s1 + s2 + s3
raise ‘ nn The Number of Lines in the File <%=odiRef.getSrcTablesList(“”, “[SCHEMA]/[RES_NAME]”, “”, “”)%> are —>> ‘ , final
## SINCE IN OPERATOR I CAN THROW ONLY SINGLE RAISE COMMAND I CONCATENATED ALL
## THE REQUIRED MESSAGE AND DISPLAYED IN THE OPERATOR AS ONE RAISE COMMAND

Comparing the above Procedure , LKM codes have small changes to get the File and the correct Length calculated smartly.

filesrc = open(‘<%=odiRef.getSrcTablesList(“”, “[SCHEMA]/[RES_NAME]”, “”, “”)%>’,’r’)

This code will be the correct File name and would open it for reading .

s1=str(intWithCommas(lines-<%=odiRef.getSrcTablesList(“”, “[FILE_FIRST_ROW]”, “, “, “”)%>))

This code will subtract the Total lines – Heading(No of Lines) so that in operator it shows the correct value.

image

raise ‘ nn The Number of Lines in the File <%=odiRef.getSrcTablesList(“”, “[SCHEMA]/[RES_NAME]”, “”, “”)%> are —>> ‘ , final

This code will display the File path and Name in the operator Message box.

image

As you can see it displayed the File Path and Name with Total Number of Lines , Comparing to the above procedure where it displayed 1,453,389 , here smartly it read from the Data store definition and found out that the Heading (no of lines ) is one and while displaying showing the correct output.

Please find the codes below in XML format and import in INSERT_UPDATE mode and also enable ” IGNORE ERRORS ” option as Jython raise command is used to throw warning message in the operator and I have used raise to display the output required.

To use in any LKM File to KM , just copy paste the code from -PROCEDURE FILE_LENGTH_HEADER TO BE USED IN LKM’s and enable the option -IGNORE ERRORS and move it to the first order so that as soon the load starts you would get the required details in the operator.

PROCEDURE FILE_LENGTH_HEADER

http://cid-d7195c8badc40d0b.skydrive.live.com/embedicon.aspx/odiexperts.com/TRT_PROC_FILE_LENGTH_HEADER.xml

PROCEDURE FILE_LENGTH_HEADER TO BE USED IN LKM’s
http://cid-d7195c8badc40d0b.skydrive.live.com/embedicon.aspx/odiexperts.com/TRT_LKM_FILE_LENGTH_HEADER.xml

17 Comments

  1. I want file header in ODI variable , is there any way..

  2. Hi,

    I want to get the row count of the file before processing. I changed the above code like below

    filesrc = open(‘#ERP_FEED_FILE_TO_DSS_STAGING_TABLES.vd_mstr_filename’,’r’)
    first=filesrc.readline()
    lines = 0

    while first:
    lines+=1
    first=filesrc.readline()

    s1=str(lines)
    s2= ‘ in the file ——‘
    s3=’#ERP_FEED_FILE_TO_DSS_STAGING_TABLES.vd_mstr_filename’
    final = s1 + s2 + s3
    raise ‘ nn The Number of Lines in the File are —‘ , final

    where I am passing the filename from a variable. I have about 13 files for which count need to be identified. For some files it is working fine & I am getting the count for some files it keeps on failing. After the row count I am passing the same variable name to my file datastore & the interface is getting executed, but I am getting the below error.

    org.apache.bsf.BSFException: exception from Jython:
    Traceback (innermost last):
    File “”, line 1, in ?
    IOError: File not found – H:Data_FeedsDaily_LoadsWuxi d_mstr.Wuxi.20110325.txt (The filename, directory name, or volume label syntax is incorrect)

    where the filename is changed. I don’t know how the file name is getting changed in this step & the next step is working fine. Please help & suggest.

  3. Still we hav not got the approximate no. of lines in the file. We are doing testing with a sample file containing only 50 lines.

    If the file size is more whether the code will have any performance issue??

  4. hi,
    after getting the footer value in the variable, i want to remove it from the source file. How it can be done??

    • how big is the file ?
      We need to rewrite the data into the file for this. i have written this code for smaller file . please change the dynamic_struc.txt with your file name .

      read_file=open('c:/Dynamic_struc.txt','r')
      lst=read_file.readlines()
      footer= lst[len(lst)-1]
      read_file.close()
      read_file=open('c:/Dynamic_struc.txt','w')
      read_file.writelines(lst[0:-1])
      read_file.close()
      /code>

  5. Thanks for the update.
    Actually we have a source file with header & footer. The footer contains value which need to be insert into the target file footer.
    I want the source file footer value to get inserted into a table columns or to pass the footer value into a ODI variable which can be user later.Is it possible??

  6. thanks for the code.
    i created the procedure with the code & it got executed but, i was not able to get the values.
    as I m new to ODI can you tell me the step by step configuration to get the footer value.

    • if you are looking for diplaying the result in operator , then you can add

      raise footer
      read_file.close()

      and check ignore errors option in the procedure. raise will throw the result into the operator . If you want to fetch the footer for something , can you please be elaborate what you are planning to do .

  7. I want to get the footer value. Is there any way to get the footer value from the file. If yes can you please explain the steps how to do it.

  8. Nice Add Kshitiz.. Thanks for sharing!

Leave a Reply

Required fields are marked *.