PostgreSQL :- Reading Files Through Large Objects

Background

Discovered that someone has made a Sacred Text available on the web.

It is the NIV version of the Bible.

It is available as an XML file.

 

Load Command

Table Definition


create schema if not exists bible
;

/*
    drop table if exists bible.nivtext;
*/
create table if not exists bible.nivtext
(

     contentText varchar null
   , contentXML  xml     null

)
;

Sample

Import into Text Column

SQL


truncate table bible.nivtext
;

-- Import into Text Column
COPY bible.nivtext
(
	  contentText
)

FROM 'C:\temp\bible\textfile\niv\niv.xml'

;

Output

[Image: COPY output after importing into the text column]

Explanation

  1. The XML file is read in
    • Each line of the file is broken out into its own record (see the sanity check below)
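
As a quick sanity check (a sketch; the column aliases are illustrative and not from the original post), the row count should match the number of lines in niv.xml:

-- With COPY, each line of the source file becomes its own row
select count(*)                 as numberOfRows
     , min(length(contentText)) as shortestLine
     , max(length(contentText)) as longestLine
from   bible.nivtext
;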

Import into XML Column

SQL


truncate table bible.nivtext
;

-- Import into XML Column
COPY bible.nivtext
(
	  contentXML
)

FROM 'C:\temp\bible\textfile\niv\niv.xml'

;

Output

Output – Image

[Image: error raised when importing into the XML column]

Output – Text

SQL Error [2200N]: ERROR: invalid XML content
  Detail: line 1: Premature end of data in tag bible line 1

       ^
  Where: COPY nivtext, line 2, column contentxml: ""

Explanation

  1. The COPY command is not able to directly import into an XML column; it processes the file line by line, and a single line of the file is not well-formed XML (see the example below)
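
To reproduce the failure by hand, a single line of the file can be cast to xml. Assuming the file opens with a <bible> root element, as the error message suggests, a lone opening tag raises the same error:

-- A lone opening tag is not well-formed XML content,
-- so the cast fails with "Premature end of data in tag bible"
select '<bible>'::xml
;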

Remediation

Large Object

SQL


/*
 * SQL to read XML from file into PostgreSQL database
 * https://dba.stackexchange.com/questions/8172/sql-to-read-xml-from-file-into-postgresql-database
 *
 */
create schema if not exists stackOverflow;

create or replace function stackOverflow.bytea_import
(
      p_path    text
    , p_result  out bytea
)
language plpgsql
as
$$
declare
  l_oid oid;
  r record;
begin

    -- initialize byte array
    p_result := '';

    -- Import the file into the large object store (pg_largeobject)
    -- The returned OID handle is saved into l_oid
    select pg_catalog.lo_import(p_path)
    into   l_oid
    ;

    -- Traverse the large object table
    -- Use the handle saved into variable l_oid
    -- Sequencing on page number
    -- Retrieve the data column, one page at a time
    for r in
    (
        select tblLO.data 

        from   pg_catalog.pg_largeobject tblLO

        where  tblLO.loid = l_oid 

        order by
            pageno
    )
    loop

        -- concatenate the page just read onto the byte array
        p_result := p_result || r.data;

    end loop
    ;

    -- Remove the large object from the large object store
    perform pg_catalog.lo_unlink(l_oid);

end;

$$;


grant execute on function stackoverflow.bytea_import(text) to public
; 
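
On PostgreSQL 9.4 and later, the loop over pg_catalog.pg_largeobject can be replaced with a single call to lo_get(), which returns the whole large object as bytea. The sketch below is an alternative to, not part of, the Stack Exchange answer; the name bytea_import_v2 is made up for illustration.

create or replace function stackOverflow.bytea_import_v2
(
      p_path    text
    , p_result  out bytea
)
language plpgsql
as
$$
declare
  l_oid oid;
begin

    -- Import the file into the large object store; keep the OID handle
    l_oid := pg_catalog.lo_import(p_path);

    -- lo_get (PostgreSQL 9.4+) returns the entire large object as bytea
    p_result := pg_catalog.lo_get(l_oid);

    -- Remove the large object now that its contents have been copied out
    perform pg_catalog.lo_unlink(l_oid);

end;
$$;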

Invoke


truncate table bible.nivtext
;

insert into bible.nivtext
(
    contentXML
)
select
        convert_from
        (
              p_result
            , 'UTF-8'
        )::xml
from   stackoverflow.bytea_import
        (
            'C:\sample\sample.db\bible\textfile\niv\niv.xml'
        )
;
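
Before running the insert, the decoded text can optionally be pre-checked (a sketch reusing the same file path) with xml_is_well_formed_document(), which returns true only when the text parses as one complete XML document:

-- Returns true when the decoded file is a single, well-formed XML document
select xml_is_well_formed_document
       (
           convert_from(p_result, 'UTF-8')
       ) as isWellFormedDocument
from   stackoverflow.bytea_import
        (
            'C:\sample\sample.db\bible\textfile\niv\niv.xml'
        )
;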

Review Data

SQL

select *
from   bible.nivtext
limit 10
;

Output

[Image: query result showing a single row containing the full XML document]

Explanation

The data read in is stored as a single row containing one complete XML document.
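
Since the element names inside the file are not shown here, a schema-agnostic check (a sketch) is to count the element nodes in the stored document with xpath():

-- count(//*) returns the number of element nodes in the document
select (xpath('count(//*)', contentXML))[1]::text as elementCount
from   bible.nivtext
;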

Dedicated

Dedicated to Jack Douglas of the Database Administrators Stack Exchange.

SQL to read XML from file into PostgreSQL database
Asked By :- bignose
Answered By :- Jack Douglas
Link :- https://dba.stackexchange.com/questions/8172/sql-to-read-xml-from-file-into-postgresql-database
