PostgreSQL :- XMLTABLE Function

Background

A far easier way to parse XML data is to use the XMLTABLE function.

Lineage

Here are past postings on XML parsing options in PostgreSQL :-

  1. XPath

 

Data

Here is a sample of the XML file.

Image

niv.xml.01.20190819.0527PM

XMLTABLE

Query

Outline

  1. XMLTable
    • Path
      • /bible/b/c/v
    • Passing
      • Column Name
        • contentxml
    • Columns
      • book
        • path ‘../../@n’
      • chapter
        • path ‘../@n’
      • verse
        • path ‘@n’
      • content
        • path ‘.’

SQL


select
        tblXML.*

from   bible.nivtext tbl

        , xmltable
            (
                -- path
                '/bible/b/c/v'

                -- passing : the XML column
                passing tbl.contentxml

                -- output columns
                columns 

                     -- book Name
                     book varchar(60)
                        path '../../@n'

                     -- chapter Number
                   , chapter int
                        path '../@n'

                    -- verse Number
                   , verse int
                        path '@n'

                    -- verse
                    , "content" varchar(4000)
                        path '.'

            )
            tblXML

limit 10 

Output

sql.xmltable.01.20190819.0557PM

Summary

XMLTable is quite promising.

One has to be on PostgreSQL version 10 or above.

The best thing is that it is ANSI compliant and it is very well supported across major database platforms such as IBM Db2 and Oracle.

It is very unfortunate that SQL Server does not support it.
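As a quick way to experiment without importing a file first, XMLTABLE can be exercised against a small XML literal; the fragment below is made up, but mirrors the /bible/b/c/v layout used above.

```sql
-- Self-contained XMLTABLE sketch
-- The XML literal is a made-up two-verse fragment
-- shaped like /bible/b/c/v
select
        tblXML.*

from   xmltable
        (
            -- path
            '/bible/b/c/v'

            -- passing a literal XML value
            passing '<bible><b n="Gen"><c n="1"><v n="1">Verse one</v><v n="2">Verse two</v></c></b></bible>'::xml

            columns
                  book    varchar(60)   path '../../@n'
                , chapter int           path '../@n'
                , verse   int           path '@n'
                , content varchar(4000) path '.'
        )
        tblXML
;
```

The relative paths ( '../../@n', '../@n' ) walk back up from each matched v node to its enclosing chapter and book elements, exactly as in the full query.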

PostgreSQL :- XPATH Function

Background

We have our XML file imported into the database.

Objective

Let us start to make sense of it.

XPATH

We can use the XPATH function to query it.

Data

Here is what our XML looks like.

Image

niv.xml.01.20190819.0527PM

Query

Outline

  1. Common Table Expression ( CTE )
    • To compartmentalize things a bit we use a CTE
    • XPATH
      • We issue XPATH using ( //v ) as the query
    • UNNEST
      • UNNEST converts the array returned by XPATH to a result set
  2. Query
    • To get the verse number we pass the query string //v/@n
    • To get the actual verse we pass along //v/text()

SQL


; with xmlNodeVerse
(
    node
)
as
(

    select 

            unnest
            (
                xpath
                (
                      '//v'
                    , contentxml
                )
            ) as verse

    from   bible.nivtext tblB

)
select
          tblXML
            as xmlFragment

        , (
            xpath
                (
                      '//v/@n'
                    , tblXML.node
                )
          )[1]::text
            as verseNumber

        , (
            xpath
            (
                  '//v/text()'
                , tblXML.node
            )
          )[1]::text
            as verse          

from   xmlNodeVerse tblXML

limit 20
; 
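The same unnest ( xpath ( ) ) pattern can be tried against a literal XML value; the fragment below is illustrative only.

```sql
-- unnest converts the xml[] returned by xpath into a result set
with xmlNodeVerse
(
    node
)
as
(
    select unnest
            (
                xpath
                (
                      '//v'
                    , '<bible><b n="Gen"><c n="1"><v n="1">Verse one</v></c></b></bible>'::xml
                )
            )
)
select
      (xpath('//v/@n',     node))[1]::text as verseNumber
    , (xpath('//v/text()', node))[1]::text as verse
from  xmlNodeVerse
;
```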

PostgreSQL :- Reading Files Through Large Objects

Background

Discovered that someone has a sacred text available on the web.

It is the NIV version of the Bible.

Load Command

Table Definition


create schema if not exists bible
;

/*
    drop table if exists bible.nivtext;
*/
create table if not exists bible.nivtext
(

     contentText varchar null
   , contentXML  xml     null

)
;

Sample

Import into Text Column

SQL


truncate table bible.nivtext
;

-- Import into Text Column
COPY bible.nivtext
(
	  contentText
)

FROM 'C:\temp\bible\textfile\niv\niv.xml'

;

Output

bible.nivtext.copy.text.01.2019081.0232PM.

Explanation

  1. The XML file is read in
    • Broken into separate records, one per line

Import into XML Column

SQL


truncate table bible.nivtext
;

-- Import into XML Column
COPY bible.nivtext
(
	  contentXML
)

FROM 'C:\temp\bible\textfile\niv\niv.xml'

;

Output

Output – Image

bible.nivtext.copy.xml.01.2019081.0239PM

Output – Text

SQL Error [2200N]: ERROR: invalid XML content
  Detail: line 1: Premature end of data in tag bible line 1

       ^
  Where: COPY nivtext, line 2, column contentxml: ""

Explanation

  1. The Copy command is not able to directly import into an XML column, as it reads the file line by line and each line on its own is not a well-formed XML document

Remediation

Large Object

SQL


/*
 * SQL to read XML from file into PostgreSQL database
 * https://dba.stackexchange.com/questions/8172/sql-to-read-xml-from-file-into-postgresql-database
 *
 */
create schema if not exists stackOverflow;

create or replace function stackOverflow.bytea_import
(
      p_path    text
    , p_result  out bytea
)
language plpgsql
as
$$
declare
  l_oid oid;
  r record;
begin

    -- initialize byte array
    p_result := '';

    -- Import large object into memory
    -- Memory handle to use is l_oid
    select pg_catalog.lo_import(p_path)
    into   l_oid
    ;

    -- Traverse the large object table
    -- Use handle saved into variable l_oid
    -- Sequencing on Page Number
    -- Retrieve column data
    for r in
    (
        select tblLO.data 

        from   pg_catalog.pg_largeobject tblLO

        where  tblLO.loid = l_oid 

        order by
            pageno
    )
    loop

        -- concatenate read data into byte array
        p_result = p_result || r.data;

    end loop
    ;

    -- Delete large object from memory
    perform pg_catalog.lo_unlink(l_oid);

end;

$$;


grant execute on function stackoverflow.bytea_import to public
; 

Invoke


truncate table bible.nivtext
;

insert into bible.nivtext
(
    contentXML
)
select
        convert_from
        (
              p_result
            , 'UTF-8'
        )::xml
from   stackoverflow.bytea_import
        (
            'C:\sample\sample.db\bible\textfile\niv\niv.xml'
        )
;

Review Data

SQL

select *
from   bible.nivtext
limit 10
;

Output

readData.01.20190819.0303PM

Explanation

Data read has a single XML blob.
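As a quick sanity check, a sketch assuming the bible.nivtext table defined earlier, we can confirm that a single row came in and inspect its size:

```sql
-- Confirm a single row holds the entire document
select
          count(*)                        as "numberOfRows"
        , max(length(contentXML::text))   as "documentLengthInChars"
from   bible.nivtext
;
```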

Dedicated

Dedicated to Jack Douglas via the Database Administrators Stack Exchange.

SQL to read XML from file into PostgreSQL database
Asked By :- bignose
Answered By :- Jack Douglas
Link

PostgreSQL :- Load text data through a View using “Copy” statement

Background

Some database platforms allow one to load data through views.

Error

Upon creating a view over our simple table and attempting to load data through the view, we are prompted with a guiding error message.

Error Message

Error occurred during SQL script execution

Reason:
SQL Error [42809]: ERROR: cannot copy to view "vw_kjv"
  Hint: To enable copying to a view, provide an INSTEAD OF INSERT trigger.

Error Image

insteadOfTriggerNeeded.01.20190817.1009PM.PNG

Remediation

Outline

  1. Create/Review view definition
  2. Create a function
  3. Create a trigger
    • Bind the trigger to the function we created

Tasks

View

View Definition

create schema if not exists bible;

create or replace view bible.vw_kjv
(
      bookID
    , chapterID
    , verseID
    , verse
)
as

select
      bookID
    , chapterID
    , verseID
    , verse    

from  bible.kjv
;

Function

Function Definition

create schema if not exists bible;

CREATE OR REPLACE FUNCTION bible.function_vw_kjv_insert()
RETURNS trigger
as
$func$
begin

   /*
    * Add record into bible.book
    * a) if book is not yet registered
   */
   INSERT INTO bible.book
   (
           bookAbbrev
         , bookName
   )
   select
           new.bookid
         , new.bookid

  where not exists
                (
                    select 1
                    from   bible.book tblB
                    where  tblB.bookabbrev = new.bookid
                )
   ;

   /*
    * Add record into bible.kjv
   */
   INSERT INTO bible.kjv
   (
          bookid
        , chapterid
        , verseid
        , verse
   )
   VALUES
   (
          new.bookid
        , new.chapterid
        , new.verseid
        , new.verse
    );

    /*
        Return new record
    */
    return new;

end;

$func$ 

LANGUAGE plpgsql;

COMMENT ON FUNCTION bible.function_vw_kjv_insert
    IS
    'Function that manages instead of trigger/insert on vw_kjv'
    ;

Trigger

Trigger Definition


drop trigger if exists tr_vw_kjv_insert on bible.vw_kjv
;

create trigger tr_vw_kjv_insert
    instead of insert on bible.vw_kjv
    for each row
    EXECUTE procedure bible.function_vw_kjv_insert ()
    ;

COMMENT ON TRIGGER tr_vw_kjv_insert on bible.vw_kjv
    IS
    'Instead of trigger/insert on vw_kjv'
    ;   

Copy Statement

Sample Execution


/*
    On bible.kjv => Remove Existing Data
*/
truncate table bible.kjv
     RESTART identity
;

/*
    On bible.kjv => Restart identity column
*/
ALTER TABLE bible.kjv
    ALTER COLUMN id
        RESTART WITH 1
;

/*
    On bible.kjv => Copy Data

        a) Specify columns
        b) Delimiter is |
*/
COPY bible.vw_kjv
(
      bookID

    , chapterID

    , verseID  

    , verse
)

FROM 'C:\sample\sample.db\bible\textfile\kjv\kjvdat.txt'

delimiter AS '|'

;

/*
    On bible.kjv => Copy Data

        a) Column verse

            i) Remove leading spaces
*/
update bible.vw_kjv
set    verse = ltrim(verse)
;

/*
    On bible.kjv => Update statistics
*/
analyze bible.kjv
;

Summary

One of the good things about PostgreSQL is that the error messages are very good and guiding.

PostgreSQL :- Load text data using “Copy” statement

Background

Let us load text into a PostgreSQL table.

Command

The command we will be using is the Copy Command.

The Copy Command is able to get data out of a table.

And, it is able to add data in.

Table

Table Declaration

Code

create schema if not exists bible
;

/*
    drop table if exists bible.kjv cascade
*/

create table if not exists bible.kjv
(
      id        int     not null
      GENERATED
      ALWAYS AS identity

    , bookID    char(3) not null

    , chapterID smallint not null

    , verseID   smallint not null

    , verse     varchar(4000) not null

    , constraint "bible.kjv.PK"
      primary key
      (
          bookID
        , chapterID
        , verseID
      )

)
;

Copy Command

Outline

  1. Identify Destination and Source tables and views
    • Review columns
  2. Identify data files
    • Our Case
      • In our case we will be using text files made available through Sacred Texts
        • The URL is sacred-texts.com ( Link )
    • Review column delimiters
    • Header lines
    • Data Encoding
  3. Import or Export data
  4. Clean data
  5. Update Statistics

 

Syntax

COPY (table-name)
(
   column-list
)

FROM (file-name)

delimiter AS (delimiter)

;

Sample

/*
On bible.kjv => Remove Existing Data
*/
truncate table bible.kjv
RESTART identity
;

/*
	On bible.kjv => Restart identity column
*/
ALTER TABLE bible.kjv
    ALTER COLUMN id
        RESTART WITH 1
;

/*
On bible.kjv => Copy Data
a) Specify columns
b) Delimiter is |
*/
COPY bible.kjv
(
	  bookID

	, chapterID

	, verseID  

	, verse
)

FROM 'V:\sample\sample.db\bible\textfile\kjv\kjvdat.txt'

delimiter AS '|'

;

/*
On bible.kjv => Data Cleansing
a) Column verse
i) Remove leading spaces
*/
update bible.kjv
set    verse = ltrim(verse)
;

/*
On bible.kjv => Update statistics
*/
analyze bible.kjv
;

 

Summary

Importing and exporting text data is straightforward.
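Since COPY works in both directions, the same table can be written back out with COPY ... TO; the file path below is illustrative.

```sql
/*
    On bible.kjv => Export Data

        a) Specify columns
        b) Delimiter is |
*/
COPY bible.kjv
(
      bookID
    , chapterID
    , verseID
    , verse
)

TO 'C:\temp\bible\export\kjvdat.txt'

delimiter AS '|'

;
```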

PostgreSQL :- Listing Column Names

Background

Let us expose the columns in each of our tables.

Code

Outline

  1. Column Length
    • The column length is one of the more perplexing attributes
    • The reason being the existence of a couple of columns that intertwine; all of them are needed to properly decode it
      • Attributes
        • attlen ( Attribute Length )
        • atttypid ( Attribute Type ID )
        • atttypmod ( Record Type Specific Data )
          • Varchar
            • base length (4) + defined length
  2. pg_catalog.format_type
    • Get SQL name of a Data Type
    • Accepts Type ID ( atttypid ) and Type Modifier ( atttypmod )
  3. The data returned by pg_catalog.format_type can be parsed using :-
    • Exception Handling
    • Position Function

SQL


; with ctePGAttributeBaseline
(
      attrelid

    , attnum  

    , atttypid

    , atttypmod

    , columnType

)
as
(
    select

          tblPA.attrelid

        , tblPA.attnum  

        , tblPA.atttypid

        , tblPA.atttypmod

        , pg_catalog.format_type
                (
                      tblPA.atttypid
                    , tblPA.atttypmod
                )
            as "columnType"

    from pg_catalog.pg_attribute tblPA

)
, ctePGAttribute
(
      attrelid

    , attnum  

    , atttypid

    , atttypmod

    , columnType

    , lengthViaRegEx

    , lengthViaPos
)

as
(
    select

          attrelid

        , attnum  

        , atttypid

        , atttypmod

        , columnType

        , (
            regexp_matches
            (
                 columnType
                ,'\(([^()]*)\)'
            )
        )[1] as "lengthViaRegEx"

        , case
            when
                position('(' in columnType ) > 0
                and position(')' in columnType ) > 0
                then substring
                        (
                              columnType
                            -- start at the opening parenthesis
                            , position('(' in columnType )
                            -- length runs through the closing parenthesis
                            , position(')' in columnType )
                                - position('(' in columnType )
                                + 1
                        )
            else null
         end as "lengthViaPos"

    from ctePGAttributeBaseline ctePGAB

)

select
          tblNS.nspname
            as "schema"

        , tblC.relname
            as "object"

        , tblPA.attnum
            as "columnNumber"

        , tblPA.attname
            as "column"

        , tblPT.typname
            as "columnType"

        /*  

            , tblPA.attlen
                as "columnLength"

            , tblPA.atttypid

            , tblPA.atttypmod

        */

        , tblPT.typlen  

        , columnType

        , case
            when (tblPT.typlen>0)
                then tblPT.typlen::text

            when (tblPT.typlen=-1)
                then ctePGA.lengthViaRegEx
          end
              as "columnLenViaRegEx"

        , case
            when (tblPT.typlen>0)
                then tblPT.typlen::text

            when (tblPT.typlen=-1)
                then ctePGA.lengthViaPos
          end
              as "columnLenViaPosition"

        , case tblPA.attnotnull
            when true then 'No'
            when false then 'Yes'
          end
          as "nullability"

        /*
          , tblPTD.adsrc
            as "default"
        */

        , pg_catalog.pg_get_expr
            (
                  tblPTD.adbin
                , tblPTD.adrelid
            )
            as "columnExpr"

        , tblPA.attidentity
            as "identityID"

        , case tblPA.attidentity
            when '' then ''
            when 'a' then 'Generate Always'
            when 'd' then 'Generate By Default'
          end as "identity"

from pg_catalog.pg_class tblC

join pg_catalog.pg_namespace tblNS

    on tblC.relnamespace = tblNS.oid

join pg_catalog.pg_attribute tblPA

    on tblC.oid = tblPA.attrelid

join pg_catalog.pg_type tblPT

    on tblPA.atttypid = tblPT.oid

left outer join pg_catalog.pg_attrdef  tblPTD

    on tblPA.atthasdef = true
    and tblPA.attrelid = tblPTD.adrelid
    and tblPA.attnum = tblPTD.adnum

left outer join ctePGAttribute ctePGA

    on  tblPA.attrelid = ctePGA.attrelid

    and tblPA.attnum  = ctePGA.attnum

where  tblC.relkind = 'r'

/*
 * The number of the column.
 * Ordinary columns are numbered from 1 up.
 * System columns, such as oid, have (arbitrary) negative numbers.
 */
AND    tblPA.attnum > 0 

/*
 * This column has been dropped and is no longer valid.
*/
and   tblPA.attisdropped = false

and   tblNS.nspname not in
        (
              'information_schema'
            , 'pg_catalog'
        )

order by

          tblNS.nspname
        , tblC.relname
        , tblPA.attnum
        , tblPA.attname

Output

pg_catalog.pg_attribute.01.20190817.0124PM.PNG

Summary

The most difficult part is trying to figure out a column’s defined length.
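For varchar columns specifically, the decoding outlined above ( base length of 4 + defined length ) can be applied directly to atttypmod; the sketch below targets the bible.kjv table from earlier posts.

```sql
-- For varchar, atttypmod = declared length + 4 byte header;
-- subtracting 4 recovers the defined length
select
          tblPA.attname     as "column"
        , tblPA.atttypmod
        , case
            when tblPA.atttypmod > 4
                then tblPA.atttypmod - 4
          end               as "definedLength"

from   pg_catalog.pg_attribute tblPA

join   pg_catalog.pg_class tblC

    on tblPA.attrelid = tblC.oid

where  tblC.relname = 'kjv'

and    tblPA.attnum > 0
;
```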

PostgreSQL :- Datatypes

Background

Datatypes are one of the key building blocks of a database system.

Common Ones

Here are some of the common ones in PostgreSQL.

  1. Bool
    • Values
      • true
      • false
  2. Character
    • pg_catalog::name
      • Internal Type for Object Names
      • 64 byte character type
    • information_schema::sql_identifier
      • ANSI Compatible
    • Char
      • Fixed Length
    • Varchar
      • Variable Length
    • text
      • Variable Length unlimited in size
  3. Datetime
    • Date
    • Time
    • timestamp
    • timestamptz
  4. uuid
    • Unique Identifier
  5. Money
  6. JSON
    • jsonb
  7. XML
  8. Defined Objects such as Tables
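A quick way to see which datatype an expression carries is pg_typeof:

```sql
-- pg_typeof reports the datatype of its argument
select
          pg_typeof(true)          as "bool"
        , pg_typeof('abc'::text)   as "text"
        , pg_typeof(now())         as "timestamptz"
        , pg_typeof('{}'::jsonb)   as "jsonb"
        , pg_typeof('<a/>'::xml)   as "xml"
;
```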

Code

SQL

select 

            tblPT.oid::text
                as oid

          , tblPT.typname
                as "type"

        --, tblPT.typnamespace as typeNS
          , tblNS.nspname
                as "namespace"

          , tblPA.rolname
                as "owner"

          , tblPT.typcategory
                as "categoryID"

          , case tblPT.typcategory
                when 'A' then 'Array'
                when 'B' then 'Boolean'
                when 'C' then 'Composite'
                when 'D' then 'Date/time'
                when 'E' then 'Enum'
                when 'G' then 'Geometric'
                when 'I' then 'Network Address'
                when 'N' then 'Numeric'
                when 'P' then 'Pseudo'
                when 'S' then 'String'
                when 'T' then 'Timespan'
                when 'U' then 'User Defined'
                when 'V' then 'Bit String'
                when 'X' then 'Unknown'
                else null
            end as category     

          , tblPT.typlen
                as "length"

          , tblPC.relname
                as "correspondentObject"

          --, tblPT.typelem as "typelem"
          , tblPTA.typname
                as "ancestor"

from   pg_catalog.pg_type tblPT

left outer join pg_catalog.pg_type tblPTA

    on tblPT.typelem = tblPTA.oid

inner join pg_catalog.pg_namespace tblNS

    on tblPT.typnamespace = tblNS.oid

left outer join pg_catalog.pg_class tblPC

    on tblPT.typrelid = tblPC.oid

left outer join pg_catalog.pg_authid tblPA

    on tblPT.typowner = tblPA.oid

where (
            -- If typarray is not 0 then it identifies another row in pg_type
            -- which is the "true" array type having this type as element
            ( tblPT.typarray != 0 )

     )      

-- and ( tblPT.typcategory = 'S')

order by
    tblPT.oid

Output

pg_catalog.pg_type.01.20190816.1110PM

References

  1. PostgreSQL
    • Data Types
    • System Catalog