Introduction
Having fun with Hadoop; specifically, exporting data from HDFS to SQL Server with Sqoop.
Pre-requisites
Hopefully, you have installed Hadoop and Sqoop and validated that they are running properly.
If not, please read Technical: Hadoop – Sqoop on Cloudera (CDH) – Is Sqoop Set up and Configured for MS SQL Server (https://danieladeniji.wordpress.com/2013/05/03/technical-hadoop-sqoop-on-cloudera-cdh/ )
Generate Sample Data
Data, data everywhere but none to share without running afoul of regulatory overlords.
So I took to the web and googled for sample data.
Got a few hits and chose to go with generateData.com (http://www.generatedata.com/#generator)
Sample Data – generateData.com
Chose to generate a pretty generic dataset consisting of:
- CustomerID
- CustomerName
- Address
- City
- State
- ZipCode
- EmailAddress
- PhoneNumber
Other Choices:
- Result Type –> CSV
- Country Specific Data –> US
- Number of results –> 5000
- CSV options –> Delimiter Characters (|)
Generated the data and saved it to my working machine as randomData.csv
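If you would rather script the sample data than use a web generator, a minimal Python sketch can produce a file of the same shape. The column order follows the list above; all names, addresses, and numbers below are made-up placeholder values.

```python
import csv
import random

FIRST_NAMES = ["Ann", "Bob", "Carla", "Dev", "Elena"]
LAST_NAMES = ["Smith", "Jones", "Lee", "Garcia", "Okafor"]
STATES = ["CA", "TX", "NY", "WA", "GA"]

def random_row(customer_id):
    """Build one pipe-delimited customer record as a list of fields."""
    name = f"{random.choice(FIRST_NAMES)} {random.choice(LAST_NAMES)}"
    return [
        customer_id,
        name,
        f"{random.randint(1, 9999)} Main St",
        "Springfield",
        random.choice(STATES),
        f"{random.randint(10000, 99999)}",
        f"{name.split()[0].lower()}@example.com",
        f"({random.randint(200, 999)}) 555-{random.randint(1000, 9999)}",
    ]

with open("randomData.csv", "w", newline="") as fh:
    # Pipe-delimited, and deliberately no header line:
    # Sqoop export treats every line in the file as data.
    writer = csv.writer(fh, delimiter="|")
    for cid in range(1, 5001):
        writer.writerow(random_row(cid))
```

The key details to match are the "|" delimiter and the absence of a header row, both of which matter later during the Sqoop export.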
Copy Sample Data to Hadoop \ HDFS Node
Copied Data from desktop to Hadoop Node
Syntax:
scp <file-name> username@<hadoop-hdfs-node>:/folder
Sample:
scp /downloads/randomdata.csv dadeniji@hadoop-node:/tmp
Copied Data to HDFS File System
Syntax:
sudo -u hdfs hadoop fs -mkdir hdfs://hadoopHDFSNode:portNumber/folder

sudo -u hdfs hadoop fs -put \
   /tmp/randomdata.csv \
   hdfs://hadoopHDFSNode:portNumber/folder
Sample:
# make folder /sampleData/GenerateData.com/customerList
sudo -u hdfs hadoop fs -mkdir "/sampleData/GenerateData.com/customerList"

# copy data from local /tmp/randomdata.csv
# to hdfs system hadoopHDFS -- /sampleData/GenerateData.com/customerList
sudo -u hdfs hadoop fs -put \
   /tmp/randomdata.csv \
   "hdfs://hadoopHDFS:8020/sampleData/GenerateData.com/customerList"
Export Hadoop \ HDFS file to Database Table
Syntax:
sudo -u hdfs sqoop export \
--connect "jdbc:sqlserver://<DBServer>;database=<database>" \
--username dbUsername \
--password dbPassword \
--driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" \
--export-dir \
hdfs://URL:PortNumber/<file-name> \
--table "<table-name>" \
--fields-terminated-by "<field-terminator>"
Sample:
sudo -u hdfs sqoop export \
--connect "jdbc:sqlserver://DBLAB;database=DEMO" \
--username "sammie" \
--password "p1smith" \
--driver "com.microsoft.sqlserver.jdbc.SQLServerDriver" \
--export-dir \
'hdfs://hdfsNode:8020/sampleData/GenerateData.com/customerList/randomdata.csv' \
--table "dbo.customerList" \
--fields-terminated-by "|"
Errors - Remediation
Errors - Invalid object name
Text Capture:
INFO manager.SqlManager: Executing SQL statement:
SELECT t.* FROM dbo.customerList AS t WHERE 1=0
ERROR manager.SqlManager: Error executing statement:
com.microsoft.sqlserver.jdbc.SQLServerException:
Invalid object name 'dbo.customerList'.
com.microsoft.sqlserver.jdbc.SQLServerException:
Invalid object name 'dbo.customerList'.
Fix:
Please create corresponding database table beforehand.
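Sqoop export does not create the target table for you. For the dataset above, a minimal T-SQL sketch might look like the following; the column names follow the list earlier in this post, but the types and lengths are my assumptions, so adjust them to your data.

```sql
CREATE TABLE dbo.customerList
(
      CustomerID    int           NOT NULL
    , CustomerName  varchar(100)  NULL
    , Address       varchar(200)  NULL
    , City          varchar(100)  NULL
    , State         char(2)       NULL
    , ZipCode       varchar(10)   NULL
    , EmailAddress  varchar(120)  NULL
    , PhoneNumber   varchar(30)   NULL
);
```

The column count and order should line up with the fields in the delimited file, since Sqoop maps them positionally during export.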
Errors - Extra\Blank lines at the bottom of the HDFS data feed
Text Capture:
java.io.IOException: Can't export data, please check task tracker logs
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:673)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:331)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:794)
at dbo_customerList.__loadFromFields(dbo_customerList.java:355)
at dbo_customerList.parse(dbo_customerList.j
Fix:
Screen the data feed and make sure there are no header or footer lines.
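One way to screen the feed before uploading it is a short Python pass that drops blank lines and rows whose field count is off. The "|" delimiter and the expected count of 8 fields are assumptions based on the dataset above.

```python
def screen_feed(lines, delimiter="|", expected_fields=8):
    """Drop blank lines and rows whose field count is wrong."""
    clean = []
    for line in lines:
        stripped = line.rstrip("\r\n")
        if not stripped.strip():
            # Blank/whitespace-only trailing lines are what make Sqoop's
            # generated parser throw java.util.NoSuchElementException.
            continue
        if len(stripped.split(delimiter)) != expected_fields:
            continue  # malformed row; log or quarantine it in a real pass
        clean.append(stripped)
    return clean

sample = [
    "1|Ann Smith|12 Main St|Springfield|CA|90001|ann@example.com|(200) 555-1234",
    "",          # a blank trailing line like the one that broke the export
    "bad|row",   # wrong field count
]
print(screen_feed(sample))
```

In practice you would read randomData.csv, write the cleaned lines to a new file, and `hadoop fs -put` the cleaned copy instead of the original.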