README.md in csv-import-analyzer-0.0.5 vs README.md in csv-import-analyzer-0.0.6
- old
+ new
@@ -1,12 +1,10 @@
# Csv::Import::Analyzer
-Perform datatype analysis on desired chunk
-Calculate min-max bounds for each column
-Determine which coulmns are nullable in the csv file
+CsvImportAnalyzer is intended to help perform data analysis on csv (comma separated), tsv (tab separated) or ssv (semi-colon separated) files. It can be used to process large datasets in desired chunk sizes (defaults to 200 rows) and gives you a comprehensive analysis of each column: the most likely datatype, minimum and maximum bounds, and whether the column can be set to nullable.
-Note: This gem expects the first line to be definitve header, as in like column names if the csv file has to be imported to database.
+<b>Note</b>: This gem expects the first line to be a definitive header, i.e. column names, if the csv file is to be imported to a database.
## Installation
Add this line to your application's Gemfile:
@@ -22,33 +20,126 @@
$ gem install csv-import-analyzer
## Usage
-Calling process on a filename would generate a metadata_output.json which has the Delimiter, Datatype Analysis and SQL (create and import) statements for both PostgreSQL and MySQL
+Calling process on a filename generates metadata for the given file and returns it as a json object. The metadata contains the following:
+<ul>
+ <li> High-level stats for the given file (e.g. filename, file size, number of rows, number of columns).</li>
+ <li> Data manipulations performed while pre-processing the file.</li>
+ <li> Data analysis on each column as key-value pairs.</li>
+ <li> By default, the MySQL queries (create and import) needed to load the file into a database.</li>
+</ul>
+```ruby
+ CsvImportAnalyzer.process(filename)
+```
+## Demo
+
+Below is a sample test.csv file
+
+```
+Year ID,Make ID,Model ID,Description ID,Price ID
+1997,Ford,E350,"ac, abs, moon","3000.00"
+1999,Chevy,"Venture ""Extended Edition""",,4900.00
+1999,"Chevy","Venture ""Extended Edition, Very Large""","",5000.00
+1996,Jeep,Grand Che'rokee,"MUST SELL!air, moon roof, loaded",4799.00
+```
+To get the data analysis of the above file, you can use CsvImportAnalyzer to process the file.
+
```ruby
+metadata = CsvImportAnalyzer.process("test.csv", {:distinct => 2})
+```
+### Result
+Now the metadata holds the json object with the comprehensive analysis. Below is what the metadata would look like for the sample csv file:
+```ruby
+puts metadata
+```
+```json
+{
+ "csv_file": {
+ "filename": "sampleTab.csv",
+ "file_size": 276,
+ "record_delimiter": ",",
+ "rows": 6,
+ "columns": 5,
+ "processed_filename": "processed_sampleTab.csv",
+ "processed_file_path": "/tmp/processed_sampleTab.csv",
+ "processed_file_size": 279,
+ "error_report": "/tmp/error_report_sampleTab.csv"
+ },
+ "data_manipulations": {
+ "replace_nulls": true,
+ "replace_quotes": true
+ },
+ "csv_headers": {
+ "year_id": {
+ "datatype": "int",
+ "datatype_analysis": {
+ "int": 4
+ },
+ "distinct_values": "2+"
+ },
+ "make_id": {
+ "datatype": "string",
+ "datatype_analysis": {
+ "string": 4
+ },
+ "distinct_values": "2+"
+ },
+ "model_id": {
+ "datatype": "string",
+ "datatype_analysis": {
+ "string": 4
+ },
+ "distinct_values": "2+"
+ },
+ "description_id": {
+ "datatype": "string",
+ "datatype_analysis": {
+ "string": 2
+ },
+ "distinct_values": [
+ "ac, abs, moon",
+ "MUST SELL!air, moon roof, loaded"
+ ],
+ "nullable": true
+ },
+ "price_id": {
+ "datatype": "float",
+ "datatype_analysis": {
+ "float": 4
+ },
+ "distinct_values": "2+"
+ }
+ },
+ "sql": {
+ "mysql": {
+ "create_query": "create table processed_sampletab.csv ( year_id int not null, make_id varchar(255) not null, model_id varchar(255) not null, description_id varchar(255), price_id float not null);",
+ "import_query": "COPY processed_sampletab.csv FROM '/tmp/processed_sampleTab.csv' HEADER DELIMITER ',' CSV NULL AS 'NULL';"
+ }
+ }
+}
- CsvImportAnalyzer.process(filename)
-
```
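+
+Since the analysis comes back as json, you can pick individual pieces out of it in code. The sketch below is only an illustration: it assumes `process` returns a json string with the keys shown in the sample output above (if it already returns a parsed hash, drop the `JSON.parse` step).
+
+```ruby
+require 'json'
+
+metadata = CsvImportAnalyzer.process("test.csv", {:distinct => 2})
+
+# Assumption: metadata is a json string as described above;
+# if process returns a Hash instead, skip the parse.
+analysis = JSON.parse(metadata)
+
+# Per-column analysis, keyed by the normalized header names
+analysis["csv_headers"].each do |column, details|
+  puts "#{column}: #{details['datatype']} (nullable: #{details.fetch('nullable', false)})"
+end
+
+# Generated SQL statements for importing the processed file
+puts analysis["sql"]["mysql"]["create_query"]
+puts analysis["sql"]["mysql"]["import_query"]
+```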
## TODO:
<ul>
- <li> Handle control of processed input file to user </li>
- <li> Return the analysis as Json object.</li>
  <li> Better structuring of the analysis output to csv </li>
<li> Add support to convert and import xlsx files to csv </li>
+ <li> Handle control of processed input file to user </li>
</ul>
## Additional Information
### Dependencies
+
<ul><li><a href="https://github.com/tilo/smarter_csv">smarter_csv</a> - For processing the csv in chunks</li></ul>
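+
+For context, the chunked reading that smarter_csv provides looks roughly like the sketch below. This is just an illustration of the dependency (with an arbitrary chunk_size), not this gem's internal code.
+
+```ruby
+require 'smarter_csv'
+
+# Read the file in chunks of 200 rows; each chunk is an array of
+# row hashes keyed by the normalized header names.
+SmarterCSV.process("test.csv", chunk_size: 200) do |chunk|
+  chunk.each { |row| puts row.inspect }
+end
+```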
## Contributing
1. Fork it ( https://github.com/avinash-vllbh/csv-import-analyzer/fork )
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request
+