GSoC Blog 1

Jun 7 — Jun 17

Kush Kothari
Jun 21, 2021


What was done in the community bonding period?

During the community bonding period, I set up my coding environment (for retrieverdash and retriever-recipes) and the virtual environments required for it. I also set up a testing environment using PostgreSQL, pgAdmin4 and QGIS.

As a proof of work for my project, I wrote a few scripts that used the retriever library along with related libraries such as geopandas. These gave a rough idea of how the final dashboard script would work in the Django project.

The following scripts were prepared:

  • Download, extract and sort the data of various datasets into appropriate folders.
  • An install function that directly uses the PostgreSQL engine from retriever.
  • Use convert_to_csv from both retriever and geopandas to save the data as CSV files so that they can be used to create diffs as required.
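To give an idea of what creating a diff from the saved CSV files can look like, here is a minimal sketch using Python's standard difflib module. It assumes a plain line-based diff between two versions of the same file; the function name csv_diff and the sample rows are made up for illustration and are not taken from the dashboard code.

```python
import difflib


def csv_diff(old_lines, new_lines, name="dataset.csv"):
    """Return a unified diff between two versions of a CSV file as a list of lines.

    Hypothetical helper for illustration; the dashboard may build its diffs differently.
    """
    return list(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=f"old/{name}",
            tofile=f"new/{name}",
            lineterm="",  # input lines carry no trailing newline
        )
    )


# Two versions of a tiny, made-up CSV file.
old = ["id,name", "1,oak", "2,maple"]
new = ["id,name", "1,oak", "2,birch"]

diff = csv_diff(old, new)
print("\n".join(diff))
```

An empty result means the two versions are identical, which is a cheap way to tell whether a dataset changed between runs.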

During this period, I also wrote a dataset script for retriever-recipes and opened a pull request here that resolved the issue here. I did this after testing the commands for the new dataset, as shown in the screenshots in the pull request.

On June 7

June 7 was the date of the first meeting with my mentor. After getting in touch with him and the other students I was working with, I started testing the existing PR. This PR introduces the PostgreSQL engine into the retriever dashboard for spatial datasets involving both raster and vector data. As a result, it adds more imports and (at that time) one more function to the existing dashboard script. Apart from the usual IGNORE LIST, I also added a developers’ IGNORE LIST, which makes the dashboard script runnable in a local development environment such as a PC or a laptop. For efficient testing, I also added a CHECK DATABASES list so that we could easily check the spatial datasets newly added to the dashboard.
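The way these three lists could work together can be sketched roughly as follows. This is only an illustration: the variable names, the example entries and the exact filtering rules are my assumptions here, not the actual dashboard code.

```python
# Hypothetical names and contents, for illustration only.
IGNORE_LIST = ["dataset-a"]                      # datasets the dashboard always skips
DEV_IGNORE_LIST = ["dataset-b"]                  # extra datasets skipped on a local machine
CHECK_DATABASES = ["harvard-forest", "bioclim"]  # if non-empty, only these are checked


def select_datasets(all_datasets, local_dev=False):
    """Return the datasets the dashboard script should process, in input order."""
    ignored = set(IGNORE_LIST)
    if local_dev:
        # Assumption: the developers' list only applies when running locally.
        ignored |= set(DEV_IGNORE_LIST)
    candidates = [d for d in all_datasets if d not in ignored]
    if CHECK_DATABASES:
        # Assumption: a non-empty check list narrows the run to those datasets.
        candidates = [d for d in candidates if d in CHECK_DATABASES]
    return candidates


available = ["dataset-a", "dataset-b", "harvard-forest", "bioclim", "ecoregions-us"]
print(select_datasets(available, local_dev=True))
```

Keeping the check list separate from the ignore lists means it can be emptied again after testing without touching the datasets that are skipped for other reasons.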

By June 10

By June 10, I had completed the basic script needed to give the dashboard the ability to install spatial datasets. I also began testing datasets inside the dashboard (using the CHECK DATABASES list mentioned before). I started with harvard-forest (a vector dataset) and bioclim (a raster dataset). During initial testing, there was an issue: even though the script seemed to run perfectly, no tables showed up in pgAdmin or QGIS. The data was being installed but, for some reason, was not being shown.

The answer lay in the postgres engine options. The original script passed the table name in the format {db}_{table} so that the diff could be saved under a similar name. However, the script requires the tables to be stored in the format {db}.{table}, so I reverted to that format, which solved the problem.

There was another problem regarding the csv file limit, which we discussed in the meeting on June 10.
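The difference between the two formats is easy to see once the placeholders are filled in. The snippet below is a small sketch using plain string formatting; the helper name and the example dataset and table names are hypothetical. In PostgreSQL, a dotted name such as a.b is read as schema a and table b, whereas a_b is a single table name in the default schema, so the two templates place the table in different locations.

```python
def format_table_name(template, db, table):
    """Fill a table-name template that uses the {db} and {table} placeholders."""
    return template.format(db=db, table=table)


# Hypothetical dataset and table names, for illustration only.
underscore_name = format_table_name("{db}_{table}", "harvard_forest", "soils")
dotted_name = format_table_name("{db}.{table}", "harvard_forest", "soils")

print(underscore_name)  # one flat table name
print(dotted_name)      # schema-qualified: schema "harvard_forest", table "soils"

# Splitting on the first dot recovers the schema and table parts.
schema, _, table = dotted_name.partition(".")
```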

By June 14

By June 14, I had tested a variety of spatial datasets, including ones newly introduced to the retriever-recipes repository.

The following datasets had been tested by now.

  • ‘harvard-forest’
  • ‘fire-occurrence-firestat-yearly’
  • ‘mtbs-burn-area-boundary’
  • ‘mtbs-fire-occurrence’
  • ‘ecoregions-us’
  • ‘national-usfs-finalfire-perimeter’
  • ‘usa-activity-silvreforestation’
  • ‘bioclim’
  • ‘activity-silviculture-timber-stand-improvement’
  • ‘activity-timberharvest’
  • ‘activity-range-vegetation-improvement’

The following two datasets gave errors while installing:

  • ‘npn’
  • ‘usgs-elevation’

This was discussed in the meeting on June 14, and two issues (#1595 and #1596) were opened based on the observations made during the meeting.

By June 17

To solve the csv_extend_size error, I opened this PR, which was then merged. After reinstalling the retriever library in the dashboard (with the PR now in effect), the ‘mtbs-burn-area-boundary’ dataset worked correctly.

However, on further inspection, the datasets were not being converted to csv. This was because, in an earlier effort to track down the csv_extend_size error, I had commented out a piece of code and forgotten to uncomment it. After uncommenting it and rerunning the script, I discovered that multiple datasets had the same csv_extend_size error, and I thought that multiple PRs would have to be sent to fix them.

After discussing this with my mentor on June 17, I understood that the error was caused by an internal function that users would not call directly. Hence, there was no need to change the dataset scripts.

This problem was circumvented by adding a few lines of code to the dashboard that automatically increase the csv field size limit.
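A common way to do this in Python is to raise the limit of the standard csv module with csv.field_size_limit. The following is a minimal sketch of that idea and not necessarily the exact lines added to the dashboard; the helper name raise_csv_field_limit is my own.

```python
import csv
import io
import sys


def raise_csv_field_limit():
    """Raise the csv module's field size limit as far as the platform allows.

    On some platforms sys.maxsize does not fit into a C long, so the value is
    reduced until csv.field_size_limit accepts it.
    """
    limit = sys.maxsize
    while True:
        try:
            csv.field_size_limit(limit)
            return limit
        except OverflowError:
            limit //= 10


new_limit = raise_csv_field_limit()

# A single field longer than the default limit (131072 characters) can now be read.
big_field = "x" * 200_000
rows = list(csv.reader(io.StringIO(f"id,geometry\n1,{big_field}\n")))
print(len(rows[1][1]))
```

Large fields like this are typical for spatial data, where a whole geometry may be stored as text in a single column.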

This has been my work so far in the first 2 weeks of GSoC.