TIGER

I haven't been posting often.

Currently I'm trying to interface TIGER/Line files with GPS data to translate lat/long coordinates into street data, and consequently census blocks.

I may have to use this code later.

Posted by provolot on July 9, 2007 3:18 PM | | Comments (0) | TrackBacks (0)


Boundaries

Okay, slowly getting there.

There are 'cartographic boundary files' downloadable here. This is the technical documentation.

Incredible. Seems to be a giant collection of nodes, later creating boundaries. I'm not sure how to process this..

Posted by provolot on July 9, 2007 5:05 PM | | Comments (0) | TrackBacks (0)


spatial database extensions

I installed mysql on osx yesterday, and finished setting root passwords for all hostnames. Now i need to utilize the spatial extensions and correlate them with the census block shapes (from the TIGER/Line data).

Next to do:
- Load data into mysql.
- Figure out how to do queries -- x contains y?
- Write Perl or Processing script to process queries. This Processing-MySql library may work.

Posted by provolot on July 13, 2007 2:59 PM | | Comments (0) | TrackBacks (0)


mysql, perl packages, fink, shp2mysql

To import shp files into a MySQL database, someone made a perl script called shp2mysql. Unfortunately, when I ran this, it told me that my computer required several different modules. After running around, I decided to install Fink, a sort of apt-get thing for OSX that enables you to use Unix software on OSX. I then installed a few Perl libs, but realized that Fink didn't have all of them available.

I then tried to use CPAN, which I suppose is like Fink/apt-get, but just for Perl libraries. CPAN's ftp wouldn't work however, which may be a combination between using active vs. passive ftp. To fix this, I set

&rt; export FTP_PASSIVE=1
in the bash shell of a NON-ROOT user, then did
sudo perl -MCPAN -e 'install Bundle::DBD'
It still doesn't work, however. Here are the error messages:

Can't load '/Library/Perl/5.8.6/darwin-thread-multi- 2level/auto/DBD/mysql/mysql.bundle' for module DBD::mysql: dlopen(/Library/Perl/5.8.6/darwin-thread-multi- 2level/auto/DBD/mysql/mysql.bundle, 1): Library not loaded: /usr/local/mysql/lib/mysql/libmysqlclient.15.dylib
  Referenced from: /Library/Perl/5.8.6/darwin-thread-multi-  2level/auto/DBD/mysql/mysql.bundle
  Reason: image not found at /System/Library/ Perl/5.8.6/darwin-thread-multi- 2level/DynaLoader.pm line 230.
 at -e line 1
Compilation failed in require at -e line 1.
BEGIN failed--compilation aborted at -e line 1.

Gah.


update: I fixed this eventually; check the solution process here: DBD::mysql fixed!, here: MySQL Spatial abilities = bad, here: YEAH! and NO!, here: PostgreSQL & PostGIS! , here: PostGIS continued, but..., and finally fixed it here: success.

Posted by provolot on July 16, 2007 3:53 AM | | Comments (0) | TrackBacks (0)


DBD::mysql fixed!

Finally fixed the whole DBD::mysql shenaningans by creating root-owned simlinks from /usr/local/mysql/lib/mysql (where the system was looking) to /usr/local/mysql/lib (where it was) or so.

Also, the script wouldn't work, giving me 'cannot connect' errors, until I went and used the GRANT command to give the user permission. Now it works!


update: I fixed this eventually; check the solution process here: MySQL Spatial abilities = bad, here: YEAH! and NO!, here: PostgreSQL & PostGIS! , here: PostGIS continued, but..., and finally fixed it here: success.

Posted by provolot on July 16, 2007 9:42 PM | | Comments (0) | TrackBacks (0)


MySQL Spatial abilities = bad

Holy shit this stuff is so convoluted. Apparently MySQL doesn't have good spatial/GIS support, compared to PostGIS/PostgreSQL. On top of that, nobody seems to have some sort of documentation -- there are all these tools and scripts lying around that nobody knows how to use. There's a 140+ page pdf lying around explaining some standard of representation of sorts.

I'm sure it shouldn't be this difficult. I'll try to use a library called libmygis, and if that doesn't work out, then I'm just going to switch to PostGIS, dammit..




update:
I fixed this eventually; check the solution process here: YEAH! and NO!, here: PostgreSQL & PostGIS! , here: PostGIS continued, but..., and finally fixed it here: success.

Posted by provolot on July 18, 2007 11:26 PM | | Comments (0) | TrackBacks (0)


YEAH! and NO!

Awesome! and Oh No!

So I finally got it working. The libmygis library in the previous post had a tool called 'mysqlgisimport', that I used to import my shapefiles with. I had to tell it not to use a PRJ file (whatever that is) since I didn't have one, and outputted the results into a temporary file:

./mysqlgisimport --no-prj tr36_d00  -o foo

Notice that the shapefiles' extensions aren't written. After that, I took foo, and inputted it into mysql to run mysql in batch mode:
mysql -upb -p < foo

('pb' is my username, so -upb significates that the user is pb, and -p makes it ask for a password.)
And that worked! All the files went into the mysql database, into a table called tr36_d00! What worked this time vs. using the shp2mysql.pl tool is that a column holding geometry values was created. To test the creation of these geometries, I tried querying:
select AsText(tr36_d00.geo) FROM tr36_d00 limit 1;

and got entries like
POLYGON((-73.519309 45.006876,-73.452584 45.008767,-73.437371 45.009198,-73.411169 45.009655,-73.3883 45.010053,-73.367388 45.010417,-73.357332 45.010592,-73.352228 45.010681,-73.343124 45.01084,-73.350188 44.994304,-73.353355 44.990259,-73.353429 44.990165,-73.354633 44.987352,-73.354112 44.984062,-73.353716 44.982959,-73.352886 44.980644,-73.350218 44.976222,-73.345974 44.971764,-73.34474 44.970468,-73.339451 44.966433,-73.338734 44.965886,-73.338243 44.96475,-73.337906 44.960541,-73.339603 44.94337,-73.338482 44.924112,-73.338939 44.918194,-73.338979 44.917681,-73.339142 44.917447,-73.339534 44.916885,-73.396083 44.911148,-73.396627 44.913411,-73.450909 44.908153,-73.486053 44.904716,-73.507148 44.911338,-73.521172 44.911536,-73.520173 44.941891,-73.519375 44.97616,-73.519309 45.006876))

Awesome! So I tried testing to see if it could correlate between GPS data and census tract. Using a geocoder, I got the lat/long of 116th st and Broadway in NYC (40.8080, -73.963956). Using these examples as help, I constructed this query and ran it:


SELECT t.TRACT
FROM tr36_d00 t
WHERE Contains(t.geo, GeomFromText('Point(-73.963956 40.808502)')) = 1;

and with my tired eyes I saw in slowly-dawning-hope-growing-turning-into-joy:

+-------+
| TRACT |
+-------+
| 0315 |
| 0205 |
| 0203 |
+-------+

YAY HOORAY AWESOME OH BOY Oh boy oh oh....

...unfortunately, as you can see, there are three TRACT answers, which doesn't make sense, as the census tracts don't overlap. Alas, the function Contains, which checks to see if an object contains another objects, is not accurate. As this page of documentation says, it instead returns the value from the MBRContains function. And the MBRContains function doesn't actually check, it checks to see if the smallest bounding box of an object contains the smallest bounding box of another's.

Damn! Because Manhattan doesn't run exactly north-south, but its census tracts do, the bounding rectangles (which presumably have vertical and horizontal lines) are probably woefully inaccurate:
censusblockmbrexample.gif
Here are the minimal bounding rectangles; there's a third one you can't see because the census block is thin and tall, and so its MBR is too large. The black dot is the location of the lat/long coordinates on the map. Since the dot is within both MBRs, the function treats it as being contained, and that's why we get three census tracts back -- because the coordinates are within the MBRs of three census tracts: 315, 205, 203.

GAH! so what now? Some guy wrote a function to check for true 'contains' by getting the coordinates of the census tract, drawing a line from the desired point outwards, and counting how many times the line crosses the boundaries of the census tract? If the number of times is odd, its contained; if not, then it isn't. Described here and implemented in PHP(why?) here.

But no, that's sloppy. I don't want to do that! I think I'll end up installing PostGIS and using that instead, which unlike MySQL has excellent spatial/GIS support -- or so I hear.

All that time spent on MySQL! Wasted! AHHHH!




update:
I fixed this eventually; check the solution process here: PostgreSQL & PostGIS! , here: PostGIS continued, but..., and finally fixed it here: success.

Posted by provolot on July 19, 2007 1:14 AM | | Comments (0) | TrackBacks (0)


PostgreSQL & PostGIS!

I installed readline with fink, and then compiled PostgreSQL from source (as helpfully explained in this page.)
Compiling PostGIS from source..
http://postgis.refractions.net/docs/ch02.html
To Be Continued...




update:
I fixed this eventually; check the solution process here: PostGIS continued, but..., and finally fixed it here: success.

Posted by provolot on July 19, 2007 4:55 PM | | Comments (0) | TrackBacks (0)


PostGIS continued, but...

So things were going faster than expected. I installed PostGIS, then in the source directory fount a tool called shp2pgsql. Using the command

shp2pgsql -w -g geom tr36_d00 cshape >
tr36_d00.pgresimport

I turned the shapefiles into batch files that are executable by PostgreSQL. The -w turns the format into wkt format (well-known-text), which I just liked because it was readable. I had trouble with settings, and fudged around this area for a while.

I then created the standard OpenGIS tables as dictated in this page: the SPATIAL_REF_SYS and GEOMETRY_COLUMNS tables. It seems that they're mandatory. I then imported the file that I had previously created:

psql -Upostgres -d census_shapes -f tr36_d00.pgresimport

postgres is the user, census_shapes is the database, and tr36_d00.pgresimport is the import file. This worked, and all the entries started importing! Hurrah. After that, I tried querying:

 select AsText(geom) from cshape limit 1;
And I got an output that looked like:
MULTIPOLYGON(((-74.922368 44.960301,-74.917243 44.963743,-74.902403 44.947712,-74.901045 44.94551,-74.90028 44.944099,-74.898672 44.941034,-74.898415 44.940851,-74.898125 44.940097,-74.894169 44.935157,-74.893477 44.93416,-74.893223 44.933621,-74.89314 44.933076,-74.89314 44.932367,-74.890155 44.921848,-74.888541 44.916816,-74.887571 44.91426,-74.885153 44.911527,-74.88254 44.908058,-74.881842 44.898385,-74.886792 44.904059,-74.888457 44.905923,-74.892535 44.910287,-74.895901 44.913879,-74.89628 44.914283,-74.900616 44.918999,-74.90068 44.919068,-74.908067 44.926972,-74.908321 44.927241,-74.911828 44.930953,-74.912202 44.931349,-74.914032 44.933287,-74.919177 44.938664,-74.919808 44.939322,-74.920282 44.939845,-74.922369 44.942046,-74.924504 44.944292,-74.930209 44.950191,-74.931129 44.951419,-74.932078 44.95242,-74.934773 44.955263,-74.924425 44.959972,-74.922368 44.960301)))
Great! And now for the final test: the query that I ran earlier on MySQL and didn't work --
SELECT t.TRACT
FROM cshape t
WHERE Contains(t.geom, GeomFromText('Point(-73.963956 40.808502)')) = 1;
And I get:
ERROR:  contains:: operation not implemented - compile PostGIS with JTS or GEOS support

Uhh, what?

It turns out that I didn't have GEOS support compiled. GEOS (Geometry Engine Open Source) is a C/C++ port of the JTS (JTS Topological Suite), which handles topological calculations with polygons/lines/etc, spatial analysis, etc. Of course, this means that JTS and GEOS provide the capability to answer my crucial question: "Is point X inside shape Y"?

So -- I downloaded the newest version of GEOS, and am compiling it now. I believe (according to this page) that I need to recompile PostGIS by using the options:

LDFLAGS=-lstdc++
./configure --with-geos=PATH

I'm soo close.




update:
I finally fixed it here: success.

Posted by provolot on July 21, 2007 4:28 PM | | Comments (0) | TrackBacks (0)


success

I compiled geos -- standard './configure', 'make', 'make install'. I then compiled PostGIS again:

LDFLAGS=-lstdc++ ./configure --with-geos=/usr/local/bin/geos-config

which was pretty quick, surprisingly. 'make', 'make install', and then done! Bingo!

But the query wouldn't work, wouldn't return anything. After messing around for a while, I realized that TRACT was lowercase, and Contains should be ST_Contains. It still didn't work, until I realized that ST_Contains returns a boolean value, and tried comparing to 'TRUE' instead of '1':

SELECT t.tract FROM cshape t WHERE ST_Contains(t.geom, GeomFromText('Point( -73.965331 40.806129)')) = TRUE;

The address values are different this time, because the previous address was right on the boundary of two census tracts and I just wanted to be sure. I got:

 tract 
-------
 0199

Awesome! American FactFinder verifies this.

GPS->Census Tract info correlation success! Now for the data/information stuff.

Posted by provolot on July 21, 2007 7:38 PM | | Comments (0) | TrackBacks (0)