| 05:14:25 | milovanderlinden: | Paul Ramsey pointed to a nice presentation about mapserver ( the presentation with the oxes? ) anyone knows a url to that? |
| 05:24:24 | mloskot: | milovanderlinden: http://s3.cleverelephant.ca/geoweb-mapserver.pdf |
| 05:37:00 | milovanderlinden: | mloskot: Great! Thanks for the link! |
| 07:06:09 | CIA-6: | pramsey * r2913 /trunk/ ( 6 files in 2 dirs ): Spelling changes from Mark Kolybabi. |
| 11:03:06 | iggy: | so, I've got a rather large database of addresses with lat/lon and I've added a geom column with POINTs for the lat lon, I've got indexes on the zip column and on the geom column, but querying for a Buffer around a point still takes nearly 30 minutes |
| 11:04:49 | iggy: | I need it to be a little faster |
| 11:04:55 | iggy: | any ideas? |
| 11:33:47 | jlivni: | iggy: paste your sql query |
| 11:34:09 | jlivni: | hint: if you haven't indexed on the buffer, then it's not using the index |
| 11:36:33 | iggy: | select count( * ) from addresses where zip = '77006' and addr_point && ST_Buffer( ST_SetSRID( ST_MakePoint( 29.742028, -095.399748 ), 32661 ), .0014482259 ); |
| 11:36:51 | iggy: | the buffer is what I'm searching for, how would I index that? |
| 11:38:05 | jlivni: | wow thats a tiny buffer |
| 11:38:51 | jlivni: | epsg:32661 is in meters, no? your buffer untis should be meters too. thats aside the point of qry speed tho |
| 11:39:20 | iggy: | yes, I'm pretty sure I'm using meters |
| 11:39:35 | jlivni: | try: select count( * ) from addresses where addr_point && st_expand( ST_SetSRID( ST_MakePoint( 29.742028, -095.399748 ), 32661 ), .0014482259 ) |
| 11:39:43 | jlivni: | you're buffering by .0014 meters |
| 11:39:43 | iggy: | that should be about .01 miles converted to meters |
| 11:39:46 | jlivni: | which is not a big space |
| 11:39:55 | iggy: | oh |
| 11:39:57 | iggy: | huh |
| 11:40:05 | iggy: | can't be |
| 11:40:14 | jlivni: | select st_area( ST_MakePoint( 29.742028, -095.399748 ), 32661 ), .0014482259 ) |
| 11:40:27 | jlivni: | that should be results in sq. meters |
| 11:40:30 | iggy: | that query returns 144 addresses |
| 11:41:07 | jlivni: | well, you haven't asked to only find addresses within the buffer - just those that match the bbox ( rtree index ) of your buffered pt |
| 11:41:48 | iggy: | I think my head will explode soon |
| 11:41:56 | jlivni: | you probably want: select * from addresses where st_within( addr_point, somebuffer ) and addr_point && somebuffer |
| 11:42:29 | jlivni: | oh wait. i am sorry - i am lying to you a little here, i forgot the st_stuff negates the need for the whole && business. please ignore my above - i am just waking up |
| 11:42:32 | jlivni: | all you need is st_within |
| 11:42:38 | jlivni: | it will do the index automatically if you ahve an index |
| 11:43:10 | jlivni: | so try just select * from addresses where st_within( addr_point,yourbuffer ) |
| 11:43:29 | jlivni: | where yourbuffer is the whole st_buffer( st_setsrid.... ) ) |
| 11:43:58 | jlivni: | and of course we are assuming you have a GiST index ( not btree ) on yuor addr_point geometry field |
| 11:44:05 | jlivni: | and that you ran vacuum analyze |
| 11:44:31 | jlivni: | it should be very quick |
| 11:45:13 | jlivni: | if it is slow, please try replace the st_buffer with st_expand and see if it is a lot quicker |
| 11:46:44 | iggy: | I'll run a vacuum analyze real quick ( well, relatively speaking, 200 mil addresses doesn't do anything quick I think ), I've run it a couple of times, but I don't remember if I did it after my last changes to the table |
| 11:47:01 | iggy: | then I'll try your other suggestions and let you know |
| 11:47:53 | jlivni: | as long as your last changes weren't adding the rtree index, or adding a relatively huge amt of new pts, you should not need to rerun |
| 11:48:59 | iggy: | the last changes were adding the regular index on the zip column and the gist index on the addr_point column |
| 11:49:41 | jlivni: | yeh, you want to run vacuum anaylze after adding the gist index |
| 11:50:16 | jlivni: | given your # of points, that might be the prob |
| 11:50:53 | jlivni: | back to the buffer, note that select st_area( st_buffer( st_setsrid( ST_MakePoint( 29.742028, -095.399748 ), 32661 ), .0014482259 ) ) is a very small area ( result in meters ) |
| 11:51:13 | iggy: | for some reason I was thinking it was km |
| 11:51:41 | jlivni: | no - results are always in units of the projection |
| 11:51:43 | iggy: | since it was returning insane numbers when I tried making it meters |
| 11:52:04 | jlivni: | i am assuming your points are also in the same projection, not in lat/lng? |
| 11:52:16 | iggy: | they are lat/lon |
| 11:52:16 | jlivni: | or, rather, not in geographic coords |
| 11:52:23 | jlivni: | then you won't get a match... |
| 11:52:43 | jlivni: | your buffer also needs to be in the same projection |
| 11:53:08 | iggy: | I don't get it |
| 11:53:27 | jlivni: | well, you are trying to compare location in projection A with location in projection B. that won't work |
| 11:53:34 | jlivni: | you need to transform one of them to the others projection |
| 11:54:20 | iggy: | I don't follow |
| 11:54:51 | iggy: | how can you convert a radius around an address into lat/lon coordinates |
| 11:55:20 | iggy: | or I guess I thought that's what st_buffer did |
| 11:55:57 | iggy: | and how could you have points designated by anything other than coordinates? |
| 11:56:06 | jlivni: | st_buffer just gives you the buffer of the coords you are in ... wait a sec. i am cofusing you again for no reason |
| 11:56:18 | jlivni: | why are you using st_setsrid in there |
| 11:56:27 | jlivni: | let me see your orig query |
| 11:56:33 | iggy: | becase it complained that my srid's were different |
| 11:56:38 | jlivni: | yeah |
| 11:56:39 | jlivni: | they are |
| 11:56:43 | jlivni: | your points are probably in srid 4326 |
| 11:56:55 | iggy: | select count( * ) from addresses where zip = '77006' and addr_point && ST_Buffer( ST_MakePoint( 29.742028, -095.399748 ), .0014482259 ); |
| 11:57:17 | jlivni: | so you need to setsrid to 4326, not to 32661 or whatever it was |
| 11:57:21 | iggy: | update addresses set addr_point = GeomFromText( 'POINT( lat lon )', 32661 ); |
| 11:57:46 | jlivni: | where are you getting 32661 from? |
| 11:58:08 | iggy: | when I was testing that was the only thing I could get to work |
| 11:58:31 | jlivni: | ok .. but 32661 refers to a projection that is in meters -- how did you decide to use 32661 instead of say, 4326? |
| 11:59:08 | iggy: | because out of the 14 or so that I tried, that was the first one that worked |
| 11:59:27 | iggy: | 4326 didn't work |
| 11:59:29 | jlivni: | ah .. guess and check method. 32661 isnt well defined for most of the earth http://spatialreference.org/ref/epsg/32661/ |
| 11:59:30 | sigq: | Title: WGS 84 / UPS North: EPSG Projection -- Spatial Reference ( at spatialreference.org ) |
| 11:59:49 | jlivni: | whats your error with 4326 |
| 12:00:26 | iggy: | I'm guessing with what you told me the 4326 wasn't working because they were different projections or something |
| 12:00:38 | jlivni: | well anyway, for this particular case of checking within a buffer, using the totally wrong projection MAY not be an issue. i would recommend getting your points in there correctly, and then running the right qry |
| 12:00:50 | jlivni: | yes - that is why my stuff was not working, i was assuming, incorrectly, your points were in the database as 4326 |
| 12:01:11 | jlivni: | which they should be, assuming again of course that is your source data projection ( a reasonable guess ) |
| 12:01:24 | iggy: | they are US addresses |
| 12:01:31 | iggy: | if that helps |
| 12:01:49 | jlivni: | well, that doesn't mean anything regarding the projection of the lat/lng, but again i'm making a reasonable guess. |
| 12:01:56 | jlivni: | certainly they're not in 32661 |
| 12:02:15 | jlivni: | you should be able to add them doing GeomFromText( 'POINT( lat lon )', 4326 ) |
| 12:02:53 | iggy: | but I can just change the srid in geometry_columns and yeah, just update the points the same way I did the first time |
| 12:03:24 | jlivni: | yes - you should do that |
| 12:03:39 | jlivni: | then you can run the qry using 4326 |
| 12:03:51 | iggy: | should I drop the gist index first? |
| 12:03:54 | jlivni: | this is probably unrelated to your speed issue, but given you have a problem it is best to get things set up right from the start |
| 12:04:03 | iggy: | agreed |
| 12:04:05 | jlivni: | yes, you will need to recreate it anyway, since all your geoms will have changed |
| 12:05:10 | jlivni: | i dont know if geomfromtext( .. ) is faster or slower than setsrid( st_makepoint( .. ) ) but if it takes a long time to update your addresses you could test a little and see if one saves much time over the other |
| 12:05:50 | jlivni: | anyway. after updating, create the GiST index, and then run vacuum analyze, and then you should have quick results |
| 12:06:26 | iggy: | yeah, it took nearly 2 hours last time to update the addr_point column |
| 12:06:48 | iggy: | I've still got my test database sitting around that is only 50k or so records that I can test on |
| 12:06:53 | iggy: | it'll be a bit faster |
| 12:08:47 | jlivni: | i've gotten subsecond results with ~5M pts ... so 30 minutes is just WAY off and i think it wasn't using your indexes right for some reason. definitely check back here if you have troubles after getting your srids in order |
| 12:09:20 | jlivni: | remember if you need it within your actual buffer, use st_within |
| 12:10:47 | jlivni: | as an aside, i find that "st_distance( yourpts,buffer ) = 0" is also slightly faster than "st_within( yourpts, buffer )", but not to a huge degree |
| 12:19:58 | iggy: | I just gave it about 8+ hours of work to do, so you probably won't hear back from me till tomorrow |
| 15:20:55 | iggy: | jlivni: on my test data, I went through and made all those changes we talked about earlier, but I still have to set the radius of st_expand/st_buffer to that very small number |
| 15:22:21 | iggy: | and I remember where that number came from, it's $miles / 69.05 |
| 15:23:55 | iggy: | so it's supposed to represent .1 miles in degrees |
| 15:48:54 | jlivni: | yes - thats true now because you are using epsg:4326 which is in decimal degrees. previously you said ( unintentially ) your data was in units of meters |
| 15:49:21 | jlivni: | note that distance in decimal degrees varies depending where you are, so it's not at all exact |
| 15:49:45 | jlivni: | maybe +/- 30% difference north vs south of US ... something like that |
| 15:50:09 | jlivni: | so if you need it exact, transform to something other than decimal degrees, but anyway focus on that after getting everything fast |
| 15:51:21 | iggy: | I'm really starting to admire the people that do this for a living and know what they are doing |
| 15:51:37 | iggy: | like my brother does this stuff all day long for some oil company |
| 15:52:45 | iggy: | in any case, I was having to use that small number before because I thought it was in degrees too |
| 15:53:32 | iggy: | I get the same results with both srids |
| 15:55:25 | iggy: | I'm also wondering if there are memory settings I can change to help out |
| 15:55:53 | iggy: | the biggest postgres process is only using ~40M of memory |
| 15:56:12 | iggy: | I haven't found much info about tuning the memory settings in postgres |
| 15:56:28 | tazle: | iggy: what are you doing? |
| 15:57:00 | iggy: | pretending to be a dba with a geographical specialty |
| 15:57:36 | iggy: | I have a database with 200 million addresses in it that I need to query to find addrs within a radius of other addrs |
| 15:58:31 | iggy: | my first try took over 30mins to run one query |
| 15:58:45 | iggy: | which isn't really going to work for what we are trying to do |
| 15:59:57 | tazle: | did you already get the quer to use the index? |
| 16:00:09 | iggy: | that's what I'm waiting on right now |
| 16:00:15 | tazle: | "waiting"? |
| 16:00:21 | iggy: | we think it wasn't before |
| 16:00:38 | tazle: | use explain to see the query plan - there's no waiting to be done |
| 16:00:39 | iggy: | well, it takes a little while to index 200 mil records |
| 16:00:53 | tazle: | oh, you're reindexing |
| 16:00:59 | iggy: | oh, I'm redoing the index because I had messed up something else |
| 16:01:22 | tazle: | ah right, the SRID error |
| 16:02:41 | iggy: | yeah, so I dropped the geom column, recreated it, refilled it using the lat and lon columns, now it is indexing, then it has to do a vacuum analyze |
| 16:02:51 | iggy: | then I'll know whether it helps |
| 16:03:20 | iggy: | this database is pretty large |
| 16:05:11 | iggy: | it was sitting at 68G now it says it's 204G... I have the feeling that I'm going to have to reload all the data from scratch |
| 16:11:28 | darkblue_B: | hmm thats bigger than OpenStreetMap |
| 16:12:40 | darkblue_B: | sensor data.. or cell phones or somesuch, I'd guess |
| 16:13:45 | tazle: | well, if it's one row per address, it sounds reasonable |
| 16:14:01 | tazle: | however, that's a silly way to store addresses :P |
| 16:14:53 | tazle: | well, a silly way to store just addresses anyway - if there's some other data involved it's probably not so silly |
| 16:15:35 | iggy: | there's other data |
| 16:15:46 | iggy: | demographics, names, phone numbers, etc |
| 16:20:13 | iggy: | tazle: what would be a better way to store just addresses? |
| 16:21:14 | tazle: | iggy: normally addresses are stored by storing the first and last address numbers on each side of a road segment |
| 16:21:55 | darkblue_B: | like TIGER |
| 16:23:36 | tazle: | iggy: however, that is really useful just for converting an address text into a point, or a point into nearest address - I'm not sure if you're trying to do either |
| 16:23:47 | iggy: | interesting |
| 16:25:09 | iggy: | we have users that go on to our site, they type in an address, we look that up and get the lat/lon then find either the closest N addresses or we find the number within a certain mile radius |
| 16:27:17 | tazle: | iggy: but that is of addresses that you have some data for? |
| 16:27:33 | tazle: | iggy: ie. not just any valid addresses |
| 16:28:46 | iggy: | yes |
| 16:29:23 | iggy: | it's very common that they will limit the results based off of demographics that we have |
| 16:30:44 | iggy: | i.e. the 20 closest address to 123 anywhere street that the head of household makes more than 150k/yr and haven't moved within the last 3 years |
| 16:31:07 | iggy: | it's a marketing tool for realtors |
| 16:31:55 | iggy: | those goofy little cards you get in the mail trying to sell houses or letting you know that such and such realtor just sold a house in your neighborhood for the high price of X |
| 16:32:40 | iggy: | and yes, I know I'm going to hell for helping them fill people's mail boxes with junk mail |
| 16:48:28 | darkblue_B: | well.. targeted marketing mailers is annoying, but more efficeinet than blankets of *un*targeted marketing mailers :- ) |
| 16:49:18 | iggy: | all the docs I can find consider 4G of ram to be huge... I think they are a little outdated |
| 16:49:18 | darkblue_B: | I'm interested in writing things like you are, but for sustainability/efficiency/carbon pfftprint reasons |
| 16:49:41 | darkblue_B: | RAM is moving along |
| 16:49:57 | darkblue_B: | there has been talk of caharcterizing your server usage.. for example |
| 16:50:13 | darkblue_B: | you can pump up the RAM for a search, if its just you searching |
| 16:50:30 | darkblue_B: | but probably a bad idea if you get a few dozen searches going at once, ever |
| 16:50:33 | iggy: | unfortunately, this data ended up being a lot bigger than I thought, I was expecting ~30G of data, so the server has 48G |
| 16:50:48 | iggy: | the data ended up being 70G, so it won't all fit in linux's fs cache |
| 16:50:50 | darkblue_B: | 'cause you wil cause the server to begin disk thrashing |
| 16:51:15 | darkblue_B: | you are in the 'data warehousing' size of DBs there I'd say |
| 16:51:47 | darkblue_B: | so.. at OSCon I went to one of the Pg data warehousng talks |
| 16:52:10 | darkblue_B: | they said that for their system, they built a queue, so that no more than one query can execute, ever |
| 16:52:24 | iggy: | we have a seperate server that does all our transactional stuff ( i.e. people buying this data that we lookup, etc. ) |
| 16:52:25 | darkblue_B: | based on that, they can tune their memory footprint |
| 16:52:37 | iggy: | I considered that |
| 16:53:14 | darkblue_B: | it seems worthwhile for your situation |
| 16:53:53 | darkblue_B: | there is a Pg day on Tuesday at Linux World SF |
| 16:54:02 | darkblue_B: | if you are in the California Bay Are |
| 16:54:05 | darkblue_B: | Area |
| 16:54:35 | darkblue_B: | Pg is holding its own against MySQL for just this sort of purpose |
| 16:54:41 | iggy: | in texas unfortunately |
| 16:54:43 | darkblue_B: | .. by any objective measure |
| 16:55:08 | darkblue_B: | ah so.. well IRC it is then |
| 16:55:15 | iggy: | from what I saw when doing my research pg was doing better than mysql for this type of stuff |
| 16:55:21 | darkblue_B: | yes |
| 16:55:26 | iggy: | that's why I went with it |
| 16:55:34 | darkblue_B: | confirmed by industry gossip ;- ) |
| 16:55:52 | iggy: | our other db is actually mysql because that's what everybody was familiar with |
| 16:56:17 | darkblue_B: | super sized servers... strict data constraints.. fancy DB language tricks.. and &^&*( ^* huge datasets |
| 16:56:38 | darkblue_B: | MySQL is catching up on a lot of other fronts.. |
| 16:57:09 | darkblue_B: | ..and of course, PostGIS.. which is neat-o p ) |
| 16:57:38 | iggy: | seemed to be more mature than mysql's spatial stuff ( although I'm not sure which is actually older ) |
| 16:57:49 | darkblue_B: | indeed it is |
| 16:58:12 | darkblue_B: | I dont gather that spatal is a big target for MySQL now.. I saw no menton of it in 6.x stuff |
| 16:58:28 | darkblue_B: | PostGIS works rather well |
| 16:59:23 | darkblue_B: | iggy: you can see some things I have written here htt://24.7.79.159/portfolioMisc/ |
| 16:59:30 | darkblue_B: | http |
| 16:59:52 | darkblue_B: | also the video "Dynamic KML" filed at Google |
| 16:59:56 | darkblue_B: | filmed |
| 17:03:02 | iggy: | darkblue_B: nice, lots of stuff there that I'll be reading for the next week or so |
| 17:03:24 | iggy: | should print some of it out for my flight Wed. |
| 17:03:38 | tazle: | I'm not sure if fitting all the data into memory is that big of an issue, as long as you can fit all the necessary indexes in memory |
| 17:04:19 | darkblue_B: | good point tazle |
| 17:05:20 | iggy: | which it may not considering my dataset is ~70G and the tablespace dir with indexes is ~208G so far |
| 17:13:14 | tazle: | did you already vacuum the old index away? |
| 17:13:51 | iggy: | nein, it's indexing now then it will vacuum analyze |
| 17:16:48 | tazle: | well, given some characteristics of the tree used for indexing, and the number of objects, the space taken by the index would fairly easy to estimate |
| 17:17:35 | iggy: | I'm patient enough to wait |
| 17:17:46 | iggy: | any more knowledge and my head might explode |
| 18:14:26 | jlivni: | iggy: your db size will go down after vacuum ( actually a vacuum full will definitely compact it .. i think vacuum analyze does too but not sure offhand ) |
| 18:14:57 | jlivni: | once indexing is done, try your query. it should be fast. if its not, repost your sql and someone will be able to help i am sure |
| 18:15:40 | jlivni: | iggy: you can read the postgis tuning guide, an old copy of which is here http://postgis.refractions.net/pipermail/postgis-users/2006-March/011539.html for hints on upping mem for your use case |
| 18:15:42 | sigq: | Title: [postgis-users] Problem with Intersection ( at postgis.refractions.net ) |
| 18:16:59 | iggy: | jlivni: thanks for that url, looks very helpful |
| 18:17:36 | iggy: | I'm sure you'll hear from me every once in a while now that I've found out this place exists |
| 19:10:06 | bitnerd: | jlivni, iggy vacuum and vacuum analyze just mark areas to be reused -- it takes a vacuum full to actually compact the db |
| 19:10:17 | iggy: | rgr |
| 19:12:20 | bitnerd: | http://www.postgresql.org/docs/8.3/static/routine-vacuuming.html#VACUUM-FOR-SPACE-RECOVERY |
| 19:12:22 | sigq: | Title: PostgreSQL: Documentation: Manuals: PostgreSQL 8.3: Routine Vacuuming ( at www.postgresql.org ) |
| 19:14:42 | iggy: | I think I'm going to try clustering based on the zip code too |
| 19:16:47 | bitnerd: | clustering should do the same as vacuum full too |
| 19:17:11 | iggy: | that's what the documentation seemed to indicate |
| 19:21:00 | jlivni: | bitnerd: ah right, thx for clarifying that. |
| 21:31:07 | darkblue_B: | TheSteveMonkey: hey |