Monday, July 28, 2014

Denormalization using the Google Pipeline API

I've been using Google's MapReduce for App Engine for some time. It uses the Pipeline API to connect its map, shuffle, and reduce phases, and that gave me ideas.

I grabbed that Pipeline API and implemented a denormalizing pipeline. It would receive data from each table of a relational database and denormalize it into App Engine's Datastore (non-relational). The pipeline's job was to wait for the missing tables so it could perform the joins and complete the denormalization. It worked, but Datastore writes soared, quickly pushing my app past its daily budget.
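The join step itself can be sketched without any Pipeline machinery. This is a minimal in-memory sketch, not the actual pipeline code; the `orders` and `customers` tables and their fields are hypothetical:

```python
def denormalize(orders, customers):
    """Join normalized rows into self-contained documents, the
    shape a non-relational store like Datastore works best with."""
    # Index the customers table by primary key for O(1) lookups.
    by_id = {c["id"]: c for c in customers}
    denormalized = []
    for order in orders:
        doc = dict(order)
        # Embed the full customer record inside the order entity,
        # so reads never need a join.
        doc["customer"] = by_id[order["customer_id"]]
        denormalized.append(doc)
    return denormalized

orders = [{"id": 1, "customer_id": 10, "total": 99}]
customers = [{"id": 10, "name": "Alice"}]
print(denormalize(orders, customers))
```

The pipeline's only added value over this was waiting until both tables had arrived before running the join.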

I decided to run some tests. Yes, I should've run them before coding the whole thing.



The first call to the run method writes 32 times to the Datastore. Summing up all the writes, we have a total of 104. Each call to the run method that has a child writes around 30 times to the Datastore, and the last one, without a child, writes just 8 times.



Now the writes get serious: 108 writes on the generator pipeline, and 8 on each of the other calls to run without a child. That's 162 on the rest, summing it all up to 270. Ouch.

Checking the RPCs on that pricey call, here's what I see:


98 writes in a single put. What's in there? I clicked on evaluate and found a db.put(entities_to_put), and in entities_to_put there are _SlotRecord entities, _PipelineRecord entities, and _BarrierRecord entities: the whole pipeline pack.

This is what let loose the writing frenzy. I had set up a generator pipeline that would start other generators, and they would all feed data back up the pipeline.

Well, the Pipeline API does an amazing job for MapReduce, but it's clearly not useful at all for what I had in mind here. My plan ended up being overkill. I was a little misled by the really simple samples in the getting-started guide. They just show how simple it is to set one up, and as you can see in the gists above, they truly are simple. Denormalizing at the source is the way to go in this case: data comes in denormalized, Datastore saves it, done.

Friday, June 13, 2014

BigQuery CSV Export getting cached?

The only way around this is randomizing the CSV filename. You'll have to do some housecleaning once in a while to get rid of the flood of files.
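The randomizing trick is just a matter of never reusing an object name. A minimal sketch, where the bucket name and prefix are placeholders and the resulting URI is what you'd pass as the extract job's destination:

```python
import uuid

def randomized_export_uri(bucket, prefix):
    """Build a unique GCS destination for each BigQuery CSV export,
    so a cached copy of a fixed filename is never served."""
    return "gs://%s/%s-%s.csv" % (bucket, prefix, uuid.uuid4().hex)

# Each call yields a fresh object name; old exports pile up in the
# bucket, hence the housecleaning mentioned above.
print(randomized_export_uri("my-bucket", "report"))
```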

Creating a dummy empty CSV with no cache control on Google Cloud Storage before the export doesn't work. Don't even try: BigQuery overwrites it and resets the cache control to the default.

Friday, March 16, 2012

I9100UHKG4 sucks

Android 2.3.4 on modem version I9100UHKG4 performed terribly on my SGS2. My data connection would get stuck uploading data without ever downloading anything back; the upload arrow was the only one lighting up. I had to turn the data connection off and back on to get internet working again.

I upgraded Android to 2.3.6, which came with modem version UHKE2. It worked like a charm, 100%: not a single occurrence of the problem I was constantly having before.

Two days ago I upgraded the OS to the Polish build of 4.0.3 (ICS) and now I'm having problems again. I'm getting "no service" after a whole day, so I'll downgrade just the modem to the previous version. The current one is XXLPQ.

Friday, November 18, 2011

The NET HD decoder is an HDMI toaster

My Time Machine 2 has two HDMI ports on the back and one on the side. First, the HDMI port the NET decoder was connected to stopped working. I moved it to the second rear port, where the PS3 had been plugged in. That one died too. I bought a new HDMI cable and connected it to the last port I had left, the side one. That one is gone as well.

The NET decoder destroyed all the ports, roughly one month apart each. The technician came to my house and said the problem could only be in the electrical wiring, because the decoder has protection against that. He left another unit, and now I only have component ports available.

I couldn't find similar reports online, so I'm leaving mine here.

Thursday, July 08, 2010

Amazon, take my dead tree books!

Now that I have a Kindle, what I want most is to have all my books on it. All of them, including the ones I already own.

Amazon, take my dead tree books and give me Kindle editions of them in return.

That would make me happy, kill my book lending (making people get their own copies), and turn you a profit on the recycling.

Monday, March 15, 2010

Facebook Connect Cookies problem

I've been using Elliot Haughin's Facebook Connect library for CodeIgniter, and I have an annoying, has-to-be-fixed-asap issue with it. Whenever I'm away from the site and Facebook for a long time, it misbehaves with its own Facebook cookies, the ones whose names begin with the session hash. If I delete any one of those cookies, functionality is restored and my site loads again.

These are the troublemakers:

It's not related to my own session cookie, since that one can be set to time out in 5 minutes, or even be removed, and everything still works fine.

I tried contacting Elliot, but got no response at all. Still investigating...


Update (March 16th): I had no access to Facebook today; I killed all their cookies and got my access back.

Update (March 31st): Fixed with the help of the CodeIgniter community.