Socorro Testing
Overview
This is a step by step guide to check your installation of Socorro and see if it works properly.
For the installation, refer to : Socorro Installation
Test each component
Check our product in the database
select * from products where product_name='PlaneShift'; select * from product_versions where product_name='PlaneShift'; select * from product_version_builds where product_version_id IN (select product_version_id from product_versions where product_name='PlaneShift');
The build_id is an important reference to keep in mind when troubleshooting as crashes will be associated to this information.
Generating a crash
The crash minidump should be generated by breakpad on the real app. The URL should be like: http://194.116.72.94/crash-reports/submit
When generating a crash there are some minimum fields you have to pass from your application, in particular:
- StartupTime : the time when the crash happened
- ProductName : the name of your product, example 'PlaneShift', should match products.product_name
- ReleaseChannel : the type of release, example 'alpha' or 'release'. Should match product_versions.build_type
- Version : the version, example '0.6.0', should match product_versions.release_version
- BuildID : the build id , example '201305051111', should match product_versions.build_date
BuildID may be skipped if you want. In our case we do not want to change the code to add the BuildID, so we created a transformation rule to add it automatically.
To create a transformation rule you need to add this:
INSERT INTO transform_rules VALUES (1, 'processor.json_rewrite',1,'socorro.processor.processor.json_PlaneShift_transform_action',,,'socorro.processor.processor.json_PlaneShift_transform_predicate',,);
Then edit
vi /data/socorro/application/socorro/processor/processor.py
and add these functions at the end of the file.
(Workaround) Generating a crash manually
Crash should be generate by you app through breakpad, but in case you do not have yet that part running, and you want to start configuring some of the server side, you can use the following just for a test.
Edit /data/socorro/application/socorro/collector/throttler.py adding these lines to def throttle(self, raw_crash):
       # check if user submitted the crash with minidump_upload and adjust parameters accordingly
       if 'prod' in raw_crash:
           raw_crash['ProductName'] = raw_crash['prod']
           raw_crash['Version'] = raw_crash['ver']
           raw_crash['ReleaseChannel'] = 'release'
           raw_crash['BuildID'] = '201305051111'
           raw_crash['CrashTime'] = '1377724842'
Please put in BuildID something you get from :
select * from product_version_builds where product_version_id IN (select product_version_id from product_versions where product_name='PlaneShift');
Please put in CrashTime a recent date, you can convert it with: http://www.timestampconvert.com/
It just remaps those 2 parameters as those are different from breakpad to minidump_upload program.
> cd /data/socorro/stackwalk/bin/ > ./minidump_upload -p PlaneShift -v 0.5.12 6cc10361-c469-1504-1d91efef-7b8e750c.dmp http://yoursite.org/crash-reports/submit
If the upload works you should get something like this:
Successfully sent the minidump file. Response: CrashID=bp-9e3504c5-0967-4b40-9563-aeadc2130829
Check if Collector receives your dump
In a working installation the URL: yoursite.org/crash-reports/submit should return a "None" value. If you get an internal error 500 or 404 then there is a problem. In both cases you should check the /var/log/apache2/error.log and see what is the problem.
When collector properly receives a dump you should see on the error.log this line:
> tail -f -n100 /var/log/apache2/error.log [Thu Aug 29 08:22:43 2013] [error] 2013-08-29 08:22:43,932 INFO - MainThread - 9e3504c5-0967-4b40-9563-aeadc2130829 received
where the hex string is the id of your dump.
Also you should see new files created in /home/socorro/primaryCrashStore, example for the dump above:
> ls /home/socorro/primaryCrashStore/20130828/name/9e/a8/9ea81c35-f847-4951-93a5-f75a92130828
-rw-rw-rw- 1 www-data socorro 309960 Aug 28 02:04 9ea81c35-f847-4951-93a5-f75a92130828.dump -rw-rw-rw- 1 www-data socorro 225 Aug 28 02:04 9ea81c35-f847-4951-93a5-f75a92130828.json
If in the log you see something like:
[Thu Aug 29 08:30:19 2013] [error] 2013-08-29 08:30:19,052 DEBUG - MainThread - deferring Planeshift 0.5.12
this means the crash has been 'throttled' in other words not processed due to the throttle rules you have in place.
If the dump is not throttled, so accepted, you should see this on the log
[Thu Aug 29 09:23:32 2013] [error] 2013-08-29 09:23:32,821 DEBUG - MainThread - not throttled Planeshift 0.5.12
Check if Monitor/Processor read the dump from disk 
Remember Monitor/Processor are started with Supervisor process. So ensure you have it running.
If the monitor finds the new file to process it should print something like this:
> tail -f -n100 /var/log/socorro/monitor-stderr.log:
2013-08-29 09:23:42,066 DEBUG - standard_job_thread - new job: 9ea81c35-f847-4951-93a5-f75a92130828
2013-10-04 17:54:06,877 DEBUG - standard_job_thread - list_of_processors_and_loads: 1, 0L, 'planeshift-2_23318' 2013-10-04 17:54:06,897 INFO - standard_job_thread - 9ea81c35-f847-4951-93a5-f75a92130828 assigned to processor planeshift-2_23318 (1)
If the processor finds the new job it should print something like this:
> tail -f -n100 /var/log/socorro/processor-stderr.log:
2013-08-29 09:23:43,919 DEBUG - QueuingThread - incomingJobStream yielding normal job 59cfcb3a-a1a0-492b-9c73-8ac032130829 2013-08-29 09:23:43,920 INFO - Thread-4 - starting job: 59cfcb3a-a1a0-492b-9c73-8ac032130829 2013-08-29 09:23:44,011 DEBUG - Thread-4 - skunk_classifier: reject - not a plugin crash 2013-08-29 09:23:44,012 INFO - Thread-4 - finishing successful job: 59cfcb3a-a1a0-492b-9c73-8ac032130829
It should also create a .jsonz file in the same dir of the crash, example:
> ls /home/socorro/primaryCrashStore/20130828/name/9e/a8/9ea81c35-f847-4951-93a5-f75a92130828
-rw-rw-rw- 1 www-data socorro 309960 Aug 28 02:04 9ea81c35-f847-4951-93a5-f75a92130828.dump -rw-rw-rw- 1 www-data socorro 225 Aug 28 02:04 9ea81c35-f847-4951-93a5-f75a92130828.json -rw-r--r-- 1 socorro socorro 8731 Aug 28 02:04 9ea81c35-f847-4951-93a5-f75a92130828.jsonz
If you see an error like this:
ProgrammingError: relation "raw_crashes_20130930" does not exist LINE 1: insert into raw_crashes_20130930 (uuid, raw_crash, date_proc...
it's because the crontabber is not running properly, see above "Create partitioned reports_* tables"
If you see just this line, means it's not working: INFO - QueuingThread - there is nothing to do.
Check if the dump information lands in postgres
The crash will go into the database, you can check it with a query:
select * from raw_crashes where uuid='59cfcb3a-a1a0-492b-9c73-8ac032130829'; select * from reports where uuid='59cfcb3a-a1a0-492b-9c73-8ac032130829';
Check if minidump_stackwalk works 
minidump_stackwalk is the program called by Processor to extract the information from the dump, like the stacktrace.
First you need to ensure your binaries (exe) have been compiled with debug symbols. On linux you can use gdb
$ gdb psclient.bin ... Reading symbols from /var/www/html/debugclients/linux64/psclient.bin...(no debugging symbols found)...done. ...
This means you didn't compile with debug info (-g on linux).
You can run minidump_stackwalk from the command line to verify first if it works.
Start with:
/data/socorro/stackwalk/bin/minidump_stackwalk -m /home/socorro/primaryCrashStore/20131021/name/8c/d9/8cd95e65-0cc7-4109-a677-50eff2131021/8cd95e65-0cc7-4109-a677-50eff2131021.dump /home/socorro/symbols 2>&1 | grep psclient
This will tell you which symbol files minidump is trying to load:
2013-10-22 12:05:12: minidump.cc:2227: INFO: MinidumpModule could not determine version for /home/davide/opt/PlaneShift-testing/psclient.bin 2013-10-22 12:05:12: simple_symbol_supplier.cc:196: INFO: No symbol file at /home/socorro/symbols/psclient.bin/7BBD939023653ABEB8C3B53E6801A10F0/psclient.bin.sym 2013-10-22 12:05:12: stackwalker.cc:92: INFO: Couldn't load symbols for: psclient.bin|7BBD939023653ABEB8C3B53E6801A10F0
In the example above you see he cannot find the symbol file. I have to ensure it's in the right dir and has the proper name.
Another thing you can check is if the generated .jsonz actually contains your stacktrace.
> cd /home/socorro/primaryCrashStore/20131021/name/8c/d9/8cd95e65-0cc7-4109-a677-50eff2131021/ > cp 8cd95e65-0cc7-4109-a677-50eff2131021.jsonz test.gz > gunzip test.gz > more test
You should see in the middle of the file the stacktrace.
Check if crontabber is doing his job
The following checks should be performed after the crontabber is executed at least one time
Check if he populated the adu tables for your product versions
select count(*) from product_adu where product_version_id IN (select product_version_id from product_versions where product_name='PlaneShift'); select count(*) from build_adu where product_version_id IN (select product_version_id from product_versions where product_name='PlaneShift');
select count(*) from crashes_by_user_build where product_version_id IN (select product_version_id from product_versions where product_name='PlaneShift');
select count(*) from crashes_by_user where product_version_id IN (select product_version_id from product_versions where product_name='PlaneShift');
IMPORTANT: The following tables are populated by socorro.cron.jobs.matviews.ReportsCleanCronApp which analyzes the crashes from 2 hours ago to 1 hour ago. So you need to wait 1 hour after the crash submission for these lines to be filled out.
Check if he populated the reports and reports_clean tables
select uuid,date_processed from reports where product = 'PlaneShift';
select uuid,date_processed from reports_clean where product_version_id IN (select product_version_id from product_versions where product_name='PlaneShift');
Find the signature of your crash:
select signature from reports where uuid='59cfcb3a-a1a0-492b-9c73-8ac032130829';
Then check if it landed in the signatures table:
select count(*) from signatures where signature='psclient@0x15201c';
Check if any of your crashes went into the reports_bad table.
select release_channel,signature from reports_bad, reports where reports_bad.uuid=reports.uuid and product='PlaneShift';
in this case it means something is wrong with the data of your crash. WARNING if you used miniupload_dump it may not work on this step.
IMPORTANT: The following tables are populated by socorro.cron.jobs.matviews.TCBSCronApp which analyses the previous day data, so if you don't have data in the previous day, it will result in empty lines.
Check if he populated the tcbs table
select count(*) from tcbs where product_version_id IN (select product_version_id from product_versions where product_name='PlaneShift'); select count(*) from tcbs_build where product_version_id IN (select product_version_id from product_versions where product_name='PlaneShift');
Check if your stats per day are populated correctly
For this function to work you need your raw_adu populated first. See Installation.
select * from raw_adu where product_name='PlaneShift'; select * from product_adu where product_version_id=23; select * from build_adu where product_version_id=23; tables involved seems to be: home_page_graph, home_page_graph_build, crashes_by_user, crashes_by_user_build, tcbs, tcbs_build
select uuid,date_processed from reports_clean where product_version_id IN (select product_version_id from product_versions where product_name='PlaneShift');
Build date has to be valid or the TCBS will fail:
select build from reports where product='PlaneShift'; select build_date(build) FROM reports_clean where product_version_id IN (select product_version_id from product_versions where product_name='PlaneShift');
select * from home_page_graph where product_version_id>16; select * from home_page_graph_build where product_version_id>16; select * from crashes_by_user where product_version_id>16; select * from crashes_by_user_build where product_version_id>16; select * from tcbs where product_version_id>16; select * from tcbs_build where product_version_id>16;
Generates signatures manually
SELECT update_signature_summary('2013-10-22')
Didn't do much... to be checked better.
Access the web UI
http://194.116.72.94/crash-stats/home/products/WaterWolf
Troubleshooting
Postgres useful commands:
> /etc/init.d/postgresql start > /etc/init.d/postgresql stop > psql -U planeshift -d breakpad breakpad# \dt (show tables) breakpad# \d products (describes table products)
unable to open database file
http://194.116.72.94/crash-stats/home/frontpage_json?product=PlaneShift&versions=0.5.12 unable to open database file
Request Method: GET Request URL: http://194.116.72.94/crash-stats/home/frontpage_json?product=PlaneShift&versions=0.5.12 Django Version: 1.4.5 Exception Type: OperationalError Exception Value:
unable to open database file
Exception Location: /data/socorro/webapp-django/vendor/lib/python/django/db/backends/sqlite3/base.py in _sqlite_create_connection, line 278
Answer: this is for authenticated sessions, it does not need to be the socorro postgres db but needs to be somewhere with write access :) either sqlite db or postgres/mysql/anything django supports
Solution: edit /data/socorro/webapp-django/crashstats/settings/base.py for database setting
'NAME': '/home/socorro/sqlite.crashstats.db'
> cp /data/socorro/webapp-django/sqlite.crashstats.db /home/socorro > chown www-data:socorro /home/socorro/sqlite.crashstats.db
raw_adu has not been updated
2013-08-29 11:32:49,656 DEBUG - MainThread - error when running <class 'socorro.cron.jobs.matviews.BuildADUCronApp'> on None Traceback (most recent call last):
File "/data/socorro/application/socorro/cron/crontabber.py", line 703, in _run_one for last_success in self._run_job(job_class, config, info): File "/data/socorro/application/socorro/cron/base.py", line 174, in main function(when) File "/data/socorro/application/socorro/cron/base.py", line 213, in _run_proxy self.run(connection, date) File "/data/socorro/application/socorro/cron/jobs/matviews.py", line 52, in run self.run_proc(connection, [target_date]) File "/data/socorro/application/socorro/cron/jobs/matviews.py", line 22, in run_proc cursor.callproc(self.get_proc_name(), signature)
InternalError: raw_adu has not been updated for 2013-08-28
This is caused by the raw_adu table not populated properly
crontabber skipping ... because it's not time to run
You are getting stuck because crontabber doesn't want to run the apps: it gives "skipping ... because it's not time to run"
Change /data/socorro/application/socorro/cron/crontabber.py to add more debug
   
    def time_to_run(self, class_, time_):
        """return true if it's time to run the job.
        This is true if there is no previous information about its last run
        or if the last time it ran and set its next_run to a date that is now
        past.
        """
        app_name = class_.app_name
        _debug = self.config.logger.debug
        try:
            info = self.database[app_name]
        except KeyError:
            if time_:
                h, m = [int(x) for x in time_.split(':')]
                # only run if this hour and minute is < now
                now = utc_now()
                _debug("LUCADEBUG time to run %s , curren time %s", time_, now)
                if now.hour > h:
                    return True
                elif now.hour == h and now.minute >= m:
                    return True
                return False
            else:
                # no past information, run now
                return True
        next_run = info['next_run']
        _debug("LUCADEBUG next run %s , current time %s", next_run, utc_now())
        if next_run < utc_now():
            return True
        return False
If you want to force your crontabber to run anyway:
 > rm /home/socorro/persistent/crontabbers.json
 > psql -U planeshift -d breakpad
 breakpad=# update crontabber_state set state='{}';
 > sudo -u socorro /data/socorro/application/scripts/crons/crontabber.sh
 > tail -f -n100 /var/log/socorro/crontabber.log
crontabber not populating product_adu
The current implementation of /data/socorro/application/socorro/external/postgresql/raw_sql/procs/update_build_adu.sql parses only the new builds of the last 6 days. If the build date of a product is older, then it skips it. Also it considers only the build_types 'nightly','aurora'. This is hardcoded for Firefox.
To ensure you index also old products and builds, do the following:
 > vi /data/socorro/application/socorro/external/postgresql/raw_sql/procs/update_build_adu.sql
 remove this clause from the first query (the second is for beta releases):
       --AND updateday <= ( bdate + 6 )
To accept also your build_type, do the following:
 > vi /data/socorro/application/socorro/external/postgresql/raw_sql/procs/update_build_adu.sql
 change this line:
 AND product_versions.build_type IN ('nightly','aurora','release')
Then reload the function:
> psql -U planeshift -d breakpad breakpad=# \i /data/socorro/application/socorro/external/postgresql/raw_sql/procs/update_build_adu.sql
same thing for:
/data/socorro/application/socorro/external/postgresql/raw_sql/procs/update_build_adu.sql /data/socorro/application/socorro/external/postgresql/raw_sql/procs/update_crashes_by_user_build.sql /data/socorro/application/socorro/external/postgresql/raw_sql/procs/update_tcbs_build.sql /data/socorro/application/socorro/external/postgresql/raw_sql/procs/update_home_page_graph_build.sql /data/socorro/application/socorro/external/postgresql/raw_sql/procs/update_signatures.sql /data/socorro/application/socorro/external/postgresql/raw_sql/procs/update_nightly_builds.sql
crontabber reportsclean error
2013-12-04 10:00:01,963 DEBUG - MainThread - error when running <class 'socorro.cron.jobs.matviews.ReportsCleanCronApp'> on None Traceback (most recent call last): File "/data/socorro/application/socorro/cron/crontabber.py", line 703, in _run_one for last_success in self._run_job(job_class, config, info): File "/data/socorro/application/socorro/cron/base.py", line 174, in main function(when) File "/data/socorro/application/socorro/cron/base.py", line 213, in _run_proxy self.run(connection, date) File "/data/socorro/application/socorro/cron/jobs/matviews.py", line 168, in run self.run_proc(connection, [date]) File "/data/socorro/application/socorro/cron/jobs/matviews.py", line 22, in run_proc cursor.callproc(self.get_proc_name(), signature) IntegrityError: null value in column "os_version_id" violates not-null constraint DETAIL: Failing row contains (1281, null, 20131117000000, 2013-11-26 16:34:29+00, null, 2013-11-26 16:29:52.581116+00, 2, null, null, 1170, null, 00:00:00, Unknown, null, Browser, 24, 1245, Release, 13, 00:00:00, 5abb16a3-31f1-413c-9d72-46aa62131126, null). CONTEXT: SQL statement "INSERT INTO reports_clean_20131125 ...
select * from reports where signature like 'EMPTY%'; select * from reports where uuid='5abb16a3-31f1-413c-9d72-46aa62131126';
NO SOLUTION FOUND YET
problems with tcbs and home graph
You can try this URL to see which data should be populated, you can change the duration to go back in time more:
http://194.116.72.94/crash-stats/home/frontpage_json?product=PlaneShift&versions=0.5.9.10&duration=14 it calls CrashesPerAdu() then calls get_daily() in /data/socorro/application/socorro/external/postgresql/crashes.py which reads from home_page_graph table in the db this table is updated by HomePageGraphCronApp cron job in crontabber which launches /data/application/socorro/external/postgresql/raw_sql/procs/update_home_page_graph_build.sql queries reports_clean table, expecting data populated by /data/socorro/application/socorro/external/postgresql/raw_sql/procs/001_update_reports_clean.sql
References
Json crash example
 
  {
    "InstallTime": "1357622062",
    "Theme": "classic/1.0",
    "Version": "4.0a1",
    "id": "{ec8030f7-c20a-464f-9b0e-13a3a9e97384}",
    "Vendor": "Mozilla",
    "EMCheckCompatibility": "true",
    "Throttleable": "0",
    "URL": "http://code.google.com/p/crashme/",
    "version": "20.0a1",
    "CrashTime": "1357770042",
    "ReleaseChannel": "nightly",
    "submitted_timestamp": "2013-01-09T22:21:18.646733+00:00",
    "buildid": "20130107030932",
    "timestamp": 1357770078.646789,
    "Notes": "OpenGL: NVIDIA Corporation -- GeForce 8600M GT/PCIe/SSE2 -- 3.3.0 NVIDIA 313.09 -- texture_from_pixmap\r\n",
    "StartupTime": "1357769913",
    "FramePoisonSize": "4096",
    "FramePoisonBase": "7ffffffff0dea000",
    "Add-ons": "%7B972ce4c6-7e08-4474-a285-3208198ce6fd%7D:20.0a1,crashme%40ted.mielczarek.org:0.4",
    "BuildID": "20130107030932",
    "SecondsSinceLastCrash": "1831736",
    "ProductName": "WaterWolf",
    "legacy_processing": 0,
    "ProductID": "{ec8030f7-c20a-464f-9b0e-13a3a9e97384}"
  }
Other older notes, DO NOT consider
Create a screen startup file “launchScorro” that'll be used for the Socorro scripts:
cd /home/planeshift/socorro . socorro-virtualenv/bin/activate export PYTHONPATH=. startup_message off autodetach on defscrollback 10000 termcap xterm 'Co#256:AB=\E[48;5;%dm:AF=\E[38;5;%dm' screen -S processor python socorro/processor/processor_app.py --admin.conf=./config/processor.ini screen -S monitor python socorro/monitor/monitor_app.py --admin.conf=./config/monitor.ini screen -S middleware python socorro/middleware/middleware_app.py --admin.conf=config/middleware.ini [NOT NEEDED as it runs inside Apache] screen -S collector python socorro/collector/collector_app.py --admin.conf=./config/collector.ini
Tweaking the database
 > INSERT INTO products VALUES ('PlaneShift','0.1','0.1','PlaneShift','0');
 > INSERT INTO product_versions VALUES (17,'PlaneShift','0.5','0.5.10','0.5.10',0,'0.5.10','2013-08-23','2013-12-23','f','Release','f','f',null);
If something goes wrong and you want to delete your product and versions
> DELETE from products where product_name='PlaneShift'; > DELETE from product_versions where product_name='PlaneShift'; > DELETE from releases_raw where product_name='PlaneShift'; > DELETE from product_productid_map where product_name='PlaneShift'; > DELETE from product_release_channels where product_name='PlaneShift';