Socorro Installation
Latest revision as of 11:46, 14 December 2013
Overview
This guide illustrates how to install Socorro with the minimum components needed for a medium-sized project. Socorro is used at Mozilla to manage thousands of crashes a day across multiple applications. The full architecture allows distributed loads, throttling, queuing and other optimization techniques required for such a large number of crashes per day. Medium-sized projects like PlaneShift may use a simplified architecture, which is described in this installation guide.
This guide was written between August 2013 and November 2013, using a Debian Squeeze (6.0.7) server.
The official reference documentation is, at this point, minimal and not complete enough to get through the install. If you want to have a look anyway, it is located here: http://socorro.readthedocs.org/en/latest/installation.html
First you need to understand what the overall architecture looks like:
The first step is to embed Breakpad in your client application. Breakpad is the component responsible for catching the crash and sending the information to the server (Socorro). To allow Socorro to interpret the crash and give you a stacktrace, you will have to use "Dump Sym", a program which takes a debug file (like .pdb) and converts it into a .sym file for Socorro.
The results are then displayed in a web app on the Socorro server, where you can see the stacktrace, the frequency, the platforms and so on for each crash.
The full architecture schema is here: http://socorro.readthedocs.org/en/latest/generalarchitecture.html
The architecture we are going to use is the following:
Components
This section lists the components and their usage. For each component it gives the location of its configuration files as they are after you have "deployed for production" (this will be explained later). You don't need to read through all of this chapter now, but you can use it later on as a reference when you are troubleshooting.
Supervisor
Supervisor is the process which starts the needed components of Socorro, namely Processor and Monitor. You can stop and start supervisor with:
> /etc/init.d/supervisor stop
> /etc/init.d/supervisor start
- Its configuration file is located in /etc/supervisor
- The configuration files of the processes it starts are located in /etc/supervisor/conf.d
- Its log files are located in /var/log/supervisor/
Collector
- This is the application which receives the dump file from your application. It is the first piece we want to have working.
- In our simplified install it runs inside Apache, so it is not started by supervisor. There are ways to run it separately, but we are not interested in that type of configuration.
- Its main configuration file is /etc/socorro/collector.ini
- The collector uses a filesystem to write the dumps to.
- Because it runs inside Apache, its log messages are written to the Apache logs: /var/log/apache2/error.log
- main app is here: /data/socorro/application/socorro/collector/collector_app.py
- real app is here: /data/socorro/application/socorro/collector/wsgi_breakpad_collector.py
- storage class code is here: /data/socorro/application/socorro/external/fs/crashstorage.py
Collector writes the dumps in the primaryCrashStore directories configured in its ini file, then creates 2 symlinks (date->name, name->date) to be used by the Monitor process.
Monitor
Monitor polls the file system and then queues up the crashes in Postgres.
- Monitor is started by supervisor
- Its main configuration file is /etc/socorro/monitor.ini
- starting app is here: /data/socorro/application/socorro/monitor/monitor_app.py
- real app is here: /data/socorro/application/socorro/monitor/monitor.py
- Its log file is here: /var/log/socorro/monitor.log
Note: the monitor traverses the date branch of the tree (example: /home/socorro/primaryCrashStore/20131121/date ) looking for symlinks that span from date to name (example: /home/socorro/primaryCrashStore/20131121/name ). If it finds one, it knows that it has found a "new" crash. It pushes the crash_id from that symlink into the 'jobs' table. After doing that, it is supposed to delete the back link name->date, then delete the forward link date->name.
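The link handshake described in the note above can be sketched in a few lines of Python. This is a simplified illustration and not Socorro's code: the flat date/ and name/ directories, the crash ID, the "date_link" file name and both function names are assumptions made for the example, and the real crash store uses deeper directory trees.

```python
import os
import tempfile


def store_crash(day_root, crash_id):
    """Mimic the collector: create the name entry plus the two symlinks."""
    name_dir = os.path.join(day_root, "name", crash_id)
    date_dir = os.path.join(day_root, "date")
    os.makedirs(name_dir)
    os.makedirs(date_dir, exist_ok=True)
    # Forward link: date -> name
    os.symlink(name_dir, os.path.join(date_dir, crash_id))
    # Back link: name -> date ("date_link" is a made-up name)
    os.symlink(date_dir, os.path.join(name_dir, "date_link"))


def scan_new_crashes(day_root):
    """Mimic the monitor: queue each linked crash, then remove both links."""
    date_dir = os.path.join(day_root, "date")
    queued = []
    for entry in sorted(os.listdir(date_dir)):
        forward = os.path.join(date_dir, entry)
        if not os.path.islink(forward):
            continue
        # The real monitor would insert the crash_id into the 'jobs' table.
        queued.append(entry)
        back = os.path.join(day_root, "name", entry, "date_link")
        if os.path.islink(back):
            os.remove(back)   # delete the back link first
        os.remove(forward)    # then delete the forward link
    return queued


day_root = os.path.join(tempfile.mkdtemp(), "20131121")
store_crash(day_root, "crash-0001")
first_scan = scan_new_crashes(day_root)
second_scan = scan_new_crashes(day_root)
print(first_scan, second_scan)
```

Because both links are removed once a crash is queued, a second scan finds nothing new. This is also why leftover symlinks under the date branch are a useful hint when troubleshooting a monitor that is not picking up crashes.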
Processor
Processor polls Postgres and consumes the jobs that Monitor has queued. It also uses the filesystem to process some data on the dumps, and runs minidump_stackwalk to process the actual dump.
- Processor is started by supervisor
- Its main configuration file is /etc/socorro/processor.ini
- starting app is here: /data/socorro/application/socorro/processor/processor_app.py
- real app is here: /data/socorro/application/socorro/processor/legacy_processor.py
- Its log file is here: /var/log/socorro/processor.log
Middleware
Middleware is an API layer which is called by the various components to execute their operations. In particular, it is used by the webapp UI to query the database.
- Runs inside Apache with wsgi, so it's not started by supervisor
- Real app is here : /data/socorro/application/socorro/middleware/middleware_app.py
- Configuration file: /etc/socorro/middleware.ini
- Running inside Apache, its log files are written in the apache logs: /var/log/apache2/error.log
Webapp
This is the UI you use to visualize the latest crashes and stacktraces.
- Doesn't have a specific process running, as it's made of web pages calling the middleware
- It's a Django application (you can google it).
- Located here: /data/socorro/webapp-django
- Configuration file: /data/socorro/webapp-django/crashstats/settings/base.py
- Configuration file: /data/socorro/webapp-django/crashstats/settings/local.py
Crontabber
This is the batch process which executes a number of jobs from crontab. For the full list of what it does, look at the file: /data/socorro/application/socorro/cron/crontabber.py . Examples are:
socorro.cron.jobs.weekly_reports_partitions.WeeklyReportsPartitionsCronApp|7d
socorro.cron.jobs.matviews.ProductVersionsCronApp|1d|10:00
socorro.cron.jobs.matviews.SignaturesCronApp|1d|10:00
socorro.cron.jobs.matviews.TCBSCronApp|1d|10:00
socorro.cron.jobs.matviews.ADUCronApp|1d|10:00
socorro.cron.jobs.matviews.NightlyBuildsCronApp|1d|10:00
socorro.cron.jobs.matviews.DuplicatesCronApp|1h
socorro.cron.jobs.matviews.ReportsCleanCronApp|1h
#socorro.cron.jobs.bugzilla.BugzillaCronApp|1h
socorro.cron.jobs.matviews.BuildADUCronApp|1d|10:00
socorro.cron.jobs.matviews.CrashesByUserCronApp|1d|10:00
socorro.cron.jobs.matviews.CrashesByUserBuildCronApp|1d|10:00
#socorro.cron.jobs.matviews.CorrelationsCronApp|1d|10:00
socorro.cron.jobs.matviews.HomePageGraphCronApp|1d|10:00
socorro.cron.jobs.matviews.HomePageGraphBuildCronApp|1d|10:00
socorro.cron.jobs.matviews.TCBSBuildCronApp|1d|10:00
socorro.cron.jobs.matviews.ExplosivenessCronApp|1d|10:00
socorro.cron.jobs.matviews.SignatureSummaryCronApp|1d|10:00
#socorro.cron.jobs.ftpscraper.FTPScraperCronApp|1h
#socorro.cron.jobs.automatic_emails.AutomaticEmailsCronApp|1h
#socorro.cron.jobs.modulelist.ModulelistCronApp|1d
...
- It's run through crontab
- Located here: /data/socorro/application/socorro/cron/crontabber.py
- Its execution directory is here: /home/socorro/persistent/
- Configuration file: /etc/socorro/crontabber.ini
- Its log file is here: /var/log/socorro/crontabber.log
database_file='/home/socorro/persistent/crontabbers.json'
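To make the job list above easier to read, here is a small Python sketch that splits one job line into its parts. It assumes, based only on how the lines look, that the fields are "class path | frequency | time of day" and that a leading # disables a job; the function name and dictionary keys are made up for this example and are not part of crontabber.

```python
def parse_job_line(line):
    """Split one job line into its parts; return None for blank or '...' lines."""
    text = line.strip()
    if not text or text == "...":
        return None
    enabled = not text.startswith("#")   # a leading '#' comments the job out
    fields = text.lstrip("#").strip().split("|")
    return {
        "class_path": fields[0],
        "frequency": fields[1] if len(fields) > 1 else None,
        "time": fields[2] if len(fields) > 2 else None,
        "enabled": enabled,
    }


jobs = [
    parse_job_line("socorro.cron.jobs.matviews.TCBSCronApp|1d|10:00"),
    parse_job_line("socorro.cron.jobs.matviews.DuplicatesCronApp|1h"),
    parse_job_line("#socorro.cron.jobs.bugzilla.BugzillaCronApp|1h"),
]
for job in jobs:
    print(job)
```

Read this way, TCBSCronApp runs once a day at 10:00, DuplicatesCronApp runs every hour, and BugzillaCronApp is switched off.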
CrashMover (DO NOT USE)
Crashmover is an additional component not used in our simplified install. You can ignore it and all its related config files.
- Just for reference, it's started by supervisor
- starting app is here: /data/socorro/application/scripts/newCrashMover.py
- real app is here: /data/socorro/application/socorro/storage
We may want to disable it (Need to understand how)
Directory structure
Before proceeding with the installation, it's important that you understand the directory structure created by the standard install, so you can troubleshoot the installation more easily if needed.
/home/planeshift/socorro
This is where I checked out the sources and did the initial Socorro installation. Our project is called "planeshift" and our default user on that server was "planeshift" as well. This is the initial environment where everything gets built and tested. It's called the "development environment" because it's used for internal testing and not for production usage. When the installation is completed, you will deploy the necessary pieces to the other directories with the procedure "deploying in production" (see below). After the production deployment is done, none of the files (including configs) in this directory are used anymore.
/etc/supervisor/conf.d
Contains supervisor configuration files, like 1-socorro-processor.conf, 2-socorro-crashmover.conf and 3-socorro-monitor.conf
These files tell supervisor to execute the apps in /data/socorro/application/
/etc/socorro
Contains all .ini files, like collector.ini, crashmover.ini, monitor.ini and processor.ini
/home/socorro
This is the primary storage location for uploaded minidumps, no configuration files are present
/data/socorro
Contains all applications as executed in the production environment
Please note there are configuration files under /data/socorro/application/config (like collector.ini, crashmover.ini, monitor.ini and processor.ini), but those are NOT used in the final install: one step of the install copies them to /etc/socorro, where the final configuration files reside.
/var/log/socorro
Contains the logs from the applications like Monitor and Processor.
/var/log/supervisor
Contains the log of the supervisor.
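When troubleshooting, it can help to check quickly which of the production directories listed above exist. The following Python sketch does that; the function name and the "root" parameter are made up for this example, and the demonstration runs against a scratch directory so that it works on any machine.

```python
import os
import tempfile

# Production directories described in this section.
EXPECTED_DIRS = [
    "/etc/supervisor/conf.d",
    "/etc/socorro",
    "/home/socorro",
    "/data/socorro",
    "/var/log/socorro",
    "/var/log/supervisor",
]


def missing_dirs(paths, root="/"):
    """Return the paths that do not exist as directories under `root`."""
    missing = []
    for path in paths:
        full = os.path.join(root, path.lstrip("/"))
        if not os.path.isdir(full):
            missing.append(path)
    return missing


# Demonstration against a scratch root with only /etc/socorro created.
scratch = tempfile.mkdtemp()
os.makedirs(os.path.join(scratch, "etc/socorro"))
print(missing_dirs(EXPECTED_DIRS, root=scratch))
```

On the real server you would call missing_dirs(EXPECTED_DIRS) with the default root. An empty list means all the directories are in place.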
Database Structure
I generated a schema of the database (image: socorro_postgres.png).
How to proceed
These are the steps we are going to follow for the installation:
- Install all components as per Mozilla instructions
- Deploy the components to the "production environment", which is nothing more than a set of other directories
- Test and troubleshoot each of the installed components
Install all components
For this chapter we assume you have a clean operating system install with none of the components installed. In your case some of the components may already be there; if so, just check their versions.
I've taken note of the versions which were installed on my system (the "Setting up ..." lines).
Install build essentials
> apt-get install build-essential subversion (already present)
Install python software
> apt-get install python-software-properties
Setting up python-apt-common (0.7.100.1+squeeze1) ...
Setting up python-apt (0.7.100.1+squeeze1) ...
Setting up iso-codes (3.23-1) ...
Setting up lsb-release (3.2-23.2squeeze1) ...
Setting up python-gnupginterface (0.3.2-9.1) ...
Setting up unattended-upgrades (0.62.2) ...
Setting up python-software-properties (0.60.debian-3) ...
> apt-get install libpq-dev python-virtualenv python-dev
Setting up libpython2.6 (2.6.6-8+b1) ...
Setting up python2.6-dev (2.6.6-8+b1) ...
Setting up python-dev (2.6.6-3+squeeze7) ...
Setting up python-pkg-resources (0.6.14-4) ...
Setting up python-setuptools (0.6.14-4) ...
Setting up python-pip (0.7.2-1) ...
Setting up python-virtualenv (1.4.9-3squeeze1) ...
> apt-get install python2.6 python2.6-dev
Install postgres 9.2
In my case I was using Debian squeeze. On this release the default postgres is 8.4, which is too old to work with socorro because it doesn't have JSON support. So I needed to update the repos to have postgres 9.2
Create /etc/apt/sources.list.d/pgdg.list and add this line:
deb http://apt.postgresql.org/pub/repos/apt/ squeeze-pgdg main
> wget --quiet -O - http://apt.postgresql.org/pub/repos/apt/ACCC4CF8.asc | sudo apt-key add -
> sudo apt-get update
> apt-get install postgresql-9.2 postgresql-plperl-9.2 postgresql-contrib-9.2 postgresql-server-dev-9.2
Ensure that timezone is set to UTC
> vi /etc/postgresql/9.2/main/postgresql.conf
timezone = 'UTC'
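If you want to check the setting without opening an editor, a few lines of Python can read it back. This is a minimal sketch: read_setting is a made-up helper, it handles only the common "name = 'value'" form with # comments, and it ignores the less common syntax that postgresql.conf also accepts.

```python
def read_setting(conf_text, name):
    """Return the last active value of `name` in postgresql.conf-style text."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments
        if not line:
            continue
        key, _, rest = line.partition("=")
        if key.strip() == name:
            value = rest.strip().strip("'")
    return value


sample = """
#timezone = 'localtime'
timezone = 'UTC'    # as this guide requires
"""
print(read_setting(sample, "timezone"))
```

On the server you would pass the contents of /etc/postgresql/9.2/main/postgresql.conf instead of the sample text.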
Ensure your database is using UTF-8 encoding or you will not be able to load special characters. Check in the conf file above where your data files are located, then:
> export PATH=/usr/lib/postgresql/9.2/bin/:$PATH
> initdb -D /var/lib/postgresql/9.2/main -E utf8
> service postgresql restart
Install other needed components
> apt-get install rsync libxslt1-dev git-core mercurial
> apt-get install python-psycopg2
> apt-get install libsasl2-dev
Add a new superuser account to postgres
(executed as root)
> su - postgres -c "createuser -s planeshift"
Remove security layer for postgres (this is to avoid the PostgreSQL Error "FATAL: Peer authentication failed")
Edit /etc/postgresql/9.2/main/pg_hba.conf and change the following line from 'peer' to 'trust':
host all all 127.0.0.1/32 peer
host all all 127.0.0.1/32 trust
> service postgresql restart
Download and install Socorro
(executed as planeshift)
> cd
> git clone --depth=1 https://github.com/mozilla/socorro socorro
> cd socorro
> git fetch origin --tags --depth=1
> git checkout 56 (chosen release 56 as the stable one)
Node/npm is required, install it:
> apt-get install openssl libssl-dev
> git clone https://github.com/joyent/node.git
> cd node
> git tag
> git checkout v0.9.12
> ./configure --openssl-libpath=/usr/lib/ssl
> make
> make test
> sudo make install
> node -v # this line checks if it's running!
Update python-pip (as root):
> pip install --upgrade pip
> /home/planeshift/socorro/socorro-virtualenv/bin/pip install --upgrade pip
Install lessc
> npm install less -g
Install json_extensions for use with PostgreSQL
From inside the Socorro checkout
> export PATH=$PATH:/usr/lib/postgresql/9.2/bin
> make json_enhancements_pg_extension
Run unit/functional tests
From inside the Socorro checkout
> make test
Install minidump_stackwalk
From inside the Socorro checkout
This is the binary which processes breakpad crash dumps into stack traces:
> make minidump_stackwalk
Setup environment
> make bootstrap-dev (this line is needed only one time)
Every time you want to run Socorro commands you will have to:
> . socorro-virtualenv/bin/activate
> export PYTHONPATH=.
> (execute here your command)
Populate PostgreSQL Database
as user root
> cd socorro
> psql -f sql/roles.sql postgres
as user planeshift
> cd socorro
> . socorro-virtualenv/bin/activate
> export PYTHONPATH=.
You cannot start from an empty database, as there are multiple tables (example list of operating systems) which have to be populated for the system to work. For this reason you need to use --fakedata, which loads some of those tables together with some sample products (WaterWolf, NightTrain).
I don't remember which of the two commands below worked, but one of them does; the other should give you an error:
> ./socorro/external/postgresql/setupdb_app.py --database_name=breakpad --fakedata --dropdb --database_superusername=breakpad_rw --database_superuserpassword=bPassword
> ./socorro/external/postgresql/setupdb_app.py --database_name=breakpad --fakedata --database_superusername=planeshift --dropdb
Create partitioned reports_* tables
Socorro uses PostgreSQL partitions for the reports table, which must be created on a weekly basis.
Normally this is handled automatically by the cronjob scheduler crontabber but can be run as a one-off:
(inside the virtual environment, see above)
> python socorro/cron/crontabber.py --job=weekly-reports-partitions --force
I needed to run it because the Processor was failing with this error:
ProgrammingError: relation "raw_crashes_20130826" does not exist
LINE 1: insert into raw_crashes_20130826 (uuid, raw_crash, date_proc...
After running the command above, the table raw_crashes_20130826 was created.
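The table name in the error hints at how the weekly partitions are named. As a minimal sketch, assuming each partition is suffixed with the date of the Monday that starts its week (2013-08-26 was a Monday, which matches the error above), the table name expected for a given day can be computed as shown here. The function `weekly_partition_name` is hypothetical and is not part of Socorro.

```python
from datetime import date, timedelta


def weekly_partition_name(day, base="raw_crashes"):
    """Return the assumed partition table name for the week containing `day`.

    Assumption: partitions are named <base>_<YYYYMMDD> using the Monday
    of the week, as suggested by the error message in this guide.
    """
    monday = day - timedelta(days=day.weekday())  # weekday(): Monday == 0
    return "%s_%s" % (base, monday.strftime("%Y%m%d"))


if __name__ == "__main__":
    # A crash processed on Wednesday 2013-08-28 lands in that week's table.
    print(weekly_partition_name(date(2013, 8, 28)))
```

If the Processor complains about a missing table, comparing the name in the error with the computed one is a quick way to check whether the partition for the current week simply has not been created yet.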
Run socorro in dev mode
Dev mode is a version of Socorro that runs only on the local server: you launch the services manually and they listen on ports 8882 and 8883. You usually don't want this in production, but it is useful for testing that everything works.
Copy default config files
> cp config/collector.ini-dist config/collector.ini
> cp config/processor.ini-dist config/processor.ini
> cp config/monitor.ini-dist config/monitor.ini
> cp config/middleware.ini-dist config/middleware.ini
Run the apps
> cd socorro
> . socorro-virtualenv/bin/activate
> export PYTHONPATH=.
> screen -S processor python socorro/processor/processor_app.py --admin.conf=./config/processor.ini
> screen -S monitor python socorro/monitor/monitor_app.py --admin.conf=./config/monitor.ini
> screen -S middleware python socorro/middleware/middleware_app.py --admin.conf=config/middleware.ini
> screen -S collector python socorro/collector/collector_app.py --admin.conf=./config/collector.ini
Deploy the apps for production usage
Install prerequisites
> apt-get install supervisor rsyslog libapache2-mod-wsgi memcached locales
Setup directory structure
> mkdir /etc/socorro
> mkdir /var/log/socorro
> mkdir -p /data/socorro
> useradd socorro
> chown socorro:socorro /var/log/socorro
> mkdir -p /home/socorro/primaryCrashStore /home/socorro/fallback /home/socorro/persistent
> chown www-data:socorro /home/socorro/primaryCrashStore /home/socorro/fallback /home/socorro/persistent
> chmod 2775 /home/socorro/primaryCrashStore /home/socorro/fallback /home/socorro/persistent
Install Socorro for production
> cd /home/planeshift/socorro
> make install
(as root)
> cd /home/planeshift/socorro
> cp config/*.ini /etc/socorro/
There are two ways to set up the configuration files properly:
- Download the ones I used and modify them as needed for your installation
- Ask each app to generate the ini file for you and then modify it as needed for your installation
Generate your own /etc/socorro/collector.ini
(log in as the socorro user)
> export PYTHONPATH=/data/socorro/application:/data/socorro/thirdparty
> python /data/socorro/application/socorro/collector/collector_app.py --admin.conf=/etc/socorro/collector.ini --help
> python /data/socorro/application/socorro/collector/collector_app.py --admin.conf=/etc/socorro/collector.ini --admin.dump_conf=/tmp/c1.ini
> cp /tmp/c1.ini /etc/socorro/collector.ini
The most important parameters for collector.ini are:
wsgi_server_class='socorro.webapi.servers.ApacheModWSGI' (tells collector to run inside Apache)
fs_root='/home/socorro/primaryCrashStore' (points to the directory where your disk storage is)
crashstorage_class='socorro.external.fs.crashstorage.FSDatedRadixTreeStorage' (uses the new FSDatedRadixTreeStorage class which organizes files by date in the disk storage area)
IMPORTANT NOTE: all your processes should use the same crashstorage_class or they will not be able to find the files
Generate your own /etc/socorro/processor.ini
(log in as the socorro user)
> export PYTHONPATH=/data/socorro/application:/data/socorro/thirdparty
> python /data/socorro/application/socorro/processor/processor_app.py --admin.conf=/etc/socorro/processor.ini --help
> chown www-data:socorro /home/socorro [NOT NEEDED? WAS root:root]
> python /data/socorro/application/socorro/processor/processor_app.py --admin.conf=/etc/socorro/processor.ini --source.crashstorage_class=socorro.external.fs.crashstorage.FSDatedRadixTreeStorage --admin.dump_conf=/tmp/p1.ini
(edit the p1.ini file manually and delete everything inside [c_signature])
> python /data/socorro/application/socorro/processor/processor_app.py --admin.conf=/tmp/p1.ini --admin.dump_conf=/tmp/p2.ini --destination.storage_classes='socorro.external.postgresql.crashstorage.PostgreSQLCrashStorage, socorro.external.fs.crashstorage.FSRadixTreeStorage'
(edit the p2.ini file manually and delete everything inside [c_signature])
> python /data/socorro/application/socorro/processor/processor_app.py --admin.conf=/tmp/p2.ini --admin.dump_conf=/tmp/p3.ini --destination.storage1.crashstorage_class=socorro.external.fs.crashstorage.FSRadixTreeStorage
(edit the p3.ini file manually and delete everything inside [c_signature])
(edit p3.ini and set fs_root=/home/socorro/primaryCrashStore. There should be two places: one under [destination] storage1 and one under [source])
Generate your own /etc/socorro/middleware.ini
(log in as the socorro user)
> export PYTHONPATH=/data/socorro/application:/data/socorro/thirdparty
> python /data/socorro/application/socorro/middleware/middleware_app.py --admin.conf=/etc/socorro/middleware.ini --help
> python /data/socorro/application/socorro/middleware/middleware_app.py --admin.conf=/etc/socorro/middleware.ini --admin.dump_conf=/tmp/m1.ini
(edit /tmp/m1.ini and change: filesystem_class='socorro.external.fs.crashstorage.FSDatedRadixTreeStorage')
(comment out the 'platforms' and 'service_overrides' variables, as the dumper prints them incorrectly and they will cause an error when running the middleware app)
(uncomment and change 'implementation_list': correct the values so they use ':' instead of commas, like "psql: socorro.external.postgresql", and then replace "fs:socorro.external.filesystem" with "fs:socorro.external.fs")
Configure Crontabber
edit /etc/socorro/crontabber.ini
database_file='/home/socorro/persistent/crontabbers.json'
Be sure the user socorro can write in that directory
Comment out the unneeded jobs:
edit /data/socorro/application/socorro/cron/crontabber.py
#socorro.cron.jobs.bugzilla.BugzillaCronApp|1h
#socorro.cron.jobs.matviews.CorrelationsCronApp|1d|10:00
#socorro.cron.jobs.ftpscraper.FTPScraperCronApp|1h
#socorro.cron.jobs.automatic_emails.AutomaticEmailsCronApp|1h
#socorro.cron.jobs.modulelist.ModulelistCronApp|1d
Fix umask if necessary. I have found that the umask matters for file access rights: I was getting directories with "drwxr-sr-x" permissions under primaryCrashStore/date/, which prevented the monitor and processor from working properly. Edit collector.ini and processor.ini and ensure that:
umask='0'
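To see why the umask matters here, the following sketch shows how a umask strips permission bits from a newly created directory. It assumes the application requests mode 0777 for new directories and that the new directory inherits the setgid bit from its parent (the crash store directories were set up with chmod 2775 earlier). `resulting_mode` is a hypothetical helper for illustration, not Socorro code.

```python
import stat


def resulting_mode(requested, umask, setgid=True):
    """Return the `ls -l` style mode string of a newly created directory.

    Assumption: the directory is created with `requested` permission bits
    and inherits the setgid bit from its parent directory.
    """
    mode = stat.S_IFDIR | (requested & ~umask)
    if setgid:
        mode |= stat.S_ISGID
    return stat.filemode(mode)


if __name__ == "__main__":
    print(resulting_mode(0o777, 0o022))  # typical default umask: group cannot write
    print(resulting_mode(0o777, 0o000))  # umask 0: group can write
```

With a umask of 022 the group write bit is removed, so a second process running as a different user in the same group cannot create or move files in the directory; with umask 0 it can.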
Define your throttling conditions
Due to the high volume of crashes received at Mozilla, Socorro by default does not accept all crashes but 'throttles' some, meaning that some crashes are rejected and not processed. To do this, the Collector has throttling rules defined in /etc/socorro/collector.ini
If you want all your crashes to be accepted, change the throttle_conditions line to:
 throttle_conditions='''[("*", True, 100)]'''
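As a rough illustration of how a rule such as ("*", True, 100) can be read, here is a simplified model of rule-based throttling. This is not Socorro's actual throttler: it assumes each rule is a (key, condition, percentage) tuple, that "*" means "look at the whole crash", that the first matching rule decides, and that the percentage is the share of matching crashes to accept. The names `matches` and `accept_crash` are hypothetical.

```python
import random


def matches(condition, value):
    """Assumed matching logic: True always matches, callables are invoked,
    anything else is compared for equality."""
    if callable(condition):
        return bool(condition(value))
    return condition is True or condition == value


def accept_crash(raw_crash, rules, rng=random.random):
    """Return True if the crash is accepted under this simplified model."""
    for key, condition, percentage in rules:
        # "*" applies the condition to the whole crash, otherwise to one field.
        value = raw_crash if key == "*" else raw_crash.get(key)
        if matches(condition, value):
            # rng() is in [0, 1), so 100 always accepts and 0 never does.
            return rng() * 100 < percentage
    return False  # no rule matched


if __name__ == "__main__":
    crash = {"ProductName": "PlaneShift", "Version": "0.5.10"}
    print(accept_crash(crash, [("*", True, 100)]))
```

Under this reading, a single rule that matches everything at 100 percent accepts every crash, which is the effect wanted for a medium-sized project.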
Configure minidump_stackwalk
This is the executable launched by the Processor to extract the stack trace from each crash dump. Its settings are in /etc/socorro/processor.ini
In particular you have to configure:
minidump_stackwalk_pathname='/data/socorro/stackwalk/bin/minidump_stackwalk'
processor_class='socorro.processor.legacy_processor.LegacyCrashProcessor'
processor_symbols_pathname_list='/home/socorro/symbols'
> mkdir /home/socorro/symbols
> chown www-data:socorro /home/socorro/symbols
> chmod 775 /home/socorro/symbols
You need to compile dump_syms for each platform; it is included in the Breakpad svn. dump_syms extracts the symbols into the format Socorro expects, which is a .sym file.
Check the first line of your psclient.bin.sym file. Always use the name you see in the first line.
MODULE Linux x86 8875D5CEB1B52779813E4DBC39125CCA0 psclient.bin
$ mkdir -p /home/socorro/symbols/psclient.bin/8875D5CEB1B52779813E4DBC39125CCA0
$ mv /home/planeshift/wwwdoc/debugclients/linux32/psclient.bin.sym /home/socorro/symbols/psclient.bin/8875D5CEB1B52779813E4DBC39125CCA0
MODULE Linux x86_64 1DAF92E44396B0BAB23CF47416609AB60 psclient.bin
$ mkdir -p /home/socorro/symbols/psclient.bin/1DAF92E44396B0BAB23CF47416609AB60
$ mv /home/planeshift/wwwdoc/debugclients/linux64/psclient.bin.sym /home/socorro/symbols/psclient.bin/1DAF92E44396B0BAB23CF47416609AB60
MODULE windows x86 064E641AD2E849CF927A2C3AEB828E804 psclient_static.pdb
$ mkdir -p /home/socorro/symbols/psclient_static.pdb/064E641AD2E849CF927A2C3AEB828E804
$ mv /home/planeshift/wwwdoc/debugclients/win32/psclient_static.sym /home/socorro/symbols/psclient_static.pdb/064E641AD2E849CF927A2C3AEB828E804
If you want to check that minidump_stackwalk works, look at the testing section below.
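The directory layout used in the commands above can be derived mechanically from the first line of each .sym file. The sketch below parses a MODULE line and returns the destination path, following the pattern shown in this guide (debug file name, then identifier, then the .sym file, with a trailing .pdb replaced by .sym). The function `symbol_path` and its default root are illustrative assumptions, not part of Socorro or Breakpad.

```python
import posixpath


def symbol_path(module_line, symbols_root="/home/socorro/symbols"):
    """Return the path where a .sym file should be stored.

    `module_line` is the first line of the .sym file, in the form
    "MODULE <os> <arch> <id> <debug file name>".
    """
    parts = module_line.split(None, 4)
    if len(parts) != 5 or parts[0] != "MODULE":
        raise ValueError("not a MODULE line: %r" % module_line)
    debug_id, debug_file = parts[3], parts[4].strip()
    # A trailing .pdb is replaced by .sym; any other name gets .sym appended.
    if debug_file.lower().endswith(".pdb"):
        sym_name = debug_file[:-4] + ".sym"
    else:
        sym_name = debug_file + ".sym"
    return posixpath.join(symbols_root, debug_file, debug_id, sym_name)


if __name__ == "__main__":
    line = "MODULE Linux x86 8875D5CEB1B52779813E4DBC39125CCA0 psclient.bin"
    print(symbol_path(line))
```

Generating the path from the MODULE line, rather than typing the identifier by hand, avoids the most common mistake: storing a new build's symbols under the identifier of an older build.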
Add your own product to the database
The default installation provides two sample products (WaterWolf, NightTrain) and some sample versions of those products. But most likely you want to add your own app to socorro. Here is what I did.
At the time of this writing there is a bug in https://github.com/mozilla/socorro/blob/master/socorro/external/postgresql/raw_sql/procs/add_new_product.sql
So you need to edit /home/planeshift/socorro/socorro/external/postgresql/raw_sql/procs/add_new_product.sql at line 28 this way:
 INSERT INTO products ( product_name, sort, rapid_release_version,
       release_name, rapid_beta_version )
 VALUES ( prodname, current_sort + 1, initversion,
       COALESCE(ftpname, prodname),rapid_beta_version);
Basically, rapid_beta_version was missing, which caused an error when executing add_new_product() below.
Edit the database (as user planeshift)
> psql -U planeshift -d breakpad
 > SELECT add_new_product('PlaneShift', '0.5.10');
 > SELECT add_new_release ('PlaneShift','0.5.10','Release',20130505000000,'Windows',NULL,'release','f','f');
 > SELECT add_new_release ('PlaneShift','0.5.10','Release',20130505000000,'Linux',NULL,'release','f','f');
 > select update_product_versions(200);  -- generates product version info for older releases (previous 200 days)
Definition of add_new_product:
CREATE OR REPLACE FUNCTION add_new_product(prodname text, initversion major_version, prodid text DEFAULT NULL::text, ftpname text DEFAULT NULL::text, release_throttle numeric DEFAULT 1.0, rapid_beta_version numeric DEFAULT 999.0)
Definition of add_new_release:
CREATE OR REPLACE FUNCTION add_new_release(product citext, version citext, release_channel citext, build_id numeric, platform citext, beta_number integer DEFAULT NULL::integer, repository text DEFAULT 'release'::text, update_products boolean DEFAULT false, ignore_duplicates boolean DEFAULT false)
The last line inserts the data into the 'product_versions' table. By default, update_product_versions() only checks new releases within 30 days of the current date, but if you call it as select update_product_versions(200) it covers the previous 200 days. In my case the release date was older than 30 days, so the release was not showing up in the web UI.
Please note the BuildID HAS to be 14 digits like shown above.
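A small sketch for producing and checking such a value. It assumes the 14 digits encode a timestamp as YYYYMMDDHHMMSS, which is what the example value 20130505000000 suggests; both function names are hypothetical.

```python
from datetime import datetime


def make_build_id(moment):
    """Format a datetime as a 14-digit build ID (assumed YYYYMMDDHHMMSS)."""
    return moment.strftime("%Y%m%d%H%M%S")


def is_valid_build_id(build_id):
    """Check that a build ID is exactly 14 digits."""
    text = str(build_id)
    return len(text) == 14 and text.isdigit()


if __name__ == "__main__":
    build_id = make_build_id(datetime(2013, 5, 5))
    print(build_id, is_valid_build_id(build_id))
```

Checking the value before calling add_new_release() catches shortened IDs such as a plain date (20130505), which has only 8 digits.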
Set the default version for your product. This is required, or you will get errors on the front page when no version is selected.
> update product_versions set featured_version='t' where product_version_id=<your product_version_id here>;
Ensure your sunset date is in the future or the UI will not display your versions:
> UPDATE product_versions SET sunset_date = '2030-10-31' WHERE product_name = 'PlaneShift';
Adding your own Active Daily User information (ADU)
Many of the reports presented in the webapp UI are using the number of active users per day to generate the graphs. The information about the active users per day is not provided by socorro, but has to be provided manually by the administrator. At Mozilla there is a separate department doing this.
In particular socorro expects to receive the ADU information in the table raw_adu. The tricky part is that this information should be entered daily and be available for all products/platforms/versions/builds/release-channels.
At the moment what I've done is to populate it with a simple script which assigns one value to all platforms/versions/builds/release-channels of a specific product.
INSERT INTO raw_adu (
  SELECT 1000,'2013-08-28', v.product_name, platform, platform, release_version, build_id, release_channel, 'dummy_product_guid', now()
  FROM product_versions v, product_version_builds b, product_release_channels c
  WHERE v.product_version_id=b.product_version_id
    AND v.product_name=c.product_name
    AND v.product_name='NightTrain'
);
INSERT INTO raw_adu (
  SELECT 100,'2013-08-28', v.product_name, platform, platform, release_version, build_id, release_channel, 'dummy_product_guid', now()
  FROM product_versions v, product_version_builds b, product_release_channels c
  WHERE v.product_version_id=b.product_version_id
    AND v.product_name=c.product_name
    AND v.product_name='WaterWolf'
);
INSERT INTO raw_adu (
  SELECT 10000,'2013-08-28', v.product_name, platform, platform, release_version, build_id, release_channel, 'dummy_product_guid', now()
  FROM product_versions v, product_version_builds b, product_release_channels c
  WHERE v.product_version_id=b.product_version_id
    AND v.product_name=c.product_name
    AND v.product_name='PlaneShift'
);
After you have done this the crontabber will populate the other tables for you, including: product_adu, build_adu, ...
You can make this insertion automatic with a script like this:
#!/bin/sh
dbname="breakpad"
username="planeshift"
datevalue=$(date +"%Y-%m-%d")
psql $dbname $username << EOF
INSERT INTO raw_adu (
  SELECT 1000,'$datevalue', v.product_name, platform, platform, release_version, build_id, release_channel, 'dummy_product_guid', now()
  FROM product_versions v, product_version_builds b, product_release_channels c
  WHERE v.product_version_id=b.product_version_id
    AND v.product_name=c.product_name
    AND v.product_name='PlaneShift'
);
EOF
Add 'Unknown' os and os_version
At the time of this writing, the base dataset does not include the 'Unknown' OS name and version, which causes some dumps to fail. To fix this you need to add them:
 INSERT INTO os_names (os_name,os_short_name) VALUES('Unknown','unk');
 INSERT INTO os_versions (major_version,minor_version,os_name,os_version_id,os_version_string) values (0,0,'Unknown',76,'Unknown');
If you have those 2 entries already, then you are fine.
Configure the link to socorro home page
If you want to change the link at the top of the home page, when you click on the logo or the "mozilla crash reports" text, you can:
 Edit /data/socorro/webapp-django/crashstats/base/templates/crashstats_base.html
 Change the href at line 18:
 <a href="/crash-stats/home/products/PlaneShift">
 Change the text at line 20:
 PlaneShift Crash Reports
Disable Crashmover
We do not use Crashmover, so we want to disable it:
> mv /etc/supervisor/conf.d/2-socorro-crashmover.conf /etc/supervisor/conf.d/2-socorro-crashmover.conf_deleted
Cronjobs for Socorro
Socorro's cron jobs are managed by crontabber. crontabber runs every 5 minutes from the system crontab.
> cp scripts/crons/socorrorc /etc/socorro/
edit crontab:
> crontab -e
*/5 * * * * /data/socorro/application/scripts/crons/crontabber.sh
Configure and start daemons
Copy default configuration files
> cp puppet/files/etc_supervisor/*.conf /etc/supervisor/conf.d/
The files provided with the standard install point to the old version of the apps, so you need to modify those as follows.
> vi /etc/supervisor/conf.d/1-socorro-processor.conf
command = /data/socorro/application/socorro/processor/processor_app.py --admin.conf=/etc/socorro/processor.ini
> vi /etc/supervisor/conf.d/3-socorro-monitor.conf
command = /data/socorro/application/socorro/monitor/monitor_app.py --admin.conf=/etc/socorro/monitor.ini
> /etc/init.d/supervisor stop
> /etc/init.d/supervisor start
Configure Apache
There are two ways you can run the apps.
- With Virtual hosts
- With Virtual Directories (this is what I've used)
In case you want to use Virtual Hosts, you can use this: (I DIDN'T TEST THIS)
 > cp puppet/files/etc_apache2_sites-available/{crash-reports,crash-stats,socorro-api} /etc/apache2/sites-available
In case you want to use Virtual Directories, you can use this:
> vi /etc/apache2/apache2.conf
add at the end of the file:
Include socorro.conf
> vi /etc/apache2/socorro.conf (new file)
WSGIPythonPath /data/socorro/application:/data/socorro/application/scripts
WSGIPythonHome /home/planeshift/socorro/socorro-virtualenv
WSGIScriptAlias /crash-stats /data/socorro/webapp-django/wsgi/socorro-crashstats.wsgi
WSGIScriptAlias /crash-reports /data/socorro/application/wsgi/collector.wsgi
WSGIScriptAlias /bpapi /data/socorro/application/wsgi/middleware.wsgi
RewriteEngine on
Redirect /home/ /crash-stats/home/
Activate apache modules
> a2enmod headers
> a2enmod proxy
> a2enmod rewrite
> /etc/init.d/apache2 restart
Set access rights on cache dir
> chmod -R 777 /data/socorro/webapp-django/static/CACHE/
Configure WebAPP
Edit configuration file: /data/socorro/webapp-django/crashstats/settings/local.py
DEFAULT_PRODUCT = 'PlaneShift'
Edit /data/socorro/webapp-django/crashstats/settings/base.py for webapp user authentication database:
'NAME': '/home/socorro/sqlite.crashstats.db'
> cp /data/socorro/webapp-django/sqlite.crashstats.db /home/socorro
> chown www-data:socorro /home/socorro/sqlite.crashstats.db
Showing line numbers in the stacktrace without an active source code link
The current UI will display the source code lines in the stacktrace only in case the source code is linked to socorro (I don't know how to do this).
In case you don't have it linked, it will just display the name of the file. If you want to have the name of the file and the line number do the following:
edit /data/socorro/webapp-django/crashstats/crashstats/utils.py and at line 167 add the following:
frame['source_link'] = 'dummy'
frame['source_filename'] = '%s:%s' % (source,source_line)
This should be outside of the "if source:" block.
Test each component
I've split the testing into another page as it was getting too big.
Please check it here: Socorro Testing.


