Nuts and Bolts - Putting the App Together
Rules of the game:
- Only one survey/spreadsheet allowed per social security number.
- Original submitters of a spreadsheet can recall their sheet at any time and make changes to any numbers in any row or rows and then resave the sheet.
- All participants shall have read only access to any computer totals available for their own congressional district, state, or any other state at any time.
- Totals for the country overall can be displayed anytime with a single click of a mouse button. The totals for the country overall will also be available to any citizen of any country that logs into the AmericaII.org website.
- Only registered voter citizens of the United States will be allowed to submit a spreadsheet for inclusion in the American Consensus database.
- Only living citizens can have a survey copy in the database.
Purely Technical Considerations:
From here down in this section, there are a lot of numbers and decision points relating to building the App. Some of these may change before we get it all built.
Data records for individual citizen entries
SSA# xxx-xx-xxxx (9 characters)
State xx. (2 characters)
Legislative district in the state xx. (2 characters)
Birthdate xx/xx/xxxx. (8 characters)
Registered to vote (fed)? (Y/N)…1 character…(if N is selected, the consensus sheet will be rejected…and an e-mail sent to submitter)
Column 1. 2 3 4. 5. 6. 7. 8. Total
Row 1 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 2 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 3 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 4 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 5 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 6 xx-xx-xx-xx-xx-xx-xx-xx xx
Row 7 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 8 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 9 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 10 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 11 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 12 xx-xx-xx-xx-xx-xx-xx-xx xx
Row 13 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 14 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 15 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 16 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 17 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 18 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 19 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 20 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 21 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 22 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 23 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 24 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 25 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 26 xx-xx-xx-xx-xx-xx-xx-xx. xx
Row 27 xx-xx-xx-xx-xx-xx-xx-xx. xx
Total xxx. (Same all the way across for each column total)
All numeric entries in each row must be a digit from 0 through 10 inclusive. If numbers less than zero are entered, convert them to a zero. If numbers greater than 10 are entered reduce them to 10. (Drop leading zeros in all row entries and all totals
Maximum total for any row = 80
Maximum total for any column = 270
216 numeric entries (8 columns x 27 rows.
8 column totals (calculated)
27 row totals (calculated)
Programming and data security considerations:
We will want to provide data security for SSA#s. People are touchy when it comes to giving out their social security numbers. Anyone who has ever gone to a doctor or dentist, financed a car or home, or applied for a credit card , rented a car, filled out a loan application, applied for a bank account, applied for admission to a university, signed up for utility services in their community, or whatever, has already given out their SSA# several times, and pretty much every hacker on earth already has access, but we don’t want that to be a sticking point for those wishing to participate in the Consensus data gathering App.
It will be necessary to keep an accurate record of the number of spreadsheets that have been submitted and accepted. The process for coming up with totals for the country overall, individual states, and individual districts within each state will require (under program control) adding up all the individual row entries for the group being displayed (national, state, individual congressional district within a state) and dividing that Total by the number of spreadsheets being tallied to get an average value for each cell being tallied. As such it will typically be the case that totals for rows and columns in the sheets displaying subsets of the total country overall, will not come out an even number and will need to be rounded up or down.
Best Case scenario: Ideally we be able to use a software package that provides instant adjustments to the overall totals for all the individual spreadsheets in the database. Such a software package if available should also allow us to automatically update the backup data base as well, so that once a day (or night) “scheduled” backups will not be necessary.
Second Choice scenario: In the event such (instant update) software can’t be used either due to cost or application size (too small), to simplify things for participants who log in to review spreadsheets (how their state compares with another state, etc.) We could arrange a once a month update of the report sheets that will be used in these comparisons:
- One for each state (50 “report” spreadsheets) 3 characters for each row cell, 4 for the totals column. Same increase in characters for all “report” and “totals” spreadsheets.
- One for each district in each state (435 state/district “report” spreadsheets).
- One for the country overall (one “report” spreadsheet)
The “totals” spreadsheets for each category generated at the beginning of each year should be dated and saved indefinitely for purposes of comparison from one year to the next by participants and for purposes of comparison at year end when participants vote on how well (or not) their representatives have done in accomplishing the goals their constituents laid out for them at the outset of the year.
Participants reviewing the report data would see a disclaimer on their screen indicating the last time the “report” spreadsheets were updated on a monthly basis.
Doing this would allow report to report comparisons to be produced for viewing in milliseconds since each such inquiry would only require pulling up copies of two existing “report” records, and no calculations of data from several spreadsheets would be necessary.
The “report comparisons” section of the consensus database should also keep track accurately of how many participants visited each day and how many reports of each type (state-state, individual - state, and country overall) were viewed:
1 for that day
2 since the inception of the program
Hardware and software requirements:
Provision should be made at the outset to hold 200 million individual consensus data sheets., but be upgradable quickly to accommodate 300 million data sheets, just by adding storage to the system(s) if necessary.
While use can be expected to grow slowly at the beginning, As the system gains traction, rapid growth will hopefully then ensue, and the system needs to be able to accommodate such growth if and when it might occur.
The prudent path to take would be one that allows us to start out small, but be able to grow very rapidly after that if need be. The software algorithm should be able to accommodate either scenario right from the start. Hardware is another matter altogether.
Primary hardware components will be (but are not limited to):
1. Servers - One will get us started, and we should choose a vendor that provides for easy linking of additional servers if/when needed. We should s elect one that accommodates multiple SSDs (solid state drives) and has operating system software that allows hardware upgrades to be made easily and (perhaps after a restart) to become immediately available on line. Costs for servers range all over the place from $2000 up to more than $20,000. A lot of the costs are variable like how much RAM the CPU has, and what it costs per GB or TB.
2. Data Storage - This application will require considerable data storage. Best guess right now is that will require about 1-2 terabytes of raw data storage. Then depending on costs another 1-2 terabytes might be needed for backup storage. If suitable cloud storage is available at a cost-effective price we might consider going to cloud storage for some backup storage, at least in the beginning. But, Cloud storage would NOT be the preferred way to go.
Reliable data storage is relatively inexpensive. A 1 TB SSD drive goes for about $2000 and we would need somewhere between 2 and 4 of these to hold all the data expected to be needed by the end of year #3, including backups.
Each individual spreadsheet sent in by a client would require about 4500 bytes of storage if one character = 8 bytes. If one character required 16 bytes the total would be 9000 bytes per individual spreadsheet submitted. Assuming a total of 200 million records, the storage total for just the spreadsheet data would be between 1TB and 2TB (3TB and 4TB including backups storage). The 9000 byte figure is intended to include space delineators for the data fields.
3 Network Components - routers, satellite hookups, wiring, and labeling costs will depend on what types of server and data storage devices we select, but expenses should be nominal.
4. Space Requirements for the equipment and a maintenance/administrator person to oversee the operation. The same person could probably also do other tasks as well (data entry, telephone answering, etc.. The entire system should be able to sit on a table or desk.
5. Operating System Software - While the clients accessing the consensus database will mostly be using Windows based PCs, Windows is not robust enough to be used in our OS environment. Windows is among the easiest to hack and while that may still happen to us no matter which OS we select, we probably should not invite getting hacked by using an OS that is known to be among the easiest to hack into. Several servers are available that allow different operating systems to be employed. Unix is typically a sound choice, and there are many versions of UNIX out there to choose from.
Operating System software typically has an up-front cost (per server) and then ongoing software “maintenance” costs. We need to select one that does not mandate when we switch over from one OS version to the next. That way we can wait after each new release, for other users to pinpoint all the bugs in the new version and give the OS developers time to get rid of the bugs. We should avoid being the guinea pigs for OS software updates whenever possible.
6. Application Development Software - many sources are available. Some like Oracle and SAP would likely be price prohibitive for our application, small and standing alone as it does. Several application software systems are around that will suit our purposes, Most will be UNIX based in one form or another, but have special bells and whistles added to help sales. Our application is really a very simple one, and that should both enlarge the number of sources available to us, and end up with a more reasonable cost as well. Most software developers charge like hardware developers do…that is an up front price with a monthly maintenance fee covering upgrades and maintenance. Here too we need to be sure we can upgrade when WE want to as opposed to when the software developer wants us to upgrade, and for the same reasons we took this approach with hardware.
An overriding consideration will be how well the system (ours) performs with data coming in from different sources. That is : Windows based PCs, Apple based PCs, Apple I phones, Other cell phones, Unix based systems, etc.
7. Report Generating Software - some application development software packages come with built in report generators. Some are very good, others not so good. We want our clients to be able to access the American Consensus system as easily as possible and also to be able to generate and print reports as easily as possible. This is where the big differences lie in report generators. Some that do a great job of recalling and report generation are cumbersome to use and require convoluted strings of keystrokes to get even simple jobs done. They may be cheap (come with the application development software) but cumbersome from an end-user standpoint. We will know the right one when we find it. Like most other software there will be an initial up-front cost, and recurring monthly maintenance costs.
8. Provision (Software) for Portability - All system data entry and reporting sub-systems should be accessible from a variety of operating system environments such as unix, windows, Apple, etc.
The published reports should also be printable from a variety of system platforms (as above) and perhaps other platforms and from other countries too.
9. Developer Expenses - While there are literally tens of thousands of software developers writing code every day for YouTube, and simple on-line order entry and billing Apps (fill in the form) for businesses, only a very small percent of that total will likely have the knowledge (and tools) required to build the American Consensus App.
The application itself is actually very simple and straight forward, but capturing the data, correlating the data, updating the data under user control, and recalling selected subsets of the data (also under user control) ….each pose their own set of challenges for the developer(s).
Of course businesses do these things every day in thousands of data gathering systems all over the world. Developers of systems f(or example) being developed for the American Stock exchange, NASA, the CIA, the IRS, Bank of America or The Federal Reserve, and even Microsoft (who uses mainframes for their in-house heavy duty applications) routinely develop systems capable of doing the kind of data capturing, data correlation, updating individual account data, and recalling almost instantly subsets of data requested by a user. Almost any on-line business including the big ones like Amazon and E-bay, use these kinds of tools. So, the task, while requiring special skills, is not necessarily difficult by any means.
But, just like on the “This Old House” TV shows, where Norm and Company do tongue and groove and dowel joining, mortise and Tenon inletting and other wood working tasks that might look difficult (or impossible) for the average person off the street to perform. … Norm and Company make it look routine to do these things.
The difference between Norm and Company and the average homeowner is that Norm and Company have the proper tools to do the job AND the knowledge to use the tools effectively. And because of this, for Norm and Co. those tasks ARE routine.
We need to find someone like Norm that has the same level of skills in developing software for what will (hopefully) be a large scale, but relatively simple, data gathering and reporting system for us.
If we are really fortunate, we may not only find the right person, but be able to afford him/her.
My estimate of the maximum time needed for the developer do everything we need to have done will be about 100 hours to 300 hours. If the developer charges $100 an hour, that would translate into somewhere between $10,000 and $30,000.
In the way old days when I did work of this type, I think I could have completed the whole thing in a week (40 hours). But that was a long time ago and a lot of water has passed under the bridge since then. The software game has now passed me and most of my contemporaries by, and we will most certainly need expert help to get this job done.
Maybe we will be lucky enough to find a qualified software developer who believes in what we are attempting to do, and the developer might do it at something of a discount. Maybe wishful thinking…but it could happen.
When we find a qualified developer, I will want to sit with him/her to share some ideas I’ve had relating as to possible ways of formatting the data being captured in a way that might greatly simplify things right from the start.
Return to the No Free Lunches page