[19.08.2013] I have a still open task in my work queue since a long time that didn't make any progress.
This is the migration project to collect all pointlists into the Nodelist database as well.
Long time I've tried a trial and error concept.
Now I give it a new try by analysing the Nodelist structural concept, to find answers for questions that did comes out of exceptional pointlist handling. (Uli)
- Table of contents
- Chapter 1 - Nodelist database structure
- Chapter 2 - Pointlist enhancements over Nodelist database structure
- Chapter 3 - Rules for importing Pointlists
Chapter 1 - Nodelist database structure
1.1 Structural Concept
The structure of current Fidonet Nodelist Database collects the content of all nodelists ever published. The content has been unified. In database structures viewpoint, this is a real relational database structure concept. This means, whenever a new node has been listed, the new nodenumber becomes a new record in the database. This record never changes despite the fact, the record is listed every week in the current nodelist. If the speed setting or another nodelist flag changes, then a new record will be written into the database.
Another table now records all the nodelists ever imported by Date and the Julian daynumber in the year.
A third table consists of records, that links the nodelist records with the nodelists ever published.
With this simple structure, there exist a problem in the retrival process:
Records can be found, this is not on discussion. The inner structure of one nodelist gets detached. So therefor each nodelist record consists of addtl. reference links.
A nodes uplink maybe a Hub or a Net Host or a Region Host, listed as Region Independent AKA (RIA) or a Zone Host, listed as Zone Independent AKA (ZIA).
The Nodelist also includes a substructure, that cannot be read directly out of an AKA. This is the Region substructure. To get an answer under which Region a node is listed, you have to look into the nodelist and find the most close before Region listing. The same applies for Node records listed under a Hub.
All other reference infos can be retrieved out of the Node AKA:
So each unified Node record in the Nodelist Database gets enhanced with a reference:
- Region under which the Node is listed
- Direct Uplink under which a Node is listed (Hub or Net Host ...)
This simplified structure now allows retrival of all Sysops and AKAs ever listed.
Chapter 2 - Pointlist enhancements over Nodelist database structure
Pointlists follows in principle the Nodelist structure, but only in the principle.
First, there exist at least 5 different Pointlist formats that are documented under FTS-5002 Pointlist Formats
Despite the fact, there are all pointlists, the structure is considerable different.
But there is one in common, what all pointlist formats shares. This is a reference to a Nodelist record as all Points listed, are listed as downlinks of a specific Node. So all pointlists have a reference either way to a Nodelist Node record - the so called Bossnode.
2.2 Different Pointlist Formats
2.2.1 The Boss- and Poss- Format
So here we're right in the middle of a structural definition.
Each Pointlist point record requires to be referenced to an Uplink Bossnode that is part of a Pointlist format.
The easiest ones are the two formats listed under FTS-5002 named the Boss Format list and the Poss Format list.
Both lists uses the Boss,[AKA] reference modell.
So there is an easy referencing between a Pointlist record and the related Nodelist record.
The difference in Boss Format and Poss Format list is, that the Poss Format uses the Nodelist Status Point for all Points listings where the Boss Format list allows status flags (Hold, Down, Pvt). So the Boss Format list is the most precise list in also signal a points status where the Poss Format only includes redundant informations. We know, we read a Pointlist, this is signaled by the Boss keyword lines. So all other lines are point listings. No requirement to signal and flag them as Point in each pointlist line listing.
For transfering Boss Format or Poss Format pointlists into the Nodelist Database only the reference changes from a Hub, Host reference to a Bossnode reference. All other definitions are similar to a Nodelist record listing.
At this step, we can rethink, if there is a requirement to use the Point Nodelist status as a required flag, requiering a transfer into the database or if we can retrieve this info from another info?
To make the story short:
Each AKA is splitted 4 dimensional: [Zone]:[Net]/[Node].[Point]
where a Node listing is still Point # 0.
All other Pointnumbers are considered to be Points.
So every time we want to retrieve the info wether a listing is a Node or a Point listing in the database, we can look into the Pointnumber field and if its Not Equal 0 then it is a Point listing.
So therefor we don't require the Point status defined in the Nodelist status field and we can use the Nodelist status field for the infos we can retrieve out of Boss Format lists - the real status of a Point listing.
This applies to other Pointlist Formats too, that includes a status field NE 'Point' for Point listings.
2.2.2 The Point- Format
This format is also known as V7 pointlist format.
The Point Format list still has merged Nodelist entries together with Points listings. So the Point Format list can be read as a copy of a Nodelist with integrated Point records. The reference to a Bossnode is similar to a listing of a Node under a Hub or Net Host filled with Point records between each Node lines.
To signal Point lines as Point lines, the Point Format uses the Point flag as a status field parameter similar to the Poss Format listed under 2.2.1
Likewise the discussion about the Point flag in a Poss Format Point list, this status flag is required in a Point Format list, to differentiate a Node record from a Point record.
Incorporate Points from a Point Format list is easy if the focus is an individual point record. It can be easily transfered like the Boss Format or Poss Format record can be transfered.
But remember - the Point Format list also includes structural a copy of the Nodelist(!). So a Bossnode record in the Nodelist is also listed with a Nodelist line in the Point Format list.
Assuming, that a Nodelist record in a Point Format Pointlist hasn't been updated properly, we potentialy have 2 different listings for a Node in 2 lists.
The Nodelist line may differ between the Nodelist and the Pointlist listing.
This isn't a big problem, if the Sysop name in a Nodelist record doesn't much differ from the Sysop name in a node line of the Point Format list.
But considering, that a nodenumber has been re-assigned to another sysop in the Nodelist, but the Pointlist nether gets updated, now the Pointlists Node record clashes with the Nodelists Node record. And this is a problem.
To solve this problem, a strict analyse in retrieving Pointlists records is required.
Each Bossnode listing in any Pointlist requires a matching Nodelist record to keep the database clean from orphaned Bossnode and respective Points listings.
This requirement isn't limited to Point Format pointlist records.
There are some known exceptions, where Point Format pointlists have shortened Region and/or Host lines. This only requires to have an exception in analysing input, once readin Region and Host lines from Point/V7 format pointlists, that such lines ends after field 2 and doesn't follow the FTS Nodelist format lines. For the Bossnode listings, this doesn't change anything, as Bossnodes are still listed with complete lines, so they can be compared with the corrosponding Nodelist record. The only exception is to take care while build up the AKA from lines that have no comma as seperator after the Region or Host number but followed by a carriage return sequence.
Region,31 <- Carriage Return Zone: ? Host,399 <- Carriage Return Net: 399 ,666,Just_a_Node,Simcity,Joe_Grunt,9-012-345-678,9600,VFC Node: 666 AKA: 399/666 Point,1,Point_1,Simcity,Joe_Point,-Unpublished-,300, Node: 666 AKA: 399/666.1 Point,2,Point_2,Simcity,Joes_Brother,9-345-678-901,9600,V34,XX,U,Tbc Node: 666 AKA: 399/666.2
2.2.3 The Fakenet- Format
The Fakenet-Format list is a mixture of all previously listed formats. Its a full featured Nodelist where Bossnode listings have been shifted to Host listings and Net Pointlistkeeper listings to Region listings. So that is the reason, why this format is named a Fakenet Format list. The Host- and Region- listings are fakes in relation to Nodelist listings of Bossnodes.
The reference to a Nodelist listing of a Bossnode is made in the systemname field, field #3 of a nodelist line within a Host line. The definition is, that the systemname has to represent the 2-dimensional AKA of a Bossnode. e.g. Bossnode 2:244/1120 is defined 244/1120 in the systemname field. So a Fakenet Format Bossnode segment can be read in the following way:
- Read the Host line, extract 2d AKA from systemname field, extract Sysopname, extract City
- Search the Nodelist for the 2d AKA found in step 1
- Compare Sysopname / City found in step 1 with values found in 21
Each list is defined a Pointlist for a Net or a Region. e.g. Points24 is defined as the Fakenet Pointlist for Region 24. This definition has been made long time ago and the only reference into Region 24 is the filename. As a Fakenet Pointlist doesn't include any reference into a specific Zone, a Fakenet Pointlist may reference to either Zone 1 or Zone 2 or Zone 3 if the list is read outside any context. This is a problem in automaticly detecting a Fakenet Format pointlist. A Fakenet Format Pointlist may have Region and Host lines. So it doesn't differ much to a Nodelist. The only feature element is a 2 dimensional AKA definition in the Systemname field in all Host lines of such a list.
To automaticly identify and referencing a Fakenet Format pointlist addtl. knowledge requirement is needed to reference a specific Fakenet Format pointlist to a real Fidonet Network or Region.
As previously said, the Region 24 (Germany) Fakenet Format pointlist is named POINTS24, and a Region 23 (Denmark) Fakenet Format pointlist has been named DK-POINT and a Region 34 (Spain) Fakenet Format pointlist has been named POINTR34.
So there is one requirement to identify Fakenet Format pointlists. That is identification by Filename mask.
Pointlist lines can be retrieved from Fakenet Format lists as they've been retrieved from Boss Format lists, respective the Nodelist status field. The reference into a real Fidonet Network or Region is given by an addtl. definition given in the Filemask definition and reference to a Fakenet Format pointlist. e.g. Net 244, Region 24.
R23 Fakenet Format Exception In the early days of R23 Fakenet Format pointlists development a 2nd method to reference Bossnodes to a Nodelist record has been established using the UBOSS:[2d-AKA] keyword in the Host's line as addtl. Userflag.
This format has only be used in Fakenet Format pointlists of Region 23 in the years from 1992 until 2008. The last R23 Fakenet Format pointlist distributed is dated: 2008-01-08 (DK-POINT.004)
Extract the 2d-AKA reference from either Systemname field (default) or from the UBOSS:[2d-AKA] nodelist flags fields (R34 1992-2008).
2.2.4 The Fidouser- Format
The Fidouser Format list consists of lines with the following format of a record:
<Keyword>, <Restofname> [<zone>:]<net>/<node>[.<point>]
Records must be sorted on <Keyword>.
The zone number may be omitted. Default is the own zone. The address must start on or after column 46. The record must be padded with trailing blanks to make all records of equal length.
The keyword is the last word in the name. Usually this is the surname, but in case of a compound surname it is the last of them.
Restofname usually is the first name, but it may be more than one name and/or prefixes to the surname. Keyword and restofname are separated by a comma and one space.
Each line is an individual Point listing with an individual reference to a nodelist record. Extracting Zone:Net/Node from the line references to the Bossnode.
Addtl. infos like Systemname, City, Nodelistflags aren't available.
So there exist 2 solutions, to complete the Point listing
- to extract the Nodelist line from the Bossnode and use these infos for the point -or-
- to add default values for the missed fields
Fidouser Format list is an old style pointlist format. One of the earliest developments before other formats become common. In these days, Points did live near by their Bossnodes (same city or around) caused by local calling areas. A point from Hamburg seldom did call a Bossnode in Munic.
So best practice with some added value is to add the Bossnodes city to a Fidouser Format pointlist line information that maybe exact correct or nearly correct in the majority of all cases. Individual cases can be simply ignored.
This usage adds more valuable information into the transfered pointlist into the database then all Fidouser Format pointlists get a default meaningless value:
e.g. City -> Fidouser
The multiple Pointlists case
The Fidonet Nodelist is the weekly distributed phonebook of all fidonet members. The compilation consists of the Zone 1 - Zone 6, nowadays Zone 1 - Zone 4 segments.
Each segment can be distributed seperately, but hasn't in the past, so the complete Nodelist has been distributed week by week since years.
Pointlist distribution did happen in a different way over years. The first attempt has been made in the early 2000's to collect regional pointlist segments together into a combined Zone 2 pointlist segment that also did include a Zone 6 segment. From the other Zones until today there is no info available, that ever pointlist segment distributions did happen.
At least from the end 80's until the early 2000's Pointlist distribution evaluated several pointlist distribution formats. Each region had their own specific distribution format. As also different software used different pointlist formats, it was common, that several regions distributed more then one different pointlist formats. It also was common, that the Pointlist distribution format did change over the years. At the very end the Fido-History-Project archive collected many different Pointlist formats, multiple formats within a Regions and a masterlist - the Zone 2 Pointlist.
The Nodelist has a uniqueness, that the Pointlists never had. Nodelist records are unique in one week. Pointlists records aren't. The Fido-History-Project collected Pointlists from over 10 years of pointlist distributions, where the masterlist still did not exist. Also since Zone 2 Pointlist distribution started, not all Pointlists have been incorporated in the first few years. To have all pointlist segments incorporated into the Fido-History-Project Nodelist and Pointlist database archive, several pointlists have to be imported.
This may result in an overlapping of Pointlist records, e.g. if one pointlist record has been imported from a regional level Fakenet format pointlist and the same record in the same week has been imported from the masterlist.
We know from Nodelists, that sysops may have several different Nodelist records, e.g. for administrative AKAs and also on a multiple lines system. But all these records emerges over weeks. Multiple identical records from different lists aren't forseen, at least not from the initial database concept.
Therefor, the database construct requires an addtl. marker:
From which list does the listed record comes from?
Host 2:244/0 from Nodelist Hub 2:244/1200 from Nodelist Node 2:244/1120 from Nodelist Point 2:244/1120.1 from Region 24 Bossformat list Point 2:244/1120.1 from Region 24 Fakenet format list Point 2:244/1120.1 from Zone 2 Bossformat
From the example above we focus on the AKA's:
From the Nodelist we collect AKA's 2:244/0.0, 2:244/1200.0, 2:244/1120.0
and from the Pointlist(s) we collected 2:244/1120.1, 2:244/1120.1 and 2:244/1120.1
so we collected one record and 2 dupe records from 3 different pointlists.
2.3.1 Dupechecking of NOt, this is still the question
Dupe checking for Pointnumbers is one possible option to merge several listings from several pointlists into one unique Fido-History-Project Nodelist and Pointlist archive database record with references from several sources. All what to do is a dupecheck of an existing Point aka for ones week Nodelist/Pointlist listing.
In case dupes will be eliminated, how can the references be displayed?
From above example 3 listings of 2:244/1120.1 can be displayed in a table row by row. But if there is only one row, the question might be, how these references can be displayed?
Hover info similar to the Nodelist Archive overview, where exceptions are displayed in red, signals addtl. info if a user hovers the Nodelist record with the mousepointer.
hovering hovered text display ------------- ---------------------------- Host 2:244/0 from Nodelist Hub 2:244/1200 from Nodelist Node 2:244/1120 from Nodelist Point 2:244/1120.1 from R24PNT, POINTS24, Z2PNT
In the case of unified Point AKA listings, the dupe checking has to consider, that data integrity requires to be checked, that all dupe records lists the similar content - Sysopname is identical, City is in one local area.
The biggest problem is, if one of the records doesn't match, the question will araise, how to proceed? Eliminate the mismatching record? Mark the mismatching record as mismatched?
Before answering this question, we probably have to ask, how such mismatches can happen?
Well, a discrepancy between a Nodelist's Bossnode record and a corrosponding Pointlist Bossnode line in e.g. the Point Format pointlist may happen if the Nodelist update path is a different update path then the Pointlist segment update path. But the Bossnode reference into a Nodelist is still a detected requirement (see section 2.2.2 The Point- Format). A mismatch of one Point AKA from different pointlist sources can be assumed, that it will never happen, because support of different Pointlist format list is handled by the Pointlistkeepers. So if they'll distribute a specific pointlist format, another pointlist format list will be distributed by the same Pointlistkeeper to support different Fidonet software pointlist format styles.
Example: in the early Region 24 pointlist distribution, the Fakenet Format list distribution was the defined major distribution path. This moved over to the Bossformat distribution years later. Over the time, both formats have been distributed in parallel. The content of both files was identical, because the 2nd format has been converted from #1 distributed pointlist.
So before we complicating processing, we can assume, that the major Pointlists AKA listings from several different pointlist format lists of one distribution week for a specific area are identical. There may exist cases, where one Point AKA is listed in one pointlist, but not in another one. But this problem can be solved, by readin all different pointlist format lists to complete the archive with missed records.
There exist known cases of Point listings under different Bossnodes. So one may be listed with different Point AKA's. This doesn't differ to multiple Nodelist listings of a Bossnode who also has a Hub, a Host and/or RC listing in the Nodelist.
The only difference will be shown in the Nodelist search, where the search for a sysopname will probably give more results then a result list that follows a specific AKA search. But this phenomenon we still have with the Node's Nodelist search.
In the case of unified Pointlist AKA listings, which lists record will be give preference?
Example: Point 2:244/1120.1 listing in the Bossformat list shows
and the Fakenet format list shows
Which record we will give the preference?
The Bossformat listing? The Fakenet Format listing, as it was long time the major distribution version?
Ok, here we have to go deeper into the discovery of current database table structures.
The table NODELIST lists the different Nodelists by date, year, week
The table NL_NODE lists the different AKAs, the full sysopname and the lastname
The table NL_LINE lists the extracted different Nodelist lines ever listed with a Nodelist status, systemname, location, phone, baudrate, nodelist flags
The table NL_FRMT still has the different Lists types: Nodelist, Boss, Point, Fakenet pointlists internal value references
The table NL_ENTRY is an index table that links the Nodelist listed under NODELIST with the record in the table NL_LINE. Each record in NL_LINE has a reference to one record in table NL_NODE.
Above example with 2 Point AKA listings will result in two records in table NL_LINE. Each of these 2 records references to one record in table NL_NODE.
The NL_LINE listing isn't much a problem. But the NL_NODE listing requires further discovery:
The NL_NODE listing for Nodes consists of detailed reference links. If one moves from one Hub to another, a new NL_NODE listing will be created to reference the listing under a different Hub.
At import time, we still have the Bossnodes reference (Net/Node) for a Point, but what we didn't have is the Bossnodes reference under current Nodelist. There is still a requirement to check the referenced Bossnode in the corrosponding Nodelist. So here we have to collect not only, that the Bossnode listing in current Nodelist is valid, but also what the reference listing is. This means, under which Region a Bossnode is listed, under which Hub a Bossnode is listed. If we have collected these infos from the database, we can also write Point records with these infos into the database. Problem solved.
If there exist no updates in the Nodelists from one week to the next week, at least table NL_ENTRY will be filled with the reference links of existing Nodelist records with the new Nodelist.
The accuracy of Pointlist listings depends on the accuracy of Bossnode references to a corrosponding Nodelist record for the Bossnode.
If this is once checked, the related Pointlist Point AKA records can be assumed to be correct (minimum of mismatches).
Its more likely, that a complete Bossnode segment is orphaned then a mismatch of one Pointline will happen.
There may exist different Point sysopnames over the time under one Bossnode, but they'll evolve over the time. The Point#1 listing in 1999 maybe a different one then in 2009. By search for an AKA this we also see in Nodelists did happen. By search for a sysopname, the result list ignores listings with a different sysopname.
So there is a slight preference of unified Pointlist AKA listings. The only question to answer, how the different references gets displayed.
2.4 The corrosponding Nodelist line
One of the requirements in importing pointlists is the Nodelist referencing for accuracy of each Bossnodes segment. If a Bossnode is no longer listed in the Nodelist, a Pointlist segment of that Bossnode listed in a later pointlist becomes orphaned.
All Nodelists are currently still imported, so that in an import process for Pointlists, the referenced Nodelist still exist in the database.
But what happens with new distributions?
Often Pointlists will be published the day before or around the Nodelist will be created. For a region the region check will be made against the Nodelist regional segment of the upcoming Nodelist. But there are still cases known of Nodelist's late distributions, where the Nodelist arives the Pointlistkeepers after weeks. Importing the Pointlist doesn't make sense, until an accurate Nodelist has been published. So for the current Pointlist processing and if the check for a nodelist didn't find a matching Nodelist, has to be delayed until the corrosponding Nodelist arrives.
There is still a work queue processing of current Nodelists implemented in the database update process that have to be used for Pointlists import processing too. So its easy to trigger a Pointlist processing where still the corrosponding Nodelist isn't available. The simple solution is to set the Pointlist import to status delayed until the corrosponding Nodelist import has been processed. Once the Nodelist arrives and the Nodelist processing has been triggered, the delayed Pointlist import can be processed thereafter.
Chapter 3 - Rules for importing Pointlists
|1||Each Bossnode listing in any Pointlist requires a matching Nodelist record to keep the database clean from orphaned Bossnode and respective Points listings.||
Check each pointlist revision against a corrosponding Nodelist revision
Check each bossnode in a pointlist against a corrosponding Node record in corrosponding Nodelist revision
Each AKA is splitted 4 dimensional:
[Zone]:[Net]/[Node].[Point]where a Node listing is still Point # 0.
All other Pointnumbers are considered to be Points.
|So therefor we don't require the Point status defined in the Nodelist status field and we can use the Nodelist status field for the infos we can retrieve out of Boss Format lists - the real status of a Point listing.|
|3||Records in Pointlists maybe shortened||Pointlist line checks requires check for comma or line termination delimiters|
|4||Fakenet Format pointlists identification follows ...||identification by Filename mask|
|5||Fakenet 2d-AKA references||
from either Systemname field (default) -or-
from the UBOSS:[2d-AKA] nodelist flags fields
|6||Fidouser Format data completion||add the Bossnodes city and probably other infos (Phone, Baud, Nodelistflags) from the Bossnodes Nodelist info to a Fidouser Format pointlist line|
|7||Unified Point AKA listings per Nodelist week||Check if Point AKA exist otherwise add new record|
|8||Pointlists processing requires queuing||If a corrosponding Nodelist has been processed the Pointlist can be imported, otherwise processing has to be delayed|
|9||Referencing data source||Add a data source type reference to each Nodelist and Pointlist line (Nodelist, Points24, Z2PNT)|
x1) If the sysopname matches, all is ok. If the sysopname differs, check sysops surname. If the sysopname matches, then ok, otherwise check also city. If no match is found in either of these 3 steps, it can be assumed, that the sysop has changed and the referenced pointlist segment no longer matches to the sysop in the Nodelist.